Special characters in the XML response not represented correctly

Hello team,
We are using tapir in a project. I am new to tapir and have an issue with XML: Norwegian characters are not represented correctly in the XML response. The codec for the response is defined as below:

    implicit val responseXmlCodec: Codec.XmlCodec[PersonResponse] = Codec
      .id(CodecFormat.Xml(), Schema.string[String])
      .mapDecode(d => DecodeResult.fromOption(decodeOrException(responseDecoder.decode(d))))(responseEncode(_).mkString)
      .schema(implicitly[Schema[PersonResponse]])

The expected XML should include

<name>MIDTBØ</name>

but it is returned as

<name>MIDTBÃ</name>

The name in the class instance is correct, but it gets mis-encoded in the codec.
I have tried changing the charset in:
responseDecoder.decode(d, charset = "utf-16")
responseDecoder.decode(d, charset = "iso-8859-1")
but that didn’t work either. Any idea how to get the right characters?
Thanks in advance :slightly_smiling_face:

Hi @LinaHAyoub ,

Thanks for your message. Since we have only a snippet of code it is quite hard to say what is wrong. Could you please provide a fully working example showcasing unexpected behavior?

Thanks! :slightly_smiling_face:

Hello @rafalambrozewicz,

Happy new year and thanks for your reply!

I have created a runnable version of the code below:

package personHit

import personHit.person.PersonHitListExtractRenderer
import ru.tinkoff.phobos.decoding.{DecodingError, ElementDecoder, XmlDecoder}
import ru.tinkoff.phobos.derivation.semiauto.{deriveElementDecoder, deriveXmlDecoder}
import spray.json.{DeserializationException, _}
import sttp.tapir._
import sttp.tapir.generic.auto.schemaForCaseClass

import java.lang.reflect.Field
import scala.collection.immutable
import scala.collection.immutable.ListMap
import scala.xml.{Elem, NodeSeq, XML}

sealed trait PersonExtract


case class BasicInformation(name: Option[String])

case class PersonHitListExtract(basicInformation: BasicInformation) extends PersonExtract

case class PersonResponse(extract: PersonExtract)

object testNo extends App {
   val basicInformation = BasicInformation(name = Some("MIDTBØ"))

   val personHit = PersonHitListExtract(basicInformation = basicInformation)

   val response = PersonResponse(extract = personHit)

   implicit val biFormat: ElementDecoder[BasicInformation]                           = deriveElementDecoder
   implicit val phlFormat: ElementDecoder[PersonHitListExtract]                      = deriveElementDecoder
   implicit val prExtract: ElementDecoder[PersonExtract]                       = deriveElementDecoder
   implicit val responseDecoder: XmlDecoder[PersonResponse] = deriveXmlDecoder("response")

   private def decodeOrException[T](in: Either[DecodingError, T]): Option[T] = in match {
      case Left(err)    => throw DeserializationException(err.text, fieldNames = err.history)
      case Right(value) => Option(value)
   }

   implicit val responseXmlCodec: Codec.XmlCodec[PersonResponse] = Codec
     .id(CodecFormat.Xml(), Schema.string[String])
     .mapDecode(d => DecodeResult.fromOption(decodeOrException(responseDecoder.decode(d))))(responseEncode(_).mkString)
     .schema(implicitly[Schema[PersonResponse]])


   // format: off
   private def responseEncode(r: PersonResponse): Elem = {
      val e = r.extract
      <response>
         <data>
            {e.toTag}
         </data>
      </response>
   // format: on
   }


   println(responseEncode(response))

   println(lookupResponseXmlCodec.encode(response))

}

I have added the build.sbt and the helper code that creates the XML tag at the end.

The implicit codec val (responseXmlCodec) is used as the output of the endpoint as follows:

private val xmlBodyResponse = xmlBody[PersonResponse].description("Response Structure")

endpoint.post
      .out(oneOfBody(jsonBodyResponse, xmlBodyResponse))

Interestingly, any attempt to print the response (even in the logs) prints the right value, which is

MIDTBØ

Only in the API response does the value of the name come out wrong:

MIDTBÃ

This leads me to think that the problem is not in the codec at all but in the endpoint definition. Is there anything in a tapir endpoint definition that could affect the encoding?

Note: the same endpoint built directly with Akka HTTP works fine on the same server.

Many thanks in advance!
Lina


Here is the build.sbt:

name := "testNorwegianLetters"

version := "0.1"

scalaVersion := "2.13.10"


val sttpCoreVersion                  = "3.7.6"
val tapirVersion                     = "1.0.6"
val tapirOpenApiVersion              = "0.2.1"
val phobosVersion                    = "0.16.0"
val scalaXmlVersion                  = "1.3.0"

val serviceDeps = Seq(
  "org.scala-lang.modules" %% "scala-xml"     % scalaXmlVersion,
  "ru.tinkoff"             %% "phobos-core"   % phobosVersion
)

val tapirDeps = Seq(
  "com.softwaremill.sttp.tapir"   %% "tapir-core"               % tapirVersion,
  "com.softwaremill.sttp.tapir"   %% "tapir-akka-http-server"   % tapirVersion,
  "com.softwaremill.sttp.tapir"   %% "tapir-json-circe"         % tapirVersion,
  "com.softwaremill.sttp.tapir"   %% "tapir-json-spray"         % tapirVersion,
  "com.softwaremill.sttp.tapir"   %% "tapir-prometheus-metrics" % tapirVersion,
  "com.softwaremill.sttp.tapir"   %% "tapir-openapi-docs"       % tapirVersion,
  "com.softwaremill.sttp.apispec" %% "openapi-model"            % tapirOpenApiVersion,
  "com.softwaremill.sttp.apispec" %% "openapi-circe"            % tapirOpenApiVersion,
  "com.softwaremill.sttp.apispec" %% "openapi-circe-yaml"       % tapirOpenApiVersion
).map(_ exclude ("com.typesafe.akka", "akka-stream_2.12")).map(_ exclude ("com.typesafe.akka", "akka-http_2.12"))


val coreDeps = Seq(
  "com.softwaremill.sttp.client3" %% "core"                             % sttpCoreVersion,
  "com.softwaremill.sttp.client3" %% "akka-http-backend"                % sttpCoreVersion,
  "com.softwaremill.sttp.client3" %% "async-http-client-backend-future" % sttpCoreVersion)


val fullCoreDeps = coreDeps ++ tapirDeps ++ serviceDeps

libraryDependencies ++= fullCoreDeps

And below is the code helping to create the XML tag:

package personHit

import spray.json.{JsArray, JsBoolean, JsNull, JsNumber, JsObject, JsString, JsValue}

import java.lang.reflect.Field
import java.text.NumberFormat
import java.time.Instant
import java.util.{Locale, UUID}
import scala.collection.immutable
import scala.collection.immutable.ListMap
import scala.concurrent.duration.Duration
import scala.xml._

package object person {

  //TODO Set appropriate Locale
  val format = NumberFormat.getInstance(Locale.GERMAN)

  implicit def nodeSeqToString(ns: NodeSeq): String = ns.headOption.map(_.text.trim).getOrElse("")

  implicit def nodeSeqToOptionalString(ns: NodeSeq): Option[String] = ns.headOption.map(_.text.trim)

  implicit def nodeSeqToUUID(ns: NodeSeq): UUID = UUID.fromString(ns: String)

  implicit def nodeSeqToOptionalUUID(ns: NodeSeq): Option[UUID] = (ns: Option[String]).map(UUID.fromString)

  implicit def nodeSeqToBoolean(ns: NodeSeq): Boolean = (ns: String).toBoolean

  implicit def nodeSeqToOptionalBoolean(ns: NodeSeq): Option[Boolean] = (ns: Option[String]).map(_.toBoolean)

  implicit def nodeSeqToOptionalInstant(ns: NodeSeq): Option[Instant] = (ns: Option[String]).map(Instant.parse)

  implicit def nodeSeqToLong(ns: NodeSeq): Long = (ns: String).toLong

  implicit def nodeSeqToFloat(ns: NodeSeq): Float = format.parse(ns).floatValue()

  implicit def javaDurationToScalaDuration(duration: java.time.Duration): Duration = Duration.fromNanos(duration.toNanos)

  implicit class EnhancedXmlElement(elem: Elem) {
    def %(attrs: Map[String, String]): Elem = {
      val seq = for {
        (n, v) <- attrs
        if Option(v).isDefined
      } yield new UnprefixedAttribute(n, v, Null)

      seq.foldLeft(elem)(_ % _)
    }
  }

  implicit class CompanyHitListExtractRenderer(val extract: PersonHitListExtract) extends DomainRenderer[PersonHitListExtract]

  implicit class PersonHitListExtractRenderer(val extract: Object) extends DomainRenderer[Object]

  trait DomainRenderer[T] extends InstanceIntrospector[T] {
    def toMap: ListMap[String, Any] = {
      val lm = immutable.ListMap.newBuilder[String, Any]
      for (x <- fields) lm += x
      lm.result()
    }

    def fieldValues: List[Any] = toMap.values.toList

    def toTag: NodeSeq = toTag("extract")

    def toTag(name: String): NodeSeq = {
      val body = toMap
        .map {
          case (k, v: Seq[AnyRef]@unchecked) => s"<$k>${v.map(_.toTag(k)).mkString("")}</$k>"
          case (k, v: Int) => s"<$k>$v</$k>"
          case (k, v: Double) => s"<$k>$v</$k>"
          case (k, v: Float) => s"<$k>$v</$k>"
          case (k, v: Boolean) => s"<$k>$v</$k>"
          case (k, v: AnyRef) if !v.isInstanceOf[String] => v.toTag(k)
          case (k, v) => Option(v).map(_ => s"<$k>$v</$k>").getOrElse("")
        }
        .mkString("")

      def encode(s: String) = if (s.contains("amp;")) s else s.replaceAll("&", "&amp;")

      XML.loadString(s"<$name>${encode(body)}</$name>")
    }

    def toJsonExtract: JsValue = {

      def convertObj(obj: Object): JsValue = {
        val fields = obj.getClass.getDeclaredFields
        fields.foreach(_.setAccessible(true))
        val jsValueMap = fields.map(f => f.getName -> toJsValue(f.get(obj))).filterNot(jsNull).toMap
        if (jsValueMap.isEmpty) {
          JsNull
        } else {
          JsObject(jsValueMap)
        }
      }

      def toJsValue(obj: Any): JsValue = {
        obj match {
          case Some(x) => toJsValue(x)
          case None => JsNull
          case x: String if x.nonEmpty => JsString(x.toString)
          case x: String => JsNull
          case x: Boolean => JsBoolean(x)
          case x: Int => JsNumber(x)
          case x: Long => JsNumber(x)
          case x: Float => JsNumber(x)
          case x: Seq[Object]@unchecked =>
            JsArray((x map {
              case v: String => JsString(v)
              case x => convertObj(x)
            }).filterNot(seqJsNull).toVector)
          case x: Object => convertObj(x)
          case x => throw new IllegalArgumentException(s"Unrecognized type for json mapping of value $x")
        }
      }

      def jsNull(keyValue: (String, JsValue)) = keyValue match {
        case (_, v) => v == JsNull
      }

      def seqJsNull(keyValue: JsValue) = keyValue == JsNull

      val fields = extract.getClass.getDeclaredFields
      fields.foreach(_.setAccessible(true))
      val jsValueMap = fields.map(f => f.getName -> toJsValue(f.get(extract))).filterNot(jsNull).toMap

      JsObject("extract" -> JsObject(jsValueMap))
    }
  }

  trait InstanceIntrospector[T] {
    def extract: T

    def fields: Array[(String, Any)] = {
      val fields = extract.getClass.getDeclaredFields
      fields.foreach(_.setAccessible(true))
      fields.map(f => camelToSnakeCase(f.getName) -> fieldValue(f))
    }

    private def fieldValue(f: Field) = f.get(extract) match {
      case Some(x) => x
      case None => null
      case x => x
    }

    def camelToSnakeCase(name: String): String = "[(A-Z)]|(\\d+)".r.replaceAllIn(name, m => "_" + m.group(0).toLowerCase())
  }

  def count(nodeSeq: NodeSeq) = nodeSeq.size

  def sum(nodeSeq: NodeSeq) = nodeSeq.foldRight(0)((node, t) => node.text.toInt + t)
}

Hi @LinaHAyoub

Happy New Year to you too! :fireworks:

Thanks for your input. As far as I can see, one of the tokens, lookupResponseXmlCodec, is missing, so I can neither compile nor run the code. Could you share this example on the git hosting service of your choosing (for example, GitHub)? :slightly_smiling_face:

I am not aware of any tweaks in endpoint definitions that might affect encoding. Everything should be handled at the codec level. Since everything seems to be fine at the codec phase (checked by printing to the console), perhaps the problem lies in the tool used to perform the HTTP requests, or some headers get appended that make the tool interpret the letter ‘Ø’ incorrectly? Still, I’m guessing here :thinking:
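
For what it's worth, the observed output is what you would get if UTF-8 bytes were read back as ISO-8859-1 somewhere along the way. Below is a minimal sketch of that effect using only the JDK. The object name MojibakeCheck is made up for illustration, and this only reproduces the symptom; it does not tell us which component misreads the bytes.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

// Hypothetical helper object, not part of tapir or of the project above.
object MojibakeCheck {
  // "MIDTBØ", written with an escape so the source file encoding does not matter
  val original: String = "MIDTB\u00D8"

  // In UTF-8, 'Ø' (U+00D8) is encoded as the two bytes 0xC3 0x98
  val utf8Bytes: Array[Byte] = original.getBytes(UTF_8)

  // Hex dump of the encoded bytes, e.g. to compare with what the server actually sends
  val hex: String = utf8Bytes.map(b => f"${b & 0xff}%02X").mkString(" ")

  // ISO-8859-1 maps every byte to exactly one character, so the two bytes of 'Ø'
  // become 'Ã' (U+00C3) followed by the invisible control character U+0098
  val misread: String = new String(utf8Bytes, ISO_8859_1)

  def main(args: Array[String]): Unit = {
    println(hex)                                      // 4D 49 44 54 42 C3 98
    println(misread.length)                           // 7, one more than the original
    println(new String(utf8Bytes, UTF_8) == original) // true
  }
}
```

If the raw response bytes end in C3 98, the server is sending valid UTF-8 and the question becomes which charset the client assumes, for example from the Content-Type header. If they end in a single D8, the body is ISO-8859-1 instead.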