How to define a custom multipart codec?

Suppose we have the following endpoint for uploading a file:


import java.io.File
import sttp.model.Part
import sttp.tapir._
import sttp.tapir.generic.auto._

// POST /file/<id>
// form format:
//    uploader: name text
//    file: binary
case class FileForm(uploader: String, file: Part[File])

val uploadEndpoint =
    endpoint
      .post
      .in("file")
      .in(path[String]("id"))
      .in(multipartBody[FileForm])
      .out(stringBody)
      .tag("File")

How can I use a custom codec, such as the following filenameEncoded?

import sttp.model.Part
import sttp.model.Part.FileNameDispositionParam
import sttp.tapir.CodecFormat.OctetStream
import sttp.tapir.macros.MultipartCodecMacros
import sttp.tapir.{Codec, MultipartCodec, PartCodec, RawBodyType}

import java.net.URLEncoder
import java.nio.charset.StandardCharsets
import scala.collection.immutable.ListMap


object CustomMultipartCodec extends MultipartCodecMacros {
  private val arrayBytePartListCodec: Codec[List[Part[Array[Byte]]], List[Part[Array[Byte]]], OctetStream] =
    implicitly[Codec[List[Part[Array[Byte]]], List[Part[Array[Byte]]], OctetStream]]

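  // form-encodes the UTF-8 bytes of s (spaces become '+'); the Charset overload needs Java 10+
  def encode(s: String): String = URLEncoder.encode(s, StandardCharsets.UTF_8)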
  
  val filenameEncoded: MultipartCodec[Seq[Part[Array[Byte]]]] =
    Codec
      .multipart(Map(), Some(PartCodec(RawBodyType.ByteArrayBody, arrayBytePartListCodec)))
      // we know that all parts will end up as byte arrays; also, removing/restoring the by-name grouping of parts
      .map(_.values.toSeq.flatMap(_.asInstanceOf[List[Part[Array[Byte]]]]))(l =>
        ListMap(l.groupBy(_.name).toList.map { case (name, parts) =>
          // re-encode the filename of every part that has one; note that this function
          // only runs when encoding Seq[Part] into raw parts, not when decoding a request
          name -> parts.toList.map(p => p.fileName.fold(p)(fn => p.fileName(encode(fn))))
        }: _*)
      )
}

I want to encode the filename as UTF-8 to avoid the problem that is triggered when uploading a file with a non-ASCII name.
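For reference, the encode helper in the codec above relies on java.net.URLEncoder, which applies application/x-www-form-urlencoded encoding to the UTF-8 bytes of a string. The following stdlib-only sketch of that round trip shows what a non-ASCII filename turns into. The names FileNameEncoding, encodeFileName and decodeFileName are hypothetical helpers, not part of Tapir. Note that spaces become '+', which differs from plain percent-encoding.

```scala
import java.net.{URLDecoder, URLEncoder}
import java.nio.charset.StandardCharsets

// Hypothetical helpers illustrating the encoding used by `encode` above (Java 10+ Charset overloads).
object FileNameEncoding {
  // Form-encodes the UTF-8 bytes of a filename into an ASCII-only string.
  def encodeFileName(name: String): String =
    URLEncoder.encode(name, StandardCharsets.UTF_8)

  // Inverse: turns %XX sequences (and '+') back into the original characters.
  def decodeFileName(encoded: String): String =
    URLDecoder.decode(encoded, StandardCharsets.UTF_8)
}
```

For example, encodeFileName("résumé.pdf") yields r%C3%A9sum%C3%A9.pdf and encodeFileName("my file.txt") yields my+file.txt. Whoever receives the value has to apply the inverse.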

This does not seem to be the right way to use it:

      .in(multipartBody[FileForm](filenameEncoded))

Is there any way to make the auto-generated schema use my custom codec, or to provide the codec explicitly?

It’s impossible at the level of MultipartCodec; by that stage it’s already too late.

What Tapir does is (simplified):

  1. Tapir first decodes org.http4s.Media[F] into org.http4s.multipart.Multipart[F] using implicit org.http4s.EntityDecoder[F, Multipart[F]]
  2. It then converts the Multipart[F] into a set of sttp.model.Part
  3. The codec transforms this set of Parts into your case classes.
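The three steps can be modeled in plain Scala to show why a custom codec does not help here. This is a hypothetical simplification: RawPart, ParsedPart, parseDisposition and decode are made-up names, not http4s or Tapir types, and the header parsing is deliberately naive (no charset handling at all). The model shows one thing: the pluggable function in step 3 only ever receives parts whose filename was already extracted in steps 1 and 2.

```scala
// Hypothetical, simplified model of the pipeline; none of these are real http4s/Tapir types.
object PipelineModel {
  // Input to step 1: a part as it arrives on the wire, headers still raw.
  final case class RawPart(headers: Map[String, String], body: Array[Byte])

  // Output of step 2: roughly what sttp.model.Part carries (name and filename already extracted).
  final case class ParsedPart(name: String, fileName: Option[String], body: Array[Byte])

  // Steps 1 and 2: fixed inside the "library"; callers cannot replace this function.
  // Naive parser: splits on ';' and '=', strips surrounding quotes, ignores charsets.
  private def parseDisposition(value: String): (String, Option[String]) = {
    val params: Map[String, String] = value
      .split(';')
      .toList
      .map(_.trim)
      .flatMap { kv =>
        kv.split("=", 2) match {
          case Array(k, v) => Some(k.trim -> v.trim.stripPrefix("\"").stripSuffix("\""))
          case _           => None // e.g. the leading "form-data" token
        }
      }
      .toMap
    (params.getOrElse("name", ""), params.get("filename"))
  }

  // Step 3 is the only pluggable stage: `codec` sees ParsedPart values, never the raw header.
  def decode[T](raw: List[RawPart])(codec: List[ParsedPart] => T): T = {
    val parsed = raw.map { r =>
      val (name, fileName) = parseDisposition(r.headers.getOrElse("Content-Disposition", ""))
      ParsedPart(name, fileName, r.body)
    }
    codec(parsed)
  }
}
```

Whatever the codec passed to decode does, its input already contains the filename as step 1 interpreted it. Any problem introduced while reading the header happens before the codec runs.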

Plugging in a custom codec happens at step 3. I suspect the problem happens in step 1, where the EntityDecoder reads the Content-Disposition header, and I don’t see how that could be intercepted. Even if you wrote your own EntityDecoder[F, Multipart[F]], there’s no way to make Tapir use it instead of the default one, which is summoned inside Tapir’s internals from imports (namely in sttp.tapir.server.http4s.Http4sRequestBody).
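That last point follows from how Scala resolves implicits. An instance that a library summons inside its own code, with its own imports in scope, is fixed when the library is compiled. An instance taken as an implicit parameter is chosen at each call site. The following minimal sketch shows the difference. Decoder, LibraryInstances, Library and UserCode are hypothetical stand-ins for EntityDecoder and Http4sRequestBody, not real types.

```scala
// Hypothetical stand-ins; not the real http4s/Tapir types.
final case class Decoder(label: String)

object LibraryInstances {
  // Plays the role of the default EntityDecoder that the library imports.
  implicit val defaultDecoder: Decoder = Decoder("default")
}

object Library {
  import LibraryInstances._

  // Resolved here, inside the library: callers cannot substitute their own instance.
  def decodeFixed(): String = implicitly[Decoder].label

  // Resolved at each call site: callers can supply their own instance.
  def decodeOverridable()(implicit d: Decoder): String = d.label
}

object UserCode {
  // The "custom decoder" the user would like the library to use.
  implicit val customDecoder: Decoder = Decoder("custom")

  def fixed(): String = Library.decodeFixed()             // still uses "default"
  def overridable(): String = Library.decodeOverridable() // picks up "custom"
}
```

The situation described above corresponds to decodeFixed. Defining a custom instance in your own code has no effect on an instance that is resolved inside the library.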
