File download only starts once the full content has been read

Hello!
I’m trying to set up an endpoint to download a file. Here is my example:

package com.softwaremill

import cats.effect.{ExitCode, IO, IOApp}
import cats.syntax.all._
import com.comcast.ip4s.Host
import fs2.{Chunk, Stream}
import org.http4s.ember.server.EmberServerBuilder
import org.http4s.server.Router
import sttp.capabilities.fs2.Fs2Streams
import sttp.model.HeaderNames
import sttp.tapir._
import sttp.tapir.server.http4s.Http4sServerInterpreter

import scala.concurrent.duration._

object Main extends IOApp {

  override def run(args: List[String]): IO[ExitCode] = {

    val downloadFileEndpoint = endpoint.get
      .in("file")
      .out(header[Long](HeaderNames.ContentLength))
      .out(header[String](HeaderNames.ContentDisposition))
      .out(streamBinaryBody(Fs2Streams[IO])(CodecFormat.OctetStream()))
      .serverLogicSuccess { _ =>
        val size = 100L
        Stream
          .emit(List[Char]('a', 'b', 'c', 'd'))
          .repeat
          .flatMap(list => Stream.chunk(Chunk.seq(list)))
          .metered[IO](100.millis)
          .take(size)
          .covary[IO]
          .map(_.toByte)
          .pure[IO]
          .map(s => (size, s"attachment; filename=test", s))
      }

    val routes = Http4sServerInterpreter[IO]().toRoutes(downloadFileEndpoint)

    EmberServerBuilder
      .default[IO]
      .withHost(Host.fromString("localhost").get)
      .withHttpApp(Router("/" -> routes).orNotFound)
      .build
      .use { server =>
        for {
          _ <- IO.println(s"Server started at http://localhost:${server.address.getPort}. Press ENTER key to exit.")
          _ <- IO.readLine
        } yield ()
      }
      .as(ExitCode.Success)
  }
}

When hitting the endpoint with Chrome, I expected the response to start immediately and the file to download over 10 s.
Instead, the response takes 10 s to arrive and the file then downloads in a few ms.

Is there a way to stream the download?

Thanks!

I’m not sure exactly how these metrics work, but I tested the code above (well, almost: I used Blaze, not Ember, which may be worth double-checking) and it seems to work:

  1. first, using curl -vN http://localhost:8080/file incrementally shows the progress of downloading the data
  2. second, using Chrome - the file is downloaded over 10 seconds, with a progress bar showing how much data has been received so far

Also, for serving files, maybe the following will work for you: Serving static content — tapir 1.x documentation

There’s also a dedicated fileBody endpoint input/output which is optimized for serving files.

Thanks Adam,
Running with Blaze seems to work:

# Blaze
curl -w "@curl-format.txt" -vN http://localhost:8080/file
*   Trying 127.0.0.1:8080...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /file HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Disposition: attachment; filename=test
< Content-Type: application/octet-stream
< Date: Fri, 28 Jul 2023 08:07:41 GMT
< Content-Length: 100
<
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd* Connection #0 to host localhost left intact
     time_namelookup:  0.000424s
        time_connect:  0.000553s
     time_appconnect:  0.000000s
    time_pretransfer:  0.000633s
       time_redirect:  0.000000s
  time_starttransfer:  1.127303s
                     ----------
          time_total:  11.007692s

# Ember
curl -w "@curl-format.txt" -vN http://localhost:8080/file
*   Trying 127.0.0.1:8080...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /file HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Fri, 28 Jul 2023 08:09:33 GMT
< Connection: keep-alive
< Content-Length: 100
< Content-Disposition: attachment; filename=test
< Content-Type: application/octet-stream
<
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd* Connection #0 to host localhost left intact
     time_namelookup:  0.000326s
        time_connect:  0.000568s
     time_appconnect:  0.000000s
    time_pretransfer:  0.000739s
       time_redirect:  0.000000s
  time_starttransfer:  10.884396s
                     ----------
          time_total:  10.884517s

As you can see, time_starttransfer is about 1 s for Blaze but about 10 s for Ember. Should I open a bug against the Ember server?
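
For reference, time_starttransfer is the time curl took to receive the first byte of the response, and time_total is when the whole transfer finished, so the gap between them is roughly how long the body took to arrive. Here is a minimal sketch of that comparison in Scala, using the timings from the two runs above. CurlTimings, Timing and the looksStreamed heuristic are illustrative names and assumptions, not part of curl, http4s or tapir:

```scala
// Hypothetical helper for reading the curl timings above; not a library API.
object CurlTimings {
  final case class Timing(startTransfer: Double, total: Double) {
    // Seconds spent receiving the rest of the response after the first byte.
    def bodyTransferSeconds: Double = total - startTransfer

    // Heuristic (an assumption, not a curl feature): call the response
    // "streamed" when most of the total time came after the first byte.
    def looksStreamed: Boolean = bodyTransferSeconds > total / 2
  }

  // Values copied from the Blaze and Ember curl runs.
  val blaze: Timing = Timing(startTransfer = 1.127303, total = 11.007692)
  val ember: Timing = Timing(startTransfer = 10.884396, total = 10.884517)

  def main(args: Array[String]): Unit = {
    println(f"Blaze: ${blaze.bodyTransferSeconds}%.3f s after first byte, streamed = ${blaze.looksStreamed}")
    println(f"Ember: ${ember.bodyTransferSeconds}%.3f s after first byte, streamed = ${ember.looksStreamed}")
  }
}
```

With Blaze, nearly all of the 11 s is spent after the first byte, which matches data arriving incrementally. With Ember, nothing (not even the headers) arrives until the whole body has been produced.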

Regarding serving files, thanks for the doc, but in our use case we’re not serving static files; we’re streaming file data that we create ourselves.

Hm, that’s interesting. I think it would be best to write a really simple http4s route directly (without tapir), to rule out tapir’s involvement. That said, we don’t even depend on blaze/ember, let alone differentiate between the two when interpreting. Still, a pure-http4s example, if it reproduces the problem, is what’s needed to open an issue in http4s itself.

Hi Adam,
I’ve done more tests, including one with plain http4s and Ember, and it works as expected.
I noticed that the response header “Transfer-Encoding=chunked” was added automatically, so I tried adding it to the tapir endpoint and that made it work too!
Here’s the code:

package com.softwaremill

import cats.effect.{ExitCode, IO, IOApp}
import cats.syntax.all._
import com.comcast.ip4s._
import fs2.{Chunk, Stream}
import org.http4s._
import org.http4s.dsl.io._
import org.http4s.ember.server.EmberServerBuilder
import org.http4s.headers.{`Content-Disposition`, `Content-Length`, `Content-Type`}
import org.http4s.server.Router
import org.typelevel.ci._
import sttp.capabilities.fs2.Fs2Streams
import sttp.model.HeaderNames
import sttp.tapir._
import sttp.tapir.server.http4s.Http4sServerInterpreter

import scala.concurrent.duration._

object Main extends IOApp {

  private val size = 100L
  private val dataStream = Stream
    .emit(List[Char]('a', 'b', 'c', 'd'))
    .repeat
    .flatMap(list => Stream.chunk(Chunk.seq(list)))
    .metered[IO](100.millis)
    .take(size)
    .covary[IO]
    .map(_.toByte)

  override def run(args: List[String]): IO[ExitCode] = {
    (tapirServer(), http4sServer()).tupled
      .use(_ => IO.println(s"Servers started. Press ENTER key to exit.") >> IO.readLine)
      .as(ExitCode.Success)
  }

  private def tapirServer() = {
    val streamEndpoint = endpoint.get
      .out(header[Long](HeaderNames.ContentLength))
      .out(header[String](HeaderNames.ContentDisposition))
      .out(streamBinaryBody(Fs2Streams[IO])(CodecFormat.OctetStream()))

    // Not streaming properly
    val downloadFileEndpoint = streamEndpoint
      .in("file")
      .serverLogicSuccess(_ => dataStream.pure[IO].map(s => (size, s"attachment; filename=test", s)))

    // Streaming properly
    val downloadFileChunkedEndpoint = streamEndpoint
      .out(header[String](HeaderNames.TransferEncoding))
      .in("file-chunked")
      .serverLogicSuccess(_ => dataStream.pure[IO].map(s => (size, s"attachment; filename=test-chunked", s, "chunked")))

    val routes = Http4sServerInterpreter[IO]().toRoutes(List(downloadFileEndpoint, downloadFileChunkedEndpoint))

    emberServer(port"8080", routes)
  }

  private def http4sServer() = {
    val routes = HttpRoutes.of[IO] { case GET -> Root / "file-chunked" =>
      Ok(
        dataStream,
        `Content-Length`(size),
        `Content-Disposition`("attachment", Map(ci"filename" -> "test")),
        `Content-Type`(MediaType.application.`octet-stream`)
      )
    }
    emberServer(port"8081", routes)
  }

  private def emberServer(port: Port, routes: HttpRoutes[IO]) = EmberServerBuilder
    .default[IO]
    .withPort(port)
    .withHttpApp(Router("/" -> routes).orNotFound)
    .build
}

And the calls:

# Not streaming
curl -w "@curl-format.txt" -N http://localhost:8080/file
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
time_starttransfer: 10.009549s, time_total: 10.009619s

# Streaming
curl -w "@curl-format.txt" -N http://localhost:8080/file-chunked
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
time_starttransfer: 0.016075s, time_total: 10.020849s

# Streaming
curl -w "@curl-format.txt" -N http://localhost:8081/file-chunked
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
time_starttransfer: 0.148120s, time_total: 10.151464s

Do you think it’s a Tapir bug? Or is transfer-encoding=chunked perhaps a requirement for streaming?

[deleted previous answer]
EDIT: I double-checked and ran some more tests. My answer above doesn’t apply after all, because we’re not dealing with files. The example you gave uses streamBinaryBody, which is handled with an empty content length, so it’s probably not Tapir that chooses between a chunked and a non-chunked response. I think the same issue should occur with a pure http4s + Ember combination returning a stream, so it doesn’t look like an issue on Tapir’s side.

I think tapir does the right thing, as transfer-encoding and content-length are mutually exclusive (you use transfer-encoding: chunked because you don’t know the length of the body, see here).
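
To make the chunked option concrete: with Transfer-Encoding: chunked, each piece of the body is sent as its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk ends the body, which is why no total length is needed up front. Below is a minimal sketch of that framing in Scala, assuming no chunk extensions or trailer fields. ChunkedCoding, frame and encode are illustrative names, not http4s or tapir APIs; the server backend normally does this for you:

```scala
import java.nio.charset.StandardCharsets.US_ASCII

// Illustrative sketch of HTTP/1.1 chunked framing; not an http4s/tapir API.
object ChunkedCoding {
  private val crlf: Array[Byte] = "\r\n".getBytes(US_ASCII)

  // One chunk: size in hexadecimal, CRLF, the data itself, CRLF.
  def frame(data: Array[Byte]): Array[Byte] =
    data.length.toHexString.getBytes(US_ASCII) ++ crlf ++ data ++ crlf

  // The zero-size chunk that terminates the body (no trailers in this sketch).
  val lastChunk: Array[Byte] = "0\r\n\r\n".getBytes(US_ASCII)

  // Encode a sequence of chunks. Empty chunks are skipped, because a
  // zero-size chunk would be read as the end of the body.
  def encode(chunks: Seq[Array[Byte]]): Array[Byte] =
    chunks
      .filter(_.nonEmpty)
      .foldLeft(Array.emptyByteArray)((acc, chunk) => acc ++ frame(chunk)) ++ lastChunk
}
```

Because every chunk carries its own size, the server can flush each one as soon as it is produced, which fits the early time_starttransfer seen on the chunked endpoints.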

Extending the experiments, I get the following results on the /file endpoint, so chunking is never explicitly enabled (T-E = Transfer-Encoding, C-L = Content-Length):

  • tapir + ember: no streaming, only C-L set
  • http4s vanilla + ember: streaming, both T-E and C-L headers set
  • tapir + blaze: streaming, only C-L set
  • http4s vanilla + blaze: streaming, only T-E set

The only combination that behaves according to the user’s specification is tapir + blaze. Vanilla http4s + blaze discards the provided C-L info and chunks the response. I have no idea why tapir + ember doesn’t stream.

This is pretty strange, as it’s clearly against the specs.
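
For context, the HTTP/1.1 spec (RFC 9112, and RFC 7230 before it) says a sender must not send Content-Length in a message that contains Transfer-Encoding. A minimal sketch that classifies the four observed header combinations against that rule follows. Framing, classify and the observed list are illustrative names, and the Chunked case assumes that chunked is the only transfer coding in use, as it is in this thread:

```scala
import java.util.Locale

// Illustrative classification of the response headers listed above;
// not part of http4s or tapir.
object Framing {
  sealed trait Kind
  case object Chunked extends Kind     // only Transfer-Encoding (assumed to be "chunked")
  case object FixedLength extends Kind // only Content-Length
  case object Conflicting extends Kind // both headers: not allowed for a sender
  case object Unspecified extends Kind // neither header

  // Header names are case-insensitive, so normalise before checking.
  def classify(headerNames: Set[String]): Kind = {
    val names = headerNames.map(_.toLowerCase(Locale.ROOT))
    (names("transfer-encoding"), names("content-length")) match {
      case (true, true)   => Conflicting
      case (true, false)  => Chunked
      case (false, true)  => FixedLength
      case (false, false) => Unspecified
    }
  }

  // Header sets as reported in the experiments above.
  val observed: Seq[(String, Set[String])] = Seq(
    "tapir + ember"  -> Set("Content-Length"),
    "http4s + ember" -> Set("Transfer-Encoding", "Content-Length"),
    "tapir + blaze"  -> Set("Content-Length"),
    "http4s + blaze" -> Set("Transfer-Encoding")
  )

  def main(args: Array[String]): Unit =
    observed.foreach { case (setup, headers) =>
      println(s"$setup: ${classify(headers)}")
    }
}
```

Under this rule, only the vanilla http4s + ember response (both headers present) is classified as Conflicting.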