Use @validate before the decoding phase

Values are decoded before they are validated. So when using enums with a JSON library like Circe, if a value in an HTTP request is not one of the values in the enum, Circe generates a decoding error.

This error is also “leaked” into the response body. For example, with the validation:

  @validate(Validator.derivedEnumeration[Currency])
  currency: Currency

The following response error is presented:

Invalid value for: body ('USD' is not a member of enum x.y.z.Currency$@6515ebbe at '[0].currency')

While expected would be something like:

Invalid value for: body ('USD' is not a member of enum Currency)

Using .mapValidate, validation can be done before decoding. However, I could not get this to work with the @validate annotation.

Any idea how to deal with this?

The decoding is actually done by circe, so that’s where the error messages are produced.

If you take a look at TapirJsonCirce, any circe errors are translated into JsonDecodeExceptions, which are then formatted here: tapir/DecodeFailureHandler.scala at master · softwaremill/tapir · GitHub.

So to improve the error message, I’d look at how you can customise circe’s decoding exceptions, and/or adjusting the error formatting done by tapir.
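The first route (customising circe's decoding) boils down to a plain function that could be wired into circe via Decoder[String].emap. A minimal, stdlib-only sketch, assuming a hypothetical Currency enum with values EUR and USD (the decodeCurrency name is made up; the commented-out circe wiring is the only circe-specific part):

```scala
// Hypothetical Currency "enum" plus a decode helper producing a friendly message.
// With circe this could be wired up as:
//   implicit val currencyDecoder: io.circe.Decoder[Currency] =
//     io.circe.Decoder[String].emap(decodeCurrency)
sealed trait Currency
object Currency {
  case object EUR extends Currency
  case object USD extends Currency
  val values: List[Currency] = List(EUR, USD)
}

// Returns a user-friendly error instead of circe's default FQCN-based message.
def decodeCurrency(raw: String): Either[String, Currency] =
  Currency.values
    .find(_.toString == raw)
    .toRight(s"expected currency to be one of ${Currency.values.mkString("[", ", ", "]")}, but got: $raw")
```

With this in place, decodeCurrency("NZL") yields Left("expected currency to be one of [EUR, USD], but got: NZL"), which matches the style of the other validators' messages.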

Thanks for the quick reply @adamw!

This approach would indeed solve the problem. However, it would be more elegant if @validate could also be used to validate the enum.

What happens now is that, with all of the validations used in, for instance:

  @validate(Validator.derivedEnumeration[Currency])
  currency: Currency
  @validate(Validator.min(0))
  age: Int
  @validate(Validator.pattern("[0-9]+"))
  someNumber: String,

we get a nice aggregation of the incorrect values:

expected age to be greater than or equal to 0, but got -1,
expected someNumber to match: [0-9]+, but got: \"ABCD\"

But the validation error for the enum will not be in that message. If I understand correctly, the decoding will be done before validation, and will produce a separate error.

This also makes me wonder what the use of Validator.derivedEnumeration[Currency] is, apart from documentation. It will be executed after Circe’s decoding, which already proves that the value sent by the user is one of the values of the enum.

So ideally, something like this would be the error message:

expected currency to be one of ["EUR", "USD"], but got "NZL",
expected age to be greater than or equal to 0, but got -1,
expected someNumber to match: [0-9]+, but got: \"ABCD\"

Indeed enumeration validators are used mainly for documentation.

To be precise, the validator is run, but it is run on an already-decoded value, so it's really a no-op unless the enumeration constrains a broader type (such as an Int). In the case of an enum type such as Currency, there's no way to run the validator earlier, as it requires an instance of type … Currency, which can only be created from valid values.

The low level (or “raw”) representation is entirely opaque to tapir. There’s no way for the codec to introspect it and validate. Hence we can’t report validation errors.

Thanks for the explanation. From a technical perspective it makes perfect sense.

From a user perspective it is a bit odd to see validation errors for all types in one message, except for enums. I was hoping to also put the enum validation error alongside the other validation errors. This will give users of the REST API a better experience.

Do you have any idea how we could elegantly solve this? .mapValidate might help us here, by first validating the low-level representation. A bit hacky, but at least the user has a consistent experience.

I agree that would be best from a UX perspective. .mapValidate works for top-level enums, i.e. when the enum is the value of a query parameter or a path component. Here, we have values nested inside JSON bodies (as I understand it). To uniformly validate all values, we would need to first parse the JSON to some intermediate representation, validate some of the fields, and only then create the final value.

Maybe it would be doable by parsing the JSON into circe’s Json, then traversing the schema alongside the intermediate JSON, looking for validators, collecting validation errors, and reporting them. Still, there would be some problem in getting the low-level representation of the enum values, but probably the encoder could be used for that.
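That traversal idea can be prototyped without circe. In the sketch below (all names hypothetical), a Map[String, String] stands in for the parsed intermediate JSON object, and the three validators from the earlier example are run against the raw values before any decoding, so the enum error lands in the same aggregated message as the others:

```scala
// "Validate the raw representation first, then decode": Map[String, String]
// stands in for circe's Json object; the field names match the example above.
def validateRaw(fields: Map[String, String]): List[String] = {
  val currencyErr = fields.get("currency").collect {
    case c if !Set("EUR", "USD").contains(c) =>
      s"expected currency to be one of [\"EUR\", \"USD\"], but got \"$c\""
  }
  val ageErr = fields.get("age").collect {
    // non-numeric ages are left for the decoding step to reject
    case a if a.toIntOption.exists(_ < 0) =>
      s"expected age to be greater than or equal to 0, but got $a"
  }
  val someNumberErr = fields.get("someNumber").collect {
    case n if !n.matches("[0-9]+") =>
      s"expected someNumber to match: [0-9]+, but got: \"$n\""
  }
  List(currencyErr, ageErr, someNumberErr).flatten
}
```

For Map("currency" -> "NZL", "age" -> "-1", "someNumber" -> "ABCD") this produces all three messages together, including the currency one. A real implementation would derive the field names and checks from the tapir Schema rather than hard-coding them.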

But I’m not sure if this can be fixed in the general case. Tapir’s architecture on one hand allows integrating with any library for encoding/decoding bodies (providing schema documentation on top of that), but on the other it constrains how we can inspect the internal representation, as is the case here.

Got it, this won’t work because of the design.

Using an HCursor in Circe, we could traverse the low-level representation (see circe: Traversing and modifying JSON): first validate, then decode. This seems a bit cumbersome, though.

For now we’ll have to live with the decoding error returned by Circe, which we could improve not to disclose internal system information.
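One low-tech way to avoid disclosing internals is to post-process the failure message before it reaches the response body, collapsing the fully qualified class name and object hash into the simple enum name. A sketch (the sanitizeEnumError name and the regex are illustrative, not a tapir API):

```scala
// Rewrites e.g. "... enum x.y.z.Currency$@6515ebbe ..." into "... enum Currency ...",
// dropping the package prefix and the object hash before the message is returned.
def sanitizeEnumError(msg: String): String =
  msg.replaceAll("""enum (?:[A-Za-z_]\w*\.)*([A-Za-z_]\w*)\$?@\p{XDigit}+""", "enum $1")
```

This could be applied wherever the DecodeFailureHandler formats the body error, leaving messages without an enum reference untouched.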

Maybe a bit far-fetched, but this could be used to fingerprint the API as a Tapir API: when the user gets a separate validation error only for enums, it might be a Tapir API. Not sure how other APIs handle enum validation; maybe they do the same :wink:

Re: fingerprinting - true :). No idea how other frameworks solve this issue. I expect anything that integrates with a chosen JSON library to face the same tradeoff. Circe is one example of a library which does have an intermediate representation, but others (e.g. jsoniter) do not. In such a case, you’d need dedicated validation support inside the JSON library.
