Custom co-recursive schemas

Seen a couple examples of a recursive Schema and those have been tough enough to grok, but I seem to have gone further and have a co-recursive case. I have not seen that discussed, but please point me to where to look if I missed it.

Real case has about a dozen cases and I believe 3 pairwise co-recursions. Maybe it’s even worse than that as I have HelperText which contains Text and HelperText appears in some Components. Anyhow, let’s start with the reduced case and get to the bottom of the key concepts.

Compiles fine, but when I try to execute main (which simply prints the componentSchema to force the evaluation of the lazy value) I get an infinite loop.

  1. How does one debug this sort of infinite loop? I get what’s happening roughly, but not sure how to dig in to understand the details. e.g. would love to figure out where the auto derivation put the SRefs to try to break the cycles to see if the choices seem sane.
  2. What are the proper approaches to work through this as I expect to be making this a much larger case once we resolve this minimized recreation?
  3. I’ve tried writing the Schema by hand, but couldn’t get that quite right given the documents and posts I was able to find. I believe the key is where SRefs get inserted and would prefer to have auto derivation do that rather than doing it manually purely due to maintenance concerns. If you do think manual schema specification is the path forward, where are good resources for defining larger custom schemas and figuring out where to put the SRefs (is it simply max flow/min cut?)?
  4. Any way to cause the failure to occur at compile time to avoid runtime surprises?

Thank you in advance!
PS First time poster with small ego so open to feedback.

Apologies for replying to myself, but I’ve managed to further simplify an example. I’m not sure if this is one of several problems or the problem, but reducing to the following also infinite loops:

My best guess is that a Vector of the sealed trait may lead to a case where the auto derivation doesn’t know where to insert an SRef. That’s my best guess though. I wish I knew how to inspect the generated schema, but then again it seems that reifying said schema leads to the infinite loop

After trying pickler (interesting IntelliJ warning about infinite loop when trying to simply summon[Pickler[Vector[Component]]]) and numerous other things, I decided to try hand writing the schema. It works, but maintaining this will not be great so I’d love an at least partially automated option. Here’s a link to what I have for the “simple” case of recursion with a collection interceding:

Going to proceed in two directions:

  1. Expanding to more cases
  2. Adding back the co-recursion
    Still… this seems like a workable approach now.

Well… that worked. Manually created the schema for the entire Component Schema using SRefs and then exposed that Schema. The SRefs refer to a version of Component that I had derived automatically (but marked private). That was sufficient to make it all work. It’s got some dozen cases, so curious if there’s another approach. In any case, I have a path forward.

Which version of Scala are you using? I’m trying this:

  sealed trait Component
  case class Button(text: String) extends Component
  case class Div(elements: List[Component]) extends Component

  given Schema[Component] = Schema.derived[Component]
  println(summon[Schema[Component]])

And I think I get the correct result:

Schema(SCoproduct(List(Schema(SProduct(List(SProductField(FieldName(text,text),Schema(SString(),None,false,None,None,None,None,false,false,All(List()),AttributeMap(Map()))))),Some(SName(sttp.tapir.examples.helloWorldNettySyncServer$package.helloWorldNettySyncServer.Button,List())),false,None,None,None,None,false,false,All(List()),AttributeMap(Map())), Schema(SProduct(List(SProductField(FieldName(elements,elements),Schema(SArray(Schema(SRef(SName(sttp.tapir.examples.helloWorldNettySyncServer$package.helloWorldNettySyncServer.Component,List())),None,false,None,None,None,None,false,false,All(List()),AttributeMap(Map()))),None,true,None,None,None,None,false,false,All(List()),AttributeMap(Map()))))),Some(SName(sttp.tapir.examples.helloWorldNettySyncServer$package.helloWorldNettySyncServer.Div,List())),false,None,None,None,None,false,false,All(List()),AttributeMap(Map()))),None),Some(SName(sttp.tapir.examples.helloWorldNettySyncServer$package.helloWorldNettySyncServer.Component,List())),false,None,None,None,None,false,false,All(List()),AttributeMap(Map()))

That’s using Scala 3.3

Although … InlineText is problematic:

sealed trait InlineText
  case class Span(content: String) extends InlineText
  case class Link(content: List[Span], url: String) extends InlineText

  given Schema[InlineText] = Schema.derived[InlineText]

That’s because the recursion points to a child class, not the root (InlineText), for which the schema is defined using given Schema[InlineText]

A work-around is something like this:

val temporaryInlineTextScheme: Schema[InlineText] = {
    import sttp.tapir.generic.auto.*
    summon[Schema[InlineText]]
  }
  given Schema[InlineText] = temporaryInlineTextScheme

However, trying to derive the “big” Component schema later fails. The recursion seems too deep.

Maybe it’s time to revisit the new macro for schema generation. I had it in the works some time ago … can you create an issue on GitHub, with the Component example as a failing test case?

Maybe we should have sth like Schema.derivedAuto, which would auto-derive all schemas reachable from the given root. That would be perfect for this use-case (deriving Component). I think jsoniter macros work in a similar way.

Thank you @adamw Let me try to respond to all your questions and comments:

  1. I’m on Scala 3.4.2
  2. I’d really like InlineText’s Link’s contents be Text not Vector[Span], but that makes InlineText corecursive with Component as Text is in actually a case of Component. Not sure if that changes your thinking.
  3. Your proposed fix. First can you kindly explain the thinking? Are you simply limiting the scope of the auto import (both to allow using auto derivation and to ensure where it is not used) or is there more going on? Second, would this allow a link to contain a link? I’d want to make sure a link cannot contain a link, but a link or span can contain any number of spans.

I hear you regarding the depth of recursion. I’m happy to dive pretty deep here and could use a bit of help on the tools to do so:

  1. How do you identify where the loop actually is? If by inspection, that helps too, but I don’t see how to retrieve the Schema’s internals if it fails. If the derivation succeeds one can print it, but on failure I see nothing.
  2. Relatedly, is there some intuition you can provide as to how schema derivation decides where to cut loops by inserting SRefs? Happy to read the code, but not sure where I should be looking.
  3. I’m hoping one outcome of this is a bit more documentation. It seems straightforward that recursion with a sibling is an issue, but it’s also not obvious (at least to me) a priori. I’d like to have a comment stating that added somewhere and some discussion of alternatives. Similarly, I’m unclear if adding a vector (or other collection) are an issue or not (I suspect not) and think that warrants half a sentence either way.
  4. I have made it work by writing the schema for Component (and Vector[Component] – simply component’s Schema.asIterable[Vector]) so I’m no longer blocked. That said, I’m not following how other solutions are working.
  5. I’m pretty clear all of this discussion is implicitly saying that co-recursion (even multiple simultaneous co-recursions) are not inherently a problem. e.g. InlineText ↔ Component.Text while several Component contain references to both Component and Vector[Component]. I believe that too should simply be stated. i.e. “There’s no fundamental limitation in the Schema design that prevents it from supporting recursion, co-recurstion etc. although those cases are often the most difficult for the tooling to automate.”
  6. I’m going to go try recreating a combined InlineText + Component example with a small number of cases. I’m going to try as well, but may I ask which approach to focus on? I’m not sure if the future is pickler, auto derivation, or the macro you mentioned.
  7. Perhaps unrelated, but unclear… Can you kindly articulate the preference for given over implicit val? I think for recursion you recommend implicit lazy val (presumably because there’s no lazy given) but not following what givens provide over implicit vals.

Happy to file a GitHub, but think I’m not yet clear on exactly what issue we are highlighting and I believe issues are best when they’re exceedingly clear.

Again, appreciate your time and help. If there’s any way I can be of assistance, please do let me know.

As for the first part:

  1. Yes, but I think if Component is the root, then by deriving the schema for component, we would derive the schema for any child classes as well. InlineText would be just more complex structure underneath.
  2. Yes, I want to limit the scope of auto-derivation, to still derive the schema once, but creating schemas for any classes reachable from the root (here - Component) automatically. Since we would derive schemas for all cases classes at once, we would be able to properly re-use them on recursion and use $refs. But that wouldn’t allow you to ensure that there’s no link-link recursion, at the level of deriving schemas. Not sure if that’s a kind of property that should be guaranteed by the derivation mechanism, though.

As for the second part:

  1. The Magnolia-driven derivation has a local cache (implemented in quite a primitive way, as a thread local mutable list). If it encounteres a type that has been covered before, it creates an $ref, instead of a nested schema: tapir/core/src/main/scala-3/sttp/tapir/generic/auto/SchemaMagnoliaDerivation.scala at ba35f6bf6fe67784e1c0c8924ae377ffe16ed01d · softwaremill/tapir · GitHub
  2. The way this works now, is that every schema is derived individually. So deriving a schema for X and Y, if they are mutually recursive, will derive both of their schemas twice. That’s not necessarily a bad thing, and might work fine. The references will occur only when there’s a X->X recursion or Y->Y one. Note that when translating the schemas to OpenAPI ones, there’s an additional level of de-duplication, so that there’s only one X schema and one Y schema in the resulting spec.
  3. I’m not 100% sure why this doesn’t work yet. Some problems seem to arise from limitations of the derivation process - and the max-inlines setting, which is reached quite quickly. This comes from the fact the we are using Magnolia, which simplifies the implementation, but does add some overhead when it comes to code generation. Cutting out this middle-man might simplify things, but it’s not an easy task, which I once tried but paused/abandoned for lack of time.
  4. Pickler re-uses the schema derivation - it’s another level of abstraction, not applicable here. What pickler does is that it tries to create a common configuration for both Schema and JSON derviation. However the Schema/JSON derivation mechanisms work the same way as when without using pickler. Auto derivation uses the macro-based derivation as well, it’s just that it defines an implicit value for any type T, which tries to dervie the schema using the Magnolia macro.
  5. Aren’t they the same? given is Scala 3’s way of saying implicit lazy val in Scala 2. I think the “Scala 3 way” would be to use only given & using instead of implicit

Thank you for taking the time to answer this in detail. I want to make sure I generate anything you’ve asked for. I believe I just got my case working with your technique. So, not blocked AND still interested in helping.

A few quick follow ups and then onto how to help. New numbering as several issues reflect multiple of your answers:

  1. If I understand correctly assigning a schema would trigger the dedup and potentially SRef insertion, thus private val foo: Schema[???] = Schema.derived, given bar: Schema[???] = foo, would mean that foo has the full schema and bar is an SRef. Is my understanding correct?
  2. RE: givens vs. implicity lazy, I am asking honestly as I have not found a clear “the difference is X” although I see given & implicit lazy val, in use. I’ve been searching and yet to find a definitive statement on the matter.

So now… how can I help? You’ve asked me to open a Github ticket. I’ve mentioned creating a test case with Component & InlineText. It seems the latter doesn’t help you… so if I understand correctly, the simple component case which does NOT refer back to the parent is the base case you’d like? Trying to make sure I’m following.

Also happy to dig in and help out so just let me know.

Thanks again!

  1. Hm not sure I understand. Every Schema[T] is always self-contained. So there will be duplication, if you have schemas for several related types. This duplication is then resolved when generating OpenAPI docs. SRefs are only generated when there is a recursive reference within such a self-contained schema.
  2. I think there’s simply no difference between given and implicit lazy val, that is, it’s just syntactic sugar. But I might be wrong. Maybe asking here would help to resolve this problem?

I think a PR with a possibly minimal test case is a great start. It’s always something we can use to base future work. And an issue, as that’s our tasklist, and it allows others to upvote. That’s how we can discover the most problematic issues :slight_smile: