
New json serializer serializes empty complex types as {}, which fails to deserialize with new deserializer #2235

Closed
almostchristian opened this issue Sep 13, 2022 · 8 comments

@almostchristian
Contributor

The new System.Text.Json serializer serializes empty complex types as empty JSON objects ({}), which then fail to deserialize with the new System.Text.Json deserializer. From what I understand, the new deserializer is supposed to fail on empty objects and empty lists, so the serializer should have treated these as null, as the older serializer did.

To Reproduce

using System.Text.Json;
using Hl7.Fhir.Model;
using Hl7.Fhir.Serialization;

var serializerOptions = new JsonSerializerOptions().ForFhir();

var patient = new Patient
{
   MaritalStatus = new CodeableConcept(),
};

// {"resourceType":"Patient","maritalStatus":{}}
var patientJson = JsonSerializer.Serialize(patient, serializerOptions);

// throws Hl7.Fhir.Serialization.DeserializationFailedException
var deserialized = JsonSerializer.Deserialize<Patient>(patientJson, serializerOptions);

Version used:

  • FHIR Version: R4B
  • Version: 4.2.1
@ewoutkramer
Member

ewoutkramer commented Sep 13, 2022

That's right. I have documented this behaviour here: https://docs.fire.ly/projects/Firely-NET-SDK/parsing/system-text-json-serialization.html#serialization-with-pocos-and-system-text-json. The serializer does not do any validation and can easily be misused to generate incorrect JSON when pushed to do so.

The serializer will serialize what you have added to the POCO, in this case an empty CodeableConcept. Trying to avoid serializing empty objects would cost extra computation time, e.g. to first traverse the children to see whether the object is empty or not, before starting to serialize the object. Basically, making serialization slower in the general case to fix this specific case. The idea is that it is easier (and cheaper) for the caller of the serializer to avoid adding empty objects in the first place...
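For illustration, a minimal sketch of that caller-side guard (not code from the SDK); sourceCodings is a hypothetical list of (System, Code) pairs being mapped from, and the point is simply to skip the assignment when there is nothing to put in the element:

// Minimal sketch of the caller-side guard suggested above. sourceCodings is a
// hypothetical list of (System, Code) pairs being mapped from.
// Requires: using System.Linq; using Hl7.Fhir.Model;
var patient = new Patient();

if (sourceCodings.Count > 0)
{
    // Only create the CodeableConcept when there is actual content for it.
    patient.MaritalStatus = new CodeableConcept
    {
        Coding = sourceCodings.Select(c => new Coding(c.System, c.Code)).ToList()
    };
}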

But I am all ears to hear why this is a wrong assumption.

@ewoutkramer
Member

ewoutkramer commented Sep 13, 2022

I have been thinking about this a bit. The problem of optimizing for empty objects is that we only know an object is not empty when we hit the first property that is set (which could possibly be buried deeply in nested "almost empty" objects). Whether we ever find that first property or not, we will already have written the opening brackets to the Utf8JsonWriter for each (nested) object by then.

A possible solution would be to delay writing the opening brackets (basically caching them up in some kind of serialization state) until we hit the first property, and then flush out all the brackets. If we never encounter a single property, we just clear that cached set of brackets.
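As a rough sketch of that idea (not the SDK's actual implementation): queue the object starts in a small piece of serialization state and only flush them to the Utf8JsonWriter once a real value shows up; if none ever does, discard the queued starts.

// Rough sketch of the "delayed opening brackets" idea; not the SDK's actual code.
// Object starts are queued instead of written, and are only flushed to the
// Utf8JsonWriter when the first real property value appears. An object that
// stays empty never reaches the output, because its queued start is discarded.
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

class DelayedStartWriter
{
    private readonly Utf8JsonWriter _writer;
    // Starts of objects we have entered but not yet written (null = unnamed/root object).
    private readonly Stack<string?> _pending = new();

    public DelayedStartWriter(Utf8JsonWriter writer) => _writer = writer;

    public void BeginObject(string? propertyName = null) => _pending.Push(propertyName);

    public void WriteString(string propertyName, string value)
    {
        FlushPending();                        // the enclosing objects are not empty after all
        _writer.WriteString(propertyName, value);
    }

    public void EndObject()
    {
        if (_pending.Count > 0)
            _pending.Pop();                    // the object stayed empty: never emit it
        else
            _writer.WriteEndObject();          // the object had content: close it normally
    }

    private void FlushPending()
    {
        // Emit the queued starts oldest-first (a Stack enumerates newest-first).
        foreach (var name in _pending.Reverse())
        {
            if (name is null) _writer.WriteStartObject();
            else _writer.WriteStartObject(name);
        }
        _pending.Clear();
    }
}

With that in place, the repro from above would come out without the empty maritalStatus:

// Produces {"resourceType":"Patient"} instead of {"resourceType":"Patient","maritalStatus":{}}
using var stream = new MemoryStream();
using var writer = new Utf8JsonWriter(stream);
var delayed = new DelayedStartWriter(writer);

delayed.BeginObject();                  // root object
delayed.WriteString("resourceType", "Patient");
delayed.BeginObject("maritalStatus");   // the empty CodeableConcept
delayed.EndObject();                    // discarded, nothing was written for it
delayed.EndObject();                    // closes the root
writer.Flush();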

@almostchristian
Contributor Author

Yeah, I am reading the FhirJsonPocoSerializer, and I think there may be places in it where empty arrays and objects could be avoided, e.g. by using writer.Reset() as you are alluding to. Not sure whether the performance cost is an issue here, but we can always write benchmarks to test this.

We found this issue because, in our web application, a developer wrote code during development that modified a resource before saving it to the document DB. Naturally, it serialized to the DB just fine but then failed during deserialization.

@ghost

ghost commented Nov 12, 2022

Why not set DefaultIgnoreCondition in the serializerOptions to WhenWritingDefault? An object with no properties set is a default object.

https://learn.microsoft.com/en-us/dotnet/api/system.text.json.serialization.jsonignorecondition?view=net-6.0
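For reference, this is roughly what that suggestion looks like on plain System.Text.Json options; WhenWritingDefault skips properties whose value equals the type's default (null for reference types, 0/false/etc. for value types):

// The suggested setting on plain System.Text.Json options.
using System.Text.Json;
using System.Text.Json.Serialization;

var options = new JsonSerializerOptions
{
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingDefault
};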


@ewoutkramer
Member

Though this looks nice, I think this is only applicable when you do the "built-in" object-to-JSON serialization. FHIR JSON serialization is complex enough that we wrote our own serializer (using the Utf8JsonWriter directly); I cannot imagine this setting applies to that lower-level API.


@ewoutkramer
Member

So, to finalize this discussion:

On the serializer

It is a feature that the serializer is able to serialize incorrect data, for three reasons:

  • The POCO model can contain incorrect data by design (e.g. unrecognized enum values, incorrect date values), so we don't lose information. It allows you to parse incorrect data, inspect the errors, and correct them. At the very least, we can roundtrip these kinds of errors, which is nice if you have a database of incorrect data.
  • Adding validation to serialization would mean slowing down the serializer. If you really want that, you can run the attribute validation yourself before serializing (see the sketch below this section).
  • Serialization is not supposed to fail or throw exceptions; users simply do not expect that to happen.

This behaviour is also documented.
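As a sketch of the "validate before you serialize" route mentioned above: the POCOs carry validation attributes, and the SDK has an attribute-validation helper to run them. The DotNetAttributeValidation class and TryValidate call below are assumptions about that helper; verify the exact class and signature for your SDK version.

// Hedged sketch of running the POCO attribute validation before serializing.
// DotNetAttributeValidation and the TryValidate signature are assumptions;
// check the validation helper shipped with your SDK version.
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using Hl7.Fhir.Validation;

var errors = new List<ValidationResult>();
if (!DotNetAttributeValidation.TryValidate(patient, errors, true))
{
    // Fix (or reject) the problems before handing the resource to the serializer.
    foreach (var error in errors)
        Console.WriteLine(error.ErrorMessage);
}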

On the parser

  • The FHIR spec states on the XML and JSON pages that empty objects are not allowed, so this is something we should detect.
  • The old parsers could be tuned to be strict or lenient. In the latter case, empty arrays are acceptable, so it is easy to fix the problem of not being able to parse the serialized JSON by using a lenient parser.
  • This was also a point of contention: what if we don't want it to be lenient for other errors? Your "lenient" is not my "lenient". It proved hard to find a sweet spot.
  • So, the newer parsers now detect everything that deviates from the spec in a syntactic way (we're not a full profile validator, just catching the basic structural errors). The issues found are now coded. The parser will continue parsing, even if it finds issues. The returned exception (DeserializationFailedException) contains a PartialResult property that holds all the data we were able to store in the POCO, regardless of the issues found.
  • In the new parser, the "empty array" error has its own code, so you can choose to use the PartialResult if this is the only error that occurs. This way, you can determine exactly how "lenient" you want the parsing to be (see the sketch after this list).
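A sketch of that "pick your own leniency" pattern follows. PartialResult is named above; the Exceptions collection and the ErrorCode member are assumptions about DeserializationFailedException to verify against your SDK version, and the actual code string for the empty-object/array issue is left as a placeholder.

// Hedged sketch: fall back to the partially parsed POCO when the only issues
// found are ones you decide to tolerate. The Exceptions collection and the
// ErrorCode member are assumptions; the tolerated code is a placeholder.
using System.Linq;
using System.Text.Json;
using Hl7.Fhir.Model;
using Hl7.Fhir.Serialization;

Patient? DeserializeLeniently(string json, JsonSerializerOptions fhirOptions)
{
    try
    {
        return JsonSerializer.Deserialize<Patient>(json, fhirOptions);
    }
    catch (DeserializationFailedException e)
    {
        // Tolerate only the coded issue(s) you consider harmless, e.g. the
        // "empty object/array" code for your SDK version (placeholder below).
        bool onlyTolerable = e.Exceptions.All(issue =>
            issue.ErrorCode == "<empty object/array code>");

        if (onlyTolerable)
            return e.PartialResult as Patient;

        throw;
    }
}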

So, I think we have all the ingredients to handle these situations. The only thing missing is that it's not obvious that all of this is possible. It's documented, but the API does not make it obvious or easy to ignore specific errors.

What I will do is create a new feature request for the next update of the SDK to improve the interface so this becomes easy to do.

@ewoutkramer
Member

See also:

I'd love to get feedback on the proposal before we implement it, to see if it fits your use case!
