Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document type requires multiple serialization / deserialization for usage with serde_json::Value-based APIs #2702

Open
andrewbanchich opened this issue May 15, 2023 · 18 comments

Comments

@andrewbanchich
Copy link

Because Document uses HashMap for objects instead of serde_json::Map, there needs to be a second pass at converting that HashMap to a serde_json::Map when using libraries which work with serde_json::Values, and then another for turning it back to a HashMap / Document for use in Smithy output types.

serde_json::Value is the standard for the Rust ecosystem, so I expect a lot of crates will need to do this.

@andrewbanchich
Copy link
Author

It would be really nice if Smithy codegen could use T: Serialize for input types and T: Deserialize for output types instead of concrete values so that users could choose.

@jdisanti
Copy link
Collaborator

@thomas-k-cameron has done a lot of work to add Serialize/Deserialize support, and there are a number of PRs open to add it to other types in aws-smithy-types (#2645, #2646, #2647).

The same pattern can be followed to also add them to Document. It just needs to be behind #[cfg_attr(aws-sdk-unstable, feature = "serialize")] or #[cfg_attr(aws-sdk-unstable, feature = "deserialize")] conditional compilation flags for now.

We won't get to this any time soon, but you're welcome to contribute and work with @thomas-k-cameron to get this done.

@andrewbanchich
Copy link
Author

To clarify, this is more about the Document type using a HashMap instead of a serde_json::Map. Even if Document did impl Serialize / Deserialize, many users would still need to converted a Document to a Value and back. This conversion is trivial for other variants of documents because their types line up with the same inner types of serde_json::Values, so one can just unwrap them from Document and wrap them in Value. Document::Object wouldn't be able to do that though.

@thomas-k-cameron
Copy link
Contributor

thomas-k-cameron commented May 18, 2023

serde_json::Map uses BTreeMap or IndexMap and this is selected by conditional compilation and they used to use other implementation (e.g. LinkedHashMap).

Since they are not making any guarantee, I don't know if it would be a good idea.

@thomas-k-cameron
Copy link
Contributor

thomas-k-cameron commented May 19, 2023

Once RFC30 is implemented you should be able to do something like this.

    let doc = aws_smithy_types::Document::Object(HashMap::new());
    let serde_json_value = serde_json::to_value(doc).unwrap();
    let target_datatype = serde_json::from_value(serde_json_value).unwrap();

If your concern is for the UX then I think we could add a function like this.

impl Document {
   pub fn to_serde_json(&self) -> serde_json::Value {
       // convert to Value
   }
}

@andrewbanchich
Copy link
Author

My concern isn't about implementing Serialize or Deserialize, and it isn't the API for doing so.

I'm saying that 99.9% of the Rust ecosystem uses serde_json::Value for representing JSON, not Smithy's Document type. For inputs, users will have API handlers which accept a Document and output a Document, but in between the input and output, basically everyone will need to convert this type manually to serde_json::Value.

Converting between map types is not trivial. Users will need to pay a performance cost for arbitrarily large map conversions.

serde_json is flexible enough to allow the user to choose what type they want to deserialize to; if I want a HashMap then I can have it deserialize to a HashMap. If I want a serde_json::Map, that works too.

I'm saying IMO forcing one map type onto all users is not ideal. It would be nice to allow users to specify the type like serde_json does.

I guess my concern also applies to Document::Array - is Smithy going to make users manually map over arrays and call some custom convert_to_x function?

Other than Array and Document, the other variant types are all standard types, so can easily be used throughout the ecosystem. However, Vec and HashMap both use generics, which is why I believe they are problematic, and that you should pass the control of the concrete type through to the user.

@jdisanti
Copy link
Collaborator

jdisanti commented May 19, 2023

I understand the concern, but we don't want to depend on any serde libraries out of box. I could be persuaded to represent Document with serde_json::Value behind the scenes #[cfg_attr(aws-sdk-unstable)] switch though, and also add to/from value conversions.

Our concern is with maintainability. The SDK will have to be maintained without breaking changes for many years, so exposing separate libraries in the public API is a risk.

@andrewbanchich
Copy link
Author

aws-sdk-unstable sounds great to me! Thank you.

Any thoughts on handling this with a generic and allowing the user to specify the concrete type? I'd think that would be the best of both worlds.

@thomas-k-cameron
Copy link
Contributor

thomas-k-cameron commented May 20, 2023

Every data type that uses Document would need to accept generic parameters, so I'm pretty sure you would need to make a lot of changes to the code gen logic.

Also, I'm not quite sure how maintain-able it is to introduce a new interface that needs to be implemented.

I think it's more simpler/practical to do what the serde_json is doing.

@andrewbanchich
Copy link
Author

Every data type that uses Document would need to accept generic parameters

Why? I'm suggesting to do what serde_json does here: https://docs.rs/serde_json/latest/serde_json/fn.from_reader.html

This would give users to choice to use Document or Value or HashMap or anything else.

@thomas-k-cameron
Copy link
Contributor

I thought you meant something like this.

enum Document<T: GenericMap = HashMap<String, Document>>

I don't think I understand you correctly, would you mind giving me an example?

@andrewbanchich
Copy link
Author

That would be nice to have a default generic parameter to allow users to choose the kind of map, but when I said T: Deserialize I meant that in place of Document altogether.

There are so many parallels to serde_json::Value here, I'm wondering why we couldn't allow users to choose the entire type themselves without a need for a wrapper enum like Document or Value.

If I use a Smithy structure, I get a Smithy type which is a struct. If I use a Smithy enum, I get a Smithy type.

If I choose String or Number or bool, however, I get a standard Rust type.

I'm wondering if a Smithy model Document type could be more of a generic itself than a concrete type, and deserialize to a type I have defined in my own crate rather than one Smithy defined.

This would allow users to choose the specific map implementation used and have the benefit of not being subject to the orphan rule.

E.g. the Smithy model type is generic T (fake keyword I just invented). The codegen for Rust would produce T: Deserialize if used in an input type in the model.

The route handler for the server, for example, would accept a function which accepts an argument of any type which implements Deserialize.

Then, as a user, I can write my handler fn handler(input: Whatever) -> Result<...> and it will be deserialize to a struct Whatever { .. }.

@andrewbanchich
Copy link
Author

To phrase it slightly differently, the point of serde_json is to provide polymorphism over anything which can be serialized into or deserialized from JSON.

The Value enum is only one form of this polymorphism. Generics are another, more powerful and flexible form.

If Smithy is providing the enum, could it also provide the generic?

@andrewbanchich
Copy link
Author

andrewbanchich commented Jul 28, 2023

This would be fixed with #2858 , which is what I was asking for and more 😄

@david-perez
Copy link
Contributor

@andrewbanchich Why does #2858 solve this? In #2858, server operation handlers would take in types that implement FromRequest, but FromRequest should only be implemented by the generated SDK on the Rust struct corresponding to the operation input.

@andrewbanchich
Copy link
Author

I had spoken with @hlbarber about this and it sounded like we'd be able to impl FromRequest for our custom types as well. Maybe I misunderstood.

@david-perez
Copy link
Contributor

I don't think we should let users implement FromRequest for their own types. In fact, I think we should doc-hide and seal the trait.

If we allow users to implement FromRequest, and #2858 lands, users could easily violate their Smithy model.

@andrewbanchich
Copy link
Author

andrewbanchich commented Oct 14, 2023

I think exposing this trait would be very beneficial.

Smithy cannot serve as a source of truth for all validation rules for inputs, so currently our validation is split between our Smithy-generated types and our custom server code, with the server code performing the validation which Smithy cannot express. Because of this, the code we work with has no single source of truth for validation, which is not ideal. I would argue that the inverse is already true: Smithy allows applications to violate their "true" set of validation rules since some cannot be expressed by Smithy.

In this context, I think it would be better to have a single source of truth defined by application code than to have two sources of truth.

Exposing the trait would also mean we can avoid having a ton of code for converting between types, as well as running into the orphan rule when trying to add abstractions over Smithy types which don't exist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants