How to deserialize maybe-borrowed maybe-copied data? #914

Closed
sunshowers opened this issue May 2, 2017 · 6 comments

sunshowers commented May 2, 2017

For example, serde_json can sometimes return borrowed data and sometimes return copied data, depending on whether it was reading from a Read or a slice and whether it had to unescape any data: https://docs.serde.rs/src/serde_json/de.rs.html#993

How can this be handled in a way that gives us zero-copy if possible but can also do a copy if needed?

I think the current API always forces us to allocate, but ideally this can be handled using a Cow<'a, str> or similar, can't it?
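
To make the borrowed-versus-copied distinction concrete, here is a minimal std-only sketch of the decision described above. It is not serde_json's implementation: `unescape` is a hypothetical helper that only understands a few backslash escapes, and it assumes the input is already the text between the quotes.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of serde_json): resolves `\n`, `\"` and `\\`.
/// Returns `Cow::Borrowed` when the input contains no backslash, so nothing is
/// copied, and `Cow::Owned` when the text had to be rewritten.
fn unescape(raw: &str) -> Cow<'_, str> {
    // Fast path: nothing to decode, so lend out the input unchanged.
    if !raw.contains('\\') {
        return Cow::Borrowed(raw);
    }

    // Slow path: build a new String with the escapes resolved.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            // Covers `\"` and `\\`; any other escaped char is passed through.
            Some(other) => out.push(other),
            // A trailing backslash is kept as-is.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}

fn main() {
    let inputs = ["A", r#"say \"hi\""#];
    for raw in inputs.iter() {
        match unescape(raw) {
            Cow::Borrowed(s) => println!("borrowed: {}", s),
            Cow::Owned(s) => println!("copied: {}", s),
        }
    }
}
```

Callers that only need to read the result can treat both variants as `&str` through `Deref`, so the allocation happens only on the inputs that actually contain an escape.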

dtolnay added the support label May 3, 2017
@dtolnay
Member

dtolnay commented May 3, 2017

As you suspected, this can be handled using Cow<str> or similar.

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use std::borrow::Cow;

#[derive(Deserialize)]
struct Sid0<'a> {
    #[serde(borrow)]
    cow: Cow<'a, str>,
}

fn main() {
    let a = serde_json::from_str::<Sid0>("{\"cow\":\"A\"}").unwrap();
    match a.cow {
        Cow::Borrowed(s) => println!("borrowed: {}", s),
        Cow::Owned(s) => println!("copied: {}", s),
    }

    let b = serde_json::from_str::<Sid0>("{\"cow\":\"\\u0042\"}").unwrap();
    match b.cow {
        Cow::Borrowed(s) => println!("borrowed: {}", s),
        Cow::Owned(s) => println!("copied: {}", s),
    }
}

@sunshowers
Author

sunshowers commented May 3, 2017

Ah, the thing I was missing was that wrapping Cow in Vec doesn't appear to work (does this even make sense?)

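#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use std::borrow::Cow;

#[derive(Deserialize)]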
struct Sid0<'a> {
    #[serde(borrow)]
    cow: Vec<Cow<'a, str>>,
}

fn main() {
    let a = serde_json::from_str::<Sid0>("{\"cow\":[\"A\"]}").unwrap();
    match a.cow[0] {
        Cow::Borrowed(ref s) => println!("borrowed: {}", s),
        Cow::Owned(ref s) => println!("copied: {}", s),
    }

    let b = serde_json::from_str::<Sid0>("{\"cow\":[\"\\u0042\"]}").unwrap();
    match b.cow[0] {
        Cow::Borrowed(ref s) => println!("borrowed: {}", s),
        Cow::Owned(ref s) => println!("copied: {}", s),
    }
}

prints out

copied: A
copied: B

@dtolnay
Member

dtolnay commented May 3, 2017

Oh, working with collections + attributes is currently obnoxious because we don't yet have a way to apply attributes to the content of a collection. This is tracked in #723.

It is possible to make it work but unless you really need it, I would recommend sticking to owned types in collections (just like you would have in any previous version of Serde).

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use std::borrow::Cow;
use std::fmt;

use serde::de::{Deserializer, Visitor, SeqAccess};

#[derive(Deserialize)]
struct Sid0<'a> {
    #[serde(borrow, deserialize_with = "vec_cow")]
    cow: Vec<Cow<'a, str>>,
}

fn vec_cow<'de, D>(deserializer: D) -> Result<Vec<Cow<'de, str>>, D::Error>
    where D: Deserializer<'de>
{
    struct VecCow;

    impl<'de> Visitor<'de> for VecCow {
        type Value = Vec<Cow<'de, str>>;

        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
            formatter.write_str("an array")
        }

        fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
            where A: SeqAccess<'de>
        {
            #[derive(Deserialize)]
            struct Wrapper<'a>(#[serde(borrow)] Cow<'a, str>);

            let mut vec = Vec::new();
            while let Some(wrapper) = seq.next_element::<Wrapper>()? {
                vec.push(wrapper.0);
            }
            Ok(vec)
        }
    }

    deserializer.deserialize_seq(VecCow)
}

fn main() {
    let a = serde_json::from_str::<Sid0>("{\"cow\":[\"A\"]}").unwrap();
    match a.cow[0] {
        Cow::Borrowed(ref s) => println!("borrowed: {}", s),
        Cow::Owned(ref s) => println!("copied: {}", s),
    }

    let b = serde_json::from_str::<Sid0>("{\"cow\":[\"\\u0042\"]}").unwrap();
    match b.cow[0] {
        Cow::Borrowed(ref s) => println!("borrowed: {}", s),
        Cow::Owned(ref s) => println!("copied: {}", s),
    }
}
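
The `Wrapper` newtype in `visit_seq` is what makes the difference here. The sketch below is a simplified, std-only model of the two paths involved, not serde code. It assumes that serde's generic `Deserialize` impl for `Cow<'a, T>` builds `T::Owned` first and wraps it in `Cow::Owned`, which would explain the `copied: A` output for a plain `Vec<Cow<str>>`. It also assumes that `#[serde(borrow)]` placed directly on a `Cow<str>` field selects a path that can keep a reference into the input. The names `generic_path` and `borrowing_path` are hypothetical.

```rust
use std::borrow::Cow;

/// Model of the generic path (assumption: this mirrors what a blanket
/// `Cow<T>` impl does). The owned form is built first, so the result is
/// always `Cow::Owned`, even when the input could have been borrowed.
fn generic_path<'de>(input: &'de str) -> Cow<'de, str> {
    let owned: String = input.to_owned(); // allocates unconditionally
    Cow::Owned(owned)
}

/// Model of the borrow-aware path (assumption: this is what `#[serde(borrow)]`
/// on a `Cow<str>` field enables). The input is lent out without copying.
fn borrowing_path<'de>(input: &'de str) -> Cow<'de, str> {
    Cow::Borrowed(input)
}

fn label(value: &Cow<str>) -> &'static str {
    match value {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "copied",
    }
}

fn main() {
    let input = String::from("A");

    // Elements produced through the generic path are copies.
    let plain: Vec<Cow<str>> = vec![generic_path(&input)];
    // Elements produced through the borrow-aware path point into `input`.
    let wrapped: Vec<Cow<str>> = vec![borrowing_path(&input)];

    println!("{}: {}", label(&plain[0]), plain[0]);
    println!("{}: {}", label(&wrapped[0]), wrapped[0]);
}
```

Both vectors hold equal strings; the only difference is whether each element owns its own allocation or points back into the input buffer.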

@sunshowers
Author

Ah, thanks! That makes sense!

Would it be worth adding a note somewhere to http://serde.rs talking about Cow?

@dtolnay
Member

dtolnay commented May 3, 2017

Good call. It is mentioned in https://serde.rs/borrow.html but I filed serde-rs/serde-rs.github.io#57 to follow up with a more in-depth explanation.

@sunshowers
Author

Thanks!

facebook-github-bot pushed a commit to facebookarchive/mononoke that referenced this issue Jan 28, 2020
Summary:
We're deserializing JSON, and some data will be borrowed (if it can be accessed
straight from the JSON), but encoded strings (e.g. in `"foo": "\"quoted\""`)
won't. Using `str` works for the former, but fails for the latter. Using
`String` works for both, but it allocates a String even when we don't need one.
Using `Cow` does the right thing in both cases.

For reference: serde-rs/serde#914

Reviewed By: StanislavGlebik

Differential Revision: D19577791

fbshipit-source-id: a4eee7a9c1d771a2b0760daeaa6bf8dd0c6b8fbb