Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Deprecate RPC binary strings #9422

Open
hinto-janai opened this issue Aug 5, 2024 · 8 comments
Open

[Proposal] Deprecate RPC binary strings #9422

hinto-janai opened this issue Aug 5, 2024 · 8 comments

Comments

@hinto-janai
Copy link
Contributor

hinto-janai commented Aug 5, 2024

What

This is a proposal to deprecate binary string usage in monerod's daemon RPC interface.

Binary strings are JSON strings that appear in monerod's daemon RPC output that contain raw binary data.

There are 2 JSON-RPC methods whose responses contain these binary strings:

Example

For example, the output of get_txpool_backlog:

curl \
    http://127.0.0.1:18081/json_rpc \
    -d '{"jsonrpc":"2.0","id":"0","method":"get_txpool_backlog"}' \
    -H 'Content-Type: application/json' \
    --output -

is:

{
  "id": "0",
  "jsonrpc": "2.0",
  "result": {
    "backlog": "\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\r@S\t;\n��������mK\n�������N6\n������",
    "credits": 0,
    "status": "OK",
    "top_hash": "",
    "untrusted": false
  }
}

The backlog field is a JSON string containing raw binary data.

Problem

The main problem with these binary strings is that when represented as a string, they do not always contain valid Unicode or escape characters. This is against the JSON specification.

Any sane JSON parser will reject this mixed format, for example:

jq
# jq: parse error: Invalid escape at line 1, column 418

echo '{"backlog": "�\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\r@S\t;\n��������mK\n�������N6\n������"}' | jq
Python
# Traceback (most recent call last):
#   File "/tmp/a/a.py", line 4, in <module>
#     json.loads(backlog)
#   File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
#     return _default_decoder.decode(s)
#            ^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/usr/lib/python3.12/json/decoder.py", line 337, in decode
#     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
#                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/usr/lib/python3.12/json/decoder.py", line 353, in raw_decode
#     obj, end = self.scan_once(s, idx)
#                ^^^^^^^^^^^^^^^^^^^^^^
# json.decoder.JSONDecodeError: Invalid control character at: line 1 column 15 (char 14)

import json

backlog = '{"backlog": "�\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\r@S\t;\n��������mK\n�������N6\n������"}'
json.loads(backlog)
C++
/// main: rapidjson/document.h:1359:
///     rapidjson::GenericValue<Encoding, Allocator>::MemberIterator
/// 	rapidjson::GenericValue<Encoding, Allocator>::FindMember(const rapidjson::GenericValue<Encoding, SourceAllocator>&)
/// 	[
/// 		with
/// 		    SourceAllocator = rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>;
/// 			Encoding = rapidjson::UTF8<>;
/// 			Allocator = rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>;
/// 			MemberIterator = rapidjson::GenericMemberIterator<
/// 			    false,
/// 				rapidjson::UTF8<>,
/// 				rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>
/// 			>
/// 	]: Assertion `IsObject()' failed.
///
/// Aborted (core dumped)

#include "rapidjson/document.h"

using namespace rapidjson;

int main() {
    const char* json = "{\"backlog\": \"\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\
r@S\t;\n��������mK\n�������N6\n������\"}";

    Document document;
    document.Parse(json);
    document["backlog"].GetString();
}
Rust
/// thread 'main' panicked at src/main.rs:11:46:
/// called `Result::unwrap()` on an `Err` value: Error("invalid escape", line: 2, column: 54)
/// note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

#[derive(serde::Deserialize, serde::Serialize)]
struct Json {
    backlog: String,
}

const JSON: &str = r#"{"backlog": "�\r��5G\n�������\v��9E\n�������\b`k*6\n������@�8\n������S6\n������jdL\n��������~mE\n�������\b�~*A\n�������\b���\n;\n�������@��B\n�������\b@�6\n������:\r@S\t;\n��������mK\n�������N6\n������"}"#;

fn main() {
    serde_json::from_str::<Json>(JSON).unwrap();
}

Mixing binary and JSON in this manner means that users of these RPC calls must essentially implement a custom JSON parser themselves that can parse this format (including Monero itself).

Replacements

I propose that Monero replaces these binary strings with one of the following options:

  1. Full JSON
  2. Full binary
  3. Hexadecimal encoded binary within JSON

Full JSON

RPC output that have binary strings can be replaced with regular JSON output, for example:

{
  "backlog": [
    {
      "blob_size": 0,
      "fee": 0,
      "time_in_pool": 0
    }
  ]
}

Full JSON could be viable depending on how often these RPC call are used. It may not be if these methods are called frequently.

Full binary

The current JSON-RPC methods can be deprecated in favor of .bin endpoints (e.g. /get_txpool_backlog.bin) that expect full binary rather than the mixed format.

It is worth noting that an un-documented binary version of get_output_distribution is already used by wallet2, in favor of the mixed version.

Hexadecimal encoded binary within JSON

Similar to other methods/endpoints, Monero could encode this binary information in hexadecimal before serialization instead of embedding raw binary, for example:

{
  "backlog": "efbfbd...defbfbd"
}

Steps

If this proposal moves forward, the steps towards completion would be:

  1. Modify monerod's RPC (de)serialization
  2. Update any call-sites (e.g. wallet2)
  3. Update monero-site documentation
@kayabaNerve
Copy link

The custom binary string is encoded how exactly? I'd guess as raw binary, yet then it'd have a 1/256 chance of being ", which would mark the termination of its field. Is there some custom encoding to avoid ", or how is it intended to detect the end of the field?

@iamamyth
Copy link

iamamyth commented Aug 7, 2024

If memory serves me, other endpoints previously had this behavior and were converted to "[h]exadecimal encoded binary within JSON". So, that should probably be done with these endpoints.

@hinto-janai
Copy link
Contributor Author

The custom binary string is encoded how exactly?

For get_txpool_backlog, raw binary.

For get_output_distribution, depending on if the binary and compress flags are used:

if (this_ref.binary)
{
if (is_store)
{
if (this_ref.compress)
{
const_cast<std::string&>(this_ref.compressed_data) = compress_integer_array(this_ref.data.distribution);
KV_SERIALIZE(compressed_data)

it'll be raw binary or a compressed format. If {"binary":false} is given as a parameter, get_output_distribution will output a JSON array of the bytes (not a distribution object), although this is undocumented and not the default.

1/256 chance of being "

The bytes can be any Unicode code point (valid or not) instead of just ASCII, so I believe the chances would be much lower than this.

how is it intended to detect the end of the field?

I can't assert how binary string deserialization works.

@iamamyth
Copy link

iamamyth commented Aug 7, 2024

For get_output_distribution, depending on if the binary and compress flags are used:

Then it would seem get_output_distribution may be one of the endpoints I mentioned which was subsequently fixed: The binary flag lets all new callers request and receive valid data, but the default maintains backwards compatibility (as it should) by returning the possibly broken JSON. It might make sense to eventually eliminate the broken JSON option here by introducing a new version of the endpoint and establishing a deprecation timeline.

@j-berman
Copy link
Collaborator

j-berman commented Aug 7, 2024

I propose that Monero replaces these binary strings with one of the following options:

  1. Full JSON
  2. Full binary
  3. Hexadecimal encoded binary within JSON

How about following this model:

A new get_txpool_backlog_v2 endpoint returns full JSON by default, and hexadecimal encoded binary within JSON if request param as_hex is set to true.

get_txpool_backlog.bin returns full binary.

get_txpool_backlog can be set to be deprecated on a timeline of 12 months

@hinto-janai
Copy link
Contributor Author

Sounds good but is there a reason to add as_hex alongside a binary version? I would assume most users would use the JSON version and Monero would use .bin internally.

Would the same model apply to get_output_distribution? Make get_output_distribution_v2 output JSON, leave the current /get_output_distribution.bin as is, deprecate get_output_distribution in 12 months?

@j-berman
Copy link
Collaborator

j-berman commented Aug 8, 2024

Sounds good but is there a reason to add as_hex alongside a binary version?

Agree, would seem like a nice to have if someone requests it, not a required feature, especially considering the .bin endpoint.

Would the same model apply to get_output_distribution? Make get_output_distribution_v2 output JSON, leave the current /get_output_distribution.bin as is, deprecate get_output_distribution in 12 months?

Seems a reasonable approach to me

@hinto-janai
Copy link
Contributor Author

Update: there's an draft impl in Cuprate/cuprate#349 although after re-reading monero-project/meta#1053 I remembered that these endpoints will be changing/deprecated with FCMP++ so Cuprate will be awaiting on changes in monerod.

Consensus in that meeting seemed to be in favor of something like #9422 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants