Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demonstrate a way to get rid of the cadence-idl repo #5197

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Groxx
Copy link
Member

@Groxx Groxx commented Apr 7, 2023

What changed

Cadence-server proto type X now no longer conflicts with cadence-client proto type X, because the server package name is implicitly rewritten to a different namespace.

This intentionally allows client and server protobufs to diverge, e.g. when developing a new server API that is not yet ready for client-side use, or when using different code-generator configuration.

Why?

This is a proof-of-concept for solving the issue found in cadence-workflow/cadence-idl#94 by doing the opposite resolution.

There are definitely some things to like about centralizing code-gen... quite a lot actually... but it sometimes causes a lot of unnecessary friction. It requires client and server to update their (client-)APIs in lock-step, even though new APIs may not yet be ready for use, and that might force upgrades that are not yet ready.

It also doesn't accurately represent how RPC works - on-the-wire binary compatibility has literally nothing to do with code compatibility, but this (and gogoproto's global registration in general) ties them together as if they are identical. It also implies that literally all code generation for proto file X is identical to every other - the very fact that their code-gen is configurable makes that unambiguously wrong.

So let's just stop doing that.

Tests

E2E tests exercise both thrift (unaffected) and proto, but I should mechanically assert that grpc client + server, and both client->server and server<->server are tested.

Potential risks

Three of them probably:

  1. google.protobuf.Any, which I do not believe is an issue
  2. grpc/yarpc incompatibilities due to service name leakages
  3. code-gen changes that could require new monkey-patches in the future

google.protobuf.Any

This works by essentially encoding

message Any {
  string type = 1;
  binary contents = 2;
}

where the type value is selected by the creator of the Any when encoding, and checked by the recipient when decoding... which this now changes, so the client-type will not be .Any-compatible with the server-type.

There are simple ways to work around this, e.g. by looking up the type and manually decoding: https://github.com/gogo/protobuf/blob/f67b8970b736e53dbd7d0a27146c8f1ac52f74e5/types/any.go#L127
It's somewhat annoying that it has to be done, but entirely possible. And a reimplementation of ^ that method but with our customized comparison would be utterly trivial and would allow receiving Any essentially normally. Sending an Any from server to client is similarly trivial but slightly manual.

That said, personally I don't think we'll run into this. Any is almost never an accurate modeling of a domain: a field which contains literally any proto type, but not thrift or any other binary blob, and also not just protos we define and control, is... basically nonsense. Particularly for a multi-protocol system like Cadence.
We can get the same capabilities without the limitations by using a custom Any that supports thrift/json/etc, and basically every other scenario has strictly more correct options (true request forwarding, a OneOf, etc).

grpc/yarpc issues

I believe E2E tests show this continues to work.

code-gen changes in the future

Inevitably annoying if this happens, but we pin our generator versions + yarpc's code-gen is very stable in practice. We can probably also convince them to allow modifying server names via config args, as in principle the proto files used have no technical relation whatsoever to how they are used - it's just commonly the same everywhere.

Follow-ups

Since this allows dropping cadence-idl use for server builds, where both types are used, we can also drop it in the client. This will need to be a v2 thing as it changes import paths, but it's not hard to re-introduce the necessary code-gen over there.

When fully complete though, this will turn into:

  • when you are the server contacting the server, use github.com/uber/cadence/... like any other server code
  • when you are using the client library to contact the server, use go.uber.org/cadence/... like any other client code

Which leaves things in a clear state for what depends on what, and at what version.

@Groxx
Copy link
Member Author

Groxx commented Apr 7, 2023

aaaagh dangit:

2023-04-07T04:03:35.128Z	FATAL	loggerimpl/logger.go:143	Start workflow with err	{"error": "code:unimplemented message:unrecognized procedure \"uber.cadence.api.v1.WorkflowAPI::StartWorkflowExecution\" for service \"cadence-frontend\"", "logging-call-at": "client_integration_test.go:235"}

that's probably going to be fatal to this.

@coveralls
Copy link

coveralls commented Apr 7, 2023

Pull Request Test Coverage Report for Build 01875e2f-959c-4c4d-87af-1d7805759bcc

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 38 unchanged lines in 10 files lost coverage.
  • Overall coverage increased (+0.1%) to 57.178%

Files with Coverage Reduction New Missed Lines %
service/history/queue/timer_queue_processor_base.go 1 78.81%
service/history/queue/transfer_queue_processor_base.go 1 77.62%
common/peerprovider/ringpopprovider/config.go 2 81.58%
common/task/weightedRoundRobinTaskScheduler.go 2 89.64%
service/history/task/transfer_standby_task_executor.go 2 87.24%
service/matching/matcher.go 2 91.46%
common/persistence/nosql/nosqlplugin/cassandra/workflow.go 3 60.0%
service/history/execution/mutable_state_task_refresher.go 3 58.86%
common/persistence/nosql/nosqlplugin/cassandra/workflowUtils.go 8 78.16%
common/persistence/nosql/nosqlplugin/cassandra/workflowParsingUtils.go 14 87.16%
Totals Coverage Status
Change from base Build 01874e2d-13b5-4a33-b7c8-2d7da3311e7b: 0.1%
Covered Lines: 85396
Relevant Lines: 149351

💛 - Coveralls

@Groxx
Copy link
Member Author

Groxx commented Apr 7, 2023

well. it requires rewriting both proto and generated code, because yarpc is insufficiently configurable, but it's pretty simple and it does seem to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants