Compilation times have become significantly worse because the worker loads all processors, which by their very nature all depend on different crates, ballooning the total number of dependencies to 400+ crates.
The server doesn't do anything with these processors other than needing their type definitions, so that it knows how to expose the processor configurations via GraphQL.
I think it makes sense to start splitting all of this up, but I haven't yet come to a good design.
The worker and server depend on the processors for different reasons:
- server depends on all processors to know their GraphQL input types
- worker depends on all processors to run them
Aside from that, both the server and the worker need typed information about the database: the server uses it to push new jobs and to fetch job status for GraphQL requests, while the worker fetches pending jobs and pushes job updates to the database.
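For illustration, those shared database types might look roughly like the following; the names (`Job`, `JobStatus`) and fields here are assumptions for the sketch, not the actual schema:

```rust
// Hypothetical shared job types used by both server and worker; the real
// field names and status variants are not taken from this issue.
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum JobStatus {
    Pending,
    Running,
    Succeeded { output: String },
    Failed { error: String },
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Job {
    pub id: i64,
    /// Name of the processor that should handle this job.
    pub processor: String,
    /// JSON-serialised processor configuration, as provided via GraphQL.
    pub config: serde_json::Value,
    pub status: JobStatus,
}
```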
So if we were to split all of this up into separate crates, so that they don't have to be re-compiled all the time, you'd end up with something like:
- server – GraphQL API
- storage – types related to storing and fetching data
- worker – run jobs
- processor-types – a set of processor type signatures
- processor-shell-command/.../... – all the different processor implementations
In such a situation, the server would depend on storage and processor-types, but not on all the actual processor implementations, and also not on the worker.
The worker would depend on everything except the server itself.
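To make that concrete, here is a minimal sketch of what processor-types could expose; the trait name, signatures, and the example config are assumptions, not existing code. Each processor implementation crate would implement the trait, and only the worker would depend on those implementation crates:

```rust
// processor-types: a lightweight crate with only the type signatures the
// server needs for GraphQL, without pulling in the heavy processor deps.
use serde::{de::DeserializeOwned, Deserialize, Serialize};

/// Hypothetical trait describing a processor's interface.
pub trait Processor {
    /// Configuration exposed by the server as a GraphQL input type and
    /// serialised into the database when a job is created.
    type Config: Serialize + DeserializeOwned;

    /// Name used to reference the processor in stored jobs.
    const NAME: &'static str;

    /// Execute the processor; only the worker ever calls this.
    fn run(config: Self::Config) -> Result<String, String>;
}

/// Example config living in processor-types, so the server can expose it via
/// GraphQL without depending on the shell-command implementation crate.
#[derive(Debug, Serialize, Deserialize)]
pub struct ShellCommandConfig {
    pub command: String,
    pub args: Vec<String>,
}
```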
This is still not great: creating a new processor now involves not just writing it, but also adding it to the list of processor types in another crate, making it harder to create a processor and add it to your set-up (which isn't possible right now anyway, since processors are compiled into the server/worker binaries).
Another thought I had was to actually change processors to become binaries, and have the server and worker communicate over RPC. That would solve most of these issues (and would allow processors to not be built in Rust), but it would add an extra layer of complexity to the cross-binary communication, and it would also reduce type safety.
One way to do that would be to have something like this:
- when starting the server, you pass in a set of strings representing the processor binaries you want to "enable"
- on start-up, the server runs these binaries with some kind of signature argument to ask each processor for its type signature
- the server then uses this type signature to configure GraphQL (however, the current GraphQL library we use doesn't support dynamic schemas), and to serialise the data before storing it in the database
- when the worker needs a processor to do some work, it runs its binary with the correct data passed in
- this would already work reasonably well, since we only pass JSON-serialised data to the processors as input and get a string (or error) back as output, which translates nicely to passing a JSON-formatted string to the binary and getting back an exit code plus a string output (see the sketch below)
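A minimal sketch of such a processor binary, assuming a --signature flag for the type-signature handshake and JSON on stdin for the job input (both conventions are assumptions here, not a settled protocol):

```rust
// Hypothetical stand-alone processor binary: `--signature` prints the input
// type signature; otherwise it reads JSON from stdin, does its work, prints
// the output to stdout, and signals failure via the exit code.
use std::io::{self, Read};
use std::process::exit;

fn main() {
    // The `--signature` handshake: print the expected input type and exit.
    if std::env::args().any(|arg| arg == "--signature") {
        // The exact schema format (JSON Schema, something custom, ...) is one
        // of the open questions mentioned above.
        println!(r#"{{"command": "string", "args": ["string"]}}"#);
        return;
    }

    // Normal invocation: the worker pipes the JSON-serialised processor
    // configuration to stdin.
    let mut input = String::new();
    if io::stdin().read_to_string(&mut input).is_err() {
        eprintln!("failed to read input from stdin");
        exit(1);
    }

    // Translate Result<String, String> into the exit-code-plus-output
    // convention described above.
    match run(&input) {
        Ok(output) => println!("{output}"),
        Err(error) => {
            eprintln!("{error}");
            exit(1);
        }
    }
}

// Placeholder for the actual processor logic; a real processor would
// deserialise `input` into its config type and execute it.
fn run(input: &str) -> Result<String, String> {
    Ok(format!("received {} bytes of input", input.len()))
}
```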
Still, it's quite some work, and there are some gaps left (such as dynamic schemas).