Overhaul saved process output #138
Merged
Closes #136
Okay... this PR kind of grew out of control. From a high level, there are 3 major parts that make up this PR:

1. A new format for saving process outputs, replacing the `stdout.txt` and `stderr.txt` files that were used before
2. A new `brioche jobs logs` command. This parses the format from (1) and prints it in a human-readable format
3. Compressing the saved events with zstd

Anyway, here are more details on each of these:
1. New format for saving process outputs
Currently, when we bake a process, we save the process's stdout to a file named `stdout.txt`, and the stderr to a file named `stderr.txt`. This is all well and good, but saving the output in this way is pretty lossy, as #136 describes.

To fix this, I came up with a small, pcap-style format, which records a stream of process "events", including a JSON description of the process when it starts, each write to stdout/stderr, and the process's exit code. The events also include pretty detailed timing info: the first event includes a start timestamp, and the following events all include an elapsed duration from this initial timestamp. The actual event serializing is pretty boring, but one interesting thing is that event lengths are recorded at both the start and end of the event, which allows seeking forwards and backwards through each event (assuming the underlying reader supports seeking).
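To illustrate the dual-length framing idea, here's a minimal sketch. The little-endian `u32` length fields and the example payloads are assumptions for illustration; the actual event encoding in this PR differs:

```rust
use std::io::{Cursor, Read, Seek, SeekFrom, Write};

// Sketch of pcap-style framing: each payload is wrapped with its length at
// both the start and the end, so a reader can skip forward (read the leading
// length) or walk backward (read the trailing length).
fn write_event(w: &mut impl Write, payload: &[u8]) -> std::io::Result<()> {
    let len = payload.len() as u32;
    w.write_all(&len.to_le_bytes())?;
    w.write_all(payload)?;
    w.write_all(&len.to_le_bytes())?;
    Ok(())
}

// Read the event that ends at the current position, leaving the cursor at
// the start of that event (so repeated calls walk backwards).
fn read_event_backward(r: &mut (impl Read + Seek)) -> std::io::Result<Vec<u8>> {
    // Read the trailing length field just before the cursor.
    r.seek(SeekFrom::Current(-4))?;
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as i64;

    // Jump back over the trailing length and the payload, then read it.
    r.seek(SeekFrom::Current(-(4 + len)))?;
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;

    // Leave the cursor before this event's leading length field.
    r.seek(SeekFrom::Current(-(len + 4)))?;
    Ok(payload)
}

fn main() -> std::io::Result<()> {
    let mut buf = Cursor::new(Vec::new());
    write_event(&mut buf, b"process started")?;
    write_event(&mut buf, b"stdout: hello")?;
    write_event(&mut buf, b"exit code 0")?;

    // Walk the events in reverse, starting from the end of the stream.
    buf.seek(SeekFrom::End(0))?;
    assert_eq!(read_event_backward(&mut buf)?, b"exit code 0".to_vec());
    assert_eq!(read_event_backward(&mut buf)?, b"stdout: hello".to_vec());
    assert_eq!(read_event_backward(&mut buf)?, b"process started".to_vec());
    Ok(())
}
```

This is what makes a fast `--reverse` possible later: the reader can start at the end of the file and hop event-by-event without scanning from the beginning.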
2. The new `brioche jobs logs` command

Since events are now written in a bespoke binary format, we need a way to decode those events. `brioche jobs logs` does just that: it takes a file path to the events file (still printed on failure, replacing the `stdout.txt` and `stderr.txt` paths that were printed previously), and pretty-prints each event. Here's an example of the output:

The command also has some extra flags:
- `--reverse`: Prints events starting from the end (as mentioned, this can be pretty fast because the underlying event stream includes offsets)
- `--limit <n>`: Outputs at most `<n>` events. Also works with `--reverse`, so you can see the last few logs before a process failed!
- `--follow`: Uses the `notify` crate to support watching as new events are written to the events file, similar to `tail -f`. This makes it easy to watch the output of a still-running process

The input file can also be `-` for stdin (which doesn't support `--reverse` or `--follow`). I don't think there'd ever be a reason to read from stdin, so this is more of just a nicety

3. Compressing events with zstd
This is where things took a turn. The event stream can obviously get pretty big, especially since there's a huge JSON blob printed at the start! So, I felt it would be valuable to compress the events file when writing to disk.
First, I considered just compressing each event individually. I decided against this, since each individual write to stdout / stderr is recorded separately, with a separate timestamp, meaning each one could be only a few bytes long! Making a "super event" merging multiple events would also be doable, but would make the format itself more complicated.
Compressing the entire event stream with the `zstd` crate seemed like the obvious choice then. The zstd format doesn't natively support seeking within a compressed stream, so that would mean the `--reverse` option would need to decompress the entire file first. Not the end of the world, but not ideal.

Then, I found out that there's a seekable format for zstd (TL;DR: it chunks the file into fixed-size frames, then writes a "seek table" in a section at the end that gets ignored by the normal decompressor). It's currently not implemented by the `zstd` crate itself (see gyscos/zstd-rs#272), but the `zstd-seekable` crate provides low-level bindings. However, this has the downside that it no longer works with partially-complete files, which means `--follow` basically wouldn't work!

So here's what I came up with:

1. When `brioche jobs logs` gets a filename, check for magic bytes for either the uncompressed events format or a compressed zstd stream
2. If it's zstd, first try reading it with `zstd-seekable` (which will fail if there's no seek table at the end)
3. If that fails, fall back to reading the stream linearly with `zstd`
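The magic-byte check in step 1 could look roughly like this. The zstd frame magic number (`0xFD2FB528`, stored little-endian) is standard, but the events-format magic below is a made-up placeholder, not the actual bytes Brioche uses:

```rust
// Standard zstd frame magic number, stored little-endian on disk.
const ZSTD_MAGIC: [u8; 4] = 0xFD2FB528u32.to_le_bytes();

// Placeholder magic for the uncompressed events format (NOT Brioche's real bytes).
const EVENTS_MAGIC: [u8; 4] = *b"evt0";

#[derive(Debug, PartialEq)]
enum LogFormat {
    Events,
    Zstd,
    Unknown,
}

// Sniff the first few bytes of the file to decide which reader to use.
fn sniff_format(header: &[u8]) -> LogFormat {
    if header.starts_with(&EVENTS_MAGIC) {
        LogFormat::Events
    } else if header.starts_with(&ZSTD_MAGIC) {
        LogFormat::Zstd
    } else {
        LogFormat::Unknown
    }
}

fn main() {
    assert_eq!(sniff_format(&[0x28, 0xB5, 0x2F, 0xFD, 0x00]), LogFormat::Zstd);
    assert_eq!(sniff_format(b"evt0..."), LogFormat::Events);
    assert_eq!(sniff_format(b"hello"), LogFormat::Unknown);
}
```

With the format identified, the zstd path then tries `zstd-seekable` first and falls back to linear decompression if the seek table is missing.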
The `zstd-seekable`-based reader is called `ZstdSeekableDecoder`, and is a pretty direct mapping to the low-level zstd functions.

The fallback zstd reader is called `ZstdLinearDecoder`, and it was specially written to account for all the different features needed in `brioche jobs logs`:

- It can keep reading a file that's still being written (needed for `--follow`)
- It supports seeking within the decompressed stream (needed for `--reverse`)