
Overhaul saved process output #138

Merged: kylewlacy merged 45 commits into main from capture-more-detailed-process-output on Nov 4, 2024

Conversation

kylewlacy (Member)

Closes #136

Okay... this PR kind of grew out of control. At a high level, there are 3 major parts that make up this PR:

  1. Implement a new format for saving process stdout/stderr along with extra metadata. This fully replaces the stdout.txt and stderr.txt files that were used before
  2. Add a new brioche jobs logs command. This parses the format from (1) and prints it in a human-readable format
  3. Implement custom readers / writers to compress and decompress the files from (1). The custom implementation here enabled extra features for (2) that wouldn't have been possible with the pre-existing Rust zstd implementations.

Anyway, here are more details on each of these:


1. New format for saving process outputs

Currently, when we bake a process, we save the process's stdout to a file named stdout.txt, and the stderr to a file named stderr.txt. This is all well and good, but saving the output in this way is pretty lossy, as #136 describes.

To fix this, I came up with a small, pcap-style format, which records a stream of process "events", including a JSON description of the process when it starts, each write to stdout/stderr, and the process's exit code. The events also include pretty detailed timing info: the first event includes a start timestamp, and the following events all include an elapsed duration from this initial timestamp. The actual event serializing is pretty boring, but one interesting thing is that event lengths are recorded at both the start and end of the event, which allows seeking forwards and backwards through each event (assuming the underlying reader supports seeking).
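For illustration, here's a minimal sketch of that framing idea. The field widths and encoding below (a 4-byte little-endian length on each side of the payload) are assumptions for the sketch, not necessarily the exact layout used here:

```rust
use std::io::{self, Write};

// Hypothetical framing: the duplicated length lets a reader skip forwards
// (read the leading length) or backwards (read the trailing length).
fn write_event<W: Write>(mut writer: W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "event too large"))?;
    writer.write_all(&len.to_le_bytes())?; // leading length
    writer.write_all(payload)?;            // serialized event (treated as opaque here)
    writer.write_all(&len.to_le_bytes())?; // trailing length
    Ok(())
}
```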

2. The new brioche jobs logs command

Since events are now written in a bespoke binary format, we need a way to decode those events. brioche jobs logs does just that: it takes the path to the events file (this path is still printed when a process fails, replacing the stdout.txt and stderr.txt paths that were printed previously) and pretty-prints each event. Here's an example of the output:

process stack trace:
- file:///home/kyle/Development/Brioche/brioche-packages/packages/std/core/recipes/process.bri:196:14
- file:///home/kyle/Development/Brioche/brioche-packages/packages/openssl/project.bri:25:6
- file:///home/kyle/Development/Brioche/brioche-packages/packages/curl/project.bri:27:36

[0.00s] [spawned process with pid 253577, preparation took 0.01s]
[0.22s] Configuring OpenSSL version 3.3.1 for target linux-x86_64
...
[13m38s] [process exited with code 0]

The command also has some extra flags:

  • --reverse: Prints events starting from the end (as mentioned, this can be pretty fast because each event's length is recorded at both its start and end; see the sketch below)
  • --limit <n>: Outputs at most <n> events. Also works with --reverse, so you can see the last few logs before a process failed!
  • --follow: Uses the notify crate to support watching as new events are written to the events file, similar to tail -f. This makes it easy to watch the output of a still-running process

The input file can also be - for stdin (which doesn't support --reverse or --follow). I don't think there'd ever be a reason to read from stdin, so this is more of just a nicety
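To make the --reverse behavior concrete, here's a rough sketch of walking one event backwards, again assuming the hypothetical 4-byte length framing from the earlier sketch (the real reader in this PR may differ):

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical sketch: with the reader positioned just after an event's
// trailing length, read that event's payload and leave the reader at the
// start of the event, ready for the next backward step.
fn read_event_backwards<R: Read + Seek>(reader: &mut R) -> io::Result<Vec<u8>> {
    // Read the trailing length just behind the cursor.
    reader.seek(SeekFrom::Current(-4))?;
    let mut len_bytes = [0u8; 4];
    reader.read_exact(&mut len_bytes)?;
    let len = i64::from(u32::from_le_bytes(len_bytes));

    // Jump back over the trailing length and the payload, then read the payload.
    reader.seek(SeekFrom::Current(-4 - len))?;
    let mut payload = vec![0u8; len as usize];
    reader.read_exact(&mut payload)?;

    // Rewind past the payload and the leading length, to the event's start.
    reader.seek(SeekFrom::Current(-(len + 4)))?;
    Ok(payload)
}
```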

3. Compressing events with zstd

This is where things took a turn. The event stream can obviously get pretty big, especially since there's a huge JSON blob printed at the start! So, I felt it would be valuable to compress the events file when writing to disk.

First, I considered just compressing each event individually. I decided against this, since each individual write to stdout / stderr is recorded separately, with a separate timestamp, meaning each one could be only a few bytes long! Making a "super event" merging multiple events would also be doable, but would make the format itself more complicated.

Compressing the entire event stream with the zstd crate seemed like the obvious choice then. The zstd format doesn't natively support seeking within a compressed stream, so that would mean the --reverse option would need to decompress the entire file first. Not the end of the world, but not ideal.

Then, I found out that there's a seekable format for zstd (TL;DR: it chunks the file into fixed-size frames, then writes a "seek table" in a section at the end that gets ignored by the normal decompressor). It's currently not implemented by the zstd crate itself (see gyscos/zstd-rs#272), but the zstd-seekable crate provides low-level bindings. However, this has the downside that it no longer works with partially-complete files, which means --follow basically wouldn't work!

So here's what I came up with:

  • When brioche jobs logs gets a filename, check for magic bytes for either the uncompressed events format or a compressed zstd stream (see the sketch after this list)
    • If uncompressed, use it without decompression
    • If it has the zstd magic bytes
      • Try opening it with a custom reader that uses zstd-seekable (which will fail if there's no seek table at the end)
      • Fall back to opening it with a custom reader that uses zstd

The zstd-seekable-based reader is called ZstdSeekableDecoder, and is a pretty direct mapping to the low-level zstd functions.

The fallback zstd reader is called ZstdLinearDecoder, and it was specially written to account for all the different features needed in brioche jobs logs (a rough sketch of its seek logic follows this list):

  • It supports resuming decoding after hitting EOF (needed for --follow)
  • It keeps a buffer holding the current (partially decompressed) zstd frame as it goes. Since this is meant to be used with seekable streams (made of 1MB frames), buffering isn't too bad for this use case
  • It keeps track of the start and end offsets for each decompressed frame
  • It implements seeking (needed for --reverse)
    • Seeking always succeeds if the target position is within the buffer
    • Seeking backwards uses the recorded offsets to seek back to the start of the frame containing the target position, then decompresses until it reaches the target position
    • Seeking forward just keeps decoding data until it reaches the target position (with some smarts to jump to the last-known frame, if we had previously seeked backwards)
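For illustration, here's a simplified sketch of that seek strategy with made-up names (LinearDecoder, frame_offsets, decode_until); the actual ZstdLinearDecoder is more involved and also handles the forward-seek shortcut mentioned above:

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical sketch of the fallback decoder's bookkeeping.
struct LinearDecoder<R> {
    compressed: R,
    /// Decompressed bytes of the frame we're currently inside.
    buffer: Vec<u8>,
    /// Absolute decompressed offset where `buffer` starts.
    buffer_start: u64,
    /// Current read position, as an absolute decompressed offset.
    position: u64,
    /// (decompressed start, compressed start) of each frame seen so far.
    frame_offsets: Vec<(u64, u64)>,
}

impl<R: Read + Seek> LinearDecoder<R> {
    fn seek_to(&mut self, target: u64) -> io::Result<()> {
        let buffer_end = self.buffer_start + self.buffer.len() as u64;
        if target >= self.buffer_start && target < buffer_end {
            // Target is already buffered: just move the cursor.
            self.position = target;
            return Ok(());
        }

        if target < self.buffer_start {
            // Seeking backwards: jump the compressed reader to the start of
            // the frame containing `target`, then decode forward from there.
            let &(frame_start, compressed_start) = self
                .frame_offsets
                .iter()
                .rev()
                .find(|(start, _)| *start <= target)
                .expect("frame offsets cover offset 0");
            self.compressed.seek(SeekFrom::Start(compressed_start))?;
            self.buffer.clear();
            self.buffer_start = frame_start;
            self.position = frame_start;
        }

        // Seeking forwards (or finishing a backwards seek): keep decoding
        // frames until `target` is buffered, then set `position` to it.
        self.decode_until(target)
    }

    fn decode_until(&mut self, _target: u64) -> io::Result<()> {
        // Elided: decompress frame-by-frame (e.g. with the zstd crate),
        // recording each frame's offsets, until the target is buffered.
        unimplemented!()
    }
}
```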

kylewlacy merged commit 59fb443 into main on Nov 4, 2024 (5 checks passed)
kylewlacy deleted the capture-more-detailed-process-output branch on Nov 4, 2024 at 08:57