
Overhaul saved process output #138

Merged: kylewlacy merged 45 commits into main from capture-more-detailed-process-output on Nov 4, 2024

Conversation

kylewlacy (Member)

Closes #136

Okay... this PR kind of grew out of control. At a high level, there are 3 major parts that make up this PR:

  1. Implement a new format for saving process stdout/stderr along with extra metadata. This fully replaces the stdout.txt and stderr.txt files that were used before
  2. Add a new brioche jobs logs command. This parses the format from (1) and prints it in a human-readable format
  3. Implement custom readers / writers to compress and decompress the files from (1). The custom implementation here enabled extra features for (2) that wouldn't have been possible with the pre-existing Rust zstd implementations.

Anyway, here are more details on each of these:


1. New format for saving process outputs

Currently, when we bake a process, we save the process's stdout to a file named stdout.txt, and the stderr to a file named stderr.txt. This is all well and good, but saving the output in this way is pretty lossy, as #136 describes.

To fix this, I came up with a small, pcap-style format, which records a stream of process "events", including a JSON description of the process when it starts, each write to stdout/stderr, and the process's exit code. The events also include pretty detailed timing info: the first event includes a start timestamp, and the following events all include an elapsed duration from this initial timestamp. The actual event serializing is pretty boring, but one interesting thing is that event lengths are recorded at both the start and end of the event, which allows seeking forwards and backwards through each event (assuming the underlying reader supports seeking).
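For illustration, here's a minimal sketch of that framing idea. The field widths and encoding below (a 4-byte little-endian length on each side of the payload) are assumptions for the sketch, not necessarily the exact layout used here:

```rust
use std::io::{self, Write};

// Hypothetical framing: the duplicated length lets a reader skip forwards
// (read the leading length) or backwards (read the trailing length).
fn write_event<W: Write>(mut writer: W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "event too large"))?;
    writer.write_all(&len.to_le_bytes())?; // leading length
    writer.write_all(payload)?;            // serialized event (treated as opaque here)
    writer.write_all(&len.to_le_bytes())?; // trailing length
    Ok(())
}
```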

2. The new brioche jobs logs command

Since events are now written in a bespoke binary format, we need a way to decode those events. brioche jobs logs does just that: it takes the path to the events file (this path is still printed when a process fails, replacing the stdout.txt and stderr.txt paths that were printed previously) and pretty-prints each event. Here's an example of the output:

process stack trace:
- file:///home/kyle/Development/Brioche/brioche-packages/packages/std/core/recipes/process.bri:196:14
- file:///home/kyle/Development/Brioche/brioche-packages/packages/openssl/project.bri:25:6
- file:///home/kyle/Development/Brioche/brioche-packages/packages/curl/project.bri:27:36

[0.00s] [spawned process with pid 253577, preparation took 0.01s]
[0.22s] Configuring OpenSSL version 3.3.1 for target linux-x86_64
...
[13m38s] [process exited with code 0]

The command also has some extra flags:

  • --reverse: Prints events starting from the end (as mentioned, this can be pretty fast because each event's length is recorded at both its start and end; see the sketch below)
  • --limit <n>: Outputs at most <n> events. Also works with --reverse, so you can see the last few logs before a process failed!
  • --follow: Uses the notify crate to support watching as new events are written to the events file, similar to tail -f. This makes it easy to watch the output of a still-running process

The input file can also be - for stdin (which doesn't support --reverse or --follow). I don't think there'd ever be a reason to read from stdin, so this is more of just a nicety
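To make the --reverse behavior concrete, here's a rough sketch of walking one event backwards, again assuming the hypothetical 4-byte length framing from the earlier sketch (the real reader in this PR may differ):

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical sketch: with the reader positioned just after an event's
// trailing length, read that event's payload and leave the reader at the
// start of the event, ready for the next backward step.
fn read_event_backwards<R: Read + Seek>(reader: &mut R) -> io::Result<Vec<u8>> {
    // Read the trailing length just behind the cursor.
    reader.seek(SeekFrom::Current(-4))?;
    let mut len_bytes = [0u8; 4];
    reader.read_exact(&mut len_bytes)?;
    let len = i64::from(u32::from_le_bytes(len_bytes));

    // Jump back over the trailing length and the payload, then read the payload.
    reader.seek(SeekFrom::Current(-4 - len))?;
    let mut payload = vec![0u8; len as usize];
    reader.read_exact(&mut payload)?;

    // Rewind past the payload and the leading length, to the event's start.
    reader.seek(SeekFrom::Current(-(len + 4)))?;
    Ok(payload)
}
```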

3. Compressing events with zstd

This is where things took a turn. The event stream can obviously get pretty big, especially since there's a huge JSON blob printed at the start! So, I felt it would be valuable to compress the events file when writing to disk.

First, I considered just compressing each event individually. I decided against this, since each individual write to stdout / stderr is recorded separately, with a separate timestamp, meaning each one could be only a few bytes long! Making a "super event" merging multiple events would also be doable, but would make the format itself more complicated.

Compressing the entire event stream with the zstd crate seemed like the obvious choice then. The zstd format doesn't natively support seeking within a compressed stream, so that would mean the --reverse option would need to decompress the entire file first. Not the end of the world, but not ideal.

Then, I found out that there's a seekable format for zstd (TL;DR: it chunks the file into fixed-size frames, then writes a "seek table" in a section at the end that gets ignored by the normal decompressor). It's currently not implemented by the zstd crate itself (see gyscos/zstd-rs#272), but the zstd-seekable crate provides low-level bindings. However, this has the downside that it no longer works with partially-complete files, which means --follow basically wouldn't work!

So here's what I came up with:

  • When brioche jobs logs gets a filename, check for magic bytes for either the uncompressed events format or a compressed zstd stream (see the sketch after this list)
    • If uncompressed, use it without decompression
    • If it has the zstd magic bytes
      • Try opening it with a custom reader that uses zstd-seekable (which will fail if there's no seek table at the end)
      • Fall back to opening it with a custom reader that uses zstd

The zstd-seekable-based reader is called ZstdSeekableDecoder, and is a pretty direct mapping to the low-level zstd functions.

The fallback zstd reader is called ZstdLinearDecoder, and it was specially written to account for all the different features needed in brioche jobs logs (a rough sketch of its seek logic follows this list):

  • It supports resuming decoding after hitting EOF (needed for --follow)
  • It keeps a buffer holding the current (partially decompressed) zstd frame as it goes. Since this is meant to be used with seekable streams (made of 1MB frames), buffering isn't too bad for this use case
  • It keeps track of the start and end offsets for each decompressed frame
  • It implements seeking (needed for --reverse)
    • Seeking always succeeds if the target position is within the buffer
    • Seeking backwards uses the recorded offsets to seek back to the start of the frame containing the target position, then decompresses until it reaches the target position
    • Seeking forward just keeps decoding data until it reaches the target position (with some smarts to jump to the last-known frame, if we had previously seeked backwards)
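For illustration, here's a simplified sketch of that seek strategy with made-up names (LinearDecoder, frame_offsets, decode_until); the actual ZstdLinearDecoder is more involved and also handles the forward-seek shortcut mentioned above:

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical sketch of the fallback decoder's bookkeeping.
struct LinearDecoder<R> {
    compressed: R,
    /// Decompressed bytes of the frame we're currently inside.
    buffer: Vec<u8>,
    /// Absolute decompressed offset where `buffer` starts.
    buffer_start: u64,
    /// Current read position, as an absolute decompressed offset.
    position: u64,
    /// (decompressed start, compressed start) of each frame seen so far.
    frame_offsets: Vec<(u64, u64)>,
}

impl<R: Read + Seek> LinearDecoder<R> {
    fn seek_to(&mut self, target: u64) -> io::Result<()> {
        let buffer_end = self.buffer_start + self.buffer.len() as u64;
        if target >= self.buffer_start && target < buffer_end {
            // Target is already buffered: just move the cursor.
            self.position = target;
            return Ok(());
        }

        if target < self.buffer_start {
            // Seeking backwards: jump the compressed reader to the start of
            // the frame containing `target`, then decode forward from there.
            let &(frame_start, compressed_start) = self
                .frame_offsets
                .iter()
                .rev()
                .find(|(start, _)| *start <= target)
                .expect("frame offsets cover offset 0");
            self.compressed.seek(SeekFrom::Start(compressed_start))?;
            self.buffer.clear();
            self.buffer_start = frame_start;
            self.position = frame_start;
        }

        // Seeking forwards (or finishing a backwards seek): keep decoding
        // frames until `target` is buffered, then set `position` to it.
        self.decode_until(target)
    }

    fn decode_until(&mut self, _target: u64) -> io::Result<()> {
        // Elided: decompress frame-by-frame (e.g. with the zstd crate),
        // recording each frame's offsets, until the target is buffered.
        unimplemented!()
    }
}
```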

kylewlacy merged commit 59fb443 into main on Nov 4, 2024 (5 checks passed)
kylewlacy deleted the capture-more-detailed-process-output branch on Nov 4, 2024 at 08:57