Get rid of RackBody
It was there for API compatibility and it is just a subset of OutputEnumerator anyway
julik committed Mar 1, 2024
1 parent cc7f9d3 commit 69f58bd
Showing 13 changed files with 362 additions and 314 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,5 +1,6 @@
## 6.0

+* Remove `RackBody` because it is just `OutputEnumerator`. Add a convenience method for Rack response generation.
* Rebirth as zip_kit
* Adopt MIT license. The changes from 5.x get grandfathered in. The base for the fork is the 4.x version which was still MIT-licensed.
* Bump minimum Ruby version to 2.6
34 changes: 9 additions & 25 deletions README.md
@@ -118,17 +118,15 @@ since you do not know how large the compressed data segments are going to be.

## Send a ZIP from a Rack response

-zip_kit provides a `RackBody` object which will yield the binary chunks piece
-by piece, and apply some amount of buffering as well. Make sure to also wrap your `RackBody` in a chunker
+zip_kit provides an `OutputEnumerator` object which will yield the binary chunks piece
+by piece, and apply some amount of buffering as well. Make sure to also wrap your `OutputEnumerator` in a chunker
by calling `#to_chunked` on it. Return it to your webserver and you will have your ZIP streamed!
-The block that you give to the `RackBody` receive the {ZipKit::Streamer} object and will only
+The block that you give to the `OutputEnumerator` receives the {ZipKit::Streamer} object and will only
start executing once your response body starts getting iterated over - when actually sending
the response to the client (unless you are using a buffering Rack webserver, such as Webrick).

```ruby
-require 'time'

-body = ZipKit::RackBody.new do | zip |
+body = ZipKit::OutputEnumerator.new do | zip |
zip.write_file('mov.mp4') do |sink|
File.open('mov.mp4', 'rb'){|source| IO.copy_stream(source, sink) }
end
@@ -137,15 +135,8 @@ body = ZipKit::RackBody.new do | zip |
end
end

-headers = {
-  "Last-Modified" => Time.now.httpdate, # disables Rack::ETag
-  "Content-Type" => "application/zip",
-  "Content-Encoding" => "identity", # disables Rack::Deflater
-  "Transfer-Encoding" => "chunked",
-  "X-Accel-Buffering" => "no" # disables buffering in nginx/GCP
-}
-
-[200, headers, body.to_chunked]
+headers, streaming_body = body.to_headers_and_rack_response_body(env)
+[200, headers, streaming_body]
```

## Send a ZIP file of known size, with correct headers
@@ -160,22 +151,15 @@ bytesize = ZipKit::SizeEstimator.estimate do |z|
end

# Prepare the response body. The block will only be called when the response starts to be written.
-zip_body = ZipKit::RackBody.new do | zip |
+zip_body = ZipKit::OutputEnumerator.new do | zip |
zip.add_stored_entry(filename: "myfile1.bin", size: 9090821, crc32: 12485)
zip << read_file('myfile1.bin')
zip.add_stored_entry(filename: "myfile2.bin", size: 458678, crc32: 89568)
zip << read_file('myfile2.bin')
end

-headers = {
-  "Last-Modified" => Time.now.httpdate, # disables Rack::ETag
-  "Content-Type" => "application/zip",
-  "Content-Encoding" => "identity", # disables Rack::Deflater
-  "Content-Length" => bytesize.to_s,
-  "X-Accel-Buffering" => "no" # disables buffering in nginx/GCP
-}
-
-[200, headers, zip_body]
+headers, streaming_body = zip_body.to_headers_and_rack_response_body(env, content_length: bytesize)
+[200, headers, streaming_body]
```

## Writing ZIP files using the Streamer bypass
2 changes: 1 addition & 1 deletion examples/rack_application.rb
@@ -36,7 +36,7 @@ def call(env)

# Create a suitable Rack response body, that will support each(),
# close() and all the other methods. We can then return it up the stack.
-zip_response_body = ZipKit::Streamer.output_enum do |zip|
+zip_response_body = ZipKit::OutputEnumerator.new do |zip|
# We are adding only one file to the ZIP here, but you could do that
# with an arbitrary number of files of course.
zip.add_stored_entry(filename: filename, size: f.size, crc32: crc32)
4 changes: 3 additions & 1 deletion lib/zip_kit.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module ZipKit
-autoload :RackBody, File.dirname(__FILE__) + "/zip_kit/rack_body.rb"
+autoload :OutputEnumerator, File.dirname(__FILE__) + "/zip_kit/output_enumerator.rb"
autoload :RailsStreaming, File.dirname(__FILE__) + "/zip_kit/rails_streaming.rb"
autoload :ZipWriter, File.dirname(__FILE__) + "/zip_kit/zip_writer.rb"
autoload :RemoteIO, File.dirname(__FILE__) + "/zip_kit/remote_io.rb"
@@ -19,4 +19,6 @@ module ZipKit
autoload :BlockWrite, File.dirname(__FILE__) + "/zip_kit/block_write.rb"
autoload :WriteBuffer, File.dirname(__FILE__) + "/zip_kit/write_buffer.rb"
autoload :WriteShovel, File.dirname(__FILE__) + "/zip_kit/write_shovel.rb"
+autoload :RackChunkedBody, File.dirname(__FILE__) + "/zip_kit/rack_chunked_body.rb"
+autoload :RackTempfileBody, File.dirname(__FILE__) + "/zip_kit/rack_tempfile_body.rb"
end
103 changes: 102 additions & 1 deletion lib/zip_kit/output_enumerator.rb
@@ -1,15 +1,71 @@
# frozen_string_literal: true

require "time" # for .httpdate

# The output enumerator makes it possible to "pull" from a ZipKit streamer
# object instead of having it "push" writes to you. It will "stash" the block which
# writes the ZIP archive through the streamer, and when you call `each` on the Enumerator
# it will yield you the bytes the block writes. Since it is an enumerator you can
# use `next` to take chunks written by the ZipKit streamer one by one. It can be very
# convenient when you need to segment your ZIP output into bigger chunks for, say,
# uploading them to a cloud storage provider such as S3.
#
# Another use of the `OutputEnumerator` is as a Rack response body - since a Rack
# response body object must support `#each`, yielding successive binary strings,
# which is exactly what `OutputEnumerator` does.
#
# The enumerator also provides some conveniences for HTTP output - correct streaming
# headers and a body with chunked transfer encoding.
#
# iterable_zip_body = ZipKit::OutputEnumerator.new do | streamer |
# streamer.write_file('big.csv') do |sink|
# CSV(sink) do |csv_writer|
# csv_writer << Person.column_names
# Person.all.find_each do |person|
# csv_writer << person.attributes.values
# end
# end
# end
# end
#
# You can then return it either as a `Transfer-Encoding: chunked` response (if your
# webserver supports it), which will give you true streaming capability:
#
# headers, chunked_or_presized_rack_body = iterable_zip_body.to_headers_and_rack_response_body(env)
# [200, headers, chunked_or_presized_rack_body]
#
# or, if chunked encoding cannot be used, the call will wrap your output in a `RackTempfileBody` which buffers the ZIP before output. Buffering has
# benefits if your webserver does not support anything beyond HTTP/1.0, and it also engages automatically
# in unit tests (since rack-test and Rails tests do not do streaming HTTP/1.1).
class ZipKit::OutputEnumerator
DEFAULT_WRITE_BUFFER_SIZE = 64 * 1024

# Creates a new OutputEnumerator. The enumerator can be read from using `each`,
# and the creation of the ZIP is in lockstep with the caller calling `each` on the returned
# output enumerator object. This can be used when the calling program wants to stream the
# output of the ZIP archive and throttle that output, or split it into chunks, or use it
# as a generator.
#
# For example:
#
# # The block given to {output_enum} won't be executed immediately - rather it
# # will only start to execute when the caller starts to read from the output
# # by calling `each`
# body = ::ZipKit::OutputEnumerator.new(writer: CustomWriter) do |streamer|
# streamer.add_stored_entry(filename: 'large.tif', size: 1289894, crc32: 198210)
# streamer << large_file.read(1024*1024) until large_file.eof?
# ...
# end
#
# body.each do |bin_string|
# # Send the output somewhere, buffer it in a file etc.
# # The block passed into `initialize` will only start executing once `#each`
# # is called
# ...
# end
#
-# @param kwargs_for_new [Hash] keyword arguments for {Streamer.new}
-# @return [ZipKit::OutputEnumerator] the enumerator you can read bytestrings of the ZIP from by calling `each`
#
# @param streamer_options[Hash] options for Streamer, see {ZipKit::Streamer.new}
# @param write_buffer_size[Integer] By default all ZipKit writes are unbuffered. For output to sockets
@@ -46,4 +102,49 @@ def each
enum_for(:each)
end
end
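The pull semantics described above can be sketched with the stdlib `Enumerator` alone - the generator block is stashed at construction time and does not run until the first chunk is requested. The names and byte strings below are purely illustrative, not part of zip_kit:

```ruby
writes = []

# The generator block is stashed, not executed, when the Enumerator is built
enum = Enumerator.new do |yielder|
  writes << :started
  yielder << "PK\x03\x04"     # a ZIP local file header starts with this signature
  yielder << "...more bytes..."
end

writes  #=> [] - nothing has executed yet

first_chunk = enum.next       # pulls one chunk; only now does the block start running
writes  #=> [:started]
```

This is why the response body only begins generating the ZIP once the webserver starts iterating it to send bytes to the client.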

# Returns a tuple of `headers, body` - headers are a `Hash` and the body is
# an object that can be used as a Rack response body. The method will automatically
# switch the wrapping of the output depending on whether the response can be pre-sized,
# and whether your downstream webserver (like nginx) is configured to support
# the HTTP/1.1 protocol version.
#
# @param rack_env[Hash] the Rack env, which the method may need to mutate (adding a Tempfile for cleanup)
# @param content_length[Integer] the number of bytes that the archive will contain. If given, no chunked encoding gets applied.
# @return [Array]
def to_headers_and_rack_response_body(rack_env, content_length: nil)
headers = {
# We need to ensure Rack::ETag does not suddenly start buffering us, see
# https://github.com/rack/rack/issues/1619#issuecomment-606315714
# Set this even when not streaming, for consistency. If a weak ETag got generated it
# would mean that the middleware had buffered our body, so we have tests for that.
"Last-Modified" => Time.now.httpdate,
# Make sure Rack::Deflater does not touch our response body either, see
# https://github.com/felixbuenemann/xlsxtream/issues/14#issuecomment-529569548
"Content-Encoding" => "identity",
# Disable buffering for both nginx and Google Load Balancer, see
# https://cloud.google.com/appengine/docs/flexible/how-requests-are-handled?tab=python#x-accel-buffering
"X-Accel-Buffering" => "no"
}

if content_length
# If we know the size of the body, transfer encoding is not required at all - so the enumerator itself
# can function as the Rack body. This also would apply in HTTP/2 contexts where chunked encoding would
# no longer be required - then the enumerator could get returned "bare".
body = self
headers["Content-Length"] = content_length.to_i.to_s
elsif rack_env["HTTP_VERSION"] == "HTTP/1.0"
# Check for the HTTP protocol version first. An HTTP/1.0 proxy in front of the application is a common
# misconfiguration which destroys streaming: since HTTP/1.0 does not support chunked responses we need
# to revert to buffering. The issue is that
# this reversion happens silently and it is usually not clear at all why streaming does not work. So let's at
# the very least print it to the Rails log.
body = ZipKit::RackTempfileBody.new(rack_env, self)
headers["Content-Length"] = body.size.to_s
else
body = ZipKit::RackChunkedBody.new(self)
headers["Transfer-Encoding"] = "chunked"
end

[headers, body]
end
end
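The branching above can be condensed into a small decision function. This is a sketch with simplified names (the symbols stand in for the real wrapper objects; `pick_body_strategy` is not part of the ZipKit API):

```ruby
# Mirrors the three branches of to_headers_and_rack_response_body:
# a known size wins, then the protocol version decides between buffering and chunking.
def pick_body_strategy(rack_env, content_length: nil)
  if content_length
    :bare_enumerator    # presized body: Content-Length set, no transfer encoding needed
  elsif rack_env["HTTP_VERSION"] == "HTTP/1.0"
    :tempfile_buffered  # HTTP/1.0 cannot do chunked responses, buffer to a tempfile
  else
    :chunked_body       # stream with Transfer-Encoding: chunked
  end
end

pick_body_strategy({}, content_length: 9090821)     #=> :bare_enumerator
pick_body_strategy({"HTTP_VERSION" => "HTTP/1.0"})  #=> :tempfile_buffered
pick_body_strategy({"HTTP_VERSION" => "HTTP/1.1"})  #=> :chunked_body
```

Note that the presized branch also covers HTTP/2, where chunked encoding no longer exists and a bare enumerator body suffices.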
147 changes: 0 additions & 147 deletions lib/zip_kit/rack_body.rb

This file was deleted.

30 changes: 30 additions & 0 deletions lib/zip_kit/rack_chunked_body.rb
@@ -0,0 +1,30 @@
# A body wrapper that emits chunked responses, creating a valid
# Transfer-Encoding: chunked HTTP response body. This is copied from Rack::Chunked::Body,
# because Rack is not going to include that class after version 3.x.
# Rails has a substitute class for this inside ActionController::Streaming,
# but that module is a private constant in the Rails codebase, and is thus
# considered "private" from the Rails standpoint. It is not that much code to
# carry, so we copy it into our code.
class ZipKit::RackChunkedBody
TERM = "\r\n"
TAIL = "0#{TERM}"

# @param body[#each] the enumerable that yields bytes, usually an `OutputEnumerator`
def initialize(body)
@body = body
end

# For each string yielded by the response body, yield
# the element in chunked encoding - and finish off with a terminator
def each
term = TERM
@body.each do |chunk|
size = chunk.bytesize
next if size == 0

yield [size.to_s(16), term, chunk.b, term].join
end
yield TAIL
yield term
end
end
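The wire format this produces can be exercised in isolation. Below is a standalone re-implementation of the same framing under a hypothetical `ChunkedFramer` name (the real class lives under the `ZipKit` namespace):

```ruby
# Each chunk is framed as: byte size in hex, CRLF, the raw bytes, CRLF.
# The stream ends with a zero-size chunk ("0\r\n") followed by a blank line.
class ChunkedFramer
  TERM = "\r\n"
  TAIL = "0#{TERM}"

  def initialize(body)
    @body = body
  end

  def each
    @body.each do |chunk|
      size = chunk.bytesize
      next if size == 0  # a zero-size chunk would terminate the stream early

      yield [size.to_s(16), TERM, chunk.b, TERM].join
    end
    yield TAIL
    yield TERM
  end
end

frames = []
ChunkedFramer.new(["hello", "", "wonderful!"]).each { |f| frames << f }
frames #=> ["5\r\nhello\r\n", "a\r\nwonderful!\r\n", "0\r\n", "\r\n"]
```

Note how the empty string is skipped rather than framed - emitting it as-is would produce the `0\r\n` terminator mid-stream and cut the response short.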
