Get rid of RackBody
It was there for API compatibility and it is just a subset of OutputEnumerator anyway
julik committed Mar 1, 2024
1 parent cc7f9d3 commit 69f58bd
Showing 13 changed files with 362 additions and 314 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,5 +1,6 @@
## 6.0

+* Remove `RackBody` because it is just `OutputEnumerator`. Add a convenience method for Rack response generation.
* Rebirth as zip_kit
* Adopt MIT license. The changes from 5.x get grandfathered in. The base for the fork is the 4.x version which was still MIT-licensed.
* Bump minimum Ruby version to 2.6
34 changes: 9 additions & 25 deletions README.md
@@ -118,17 +118,15 @@ since you do not know how large the compressed data segments are going to be.

## Send a ZIP from a Rack response

-zip_kit provides a `RackBody` object which will yield the binary chunks piece
-by piece, and apply some amount of buffering as well. Make sure to also wrap your `RackBody` in a chunker
+zip_kit provides an `OutputEnumerator` object which will yield the binary chunks piece
+by piece, and apply some amount of buffering as well. Make sure to also wrap your `OutputEnumerator` in a chunker
by calling `#to_chunked` on it. Return it to your webserver and you will have your ZIP streamed!
-The block that you give to the `RackBody` receive the {ZipKit::Streamer} object and will only
+The block that you give to the `OutputEnumerator` receives the {ZipKit::Streamer} object and will only
start executing once your response body starts getting iterated over - when actually sending
the response to the client (unless you are using a buffering Rack webserver, such as Webrick).

```ruby
-require 'time'

-body = ZipKit::RackBody.new do | zip |
+body = ZipKit::OutputEnumerator.new do | zip |
zip.write_file('mov.mp4') do |sink|
File.open('mov.mp4', 'rb'){|source| IO.copy_stream(source, sink) }
end
@@ -137,15 +135,8 @@ body = ZipKit::RackBody.new do | zip |
end
end

-headers = {
-  "Last-Modified" => Time.now.httpdate, # disables Rack::ETag
-  "Content-Type" => "application/zip",
-  "Content-Encoding" => "identity", # disables Rack::Deflater
-  "Transfer-Encoding" => "chunked",
-  "X-Accel-Buffering" => "no" # disables buffering in nginx/GCP
-}
-
-[200, headers, body.to_chunked]
+headers, streaming_body = body.to_headers_and_rack_response_body(env)
+[200, headers, streaming_body]
```

## Send a ZIP file of known size, with correct headers
@@ -160,22 +151,15 @@ bytesize = ZipKit::SizeEstimator.estimate do |z|
end

# Prepare the response body. The block will only be called when the response starts to be written.
-zip_body = ZipKit::RackBody.new do | zip |
+zip_body = ZipKit::OutputEnumerator.new do | zip |
zip.add_stored_entry(filename: "myfile1.bin", size: 9090821, crc32: 12485)
zip << read_file('myfile1.bin')
zip.add_stored_entry(filename: "myfile2.bin", size: 458678, crc32: 89568)
zip << read_file('myfile2.bin')
end

-headers = {
-  "Last-Modified" => Time.now.httpdate, # disables Rack::ETag
-  "Content-Type" => "application/zip",
-  "Content-Encoding" => "identity", # disables Rack::Deflater
-  "Content-Length" => bytesize.to_s,
-  "X-Accel-Buffering" => "no" # disables buffering in nginx/GCP
-}
-
-[200, headers, zip_body]
+headers, streaming_body = zip_body.to_headers_and_rack_response_body(env, content_length: bytesize)
+[200, headers, streaming_body]
```

## Writing ZIP files using the Streamer bypass
2 changes: 1 addition & 1 deletion examples/rack_application.rb
@@ -36,7 +36,7 @@ def call(env)

# Create a suitable Rack response body, that will support each(),
# close() and all the other methods. We can then return it up the stack.
-zip_response_body = ZipKit::Streamer.output_enum do |zip|
+zip_response_body = ZipKit::OutputEnumerator.new do |zip|
# We are adding only one file to the ZIP here, but you could do that
# with an arbitrary number of files of course.
zip.add_stored_entry(filename: filename, size: f.size, crc32: crc32)
4 changes: 3 additions & 1 deletion lib/zip_kit.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module ZipKit
-autoload :RackBody, File.dirname(__FILE__) + "/zip_kit/rack_body.rb"
+autoload :OutputEnumerator, File.dirname(__FILE__) + "/zip_kit/output_enumerator.rb"
autoload :RailsStreaming, File.dirname(__FILE__) + "/zip_kit/rails_streaming.rb"
autoload :ZipWriter, File.dirname(__FILE__) + "/zip_kit/zip_writer.rb"
autoload :RemoteIO, File.dirname(__FILE__) + "/zip_kit/remote_io.rb"
@@ -19,4 +19,6 @@ module ZipKit
autoload :BlockWrite, File.dirname(__FILE__) + "/zip_kit/block_write.rb"
autoload :WriteBuffer, File.dirname(__FILE__) + "/zip_kit/write_buffer.rb"
autoload :WriteShovel, File.dirname(__FILE__) + "/zip_kit/write_shovel.rb"
+autoload :RackChunkedBody, File.dirname(__FILE__) + "/zip_kit/rack_chunked_body.rb"
+autoload :RackTempfileBody, File.dirname(__FILE__) + "/zip_kit/rack_tempfile_body.rb"
end
103 changes: 102 additions & 1 deletion lib/zip_kit/output_enumerator.rb
@@ -1,15 +1,71 @@
# frozen_string_literal: true

require "time" # for .httpdate

# The output enumerator makes it possible to "pull" from a ZipKit streamer
# object instead of having it "push" writes to you. It will "stash" the block which
# writes the ZIP archive through the streamer, and when you call `each` on the Enumerator
# it will yield you the bytes the block writes. Since it is an enumerator you can
# use `next` to take chunks written by the ZipKit streamer one by one. It can be very
# convenient when you need to segment your ZIP output into bigger chunks for, say,
# uploading them to a cloud storage provider such as S3.
#
# Another use of the `OutputEnumerator` is as a Rack response body - since a Rack
# response body object must support `#each`, yielding successive binary strings,
# which is exactly what `OutputEnumerator` does.
#
# The enumerator also provides some conveniences for HTTP output - correct streaming
# headers and a body with chunked transfer encoding.
#
# iterable_zip_body = ZipKit::OutputEnumerator.new do | streamer |
# streamer.write_file('big.csv') do |sink|
# CSV(sink) do |csv_writer|
# csv_writer << Person.column_names
# Person.all.find_each do |person|
# csv_writer << person.attributes.values
# end
# end
# end
# end
#
# You can then return it either as a `Transfer-Encoding: chunked` response (if your
# webserver supports it), which will give you true streaming capability:
#
# headers, chunked_or_presized_rack_body = iterable_zip_body.to_headers_and_rack_response_body(env)
# [200, headers, chunked_or_presized_rack_body]
#
# or, if chunked encoding cannot be used, the call will wrap your output in a `RackTempfileBody` which buffers the ZIP before output. Buffering has
# benefits if your webserver does not support anything beyond HTTP/1.0, and it also engages automatically
# in unit tests (since rack-test and Rails tests do not do streaming HTTP/1.1).
class ZipKit::OutputEnumerator
DEFAULT_WRITE_BUFFER_SIZE = 64 * 1024

# Creates a new OutputEnumerator. The enumerator can be read from using `each`,
# and the creation of the ZIP is in lockstep with the caller calling `each` on the returned
# output enumerator object. This can be used when the calling program wants to stream the
# output of the ZIP archive and throttle that output, or split it into chunks, or use it
# as a generator.
#
# For example:
#
# # The block given to {output_enum} won't be executed immediately - rather it
# # will only start to execute when the caller starts to read from the output
# # by calling `each`
# body = ::ZipKit::OutputEnumerator.new(writer: CustomWriter) do |streamer|
# streamer.add_stored_entry(filename: 'large.tif', size: 1289894, crc32: 198210)
# streamer << large_file.read(1024*1024) until large_file.eof?
# ...
# end
#
# body.each do |bin_string|
# # Send the output somewhere, buffer it in a file etc.
# # The block passed into `initialize` will only start executing once `#each`
# # is called
# ...
# end
#
-# @param kwargs_for_new [Hash] keyword arguments for {Streamer.new}
-# @return [ZipKit::OutputEnumerator] the enumerator you can read bytestrings of the ZIP from by calling `each`
#
# @param streamer_options[Hash] options for Streamer, see {ZipKit::Streamer.new}
# @param write_buffer_size[Integer] By default all ZipKit writes are unbuffered. For output to sockets
@@ -46,4 +102,49 @@ def each
enum_for(:each)
end
end
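The pull semantics described above can be sketched with the stdlib `Enumerator` alone - the generator block is stashed at construction time and does not run until the first chunk is requested. The names and byte strings below are purely illustrative, not part of zip_kit:

```ruby
writes = []

# The generator block is stashed, not executed, when the Enumerator is built
enum = Enumerator.new do |yielder|
  writes << :started
  yielder << "PK\x03\x04"     # a ZIP local file header starts with this signature
  yielder << "...more bytes..."
end

writes  #=> [] - nothing has executed yet

first_chunk = enum.next       # pulls one chunk; only now does the block start running
writes  #=> [:started]
```

This is why the response body only begins generating the ZIP once the webserver starts iterating it to send bytes to the client.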

# Returns a tuple of `headers, body` - headers are a `Hash` and the body is
# an object that can be used as a Rack response body. The method will automatically
# switch the wrapping of the output depending on whether the response can be pre-sized,
# and whether your downstream webserver (like nginx) is configured to support
# the HTTP/1.1 protocol version.
#
# @param rack_env[Hash] the Rack env, which the method may need to mutate (adding a Tempfile for cleanup)
# @param content_length[Integer] the number of bytes that the archive will contain. If given, no chunked encoding gets applied.
# @return [Array]
def to_headers_and_rack_response_body(rack_env, content_length: nil)
headers = {
# We need to ensure Rack::ETag does not suddenly start buffering us, see
# https://github.com/rack/rack/issues/1619#issuecomment-606315714
# Set this even when not streaming, for consistency. If a weak ETag got generated it
# would mean that the middleware had buffered our body, so we have tests for that.
"Last-Modified" => Time.now.httpdate,
# Make sure Rack::Deflater does not touch our response body either, see
# https://github.com/felixbuenemann/xlsxtream/issues/14#issuecomment-529569548
"Content-Encoding" => "identity",
# Disable buffering for both nginx and Google Load Balancer, see
# https://cloud.google.com/appengine/docs/flexible/how-requests-are-handled?tab=python#x-accel-buffering
"X-Accel-Buffering" => "no"
}

if content_length
# If we know the size of the body, transfer encoding is not required at all - so the enumerator itself
# can function as the Rack body. This also would apply in HTTP/2 contexts where chunked encoding would
# no longer be required - then the enumerator could get returned "bare".
body = self
headers["Content-Length"] = content_length.to_i.to_s
elsif rack_env["HTTP_VERSION"] == "HTTP/1.0"
# Check for the HTTP protocol version first. An HTTP/1.0 proxy in front of the application is a common
# misconfiguration which destroys streaming: since HTTP/1.0 does not support chunked responses we need
# to revert to buffering. The issue is that
# this reversion happens silently and it is usually not clear at all why streaming does not work. So let's at
# the very least print it to the Rails log.
body = ZipKit::RackTempfileBody.new(rack_env, self)
headers["Content-Length"] = body.size.to_s
else
body = ZipKit::RackChunkedBody.new(self)
headers["Transfer-Encoding"] = "chunked"
end

[headers, body]
end
end
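The branching above can be condensed into a small decision function. This is a sketch with simplified names (the symbols stand in for the real wrapper objects; `pick_body_strategy` is not part of the ZipKit API):

```ruby
# Mirrors the three branches of to_headers_and_rack_response_body:
# a known size wins, then the protocol version decides between buffering and chunking.
def pick_body_strategy(rack_env, content_length: nil)
  if content_length
    :bare_enumerator    # presized body: Content-Length set, no transfer encoding needed
  elsif rack_env["HTTP_VERSION"] == "HTTP/1.0"
    :tempfile_buffered  # HTTP/1.0 cannot do chunked responses, buffer to a tempfile
  else
    :chunked_body       # stream with Transfer-Encoding: chunked
  end
end

pick_body_strategy({}, content_length: 9090821)     #=> :bare_enumerator
pick_body_strategy({"HTTP_VERSION" => "HTTP/1.0"})  #=> :tempfile_buffered
pick_body_strategy({"HTTP_VERSION" => "HTTP/1.1"})  #=> :chunked_body
```

Note that the presized branch also covers HTTP/2, where chunked encoding no longer exists and a bare enumerator body suffices.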
147 changes: 0 additions & 147 deletions lib/zip_kit/rack_body.rb

This file was deleted.

30 changes: 30 additions & 0 deletions lib/zip_kit/rack_chunked_body.rb
@@ -0,0 +1,30 @@
# A body wrapper that emits chunked responses, creating a valid
# Transfer-Encoding: chunked HTTP response body. This is copied from Rack::Chunked::Body,
# because Rack is not going to include that class after version 3.x.
# Rails has a substitute class for this inside ActionController::Streaming,
# but that module is a private constant in the Rails codebase, and is thus
# considered "private" from the Rails standpoint. It is not that much code to
# carry, so we copy it into our code.
class ZipKit::RackChunkedBody
TERM = "\r\n"
TAIL = "0#{TERM}"

# @param body[#each] the enumerable that yields bytes, usually an `OutputEnumerator`
def initialize(body)
@body = body
end

# For each string yielded by the response body, yield
# the element in chunked encoding - and finish off with a terminator
def each
term = TERM
@body.each do |chunk|
size = chunk.bytesize
next if size == 0

yield [size.to_s(16), term, chunk.b, term].join
end
yield TAIL
yield term
end
end
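The wire format this produces can be exercised in isolation. Below is a standalone re-implementation of the same framing under a hypothetical `ChunkedFramer` name (the real class lives under the `ZipKit` namespace):

```ruby
# Each chunk is framed as: byte size in hex, CRLF, the raw bytes, CRLF.
# The stream ends with a zero-size chunk ("0\r\n") followed by a blank line.
class ChunkedFramer
  TERM = "\r\n"
  TAIL = "0#{TERM}"

  def initialize(body)
    @body = body
  end

  def each
    @body.each do |chunk|
      size = chunk.bytesize
      next if size == 0  # a zero-size chunk would terminate the stream early

      yield [size.to_s(16), TERM, chunk.b, TERM].join
    end
    yield TAIL
    yield TERM
  end
end

frames = []
ChunkedFramer.new(["hello", "", "wonderful!"]).each { |f| frames << f }
frames #=> ["5\r\nhello\r\n", "a\r\nwonderful!\r\n", "0\r\n", "\r\n"]
```

Note how the empty string is skipped rather than framed - emitting it as-is would produce the `0\r\n` terminator mid-stream and cut the response short.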
