Unable to set a compression input/output decorator to a SmileFactory #153

Closed · guidomedina opened this issue Nov 27, 2018 · 11 comments

@guidomedina

guidomedina commented Nov 27, 2018

I have a special need for the riak-java-client, which only allows me to use an ObjectMapper to serialize/deserialize key-values. I would like to decorate a SmileFactory with compressors like LZ4, Snappy or GZip, but at the moment this is not possible. When I try a mapper like the following:

public static final Charset UTF8 = Charset.forName("UTF-8");

public static final ObjectMapper GZIP_JSON_MAPPER = new ObjectMapper(new SmileFactory().disable(ENCODE_BINARY_AS_7BIT)
  .setInputDecorator(new InputDecorator()
  {
    @Override
    public InputStream decorate(IOContext context, InputStream inputStream) throws IOException
    {
      return new GZIPInputStream(inputStream);
    }

    @Override
    public InputStream decorate(IOContext context, byte[] bytes, int offset, int length) throws IOException
    {
      return new GZIPInputStream(new ByteArrayInputStream(bytes, offset, length));
    }

    @Override
    public Reader decorate(IOContext context, Reader reader) throws IOException
    {
      return new InputStreamReader(new GZIPInputStream(new ReaderInputStream(reader)), UTF8);
    }
  })
  .setOutputDecorator(new OutputDecorator()
  {
    @Override
    public OutputStream decorate(IOContext context, OutputStream outputStream) throws IOException
    {
      return new GZIPOutputStream(outputStream);
    }

    @Override
    public Writer decorate(IOContext context, Writer writer) throws IOException
    {
      return new OutputStreamWriter(new GZIPOutputStream(new WriterOutputStream(writer, UTF8)));
    }
  }))
  .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
  .disable(SerializationFeature.FAIL_ON_EMPTY_BEANS)
  .setSerializationInclusion(JsonInclude.Include.NON_NULL)
  .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);

This is the exception I get:

Exception in thread "main" java.util.zip.ZipException: Not in GZIP format
	at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
	at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
	at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
	at ...JsonUtils$4.decorate(JsonUtils.java:162)
	at com.fasterxml.jackson.core.JsonFactory._decorate(JsonFactory.java:1459)
	at com.fasterxml.jackson.dataformat.smile.SmileFactory.createParser(SmileFactory.java:330)
	at com.fasterxml.jackson.dataformat.smile.SmileFactory.createParser(SmileFactory.java:320)
	at com.fasterxml.jackson.dataformat.smile.SmileFactory.createParser(SmileFactory.java:29)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3091)

I used Gzip as an example; in reality I'm using both LZ4 and Gzip, and both throw exceptions when I try them with a SmileFactory, while this works perfectly with a JsonFactory. The reason I prefer a SmileFactory over a JsonFactory is that it is noticeably faster, so it basically helps compensate for the price I pay for compression.

@cowtowncoder
Member

Thank you for reporting this. I hope to look into this soon -- decoration may not be properly tested for all codec factories, but obviously should work.

@cowtowncoder
Member

Ah-ha. Looks like there's "double decoration" for particular parser factory method(s). I'll try to create a good regression test here.

@cowtowncoder
Member

@guidomedina I found the problem, and it affects the createParser() methods that take byte[]; I can fix it for 2.9.8. In the meantime you may want to explicitly construct a ByteArrayInputStream to work around the problem (that factory method does not "double-decorate" as far as I can see).
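
For illustration, a minimal sketch of that calling-side workaround (the helper name and value type are placeholders, not something from this thread); it routes reads through the InputStream-based readValue() so the affected byte[]-taking createParser() overloads are never hit:

  // Hypothetical helper, not library code; standard java.io / Jackson imports elided.
  static <T> T readCompressed(ObjectMapper mapper, byte[] data, Class<T> type) throws IOException
  {
    // Wrapping the bytes ourselves means the factory only decorates the input once.
    return mapper.readValue(new ByteArrayInputStream(data), type);
  }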

@cowtowncoder changed the title from "[smile] unable to set a compression input/output decorator to a SmileFactory" to "Unable to set a compression input/output decorator to a SmileFactory" on Nov 29, 2018
@guidomedina
Author

Hi @cowtowncoder, thanks for the fix. I'll wait for 2.9.8 because I also added on-the-fly converters from one or more mappers to another for our data, so the migration happens seamlessly.

We basically use a mapper chosen by content-type to read, and then write with a target mapper and content-type. For example, here is a list of a few content types we are using (a minimal sketch of such a registry follows the lists below):

  • application/json; charset=UTF-8
  • application/smile; charset=UTF-8
  • application/compressed-smile; charset=UTF-8 - smile with String value check
  • application/lz4-json; charset=UTF-8
  • application/gzip-json; charset=UTF-8

With this fix we will now be able to add:

  • application/lz4-smile; charset=UTF-8
  • application/gzip-smile; charset=UTF-8
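
A minimal sketch of how such a content-type keyed registry and on-the-fly converter could look (the mapper constants here are illustrative assumptions built the same way as the GZIP example above, not our actual code):

  // Hypothetical registry: content type -> ObjectMapper; java.util / Jackson imports elided.
  static final Map<String, ObjectMapper> MAPPERS = new HashMap<>();
  static
  {
    MAPPERS.put("application/json; charset=UTF-8", JSON_MAPPER);
    MAPPERS.put("application/smile; charset=UTF-8", SMILE_MAPPER);
    MAPPERS.put("application/lz4-smile; charset=UTF-8", LZ4_SMILE_MAPPER);
  }

  // Re-read a stored value with the mapper for its current content type and
  // re-write it with the mapper for the target content type.
  static byte[] convert(byte[] data, String fromType, String toType, Class<?> valueType) throws IOException
  {
    Object value = MAPPERS.get(fromType).readValue(data, valueType);
    return MAPPERS.get(toType).writeValueAsBytes(value);
  }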

I know this has nothing to do with the issue; I'm just trying to give you another scenario of how your awesome APIs are put to use. Keep up the great work you do ;-)

@guidomedina
Author

guidomedina commented Nov 30, 2018

@cowtowncoder do you have any ETA for releasing 2.9.8? I have to do some big data migration, and I was hoping I could do it during the next weekend; I'm postponing it in order to get the SmileFactory advantage. Again, many thanks for your support in resolving this one.

Or maybe in the meantime I could use the workaround, but I didn't understand from your previous comment how to do it exactly.

@cowtowncoder
Member

Hi there! I am quite close to having 2.9.8 out, but I am waiting for some other work (related to Java 9+ forward compatibility), so next week may be a stretch from my perspective. Most likely mid-December, before the Christmas break.

Exciting to hear about the usage: I am glad you found this feature useful -- it's one of those things that is a bit under-utilized, which is also why that bug was there (not enough usage to weed it out). The funny part is that the 3.0 branch (master) already had a fix, due to sharing more code across stream factories.

As to the workaround, I guess I now realize that if you do not control which createParser / createGenerator call the 3rd party uses, you can't apply it. You could possibly sub-class SmileFactory and override the method, but... that may be tricky, and it is not very maintainable.
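
A rough, untested sketch of that sub-classing idea (it assumes the byte[]-taking overloads are overridable and keep the covariant SmileParser return type, as the 2.9 code currently does):

  // Hypothetical subclass: force the byte[] overloads through the
  // InputStream-based createParser(), which decorates the input only once.
  // Standard java.io and Smile imports elided.
  public class SingleDecorationSmileFactory extends SmileFactory
  {
    @Override
    public SmileParser createParser(byte[] data) throws IOException
    {
      return createParser(new ByteArrayInputStream(data));
    }

    @Override
    public SmileParser createParser(byte[] data, int offset, int len) throws IOException
    {
      return createParser(new ByteArrayInputStream(data, offset, len));
    }
  }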

One other thing: depending on how you release/deploy, you could perhaps do a local build of jackson-dataformat-smile. But that also may not be an option.

@guidomedina
Author

guidomedina commented Nov 30, 2018

These are the results with my newly built 2.9.7.5 from the 2.9 branch; see how close LZ4 Smile is to Smile time-wise while still being faster than plain Json. That's what I was talking about:

Format             Size      Time
Json               5509kb    57.203ms
Smile              2120kb    42.102ms
Compressed smile   1862kb    39.552ms
LZ4 Json            179kb    64.011ms
Gzip Json           431kb   110.644ms
LZ4 Smile           384kb    46.552ms
Gzip Smile          439kb    84.553ms

@howardem

howardem commented Aug 3, 2020

Hi Guido,

I hope you are doing well through these COVID-19 times we are living in. We have custom Spring Cloud Stream message converters to handle content negotiation based on the headers of the incoming RabbitMQ messages. LZ4 is one of the compressors we want to use by decorating the ObjectMapper (the same thing you did), but we couldn't get close to the numbers you posted. We tried 2 different implementations of the algorithm:

  • org.lz4:lz4-java:1.7.1 (compression is better with this library using LZ4 HC)
  • org.apache.commons:commons-compress:1.20

We are using Jackson v2.11.0.

If you don't mind could you share which LZ4 implementation you used?

Best regards,

@guidomedina
Author

guidomedina commented Aug 4, 2020

I'm using LZ4 double compression with a 32KB block size. Also, because the compressed output is binary anyway and Smile is faster than standard Json, whenever I use binary compression I just use Smile straight away. I'm using:

  • org.lz4:lz4-java:1.7.1
  • com.fasterxml.jackson.dataformat:jackson-dataformat-smile 2.11.1 (or 2.11.2 released yesterday)

Here is the code for the LZ4 Json mapper and the LZ4 Smile mapper:

import static com.fasterxml.jackson.dataformat.smile.SmileGenerator.Feature.ENCODE_BINARY_AS_7BIT;
import static java.nio.charset.StandardCharsets.UTF_8;
// there are more imports but I'm sure you will be able to figure them out

...
...

  public static final ParameterNamesModule PARAMETER_NAMES_MODULE = new ParameterNamesModule();
  public static final JavaTimeModule JAVA_TIME_MODULE = new JavaTimeModule();
  public static final Jdk8Module JDK_8_MODULE = new Jdk8Module();

  // Just to configure some modules
  public static ObjectMapper configureDefaultObjectMapper(ObjectMapper objectMapper)
  {
    return objectMapper
      .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
      .disable(SerializationFeature.FAIL_ON_EMPTY_BEANS)
      .setSerializationInclusion(JsonInclude.Include.NON_NULL)
      .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)
      .disable(DeserializationFeature.ADJUST_DATES_TO_CONTEXT_TIME_ZONE)
      .registerModule(PARAMETER_NAMES_MODULE)
      .registerModule(JAVA_TIME_MODULE)
      .registerModule(JDK_8_MODULE);
  }

  public static final int LZ4_BLOCK_SIZE = 32 * 1024;

  // NOTE: configureDefaultObjectMapperForRiak is another helper (not shown here),
  // analogous to configureDefaultObjectMapper above
  public static final ObjectMapper LZ4_JSON_MAPPER = configureDefaultObjectMapperForRiak(JsonMapper.builder(new JsonFactoryBuilder()
    .inputDecorator(new InputDecorator()
    {
      @Override
      public InputStream decorate(IOContext context, InputStream inputStream)
      {
        return new LZ4BlockInputStream(new LZ4BlockInputStream(inputStream));
      }

      @Override
      public InputStream decorate(IOContext context, byte[] bytes, int offset, int length)
      {
        return new LZ4BlockInputStream(new LZ4BlockInputStream(new ByteArrayInputStream(bytes, offset, length)));
      }

      @Override
      public Reader decorate(IOContext context, Reader reader)
      {
        return new InputStreamReader(new LZ4BlockInputStream(new LZ4BlockInputStream(new ReaderInputStream(reader, UTF_8))), UTF_8);
      }
    })
    .outputDecorator(new OutputDecorator()
    {
      @Override
      public OutputStream decorate(IOContext context, OutputStream outputStream)
      {
        return new LZ4BlockOutputStream(new LZ4BlockOutputStream(outputStream,
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor()),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor());
      }

      @Override
      public Writer decorate(IOContext context, Writer writer)
      {
        return new OutputStreamWriter(new LZ4BlockOutputStream(new LZ4BlockOutputStream(new WriterOutputStream(writer, UTF_8),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor()),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor())
        );
      }
    }).build()).build());

  public static final ObjectMapper LZ4_SMILE_MAPPER = configureDefaultObjectMapper(JsonMapper.builder(SmileFactory.builder()
    .disable(ENCODE_BINARY_AS_7BIT)
    .inputDecorator(new InputDecorator()
    {
      @Override
      public InputStream decorate(IOContext context, InputStream inputStream)
      {
        return new LZ4BlockInputStream(new LZ4BlockInputStream(inputStream));
      }

      @Override
      public InputStream decorate(IOContext context, byte[] bytes, int offset, int length)
      {
        return new LZ4BlockInputStream(new LZ4BlockInputStream(new ByteArrayInputStream(bytes, offset, length)));
      }

      @Override
      public Reader decorate(IOContext context, Reader reader)
      {
        return new InputStreamReader(new LZ4BlockInputStream(new LZ4BlockInputStream(new ReaderInputStream(reader, UTF_8))), UTF_8);
      }
    })
    .outputDecorator(new OutputDecorator()
    {
      @Override
      public OutputStream decorate(IOContext context, OutputStream outputStream)
      {
        return new LZ4BlockOutputStream(new LZ4BlockOutputStream(outputStream,
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor()),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor());
      }

      @Override
      public Writer decorate(IOContext context, Writer writer)
      {
        return new OutputStreamWriter(new LZ4BlockOutputStream(new LZ4BlockOutputStream(new WriterOutputStream(writer, UTF_8),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor()),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor())
        );
      }
    }).build()).build());
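
A small round-trip sketch of how these mappers are then used ("MyValue" is just a placeholder type, not part of the snippet above):

  // Hypothetical usage: Smile-encode then double LZ4-compress on write,
  // and reverse the process on read.
  public static MyValue roundTrip(MyValue value) throws IOException
  {
    byte[] compressed = LZ4_SMILE_MAPPER.writeValueAsBytes(value);
    return LZ4_SMILE_MAPPER.readValue(compressed, MyValue.class);
  }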

@guidomedina
Author

The results will vary; I tried with a Json document containing many repetitions (an array of objects), which Smile already compresses very well before the LZ4 or Gzip compression kicks in. Also, for Gzip I just used the standard JDK implementation; I tend to avoid the Apache Commons compressors where possible, as I have had bad experiences with them.

@howardem

howardem commented Aug 4, 2020

Thanks so much for posting the code and for your detailed explanation. I'm going to try your configurations and see how it goes!!
