[RFE]: Implement values compression for text and blob types #218
Comments
Column-based blobs compression
General idea
If a client stores big blobs of data, compressing the data that goes into that field will reduce the footprint of select/update operations on both the network and the server.
Interoperability issues
Possible implementation of serialization/deserialization
1. Global variable + hack into
@mykaul, @Lorak-mmk, let's move the discussion here.
@mykaul, this is a continuation of comment
I saw that network compression is done on the whole frame, which makes me think that when both features are working, the server decompresses the frame, extracts the column data and then compresses it back when writing to the sstable. Am I correct about that?
Best case if we leave it to the user to decide (…)
Yes
But it can change from workload to workload. What we should do is be able to determine, at the end of compressing a specific BLOB, whether it passed some threshold or not. If you compressed 1000 bytes to 950 bytes, it's worthless. Don't spend the cycles.
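Roughly, a minimal sketch of such a check (using `compress/zlib` here just for illustration; the algorithm and the `minSavings` knob are assumptions, not part of the proposal):

```go
package example

import (
	"bytes"
	"compress/zlib"
)

// maybeCompress compresses value and keeps the result only if the savings
// exceed minSavings (e.g. 0.1 == at least 10% smaller); otherwise the
// original bytes are returned and the second result is false.
func maybeCompress(value []byte, minSavings float64) ([]byte, bool, error) {
	if len(value) == 0 {
		return value, false, nil
	}
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(value); err != nil {
		return nil, false, err
	}
	if err := w.Close(); err != nil {
		return nil, false, err
	}
	saved := 1 - float64(buf.Len())/float64(len(value))
	if saved < minSavings {
		// e.g. 1000 bytes -> 950 bytes: not worth the extra CPU on reads.
		return value, false, nil
	}
	return buf.Bytes(), true, nil
}
```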
Do we have any benchmarks that show that this additional overhead of recompression is significant enough to warrant such a feature?
There is also another drawback: the risk of data corruption. The solution relies on prepending some prefix to the data, and assumes that this prefix will never occur in real data. Those are quite big drawbacks, and that's why I asked the previous question: the issue does not explain why the problem is severe enough that we are willing to tolerate such a drawback.
I assume this was meant to be number 4
If we decide to implement this, I'd be for option 3, because it has no global variables.
@karol-kokoszka
I believe that modern compression algorithms are robust enough to make such a situation unlikely. Even if the header matches a compressed archive, the subsequent attempt to decompress a blob that is not actually a compressed archive is going to fail. It definitely will for ASCII input. And if a user sends binary blobs as data and wants to prevent a potential collision, they can simply disable such compression.
In use cases that allow such situations, one should either always compress or never compress, which the driver should allow configuring. And look what I found: https://java-driver.docs.scylladb.com/stable/manual/core/compression/ (and the corresponding https://docs.datastax.com/en/developer/java-driver/4.0/manual/core/compression/index.html). It turns out that the Scylla and DataStax Java drivers already allow similar things.
Didn't know about this. I assume it's tribal knowledge scattered around various issues and there is no place I can learn about this?
I think we are talking about different things. Did you see the PR implementing this feature? #221 It uses a prefix (by default …). We could avoid using such mechanisms (prefix, length limit) and always compress, but that would severely limit the usability of such a feature, because you could only use it with a fresh table, not with existing data.
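For illustration, a rough sketch of how a prefix-based scheme works (not taken from #221; the prefix bytes and the use of zlib are placeholders):

```go
package example

import (
	"bytes"
	"compress/zlib"
	"io"
)

// compressedPrefix is a hypothetical marker; the actual default in the PR
// may be a different byte sequence.
var compressedPrefix = []byte("\x00gocql-compressed\x00")

// serialize compresses the value and prepends the marker prefix.
func serialize(value []byte) ([]byte, error) {
	var buf bytes.Buffer
	buf.Write(compressedPrefix)
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(value); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// deserialize decompresses only values carrying the prefix; anything else is
// returned as-is. Raw data that happens to start with the prefix would be
// misinterpreted, which is the corruption risk raised above.
func deserialize(data []byte) ([]byte, error) {
	if !bytes.HasPrefix(data, compressedPrefix) {
		return data, nil
	}
	r, err := zlib.NewReader(bytes.NewReader(data[len(compressedPrefix):]))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}
```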
It's not a similar thing; you linked to docs about CQL protocol compression (which I'm pretty sure gocql already supports).
It is a draft to quickly test the feature out and is no good as a point of reference.
It could be mitigated.
+1
How could it be mitigated?
There are two cases to address:
https://opensource.docs.scylladb.com/stable/troubleshooting/large-rows-large-cells-tables.html
You are right. I missed that part.
We don't have to; see my comment above. We should be able to both avoid adding any custom prefix and be able to identify whether the blob you received is a compressed one or not.
Oh, I assumed that this is the same thing (values compression). My bad.
Thanks!
So you propose to deserialize as follows (pseudocode):
and to serialize all blobs, regardless of length?
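Something along these lines, I assume (a sketch of the "try to decompress, fall back to raw" idea, with zlib standing in for whatever algorithm is chosen):

```go
package example

import (
	"bytes"
	"compress/zlib"
	"io"
)

// deserialize attempts decompression first; if the bytes are not a valid
// compressed stream, they are returned untouched as raw data.
func deserialize(data []byte) ([]byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		// Not a valid zlib header: assume raw, uncompressed data.
		return data, nil
	}
	defer r.Close()
	out, err := io.ReadAll(r)
	if err != nil {
		// Header matched by coincidence, but the payload is not compressed.
		return data, nil
	}
	return out, nil
}
```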
Doesn't it have the same problem as you described in #218 (comment)?
Right. Not a DoS, but incorrect data would be returned, because the code would treat this as raw uncompressed data. It's probably worse :/
If you forge the prefix, header, and underlying structures properly, you can cause excessive memory and CPU consumption, which can lead to a DoS.
You can limit both.
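For example, a minimal sketch of bounding the decompressed size (the `maxDecompressed` limit is a hypothetical knob):

```go
package example

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// decompressLimited caps how far a single value may expand, bounding the
// memory a forged "zip bomb" value could consume.
func decompressLimited(data []byte, maxDecompressed int64) ([]byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	out, err := io.ReadAll(io.LimitReader(r, maxDecompressed+1))
	if err != nil {
		return nil, err
	}
	if int64(len(out)) > maxDecompressed {
		return nil, fmt.Errorf("decompressed value exceeds limit of %d bytes", maxDecompressed)
	}
	return out, nil
}
```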
If the feature requires careful consideration from the library user and some amount of supporting code in the application because of the aforementioned problems, is there a point in implementing it in the driver instead of in application code? The user will need modifications and safeguards around it anyway. Method 3, a custom type, is something that can easily be implemented by the user with a very limited amount of code.
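For reference, a minimal sketch of such a user-side type built on gocql's `Marshaler`/`Unmarshaler` interfaces (the `CompressedBlob` name and the zlib choice are my own, not anything proposed in this issue):

```go
package example

import (
	"bytes"
	"compress/zlib"
	"io"

	"github.com/gocql/gocql"
)

// CompressedBlob transparently compresses on write and decompresses on read
// via gocql's marshalling hooks.
type CompressedBlob []byte

// MarshalCQL compresses the blob before it is sent to the server.
func (b CompressedBlob) MarshalCQL(info gocql.TypeInfo) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(b); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// UnmarshalCQL decompresses the blob read from the server.
func (b *CompressedBlob) UnmarshalCQL(info gocql.TypeInfo, data []byte) error {
	r, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer r.Close()
	out, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	*b = out
	return nil
}
```

A value of this type can then be bound and scanned like any other blob column, e.g. `session.Query("INSERT INTO t (id, payload) VALUES (?, ?)", id, CompressedBlob(raw)).Exec()`.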
We could add an example compressing a BLOB for all our driver examples.
See above.
See above.
Description
Implement an optional compression of text/ascii/blob cells: this way the DB will have to handle smaller buffers internally.
The compression should only be done for values with sizes above a configurable threshold.
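A hypothetical shape for such a setting (the names are illustrative only; nothing like this exists in the driver today):

```go
// ValueCompression holds hypothetical per-session settings for column-value
// compression of text/ascii/blob cells.
type ValueCompression struct {
	Enabled      bool
	MinValueSize int // values smaller than this many bytes are stored as-is
}

// shouldCompress applies the configurable size threshold described above.
func (c ValueCompression) shouldCompress(value []byte) bool {
	return c.Enabled && len(value) >= c.MinValueSize
}
```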