Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement io.StringWriter #32

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

achille-roussel
Copy link

Hello!

This PR adds a WriteString method to the internal digest type to implement the io.StringWriter interface and avoid memory copies when hashing strings.

I added a benchmark and ran it before and after the change to demonstrate that a heap allocation is removed by providing this method:

benchmark                   old ns/op     new ns/op     delta
BenchmarkWriteString/1      46.0          30.8          -33.04%
BenchmarkWriteString/2      46.1          31.8          -31.02%
BenchmarkWriteString/4      43.5          28.4          -34.71%
BenchmarkWriteString/8      42.1          28.9          -31.35%
BenchmarkWriteString/16     48.3          31.3          -35.20%
BenchmarkWriteString/32     54.2          34.9          -35.61%

benchmark                   old MB/s     new MB/s     speedup
BenchmarkWriteString/1      21.75        32.46        1.49x
BenchmarkWriteString/2      43.43        62.89        1.45x
BenchmarkWriteString/4      92.01        140.65       1.53x
BenchmarkWriteString/8      190.10       277.23       1.46x
BenchmarkWriteString/16     331.10       510.92       1.54x
BenchmarkWriteString/32     590.65       916.96       1.55x

benchmark                   old allocs     new allocs     delta
BenchmarkWriteString/1      1              0              -100.00%
BenchmarkWriteString/2      1              0              -100.00%
BenchmarkWriteString/4      1              0              -100.00%
BenchmarkWriteString/8      1              0              -100.00%
BenchmarkWriteString/16     1              0              -100.00%
BenchmarkWriteString/32     1              0              -100.00%

benchmark                   old bytes     new bytes     delta
BenchmarkWriteString/1      8             0             -100.00%
BenchmarkWriteString/2      8             0             -100.00%
BenchmarkWriteString/4      8             0             -100.00%
BenchmarkWriteString/8      8             0             -100.00%
BenchmarkWriteString/16     16            0             -100.00%
BenchmarkWriteString/32     32            0             -100.00%

The use of the unsafe package is a bit unfortunate, but it seemed like the package already depended on it, so this change doesn't introduce a new dependency on unsafe.

Let me know if you'd like to see anything changed!

murmur.go Outdated
// us take over the default behavior of the compiler to have the byte slice
// share the underlying memory buffer of the string and avoid the extra heap
// allocation.
return d.Write(*(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversion is unsafe (see golang/go#40701)

Copy link
Author

@achille-roussel achille-roussel Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this case is correct because it is covered by the rules 1, 3, and 6 of unsafe:

(1) Conversion of a *T1 to Pointer to *T2.

(3) Conversion of a Pointer to a uintptr and back, with arithmetic.

(6) Conversion of a reflect.SliceHeader or reflect.StringHeader Data field to or from Pointer.

https://golang.org/pkg/unsafe/#Pointer

Let me know if I'm missing something.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard for me to actually tell and I wish this type of conversion were documented as explicitly safe or unsafe. I took this as violating rule 6 because the slice doesn't refer to an actual underlying slice, the slice is being made up out of thin air from a reflect.SliceHeader. What's more obviously safe to me is when the slice header is referring to an underlying slice beforehand.

It is entirely possible that this is safe, I just haven't trusted it since the six unsafe rules came out (this was my original go to for how to convert strings to slices).

As an aside, I think the 40701 I mentioned in the first ticket refers to other behavior that this is definitely not doing: taking in a byte pointer and making a slice out of that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/jlauinger/go-safer

I believe this is effectively patterns 1 and 2.

Copy link
Author

@achille-roussel achille-roussel Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is pattern 1 and 2, because the conversion to and from unsafe is done in a single expression.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looks like the Go doc may have been updated to explicitly indicate that this conversion may not be safe:

In general, reflect.SliceHeader and reflect.StringHeader should be used only as *reflect.SliceHeader and *reflect.StringHeader pointing at actual slices or strings, never as plain structs. A program should not declare or allocate variables of these struct types.

Here's an alternative approach we could try:
https://github.com/cespare/xxhash/blob/master/xxhash_unsafe.go#L52-L57

// sliceHeader is similar to reflect.SliceHeader, but it assumes that the layout
// of the first two words is the same as the layout of a string.
type sliceHeader struct {
	s   string
	cap int
}

What do you think?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like twmb/murmur3@610077f is safer, and afaict it's just as fast.

@achille-roussel
Copy link
Author

Hello @twmb, would you like to get this PR merged or would you prefer not to bring it in?

@twmb
Copy link

twmb commented Jan 22, 2021

I'm not a gate on this repo, I just saw the PR and decided to look since I fixed roughly the same thing in my own fork from way back.

@achille-roussel
Copy link
Author

We're maintaining a fork as well which includes this optimization https://github.com/segmentio/murmur3 (in case it is useful to someone).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants