Requirement: New data payload encoding types (general, opaque) #11

Open
krischer opened this issue Jan 3, 2018 · 12 comments

Comments

@krischer
Contributor

krischer commented Jan 3, 2018

New data payload encoding types (general, opaque), for example to support other compression techniques (e.g. 32-bit integers, general compressor; 32-bit IEEE floats, general compressor; 64-bit IEEE floats (doubles), general compressor; opaque data, general compressor).

@chad-earthscope
Member

As I remember it, the original motivation of this suggestion was to adopt a compression scheme that is in broad use outside of seismology and therefore take advantage of available libraries and hopefully modern compression advancements.

The ubiquitously used Steim 1 and 2 encodings strike a very good balance between compression performance on seismic data, the common pattern of recording continuous data, and complexity as it relates to the resource requirements for encoding/decoding and programming. But these encodings have some drawbacks, in most cases minor: we (as a community) must write and maintain all the encoders/decoders; they support only 32-bit integer data; the rigid 64-byte framing wastes space when a frame cannot be filled; and Steim 2 cannot encode differences larger than can be represented in 30 bits.
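To make the 30-bit limit concrete, here is a minimal sketch of a pre-check for whether a sample series could be Steim 2 encoded. The function name `fits_steim2` is made up for illustration, and the sketch assumes the differences are stored as signed two's-complement values. It only checks differences within the given series, not the difference to the last sample of a preceding record.

```python
from typing import Sequence

# Steim 2 stores sample-to-sample differences in at most 30 bits.
# Assumption: the 30-bit field is a signed two's-complement value.
STEIM2_MAX_DIFF_BITS = 30
STEIM2_MIN_DIFF = -(1 << (STEIM2_MAX_DIFF_BITS - 1))     # -2**29
STEIM2_MAX_DIFF = (1 << (STEIM2_MAX_DIFF_BITS - 1)) - 1  # 2**29 - 1


def fits_steim2(samples: Sequence[int]) -> bool:
    """Return True if every first difference fits in a signed 30-bit value."""
    return all(
        STEIM2_MIN_DIFF <= b - a <= STEIM2_MAX_DIFF
        for a, b in zip(samples, samples[1:])
    )
```

A series such as `[0, 100, -250]` passes the check, while a single jump of `2**30` counts between adjacent samples does not, even though both samples are valid 32-bit integers.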

In my opinion the most likely scenario for FDSN adoption of a new compression encoding would be identification of one that addresses as many of those drawbacks as possible, while having similar compression performance (on miniSEED sized payloads) and reasonable complexity. Ideally, something well established and supported.
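As a rough illustration of what "32-bit integers, general compressor" could look like, the sketch below packs samples as big-endian 32-bit integers and runs them through zlib from the Python standard library. zlib is only a stand-in here: no compressor has been chosen in this discussion, and the function names and the big-endian byte order are assumptions. The sketch makes no claim about compression performance on miniSEED-sized payloads.

```python
import struct
import zlib
from typing import List


def compress_int32(samples: List[int]) -> bytes:
    """Pack samples as big-endian 32-bit integers and compress with zlib."""
    raw = struct.pack(f">{len(samples)}i", *samples)
    return zlib.compress(raw, 9)


def decompress_int32(payload: bytes) -> List[int]:
    """Inverse of compress_int32: decompress, then unpack 4-byte integers."""
    raw = zlib.decompress(payload)
    count = len(raw) // 4
    return list(struct.unpack(f">{count}i", raw))
```

Unlike Steim, this approach has no fixed framing and no limit on the size of sample-to-sample differences, and the same pattern would apply to float payloads by changing the `struct` type code.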

Also, while it would be convenient to introduce a new compression encoding at the same time as the next generation format, we can do this at any time in the future as long as we retain an encoding identification system such as the one used in blockette 1000 of miniSEED 2.x.

Regarding opaque data, this was included in an early requirement to provide an alternative to blockette 2000. There are cases where inserting an opaque payload into a record is handy to take advantage of a miniSEED data transmission or handling system. As I understand it, that is why blockette 2000 was originally created: after transmission, the miniSEED "wrapper" was discarded. Such use cases will certainly exist, and providing a number for such an encoding is much better than someone choosing their own encoding value to get their own data payload included. I would prefer to document the use of an opaque encoding as something to be used transiently and within contained scenarios, i.e. strongly discouraged for use in a long-term FDSN repository.

@crotwell

crotwell commented Jan 8, 2018

I think we agree on the meaning of this, but just to be sure: the phrase "general compressor" is just a placeholder and will not be associated with an encoding type. It just means we have not yet picked the particular compressor(s) that will be allowed, correct? Otherwise you have the problem of knowing that data is compressed, but not knowing how.

@krischer
Contributor Author

krischer commented Jan 29, 2018

Summary

(Please let me know if I missed a point or misunderstood something)

Let's break this down in a couple of separate issues - please vote on:

  1. Retain data encoding specification system as in miniSEED 2.x. (Yes/No)
  2. Allow for an easy integration of additional data encodings without changes to the core definition. (Yes/No).
  3. Actively investigate alternative encodings. (Urgent/Not Urgent)
  4. Explicitly allow an "opaque" data encoding type. (Yes/No)
  5. Clearly state that any opaque data should not be exported by data centers and should be considered a transient transport mechanism in contained scenarios. (Yes/No)
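Items 1, 2 and 5 can be pictured together as a small code-to-encoding registry. The following is only a sketch under stated assumptions: the numeric codes for the existing encodings follow the miniSEED 2.x blockette 1000 table, but the `Encoding` class, the `register` and `may_export` helpers, the `deprecated` and `opaque` flags, and the opaque code value `100` are all hypothetical and not part of any agreed definition.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Encoding:
    code: int
    name: str
    deprecated: bool = False  # hypothetical flag, not part of miniSEED 2.x
    opaque: bool = False      # hypothetical flag for item 4


# Subset of the miniSEED 2.x encoding codes (item 1: keep the numbers).
REGISTRY: Dict[int, Encoding] = {
    e.code: e
    for e in (
        Encoding(3, "32-bit integer"),
        Encoding(4, "IEEE 32-bit float"),
        Encoding(5, "IEEE 64-bit float"),
        Encoding(10, "Steim 1"),
        Encoding(11, "Steim 2"),
        Encoding(32, "DWWSSN gain ranged", deprecated=True),
    )
}

# Placeholder only: no opaque code has been assigned.
HYPOTHETICAL_OPAQUE_CODE = 100


def register(encoding: Encoding) -> None:
    """Item 2: add an encoding without touching the core definition."""
    if encoding.code in REGISTRY:
        raise ValueError(f"encoding code {encoding.code} is already assigned")
    REGISTRY[encoding.code] = encoding


def may_export(code: int) -> bool:
    """Item 5: unknown and opaque encodings are not exported."""
    encoding = REGISTRY.get(code)
    return encoding is not None and not encoding.opaque


register(Encoding(HYPOTHETICAL_OPAQUE_CODE, "opaque", opaque=True))
```

With this layout, a data center could call `may_export` on each record's encoding code before serving it, so opaque payloads stay inside the contained transport scenario.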

@crotwell

On 1, I assume yes means we keep the general idea of mapping from numbers to encoding types, and keeping the currently defined numbers, but NGF may deprecate unused encodings. Basically keep primitive types and the steims (any others?).

Maybe 5 should be rephrased as "should not be exported by a data center". Network operators can use this to transmit proprietary data from a station into their datacenter, but it would not be part of the public, general-use archive/request system.

1 yes
2 yes
3 not urgent (assuming 2 is yes)
4 yes (with limitations on public use, i.e. 5)
5 yes

@chad-earthscope
Member

  1. Yes, with a number of encodings (e.g. DWWSSN) marked as deprecated.
  2. Yes.
  3. Yes, not urgent.
  4. Yes
  5. Yes

@krischer
Contributor Author

> On 1, I assume yes means we keep the general idea of mapping from numbers to encoding types, and keeping the currently defined numbers, but NGF may deprecate unused encodings. Basically keep primitive types and the steims (any others?).

Yes. We'll also have to expand this a bit, for example to define the byte order for the integer and IEEE float encodings.
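A small sketch of why the byte order needs to be pinned down, using Python's `struct` module. The helper names `pack_samples` and `unpack_samples` are made up for illustration; `">"` and `"<"` are the standard `struct` prefixes for big-endian and little-endian.

```python
import struct
from typing import Sequence, Tuple


def pack_samples(values: Sequence, type_char: str, byte_order: str) -> bytes:
    """Pack values using a struct type code ('i', 'f', 'd') and byte order."""
    return struct.pack(f"{byte_order}{len(values)}{type_char}", *values)


def unpack_samples(payload: bytes, type_char: str, byte_order: str) -> Tuple:
    """Unpack a payload written by pack_samples with the given byte order."""
    size = struct.calcsize(f"{byte_order}{type_char}")
    count = len(payload) // size
    return struct.unpack(f"{byte_order}{count}{type_char}", payload)


big = pack_samples([1], "i", ">")     # bytes 00 00 00 01
little = pack_samples([1], "i", "<")  # bytes 01 00 00 00

# Reading big-endian bytes as little-endian gives a valid but wrong value.
wrong = unpack_samples(big, "i", "<")  # (16777216,)
```

Nothing in the payload itself reveals the mistake, which is why the byte order has to come from the encoding definition or a header field.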

> Maybe 5 should be rephrased as "should not be exported by a data center". Network operators can use this to transmit proprietary data from a station into their datacenter, but it would not be part of the public, general-use archive/request system.

Done. I assume this does not change @chad-iris's vote.

@kaestli

kaestli commented Jan 30, 2018

  1. Yes, retain such a system
  2. Yes (define a procedure to adopt new codes)
  3. Yes (urgency will more likely come from non-FDSN communities)
  4. Yes
  5. Yes

@ozym

ozym commented Jan 30, 2018

  1. Yes.
  2. Yes.
  3. Yes, not urgent, but it would allow taking good advantage of variable-length blocks if they are voted in.
  4. Yes.
  5. Yes.

@claudiodsf

  1. Retain data encoding specification system as in miniSEED 2.x. (Yes/No)

Yes, with obsolete encodings marked as deprecated (as per @chad-iris).

  2. Allow for an easy integration of additional data encodings without changes to the core definition. (Yes/No).

Yes

  3. Actively investigate alternative encodings. (Urgent/Not Urgent)

Not urgent, but consider IEEE formats as soon as possible.

  4. Explicitly allow an "opaque" data encoding type. (Yes/No)

Yes

  5. Clearly state that any opaque data should not be exported by data centers and should be considered a transient transport mechanism in contained scenarios. (Yes/No)

Yes

@ihenson-bsl

  1. Yes
  2. Yes
  3. Yes
  4. Yes
  5. Yes

@ValleeMartin

  1. Yes
  2. Yes
  3. Not urgent
  4. Yes
  5. Yes

@JoseAntonioJara

  1. Yes
  2. Yes
  3. Not urgent
  4. Yes
  5. Yes

9 participants