A bit of statistic of using this package #6

Open
AlexanderMatveev opened this issue Feb 27, 2023 · 1 comment

Comments

AlexanderMatveev commented Feb 27, 2023

I have parsed about 200K MIDI files with this package and want to share some statistics: about 37% of the files (76,972 of 205,392) could not be parsed.

I'm raising this because every file had been pre-checked by detecting its MIME type from the file header.
I can attach examples of files that failed to parse for one reason or another. Manual sampling showed that they are normal files that open and play in music software.
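The issue does not say which tool performed the MIME-type check. As a rough illustration, assuming the check amounts to matching the Standard MIDI File header, here is a minimal Go sketch that validates the `MThd` chunk ID and reads the header fields. `checkSMFHeader` and `smfHeader` are hypothetical names, not part of this package. A check like this only looks at the first 14 bytes, so a file can pass it and still be truncated later in a track chunk.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// smfHeader holds the fields of a Standard MIDI File header chunk.
type smfHeader struct {
	Length   uint32 // declared chunk length; 6 in SMF 1.0
	Format   uint16 // 0, 1 or 2
	NTracks  uint16 // number of MTrk chunks that should follow
	Division uint16 // ticks per quarter note, or SMPTE timing if the top bit is set
}

// checkSMFHeader is a hypothetical pre-check: it inspects only the first
// 14 bytes and says nothing about whether the track chunks are well-formed.
func checkSMFHeader(data []byte) (smfHeader, error) {
	var h smfHeader
	if len(data) < 14 {
		return h, errors.New("file too short for an MThd header")
	}
	if !bytes.Equal(data[0:4], []byte("MThd")) {
		return h, fmt.Errorf("expected chunk ID MThd, got %q", data[0:4])
	}
	// All multi-byte integers in an SMF are big-endian.
	h.Length = binary.BigEndian.Uint32(data[4:8])
	if h.Length < 6 {
		return h, fmt.Errorf("header length %d is shorter than 6", h.Length)
	}
	h.Format = binary.BigEndian.Uint16(data[8:10])
	h.NTracks = binary.BigEndian.Uint16(data[10:12])
	h.Division = binary.BigEndian.Uint16(data[12:14])
	return h, nil
}
```

This sketch accepts a declared header length greater than 6, because the SMF 1.0 specification asks readers to honor the length field and skip any extra header bytes; that is relevant to the `expected header size to be 6, was 10` row in the statistics.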

Here is the breakdown of parse errors. An empty status means there was no error, i.e. the file parsed successfully.

 count  |                          status                          
--------+-------------------------------------------------------------
 128420 | 
  69659 | unexpected EOF
   7017 | unexpected data content - Expected track chunk ID MTrk, got (...)
    122 | error parsing TimeSignature - unexpected data content (2)
     86 | runtime error: index out of range [1] with length 0
     52 | couldn't read full var length text
      9 | MIDI Channel Prefix event error - unexpected data content
      4 | Time Signature length not 2 as expected but 0
      4 | format not supported - expected header size to be 6, was 10
      3 | Time Signature length not 2 as expected but 3
      2 | Set Tempo event error - unexpected data content
      2 | Time Signature length not 2 as expected but 4
      2 | error parsing TimeSignature - unexpected data content (10)
      2 | error parsing TimeSignature - unexpected data content (5)
      2 | runtime error: integer divide by zero
      1 | error parsing TimeSignature - unexpected data content (9)
      1 | error parsing SMPTE Offset - unexpected data content (84)
      1 | error parsing SMPTE Offset - unexpected data content (2371)
      1 | EOF
      1 | error parsing TimeSignature - unexpected data content (0)
      1 | Time Signature length not 2 as expected but 3141
(21 rows)
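Summing the `count` column gives the totals behind the failure rate: 205,392 files in all, of which 128,420 parsed and 76,972 failed (about 37.5%). `unexpected EOF` alone accounts for about 90% of the failures. A short Go sketch of that arithmetic follows; the counts are copied from the table in row order, and `failureStats` and `summary` are hypothetical helpers.

```go
package main

import "fmt"

// countsByStatus lists the "count" column of the table in row order.
// The first entry (128420) is the row with an empty status, i.e. successful parses.
var countsByStatus = []int{
	128420, 69659, 7017, 122, 86, 52, 9, 4, 4, 3,
	2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1,
}

// failureStats returns the total number of files, how many failed,
// and the failure rate as a percentage.
func failureStats(counts []int) (total, failed int, pct float64) {
	for _, c := range counts {
		total += c
	}
	failed = total - counts[0] // every row except the successful one
	pct = 100 * float64(failed) / float64(total)
	return total, failed, pct
}

// summary formats the result as "<failed> of <total> files failed (<pct>%)".
func summary() string {
	total, failed, pct := failureStats(countsByStatus)
	return fmt.Sprintf("%d of %d files failed (%.1f%%)", failed, total, pct)
}
```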

I'd also like to highlight two specific cases. First, a runtime panic (caught by recover()):

86 | runtime error: index out of range [1] with length 0

Second, there was a case where Decode() never returned. I worked around it by wrapping the call in a timeout using channels, but I can't reproduce the hang on the current set of MIDI files, so there are no statistics for it:

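```go
import (
	"errors"
	"fmt"
	"time"
	// plus this package, imported as md (import path not given here)
)

// DecodeWithTimeout runs d.Decode() in a goroutine and gives up after 5 seconds.
// A panic inside Decode() is recovered and returned as an error.
// On timeout the goroutine is not stopped; it keeps running until Decode() returns.
func DecodeWithTimeout(d *md.Decoder) error {
	// Buffered so the goroutine can always send and exit, even after a timeout.
	result := make(chan error, 1)
	go func() {
		defer func() {
			// A recovered value is not always an error, so wrap it instead of
			// asserting r.(error), which would panic again for e.g. a string.
			if r := recover(); r != nil {
				result <- fmt.Errorf("panic during decode: %v", r)
			}
		}()
		result <- d.Decode()
	}()
	select {
	case <-time.After(5 * time.Second):
		return errors.New("timed out")
	case err := <-result:
		return err
	}
}
```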
mattetti (Member) commented:

Having some examples of files that fail to parse would be super useful so we can understand what causes the issue.
