Decode itu35 closed captioning data #534

Draft
wants to merge 1 commit into
base: master

wader
Owner

@wader wader commented Dec 20, 2022

No description provided.

@wader wader changed the title wip itu35 closed captioning data Dec 20, 2022
@wader wader changed the title itu35 closed captioning data Decode itu35 closed captioning data Dec 20, 2022
@wader wader marked this pull request as draft December 20, 2022 11:50
@bbgdzxng1

bbgdzxng1 commented Dec 20, 2022

OMG. That is cool as hell. Replying here, rather than on original ticket, as requested.

Even your quick and dirty is genuinely super-cool. I hardly expected a reply, let alone a proof of concept.

[ And I am flattered that you have taken the time to reply. I'm genuinely blown away by fq. It has become the thing I trust, because it doesn't screw with stuff. Over the years, I've tried:

and while they are all worthy projects, fq is the easiest to install and use, albeit once you get your head around jq. I mean, seriously, fq is the neatest thing I've seen in format analysis in the last couple of years. Every other tool hides the structure of where the debug info derives from. If, for example, I want to validate exactly what a tool like FFmpeg is writing to something like the H.264 Annex E Video Usability Information (VUI), I have found that fq is the most authoritative, because it tells you exactly where it got that info from. MediaInfo and FFprobe tell you "I interpret it as X", but in the process of presenting the info to the user in a non-technical way, they mask the source of the info. You should be pretty proud of your work, and once someone picks it up to produce a cross-platform GUI wrapper, I hope your work is appreciated by a wider audience. Command-line suits me perfectly, since I like to pipe between utilities.]

Anyway, I digress. And enough flattery. Back on topic.

I'm on Mac, and I have never compiled fq, because you made it too easy for us users to just type brew install fq. But I guess I'll learn.

I have a very cool file with EIA-608 CC1/2/3/4 and EIA-708 Service 1/2, but without giving away what it is publicly, it is a reference file. Perhaps you could add me to a private repo and I can upload it to you, then you can nuke the repo? The file just can't leak out to the public 'net.

Here's a mediainfo output samplesei.mediainfo.json.txt

And here's the raw, untruncated json output from fq, extracted using:

fq --decode avc_annexb '[ .[] | objects | select(.nal_unit_type=="sei") | . ]' "DTVCC.h264" > samplesei.fq.json.txt

Unless there is a better way to produce an SEI Dump? Is there a "standard" way of storing SEI data?

I did run

fq 'first(grep_by(.payload_type=="user_data_registered_itu_t_t35")).data | tobytes' ./DTVCC.mp4 > itu_t_t35_dump

and here is the dump, for what it is worth. I suspect that the first caption may not have much info in there.
itu_t_t35_dump.zip

I have taken a look at your PR, and while I can't code myself, I believe that I can see what you have done. On your journey around various formats out there, you may find that ITU T.35 could end up useful as a generic lookup, irrespective of Video or Closed Captions, but /formats/mpeg seems like a reasonable play-pen.

I think your externalized naming convention is solid. I'll try to get a sample in all of:

  • H.262/mpeg2 (In MPEG2, Closed Captions are not in SEI, but are instead transmitted as Picture User Data)
  • H.264/avc
  • H.265/hevc (pretty much the same as H.264)

Anyway, raw, untruncated json output from fq is attached, as well as a comparison with mediainfo.

Meanwhile, I'll try to learn how to compile fq.

@wader
Owner Author

wader commented Dec 20, 2022

OMG. That is cool as hell. Replying here, rather than on original ticket, as requested.

👍

Even your quick and dirty is genuinely super-cool. I hardly expected a reply, let alone a proof of concept.

[ And I am flattered that you have taken the time to reply. I'm genuinely blown away by fq. It has become the thing I trust, because it doesn't screw with stuff. Over the years, I've tried:

Thanks for the kind words 😊 for me it's a very interesting and challenging (also frustrating at times :) ) project to work on, and it has also been what i used to learn about things that i need for work and for hobby projects.

and while they are all worthy projects, fq is the easiest to install and use, albeit once you get your head around jq. I mean, seriously, fq is the neatest thing I've seen in format analysis in the last couple of years. Every other tool hides the structure of where the debug info derives from. If, for example, I want to validate exactly what a tool like FFmpeg is writing to something like the H.264 Annex E Video Usability Information (VUI), I have found that fq is the most authoritative, because it tells you exactly where it got that info from. MediaInfo and FFprobe tell you "I interpret it as X", but in the process of presenting the info to the user in a non-technical way, they mask the source of the info. You should be pretty proud of your work, and once someone picks it up to produce a cross-platform GUI wrapper, I hope your work is appreciated by a wider audience. Command-line suits me perfectly, since I like to pipe between utilities.]

I think you summarized quite well part of the reason fq exists, the other part is just that i'm interested in programming languages (jq especially!) and all kinds of text and binary formats and encodings.

Also i work as a software engineer in a team doing ingestion and transcoding services at a quite big media streaming service originating from sweden (starts with an s), so we have to deal with lots of broken, problematic and "challenging" media files all day long. So fq is extremely useful :) we of course also use ffprobe, mediainfo and lots of other tools when debugging and trying to understand things, but as you say it's very useful to see exact details. Also with fq you can quite quickly "re-implement" things in jq that ffmpeg does, just to verify you understand or to see how some of its heuristics work. You can see some snippets on the wiki https://github.com/wader/fq/wiki. Have some vague idea to create a mp4.jq-project with various isobmff, dash etc specific things.

Anyway, I digress. And enough flattery. Back on topic.

😁

I'm on Mac, and I have never compiled fq, because you made it too easy for us users to just type brew install fq. But I guess I'll learn.

Once you have golang installed (maybe also git?) it should be more or less just one command to build and install your own version.

I have a very cool file with EIA-608 CC1/2/3/4 and EIA-708 Service 1/2, but without giving away what it is publicly, it is a reference file. Perhaps you could add me to a private repo and I can upload it to you, then you can nuke the repo? The file just can't leak out to the public 'net.

Could you put it on google drive etc and share a link privately? my email is mattias.wadmam@gmail.com

Here's a mediainfo output samplesei.mediainfo.json.txt

And here's the raw, untruncated json output from fq, extracted using:

fq --decode avc_annexb '[ .[] | objects | select(.nal_unit_type=="sei") | . ]' "DTVCC.h264" > samplesei.fq.json.txt

Unless there is a better way to produce an SEI Dump? Is there a "standard" way of storing SEI data?

Don't think there is, i guess the most standard would be to produce an annexb stream with only SEI data? something like this might work:

fq -d avc_annexb 'chunk(2)[] | select(.[1].nal_unit_type=="sei") | tobytes' avc_annexb > only_sei

avc_annexb decodes to an array with the sync header as a separate element, so i use chunk(2) to create an array with [[sync,nalu], ...] pairs
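The same idea can be sketched outside fq. The following is a minimal Python illustration, not fq's implementation: it splits an Annex B byte stream on its start codes and keeps only the NAL units whose nal_unit_type is 6 (SEI). The names `split_annexb` and `only_sei` and the sample bytes are made up for the example.

```python
import re

# Annex B start codes are 00 00 01, optionally preceded by extra zero bytes.
START_CODE = re.compile(rb"\x00\x00\x01")

SEI_NAL_UNIT_TYPE = 6  # H.264 nal_unit_type for SEI


def split_annexb(stream: bytes) -> list:
    """Return the NAL units of an Annex B stream, without start codes."""
    starts = [m.end() for m in START_CODE.finditer(stream)]
    nalus = []
    for i, begin in enumerate(starts):
        end = starts[i + 1] - 3 if i + 1 < len(starts) else len(stream)
        # Strip zero bytes that belong to the next (4-byte) start code.
        nalus.append(stream[begin:end].rstrip(b"\x00"))
    return nalus


def only_sei(stream: bytes) -> bytes:
    """Rebuild an Annex B stream that keeps only the SEI NAL units."""
    out = bytearray()
    for nalu in split_annexb(stream):
        # The low 5 bits of the first NAL byte are nal_unit_type.
        if nalu and nalu[0] & 0x1F == SEI_NAL_UNIT_TYPE:
            out += b"\x00\x00\x00\x01" + nalu
    return bytes(out)


# Made-up stream: an SPS-like NAL (0x67), an SEI NAL (0x06), a slice NAL (0x65).
sample = bytes.fromhex("00000001 6742 00000001 060401b580 000001 6588")
sei_stream = only_sei(sample)
print(sei_stream.hex())
```

Stripping trailing zero bytes relies on a NAL unit never ending in 0x00, which should hold for conforming H.264 streams.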

fq 0.1.0 unfortunately uses truncated base64 strings for binary values (JSON can't safely encode binary data :( ), in the next release it will be changed to just a non-truncated string (still not safe). But! you can add -o bits_format=base64 and it should work.
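To illustrate why binary values need an encoding such as base64 before they go into JSON, and why a truncated string can't be turned back into the original bytes, here is a small Python sketch. It only demonstrates the general problem; it does not reproduce fq's output format, and the sample bytes are just the first seven bytes of the T.35 payload discussed in this thread.

```python
import base64
import json

raw = bytes.fromhex("b5003147413934")  # country_code, provider_code, "GA94"

# json.dumps rejects bytes outright, so binary has to become text first.
try:
    json.dumps({"data": raw})
    rejected = False
except TypeError:
    rejected = True

# Full base64 survives a JSON round trip and decodes to the same bytes.
encoded = base64.b64encode(raw).decode("ascii")
doc = json.dumps({"data": encoded})
restored = base64.b64decode(json.loads(doc)["data"])

# A truncated base64 string only yields a prefix of the original bytes.
prefix = base64.b64decode(encoded[:4])

print(rejected, restored == raw, len(prefix), "of", len(raw), "bytes")
```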

I did run

fq 'first(grep_by(.payload_type=="user_data_registered_itu_t_t35")).data | tobytes' ./DTVCC.mp4 > itu_t_t35_dump

and here is the dump, for what it is worth. I suspect that the first caption may not have much info in there. itu_t_t35_dump.zip

Will have a look

I have taken a look at your PR, and while I can't code myself, I believe that I can see what you have done. On your journey around various formats out there, you may find that ITU T.35 could end up useful as a generic lookup, irrespective of Video or Closed Captions, but /formats/mpeg seems like a reasonable play-pen.

Yeah i try to divide things into separate formats as much as possible, without making it absurd, as they then become reusable and can also be used with -d or as jq functions.

I'm a bit confused about the relationship between all these standards organizations... half the work is sometimes just understanding all the different aliases and finding where in the spec the good stuff is.

I think your externalized naming convention is solid. I'll try to get a sample in all of:

  • H.262/mpeg2 (In MPEG2, Closed Captions are not in SEI, but are instead transmitted as Picture User Data)
  • H.264/avc
  • H.265/hevc (pretty much the same as H.264)

Would be great. Ok to add them or part of them as test files to the fq repo?

Anyway, raw, untruncated json output from fq is attached, as well as a comparison with mediainfo.

Meanwhile, I'll try to learn how to compile fq.

👍 let me know how it goes

btw there are some presentations (with video and slides) about fq that might be useful and also show how i work with it https://github.com/wader/fq#presentations

@bbgdzxng1

bbgdzxng1 commented Dec 21, 2022

I used Google Drive to share a file to your gmail, containing both H.262/mpeg2 (A/53 picture user data) and H.264/avc (SCTE-128 sei side data).

Obviously, I'm more interested in H.264, but it will all make sense when you see the files.

I'm a bit confused about the relationship between all these standards organizations... half the work is sometimes just understanding all the different aliases and finding where in the spec the good stuff is.

I do not think you are alone.

Following on from your PR, I did try to do some research on trying to find an authoritative reference for the US scoped itu_t_35_provider_type... It looks like the TIA represent the FCC when dealing with these codes, but I failed to find a published list. From a media perspective, if either SCTE / ATSC / CTA(EIA/CEA) self declare in a standards doc, that is the best we'll get.

So I did promise that I would compile/install and try to replicate. Install was as easy as you said...

$ brew install golang
$ GOPROXY=direct go install github.com/wader/fq@sei-itu-t35
$ "$(go env GOPATH)"/bin/fq -v

Ok, this is really cool... I'm now gonna run your code against this same very simple file containing just 608-CC1

$ ffmpeg -loglevel warning -hide_banner -i "${infile}" -map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' -f 'h264' 'pipe:1' | "$(go env GOPATH)"/bin/fq --decode avc_annexb '[ .[] | objects | select(.nal_unit_type=="sei") | select(.sei.payload_type=="user_data_registered_itu_t_t35") | d ]'

      |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|  sei{}: (avc_sei)
  0x00|04                                             |.               |    payload_type: "user_data_registered_itu_t_t35" (4)
  0x00|   1d                                          | .              |    payload_size: 29
      |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|    data{}: (mpeg_itu_t35)
  0x00|      b5                                       |  .             |      country_code: "United States" (181)
  0x00|         00 31                                 |   .1           |      provider_code: 49
  0x00|               47 41 39 34                     |     GA94       |      user_identifier: "GA94"
      |                                               |                |      user_structure{}:
  0x00|                           03                  |         .      |        user_data_type_code: "CEA-708 captions" (3)
      |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|        user_data_type_structure{}: (mpeg_cc_data)
  0x00|                              46               |          F     |          reserved0: 0
  0x00|                              46               |          F     |          process_cc_data_flag: true
  0x00|                              46               |          F     |          zero_bit: 0
  0x00|                              46               |          F     |          cc_count: 6
  0x00|                                 ff            |           .    |          reserved1: 255
      |                                               |                |          cc[0:6]:
      |                                               |                |            [0]{}: cc
  0x00|                                    fc         |            .   |              one_bit: 1
  0x00|                                    fc         |            .   |              reserved0: 15
  0x00|                                    fc         |            .   |              cc_valid: true
  0x00|                                    fc         |            .   |              cc_type: 0
  0x00|                                       94      |             .  |              cc_data_1: 148
  0x00|                                          2f   |              / |              cc_data_2: 47
      |                                               |                |            [1]{}: cc
  0x00|                                             fc|               .|              one_bit: 1
  0x00|                                             fc|               .|              reserved0: 15
  0x00|                                             fc|               .|              cc_valid: true
  0x00|                                             fc|               .|              cc_type: 0
  0x01|94                                             |.               |              cc_data_1: 148
  0x01|   2f                                          | /              |              cc_data_2: 47
      |                                               |                |            [2]{}: cc
  0x01|      fc                                       |  .             |              one_bit: 1
  0x01|      fc                                       |  .             |              reserved0: 15
  0x01|      fc                                       |  .             |              cc_valid: true
  0x01|      fc                                       |  .             |              cc_type: 0
  0x01|         94                                    |   .            |              cc_data_1: 148
  0x01|            ae                                 |    .           |              cc_data_2: 174
      |                                               |                |            [3]{}: cc
  0x01|               fc                              |     .          |              one_bit: 1
  0x01|               fc                              |     .          |              reserved0: 15
  0x01|               fc                              |     .          |              cc_valid: true
  0x01|               fc                              |     .          |              cc_type: 0
  0x01|                  94                           |      .         |              cc_data_1: 148
  0x01|                     ae                        |       .        |              cc_data_2: 174
      |                                               |                |            [4]{}: cc
  0x01|                        fc                     |        .       |              one_bit: 1
  0x01|                        fc                     |        .       |              reserved0: 15
  0x01|                        fc                     |        .       |              cc_valid: true
  0x01|                        fc                     |        .       |              cc_type: 0
  0x01|                           94                  |         .      |              cc_data_1: 148
  0x01|                              2c               |          ,     |              cc_data_2: 44
      |                                               |                |            [5]{}: cc
  0x01|                                 fc            |           .    |              one_bit: 1
  0x01|                                 fc            |           .    |              reserved0: 15
  0x01|                                 fc            |           .    |              cc_valid: true
  0x01|                                 fc            |           .    |              cc_type: 0
  0x01|                                    94         |            .   |              cc_data_1: 148
  0x01|                                       2c      |             ,  |              cc_data_2: 44
  0x01|                                          ff   |              . |    gap0: raw bits
  0x01|                                             80|               .|    rbsp_trailing_bits: raw bits
0xfb20|         06                                    |   .            |  forbidden_zero_bit: false
0xfb20|         06                                    |   .            |  nal_ref_idc: 0
0xfb20|         06                                    |   .            |  nal_unit_type: "sei" (6) (Supplemental enhancement information)
0xfb20|            04 1d b5 00 31 47 41 39 34 03 46 ff|    ....1GA94.F.|  data: raw bits
0xfb30|fc 94 2f fc 94 2f fc 94 ae fc 94 ae fc 94 2c fc|../../........,.|
0xfb40|94 2c ff 80                                    |.,..            |
       |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|.[745]{}: nalu (avc_nalu)
       |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|  sei{}: (avc_sei)
  0x000|04                                             |.               |    payload_type: "user_data_registered_itu_t_t35" (4)
  0x000|   3b                                          | ;              |    payload_size: 59
       |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|    data{}: (mpeg_itu_t35)
  0x000|      b5                                       |  .             |      country_code: "United States" (181)
  0x000|         00 31                                 |   .1           |      provider_code: 49
  0x000|               47 41 39 34                     |     GA94       |      user_identifier: "GA94"
       |                                               |                |      user_structure{}:
  0x000|                           03                  |         .      |        user_data_type_code: "CEA-708 captions" (3)
       |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|        user_data_type_structure{}: (mpeg_cc_data)
  0x000|                              50               |          P     |          reserved0: 0
  0x000|                              50               |          P     |          process_cc_data_flag: true
  0x000|                              50               |          P     |          zero_bit: 0
  0x000|                              50               |          P     |          cc_count: 16
  0x000|                                 ff            |           .    |          reserved1: 255
       |                                               |                |          cc[0:16]:
       |                                               |                |            [0]{}: cc
  0x000|                                    fc         |            .   |              one_bit: 1
  0x000|                                    fc         |            .   |              reserved0: 15
  0x000|                                    fc         |            .   |              cc_valid: true
  0x000|                                    fc         |            .   |              cc_type: 0
  0x000|                                       94      |             .  |              cc_data_1: 148
  0x000|                                          ae   |              . |              cc_data_2: 174
       |                                               |                |            [1]{}: cc
  0x000|                                             fc|               .|              one_bit: 1
  0x000|                                             fc|               .|              reserved0: 15
  0x000|                                             fc|               .|              cc_valid: true
  0x000|                                             fc|               .|              cc_type: 0
  0x001|94                                             |.               |              cc_data_1: 148
  0x001|   20                                          |                |              cc_data_2: 32
       |                                               |                |            [2]{}: cc
  0x001|      fc                                       |  .             |              one_bit: 1
  0x001|      fc                                       |  .             |              reserved0: 15
  0x001|      fc                                       |  .             |              cc_valid: true
  0x001|      fc                                       |  .             |              cc_type: 0
  0x001|         91                                    |   .            |              cc_data_1: 145
  0x001|            40                                 |    @           |              cc_data_2: 64
       |                                               |                |            [3]{}: cc
  0x001|               fc                              |     .          |              one_bit: 1
  0x001|               fc                              |     .          |              reserved0: 15
  0x001|               fc                              |     .          |              cc_valid: true
  0x001|               fc                              |     .          |              cc_type: 0
  0x001|                  c1                           |      .         |              cc_data_1: 193
  0x001|                     20                        |                |              cc_data_2: 32
       |                                               |                |            [4]{}: cc
  0x001|                        fc                     |        .       |              one_bit: 1
  0x001|                        fc                     |        .       |              reserved0: 15
  0x001|                        fc                     |        .       |              cc_valid: true
  0x001|                        fc                     |        .       |              cc_type: 0
  0x001|                           73                  |         s      |              cc_data_1: 115
  0x001|                              e5               |          .     |              cc_data_2: 229
       |                                               |                |            [5]{}: cc
  0x001|                                 fc            |           .    |              one_bit: 1
  0x001|                                 fc            |           .    |              reserved0: 15
  0x001|                                 fc            |           .    |              cc_valid: true
  0x001|                                 fc            |           .    |              cc_type: 0
  0x001|                                    e3         |            .   |              cc_data_1: 227
  0x001|                                       ef      |             .  |              cc_data_2: 239
       |                                               |                |            [6]{}: cc
  0x001|                                          fc   |              . |              one_bit: 1
  0x001|                                          fc   |              . |              reserved0: 15
  0x001|                                          fc   |              . |              cc_valid: true
  0x001|                                          fc   |              . |              cc_type: 0
  0x001|                                             6e|               n|              cc_data_1: 110
  0x002|64                                             |d               |              cc_data_2: 100
       |                                               |                |            [7]{}: cc
  0x002|   fc                                          | .              |              one_bit: 1
  0x002|   fc                                          | .              |              reserved0: 15
  0x002|   fc                                          | .              |              cc_valid: true
  0x002|   fc                                          | .              |              cc_type: 0
  0x002|      20                                       |                |              cc_data_1: 32
  0x002|         e3                                    |   .            |              cc_data_2: 227
       |                                               |                |            [8]{}: cc
  0x002|            fc                                 |    .           |              one_bit: 1
  0x002|            fc                                 |    .           |              reserved0: 15
  0x002|            fc                                 |    .           |              cc_valid: true
  0x002|            fc                                 |    .           |              cc_type: 0
  0x002|               61                              |     a          |              cc_data_1: 97
  0x002|                  70                           |      p         |              cc_data_2: 112
       |                                               |                |            [9]{}: cc
  0x002|                     fc                        |       .        |              one_bit: 1
  0x002|                     fc                        |       .        |              reserved0: 15
  0x002|                     fc                        |       .        |              cc_valid: true
  0x002|                     fc                        |       .        |              cc_type: 0
  0x002|                        f4                     |        .       |              cc_data_1: 244
  0x002|                           e9                  |         .      |              cc_data_2: 233
       |                                               |                |            [10]{}: cc
  0x002|                              fc               |          .     |              one_bit: 1
  0x002|                              fc               |          .     |              reserved0: 15
  0x002|                              fc               |          .     |              cc_valid: true
  0x002|                              fc               |          .     |              cc_type: 0
  0x002|                                 ef            |           .    |              cc_data_1: 239
  0x002|                                    6e         |            n   |              cc_data_2: 110
       |                                               |                |            [11]{}: cc
  0x002|                                       fc      |             .  |              one_bit: 1
  0x002|                                       fc      |             .  |              reserved0: 15
  0x002|                                       fc      |             .  |              cc_valid: true
  0x002|                                       fc      |             .  |              cc_type: 0
  0x002|                                          20   |                |              cc_data_1: 32
  0x002|                                             e9|               .|              cc_data_2: 233
       |                                               |                |            [12]{}: cc
  0x003|fc                                             |.               |              one_bit: 1
  0x003|fc                                             |.               |              reserved0: 15
  0x003|fc                                             |.               |              cc_valid: true
  0x003|fc                                             |.               |              cc_type: 0
  0x003|   6e                                          | n              |              cc_data_1: 110
  0x003|      20                                       |                |              cc_data_2: 32
       |                                               |                |            [13]{}: cc
  0x003|         fc                                    |   .            |              one_bit: 1
  0x003|         fc                                    |   .            |              reserved0: 15
  0x003|         fc                                    |   .            |              cc_valid: true
  0x003|         fc                                    |   .            |              cc_type: 0
  0x003|            43                                 |    C           |              cc_data_1: 67
  0x003|               43                              |     C          |              cc_data_2: 67
       |                                               |                |            [14]{}: cc
  0x003|                  fc                           |      .         |              one_bit: 1
  0x003|                  fc                           |      .         |              reserved0: 15
  0x003|                  fc                           |      .         |              cc_valid: true
  0x003|                  fc                           |      .         |              cc_type: 0
  0x003|                     31                        |       1        |              cc_data_1: 49
  0x003|                        ae                     |        .       |              cc_data_2: 174
       |                                               |                |            [15]{}: cc
  0x003|                           fc                  |         .      |              one_bit: 1
  0x003|                           fc                  |         .      |              reserved0: 15
  0x003|                           fc                  |         .      |              cc_valid: true
  0x003|                           fc                  |         .      |              cc_type: 0
  0x003|                              94               |          .     |              cc_data_1: 148
  0x003|                                 2f            |           /    |              cc_data_2: 47
  0x003|                                    ff         |            .   |    gap0: raw bits
  0x003|                                       80      |             .  |    rbsp_trailing_bits: raw bits
0x179f0|                                       06      |             .  |  forbidden_zero_bit: false
0x179f0|                                       06      |             .  |  nal_ref_idc: 0
0x179f0|                                       06      |             .  |  nal_unit_type: "sei" (6) (Supplemental enhancement information)
0x179f0|                                          04 3b|              .;|  data: raw bits
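
The field layout in the dumps above can be reproduced by hand. Below is a minimal Python sketch that parses the 29-byte payload of the first SEI (the bytes after payload_type 04 and payload_size 1d). The bit layout is read off the field order shown in the dump rather than taken from the PR's code, and `parse_itu_t35_cc` is a hypothetical name.

```python
# Payload bytes of the first SEI in the dump (after 04 1d, before the 80 trailer).
SEI_PAYLOAD = bytes.fromhex(
    "b5 0031 47413934 03 46 ff "
    "fc942f fc942f fc94ae fc94ae fc942c fc942c "
    "ff"
)


def parse_itu_t35_cc(payload: bytes) -> dict:
    """Parse an ITU-T T.35 'GA94' payload carrying cc_data (layout assumed from the dump)."""
    flags = payload[8]
    cc_count = flags & 0x1F  # low 5 bits
    cc = []
    pos = 10  # payload[9] is the reserved byte (0xff in the dump)
    for _ in range(cc_count):
        marker, data_1, data_2 = payload[pos:pos + 3]
        cc.append({
            "cc_valid": bool(marker & 0x04),  # bit 2 of the marker byte
            "cc_type": marker & 0x03,         # low 2 bits
            "cc_data_1": data_1,
            "cc_data_2": data_2,
        })
        pos += 3
    return {
        "country_code": payload[0],
        "provider_code": int.from_bytes(payload[1:3], "big"),
        "user_identifier": payload[3:7].decode("ascii"),
        "user_data_type_code": payload[7],
        "process_cc_data_flag": bool(flags & 0x40),
        "cc_count": cc_count,
        "cc": cc,
    }


parsed = parse_itu_t35_cc(SEI_PAYLOAD)
print(parsed["user_identifier"], parsed["cc_count"], parsed["cc"][0])
```

With these bytes the sketch yields the same values as the dump: country_code 181, provider_code 49, user_identifier "GA94", cc_count 6, and a first cc entry of 148, 47.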

OMG... That is cool.
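
The cc_data_1/cc_data_2 values in the second SEI above look like EIA-608 byte pairs, where the top bit of each byte is an odd-parity bit. As a rough sketch, assuming that pairs whose first byte falls in 0x10..0x1f after stripping parity are control codes and that everything else can be read as plain ASCII (the real 608 character set differs from ASCII in a few positions), the text can be recovered like this. `decode_608_text` is a hypothetical helper.

```python
# cc_data_1/cc_data_2 pairs of cc[0..15] from the second SEI in the dump.
PAIRS = bytes.fromhex(
    "94ae 9420 9140 c120 73e5 e3ef 6e64 20e3 "
    "6170 f4e9 ef6e 20e9 6e20 4343 31ae 942f"
)


def has_odd_parity(byte: int) -> bool:
    """True if the byte has an odd number of set bits."""
    return bin(byte).count("1") % 2 == 1


def decode_608_text(pairs: bytes) -> str:
    """Extract printable characters from 608 byte pairs (ASCII approximation)."""
    chars = []
    for i in range(0, len(pairs) - 1, 2):
        b1, b2 = pairs[i], pairs[i + 1]
        if not (has_odd_parity(b1) and has_odd_parity(b2)):
            continue  # skip pairs that fail the parity check
        c1, c2 = b1 & 0x7F, b2 & 0x7F  # drop the parity bit
        if 0x10 <= c1 <= 0x1F:
            continue  # control code pair, not text
        for c in (c1, c2):
            if c >= 0x20:  # ignore null padding
                chars.append(chr(c))
    return "".join(chars)


text = decode_608_text(PAIRS)
print(text)  # A second caption in CC1.
```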

Let's try something fancy with a fq query to look at cc_data_1 and cc_data_2 to filter out some of the noise... I'm not very good at jq syntax, so bear with me... But I'm gonna try this on this same very simple file containing just 608-CC1

$ ffmpeg -loglevel warning -hide_banner -i "testsrc.with608captions.mp4" -map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' -f 'h264' 'pipe:1' \
| "$(go env GOPATH)/bin/fq" --decode avc_annexb '[ .[] | objects | select(.nal_unit_type=="sei") | select(.sei.payload_type=="user_data_registered_itu_t_t35") | .sei.data.user_structure.user_data_type_structure.cc[] | .cc_data_1,.cc_data_2 | d ]'

That gives...

0x0|                                       94                  |             .      |.[1].sei.data.user_structure.user_data_type_structure.cc[0].cc_data_1: 148
0x0|                                          2f               |              /     |.[1].sei.data.user_structure.user_data_type_structure.cc[0].cc_data_2: 47
0x00|                                                94         |                .   |.[1].sei.data.user_structure.user_data_type_structure.cc[1].cc_data_1: 148
0x00|                                                   2f      |                 /  |.[1].sei.data.user_structure.user_data_type_structure.cc[1].cc_data_2: 47
0x00|                                                         94|                   .|.[1].sei.data.user_structure.user_data_type_structure.cc[2].cc_data_1: 148
0x14|ae                                                         |.                   |.[1].sei.data.user_structure.user_data_type_structure.cc[2].cc_data_2: 174
0x14|      94                                                   |  .                 |.[1].sei.data.user_structure.user_data_type_structure.cc[3].cc_data_1: 148
0x14|         ae                                                |   .                |.[1].sei.data.user_structure.user_data_type_structure.cc[3].cc_data_2: 174
0x14|               94                                          |     .              |.[1].sei.data.user_structure.user_data_type_structure.cc[4].cc_data_1: 148
0x14|                  2c                                       |      ,             |.[1].sei.data.user_structure.user_data_type_structure.cc[4].cc_data_2: 44
0x14|                        94                                 |        .           |.[1].sei.data.user_structure.user_data_type_structure.cc[5].cc_data_1: 148
0x14|                           2c                              |         ,          |.[1].sei.data.user_structure.user_data_type_structure.cc[5].cc_data_2: 44
0x0|                                       94                  |             .      |.[121].sei.data.user_structure.user_data_type_structure.cc[0].cc_data_1: 148
0x0|                                          ae               |              .     |.[121].sei.data.user_structure.user_data_type_structure.cc[0].cc_data_2: 174
0x00|                                                94         |                .   |.[121].sei.data.user_structure.user_data_type_structure.cc[1].cc_data_1: 148
0x00|                                                   20      |                    |.[121].sei.data.user_structure.user_data_type_structure.cc[1].cc_data_2: 32
0x00|                                                         91|                   .|.[121].sei.data.user_structure.user_data_type_structure.cc[2].cc_data_1: 145
0x14|40                                                         |@                   |.[121].sei.data.user_structure.user_data_type_structure.cc[2].cc_data_2: 64
0x14|      54                                                   |  T                 |.[121].sei.data.user_structure.user_data_type_structure.cc[3].cc_data_1: 84
0x14|         68                                                |   h                |.[121].sei.data.user_structure.user_data_type_structure.cc[3].cc_data_2: 104
0x14|               e9                                          |     .              |.[121].sei.data.user_structure.user_data_type_structure.cc[4].cc_data_1: 233
0x14|                  73                                       |      s             |.[121].sei.data.user_structure.user_data_type_structure.cc[4].cc_data_2: 115
0x14|                        20                                 |                    |.[121].sei.data.user_structure.user_data_type_structure.cc[5].cc_data_1: 32
0x14|                           e9                              |         .          |.[121].sei.data.user_structure.user_data_type_structure.cc[5].cc_data_2: 233
0x14|                                 73                        |           s        |.[121].sei.data.user_structure.user_data_type_structure.cc[6].cc_data_1: 115
0x14|                                    20                     |                    |.[121].sei.data.user_structure.user_data_type_structure.cc[6].cc_data_2: 32
0x14|                                          61               |              a     |.[121].sei.data.user_structure.user_data_type_structure.cc[7].cc_data_1: 97
0x14|                                             20            |                    |.[121].sei.data.user_structure.user_data_type_structure.cc[7].cc_data_2: 32
0x14|                                                   e3      |                 .  |.[121].sei.data.user_structure.user_data_type_structure.cc[8].cc_data_1: 227
0x14|                                                      61   |                  a |.[121].sei.data.user_structure.user_data_type_structure.cc[8].cc_data_2: 97
0x28|70                                                         |p                   |.[121].sei.data.user_structure.user_data_type_structure.cc[9].cc_data_1: 112
0x28|   f4                                                      | .                  |.[121].sei.data.user_structure.user_data_type_structure.cc[9].cc_data_2: 244
0x28|         e9                                                |   .                |.[121].sei.data.user_structure.user_data_type_structure.cc[10].cc_data_1: 233
0x28|            ef                                             |    .               |.[121].sei.data.user_structure.user_data_type_structure.cc[10].cc_data_2: 239
0x28|                  6e                                       |      n             |.[121].sei.data.user_structure.user_data_type_structure.cc[11].cc_data_1: 110
0x28|                     20                                    |                    |.[121].sei.data.user_structure.user_data_type_structure.cc[11].cc_data_2: 32
0x28|                           e9                              |         .          |.[121].sei.data.user_structure.user_data_type_structure.cc[12].cc_data_1: 233
0x28|                              6e                           |          n         |.[121].sei.data.user_structure.user_data_type_structure.cc[12].cc_data_2: 110
0x28|                                    20                     |                    |.[121].sei.data.user_structure.user_data_type_structure.cc[13].cc_data_1: 32
0x28|                                       43                  |             C      |.[121].sei.data.user_structure.user_data_type_structure.cc[13].cc_data_2: 67
0x28|                                             43            |               C    |.[121].sei.data.user_structure.user_data_type_structure.cc[14].cc_data_1: 67
0x28|                                                31         |                1   |.[121].sei.data.user_structure.user_data_type_structure.cc[14].cc_data_2: 49
0x28|                                                      94   |                  . |.[121].sei.data.user_structure.user_data_type_structure.cc[15].cc_data_1: 148
0x28|                                                         2f|                   /|.[121].sei.data.user_structure.user_data_type_structure.cc[15].cc_data_2: 47

You can almost see the magic text This is a caption in CC1 in the third column. 608 data is encoded as 7-bit characters with odd parity, with the most significant bit used as the parity bit, so when it is decoded as 8-bit, some characters don't naturally decode back to plain text. Here is the secret message in the above file, pulled out of cc_data_1 and cc_data_2 into 2-byte words. This 2-byte-word layout is the de facto standard for representing EIA-608 data as text: it is the layout used by the Sonic Scenarist Closed Caption (SCC) format. Any captioneer would recognize this as a decodable payload.

942f 942f 94ae 94ae 942c 942c 94ae 9420 9140 5468 e973 20e9 7320 6120 e361 70f4 e9ef 6e20 e96e 2043 4331 942f
                                             T h  i s    i  s    a    c a  p t  i o  n    i n    C  C 1
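For anyone who wants to reproduce that by hand outside of jq, here is a minimal Python sketch of the two steps described above: grouping the bytes into 2-byte hex words and clearing the parity bit to recover the text. The helper names are made up for illustration and are not part of fq; the input bytes are copied from the cc_data_1/cc_data_2 listing of the second SEI message above.

```python
def strip_parity(data: bytes) -> bytes:
    """Clear bit 7 (the parity bit) of every byte."""
    return bytes(b & 0x7F for b in data)


def to_words(data: bytes) -> str:
    """Format byte pairs as space-separated 4-digit hex words."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    return " ".join(data[i:i + 2].hex() for i in range(0, len(data), 2))


# cc_data_1/cc_data_2 values from the second SEI message in the listing.
payload = bytes.fromhex(
    "94ae 9420 9140 5468 e973 20e9 7320 6120"
    " e361 70f4 e9ef 6e20 e96e 2043 4331 942f"
)

print(to_words(payload))
# Skip the three leading control words and the trailing one, then drop parity.
print(strip_parity(payload[6:-2]).decode("ascii"))
```

Treating the parity-stripped bytes as ASCII is a simplification that happens to work for this sample.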

In terms of the decode table, simple 608 captions are always encoded as 2 byte words, so you could save some vertical real estate by putting cc_data_1 and cc_data_2 on the same line in the decode, because they always go together. So instead of:

   |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13|0123456789abcdef0123|.[1].sei.data.user_structure.user_data_type_structure.cc[0]{}: cc
0x0|                                    fc                     |            .       |  one_bit: 1
0x0|                                    fc                     |            .       |  reserved0: 15
0x0|                                    fc                     |            .       |  cc_valid: true
0x0|                                    fc                     |            .       |  cc_type: 0
0x0|                                       94                  |             .      |  cc_data_1: 148
0x0|                                          2f               |              /     |  cc_data_2: 47

You could do:

   |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13|0123456789abcdef0123|.[1].sei.data.user_structure.user_data_type_structure.cc[0]{}: cc
0x0|                                    fc                     |            .       |  one_bit: 1
0x0|                                    fc                     |            .       |  reserved0: 15
0x0|                                    fc                     |            .       |  cc_valid: true
0x0|                                    fc                     |            .       |  cc_type: 0
0x0|                                       942f                |             .      |  cc_data_1: 148, cc_data_2: 47

That is so cool, even though this file contains only 608 CC1 data in the DTVCC payload (the 708 Service1 format is more complex than the legacy 608 format).

I do need to learn how to do 7 bit parity with jq and work out how to group stuff together in 2-byte words when I'm using fq in json format, but I'm blown away that your work actually creates a decode trace of 608 closed captions.

But congrats, man!!! You just wrote an EIA-608 CC1 closed captioning analyzer, at least for a simple CC1 file!!! I sent you that really complex file containing CC1/2/3/4 and Service1/2.

One suggestion... Now I see it with my own eyes, I think user_data_type_code: "CEA-708 captions" would make more sense to be listed in the table as user_data_type_code: "DTVCC Captions", since although the standard is EIA-708, the EIA-708 standard contains both 608 (the simple CC1/2/3/4 stuff) and 708 (which is a more complex Service [1...N] payload).

I'm still in awe of your work. fq gets cooler each week.

I'll not bug you for more unless you want, but if you want to take this even further into the depths of decoding 608 to text, I would be more than happy to help you, since you have now given me a really cool tool. I want to be respectful of your voluntary and generous time.

But you can now tell your Swedish bosses that your awesome tool could now be used for debugging 608 data in H.264 in HLS and DASH, once the annex_b is extracted with FFmpeg.

My mind is blown.

tak, tak, tak!

@wader
Owner Author

wader commented Dec 21, 2022

I used Google Drive to share a file to your gmail, containing both H.262/mpeg2 (A/53 picture user data) and H.264/avc (SCTE-128 sei side data).

Obviously, I'm more interested in H.264, but it will all make sense when you see the files.

Thanks, what was not so great is that i had a typo in the email, it should be mattias.wadman@gmail.com (now double checked), sorry about that.

Following on from your PR, I did try to do some research on trying to find an authoritative reference for the US scoped itu_t_35_provider_type... It looks like the TIA represent the FCC when dealing with these codes, but I failed to find a published list. From a media perspective, if either SCTE / ATSC / CTA(EIA/CEA) self declare in a standards doc, that is the best we'll get.

Ok thanks, will have a look also. I usually try to make the symbolic value lowercase and snake_case (the thing called Sym in the go code) and let the description (Description in code) be something more free form. That is to make jq queries nicer: the "jq value" for a field is the sym if set, otherwise the "actual" value (the decoded value). So for example for country_code, do ppl usually see them as just numbers, or should they be mapped to "se", "us" etc and let the description be "Sweden", "United States" etc? Hope that made sense.

So I did promise that I would compile/install and try to replicate. Install was as easy as you said...

$ brew install golang
$ GOPROXY=direct go install github.com/wader/fq@sei-itu-t35
$ "$(go env GOPATH)"/bin/fq -v

Ok, this is really cool... I'm now gonna run your code against this same very simple file containing just 608-CC1

$ ffmpeg -loglevel warning -hide_banner -i "${infile}" -map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' -f 'h264' 'pipe:1' | "$(go env GOPATH)"/bin/fq --decode avc_annexb '[ .[] | objects | select(.nal_unit_type=="sei") | select(.sei.payload_type=="user_data_registered_itu_t_t35") | d ]'
...
OMG... That is cool.

🥳

(btw i'm working on a mpeg_ts decoder but it's very rough currently, can see it here https://github.com/wader/fq/tree/mpeg_ts_wip, build with @mpeg_ts_wip if you want to try)

Let's try something fancy with a fq query to look at cc_data_1 and cc_data_2 to filter out some of the noise... I'm not very good at jq syntax, so bear with me... But I'm gonna try this on this same very simple file containing just 608-CC1

$ ffmpeg -loglevel warning -hide_banner -i "testsrc.with608captions.mp4" -map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' -f 'h264' 'pipe:1' \
| "$(go env GOPATH)/bin/fq" --decode avc_annexb '[ .[] | objects | select(.nal_unit_type=="sei") | select(.sei.payload_type=="user_data_registered_itu_t_t35") | .sei.data.user_structure.user_data_type_structure.cc[] | .cc_data_1,.cc_data_2 | d ]'

That gives...

....

You can almost see the magic text This is a caption in CC1 in the third column. 608 data is encoded as 7-bit characters with odd parity, with the most significant bit used as the parity bit, so when it is decoded as 8-bit, some characters don't naturally decode back to plain text. Here is the secret message in the above file, pulled out of cc_data_1 and cc_data_2 into 2-byte words. This 2-byte-word layout is the de facto standard for representing EIA-608 data as text: it is the layout used by the Sonic Scenarist Closed Caption (SCC) format. Any captioneer would recognize this as a decodable payload.

942f 942f 94ae 94ae 942c 942c 94ae 9420 9140 5468 e973 20e9 7320 6120 e361 70f4 e9ef 6e20 e96e 2043 4331 942f
                                             T h  i s    i  s    a    c a  p t  i o  n    i n    C  C 1

Aha will have look at those, think i've seen this before and wondered how it works.

Try this:

ffmpeg -loglevel warning -hide_banner -i ~/Downloads/testsrc.with608captions.ts -map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' -f 'h264' 'pipe:1' | go run .  --decode avc_annexb '[grep_by(.nal_unit_type=="sei" and .sei.payload_type=="user_data_registered_itu_t_t35") | .sei.data.user_structure.user_data_type_structure.cc[] | .cc_data_1,.cc_data_2] | map(band(.;0x7f)) | implode'
"\u0014/\u0014/\u0014.\u0014.\u0014,\u0014,\u0014.\u0014 \u0011@This is a caption in CC1\u0014/\u0014/\u0014/\u0014.\u0014.\u0014,\u0014,\u0014.\u0014 \u0011@A second caption in CC1.\u0014/\u0014/\u0014/\u0014.\u0014.\u0014,\u0014,"

grep_by is very convenient, it's a shorthand for def grep_by(f): .. | select(f)? so it will recursively look for things and ignore errors, but can be a bit slower

Also maybe good to know that d (short for the display function) is a function that directly prints some kind of representation of the input and outputs nothing (same as empty in jq). So it does work to do [123 | d], but it will result in 123 being printed and then the end result will be an empty array. In your example above you should be able to just skip the [ ... ] around it.

In terms of the decode table, simple 608 captions are always encoded as 2 byte words, so you could save some vertical real estate by putting cc_data_1 and cc_data_2 on the same line in the decode, because they always go together. So instead of:

...

You could do:

   |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13|0123456789abcdef0123|.[1].sei.data.user_structure.user_data_type_structure.cc[0]{}: cc
0x0|                                    fc                     |            .       |  one_bit: 1
0x0|                                    fc                     |            .       |  reserved0: 15
0x0|                                    fc                     |            .       |  cc_valid: true
0x0|                                    fc                     |            .       |  cc_type: 0
0x0|                                       942f                |             .      |  cc_data_1: 148, cc_data_2: 47

Currently not possible to do that but we will figure something out. I'm thinking that if this will produce a huge decode tree for "real" files, maybe it should be optional, or maybe there should be an option for how to decode the "cc" data.

That is so cool, even though this file contains only 608 CC1 data in the DTVCC payload (the 708 Service1 format is more complex than the legacy 608 format).

I do need to learn how to do 7 bit parity with jq and work out how to group stuff together in 2-byte words when I'm using fq in json format, but I'm blown away that your work actually creates a decode trace of 608 closed captions.

See use of band (bitwise-and) above, that masks bit 7 to zero, you can write the mask as 0b111_1111 etc also.
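The mask only throws the parity bit away. If you also want to check it, a small Python sketch of the odd parity rule could look like this (function names are hypothetical, not something fq provides): a byte is valid when its total number of set bits, parity bit included, is odd.

```python
def has_odd_parity(byte: int) -> bool:
    """True if the 8-bit value has an odd number of set bits."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def add_odd_parity(value: int) -> int:
    """Set bit 7 of a 7-bit value when needed to make the bit count odd."""
    value &= 0x7F
    return value if has_odd_parity(value) else value | 0x80


# 'i' is 0x69 (four set bits), so it goes on the wire as 0xe9.
print(hex(add_odd_parity(0x69)))
# Every byte of a captured pair should pass the check.
print(all(has_odd_parity(b) for b in bytes.fromhex("942f94ae5468e973")))
```

This is why plain letters such as T (0x54) survive unchanged while i shows up as 0xe9 in the dumps above.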

With some code to keep track of pts for the samples one could nearly write them out as SRT or something :)

But congrats, man!!! You just wrote an EIA-608 CC1 closed captioning analyzer, at least for a simple CC1 file!!! I sent you that really complex file containing CC1/2/3/4 and Service1/2.

One suggestion... Now I see it with my own eyes, I think user_data_type_code: "CEA-708 captions" would make more sense to be listed in the table as user_data_type_code: "DTVCC Captions", since although the standard is EIA-708, the EIA-708 standard contains both 608 (the simple CC1/2/3/4 stuff) and 708 (which is a more complex Service [1...N] payload).

Ah yes will have a look and clean that up, thanks

I'm still in awe of your work. fq gets cooler each week.

Same for me, i'm in awe of what jq can do and how damn well it seems to fit, i knew it was nice... but this nice? And the combination of doing the bit-stream decoding in go and using jq for the more flexible and fancy stuff is very nice.

I'll not bug you for more unless you want, but if you want to take this even further into the depths of decoding 608 to text, I would be more than happy to help you, since you have now given me a really cool tool. I want to be respectful of your voluntary and generous time.

I'm up for it, maybe open a new issue if you want to dump some specs and ideas.

But you can now tell your Swedish bosses that your awesome tool could now be used for debugging 608 data in H.264 in HLS and DASH, once the annex_b is extracted with FFmpeg.

My mind is blown.

tak, tak, tak!

No problem, glad someone else is as passionate to understand these things as me :)

Maybe i will answer a bit more sporadically the coming week(s), heading home to parents and whatnot... but i usually end up coding anyway.

@wader
Owner Author

wader commented Dec 21, 2022

Got a little curious how 608 works:

➜  fq git:(sei-itu-t35) ✗ ffmpeg -loglevel warning -hide_banner -i ~/Downloads/testsrc.with608captions.ts -map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' -f 'h264' 'pipe:1' | go run .  --decode avc_annexb '[grep_by(.nal_unit_type=="sei" and .sei.payload_type=="user_data_registered_itu_t_t35") | .sei.data.user_structure.user_data_type_structure.cc[] | .cc_data_1,.cc_data_2 | band(.;0x7f)] | tobytes | dd'
    │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15│0123456789abcdef012345│
0x00│14 2f 14 2f 14 2e 14 2e 14 2c 14 2c 14 2e 14 20 11 40 54 68 69 73│././.....,.,... .@This│.: raw bits 0x0-0x63.7 (100)
0x16│20 69 73 20 61 20 63 61 70 74 69 6f 6e 20 69 6e 20 43 43 31 14 2f│ is a caption in CC1./│
0x2c│14 2f 14 2f 14 2e 14 2e 14 2c 14 2c 14 2e 14 20 11 40 41 20 73 65│././.....,.,... .@A se│
0x42│63 6f 6e 64 20 63 61 70 74 69 6f 6e 20 69 6e 20 43 43 31 2e 14 2f│cond caption in CC1../│
0x58│14 2f 14 2f 14 2e 14 2e 14 2c 14 2c│                             │././.....,.,│         │

The Wikipedia article about it seems quite good https://en.wikipedia.org/wiki/EIA-608 but it would be nice to get hands on the spec.

As i read it 0x14 is load into CC1 (caption channel 1?), so load "/" then load "/", ....., but then 0x11 comes which seems to mean load the following bytes as caption text until... something, next command byte? Once we figure this out i guess it would not be that hard (might regret this) to write some basic 608 decoder in jq... possibly could have it in go also, will see.
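A rough Python sketch of how the parity-stripped pairs in the dump above could be classified, going by the control-code table in the Wikipedia article and not checked against the spec. Under that reading a first byte in 0x10..0x1f starts a two-byte control code, with 0x14 being the channel 1 miscellaneous commands, and a first byte of 0x20 or above is caption text. The table is deliberately partial and the function name is made up for illustration.

```python
# Partial table of channel 1 miscellaneous control codes (first byte 0x14),
# as listed in the Wikipedia EIA-608 article; this is not a complete decoder.
MISC_CONTROL = {
    0x20: "RCL resume caption loading",
    0x2C: "EDM erase displayed memory",
    0x2E: "ENM erase non-displayed memory",
    0x2F: "EOC end of caption",
}


def describe_pair(b1: int, b2: int) -> str:
    """Describe one cc_data_1/cc_data_2 pair (the parity bit is ignored)."""
    b1 &= 0x7F
    b2 &= 0x7F
    if b1 == 0 and b2 == 0:
        return "padding"
    if b1 == 0x14 and b2 in MISC_CONTROL:
        return MISC_CONTROL[b2]
    if 0x10 <= b1 <= 0x1F:
        if b2 >= 0x40:
            # Assumption: second byte 0x40..0x7f is a preamble address code
            # (row and style for the text that follows).
            return f"PAC {b1:02x}{b2:02x}"
        return f"control {b1:02x}{b2:02x}"
    if b1 < 0x20:
        return f"other {b1:02x}{b2:02x}"
    # Treats the characters as ASCII, which is only approximately right.
    text = "".join(chr(b) for b in (b1, b2) if b >= 0x20)
    return f"text {text!r}"


data = bytes.fromhex("142e 1420 1140 5468 6973 142f")
for i in range(0, len(data), 2):
    print(f"{data[i]:02x}{data[i + 1]:02x}", describe_pair(data[i], data[i + 1]))
```

So in this reading 0x11 0x40 would position the caption rather than start a text mode, and text simply is any pair whose first byte is not a control byte.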

@wader
Owner Author

wader commented Dec 21, 2022

One suggestion... Now I see it with my own eyes, I think user_data_type_code: "CEA-708 captions" would make more sense to be listed in the table as user_data_type_code: "DTVCC Captions", since although the standard is EIA-708, the EIA-708 standard contains both 608 (the simple CC1/2/3/4 stuff) and 708 (which is a more complex Service [1...N] payload).

Fixed. I updated the user_data_type_code mapping to show what i mean by symbolic and description. Now you can for example do .user_data_type_code == "dtvcc" and you see the description explains more details. btw you can access the "raw" actual non-symbolic-mapped value with toactual, ex (.user_data_type_code | toactual) == 3 etc, there is also todescription and tosym ... and also tovalue which gives you sym if set otherwise actual.

@wader
Owner Author

wader commented Dec 23, 2022

Hey, did you send a new email with link to the file? Haven't seen anything yet

@bbgdzxng1

bbgdzxng1 commented Dec 26, 2022

I have re-shared the link.

With some code to keep track of pts for the samples one could nearly write them out as SRT or something :)

I had a think about this, and while SRT is a very readable format, it is then starting to get into the domain of conversion, rather than pure analysis. (A generic PTS > SMPTE timecode decode module could be cool for all kinds of formats though, not just packets, but timecode can be tricky).

From a DTVCC / SCTE-128 / EIA-608 perspective, the file format that is used to represent EIA-608 is SCC, which looks like:

Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f
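To make the layout concrete, here is a minimal Python sketch that splits one SCC data line into its timecode and 2-byte words, and then pulls out the printable characters. It assumes the simple "timecode, whitespace, hex words" shape shown above, skips any word whose first byte (after dropping parity) is below 0x20 as a control code or padding, and treats the rest as ASCII, which is only an approximation of the 608 character set. The function names are hypothetical.

```python
def parse_scc_line(line: str) -> tuple:
    """Split 'HH:MM:SS:FF<whitespace>word word ...' into (timecode, words)."""
    timecode, payload = line.split(None, 1)
    words = [int(word, 16) for word in payload.split()]
    return timecode, words


def printable_text(words) -> str:
    """Collect text characters, skipping control codes and padding."""
    chars = []
    for word in words:
        hi, lo = (word >> 8) & 0x7F, word & 0x7F
        if hi < 0x20:  # control code (0x10..0x1f) or padding (0x00)
            continue
        chars.append(chr(hi))
        if lo >= 0x20:
            chars.append(chr(lo))
    return "".join(chars)


line = ("01:02:53:14\t94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e "
        "2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f")
timecode, words = parse_scc_line(line)
print(timecode, repr(printable_text(words)))
```

Note that this is exactly the kind of interpretation step that the raw 2-byte words avoid; it is only here to show what the words carry.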

Anyone who deals with US Closed Captions will recognize this format of 2 byte words. There are three projects that extract 608>SCC data:

The challenge is that all of these have a habit of also interpreting the data during conversion from 608>SCC, and all three give different results on the same source. They all have a developer-focused debug mode, which is a little more absolute, but their end-user facing conversion is aimed at producing a usable SCC file, rather than displaying what is in there. And this was why I reached out for the SEI T.35 stuff in fq- I was getting different results from different tools and I wanted "ground truth".

fq displaying the 2 byte words in the SEI would act as the independent arbitrator as to what 608 data was actually in there - but I don't think it needs to go as far as trying to recreate SCC files with a timecode - there are already the three higher level conversion tools listed above for that, but there is not a lower-level DTVCC / SCTE-128 / 608 reference tool - and I think your fq branch meets that need. With the 7-bit parity stuff (thanks!), it is enough to be able to see what was in the actual payload - which is way beyond where we were.

But I thought I would share what the common SCC syntax looks like, because media professionals will recognize the 2-byte words format - and that may help at presentation level.

Now that I have shared that reference file in google drive (sorry about that, totally missed it), you'll see how complex it gets with CC1/2/3/4 and Service1/2, all blended in together. That is kinda like a "worst case scenario" file. It is worth keeping as a reference, but will also help determine how best to present info to a user.

The dream would be a filter that could display cc1/2/3/4 using jq syntax, ie "As a user, I would like to see exactly what EIA-608 commands were sent in the SEI message for CC3, presented in 2-byte words similar to the SCC format", but to be honest, what you have produced so far is already way beyond "There's some random T.35 message in the SEI" and that is really useful as-is. The patch and commands that you have produced allow me to get very close, and in conjunction with ccextractor and caption-inspector's debug modes, it is possible to validate what is in there with fq.

My initial requirement was that I was encoding 608 data with libcaption, and they were not getting identified by mediainfo and were not getting displayed by VLC, but I knew they were in there because mpv could play 'em. So I wanted to see what libcaption was doing differently, hence the request for the format. It was a perfect example of the need for a ground-truth SEI decoder. (The conclusion is that libcaption is not perfect, mediainfo was looking for a particular data rate and vlc is vlc).

While I have been shouting the virtues of fq, there was one key benefit of fq that I missed, which is that you can debug many different media files all with the same presentation format. If I am looking at a container, such as an MPEG TS or looking at an SEI header, the debug is presented in a common format. I don't need to learn how to read the debug trace each time - that has such huge value. It was really useful when I was trying to ensure that my video usability information (VUI) was accurate and standards-compliant.

@wader
Owner Author

wader commented Dec 27, 2022

I have re-shared the link.

With some code to keep track of pts for the samples one could nearly write them out as SRT or something :)

I had a think about this, and while SRT is a very readable format, it is then starting to get into the domain of conversion, rather than pure analysis. (A generic PTS > SMPTE timecode decode module could be cool for all kinds of formats though, not just packets, but timecode can be tricky).

From a DTVCC / SCTE-128 / EIA-608 perspective, the file format that is used to represent EIA-608 is SCC, which looks like:

Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f

Anyone who deals with US Closed Captions will recognize this format of 2 byte words. There are three projects that extract 608>SCC data:

The challenge is that all of these have a habit of also interpreting the data during conversion from 608>SCC, and all three give different results on the same source. They all have a developer-focused debug mode, which is a little more absolute, but their end-user facing conversion is aimed at producing a usable SCC file, rather than displaying what is in there. And this was why I reached out for the SEI T.35 stuff in fq- I was getting different results from different tools and I wanted "ground truth".

fq displaying the 2 byte words in the SEI would act as the independent arbitrator as to what 608 data was actually in there - but I don't think it needs to go as far as trying to recreate SCC files with a timecode - there are already the three higher level conversion tools listed above for that, but there is not a lower-level DTVCC / SCTE-128 / 608 reference tool - and I think your fq branch meets that need. With the 7-bit parity stuff (thanks!), it is enough to be able to see what was in the actual payload - which is way beyond where we were.

But I thought I would share what the common SCC syntax looks like, because media professionals will recognize the 2-byte words format - and that may help at presentation level.

I see, and looking at the EIA-608 spec it seems a bit more complicated than i expected, but we will see. I usually try to make the go decoder "present" the formats in as neutral a way as possible and you can use jq to massage things into other formats, ex something like this will take the cc bytes and produce the hex byte pairs format above (ignoring how the timestamp would be extracted):

$ ffmpeg -loglevel warning -hide_banner -i ~/Downloads/testsrc.with608captions.ts -map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' -f 'h264' 'pipe:1' | go run .  -r --decode avc_annexb 'grep_by(.nal_unit_type=="sei" and .sei.payload_type=="user_data_registered_itu_t_t35") | .sei.data.user_structure.user_data_type_structure.cc | map(.cc_data_1, .cc_data_2) | tobytes | to_hex | chunk(4) | join(" ") as $pairs | "01:02:53:14 \($pairs)\n"'
01:02:53:14 942f 942f 94ae 94ae 942c 942c

01:02:53:14 94ae 9420 9140 5468 e973 20e9 7320 6120 e361 70f4 e9ef 6e20 e96e 2043 4331 942f

01:02:53:14 942f 942f 94ae 94ae 942c 942c

01:02:53:14 94ae 9420 9140 c120 73e5 e3ef 6e64 20e3 6170 f4e9 ef6e 20e9 6e20 4343 31ae 942f

01:02:53:14 942f 942f 94ae 94ae 942c 942c

So in some future one could put such function(s) in a scc.jq file etc and do fq -L . 'include "scc"; .... | to_scc' ... (maybe even have from_scc somehow if it could make sense).

Now that I have shared that reference file in google drive (sorry about that, totally missed it), you'll see how complex it gets with CC1/2/3/4 and Service1/2, all blended in together. That is kinda like a "worst case scenario" file. It is worth keeping as a reference, but will also help determine how best to present info to a user.

The dream would be a filter that could display cc1/2/3/4 using jq syntax, ie "As a user, I would like to see exactly what EIA-608 commands were sent in the SEI message for CC3, presented in 2-byte words similar to the SCC format", but to be honest, what you have produced so far is already way beyond "There's some random T.35 message in the SEI" and that is really useful as-is. The patch and commands that you have produced allow me to get very close, and in conjunction with ccextractor and caption-inspector's debug modes, it is possible to validate what is in there with fq.

The files will be very useful and we will see what makes sense to do as go decoders or as jq programs.

My initial requirement was that I was encoding 608 data with libcaption, and they were not getting identified by mediainfo and were not getting displayed by VLC, but I knew they were in there because mpv could play 'em. So I wanted to see what libcaption was doing differently, hence the request for the format. It was a perfect example of the need for a ground-truth SEI decoder. (The conclusion is that libcaption is not perfect, mediainfo was looking for a particular data rate and vlc is vlc).

While I have been shouting the virtues of fq, there was one key benefit of fq that I missed, which is that you can debug many different media files all with the same presentation format. If I am looking at a container, such as an MPEG TS or looking at an SEI header, the debug is presented in a common format. I don't need to learn how to read the debug trace each time - that has such huge value. It was really useful when I was trying to ensure that my video usability information (VUI) was accurate and standards-compliant.

This was also one of the reasons i really wanted jq (and JSON), as i've noticed that i wrote lots of scripts to turn different debug outputs from other tools into JSON and then used jq to combine and query them.

BTW fq has a diff function if you happen to have two values you want to compare. i've used it to compare sps/pps between files, e.g.:

# -n tells fq not to automatically read/decode input files; input/inputs must be used explicitly
# define a function f that finds the sps (assumes there is only one)
# call diff with input|f as both arguments; each one reads/decodes the next file and finds its sps (note ";" is the argument separator in jq)
# an object with a/b values is shown at each place in the structure where things differ
fq -n 'def f: grep_by(format=="avc_sps"); diff(input|f;input|f)' file1.mp4 file2.mp4
{
  "frame_cropping_flag": {
    "a": true,
    "b": false
  },
  "level_idc": {
    "a": "3",
    "b": "3.1"
  }
}

@bbgdzxng1

That command is genius, irrespective of the timecode. Thank-you. It does exactly what I need - display the words in a format that I can use to validate the output of ccextractor, FFmpeg and caption-inspector to arbitrate what was actually transmitted in the 608. Wow.

$ ffmpeg -loglevel 'warning' -hide_banner \
	-an -sn -dn -i "${infile}" \
	-map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' \
	-f 'h264' 'pipe:1' \
	| "$(go env GOPATH)/bin/fq" --raw-output --decode 'avc_annexb' 'grep_by(.nal_unit_type=="sei" and .sei.payload_type=="user_data_registered_itu_t_t35") | .sei.data.user_structure.user_data_type_structure.cc | map(.cc_data_1, .cc_data_2) | tobytes | tohex | chunk(4) | join(" ") as $pairs | "TI:ME:CO;DE\t\($pairs)\n"'

TI:ME:CO;DE	942f 942f 94ae 94ae 942c 942c

TI:ME:CO;DE	94ae 9420 9140 5468 e973 20e9 7320 6120 e361 70f4 e9ef 6e20 e96e 2043 4331 942f

TI:ME:CO;DE	942f 942f 94ae 94ae 942c 942c

TI:ME:CO;DE	94ae 9420 9140 c120 73e5 e3ef 6e64 20e3 6170 f4e9 ef6e 20e9 6e20 4343 31ae 942f

TI:ME:CO;DE	942f 942f 94ae 94ae 942c 942c
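
The words above can be read by hand: each EIA-608 byte carries an odd-parity bit in its top bit, so clearing that bit gives the 7-bit value. A rough shell sketch of that step follows, under some loud assumptions: `decode_608_words` is a hypothetical helper, basic characters are treated as plain ASCII (the real 608 character set differs for a few code points), and any pair whose first byte lands in 0x10-0x1f is treated as a control code and simply skipped rather than interpreted.

```shell
# Decode EIA-608 words (4 hex digits each) into text.
# Assumptions: characters are treated as plain ASCII, and control code pairs
# (first byte 0x10-0x1f after clearing parity) are skipped, not interpreted.
decode_608_words() {
  for word in "$@"; do
    b1=$(( 0x${word%??} & 0x7f ))   # first byte with the parity bit cleared
    b2=$(( 0x${word#??} & 0x7f ))   # second byte with the parity bit cleared
    if [ "$b1" -ge 16 ] && [ "$b1" -le 31 ]; then
      continue                      # control code pair, e.g. 9420 or 942f
    fi
    for b in "$b1" "$b2"; do
      # emit printable bytes only; 0x00 padding is dropped
      if [ "$b" -ge 32 ]; then
        printf "\\$(printf '%03o' "$b")"
      fi
    done
  done
  printf '\n'
}

decode_608_words 94ae 9420 9140 5468 e973 20e9 7320 6120 e361 70f4 e9ef 6e20 e96e 2043 4331 942f
# prints: This is a caption in CC1
```

It does not check parity or separate channels, so it is only a sanity check on top of the word dump, not a replacement for a real 608 decoder.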

And that diff is pretty cool.

You have created an awesome tool for EIA-608 debugging, irrespective of the timecode. I had to use tohex rather than to_hex, but that was irrelevant.

You did it! I genuinely think that you have created a unique tool that allows a user to validate the output of the other 608 caption tools. I'll leave you alone and I'll go play with my captions files. Thank-you, sir.

fq needs more exposure to media folks. You are onto something really quite special with "seeing everything about the picture, except the picture".

@wader
Owner Author

wader commented Dec 27, 2022

That command is genius, irrespective of the timecode. Thank-you. It does exactly what I need - display the words in a format that I can use to validate the output of ccextractor, FFmpeg and caption-inspector to arbitrate what was actually transmitted in the 608. Wow.

$ ffmpeg -loglevel 'warning' -hide_banner \
	-an -sn -dn -i "${infile}" \
	-map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' \
	-f 'h264' 'pipe:1' \
	| "$(go env GOPATH)/bin/fq" --raw-output --decode 'avc_annexb' 'grep_by(.nal_unit_type=="sei" and .sei.payload_type=="user_data_registered_itu_t_t35") | .sei.data.user_structure.user_data_type_structure.cc | map(.cc_data_1, .cc_data_2) | tobytes | tohex | chunk(4) | join(" ") as $pairs | "TI:ME:CO;DE\t\($pairs)\n"'

TI:ME:CO;DE	942f 942f 94ae 94ae 942c 942c

TI:ME:CO;DE	94ae 9420 9140 5468 e973 20e9 7320 6120 e361 70f4 e9ef 6e20 e96e 2043 4331 942f

TI:ME:CO;DE	942f 942f 94ae 94ae 942c 942c

TI:ME:CO;DE	94ae 9420 9140 c120 73e5 e3ef 6e64 20e3 6170 f4e9 ef6e 20e9 6e20 4343 31ae 942f

TI:ME:CO;DE	942f 942f 94ae 94ae 942c 942c

Great! what is ${infile} in this case, mp4 or ts? if it's mp4 i think you can already decode directly with fq if you want to skip going through ffmpeg, maybe something like this:

$ "$(go env GOPATH)/bin/fq" --raw-output 'grep_by(.nal_unit_type=="sei" and .sei.payload_type=="user_data_registered_itu_t_t35") | .sei.data.user_structure.user_data_type_structure.cc | map(.cc_data_1, .cc_data_2) | tobytes | tohex | chunk(4) | join(" ") as $pairs | "TI:ME:CO;DE\t\($pairs)\n"' "${infile}"

or maybe do .tracks[0] | grep_by(..) if you only want to look at SEIs in the first track. to look in the first video track like ffmpeg's 0:v:0 you have to do some "trak" box queries etc. to figure out the media and track id first.

And that diff is pretty cool.

You have created an awesome tool for EIA-608 debugging, irrespective of the timecode. I had to use tohex rather than to_hex, but that was irrelevant.

Aha sorry about that, tohex was renamed to to_hex in the latest release. Sadly jq's standard library is a bit inconsistent with the naming but i feel like snake_case is probably more readable and did not want to keep double functions around... maybe i should have kept the old names around for one release and made them throw a useful error, now that i think about it.

You did it! I genuinely think that you have created a unique tool that allows a user to validate the output of the other 608 caption tools. I'll leave you alone and I'll go play with my captions files. Thank-you, sir.

No problem, happy playing around! let me know how it goes and feel free to ask jq questions also, i want to spread knowledge about it. And i do have lots to thank jq (and gojq that fq uses a modified version of) for making all this possible, i sometimes feel like i nearly accidentally happened to make it fit together with a bit stream decoder... but yeah it was quite a lot of work and thinking to make it happen :)

fq needs more exposure to media folks. You are onto something really quite special with "seeing everything about the picture, except the picture".

Hehe it's a good summary of what fq is about :) and i hope more ppl will find it useful, and i think use cases like yours show very well what it's capable of.

@wader
Owner Author

wader commented Apr 13, 2023

@bbgdzxng1 sorry the progress on this stalled, got stuck in other things. hope i will get back to this and mpeg ts, but i will try to keep this PR rebased on master from time to time

@bbgdzxng1

@wader. Mattias - You were kind enough to help me 18 months ago with this branch. I just wanted to thank you again - I've been using fq pretty much every few weeks as the need arises. The work that you did on this branch was super-human, and having been given your guidance, I have found main-branch fq to be sufficient when inspecting annexb files containing SEI side data. With some of your tricks and commands, you have steered me in the right direction with my limited needs and limited skills.

Of course, if you ever do feel the desire to add this DTVCC-parsing branch to mainstream, that would not be unwelcome, but I appreciate that you may not want branches like this hanging around indefinitely. If so, feel free to archive - I have been able to get along sufficiently for my needs with each new stable release of fq. If you ever want to get back to H.264 and media files, you know where to find me.

It is great to see the fq project getting stronger and being recognized for the powerful tool that it is.

I hope you remain well, Mattias.

@wader
Owner Author

wader commented Nov 16, 2023

@wader. Mattias - You were kind enough to help me 18 months ago with this branch. I just wanted to thank you again - I've been using fq pretty much every few weeks as the need arises. The work that you did on this branch was super-human, and having been given your guidance, I have found main-branch fq to be sufficient when inspecting annexb files containing SEI side data. With some of your tricks and commands, you have steered me in the right direction with my limited needs and limited skills.

Nice to hear and nice to hear from you!

Of course, if you ever do feel the desire to add this DTVCC-parsing branch to mainstream, that would not be unwelcome, but I appreciate that you may not want branches like this hanging around indefinitely. If so, feel free to archive - I have been able to get along sufficiently for my needs with each new stable release of fq. If you ever want to get back to H.264 and media files, you know where to find me.

I would love to get parts of or all of these WIP branches merged somehow. Maybe the sei-itu-t35 stuff could be merged after some small polishing? the mpeg ts stuff is a bit more complex but maybe that could also be split into some more mergeable parts? i think i got a bit stuck on how to model things and also how to handle corrupt streams.

BTW i just rebased both branches on top of master, seems to work fine.

It is great to see the fq project getting stronger and being recognized for the powerful tool that it is.

I hope you remain well, Mattias.

Same!
