Different subtitle outputs with CLI commands #33
There's no standardized format (that I know of) for word-level subtitles, unfortunately. YouTube's auto-generated subtitles internally use a custom JSON format like:
"events": [
{
"tStartMs": 0,
"dDurationMs": 502120,
"id": 1,
"wpWinPosId": 1,
"wsWinStyleId": 1
},
{
"tStartMs": 120,
"dDurationMs": 7239,
"wWinId": 1,
"segs": [
{
"utf8": "great",
"acAsrConf": 0
},
{
"utf8": " paper",
"tOffsetMs": 400,
"acAsrConf": 0
},
{
"utf8": " today",
"tOffsetMs": 760,
"acAsrConf": 0
},
{
"utf8": " fellow",
"tOffsetMs": 1240,
"acAsrConf": 0
},
{
"utf8": " Scholars",
"tOffsetMs": 1640,
"acAsrConf": 0
},
{
"utf8": " stable",
"tOffsetMs": 2519,
"acAsrConf": 0
}
]
},
{
"tStartMs": 3149,
"dDurationMs": 4210,
"wWinId": 1,
"aAppend": 1,
"segs": [
{
"utf8": "\n"
}
]
},
{
"tStartMs": 3159,
"dDurationMs": 6841,
"wWinId": 1,
"segs": [
{
"utf8": "diffusion",
"acAsrConf": 0
},
{
"utf8": " XL",
"tOffsetMs": 800,
"acAsrConf": 0
},
{
"utf8": " turbo",
"tOffsetMs": 1761,
"acAsrConf": 0
},
{
"utf8": " why",
"tOffsetMs": 2761,
"acAsrConf": 0
},
{
"utf8": " well",
"tOffsetMs": 3441,
"acAsrConf": 0
},
{
"utf8": " because",
"tOffsetMs": 3881,
"acAsrConf": 0
}
]
},
They also extend the VTT subtitle format with special word-timestamp tags:
WEBVTT
Kind: captions
Language: en
00:00:00.120 --> 00:00:03.149 align:start position:0%
great<00:00:00.520><c> paper</c><00:00:00.880><c> today</c><00:00:01.360><c> fellow</c><00:00:01.760><c> Scholars</c><00:00:02.639><c> stable</c>
00:00:03.149 --> 00:00:03.159 align:start position:0%
great paper today fellow Scholars stable
00:00:03.159 --> 00:00:07.349 align:start position:0%
great paper today fellow Scholars stable
diffusion<00:00:03.959><c> XL</c><00:00:04.920><c> turbo</c><00:00:05.920><c> why</c><00:00:06.600><c> well</c><00:00:07.040><c> because</c>
00:00:07.349 --> 00:00:07.359 align:start position:0%
diffusion XL turbo why well because
These are internal formats they use, which I fetched using a special downloader. I don't know of any software that actually supports these formats for viewing, so I'm not sure what the benefit would be of supporting or imitating them. (Echogarden could support reading and converting them in the future, but they can only be fetched with special downloaders, not through the official YouTube API, so implementing this is currently low priority.)
The JSON format produced by Echogarden contains a lot of extra linguistic information, like phonetic pronunciation and sub-word timing, and also includes word offsets into the original raw text.
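For anyone comparing the two samples above: each word timestamp in the VTT cue appears to be the event's `tStartMs` plus the segment's `tOffsetMs` (for example, 120 + 400 = 520 ms, which is the `<00:00:00.520>` tag before " paper"). Here is a minimal Python sketch of that mapping. The helper names `vtt_time` and `cue_text` are made up for illustration, and the field meanings are inferred only from the samples in this comment, not from any official documentation.

```python
def vtt_time(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def cue_text(event: dict) -> str:
    """Render one JSON event as a VTT cue line with inline word timestamps."""
    parts = []
    for index, seg in enumerate(event.get("segs", [])):
        word = seg["utf8"]
        if index == 0:
            # The first word starts with the cue, so it gets no inline tag.
            parts.append(word)
        else:
            # Assumption: tOffsetMs is relative to the event's tStartMs.
            start = event["tStartMs"] + seg.get("tOffsetMs", 0)
            parts.append(f"<{vtt_time(start)}><c>{word}</c>")
    return "".join(parts)


# A shortened copy of the first cue from the JSON sample above.
sample_event = {
    "tStartMs": 120,
    "segs": [
        {"utf8": "great"},
        {"utf8": " paper", "tOffsetMs": 400},
        {"utf8": " today", "tOffsetMs": 760},
    ],
}
print(cue_text(sample_event))
# great<00:00:00.520><c> paper</c><00:00:00.880><c> today</c>
```

The printed line matches the start of the first cue in the VTT sample, which suggests the two formats carry the same word timing.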
Is there a way to edit subtitle outputs with CLI commands? It would be very good to have formats like this:
So far the only method I can think of is converting JSON files but it's a bit hard for me as a non-coder.
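In case it helps with the JSON route, here is a rough Python sketch that turns a word-level timeline into an SRT file with one word per cue. It rests on assumptions that should be checked against the actual file: that the JSON is a list of entries with `type`, `text`, `startTime` and `endTime` (in seconds), and that words may be nested under a `timeline` key. The function names are hypothetical, and one-word-per-cue SRT is only one possible target format.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def collect_words(entries: list) -> list:
    """Walk the (assumed) nested timeline and return only the word entries."""
    words = []
    for entry in entries:
        if entry.get("type") == "word":
            words.append(entry)
        else:
            # Assumption: non-word entries hold their children under "timeline".
            words.extend(collect_words(entry.get("timeline", [])))
    return words


def words_to_srt(entries: list) -> str:
    """Build SRT text with one numbered cue per word."""
    cues = []
    for number, word in enumerate(collect_words(entries), start=1):
        timing = f"{srt_time(word['startTime'])} --> {srt_time(word['endTime'])}"
        cues.append(f"{number}\n{timing}\n{word['text']}\n")
    return "\n".join(cues)


# Made-up sample data in the assumed shape.
sample_timeline = [
    {
        "type": "segment",
        "timeline": [
            {"type": "word", "text": "great", "startTime": 0.12, "endTime": 0.52},
            {"type": "word", "text": "paper", "startTime": 0.52, "endTime": 0.88},
        ],
    }
]
print(words_to_srt(sample_timeline))
```

To use it on a real file, load the JSON with `json.load(open("your-file.json"))`, pass the result to `words_to_srt`, and write the returned string to a `.srt` file.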