[bug] Sentence matching fails when a word is repeated in the text and the character counts no longer line up #263

Open
Brzjomo opened this issue Nov 14, 2024 · 3 comments

Brzjomo commented Nov 14, 2024

⚠️ Warning: no exact match found for the sentence: but they can also very easily mess up the perspective in your scene so the you know the feet is going to be are going to be the things that that basically connect your character with its environment you know if you're drawing a character in an environment of course
Difference positions:
Expected sentence: buttheycanalsoveryeasilymessuptheperspectiveinyourscenesotheyouknowthefeetisgoingtobearegoingtobethethingsthatthatbasicallyconnectyourcharacterwithitsenvironmentyouknowifyouredrawingacharacterinanenvironmentofcourse
Actual match: erstandthelogicbehindeverythingandthenonceyouknowityoucanstarttoskipthingsaheadyoucanstarttofastforwardandeventuallyendupbeingabletodothisreallyreallyquicklyinamatterofminutessoyeahkeeponpracticingandillseeyoulater
Position markers: ^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^ ^^^^^^^^^^^^^^^^^^ ^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^ ^^ ^^^^^^^^ ^^ ^^^^^^^
Difference indices: [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 111, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 132, 133, 134, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 193, 194, 196, 197, 198, 199, 200, 201, 202, 203, 205, 206, 208, 209, 210, 211, 212, 213, 214]

Original sentence: but they can also very easily mess up the perspective in your scene so the you know the feet is going to be are going to be the things that that basically connect your character with its environment you know if you're drawing a character in an environment of course
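For readers of the log above: the "Difference indices" appear to come from a position-by-position comparison of the lowercased, space-stripped sentence against a window of the transcript. Index 3 is missing from the list, for example, because both strings have "t" at that position. The following is a minimal sketch of that comparison, not the project's actual code; `normalize` and `difference_indices` are hypothetical names chosen for illustration.

```python
import re

def normalize(text: str) -> str:
    # Assumption: cleaning lowercases and drops everything except letters and
    # digits, which is consistent with "you're" showing up as "youre" in the log.
    return re.sub(r"[^a-z0-9]", "", text.lower())

def difference_indices(expected: str, actual: str) -> list[int]:
    # Positions at which the two strings disagree, compared character by character.
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]

expected = normalize("but they can also")  # start of the expected sentence
actual = "erstandthelogi"                  # start of the "Actual match" string
diff = difference_indices(expected, actual)
print(diff)
```

Once one extra word shifts the alignment, nearly every later position differs, which is why the index list in the log covers almost the whole sentence.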

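Judging from the get_sentence_timestamps() function in step6, the program compares the cleaned text from CLEANED_CHUNKS_FILE against the cleaned text from TRANSLATION_RESULTS_FOR_SUBTITLES_FILE to decide whether a sentence matches. In the log folder I could only find "cleaned_chunks.xlsx". After analyzing it, I found that the file contains adjacent repeated words, as in the screenshot below: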
PixPin_2024-11-15_00-38-02
While trying to fix this, I found that the corresponding text in "sentence_splitbymeaning.txt" does not contain the repeated word:
PixPin_2024-11-15_00-43-05
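To illustrate why a single repeated word is enough to break the match, here is a small sketch under the assumption that both sides are reduced to lowercase letters and digits before being compared. The word list and the `normalize` helper are made up for the example and are not taken from the project.

```python
import re

def normalize(text: str) -> str:
    # Assumed cleaning step: lowercase, keep only letters and digits.
    return re.sub(r"[^a-z0-9]", "", text.lower())

# Word-level transcript with an extra adjacent "the" (as in cleaned_chunks.xlsx)
chunk_words = ["so", "the", "the", "feet"]
# The same passage as it appears in the sentence file, without the repeat
sentence = "so the feet"

transcript = normalize("".join(chunk_words))
target = normalize(sentence)

found_at = transcript.find(target)             # exact search fails
extra_chars = len(transcript) - len(target)    # length of the surplus word
print(found_at, extra_chars)
```

However the real search is implemented, the transcript side is three characters longer than the sentence side here, so no window of the right length can line up exactly.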
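So in "cleaned_chunks.xlsx" I replaced the text of the second of the two adjacent "the" entries with an empty string and added its duration to the preceding "the". After rerunning, that problem was resolved and the run moved on to a second, similar problem; see the log at the top of this issue.
I handled that one the same way: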
PixPin_2024-11-15_00-59-32
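That fixed it, and the translation then completed without further problems.
Watching the video, the subtitle timing is correct and the translation looks normal.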
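The manual edit described above could be automated roughly as follows. This is only a sketch: it assumes each row of the chunk table has `text`, `start` and `end` fields (an assumption about the file layout), and `merge_extra_duplicates` is a hypothetical helper, not a function in the project. A repeated word is dropped only when the reference sentence does not contain it at that point, so legitimate repeats such as "that that" are kept.

```python
import re

def normalize(text):
    # Assumed cleaning step: lowercase, keep only letters and digits.
    return re.sub(r"[^a-z0-9]", "", text.lower())

def merge_extra_duplicates(chunks, reference_sentence):
    """Drop words that repeat the previous word but are absent from the
    reference sentence, folding their time span into the previous word."""
    reference_words = [normalize(w) for w in reference_sentence.split()]
    merged = []
    ref_i = 0
    for chunk in chunks:
        word = normalize(chunk["text"])
        expected = reference_words[ref_i] if ref_i < len(reference_words) else None
        is_repeat = bool(merged) and word == normalize(merged[-1]["text"])
        if word != expected and is_repeat:
            # Surplus repeat: extend the previous word's end time, skip this row.
            merged[-1]["end"] = chunk["end"]
            continue
        merged.append(dict(chunk))
        ref_i += 1
    return merged

chunks = [
    {"text": "so", "start": 0.0, "end": 0.2},
    {"text": "the", "start": 0.2, "end": 0.4},
    {"text": "the", "start": 0.4, "end": 0.6},   # extra repeat
    {"text": "feet", "start": 0.6, "end": 0.9},
]
fixed = merge_extra_duplicates(chunks, "so the feet")
print(fixed)
```

Words that differ from the reference for other reasons are left in place, so this only covers the adjacent-duplicate case reported here.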

So for now, similar problems can all be handled this way. I hope this can be fixed.

ppoudd1 commented Nov 15, 2024

I've run into this too. The bug does exist; I hope the author can fix it.

Huanshere (Owner) commented

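Hmm... Whisper hallucinates, the LLM hallucinates, and stacked together it gets scary. The next version rewrites this part, though: instead of fuzzy matching it does strict matching directly, so these problems shouldn't come up (probably). Releasing this weekend~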

Huanshere (Owner) commented

This bug should be resolved in v2.0.
