Incorrect newline characters breaking JSON parsing #366

Open
vincerubinetti opened this issue Mar 11, 2024 · 0 comments

Comments

@vincerubinetti
Collaborator

See https://github.com/3b1b/captions/blob/main/2023/gaussian-integral/hebrew/sentence_translations.json#L774

I think this is the AI model trying to translate a \n newline character and using a Hebrew "n" instead. A backslash followed by a Hebrew letter is not a valid JSON escape sequence, so parsing fails, and going to that lesson page shows that the captions file is missing (I could improve the message to distinguish between loading errors and parsing errors).
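To illustrate the failure and the loading-versus-parsing distinction, here is a minimal Python sketch. It is not the app's actual code: `load_captions`, `CaptionsLoadError`, and `CaptionsParseError` are hypothetical names, and the offending character is assumed to be the Hebrew letter nun (U+05E0).

```python
import json
from pathlib import Path

# Assumed reproduction of the broken file: a backslash followed by
# the Hebrew letter nun (U+05E0) where "\n" was expected.
BAD_JSON = '{"translatedText": "line one\\\u05e0line two"}'


class CaptionsLoadError(Exception):
    """The captions file could not be read at all (hypothetical name)."""


class CaptionsParseError(Exception):
    """The captions file was read but is not valid JSON (hypothetical name)."""


def load_captions(path):
    """Read and parse a captions file, keeping the two failure modes apart."""
    try:
        raw = Path(path).read_text(encoding="utf-8")
    except OSError as err:
        raise CaptionsLoadError(f"could not read {path}") from err
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        # lineno and colno point at the offending backslash.
        raise CaptionsParseError(
            f"{path}: line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
```

With the two exception types kept separate, the page could show "captions missing" for one and "captions file is malformed" for the other.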

It'd be hard to make the app recover from this type of parsing error, though. I could replace every backslash-plus-Hebrew-"n" sequence with \n, but what about other languages and escape characters? Perhaps a better solution would be to make sure these characters are removed from the input English before passing it to the models. That way it would be easier to make sure all escape characters are caught.
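A rough sketch of both ideas in Python, assuming the English text is available as ordinary decoded strings before translation. The function names `clean_for_translation` and `find_invalid_escapes` are made up for illustration and are not part of the captions tooling. The first collapses newlines, tabs, and other whitespace into single spaces so the model never sees them. The second scans raw JSON text for any backslash sequence that JSON does not allow, regardless of language, so bad files can be flagged rather than guessed at.

```python
import re

_WHITESPACE = re.compile(r"\s+")
# A backslash followed by either a \uXXXX sequence or any single character.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|.)", re.DOTALL)
# Single characters that JSON allows after a backslash.
_VALID_SINGLE = '"\\/bfnrt'


def clean_for_translation(text):
    """Replace newlines, tabs, and runs of whitespace with one space."""
    return _WHITESPACE.sub(" ", text).strip()


def find_invalid_escapes(raw_json):
    """Return (line_number, sequence) for each invalid escape in raw JSON text."""
    problems = []
    for match in _ESCAPE.finditer(raw_json):
        body = match.group(1)
        valid = len(body) == 5 or body in _VALID_SINGLE
        if not valid:
            line = raw_json.count("\n", 0, match.start()) + 1
            problems.append((line, match.group(0)))
    return problems
```

Running `find_invalid_escapes` over every `sentence_translations.json` file would catch this class of error in any language without having to enumerate look-alike characters.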
