I looked at `transkribus-to-prima` because we noticed something new and strange in Transkribus PAGE files: they contain text regions (`TextRegion`) whose text lines (`TextLine`) contain text (`TextEquiv`), but the `TextEquiv` of the region itself is empty. Example: the first text region in https://raw.githubusercontent.com/UB-Mannheim/reichsanzeiger-gt/main/page-xml/1820_84_0220.xml. Converting that PAGE XML file to text with `ocr-transform` therefore results in missing text.
If that is a common problem with Transkribus files, adding a fix for it to `transkribus-to-prima` might be a good idea.
See also UB-Mannheim/reichsanzeiger-gt#1.
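To make the suggested fix concrete, here is a minimal sketch of what it could look like. It is not the code of `transkribus-to-prima`, just an illustration in Python using only the standard library. The function name `fill_empty_region_text` is hypothetical, and I am assuming that joining the line texts with newlines is the desired region text. If a region has no `TextEquiv` at all, the sketch appends one at the end of the region, which may not match the element order the PAGE schema expects.

```python
import xml.etree.ElementTree as ET


def _namespace(root):
    # The root tag looks like "{namespace}PcGts"; reuse whatever
    # PAGE schema version the file itself declares.
    return root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""


def _qualify(ns, name):
    # Build a namespace-qualified tag name as ElementTree expects it.
    return f"{{{ns}}}{name}" if ns else name


def fill_empty_region_text(root):
    """Fill empty region-level TextEquiv/Unicode from the region's lines.

    Hypothetical helper; returns the number of regions that were changed.
    """
    ns = _namespace(root)
    region_tag = _qualify(ns, "TextRegion")
    line_tag = _qualify(ns, "TextLine")
    equiv_tag = _qualify(ns, "TextEquiv")
    unicode_tag = _qualify(ns, "Unicode")

    changed = 0
    for region in root.iter(region_tag):
        # Only direct child lines; nested regions are visited on their own.
        line_texts = []
        for line in region.findall(line_tag):
            line_unicode = line.find(f"{equiv_tag}/{unicode_tag}")
            if line_unicode is not None and line_unicode.text:
                line_texts.append(line_unicode.text)
        if not line_texts:
            continue

        equiv = region.find(equiv_tag)
        if equiv is None:
            # Assumption: appending at the end is acceptable for this sketch.
            equiv = ET.SubElement(region, equiv_tag)
        region_unicode = equiv.find(unicode_tag)
        if region_unicode is None:
            region_unicode = ET.SubElement(equiv, unicode_tag)

        # Never overwrite region text that is already present.
        if not (region_unicode.text or "").strip():
            region_unicode.text = "\n".join(line_texts)
            changed += 1
    return changed
```

It could be applied to a file with `tree = ET.parse(path)`, `fill_empty_region_text(tree.getroot())` and `tree.write(path, encoding="utf-8", xml_declaration=True)`. Existing region text is left untouched, so running it on files without the problem should be harmless.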
Originally posted by @stweil in #17 (comment)