It appears that several XML files processed by earlier passes of the word segmentation workflow still contain transcription_segmented divs that are not encoded to the current standard.
For example, jeru0183:
<orig xml:id="jeru0183-7" xml:lang="arc"><foreign xml:lang="grc"><unclear>Κ</unclear></foreign><g ref="interpunct">·</g><foreign xml:lang="grc"><unclear>Ν</unclear>ΙΦ</foreign></orig>
The foreign tags should be removed and their xml:lang value (grc) moved onto the enclosing orig tag, replacing its xml:lang attribute (arc).
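A minimal sketch of that fix using Python's standard xml.etree.ElementTree. It assumes the rule only applies when every foreign element directly inside an orig carries the same xml:lang; orig elements with mixed languages are left unchanged for manual review. The function names are hypothetical and not part of the existing workflow scripts.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag):
    """Drop any "{namespace}" prefix so TEI-namespaced and bare tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def unwrap(parent, child):
    """Replace `child` with its own text, sub-elements and tail, in place."""
    idx = list(parent).index(child)
    lead = child.text or ""
    tail = child.tail or ""
    kids = list(child)
    # child.text goes wherever the text preceding `child` lives.
    if idx == 0:
        parent.text = (parent.text or "") + lead
    else:
        parent[idx - 1].tail = (parent[idx - 1].tail or "") + lead
    # child.tail follows the last moved sub-element, or joins the leading text.
    if kids:
        kids[-1].tail = (kids[-1].tail or "") + tail
    elif idx == 0:
        parent.text += tail
    else:
        parent[idx - 1].tail += tail
    parent.remove(child)
    for offset, kid in enumerate(kids):
        parent.insert(idx + offset, kid)


def hoist_foreign_lang(orig):
    """Unwrap the direct <foreign> children of <orig> and move their shared
    xml:lang onto <orig>. Returns False (no change) when there are none or
    their languages differ, so those cases can be checked by hand."""
    foreigns = [c for c in orig if local_name(c.tag) == "foreign"]
    langs = {f.get(XML_LANG) for f in foreigns}
    if len(langs) != 1 or None in langs:
        return False
    for f in foreigns:
        unwrap(orig, f)
    orig.set(XML_LANG, langs.pop())
    return True
```

Applied to the jeru0183 example, this should give <orig xml:id="jeru0183-7" xml:lang="grc"><unclear>Κ</unclear><g ref="interpunct">·</g><unclear>Ν</unclear>ΙΦ</orig>. When writing whole TEI files back out, ElementTree rewrites namespace prefixes unless the TEI namespace is registered with ET.register_namespace, so the resulting diffs are worth reviewing before pushing.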
To do: check the other transcription_segmented divs for issues like this and push an update fixing them.
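For the check across other files, a rough scanner along these lines could list every orig that still wraps content in foreign. It assumes, as a hypothesis, that the segmented transcriptions are marked as div elements with type="transcription_segmented" and that the corpus is a directory tree of *.xml files; the function names are illustrative only.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# ElementTree stores xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag):
    """Drop any "{namespace}" prefix so TEI-namespaced and bare tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def flag_legacy_orig(root):
    """Yield the xml:id of each <orig> inside a transcription_segmented div
    that still has a <foreign> child (the pre-current encoding)."""
    for div in root.iter():
        # ASSUMPTION: the segmented divs are identified by @type.
        if local_name(div.tag) != "div" or div.get("type") != "transcription_segmented":
            continue
        for orig in div.iter():
            if local_name(orig.tag) == "orig" and any(
                local_name(c.tag) == "foreign" for c in orig
            ):
                yield orig.get(XML_ID)


def scan(corpus_dir):
    """Map each *.xml file under corpus_dir to its flagged <orig> ids."""
    report = {}
    for path in sorted(Path(corpus_dir).rglob("*.xml")):
        flagged = list(flag_legacy_orig(ET.parse(path).getroot()))
        if flagged:
            report[str(path)] = flagged
    return report
```

Only direct foreign children are flagged, matching the jeru0183 pattern; deeper nesting would need a wider search.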
Running some validation on the existing output from past word segmentation workflows. Will add to this thread as issues that need resolving come up.
Another error: the occurrences column in the parsed language wordlists references some XML files that do not seem to exist (anymore):
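One way to regenerate the list of missing files is a sketch like the following. It rests on loud assumptions: that each wordlist is a CSV with an occurrences column, that entries in that column are xml:ids such as jeru0183-7 separated by whitespace, commas or semicolons, and that the part before the last "-" is the XML file name (as with jeru0183-7 in jeru0183). The column name and helpers are hypothetical.

```python
import csv
import re
from pathlib import Path

# ASSUMPTION: occurrence ids are separated by whitespace, commas or semicolons.
ID_SPLIT = re.compile(r"[\s,;]+")


def referenced_stems(occurrences):
    """Return the file stems cited by one occurrences cell.
    ASSUMPTION: "jeru0183-7" refers to the file jeru0183.xml."""
    return {tok.rsplit("-", 1)[0] for tok in ID_SPLIT.split(occurrences.strip()) if tok}


def missing_files(wordlist_csv, xml_dir, column="occurrences"):
    """Map each cited stem with no matching *.xml under xml_dir to the CSV
    row numbers citing it (header = row 1; assumes no multi-line fields)."""
    existing = {p.stem for p in Path(xml_dir).rglob("*.xml")}
    missing = {}
    with open(wordlist_csv, newline="", encoding="utf-8") as fh:
        for rowno, row in enumerate(csv.DictReader(fh), start=2):
            for stem in sorted(referenced_stems(row.get(column) or "")):
                if stem not in existing:
                    missing.setdefault(stem, []).append(rowno)
    return missing
```

If the real occurrences format differs, only referenced_stems should need adjusting.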