
Greek Model number #40

Open
Ctaaffe opened this issue Jan 17, 2022 · 6 comments

Comments

@Ctaaffe

Ctaaffe commented Jan 17, 2022

"σ΄. ψιμμυθίου δραχ. ρν΄. πηγάνου δραχ. ρ΄. σταφίδος ἀγρίας δραχ. κε΄. ὄξους ξεστ. γ΄. μυρσίνου λίτρας γ΄. τρῖβε κατὰ μέρος ἐπιβάλλων τὸ ὑγρὸν καὶ συντίθει"

I am working with this recipe from Galen, which writes numbers with Greek letters. The numerals are marked with "΄", but pie-extended lemmatizes them as words, and the "΄" itself is lemmatized as ".". Is there any fix that would let pie-extended recognize the numbers without retraining the entire model? Thank you for any answer.

@PonteIneptique
Member

This is probably possible, but someone would need to come up with a regular expression for Ancient Greek numerals, which I would then inject into the tokenizer; that would fix the lemmatization.
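The tokenizer fix described above can be sketched as follows. This is a minimal, hypothetical example, not pie-extended's actual tokenizer: it only covers numerals written as letters followed by a keraia, and it assumes the keraia may be typed either as U+0374 or as the tonos U+0384. Because the numeral alternative comes first in the token pattern, ρν΄ stays a single token instead of being split into ρν and ΄.

```python
import re

# Simplified numeral pattern: numeral letters followed by a keraia.
# Accepting U+0384 (tonos) besides U+0374 is an assumption about how texts are typed.
NUMERAL = "[αβγδεϛζηθικλμνξοπϟρστυφχψωϡ]+[\u0374\u0384]"

# Hypothetical token pattern: a whole-word numeral is tried first,
# then ordinary words, then any single punctuation mark.
TOKEN_RE = re.compile(rf"(?<!\w){NUMERAL}(?!\w)|\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into numerals, words and punctuation marks."""
    return TOKEN_RE.findall(text)


print(tokenize("ψιμμυθίου δραχ. \u03c1\u03bd\u0384."))
```

Without the first alternative, `\w+` would stop before the tonos and `[^\w\s]` would then emit it as a separate punctuation token, which is the behaviour reported in this issue.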

@Ctaaffe
Author

Ctaaffe commented Jan 17, 2022

I could do that. Do you want the expression directly in Python or in another format?

@PonteIneptique
Member

I'd need it in Python, as a pattern that captures any numeral, such as

RomanNumbers = r"(?:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})" \
:)
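For reference, a complete Roman-numeral pattern in the same spirit might look like the sketch below. It is a common textbook formulation, not necessarily the expression used in pie-extended, and `is_roman` is a hypothetical helper name.

```python
import re

# Textbook Roman-numeral pattern (not necessarily the one used in pie-extended).
# The lookahead rejects the empty string, which the rest of the pattern would accept.
ROMAN_NUMBER = re.compile(
    r"(?=[MDCLXVI])"
    r"M{0,4}"              # thousands
    r"(?:CM|CD|D?C{0,3})"  # hundreds
    r"(?:XC|XL|L?X{0,3})"  # tens
    r"(?:IX|IV|V?I{0,3})"  # units
)


def is_roman(token: str) -> bool:
    """True if the whole token is a well-formed Roman numeral."""
    return ROMAN_NUMBER.fullmatch(token) is not None
```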

@PonteIneptique
Member

I can't promise to include it before early February though.

@Ctaaffe
Author

Ctaaffe commented Jan 17, 2022

Thank you for your answer, no problem. I'll write it when I have the time and use it with my own install of pie-extended in the meantime.

@etymologika

Hello to you both, here are the two rules for a letter to have a numeral value in Greek:

  1. A single letter, or a group of several letters, from the following list, followed by the sign ʹ (a slightly slanted apostrophe called dexia keraia, Unicode code point U+0374), forms a numeral (e.g.: αʹ = 1, ιϛʹ = 16, κδʹ = 24).
    α
    β
    γ
    δ
    ε
    ϛ
    ζ
    η
    θ
    ι
    κ
    λ
    μ
    ν
    ξ
    ο
    π
    ϟ
    ρ
    σ
    τ
    υ
    φ
    χ
    ψ
    ω
    ϡ
  2. A single letter or a group of letters may be preceded or interrupted by the sign ͵ (aristeri keraia, Unicode code point U+0375), with or without the ʹ sign (dexia keraia) at the end; examples: ͵α = 1000, α͵ροστ΄ = 1176.
    Many thanks in advance!
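The two rules above could be turned into a single Python pattern along these lines. This is a minimal sketch, not code from pie-extended: the names `GREEK_NUMBERS` and `find_greek_numbers` are hypothetical, only lowercase letters are handled, and accepting the tonos U+0384 (which the Galen excerpt appears to use) as well as U+02B9 (the form U+0374 takes after Unicode normalization) in place of the dexia keraia is an assumption.

```python
import re

# Letters that carry a numeric value (list from the thread, incl. stigma, koppa, sampi).
LETTERS = "[αβγδεϛζηθικλμνξοπϟρστυφχψωϡ]"
# Dexia keraia: U+0374, its normalized form U+02B9, and (assumption) the tonos U+0384,
# which is often typed in its place.
KERAIA = "[\u0374\u02b9\u0384]"
# Aristeri keraia (U+0375), used by rule 2.
LOWER = "\u0375"

GREEK_NUMBERS = (
    r"(?<!\w)(?:"
    # Rule 2: starts with the aristeri keraia, or has it between letters;
    # the final dexia keraia is optional.
    rf"(?:{LOWER}{LETTERS}+(?:{LOWER}{LETTERS}+)*"
    rf"|{LETTERS}+(?:{LOWER}{LETTERS}+)+){KERAIA}?"
    # Rule 1: one or more letters followed by the dexia keraia.
    rf"|{LETTERS}+{KERAIA}"
    r")(?!\w)"
)
GREEK_NUMBER_RE = re.compile(GREEK_NUMBERS)


def find_greek_numbers(text: str) -> list[str]:
    """Return every whole-word substring that the two rules treat as a numeral."""
    return GREEK_NUMBER_RE.findall(text)


print(find_greek_numbers("σ\u0384. ψιμμυθίου δραχ. ρν\u0384. πηγάνου δραχ. ρ\u0384."))
```

The sketch only recognizes numerals and does not compute their values, so a spelling such as στ for ϛ, as in the second example of rule 2, is matched simply because σ and τ are both in the list. The `(?<!\w)` and `(?!\w)` guards keep it from matching the tail of an ordinary word such as δραχ.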


3 participants