Refactor SPARQL queries into atomic structures: #387
Also renamed directories that did not follow the naming convention.
Thank you for the pull request!
The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)
If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!
Maintainer checklist
@andrewtavis, I have refactored language_data_extraction to follow the new convention of keeping queries "atomic". This PR touched many files, especially those related to nouns (broken to […]). I also ran the queries check and all data_type QID checks passed. The language QID checks are still outstanding (see the file here). What do you think?
Hey @DeleMike! 👋 This is amazing 🤩 One thing here is that I'm not sure if we actually need all of the forms from noun queries for the new proper noun queries 🤔 Can you give it a check, and maybe for these new ones it's as simple as singular and plural :)
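To make the suggestion concrete, here is a rough sketch of generating a proper noun query that only asks for singular and plural forms. `build_query` and `FORMS` are hypothetical names used for illustration and are not part of Scribe-Data; the QIDs (Q110786 for singular, Q146786 for plural, Q147276 for proper noun) are the ones used in the queries in this thread.

```python
# Hypothetical helper, not part of Scribe-Data: builds a proper noun query
# whose only requested forms are the ones listed in `forms`.
FORMS = {
    "singular": ["Q110786"],  # singular
    "plural": ["Q146786"],  # plural
}


def build_query(language_qid: str, forms: dict[str, list[str]]) -> str:
    """Return a SPARQL query string with one OPTIONAL block per form."""
    select_vars = "\n".join(f"  ?{name}" for name in forms)

    optionals = []
    for name, features in forms.items():
        # Each grammatical feature becomes one predicate-object pair.
        feature_lines = " ;\n".join(
            f"      wikibase:grammaticalFeature wd:{qid}" for qid in features
        )
        optionals.append(
            "  OPTIONAL {\n"
            f"    ?lexeme ontolex:lexicalForm ?{name}Form .\n"
            f"    ?{name}Form ontolex:representation ?{name} ;\n"
            f"{feature_lines} .\n"
            "  }"
        )
    body = "\n".join(optionals)

    return (
        "SELECT\n"
        '  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)\n'
        "  ?properNoun\n"
        f"{select_vars}\n"
        "WHERE {\n"
        f"  ?lexeme dct:language wd:{language_qid} ;\n"
        "    wikibase:lexicalCategory wd:Q147276 ;\n"
        "    wikibase:lemma ?properNoun .\n"
        f"{body}\n"
        "}\n"
    )
```

Calling `build_query("Q13955", FORMS)` would produce an Arabic query with two OPTIONAL blocks. Whether two forms are actually enough depends on how proper nouns are modeled for each language on Wikidata.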
Thank you @andrewtavis, could you help explain what you mean? An example should help... My current understanding is that you only want singular and plural for proper nouns, and we remove gender types. For example, see how
# tool: scribe-data
# Arabic (Q13955) proper nouns with singular and plural forms.
# Enter this query at https://query.wikidata.org/.
SELECT
(REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
?noun
?singularNominative
?pluralNominative
?singularAccusative
?pluralAccusative
?singularGenitive
?pluralGenitive
WHERE {
?lexeme dct:language wd:Q13955 ;
wikibase:lexicalCategory wd:Q147276 ;
wikibase:lemma ?noun .
# Singular Nominative
OPTIONAL {
?lexeme ontolex:lexicalForm ?singularNominativeForm .
?singularNominativeForm ontolex:representation ?singularNominative ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q131105 .
}
# Plural Nominative
OPTIONAL {
?lexeme ontolex:lexicalForm ?pluralNominativeForm .
?pluralNominativeForm ontolex:representation ?pluralNominative ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q131105 .
}
# Singular Accusative
OPTIONAL {
?lexeme ontolex:lexicalForm ?singularAccusativeForm .
?singularAccusativeForm ontolex:representation ?singularAccusative ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q146078 .
}
# Plural Accusative
OPTIONAL {
?lexeme ontolex:lexicalForm ?pluralAccusativeForm .
?pluralAccusativeForm ontolex:representation ?pluralAccusative ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q146078 .
}
# Singular Genitive
OPTIONAL {
?lexeme ontolex:lexicalForm ?singularGenitiveForm .
?singularGenitiveForm ontolex:representation ?singularGenitive ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q146233 .
}
# Plural Genitive
OPTIONAL {
?lexeme ontolex:lexicalForm ?pluralGenitiveForm .
?pluralGenitiveForm ontolex:representation ?pluralGenitive ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q146233 .
}
}
compared against
# tool: scribe-data
# All Arabic (Q13955) proper nouns.
# Enter this query at https://query.wikidata.org/.
SELECT
(REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
?noun
?femSingularNominativeIndef
?masSingularNominativeIndef
?femDualNominativeIndef
?masDualNominativeIndef
?femPluralNominativeIndef
?masPluralNominativeIndef
?femSingularAccusativeIndef
?masSingularAccusativeIndef
?femDualAccusativeIndef
?masDualAccusativeIndef
?femPluralAccusativeIndef
?masPluralAccusativeIndef
?femSingularGenitiveIndef
?masSingularGenitiveIndef
?femDualGenitiveIndef
?masDualGenitiveIndef
?femPluralGenitiveIndef
?masPluralGenitiveIndef
?femSingularPausalIndef
?masSingularPausalIndef
?femDualPausalIndef
?masDualPausalIndef
?femPluralPausalIndef
?masPluralPausalIndef
WHERE {
?lexeme dct:language wd:Q13955 ;
wikibase:lexicalCategory wd:Q147276 ;
wikibase:lemma ?noun .
# MARK: Nominative
# Singular
OPTIONAL {
?lexeme ontolex:lexicalForm ?femSingularNominativeIndefForm .
?femSingularNominativeIndefForm ontolex:representation ?femSingularNominativeIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q131105 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masSingularNominativeIndefForm .
?masSingularNominativeIndefForm ontolex:representation ?masSingularNominativeIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q131105 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# Dual
OPTIONAL {
?lexeme ontolex:lexicalForm ?femDualNominativeIndefForm .
?femDualNominativeIndefForm ontolex:representation ?femDualNominativeIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q110022 ;
wikibase:grammaticalFeature wd:Q131105 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masDualNominativeIndefForm .
?masDualNominativeIndefForm ontolex:representation ?masDualNominativeIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q110022 ;
wikibase:grammaticalFeature wd:Q131105 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# Plural
OPTIONAL {
?lexeme ontolex:lexicalForm ?femPluralNominativeIndefForm .
?femPluralNominativeIndefForm ontolex:representation ?femPluralNominativeIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q131105 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masPluralNominativeIndefForm .
?masPluralNominativeIndefForm ontolex:representation ?masPluralNominativeIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q131105 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# MARK: Accusative
# Singular
OPTIONAL {
?lexeme ontolex:lexicalForm ?femSingularAccusativeIndefForm .
?femSingularAccusativeIndefForm ontolex:representation ?femSingularAccusativeIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q146078 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masSingularAccusativeIndefForm .
?masSingularAccusativeIndefForm ontolex:representation ?masSingularAccusativeIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q146078 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# Dual
OPTIONAL {
?lexeme ontolex:lexicalForm ?femDualAccusativeIndefForm .
?femDualAccusativeIndefForm ontolex:representation ?femDualAccusativeIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q110022 ;
wikibase:grammaticalFeature wd:Q146078 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masDualAccusativeIndefForm .
?masDualAccusativeIndefForm ontolex:representation ?masDualAccusativeIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q110022 ;
wikibase:grammaticalFeature wd:Q146078 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# Plural
OPTIONAL {
?lexeme ontolex:lexicalForm ?femPluralAccusativeIndefForm .
?femPluralAccusativeIndefForm ontolex:representation ?femPluralAccusativeIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q146078 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masPluralAccusativeIndefForm .
?masPluralAccusativeIndefForm ontolex:representation ?masPluralAccusativeIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q146078 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# MARK: Genitive
# Singular
OPTIONAL {
?lexeme ontolex:lexicalForm ?femSingularGenitiveIndefForm .
?femSingularGenitiveIndefForm ontolex:representation ?femSingularGenitiveIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q146233 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masSingularGenitiveIndefForm .
?masSingularGenitiveIndefForm ontolex:representation ?masSingularGenitiveIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q146233 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# Dual
OPTIONAL {
?lexeme ontolex:lexicalForm ?femDualGenitiveIndefForm .
?femDualGenitiveIndefForm ontolex:representation ?femDualGenitiveIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q110022 ;
wikibase:grammaticalFeature wd:Q146233 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masDualGenitiveIndefForm .
?masDualGenitiveIndefForm ontolex:representation ?masDualGenitiveIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q110022 ;
wikibase:grammaticalFeature wd:Q146233 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# Plural
OPTIONAL {
?lexeme ontolex:lexicalForm ?femPluralGenitiveIndefForm .
?femPluralGenitiveIndefForm ontolex:representation ?femPluralGenitiveIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q146233 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masPluralGenitiveIndefForm .
?masPluralGenitiveIndefForm ontolex:representation ?masPluralGenitiveIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q146233 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# MARK: Pausal
# Singular
OPTIONAL {
?lexeme ontolex:lexicalForm ?femSingularPausalIndefForm .
?femSingularPausalIndefForm ontolex:representation ?femSingularPausalIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q117262361 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masSingularPausalIndefForm .
?masSingularPausalIndefForm ontolex:representation ?masSingularPausalIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q110786 ;
wikibase:grammaticalFeature wd:Q117262361 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# Dual
OPTIONAL {
?lexeme ontolex:lexicalForm ?femDualPausalIndefForm .
?femDualPausalIndefForm ontolex:representation ?femDualPausalIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q110022 ;
wikibase:grammaticalFeature wd:Q117262361 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masDualPausalIndefForm .
?masDualPausalIndefForm ontolex:representation ?masDualPausalIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q110022 ;
wikibase:grammaticalFeature wd:Q117262361 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
# Plural
OPTIONAL {
?lexeme ontolex:lexicalForm ?femPluralPausalIndefForm .
?femPluralPausalIndefForm ontolex:representation ?femPluralPausalIndef ;
wikibase:grammaticalFeature wd:Q1775415 ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q117262361 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
OPTIONAL {
?lexeme ontolex:lexicalForm ?masPluralPausalIndefForm .
?masPluralPausalIndefForm ontolex:representation ?masPluralPausalIndef ;
wikibase:grammaticalFeature wd:Q499327 ;
wikibase:grammaticalFeature wd:Q146786 ;
wikibase:grammaticalFeature wd:Q117262361 ;
wikibase:grammaticalFeature wd:Q53997857 ;
} .
}
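With this many variables, a single misspelling in the WHERE clause leaves a SELECT column silently empty. A minimal sketch of a check for that, assuming the query uses an uppercase `WHERE` keyword exactly once; `unbound_select_vars` is a hypothetical helper and not an existing Scribe-Data function:

```python
import re

# Matches SPARQL variables such as ?noun or ?femSingularNominativeIndef.
VAR = re.compile(r"\?(\w+)")


def unbound_select_vars(query: str) -> list[str]:
    """Return SELECT variables that never appear in the WHERE clause."""
    select_part, _, where_part = query.partition("WHERE")
    # Drop "(expr AS ?alias)" projections: the alias is defined in SELECT
    # itself, so it is not expected to appear in WHERE.
    select_part = re.sub(
        r"\(.*?\bAS\s+\?\w+\s*\)", "", select_part, flags=re.S
    )
    where_vars = set(VAR.findall(where_part))
    return [v for v in VAR.findall(select_part) if v not in where_vars]
```

An empty list means every selected variable is bound somewhere in the query body. This is a text-level heuristic only; it does not parse SPARQL.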
I guess it's more of a question of what forms the proper nouns even have. There's doubtless a distinction between them and other nouns 🤔 But then I just checked the Czech proper nouns and there are tons of forms for every item 🤯 Surprising. I thought that names would just be names.
I'll give the proper nouns a check then and we can be good here :)
Just as a help: A proper noun is the name of a particular person, place, organization, or thing. For example, "London", "GitHub", "Scribe", "Michael", "Lagos". The difference between proper nouns and common nouns (other types of nouns) is specificity and capitalization. Examples of common nouns are: country, city, mountain, love, freedom, education.
I know, but there's no guarantee that the way a noun is modeled in a language is the same as how a proper noun is modeled :)
Thanks so much for the hard work here, @DeleMike! We're really making progress on these queries, and this will set us up nicely for getting all the tests up and running 😊
Oh I see! You are right 💯💯
Contributor checklist
Description
PR Description
This pull request proposes a refactor of the SPARQL queries to improve consistency and maintainability by splitting the combined queries into atomic files, each dedicated to a specific lexical category (e.g., nouns, prepositions, verbs). Previously, multiple data types were combined in single queries using the `VALUES` keyword, which complicated the validation process and made it harder to apply uniform patterns like `wikibase:lexicalCategory wd:Q\d+`. By separating the queries into distinct files, validation is streamlined, and future updates or changes can be managed more easily.
Testing
To validate these changes, I copied the `check_queries()` function from PR #371 to ensure the correctness of the data types in the refactored queries. I have confirmed that all data types in Scribe-Data are valid.
However, there are still issues with languages because the `language_metadata` file is not updated, which prevents proper validation of language QIDs. Once the metadata is updated, the validation should work for languages as well.
This refactor improves maintainability and simplifies the process of adding or modifying specific lexical categories in the future.
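As an illustration of why atomic queries are easier to validate, the sketch below applies the `wikibase:lexicalCategory wd:Q\d+` pattern to a query string. It is not the `check_queries()` function from PR #371, whose implementation is not shown here; `lexical_category_qids` and `is_atomic` are hypothetical names.

```python
import re

# The uniform pattern mentioned in the description, with the QID captured.
LEXICAL_CATEGORY = re.compile(r"wikibase:lexicalCategory\s+wd:(Q\d+)")


def lexical_category_qids(query: str) -> list[str]:
    """Return every QID that is pinned directly as a lexical category."""
    return LEXICAL_CATEGORY.findall(query)


def is_atomic(query: str) -> bool:
    """True when the query pins exactly one lexical category QID."""
    return len(set(lexical_category_qids(query))) == 1
```

A query that routes the category through a `VALUES` variable (for example `wikibase:lexicalCategory ?nounTypes`) does not match the pattern at all, so `is_atomic` returns `False` for it, while a query containing `wikibase:lexicalCategory wd:Q147276` passes.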
Related issue