Refactor SPARQL queries into atomic structures: #387

Merged
merged 5 commits into scribe-org:main on Oct 16, 2024

Conversation

DeleMike
Contributor

Also renamed directories that did not follow the naming convention

Contributor checklist


Description

PR Description

This pull request proposes a refactor of the SPARQL queries to improve consistency and maintainability by splitting the combined queries into atomic files, each dedicated to a specific lexical category (e.g., nouns, prepositions, verbs, etc.). Previously, multiple data types were combined in single queries using the VALUES keyword, which complicated the validation process and made it harder to apply uniform patterns like wikibase:lexicalCategory wd:Q\d+. By separating the queries into distinct files, validation is streamlined, and future updates or changes can be managed more easily.
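A minimal sketch of how the `wikibase:lexicalCategory wd:Q\d+` pattern could be applied once each file holds a single category. The helper names and the combined example query are hypothetical and are not taken from Scribe-Data:

```python
import re

# Pattern quoted in the PR description: a literal QID after wikibase:lexicalCategory.
LEXICAL_CATEGORY = re.compile(r"wikibase:lexicalCategory\s+wd:(Q\d+)")


def lexical_category_qids(query_text: str) -> list[str]:
    """Return every lexical-category QID written literally in a SPARQL query."""
    return LEXICAL_CATEGORY.findall(query_text)


def is_atomic(query_text: str) -> bool:
    """Treat a query as atomic when it names exactly one lexical category."""
    return len(set(lexical_category_qids(query_text))) == 1


# Atomic layout: the category QID sits directly after the predicate.
atomic_query = """
?lexeme dct:language wd:Q13955 ;
  wikibase:lexicalCategory wd:Q147276 ;
  wikibase:lemma ?noun .
"""

# Hypothetical combined layout: the category is a variable fed by VALUES,
# so the simple pattern finds nothing to validate.
combined_query = """
VALUES ?category { wd:Q1084 wd:Q147276 }
?lexeme dct:language wd:Q13955 ;
  wikibase:lexicalCategory ?category .
"""

print(lexical_category_qids(atomic_query))
print(is_atomic(combined_query))
```

The combined form hides the QIDs behind a variable, which is why a single regular expression cannot validate it.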

Testing

To validate these changes, I copied the check_queries() function from PR #371 to ensure the correctness of the data types in the refactored queries. I have confirmed that all data types in Scribe-Data are valid.

However, there are still issues with languages because the language_metadata file is not updated, which prevents proper validation of language QIDs. Once the metadata is updated, the validation should work for languages as well.
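To illustrate why the language check depends on the metadata, here is a rough sketch of such a comparison. It is not the `check_queries()` function from PR #371. The metadata layout (`{"arabic": {"qid": ...}}`) and the function name are assumptions made for the example:

```python
import re
from typing import Optional

LANGUAGE_QID = re.compile(r"dct:language\s+wd:(Q\d+)")

# Hypothetical stand-in for language_metadata.json; the real layout may differ.
language_metadata = {"arabic": {"qid": "Q13955"}}


def check_language_qid(language: str, query_text: str, metadata: dict) -> Optional[str]:
    """Return an error message if the query's language QID is wrong, else None."""
    match = LANGUAGE_QID.search(query_text)
    if match is None:
        return f"{language}: no language QID found in query"

    expected = metadata.get(language.lower(), {}).get("qid")
    if expected is None:
        # This is the situation described above: nothing to validate against.
        return f"{language}: missing from metadata"

    if match.group(1) != expected:
        return f"{language}: query uses {match.group(1)}, metadata has {expected}"
    return None


query = "?lexeme dct:language wd:Q13955 ;"
print(check_language_qid("Arabic", query, language_metadata))
print(check_language_qid("Czech", query, language_metadata))
```

A language that is absent from the metadata cannot be confirmed or rejected, so the check only becomes meaningful once the file is complete.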


This refactor improves maintainability and simplifies the process of adding or modifying specific lexical categories in the future.

Related issue

Also renamed directories that did not follow naming convention

github-actions bot commented Oct 16, 2024

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

  • The linting and formatting workflow within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

@DeleMike
Contributor Author

@andrewtavis, I have refactored the language_data_extraction to follow the new convention - keeping queries "atomic".

This PR touches many files, especially those for nouns (now split into nouns and proper nouns) and prepostpositions (now split into prepositions and postpositions).

I also ran the query checks, and all data_type QID checks passed. The language QID checks are still outstanding, but they should work automatically once language_metadata.json is sorted out.

See the file here.

What do you think?

@andrewtavis andrewtavis added the hacktoberfest-accepted Accepted as a part of Hacktoberfest label Oct 16, 2024
@andrewtavis andrewtavis self-requested a review October 16, 2024 16:20
@andrewtavis
Member

Hey @DeleMike! 👋 This is amazing 🤩 One thing here is that I'm not sure if we actually need all of the forms from noun queries for the new proper noun queries 🤔 Can you give it a check, and maybe for these new ones it's as simple as singular and plural :)

@DeleMike
Contributor Author

> Hey @DeleMike! 👋 This is amazing 🤩 One thing here is that I'm not sure if we actually need all of the forms from noun queries for the new proper noun queries 🤔 Can you give it a check, and maybe for these new ones it's as simple as singular and plural :)

Thank you @andrewtavis, could you explain what you mean? An example would help.

My current understanding is that you only want singular and plural forms for proper nouns, with the gender variants removed. For example, here is what Arabic/proper_nouns/query_proper_nouns.sparql would look like:

# tool: scribe-data
# Arabic (Q13955) proper nouns with singular and plural forms.
# Enter this query at https://query.wikidata.org/.

SELECT
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
  ?noun
  ?singularNominative
  ?pluralNominative
  ?singularAccusative
  ?pluralAccusative
  ?singularGenitive
  ?pluralGenitive

WHERE {
  ?lexeme dct:language wd:Q13955 ;
    wikibase:lexicalCategory wd:Q147276 ;
    wikibase:lemma ?noun .

  # Singular Nominative
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?singularNominativeForm .
    ?singularNominativeForm ontolex:representation ?singularNominative ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q131105 .
  }

  # Plural Nominative
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?pluralNominativeForm .
    ?pluralNominativeForm ontolex:representation ?pluralNominative ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q131105 .
  }

  # Singular Accusative
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?singularAccusativeForm .
    ?singularAccusativeForm ontolex:representation ?singularAccusative ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q146078 .
  }

  # Plural Accusative
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?pluralAccusativeForm .
    ?pluralAccusativeForm ontolex:representation ?pluralAccusative ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q146078 .
  }

  # Singular Genitive
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?singularGenitiveForm .
    ?singularGenitiveForm ontolex:representation ?singularGenitive ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q146233 .
  }

  # Plural Genitive
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?pluralGenitiveForm .
    ?pluralGenitiveForm ontolex:representation ?pluralGenitive ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q146233 .
  }
}

compared against

# tool: scribe-data
# All Arabic (Q13955) proper nouns.
# Enter this query at https://query.wikidata.org/.

SELECT
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
  ?noun

  ?femSingularNominativeIndef
  ?masSingularNominativeIndef
  ?femDualNominativeIndef
  ?masDualNominativeIndef
  ?femPluralNominativeIndef
  ?masPluralNominativeIndef

  ?femSingularAccusativeIndef
  ?masSingularAccusativeIndef
  ?femDualAccusativeIndef
  ?masDualAccusativeIndef
  ?femPluralAccusativeIndef
  ?masPluralAccusativeIndef

  ?femSingularGenitiveIndef
  ?masSingularGenitiveIndef
  ?femDualGenitiveIndef
  ?masDualGenitiveIndef
  ?femPluralGenitiveIndef
  ?masPluralGenitiveIndef

  ?femSingularPausalIndef
  ?masSingularPausalIndef
  ?femDualPausalIndef
  ?masDualPausalIndef
  ?femPluralPausalIndef
  ?masPluralPausalIndef

WHERE {

  ?lexeme dct:language wd:Q13955 ;
    wikibase:lexicalCategory wd:Q147276 ;
    wikibase:lemma ?noun .

  # MARK: Nominative

  # Singular

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femSingularNominativeIndefForm .
    ?femSingularNominativeIndefForm ontolex:representation ?femSingularNominativeIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q131105 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masSingularNominativeIndefForm .
    ?masSingularNominativeIndefForm ontolex:representation ?masSingularNominativeIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q131105 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # Dual

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femDualNominativeIndefForm .
    ?femDualNominativeIndefForm ontolex:representation ?femDualNominativeIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q110022 ;
      wikibase:grammaticalFeature wd:Q131105 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masDualNominativeIndefForm .
    ?masDualNominativeIndefForm ontolex:representation ?masDualNominativeIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q110022 ;
      wikibase:grammaticalFeature wd:Q131105 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # Plural

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femPluralNominativeIndefForm .
    ?femPluralNominativeIndefForm ontolex:representation ?femPluralNominativeIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q131105 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masPluralNominativeIndefForm .
    ?masPluralNominativeIndefForm ontolex:representation ?masPluralNominativeIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q131105 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # MARK: Accusative

  # Singular

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femSingularAccusativeIndefForm .
    ?femSingularAccusativeIndefForm ontolex:representation ?femSingularAccusativeIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q146078 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masSingularAccusativeIndefForm .
    ?masSingularAccusativeIndefForm ontolex:representation ?masSingularAccusativeIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q146078 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # Dual

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femDualAccusativeIndefForm .
    ?femDualAccusativeIndefForm ontolex:representation ?femDualAccusativeIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q110022 ;
      wikibase:grammaticalFeature wd:Q146078 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masDualAccusativeIndefForm .
    ?masDualAccusativeIndefForm ontolex:representation ?masDualAccusativeIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q110022 ;
      wikibase:grammaticalFeature wd:Q146078 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # Plural

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femPluralAccusativeIndefForm .
    ?femPluralAccusativeIndefForm ontolex:representation ?femPluralAccusativeIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q146078 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masPluralAccusativeIndefForm .
    ?masPluralAccusativeIndefForm ontolex:representation ?masPluralAccusativeIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q146078 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # MARK: Genitive

  # Singular

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femSingularGenitiveIndefForm .
    ?femSingularGenitiveIndefForm ontolex:representation ?femSingularGenitiveIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q146233 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masSingularGenitiveIndefForm .
    ?masSingularGenitiveIndefForm ontolex:representation ?masSingularGenitiveIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q146233 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # Dual

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femDualGenitiveIndefForm .
    ?femDualGenitiveIndefForm ontolex:representation ?femDualGenitiveIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q110022 ;
      wikibase:grammaticalFeature wd:Q146233 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masDualGenitiveIndefForm .
    ?masDualGenitiveIndefForm ontolex:representation ?masDualGenitiveIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q110022 ;
      wikibase:grammaticalFeature wd:Q146233 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # Plural

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femPluralGenitiveIndefForm .
    ?femPluralGenitiveIndefForm ontolex:representation ?femPluralGenitiveIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q146233 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masPluralGenitiveIndefForm .
    ?masPluralGenitiveIndefForm ontolex:representation ?masPluralGenitiveIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q146233 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # MARK: Pausal

  # Singular

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femSingularPausalIndefForm .
    ?femSingularPausalIndefForm ontolex:representation ?femSingularPausalIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q117262361 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masSingularPausalIndefForm .
    ?masSingularPausalIndefForm ontolex:representation ?masSingularPausalIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q110786 ;
      wikibase:grammaticalFeature wd:Q117262361 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # Dual

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femDualPausalIndefForm .
    ?femDualPausalIndefForm ontolex:representation ?femDualPausalIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q110022 ;
      wikibase:grammaticalFeature wd:Q117262361 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masDualPausalIndefForm .
    ?masDualPausalIndefForm ontolex:representation ?masDualPausalIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q110022 ;
      wikibase:grammaticalFeature wd:Q117262361 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  # Plural

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?femPluralPausalIndefForm .
    ?femPluralPausalIndefForm ontolex:representation ?femPluralPausalIndef ;
      wikibase:grammaticalFeature wd:Q1775415 ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q117262361 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?masPluralPausalIndefForm .
    ?masPluralPausalIndefForm ontolex:representation ?masPluralPausalIndef ;
      wikibase:grammaticalFeature wd:Q499327 ;
      wikibase:grammaticalFeature wd:Q146786 ;
      wikibase:grammaticalFeature wd:Q117262361 ;
      wikibase:grammaticalFeature wd:Q53997857 ;
  } .
}
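With 24 near-identical OPTIONAL blocks, the names in SELECT and the names bound in WHERE can drift apart without the query failing: an unbound variable just yields an empty column. A rough sketch of a check for this, assuming a plain text scan is good enough (it does not parse SPARQL, and the function name is hypothetical):

```python
import re

VARIABLE = re.compile(r"\?(\w+)")
ALIAS = re.compile(r"AS\s+\?(\w+)")


def unbound_select_variables(query_text: str) -> list[str]:
    """List SELECT variables that never appear in the WHERE body."""
    # Drop full-line comments so they cannot contribute variable names.
    lines = [ln for ln in query_text.splitlines() if not ln.lstrip().startswith("#")]
    select_part, _, where_part = "\n".join(lines).partition("WHERE")

    # Aliases such as "(... AS ?lexemeID)" are bound by their own expression.
    aliases = set(ALIAS.findall(select_part))
    selected = [v for v in VARIABLE.findall(select_part) if v not in aliases]
    bound = set(VARIABLE.findall(where_part))

    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [v for v in dict.fromkeys(selected) if v not in bound]


sample = """
SELECT
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
  ?noun
  ?plural

WHERE {
  ?lexeme wikibase:lemma ?noun .
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?pluralForm .
    ?pluralForm ontolex:representation ?plurl .
  }
}
"""

print(unbound_select_variables(sample))
```

In the sample, `?plural` is selected but the WHERE body binds `?plurl`, so the function reports `plural`.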

@andrewtavis
Member

I guess it's more a question of what forms the proper nouns even have. There's doubtless a distinction between them and other nouns 🤔 But then I just checked the Czech proper nouns and there are tons of forms for every item 🤯 Surprising. I thought that names would just be names.

@andrewtavis
Member

I'll give the proper nouns a check then and we can be good here :)

@DeleMike
Contributor Author

Just to help:

A proper noun is the name of a particular person, place, organization, or thing. For example: "London", "GitHub", "Scribe", "Michael", "Lagos". Proper nouns differ from common nouns in specificity and capitalization. Examples of common nouns are: country, city, mountain, love, freedom, education.

@andrewtavis
Member

I know, but there's no guarantee that the way a noun is modeled in a language is the same as how a proper noun is modeled :)
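One way to see how proper nouns are actually modeled in a given language is to count which grammatical features their forms carry. The sketch below only builds such a query as a string. The function name is hypothetical, the query is an illustration rather than part of this PR, and it would still need to be run at https://query.wikidata.org/:

```python
import re


def build_feature_inventory_query(language_qid: str, category_qid: str) -> str:
    """Build a SPARQL query counting forms per grammatical feature."""
    for qid in (language_qid, category_qid):
        if re.fullmatch(r"Q\d+", qid) is None:
            raise ValueError(f"not a QID: {qid!r}")

    # Doubled braces are literal braces inside an f-string.
    return f"""
SELECT ?feature (COUNT(DISTINCT ?form) AS ?formCount)
WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
    wikibase:lexicalCategory wd:{category_qid} ;
    ontolex:lexicalForm ?form .
  ?form wikibase:grammaticalFeature ?feature .
}}
GROUP BY ?feature
ORDER BY DESC(?formCount)
""".strip()


# Arabic (Q13955) proper nouns (Q147276), the QIDs used in the queries above.
print(build_feature_inventory_query("Q13955", "Q147276"))
```

Comparing the output for nouns and for proper nouns would show whether the two categories really share the same set of forms.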

Member

@andrewtavis andrewtavis left a comment


Thanks so much for the hard work here, @DeleMike! We're really making progress on these queries, and this will set us up nicely for getting all the tests up and running 😊

@andrewtavis andrewtavis merged commit 999d862 into scribe-org:main Oct 16, 2024
5 checks passed
@DeleMike
Contributor Author

> I know, but there's no guarantee that the way a noun is modeled in a language is the same as how a proper noun is modeled :)

Oh I see! You are right 💯💯

Labels
hacktoberfest-accepted Accepted as a part of Hacktoberfest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor Query Structure for Atomic SPARQL Queries
2 participants