Use Wikifunctions Beta for generating default forms #140

Open
vrandezo opened this issue Dec 14, 2022 · 12 comments

@vrandezo

vrandezo commented Dec 14, 2022

Proposal / Feature request

Regular forms can often be generated by a function. Wikifunctions Beta already has a number of functions that, in most cases, generate the right form for a given lemma. The suggestion is:

  1. to add an optional field on each form that holds the ZID of the function to call to generate a specific form from the first form (i.e. the lemma)
  2. once the first form is entered, call the given functions (either automatically or through an explicit action) and fill in the forms based on the generated results

A warning that the forms should still be checked would be good, as these functions are rarely right in every case.

A JavaScript function that runs in the browser and can call a function on Wikifunctions can be found here:

https://github.com/vrandezo/formcheck/blob/e0aa0e6bca94f35f66877e485df1aea6d529ecbd/index.html#L600

(I wanted to code this up myself and send a pull request, alas, I couldn't get lexeme-forms to run. Sorry)
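A minimal Python sketch of the two-step proposal, assuming a hypothetical optional per-form field `wikifunction` that holds a ZID and a hypothetical `call_function(zid, argument)` callable standing in for the actual Wikifunctions call (the linked formcheck code does this in browser JS; none of the names below come from it). Only empty forms are filled, so nothing the user typed is overwritten.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature: call_function(zid, argument) runs the Wikifunctions
# function with that ZID on a single string argument and returns a string.
FunctionCaller = Callable[[str, str], str]


def fill_forms(
    template_forms: List[Dict],
    form_values: List[str],
    call_function: FunctionCaller,
) -> List[str]:
    """Fill empty forms by calling each form's function on the first form.

    'wikifunction' is an assumed optional per-form field holding a ZID;
    form_values[0] is the first form, i.e. the lemma.
    """
    result = list(form_values)
    lemma = result[0]
    if not lemma:
        return result  # nothing to generate from yet
    for index in range(1, len(template_forms)):
        zid: Optional[str] = template_forms[index].get('wikifunction')
        if zid is None or result[index]:
            continue  # no function for this form, or the user already typed something
        result[index] = call_function(zid, lemma)
    return result


# Usage with a fake function instead of a real Wikifunctions call:
fake_functions = {'Z10001': lambda word: word + 's'}  # hypothetical ZID
forms = [{'label': 'singular'}, {'label': 'plural', 'wikifunction': 'Z10001'}]
print(fill_forms(forms, ['cat', ''], lambda zid, arg: fake_functions[zid](arg)))
# ['cat', 'cats']
```

The fake caller keeps the sketch offline; a real version would replace it with an API request to Wikifunctions.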

@lucaswerkmeister
Owner

Wikifunctions Beta already has a number of functions that in most cases generate the right form for a given lemma.

It would be helpful to have some examples for these.

to add an optional field on each form that holds the ZID of the function to call to generate a specific form from the first form (i.e. the lemma)

Would this be part of the templates (hard-coded for each template) or part of the user interface (users can specify arbitrary ZIDs)?

@vrandezo
Author

Here are a few examples:

The functions would be hardcoded for each template and each form. The end user would not need to see ZIDs nor would they need to select functions. They are created once when the template is created and stored with the template.

I made some slideware to show what I mean:
https://docs.google.com/presentation/d/1xSZNm4yaoICtPKX6l_mQwfUodpXTSepdpf8GKtapbDo/edit#slide=id.g1b9b89ec891_0_14

@vrandezo
Author

vrandezo commented Dec 15, 2022

Here's an example page of what the templates could look like for German feminine substantives:

https://www.wikidata.org/wiki/Wikidata:Wikidata_Lexeme_Forms/German_with_ZID

@vrandezo
Author

Does this sound like something you'd want to have in the tool?

@lucaswerkmeister
Owner

lucaswerkmeister commented Jan 9, 2023

Maybe, yes. But at least while it’s targeting the Beta cluster, it should probably be an opt-in setting, which means I should finish the language-preference branch first (which introduces settings for the tool in general).

@lucaswerkmeister
Owner

I guess we can rethink this now that non-Beta Wikifunctions launched :D (and also allows anonymous function calls, because limiting the feature to users who have a certain user right would be a bit annoying)

So: optionally, a non-first¹ form in a template can specify a Wikifunction that’s called whenever the first form¹ changes, to generate the other forms?

I think I’d only want to do this in the JS frontend, so that users can see (and potentially correct) the output before submitting it to the server; we should probably show some kind of loading indicator next to the first form¹ while making the function calls, so the user knows that they should wait instead of starting to enter the next forms.

(Actually, users are probably already waiting for a little bit to see if the duplicate warning shows up or not. I wonder if the duplicate warning should also come with a loading indicator, so that you have an indication when it finished looking for duplicates and none were found 🤔)

¹ note that there’s also a branch (@nikkiwd did you ever get around to testing it?) to make the “lemma” form not necessarily be the first form, in which case the function input should probably also be the lemma, not necessarily the first form?

@lucaswerkmeister
Owner

I wonder what’s better: making the function call API requests from JS or from Python?

  • JS
    • API requests will be anonymous because of CORS
    • no custom user agent (but referrer, at least)
    • tool can show each form as soon as it comes in
  • Python
    • API requests can be authenticated
    • tool-specific user agent
    • tool probably waits until all the forms are done before responding to JS
    • probably faster – all the MediaWiki API requests only travel between Toolforge and the production cluster, and the potentially longer round-trip time to the end user is only paid once
    • but the tool can only show the forms when they’re all done

I think I’ll try the Python approach first and see how bad it is to have to wait for all the evaluations to finish. (I can test it for one of the really long templates, like Czech adjectives – the functions can be fake.)
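A sketch of the Python variant with fake functions, as suggested above. The thread pool is an assumption of this sketch (the thread doesn't say whether the tool runs the calls in parallel); it just lets the network round trips overlap, while the JS frontend still only gets a response once every call is done. `compute_all_forms` and `fake_call` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def compute_all_forms(
    lemma: str,
    zids_by_form: Dict[int, str],
    call_function: Callable[[str, str], str],
    max_workers: int = 8,
) -> Dict[int, str]:
    """Run every form's function on the lemma and return all results at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {
            index: executor.submit(call_function, zid, lemma)
            for index, zid in zids_by_form.items()
        }
        # result() blocks until that call finishes and re-raises its exception, if any
        return {index: future.result() for index, future in futures.items()}


def fake_call(zid: str, argument: str) -> str:
    # "the functions can be fake": no network, just tag the input with the ZID
    return f'{zid}({argument})'


print(compute_all_forms('rychlý', {1: 'Zfake1', 2: 'Zfake2'}, fake_call))
# {1: 'Zfake1(rychlý)', 2: 'Zfake2(rychlý)'}
```

Swapping `fake_call` for one that sleeps a little per call would be one way to gauge how long a long template (like Czech adjectives) keeps the user waiting.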

lucaswerkmeister added a commit that referenced this issue Oct 28, 2023
Very quickly slapped together, doesn’t work with lemmas that aren’t
first forms yet (which I keep forgetting are already possible without
the lemma branch, in advanced mode). Proof of concept.

The workflow isn’t great either. At the moment, this fires as soon as
you “commit” a change to the first input, and when the results come in,
overwrites any inputs that are empty. But this means that, if you go
back and change the lemma, the recomputation won’t update anything,
since all the inputs will still be nonempty with the computation. So the
computation should probably track what the value of each input at the
beginning of the computation was, and update it when it’s still the same
afterwards. (The point of the check at all is that I don’t want to
overwrite what the user did if they started typing in the meantime.)
Also the console.log should be something properly visible (spinner?).

See #140.
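The fix proposed in this commit message (remember each input's value when the computation starts, and only overwrite inputs that still have that value when the results arrive) can be sketched as below. The commit's code is JS; this Python version and the name `apply_results` are only illustrative.

```python
from typing import Dict, List


def apply_results(
    snapshot: List[str],
    current: List[str],
    results: Dict[int, str],
) -> List[str]:
    """Apply computed forms only to inputs the user hasn't touched.

    snapshot: the input values when the computation started
    current:  the input values when the results arrived
    results:  computed value per input index
    """
    updated = list(current)
    for index, value in results.items():
        # unchanged since the computation started (empty or an earlier
        # computed value), so it is safe to overwrite
        if current[index] == snapshot[index]:
            updated[index] = value
    return updated


# The user changed the lemma from Katze to Tasse and, while the new results
# were loading, started typing into the third input (hypothetical scenario).
snapshot = ['Tasse', 'Katzen', 'Katzen']
current = ['Tasse', 'Katzen', 'Tass']
print(apply_results(snapshot, current, {1: 'Tassen', 2: 'Tassen'}))
# ['Tasse', 'Tassen', 'Tass']
```

Unlike the "only fill empty inputs" rule, this also updates inputs that hold stale results from an earlier lemma.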
@lucaswerkmeister
Owner

Pushed a proof-of-concept on the compute branch; I think I’ll iterate on it a bit more tomorrow.

The string identity function is nice for initial testing, but are there any functions on Wikifunctions yet that we could use for real templates?

(Also, what’s a good verb for this feature? I went with “compute” in the above commit, but that was just the first thing that came into my head.)

@nikkiwd
Contributor

nikkiwd commented Oct 28, 2023

(Actually, users are probably already waiting for a little bit to see if the duplicate warning shows up or not. I wonder if the duplicate warning should also come with a loading indicator, so that you have an indication when it finished looking for duplicates and none were found 🤔)

It's effectively instantaneous for me, so I would find a loading indicator more annoying than useful. If you really want to add something like that, I think it would make more sense to display the final state (no duplicates found, duplicates were found, or wasn't able to check for duplicates).

¹ note that there’s also a branch (@nikkiwd did you ever get around to testing it?) to make the “lemma” form not necessarily be the first form, in which case the function input should probably also be the lemma, not necessarily the first form?

I didn't 😔 Maybe you can bug me about it again after WikidataCon? 😅

By the way, a couple of years ago I made myself a little browser extension which generates forms - https://github.com/nikkiwd/extension-lexeme-forms. It's essentially the same concept as this, but with the functions in the extension. I never finished adding tests or cleaning up the uncommitted changes and local branches, so I'm still not totally happy with it, but it's been working well enough for my purposes.

The way I did it, I add buttons to the top right. Most of the time there's only one, but sometimes I have multiple for different types of declension. Clicking on a button takes the input of the first field (since that's what the Lexeme Forms tool currently uses as the lemma, or in edit mode it can also find the lemma from the heading) and calls the function associated with that button. The function returns an array of forms and those are used to fill in any gaps in the template (it doesn't overwrite anything). Then I can check it and make any changes that are needed before submitting it.

Here's what it looks like:

[screenshot]

For English, it says "guess forms" since there are plenty of irregular plurals it will get wrong, whereas for Esperanto it says "generate forms" because those forms are regular. (Probably a subtle difference but that's why the text isn't the same)

For German, since there are various ways to form plurals, the buttons are labelled based on the suffixes used for the genitive singular and nominative plural forms.

@lucaswerkmeister
Owner

Hm, I see. I was wondering if we could use the Wikilambda function labels for the buttons (so that they can be translated into the user language), but looking at that screenshot I don’t think that would work out… so let’s make it part of the template (in the template language), I guess. Something like:

'template-name': {
    # ...
    'forms': [
        {
            'label': '...',
            'example': '...',
            'grammatical_features_item_ids': [...],
            'wikifunctions': {
                '-s/-n': 'Z12345',
                '-s/-s': 'Z123456',
                # ...
            },
        },
    ],
},
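One way the frontend could turn such a template into buttons, assuming each distinct key of the per-form `wikifunctions` dicts becomes one button. The helper names, form labels and the ZID `Z123457` are hypothetical placeholders; `Z12345` and `Z123456` are the placeholders from the snippet above.

```python
from typing import Dict, List


def button_labels(template: Dict) -> List[str]:
    """Distinct keys of all forms' 'wikifunctions' dicts, in first-seen order."""
    labels: List[str] = []
    for form in template['forms']:
        for label in form.get('wikifunctions', {}):
            if label not in labels:
                labels.append(label)
    return labels


def functions_for_button(template: Dict, label: str) -> Dict[int, str]:
    """Map form index to ZID for every form that the given button can generate."""
    return {
        index: form['wikifunctions'][label]
        for index, form in enumerate(template['forms'])
        if label in form.get('wikifunctions', {})
    }


template = {
    'forms': [
        {'label': 'nominative singular'},  # the first form (lemma) has no functions
        {'label': 'genitive singular', 'wikifunctions': {'-s/-n': 'Z12345', '-s/-s': 'Z123456'}},
        {'label': 'nominative plural', 'wikifunctions': {'-s/-s': 'Z123457'}},
    ],
}
print(button_labels(template))                  # ['-s/-n', '-s/-s']
print(functions_for_button(template, '-s/-s'))  # {1: 'Z123456', 2: 'Z123457'}
```

Keeping the labels as dict keys in the template language means the button text lives next to the functions it triggers.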

I also wonder whether we can already accommodate non-first forms as the lemma. Currently, you can use advanced mode to create a plurale tantum lexeme with the first plural form as the lemma. You could imagine having a function that generates, say, the genitive plural from the nominative plural – but this would presumably be a different function than the one that generates the genitive plural from the nominative singular. So the specification would be some sort of set of wikifunctions specifying the input forms for them… on second thought, let’s leave that out for now and start with the simpler case ^^

lucaswerkmeister added a commit that referenced this issue Oct 29, 2023
Each template form can optionally list one or more Wikifunctions that
can be used to generate this form from the first form. (There is no
support for generating forms from any other forms yet, even though that
may be useful for e.g. pluralia tantum. Maybe later.) A new API
(internal, as far as I’m concerned, though I’m not preventing anyone
else from using it I suppose) can be used to call all the functions and
generate the other forms, returning them together – this way, the
network traffic for the (potentially many) function calls happens inside
the Wikimedia network and should thus be faster.

At the moment, users have to opt into this feature, by creating the page
[[Special:MyPage/wikidata-lexeme-forms-opt-into-wikifunctions.js]] [1]
– the .js in the title ensures that, apart from interface admins, nobody
else can opt a user into or out of the feature against their will (user
JS pages are protected by default). I expect this will be made available
to everyone before long.

Only the English nouns template has a function added yet, but I’ll
probably try to add more quite soon (the catalogue [2] lists some more
for other languages).

Part of #140.

[1]: https://www.wikifunctions.org/wiki/Special:MyPage/wikidata-lexeme-forms-opt-into-wikifunctions.js
[2]: https://www.wikifunctions.org/wiki/Wikifunctions:Catalogue
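A sketch of how the opt-in could be checked: ask the Wikifunctions MediaWiki API (`action=query`) whether the user's page exists. This is an assumption about how such a check might look, not the tool's actual code; the helper names are hypothetical. With `formatversion=2`, `query.pages` is a list and a nonexistent page carries `"missing": true`.

```python
from typing import Any, Dict
from urllib.parse import urlencode

OPT_IN_PAGE = 'wikidata-lexeme-forms-opt-into-wikifunctions.js'


def opt_in_title(username: str) -> str:
    # Special:MyPage/X resolves to User:<current user>/X
    return f'User:{username}/{OPT_IN_PAGE}'


def opt_in_query(username: str) -> str:
    """Query string for an existence check against Wikifunctions' api.php."""
    return urlencode({
        'action': 'query',
        'titles': opt_in_title(username),
        'format': 'json',
        'formatversion': '2',
    })


def has_opted_in(response: Dict[str, Any]) -> bool:
    """Interpret the JSON response: opted in iff the single page is not missing."""
    pages = response['query']['pages']
    return len(pages) == 1 and not pages[0].get('missing', False)


print(opt_in_title('Example'))
# User:Example/wikidata-lexeme-forms-opt-into-wikifunctions.js
print(has_opted_in({'query': {'pages': [{'ns': 2, 'title': opt_in_title('Example'), 'missing': True}]}}))
# False
```

Because the page is a user JS page, only the user (and interface admins) can create or delete it, which is what makes a plain existence check sufficient.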
@lucaswerkmeister
Owner

Alright, I added the functions for Croatian nouns, and doing all the function calls is quite a bit faster on Toolforge than on local development, so I think the reduction in runtime from doing all the calls in Python is real – I’ll keep this approach then. (Also, with the button as suggested by @nikkiwd rather than the automatic action I had in mind, I think it’s more acceptable to have the user wait until the results are there – they’re waiting for the result of an action they explicitly initiated.)

lucaswerkmeister referenced this issue Oct 29, 2023
Needs to be checked with a Croatian speaker (probably Denny) – I notice
for “miš”, not all the results match the placeholders in the template –
but as long as this is experimental and opt-in, I think this is okay to
already add, so I can demo it :)
lucaswerkmeister referenced this issue Oct 29, 2023
Noticed while testing the functions for Croatian feminine nouns. To be
investigated further.
lucaswerkmeister added a commit that referenced this issue Oct 29, 2023
IIUC they only work for regular words, but let’s still have them for
now, I think. Part of #140.
@lucaswerkmeister
Owner

An experimental version of this is now deployed (opt in by creating this Wikifunctions user JS page); see the documentation, and the announcement for some next steps.

lucaswerkmeister added a commit that referenced this issue Nov 4, 2023
German will have several buttons for different declensions (as seen in
@nikkiwd’s comment on #140), so to keep the button labels short without
making the feature impossible to understand, introduce an optional
wikifunctions_intro string that can be put before the individual
buttons. (A test ensures that templates without wikifunctions aren’t
allowed to have a wikifunctions_intro either.) To make enough space for
this intro, move the buttons below the heading (and no longer floating
right). Also, put a noscript hint after the intro.
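The test mentioned above could look roughly like this, assuming `wikifunctions_intro` is a template-level key and `wikifunctions` a per-form key (the template names are hypothetical and the real test may be structured differently):

```python
from typing import Dict, List


def templates_with_stray_intro(templates: Dict[str, Dict]) -> List[str]:
    """Names of templates with a wikifunctions_intro but no form with wikifunctions."""
    stray = []
    for name, template in templates.items():
        has_wikifunctions = any(form.get('wikifunctions') for form in template['forms'])
        if 'wikifunctions_intro' in template and not has_wikifunctions:
            stray.append(name)
    return stray


templates = {
    'with-functions': {
        'wikifunctions_intro': '...',
        'forms': [{}, {'wikifunctions': {'-/-n': 'Z12345'}}],
    },
    'stray-intro': {'wikifunctions_intro': '...', 'forms': [{}, {}]},
    'plain': {'forms': [{}, {}]},
}
print(templates_with_stray_intro(templates))  # ['stray-intro']
```

In a test suite, the assertion would simply be that this list is empty for the real templates.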
lucaswerkmeister added a commit that referenced this issue Nov 4, 2023
lucaswerkmeister added a commit that referenced this issue Nov 4, 2023
With that, we have a pretty good set of German feminine noun
Wikifunctions, I think: Katze→Katzen; Kamera→Kameras; Kuh→Kühe. (Also
fix an inaccurate label of an earlier set of Wikifunctions.)

Some of these functions can also be used for other German nouns, but
I’ll figure that out later – the demo of the mini hackathon
(https://nl.wikimedia.org/wiki/Mini_Hackathon_November_2023) is
approaching! \o/

Part of #140.
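To illustrate what the three German feminine noun patterns do, here are toy Python stand-ins. They are not the actual Wikifunctions implementations, and the umlaut rule is deliberately naive (it umlauts the last a/o/u, treating "au" as one vowel, and ignores all irregular cases).

```python
UMLAUT = {'a': 'ä', 'o': 'ö', 'u': 'ü', 'A': 'Ä', 'O': 'Ö', 'U': 'Ü'}


def umlaut_last_vowel(word: str) -> str:
    """Umlaut the last a/o/u in the word (au becomes äu); naive on purpose."""
    for i in range(len(word) - 1, -1, -1):
        if word[i] == 'u' and i > 0 and word[i - 1] == 'a':
            return word[:i - 1] + 'äu' + word[i + 1:]
        if word[i] in UMLAUT:
            return word[:i] + UMLAUT[word[i]] + word[i + 1:]
    return word


def plural_n(lemma: str) -> str:
    return lemma + 'n'  # Katze -> Katzen


def plural_s(lemma: str) -> str:
    return lemma + 's'  # Kamera -> Kameras


def plural_umlaut_e(lemma: str) -> str:
    return umlaut_last_vowel(lemma) + 'e'  # Kuh -> Kühe


for function, lemma in [(plural_n, 'Katze'), (plural_s, 'Kamera'), (plural_umlaut_e, 'Kuh')]:
    print(f'{lemma} -> {function(lemma)}')
# Katze -> Katzen
# Kamera -> Kameras
# Kuh -> Kühe
```

This is also why a template needs several buttons: nothing in the lemma alone tells the tool which of these patterns applies.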
lucaswerkmeister added a commit that referenced this issue Nov 19, 2023
-en/-um isn’t the most common case, but it’s relatively simple. More
will follow as part of #140.

However, this does require support for slashes in the function name, so
swap the function name and lemma around in the API URL and instruct
Flask to parse the function name as a “path” (meaning it can contain
slashes) rather than the lemma. Note that the frontend JS code was
already splitting the input on slashes and only sending the first one to
the API, so there was no point for the backend to support slashes there;
if there really is a use case for slashes in the very first form (I
doubt it, to be honest – ffb3d07 didn’t really justify it), then I
suppose we can handle it in the JS code by just making several
consecutive API calls.
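The swapped URL layout can be sketched without Flask: the lemma is the segment up to the first slash, and everything after it is the function name, which may contain slashes. Flask's `path` converter is what allows a URL variable to contain slashes (the default `string` converter does not); the actual route prefix isn't given here, so the helpers below (hypothetical names) only deal with the `<lemma>/<function name>` part.

```python
from typing import Tuple


def lemma_for_api(first_form: str) -> str:
    # The frontend only sends the part of the first form before any slash.
    return first_form.split('/')[0]


def split_api_path(rest: str) -> Tuple[str, str]:
    """Split '<lemma>/<function name>' at the first slash only.

    The lemma contains no slash, but the function name (a button label
    like '-s/-s') can.
    """
    lemma, _, function_name = rest.partition('/')
    return lemma, function_name


print(split_api_path(lemma_for_api('Taxi') + '/-s/-s'))  # ('Taxi', '-s/-s')
```

Putting the slash-free lemma first is what makes this split unambiguous.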
lucaswerkmeister added a commit that referenced this issue Nov 19, 2023
Still not done but getting closer. Part of #140.

Example word for -s/-s: Taxi; example word for -s/-n: Römer.
lucaswerkmeister added a commit that referenced this issue Nov 19, 2023
lucaswerkmeister added a commit that referenced this issue Nov 19, 2023
I was just totally wrong when I wrote in the previous commit that that
was the last set of neuter functions from the extension; I missed this
one. Still part of #140. Example words: Kind, Kalb, Haus, Loch, Buch…