Use Wikifunctions Beta for generating default forms #140
Comments
It would be helpful to have some examples for these.
Would this be part of the templates (hard-coded for each template) or part of the user interface (users can specify arbitrary ZIDs)?
Here are a few examples:
The functions would be hardcoded for each template and each form. The end user would not need to see ZIDs nor would they need to select functions. They are created once when the template is created and stored with the template. I made some slideware to show what I mean:
Here's an example page of what the templates could look like for German feminine nouns: https://www.wikidata.org/wiki/Wikidata:Wikidata_Lexeme_Forms/German_with_ZID
Does this sound like something you'd want to have in the tool?
Maybe, yes. But at least while it’s targeting the Beta cluster, it should probably be an opt-in setting, which means I should finish the
I guess we can rethink this now that non-Beta Wikifunctions launched :D (and also allows anonymous function calls, because limiting the feature to users who have a certain user right would be a bit annoying)
So: optionally, a non-first¹ form in a template can specify a Wikifunction that’s called whenever the first form¹ changes, to generate the other forms?
I think I’d only want to do this in the JS frontend, so that users can see (and potentially correct) the output before submitting it to the server; we should probably show some kind of loading indicator next to the first form¹ while making the function calls, so the user knows that they should wait instead of starting to enter the next forms. (Actually, users are probably already waiting for a little bit to see if the duplicate warning shows up or not. I wonder if the duplicate warning should also come with a loading indicator, so that you have an indication when it finished looking for duplicates and none were found 🤔)
¹ note that there’s also a branch (@nikkiwd did you ever get around to testing it?) to make the “lemma” form not necessarily be the first form, in which case the function input should probably also be the lemma, not necessarily the first form?
I wonder what’s better: making the function call API requests from JS or from Python?
I think I’ll try the Python approach first and see how bad it is to have to wait for all the evaluations to finish. (I can test it for one of the really long templates, like Czech adjectives – the functions can be fake.)
Very quickly slapped together, doesn’t work with lemmas that aren’t first forms yet (which I keep forgetting are already possible without the lemma branch, in advanced mode). Proof of concept. The workflow isn’t great either. At the moment, this fires as soon as you “commit” a change to the first input, and when the results come in, overwrites any inputs that are empty. But this means that, if you go back and change the lemma, the recomputation won’t update anything, since all the inputs will still be nonempty from the previous computation. So the computation should probably track what the value of each input was at the beginning of the computation, and update it only if it’s still the same afterwards. (The point of the check at all is that I don’t want to overwrite what the user did if they started typing in the meantime.) Also the console.log should be something properly visible (spinner?). See #140.
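The safeguard described in that commit message (remember each input's value when the computation starts, and only write a result if the input still holds that value) could look roughly like this. It is a minimal Python model of logic that would really live in the frontend JS; `snapshot_inputs` and `apply_results` are hypothetical names, not functions from the tool.

```python
def snapshot_inputs(inputs):
    """Remember the value of every input when the computation starts."""
    return dict(inputs)


def apply_results(inputs, snapshot, results):
    """Write each computed form into its input unless the user edited it meanwhile.

    An input counts as untouched if it still holds the value it had when
    the computation started, whether that value was empty or not.
    """
    updated = dict(inputs)
    for name, value in results.items():
        if inputs.get(name, '') == snapshot.get(name, ''):
            updated[name] = value
    return updated


# First computation: only the lemma is filled in.
inputs = {'singular': 'cat', 'plural': ''}
snapshot = snapshot_inputs(inputs)
inputs = apply_results(inputs, snapshot, {'plural': 'cats'})

# The user changes the lemma; the old result is still sitting in "plural".
inputs['singular'] = 'dog'
snapshot = snapshot_inputs(inputs)
inputs = apply_results(inputs, snapshot, {'plural': 'dogs'})
print(inputs)  # {'singular': 'dog', 'plural': 'dogs'}
```

Comparing against the snapshot rather than against the empty string is what lets a recomputation replace stale results while still leaving anything the user typed in the meantime alone.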
Pushed a proof-of-concept on the
The string identity function is nice for initial testing, but are there any functions on Wikifunctions yet that we could use for real templates? (Also, what’s a good verb for this feature? I went with “compute” in the above commit, but that was just the first thing that came into my head.)
It's effectively instantaneous for me, so I would find a loading indicator more annoying than useful. If you really want to add something like that, I think it would make more sense to display the final state (no duplicates found, duplicates were found, or wasn't able to check for duplicates).
I didn't 😔 Maybe you can bug me about it again after WikidataCon? 😅
By the way, a couple of years ago I made myself a little browser extension which generates forms - https://github.com/nikkiwd/extension-lexeme-forms. It's essentially the same concept as this, but with the functions in the extension. I never finished adding tests or cleaning up the uncommitted changes and local branches, so I'm still not totally happy with it, but it's been working well enough for my purposes.
The way I did it, I add buttons to the top right. Most of the time there's only one, but sometimes I have multiple for different types of declension. Clicking on a button takes the input of the first field (since that's what the Lexeme Forms tool currently uses as the lemma, or in edit mode it can also find the lemma from the heading) and calls the function associated with that button. The function returns an array of forms and those are used to fill in any gaps in the template (it doesn't overwrite anything). Then I can check it and make any changes that are needed before submitting it.
Here's what it looks like:
For English, it says "guess forms" since there are plenty of irregular plurals it will get wrong, whereas for Esperanto it says "generate forms" because those forms are regular. (Probably a subtle difference but that's why the text isn't the same)
For German, since there are various ways to form plurals, the buttons are labelled based on the suffixes used for the genitive singular and nominative plural forms.
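The button workflow described in that comment (take the first field, call the function behind the button, fill only the gaps) can be sketched in a few lines. This is an illustrative Python toy, not code from the extension: the names `esperanto_noun_forms`, `fill_gaps` and `BUTTONS` and the order of the four forms are assumptions, while the endings themselves are the regular Esperanto noun endings.

```python
def esperanto_noun_forms(lemma):
    """Toy generator for regular Esperanto nouns ending in -o.

    Assumed order: nominative singular, accusative singular,
    nominative plural, accusative plural.
    """
    if not lemma.endswith('o'):
        raise ValueError('expected a noun ending in -o')
    return [lemma, lemma + 'n', lemma + 'j', lemma + 'jn']


def fill_gaps(current, generated):
    """Use generated forms only for fields the user left empty."""
    return [old if old else new for old, new in zip(current, generated)]


# Hypothetical button registry: button label -> generator function.
BUTTONS = {'generate forms': esperanto_noun_forms}

fields = ['hundo', '', 'hundoj (edited)', '']
fields = fill_gaps(fields, BUTTONS['generate forms'](fields[0]))
print(fields)  # ['hundo', 'hundon', 'hundoj (edited)', 'hundojn']
```

Because `fill_gaps` never replaces a non-empty field, anything the user already typed survives the button click and can still be corrected before submitting.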
Hm, I see. I was wondering if we could use the Wikilambda function labels for the buttons (so that they can be translated into the user language), but looking at that screenshot I don’t think that would work out… so let’s make it part of the template (in the template language), I guess. Something like:

    'template-name': {
        # ...
        'forms': [
            {
                'label': '...',
                'example': '...',
                'grammatical_features_item_ids': [...],
                'wikifunctions': {
                    '-s/-n': 'Z12345',
                    '-s/-s': 'Z123456',
                    # ...
                },
            },
        ],
    },

I also wonder whether we can already accommodate non-first forms as the lemma. Currently, you can use advanced mode to create a plurale tantum lexeme with the first plural form as the lemma. You could imagine having a function that generates, say, the genitive plural from the nominative plural – but this would presumably be a different function than the one that generates the genitive plural from the nominative singular. So the specification would be some sort of set of wikifunctions specifying the input forms for them… on second thought, let’s leave that out for now and start with the simpler case ^^
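To illustrate how the proposed per-form `'wikifunctions'` mapping could drive the buttons, here is a small Python sketch. The template contents, the ZIDs (placeholders in the style of the ones above) and the helper names `button_labels` and `functions_for_button` are all assumptions made for the example.

```python
# Hypothetical template following the structure proposed above; ZIDs are placeholders.
TEMPLATE = {
    'forms': [
        {'label': 'nominative singular'},
        {
            'label': 'genitive singular',
            'wikifunctions': {'-s/-n': 'Z12345', '-s/-s': 'Z123456'},
        },
        {
            'label': 'nominative plural',
            'wikifunctions': {'-s/-n': 'Z1234567'},
        },
    ],
}


def button_labels(template):
    """Collect the distinct function labels, in order of first appearance."""
    labels = []
    for form in template['forms']:
        for label in form.get('wikifunctions', {}):
            if label not in labels:
                labels.append(label)
    return labels


def functions_for_button(template, label):
    """Map form index -> ZID for every form that offers the given label."""
    return {
        index: form['wikifunctions'][label]
        for index, form in enumerate(template['forms'])
        if label in form.get('wikifunctions', {})
    }


print(button_labels(TEMPLATE))                  # ['-s/-n', '-s/-s']
print(functions_for_button(TEMPLATE, '-s/-n'))  # {1: 'Z12345', 2: 'Z1234567'}
```

Keeping the labels inside the template means they are written once, in the template language, and a form that has no function for a given button is simply left for the user to fill in.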
Each template form can optionally list one or more Wikifunctions that can be used to generate this form from the first form. (There is no support for generating forms from any other forms yet, even though that may be useful for e.g. pluralia tantum. Maybe later.)
A new API (internal, as far as I’m concerned, though I’m not preventing anyone else from using it I suppose) can be used to call all the functions and generate the other forms, returning them together – this way, the network traffic for the (potentially many) function calls happens inside the Wikimedia network and should thus be faster.
At the moment, users have to opt into this feature, by creating the page [[Special:MyPage/wikidata-lexeme-forms-opt-into-wikifunctions.js]] [1] – the .js in the title ensures that, apart from interface admins, nobody else can opt a user into or out of the feature against their will (user JS pages are protected by default). I expect this will be made available to everyone before long.
Only the English nouns template has a function added yet, but I’ll probably try to add more quite soon (the catalogue [2] lists some more for other languages).
Part of #140.
[1]: https://www.wikifunctions.org/wiki/Special:MyPage/wikidata-lexeme-forms-opt-into-wikifunctions.js
[2]: https://www.wikifunctions.org/wiki/Wikifunctions:Catalogue
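The core of such a "call everything and return the results together" endpoint might look like the following Python sketch. It deliberately takes the actual Wikifunctions call as an injected `call_function` callable, because the real request format is not shown here; `generate_forms`, the template, the button label and the ZID are hypothetical, and the fake function merely stands in for a real one.

```python
def generate_forms(template, function_name, first_form, call_function):
    """Return {form index: generated form} for every form listing function_name.

    call_function(zid, text) is expected to evaluate one Wikifunction;
    it is injected so that this sketch makes no network requests.
    """
    results = {}
    for index, form in enumerate(template['forms']):
        zid = form.get('wikifunctions', {}).get(function_name)
        if zid is not None:
            results[index] = call_function(zid, first_form)
    return results


# Hypothetical template with a placeholder ZID and button label.
template = {
    'forms': [
        {'label': 'singular'},
        {'label': 'plural', 'wikifunctions': {'regular': 'Z12345'}},
    ],
}


def fake_call(zid, text):
    """Stand-in for a real function call: pretends Z12345 appends -s."""
    return {'Z12345': lambda s: s + 's'}[zid](text)


print(generate_forms(template, 'regular', 'cat', fake_call))  # {1: 'cats'}
```

Returning all generated forms in one response keeps the browser down to a single request, with the many individual function calls made by the server.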
Alright, I added the functions for Croatian nouns, and doing all the function calls is quite a bit faster on Toolforge than on local development, so I think the reduction in runtime from doing all the calls in Python is real – I’ll keep this approach then. (Also, with the button as suggested by @nikkiwd rather than the automatic action I had in mind, I think it’s more acceptable to have the user wait until the results are there – they’re waiting for the result of an action they explicitly initiated.)
Needs to be checked with a Croatian speaker (probably Denny) – I notice for “miš”, not all the results match the placeholders in the template – but as long as this is experimental and opt-in, I think this is okay to already add, so I can demo it :)
IIUC they only work for regular words, but let’s still have them for now, I think. Part of #140.
An experimental version of this is now deployed (opt in by creating this Wikifunctions user JS page); see the documentation, and the announcement for some next steps.
German will have several buttons for different declensions (as seen in @nikkiwd’s comment on #140), so to keep the button labels short without making the feature impossible to understand, introduce an optional wikifunctions_intro string that can be put before the individual buttons. (A test ensures that templates without wikifunctions aren’t allowed to have a wikifunctions_intro either.) To make enough space for this intro, move the buttons below the heading (and no longer floating right). Also, put a noscript hint after the intro.
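The consistency rule mentioned in that commit message (no `wikifunctions_intro` on a template that has no Wikifunctions) could be checked along these lines. This is a hedged Python sketch, not the tool's actual test; `intro_problems` and the sample templates are invented for illustration.

```python
def intro_problems(templates):
    """Return the names of templates that have an intro but no Wikifunctions."""
    problems = []
    for name, template in templates.items():
        has_functions = any(form.get('wikifunctions') for form in template['forms'])
        if 'wikifunctions_intro' in template and not has_functions:
            problems.append(name)
    return problems


# Hypothetical sample data; the ZID is a placeholder.
templates = {
    'ok-with-intro': {
        'wikifunctions_intro': 'Generate forms:',
        'forms': [{'label': 'plural', 'wikifunctions': {'-s/-n': 'Z12345'}}],
    },
    'ok-without-intro': {
        'forms': [{'label': 'plural'}],
    },
    'broken': {
        'wikifunctions_intro': 'Generate forms:',
        'forms': [{'label': 'plural'}],
    },
}

print(intro_problems(templates))  # ['broken']
```

A test suite could then simply assert that the returned list is empty for the real templates.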
With that, we have a pretty good set of German feminine noun Wikifunctions, I think: Katze→Katzen; Kamera→Kameras; Kuh→Kühe. (Also fix an inaccurate label of an earlier set of Wikifunctions.) Some of these functions can also be used for other German nouns, but I’ll figure that out later – the demo of the mini hackathon (https://nl.wikimedia.org/wiki/Mini_Hackathon_November_2023) is approaching! \o/ Part of #140.
-en/-um isn’t the most common case, but it’s relatively simple. More will follow as part of #140. However, this does require support for slashes in the function name, so swap the function name and lemma around in the API URL and instruct Flask to parse the function name as a “path” (meaning it can contain slashes) rather than the lemma. Note that the frontend JS code was already splitting the input on slashes and only sending the first one to the API, so there was no point for the backend to support slashes there; if there really is a use case for slashes in the very first form (I doubt it, to be honest – ffb3d07 didn’t really justify it), then I suppose we can handle it in the JS code by just making several consecutive API calls.
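Why the function name has to be the part of the URL that may contain slashes can be shown without Flask at all. The stdlib-only Python sketch below only mimics the effect of a Flask `path` converter; the `<lemma>/<function name>` layout (lemma first, function name last) and the helper names are assumptions, since the actual route is not quoted here, and `Lemma` is a placeholder word.

```python
def first_variant(user_input):
    """Mimic the frontend: split the input on slashes and keep the first variant."""
    return user_input.split('/')[0]


def parse_api_path(path):
    """Split '<lemma>/<function name>', where only the function name may contain slashes."""
    lemma, separator, function_name = path.partition('/')
    if not separator or not lemma or not function_name:
        raise ValueError('expected "<lemma>/<function name>"')
    return lemma, function_name


lemma = first_variant('Lemma/Variant')  # the backend only ever sees 'Lemma'
print(parse_api_path(lemma + '/-en/-um'))  # ('Lemma', '-en/-um')
```

Since the lemma sent by the frontend never contains a slash, splitting at the first slash is unambiguous, and everything after it can be treated as the function name.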
Still not done but getting closer. Part of #140. Example word for -s/-s: Taxi; example word for -s/-n: Römer.
The last from https://github.com/nikkiwd/extension-lexeme-forms. Test word: Schaf. Part of #140.
I was just totally wrong when I wrote in the previous commit that that was the last set of neuter functions from the extension; I missed this one. Still part of #140. Example words: Kind, Kalb, Haus, Loch, Buch…
Proposal / Feature request
Regular forms can often be generated by a function. Wikifunctions Beta already has a number of functions that in most cases generate the right form for a given lemma. The suggestion is:
A warning that the forms should still be checked would be good, as these functions are rarely right in every case.
A JavaScript function that runs in the browser and can call a function on Wikifunctions can be found here:
https://github.com/vrandezo/formcheck/blob/e0aa0e6bca94f35f66877e485df1aea6d529ecbd/index.html#L600
(I wanted to code this up myself and send a pull request, alas, I couldn't get lexeme-forms to run. Sorry)