References in Danish (or other languages?) #207

Open
fnatsger opened this issue Jan 30, 2023 · 2 comments

@fnatsger

When referencing in Danish, we use "I" instead of "In". As a result, the "I" shows up in titles, names, etc. in the exported references and has to be removed manually.
A solution might be to include other languages in the model, to add a label that designates these translations, or to offer an option to delete them.

@inukshuk
Owner

The model already includes different languages and we're happy to add more, since the default model aims for versatility. (Obviously, if you know your data set is guaranteed to be monolingual, it may always be beneficial to use a custom model.)

Ideally, we should add a handful of Danish references featuring the "I" to core.xml. If the usage is similar to the "in" of English references, then the "I" will typically be at the start of editor or container-title tags, so adding samples to the training set will help the model use "I" as a good marker for both of these.

And then we should also make the names and title normalizers aware of this fact. Since "I " could easily occur at the start of titles or names in other languages, this is a little trickier. If you could post some real-world examples, maybe we can come up with some good rules (e.g., in the editors tag we could strip it only in combination with common ways to designate editors; in container titles we could look for similar syntactical patterns).

In general, with the normalizers we don't need to worry too much either way, because they can be tweaked at runtime.

@fnatsger
Author

Here is an example:

Harvey, P. (2018). 3. Infrastructures in and out of Time: The Promise of Roads in Contemporary Peru. I N. Anand, A. Gupta, & H. Appel (Red.), Promise of Infrastructure (s. 80–101). Duke University Press.

N. Anand is imported as I.N. Anand
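
As a rough sketch of the rule suggested earlier in this thread for the editors tag (strip the leading marker only when a common editor designation is also present), something like the following Python might work. It is illustrative only and not this project's normalizer API: the names `LEADING_MARKERS`, `EDITOR_DESIGNATIONS` and `strip_editor_marker` are hypothetical, and the short list of designations is an assumption that would need extending.

```python
import re

# Hypothetical lists for illustration; not taken from the project.
LEADING_MARKERS = ("In", "I")                 # English "In", Danish "I"
EDITOR_DESIGNATIONS = r"\((?:Red|Eds?)\.\)"   # "(Red.)", "(Ed.)", "(Eds.)"

# A leading marker followed by whitespace, but only if an editor
# designation occurs somewhere later in the same field (lookahead).
_MARKER = re.compile(
    r"^(?:%s)\s+(?=.*%s)" % ("|".join(LEADING_MARKERS), EDITOR_DESIGNATIONS)
)


def strip_editor_marker(editor_field: str) -> str:
    """Drop a leading 'In'/'I' only when an editor designation follows."""
    return _MARKER.sub("", editor_field, count=1)


print(strip_editor_marker("I N. Anand, A. Gupta, & H. Appel (Red.),"))
# prints: N. Anand, A. Gupta, & H. Appel (Red.),
print(strip_editor_marker("I. Newton (Ed.),"))
# prints: I. Newton (Ed.),   (unchanged: "I." is an initial, not a marker)
```

Requiring the designation keeps a bare "I " at the start of an ordinary name or title from being stripped, but an editor whose first initial is written as "I" without a period would still be mangled, so real-world samples are needed to tune the rule.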
