Extend the code to handle a selection of Wikidata items differently #30

Open
egonw opened this issue Oct 5, 2020 · 7 comments

Comments

@egonw
Member

egonw commented Oct 5, 2020

Starting with cortisol, which has two Wikidata items because there are two Wikipedia pages for it.

The code should check whether a mapping is added for wikidata:Q26981430 and, if so, replace it with wikidata:Q190875.
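A minimal sketch of such a check, assuming the mappings are available as (source identifier, Wikidata QID) pairs. The names `WIKIDATA_REPLACEMENTS`, `normalize_wikidata_id`, and `normalize_mappings` are hypothetical and not part of the existing code; only the cortisol pair comes from this issue.

```python
# Hypothetical replacement table: each key is a Wikidata item that should not
# be used in mappings, each value is the item to use instead.
# Only this one pair is taken from the issue; further pairs would be added by hand.
WIKIDATA_REPLACEMENTS = {
    "Q26981430": "Q190875",
}


def normalize_wikidata_id(qid):
    """Return the preferred Wikidata item for qid, or qid itself if no rule applies."""
    return WIKIDATA_REPLACEMENTS.get(qid, qid)


def normalize_mappings(mappings):
    """Apply the replacement table to (source_id, qid) pairs.

    Pairs that become identical after replacement are emitted only once,
    and the original order is preserved.
    """
    seen = set()
    result = []
    for source_id, qid in mappings:
        pair = (source_id, normalize_wikidata_id(qid))
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result
```

With this table, a mapping to Q26981430 would come out as a mapping to Q190875, and any other item would pass through unchanged.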

@Chris-Evelo

Wikipedia actually has a (user) mechanism to disambiguate such double pages. Is it possible to use that to automate this process? It doesn't sound like a good idea to enter all these exceptions manually.

@egonw
Member Author

egonw commented Oct 5, 2020

Please check the Wikipedia pages.

@Chris-Evelo

I think we are talking about different things. I meant a general mechanism using Wikipedia's disambiguation methods, if these are available to us, not specific pages. But I am also not sure which specific pages you want me to check.

@egonw
Member Author

egonw commented Oct 5, 2020

Sorry, Chris, you lost me. I guess I do not understand your comment. Can you explain why Wikipedia disambiguation pages are relevant here?

@Chris-Evelo

What I understood is that the problem of multiple Wikidata items occurs because there are multiple Wikipedia entries for the same compound. I thought that using Wikipedia's own mechanism for disambiguation (the mechanism that creates those pages, not the pages themselves, I would think) might be useful to detect such instances and to find out what the main Wikipedia entry should be, and thus which Wikidata entry to use and which not to use. I have no clue whether that is feasible, though.

@egonw
Member Author

egonw commented Oct 5, 2020

They made a deliberate choice here. Wikipedia writes: "Hydrocortisone is the name for the hormone cortisol when supplied as a medication." and on the other page: "When used as a medication, it is known as hydrocortisone."

@Chris-Evelo

Which makes it a "scientific lenses" problem, right? If looking at a biological pathway, you would use "cortisol"; if looking at drug extensions in a network, e.g. via CyTargetLinker, you would use "hydrocortisone".

But you probably meant to say, "it is not practical to automate tracing such cases". Yes, if this is typical then I agree.
