Releases: cthoyt/cthoyt.github.io
Blog Post: Easy ORCID
The Open Researcher and Contributor Identifier (ORCID) database is an invaluable resource that supports the unambiguous identification of researchers. However, its first-party data dump is too complex, verbose, and unstandardized for many use cases. This post describes open-source software I wrote that automates downloading, processing, and exporting ORCID into a more usable form. I deposited the results on Zenodo under the CC0 license.
Blog Post: Discussions and Follow-ups from Biocuration 2024
I've just returned from the 17th Annual International Biocuration Conference at the Indian Biological Data Centre (IBDC) in Faridabad, India. I wanted to highlight some of the interesting conversations I had while I was there, along with ideas for follow-up. Most centered on the Bioregistry and the Semantic Mapping Assembler and Reasoner (SeMRA), the subject of my oral presentation.
Full Changelog: books-2023...biocuration2024-discussions
Blog Post: Books I Read in 2023
Spoiler: it's a lot of Brandon Sanderson.
Blog Post: Unlocking UMLS
The Unified Medical Language System (UMLS) is a widely used biomedical and clinical vocabulary maintained by the United States National Library of Medicine. However, it is notoriously difficult to access and work with because of licensing restrictions and its complex download system. In the same vein as my previous posts about DrugBank and ChEMBL, this post describes open-source software I’ve developed for downloading and working with this data. It also works for RxNorm, SemMedDB, SNOMED-CT, and any other data accessible through the UMLS Terminology Services (UTS) ticket-granting system.
Blog Post: Reproducibility Pilot in the Journal of Cheminformatics
I’ve been working on improving reproducibility in the field of cheminformatics for some time now. For example, I’ve written posts about making data from DrugBank and ChEMBL more actionable. Over the last year, I’ve been developing a concept with the editors of the Journal of Cheminformatics for including an assessment of reproducibility in the reviews of manuscripts submitted to the journal. This has resulted in an editorial, "Improving reproducibility and reusability in the Journal of Cheminformatics," as well as a call for papers. In this post, I want to summarize the first-generation review criteria we developed and give an example of them applied in practice.
Blog: Querying Journals and Publishers in Wikidata
This post is about three SPARQL queries I wrote to get bibliometric information about journals and publishers out of Wikidata.
Blog: Modeling and Querying Awards in Wikidata
I was recently nominated for the International Society for Biocuration’s Excellence in Biocuration Early Career Award. This made me curious about how to model nominations and awards on Wikidata. In this post, I’ll describe how to curate awards, nominations, and recipients, and how to write SPARQL queries to retrieve them.
Blog: Re-implementing the N2T ARK (Meta)Resolver
n2t-ark-resolver Update 2023-04-11-n2t-ark-resolver.md