The Chan Zuckerberg Initiative hosted a hackathon on Mapping the Impact of Research Software in Science (October 24-27, 2023). This repository serves as a project index page to facilitate the discovery and preservation of the output of this event.
Mapping the usage and impact of research software in science remains a challenge due to the lack of canonical data/infrastructure and to inconsistent software citation practices in the scientific literature. The lack of a “software citation graph” means that it is very hard to answer questions such as:
- Which software tools are most frequently used by scientists in any given field?
- How does the use of open source compare to proprietary software in any given field?
- Are emerging new tools replacing legacy ones?
- What is the prevalent programming language in any given field?
- Which software tools should be part of a student’s computational curriculum?
- Which software projects should funders prioritize as critical infrastructure for science?
In recent years, several attempts have been made to answer these questions by mining the scientific literature or by analyzing electronic notebooks or code repositories. With this event, our goal is to convene practitioners in different areas of computer science, data science, and ML, as well as organizations active in this space, to develop comprehensive datasets, methods, approaches, and resources to map the adoption and impact of research software in science (specifically scientific open source software).
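To make the idea of a software citation graph more concrete, here is a minimal sketch of how the first question listed above (which tools are most frequently used in a given field) could be answered once paper-to-software mentions are available. The records, field labels, and the `top_tools_by_field` helper are hypothetical and for illustration only; they are not taken from any hackathon dataset or project.

```python
from collections import Counter, defaultdict

# Hypothetical (paper_id, field, software) mention records.
# Illustrative only; not real data from the hackathon.
MENTIONS = [
    ("paper-1", "genomics", "samtools"),
    ("paper-1", "genomics", "numpy"),
    ("paper-2", "genomics", "samtools"),
    ("paper-3", "neuroscience", "numpy"),
    ("paper-3", "neuroscience", "matlab"),
    ("paper-4", "neuroscience", "numpy"),
]


def top_tools_by_field(mentions, n=1):
    """Return the n most-mentioned tools per field, counting each paper once per tool."""
    counts = defaultdict(Counter)
    # sorted(set(...)) drops repeated mentions within a paper and fixes the iteration order.
    for _paper_id, field, tool in sorted(set(mentions)):
        counts[field][tool] += 1
    return {field: [tool for tool, _ in c.most_common(n)] for field, c in counts.items()}


result = top_tools_by_field(MENTIONS)
print(result)
```

In practice the hard part is producing reliable mention records in the first place, since software is cited inconsistently in the literature; the counting step shown here is the easy end of the problem.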
Hackathon Participants: Please submit a pull request to this repo that edits this README to include the appropriate link to your project repo, or adds your repo to the appropriate list if it is not already present.
- Determining the citation intent for software repositories
- Identifying missing software citations with LLMs
- Tracing the dependencies of open source software mentioned in the biomedical literature
- Disciplinary differences in software usage and mention
- Gold dataset
- Bidirectional paper-repository traceability
- Improving tool mention clustering
- Linking research software to research organizations
The following is a list of preprints, papers, and other research outputs based on hackathon projects:
- Daniel Garijo, Miguel Arroyo, Esteban Gonzalez, Christoph Treude, and Nicola Tarocco. 2024. Bidirectional Paper-Repository Tracing in Software Engineering. In 21st International Conference on Mining Software Repositories (MSR ’24), April 15–16, 2024, Lisbon, Portugal. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3643991.3644876
- Andrew Nesbitt, Boris Veytsman, Daniel Mietchen, Eva Maxfield Brown, James Howison, João Felipe Pimentel, Laurent Hébert-Dufresne, and Stephan Druskat. 2024. Biomedical Open Source Software: Crucial Packages and Hidden Heroes. arXiv, https://doi.org/10.48550/arXiv.2404.06672
- Ana-Maria Istrate, Joshua Fisher, Xinyu Yang, Kara Moraw, Kai Li, and Donghui Li. 2024. Scientific Software Citation Intent Classification using Large Language Models. In 1st Workshop on Natural Scientific Language Processing and Research Knowledge Graphs (NSLP 2024), May 27, 2024, Hersonissos, Crete, Greece. https://github.com/NFDI4DS/nslp2024/blob/main/accepted_papers/NSLP_2024_paper_20.pdf
This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.
If you believe you have found a security issue, please disclose it responsibly by contacting us at security@chanzuckerberg.com.