This is the Digital Science Center component of the SciDatBench Science Data Benchmarking project. It works with the MLPerf Science Data Experimental Working Group.
The importance of Big Data is now recognized across a broad range of scientific, societal, and commercial problems. Analyzing this data requires new research in both data analysis methods and the information technology hardware and software used in the analysis. SciDatBench is establishing a new collection of important and representative big scientific datasets, together with typical software implementations of the machine learning algorithms needed for best-practice analysis. It generates particular instances and is establishing a sustainable process for maintaining and enhancing them. The collection includes both standalone examples and end-to-end examples that need multiple components, as seen in the analysis of many science experiments. SciDatBench is affiliated, as an approved Science Data working group, with the very successful MLPerf activity, which has 80 organizational members looking at industry machine learning benchmarks. The state-of-the-art examples in SciDatBench contribute to progress in scientific discovery that advances the national health, prosperity, and welfare, as stated in NSF's mission. The project proactively involves under-represented communities in its activities.
The SciDatBench collection is accompanied by documentation that allows it to be used in training researchers in rapidly evolving Big Data analysis techniques. SciDatBench pursues performance, quality, and pedagogical goals. The heart of the project is a set of virtual working group meetings associated with Science Data and other MLPerf activities of importance to SciDatBench. The project naturally impacts a broad range of scientific disciplines, eventually including materials science, environmental sciences, life sciences (including epidemiology), fusion, particle physics, astronomy, earthquake science, and earth sciences, with more than one representative problem from each of these domains. SciDatBench supports comparative studies and identifies requirements for future cyberinfrastructure to support scientific data analysis. The benchmarks record not only the time to a solution but also multiple measures of the quality of that solution.
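As an illustration of recording both time to solution and several quality measures, here is a minimal Python sketch. It is not the SciDatBench or MLPerf harness: the names `BenchmarkResult`, `run_benchmark`, and the two metric functions are hypothetical, and the "solver" is a stand-in that returns fixed predictions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence


@dataclass
class BenchmarkResult:
    """Hypothetical record: wall-clock time plus named quality metrics."""
    name: str
    seconds: float
    quality: Dict[str, float] = field(default_factory=dict)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Average absolute difference between targets and predictions.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Square root of the average squared difference.
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def run_benchmark(
    name: str,
    solve: Callable[[], Sequence[float]],
    y_true: Sequence[float],
    metrics: Dict[str, Callable[[Sequence[float], Sequence[float]], float]],
) -> BenchmarkResult:
    # Time only the solver call, then score its output with every metric.
    start = time.perf_counter()
    y_pred = solve()
    seconds = time.perf_counter() - start
    quality = {key: metric(y_true, y_pred) for key, metric in metrics.items()}
    return BenchmarkResult(name=name, seconds=seconds, quality=quality)


# Toy usage: the lambda stands in for a real training/inference run.
result = run_benchmark(
    "toy-regression",
    solve=lambda: [1.0, 2.0, 4.0],
    y_true=[1.0, 2.0, 3.0],
    metrics={"mae": mean_absolute_error, "rmse": root_mean_squared_error},
)
print(result)
```

Keeping the timing and the quality metrics in one record makes it straightforward to compare runs that trade speed against accuracy.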
Early deliverables include:
- Building a community interested in Science Data Benchmarks and MLPerf
- Weekly working group meetings
- Jupyter notebook approach to accessing Science Data and the other MLPerf benchmarks
- Initial benchmarks, including many collected at the Rutherford Laboratory, UK, by Tony Hey and Jeyan Thiyagalingam
- Tutorial material built around benchmarks
Useful links:
- The DSC SciDatBench project at Indiana University: this GitHub page
- Main MLPerf page: MLPerf Home Page
- Join MLPerf and working groups: Get Involved with MLPerf
- MLPerf Science Data Google Group
- Presentation on the Science Data working group activities
- Contact Geoffrey Fox
SciDatBench at IU is funded by NSF through EAGER grant NSF-OAC-2038007.