TheMaterialParser is Rail based web application that allows you to semi-automatically extract material compositions from PDF documents. A manual is available online.
This project that has been led in the context of a research project at the Ecole Des Mines de Saint-Etienne, among the SMS research group.
Its original motivation is the fact that material manufacturers worldwide usually provide datasheets for their material products as unstandardized PDF datasheets. However, being able to gather and unify those data from different sources to be able to input them, for example, in statistical processes can be very important in the context of real experiments, because they are products that can be bought directly to providers.
Extract empirical material information from PDF publications and books can also be interesting.
This project is highly dependent on the Tabula project. See process section.
- Create and manage documents categories.
- Upload PDF documents to the server.
- Apply our Tabula-based process to datasheets subsets.
- Save results to a relationnal database, and consult and dowload them as .csv.
- Support for properties other than composition.
- Support for other language to look for valid data.
- Auto-detect potential properties locations.
Our process is based directly on the Tabula project, and more precisely its tabula-java core. Basically, you can perform several selections on any number of selected datasheets. The algorithm then tries to extract potential composition tables from all the datasheets with all the selections using Tabula. A composition is finally considered as valid if it is composed of valid elements from the periodic table. For now, only english element names (and official symbols) are supported.
Go to the directory in which you want to install TheMaterialParser, and run :
TheMaterialParser uses the jRuby implementation of Ruby 2.5.0
. This allows us to call tabula-java
from our Ruby code.
In order to install the good jRuby interpreter, you can use rvm. Check the rvm documentation to know how to install it. Once installed, run :
rvm install jruby-9.2.5.0
rvm use jruby-9.2.5.0
Go at the root of the directory that you previously cloned, and run the following command to install dependencies :
jruby -S bundle install
Then, to create the sqlite3 embedded database :
jruby -S rails db:create db:migrate
Finally, run :
jruby -S rails server
The app will now be accessible from your web browser at http://localhost:3000. When you need to re-launch the app, you just need to use this last command from the root directory of the project.
Copyright 2019 Paul Breugnot. Available under MIT License. See LICENSE.