diff --git a/README.md b/README.md
index 0ec4d3d..75fb509 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ Available on PyPI! Simply `pip install scrapemed`.
 ScrapeMed is designed to make large-scale data science projects relying on PubMed Central (PMC) easy. The raw XML that can be downloaded from PMC is inconsistent and messy, and ScrapeMed aims to solve that problem at scale. ScrapeMed downloads, validates, cleans, and parses data from nearly all PMC articles into `Paper` objects which can then be used to build datasets (`paperSet`s), or investigated in detail for literature reviews.
-Beyond the heavy-lifting performed behind the scenes by ScrapeMed to standardize data scraped from PMC, a number of features are included to make data science and literature review work easier. A few are listed below:
+Beyond the heavy-lifting performed behind the scenes by ScrapeMed to standardize data scraped from PMC, a number of features are included to make data science and literature review work easier. A few are listed below:
 - `Paper`s can be queried with natural language [`.query()`], or simply chunked and embedded for storage in a vector DB [`.vectorize()`]. `Paper`s can also be converted to pandas Series easily [`.to_relational()`] for data science workflows.
@@ -50,7 +50,7 @@ Beyond the heavy-lifting performed behind the scenes by ScrapeMed to standardiz
 ## Documentation
-[ScrapeMed documentation](https://scrapemed.readthedocs.io/en/latest/) is hosted on Read The Docs!
+The [docs](https://scrapemed.readthedocs.io/en/latest/) are hosted on Read The Docs!
 ## Sponsorship
@@ -59,26 +59,4 @@ Beyond the heavy-lifting performed behind the scenes by ScrapeMed to standardiz
 If you'd like to sponsor a feature or donate to the project, reach out to me at danielfrees@g.ucla.edu.
-## Developer Usage
-*License: MIT*
-
-Feel free to fork and continue work on the ScrapeMed package, it is licensed under the MIT license to promote collaboration, extension, and inheritance.
-
-Make sure to create a conda environment and install the necessary requirements before developing this package.
-
-ie: `$ conda create --name myenv --file requirements.txt`
-
-Add a `.env` file in your base scrapemed directory with a variable defined as follows: `PMC_EMAIL=youremail@example.com`. This is necessary for several of the test scripts and may be useful for your development in general.
-
-You will need to install clang++ for `chromadb` and `Paper` vectorization to work. You also need to make sure you have `python 3.10.2` or later installed and active in your dev environment.
-
-***Now an overview of the package structure:***
-
-Under `examples` you can find some example work using the scrapemed modules, which may provide some insight into usage possibilities.
-
-Under `examples/data` you will find some example downloaded data (XML from Pubmed Central). It is recommended that any time you download data while working out of the notebooks, it should go here. Downloads will also go here by default when passing `download=True` to the scrapemed module functions which allow you to do so.
-
-Under `scrapemed/tests` you will find several python scripts which can be run using pytest. If you also clone and update the `.github/workflows/test-scrapemed.yml` for your forked repo, these tests will be automatically run on `git push`. Under `scrapemed/test/testdata` are some XML data crafted for the purpose of testing scrapemed. This data is necessary to run some of the testing scripts.
-
-Each of the scrapemed python modules has a docstring at the top describing its general purpose and usage. All functions should also have descriptive docstrings and descriptions of input/output. Please contact me if any documentation is unclear.
diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
index 16c6c9c..a98f0d7 100644
Binary files a/docs/build/doctrees/environment.pickle and b/docs/build/doctrees/environment.pickle differ
diff --git a/docs/build/doctrees/index.doctree b/docs/build/doctrees/index.doctree
index b3cfcd9..538513d 100644
Binary files a/docs/build/doctrees/index.doctree and b/docs/build/doctrees/index.doctree differ
diff --git a/docs/build/doctrees/scrapemed.doctree b/docs/build/doctrees/scrapemed.doctree
index 8048dbc..65cb5e1 100644
Binary files a/docs/build/doctrees/scrapemed.doctree and b/docs/build/doctrees/scrapemed.doctree differ
diff --git a/docs/build/html/_modules/index.html b/docs/build/html/_modules/index.html
index 3d07dde..d13c870 100644
--- a/docs/build/html/_modules/index.html
+++ b/docs/build/html/_modules/index.html
@@ -9,7 +9,7 @@
-
+
@@ -17,17 +17,17 @@
-
+
-
+