
pymupdf

Here are 83 public repositories matching this topic...

UVA Data Science Capstone project for the Internet Archive. This project aimed to classify PDFs as research or non-research documents using both image-based and text-based approaches. The image-based models used CNN transfer learning, and the text-based models used XGBoost.

  • Updated May 7, 2021
  • Jupyter Notebook
