This script renames PDF files in a directory to their citation based on Chicago bibliography style and adds metadata to the PDF

JGKarlin/re-namePDF

renamePDF.py

PDF Citation Renamer and Metadata Generator

Overview

This script renames PDF files to their citation in Chicago bibliography style and adds the citation to the PDF metadata. It uses OpenAI's GPT-4o to generate the citation from text extracted from each PDF. If the citation generated from the text sample disagrees with the embedded metadata, or if any citation data is missing, the script confirms the complete and accurate citation through the Crossref API.
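
The fallback rule described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the field names in `REQUIRED_FIELDS` and the helpers `needs_crossref_check` and `lookup_crossref` are hypothetical, and the lookup assumes the `habanero` client's `Crossref().works()` call with a free-text `query`.

```python
REQUIRED_FIELDS = ("author", "title", "year")  # hypothetical field names


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value or "").lower().split())


def needs_crossref_check(model_citation, embedded_metadata):
    """Return True when the citation is incomplete or conflicts with the PDF metadata."""
    # Any missing or empty required field triggers a lookup.
    if any(not normalize(model_citation.get(f)) for f in REQUIRED_FIELDS):
        return True
    # A field present in both sources but with different values also triggers a lookup.
    for field in REQUIRED_FIELDS:
        embedded = normalize(embedded_metadata.get(field))
        if embedded and embedded != normalize(model_citation.get(field)):
            return True
    return False


def lookup_crossref(title, author):
    """Ask Crossref for the closest matching work (requires network access)."""
    from habanero import Crossref  # imported lazily; only needed for lookups

    result = Crossref().works(query=f"{title} {author}", limit=1)
    items = result["message"]["items"]
    return items[0] if items else None
```

Keeping the decision (`needs_crossref_check`) separate from the network call makes the rule easy to test without contacting Crossref.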

Features

  • Extracts text from the first 5 pages of each PDF file (the page count is a variable that can be changed; it affects the number of tokens processed).
  • Generates a citation in JSON format using OpenAI's GPT-4o.
  • Renames PDF files to their citation in Chicago bibliography style.
  • Adds bibliographic metadata to PDF files.
  • Supports user choice for renaming files, adding metadata, or both.
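
As an illustration of the renaming step, the sketch below turns a JSON citation into a file name. The JSON keys (`author`, `title`, `journal`, `year`), the simplified Chicago-style layout, and the helper names `format_citation` and `to_filename` are assumptions made for this example; the script's real schema and formatting may differ.

```python
import json
import re

# Characters that are not allowed in file names on common file systems.
INVALID_CHARS = r'[<>:"/\\|?*]'


def format_citation(citation):
    """Build a simplified Chicago-style bibliography entry from a citation dict."""
    parts = [f'{citation["author"]}.', f'"{citation["title"]}."']
    if citation.get("journal"):
        parts.append(citation["journal"])
    parts.append(f'({citation["year"]}).')
    return " ".join(parts)


def to_filename(citation_text, max_length=200):
    """Strip invalid characters, collapse whitespace, truncate, and add .pdf."""
    cleaned = re.sub(INVALID_CHARS, "", citation_text)
    cleaned = " ".join(cleaned.split())
    cleaned = cleaned[:max_length].rstrip(" .")
    return cleaned + ".pdf"


# Hypothetical model output, for illustration only.
model_output = (
    '{"author": "Smith, John", "title": "A Study of PDFs", '
    '"journal": "Journal of Files", "year": 2020}'
)
citation = json.loads(model_output)
new_name = to_filename(format_citation(citation))
```

Truncating to `max_length` keeps long titles within typical file-name limits; the value 200 is an arbitrary choice for this sketch.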

Requirements

  • Python 3.6+
  • openai library
  • PyMuPDF (also known as fitz)
  • python-dotenv library (imported as dotenv)
  • habanero library (for Crossref API access)

Installation

  1. Clone the repository:

    git clone https://github.com/JGKarlin/re-namePDF.git
    cd re-namePDF
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt
  4. Set up your OpenAI API key:

    Create a .env file in the root directory of the project and add your OpenAI API key:

    OPENAI_API_KEY=your_openai_api_key_here
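
For reference, a script can pick up this key roughly as shown below. This is a sketch under assumptions, not the script's actual code: `get_api_key` is a hypothetical helper, and it relies on `load_dotenv` from the python-dotenv package, falling back to plain environment variables if that package is not installed.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # fall back to variables already set in the environment
    load_dotenv = None


def get_api_key(name="OPENAI_API_KEY"):
    """Return the API key from the environment, loading a .env file first if possible."""
    if load_dotenv is not None:
        # Loads variables from a .env file if one is found;
        # variables that are already set are left unchanged.
        load_dotenv()
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return key
```

Failing early with a clear message is a design choice for this sketch; it avoids a less obvious authentication error later when the API is first called.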
    

Usage

  1. Run the script:

    python renamePDF.py
  2. Follow the prompts:

    • Enter the directory path containing the PDF files.
    • Choose how to add bibliographic information:
      • 1: Rename files only
      • 2: Add metadata only
      • 3: Both rename files and add metadata
      • 4: Quit
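
The menu options above map naturally onto two independent actions. The sketch below shows one way to express that mapping; `ACTIONS` and `parse_choice` are hypothetical names used for illustration and are not taken from the script.

```python
# Each menu choice maps to (rename_files, add_metadata).
ACTIONS = {
    "1": (True, False),   # rename files only
    "2": (False, True),   # add metadata only
    "3": (True, True),    # both
}


def parse_choice(choice):
    """Translate a menu choice into (rename_files, add_metadata); None means quit."""
    choice = choice.strip()
    if choice == "4":
        return None
    if choice not in ACTIONS:
        raise ValueError(f"Invalid choice: {choice!r}; expected 1, 2, 3, or 4")
    return ACTIONS[choice]
```

A caller could loop on `input()` until `parse_choice` stops raising `ValueError`, then exit if the result is `None`.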

Example

Enter the directory path containing the PDF files: /path/to/pdf/files
Found 5 PDF files in the directory.
Do you want to proceed with processing the files? (y/n): y

What bibliographic information would you like to add:
1. File name
2. Metadata
3. Both
4. Quit
Enter your choice (1, 2, 3, 4): 3
