- Objective:
  - Build a PDF extractor to pull relevant details from CVs in PDF format, and match them against the job descriptions from the Hugging Face dataset.
- Dataset Used
- PDF Extractor (`01_pdf-data-extraction.ipynb`)
  - PDF extraction was done using the `pdfplumber` library.
  - Python `regex` was used to extract the `Skills` and `Education` sections from the extracted text.
  - Finally, the relevant information was stored in a CSV file, `pdf_extracted_skills_education.csv`.
  - Challenges Faced: Using `regex`, I could extract the `Skills` and `Education` parts, but it was hard to generalise this across resumes of different formats, so more research is needed to extract them efficiently. Extracting `Experience` was really tough: I couldn't think of any `regex` that could extract `Company_Name, Start_Date, End_Date` from multiple headers (i.e., people with multiple experiences).
  - Proposed Solution: One option I know of is training a custom Named Entity Recognition (NER) model, but this needs a custom tagged dataset for Skills, Education, and Experience.
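
The regex approach described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `HEADINGS` list and the `extract_text` and `split_sections` helpers are assumptions made for the example, and it only works when each section starts with a heading on its own line, which is exactly why it generalises poorly.

```python
import re

# Hypothetical list of section headings; real resumes use many more variants.
HEADINGS = ["skills", "education", "experience", "projects", "summary"]

# A heading is a line that contains only one of the known names,
# optionally followed by a colon.
HEADING_RE = re.compile(
    r"^[ \t]*(" + "|".join(HEADINGS) + r")[ \t]*:?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_text(pdf_path):
    """Concatenate the text of every page using pdfplumber."""
    import pdfplumber  # imported lazily so the regex part runs without it

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def split_sections(text):
    """Map each recognised heading to the text up to the next heading."""
    matches = list(HEADING_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).lower()] = text[m.end():end].strip()
    return sections


# Made-up resume text standing in for the output of extract_text().
sample = """John Doe
Skills
Python, SQL, PyTorch
Education
B.Tech in Computer Science, 2020
Experience
Data Analyst, Acme Corp
"""
sections = split_sections(sample)
print(sections["skills"])
print(sections["education"])
```

A resume that writes "Technical Skills" or puts the heading and content on the same line would not be matched by this pattern.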
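
As a partial illustration of the `Experience` problem, date ranges tend to be the most regular part of an entry. The pattern below assumes one common layout (`Mon YYYY - Mon YYYY`, with `Present` or `Current` allowed as the end date); it is not taken from the notebooks, the names `RANGE_RE` and `find_date_ranges` are hypothetical, and it does nothing for `Company_Name`, which is the harder part.

```python
import re

# Month name, abbreviated or written out, with an optional trailing dot.
MONTH = r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
DATE = rf"{MONTH}\s+\d{{4}}"

# Start date, a separator (hyphen, en dash or "to"), then an end date.
RANGE_RE = re.compile(
    rf"(?P<start>{DATE})\s*(?:[-\u2013]|to)\s*(?P<end>{DATE}|Present|Current)",
    re.IGNORECASE,
)


def find_date_ranges(text):
    """Return (start, end) pairs for every date range found in the text."""
    return [(m.group("start"), m.group("end")) for m in RANGE_RE.finditer(text)]


# Made-up Experience section with two entries.
experience = """Data Analyst, Acme Corp
Jan 2019 - Mar 2021
ML Engineer, Example Ltd
April 2021 to Present
"""
print(find_date_ranges(experience))
```

Resumes that use numeric dates (`01/2019`) or years only would need additional patterns, which is where a trained NER model may be the more robust choice.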
- CV-JD Matching (`02_cv-jd-matching.ipynb`)
  - Got the JD dataset from the Hugging Face `datasets` library. 15 JDs from the dataset were selected for this project.
  - Basic text cleaning, such as lowercasing and removing punctuation, emails, and phone numbers, was done on the extracted resumes.
  - Tokenization and embeddings for the JDs and CVs were created using `DistilBertTokenizer` and `DistilBertModel` from the `transformers` library.
  - For CV-JD matching, `cosine_similarity` from the `sklearn` library was used.
  - Finally, the top 5 candidates were extracted for each job description, according to their similarity scores.
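
The cleaning step might look roughly like this. The regular expressions and the `clean_text` helper are illustrative assumptions, since the notebook's exact patterns are not reproduced here. The phone pattern in particular is deliberately loose and can also remove other long digit runs, such as year ranges.

```python
import re
import string

# Anything shaped like name@domain.tld (trailing punctuation is swallowed too).
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
# Loose phone pattern: optional "+", then digits mixed with spaces, dots,
# dashes and parentheses, starting and ending with a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
# Replace every punctuation character with a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Lowercase, drop emails and phone numbers, strip punctuation."""
    text = text.lower()
    text = EMAIL_RE.sub(" ", text)
    text = PHONE_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())  # collapse repeated whitespace


print(clean_text("Contact: Jane.Doe@example.com, +1 (555) 123-4567. Skills: Python, SQL!"))
```

Emails and phone numbers are removed before punctuation so that the `@`, `+` and `-` characters are still available for matching.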
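
The embedding and ranking steps can be sketched as below. Several details are assumptions rather than the notebook's confirmed choices: the `distilbert-base-uncased` checkpoint, mean pooling over tokens, truncation to 512 tokens, and the helper names `embed`, `similarity_matrix` and `top_k`. The heavy libraries are imported inside the functions so that the ranking helper can be tried on its own.

```python
def embed(texts, model_name="distilbert-base-uncased", max_length=512):
    """Return one mean-pooled DistilBERT vector per input text."""
    import torch
    from transformers import DistilBertModel, DistilBertTokenizer

    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
    model = DistilBertModel.from_pretrained(model_name)
    model.eval()

    vectors = []
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, truncation=True, max_length=max_length,
                            return_tensors="pt")
            hidden = model(**enc).last_hidden_state     # (1, tokens, dim)
            mask = enc["attention_mask"].unsqueeze(-1)  # (1, tokens, 1)
            # Mean pooling over real tokens (an assumption; using the
            # first-token vector is another common choice).
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
            vectors.append(pooled.squeeze(0).numpy())
    return vectors


def similarity_matrix(jd_vectors, cv_vectors):
    """Cosine similarity with one row per JD and one column per CV."""
    from sklearn.metrics.pairwise import cosine_similarity

    return cosine_similarity(jd_vectors, cv_vectors)


def top_k(scores, k=5):
    """(CV index, score) pairs for the k highest-scoring CVs of one JD."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy similarity row for one JD against six CVs (made-up numbers).
toy_scores = [0.61, 0.87, 0.42, 0.79, 0.55, 0.90]
print(top_k(toy_scores, k=3))
```

With real data, each row of `similarity_matrix(embed(jds), embed(cvs))` would be passed to `top_k` to get the top 5 CVs for that JD. Long resumes are cut off at `max_length` tokens here, so anything beyond that limit does not influence the score.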
- Overall Challenges Faced:
  - This was my first time working with `PyTorch` and the `transformers` library.
  - More than the modelling, extracting the necessary data was the tougher part, as mentioned above under PDF Extractor.
Extracting details from resumes (CVs) and matching them with job descriptions (JDs) using a pretrained model such as DistilBERT, then ranking them using cosine similarity.
avr2002/CV-JD-Matching