Huggingface Model Search Tool

This repository provides tools to search Huggingface models by keyword and to extract metadata from their model cards. It is aimed at users working in software engineering, programming, development, and related technical domains, and it uses the Huggingface API (HfApi) for both the search and the metadata extraction.

Features

Keyword-Based Search: Search for models containing terms related to software engineering and development.
Filter by Research Papers: Select models that link to papers (e.g., arXiv) and capture additional information such as arXiv codes.
Concurrent Processing: Multithreading is used to handle a large volume of models efficiently.
Model Card Metadata Extraction: Extracts detailed metadata from model cards, aiding researchers and developers in finding models relevant to their work.
Excel Export: Saves results in an Excel file for further analysis.

How it Works

Jupyter Notebooks

This repository contains Jupyter notebooks for interactive model searching, metadata extraction, and CSV cleaning. The workflow has the following parts:

Search Terms: A predefined list of terms related to software engineering and development, including:
- "software engineering", "programming", "development", "MLOps", "DevOps", etc.
Huggingface API: Uses the HfApi from Huggingface to fetch models with metadata, including tags and links to papers.
Concurrent Processing: Threads run the data extraction in parallel, which shortens the overall search and extraction time.
Paper Filter: Keeps models that mention "arxiv" or "paper" and pulls arXiv codes when available.
Excel Output: The extracted data (model name, matching keywords, download counts, and arXiv codes if available) is saved to an Excel file for analysis.
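
The paper filter and the threaded extraction described above can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the record layout (plain dicts with id, tags, and card_text keys) and the helper names extract_arxiv_ids, mentions_paper, process_model, and process_all are all hypothetical. It assumes arXiv references appear either as Hub-style "arxiv:<id>" tags or as arxiv.org links in the model card text.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumption: card text references papers through arxiv.org/abs/... or arxiv.org/pdf/... links.
ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})", re.IGNORECASE)


def extract_arxiv_ids(tags, card_text=""):
    """Collect arXiv IDs from 'arxiv:<id>' tags and from arxiv.org links in the text."""
    ids = []
    for tag in tags:
        if tag.lower().startswith("arxiv:"):
            ids.append(tag.split(":", 1)[1])
    for match in ARXIV_URL.finditer(card_text):
        ids.append(match.group(1))
    # Drop duplicates while preserving first-seen order.
    return list(dict.fromkeys(ids))


def mentions_paper(tags, card_text=""):
    """Return True when the tags or card text mention 'arxiv' or 'paper'."""
    text = (" ".join(tags) + " " + card_text).lower()
    return "arxiv" in text or "paper" in text


def process_model(record):
    """Build one output row from a (hypothetical) model record."""
    tags = record.get("tags", [])
    card = record.get("card_text", "")
    return {
        "model": record["id"],
        "has_paper": mentions_paper(tags, card),
        "arxiv_ids": extract_arxiv_ids(tags, card),
    }


def process_all(records, max_workers=8):
    """Process records on a thread pool; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_model, records))


# Made-up sample records standing in for fetched model metadata.
sample = [
    {"id": "org/model-a", "tags": ["code", "arxiv:2106.09685"], "card_text": ""},
    {
        "id": "org/model-b",
        "tags": ["text-generation"],
        "card_text": "See our paper at https://arxiv.org/abs/2305.06161v2",
    },
    {"id": "org/model-c", "tags": ["mlops"], "card_text": "No publication yet."},
]
results = process_all(sample)
```

Threads pay off mainly when each worker waits on network I/O, such as fetching a model card. In this sketch the per-record work is purely local, so the pool only demonstrates the structure.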

Example Usage

To search for models, run the cells in the notebook. The calls below assume that api is an HfApi instance from the huggingface_hub package (api = HfApi()):

Search for models related to software engineering:

software_engineering_models = api.list_models(search="software engineering", cardData=True, sort='downloads', direction=-1)

Search for models related to software development:

software_development_models = api.list_models(search="software development", cardData=True, sort='downloads', direction=-1)

Search by specific tags:

software_engineering_models_tags = api.list_models(tags=["software engineering"], cardData=True, sort='downloads', direction=-1)
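
Because the same model can show up under several search terms, the per-term results usually need merging before export. The sketch below shows one way to do that, assuming each result object exposes id and downloads attributes (as ModelInfo objects in recent huggingface_hub releases do). The function name merge_search_results and the stand-in objects are hypothetical and only serve to keep the example runnable offline.

```python
from types import SimpleNamespace


def merge_search_results(results_by_keyword):
    """Merge per-keyword result lists into one row per model, tracking matched keywords."""
    rows = {}
    for keyword, models in results_by_keyword.items():
        for model in models:
            row = rows.setdefault(
                model.id,
                {
                    "model": model.id,
                    # Treat a missing or None download count as 0.
                    "downloads": getattr(model, "downloads", None) or 0,
                    "keywords": [],
                },
            )
            if keyword not in row["keywords"]:
                row["keywords"].append(keyword)
    # Most-downloaded models first.
    return sorted(rows.values(), key=lambda r: r["downloads"], reverse=True)


# Stand-ins for the objects yielded by api.list_models(...); the values are made up.
fake_results = {
    "software engineering": [
        SimpleNamespace(id="org/a", downloads=120),
        SimpleNamespace(id="org/b", downloads=None),
    ],
    "software development": [
        SimpleNamespace(id="org/a", downloads=120),
        SimpleNamespace(id="org/c", downloads=900),
    ],
}
rows = merge_search_results(fake_results)
```

With a live client, the input dictionary could be built from calls such as api.list_models(search=term, sort="downloads", limit=50) for each term. The resulting rows could then be written out with pandas.DataFrame(rows).to_excel("models.xlsx", index=False), which needs an Excel engine such as openpyxl installed. The file name here is only an example.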

Applications

This tool is valuable for:

Researchers: Finding models linked with research papers for specific domains.
Developers: Discovering pre-trained models in technical domains.
Data Scientists: Exploring models by tags and metadata to find ones that suit a project.

By leveraging model cards, users gain deeper insights into model training details and potential applications.
