Data Import from NVD data via GitHub - Script Added #1611

Draft
wants to merge 8 commits into
base: main

@Rishi-source Rishi-source commented Oct 11, 2024

Summary

This pull request extends the vulnerability data import so that data can be imported from JSON files stored in the local directory NVD_Data. It also adds error handling for duplicate package entries and missing data. Additionally, the code supports storing a simplified "plain" package URL without qualifiers or subpaths.

Related Issues

#1437

Testing Instructions

Run the import process with the following command:

python manage.py import_data

Test importing data from a local folder, i.e. NVD_Data.

Check that duplicate package entries are gracefully handled and skipped.
Review the logs to ensure they accurately reflect each step of the process, including any errors or skipped files.
Verify that plain package URLs (without qualifiers and subpaths) are being correctly stored in the database.
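
To make the last check concrete, here is a minimal sketch of what a "plain" package URL looks like. It assumes a well-formed purl string in which `?` starts the qualifiers and `#` starts the subpath. The function name `plain_purl` is hypothetical and is not taken from this pull request; the actual code may use a purl library instead.

```python
def plain_purl(purl: str) -> str:
    """Return the purl without qualifiers ("?...") or subpath ("#...").

    Assumption: the input is well formed, so any literal "?" or "#" in the
    type, namespace, name or version is percent-encoded.
    """
    # Drop the subpath first, then the qualifiers.
    without_subpath = purl.split("#", 1)[0]
    return without_subpath.split("?", 1)[0]


if __name__ == "__main__":
    print(plain_purl("pkg:npm/%40angular/core@12.0.0?arch=x86#src/main"))
```

A purl that already has no qualifiers or subpath is returned unchanged, so the function can be applied to every package before it is stored.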

Changes with new commit

The NVD data is no longer included in the code directory. The script now imports the data directly through the Git Trees REST API and covers all 6400 files.
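
As a rough illustration of the Git Trees approach, the sketch below lists the JSON files of a repository with a single recursive tree request. It is not the code from this pull request: the helper names (`tree_url`, `json_blob_paths`, `fetch_tree`) are hypothetical, and the owner, repository, and ref must be supplied by the caller because the thread does not name them.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def tree_url(owner: str, repo: str, ref: str) -> str:
    """Build the Git Trees endpoint URL; recursive=1 lists nested paths too."""
    return f"{API_ROOT}/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def json_blob_paths(tree_response: dict) -> list:
    """Pick the paths of .json files out of a Git Trees API response."""
    if tree_response.get("truncated"):
        # GitHub sets "truncated" when the listing exceeds its size limits.
        raise ValueError("tree listing truncated; fetch subtrees one by one")
    return [
        entry["path"]
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob" and entry["path"].endswith(".json")
    ]


def fetch_tree(owner: str, repo: str, ref: str, token: str = None) -> dict:
    """Fetch the recursive tree listing (performs a network request)."""
    request = urllib.request.Request(
        tree_url(owner, repo, ref),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Listing the tree in one request keeps the number of API calls low; each file's content still has to be downloaded separately afterwards.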

Signed-off-by: Rishi Garg <rishigarg2503@gmail.com>
Member

@pombredanne pombredanne left a comment

I like the idea of reusing a local cache, but please DO NOT include the NVD data here.

@Rishi-source
Author

I tried importing the data directly from the GitHub API, but that was quite slow, and it did not import the full data set: out of almost 6400 files, it was only able to import from 1800. Maybe there is a rate limit problem with the GitHub API. What would you recommend in this situation?
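
Whether a rate limit is really the cause is not confirmed in this thread. If it is, one possible way to cope is to read the `x-ratelimit-remaining` and `x-ratelimit-reset` headers that the GitHub REST API returns and pause until the quota resets. The sketch below is only an illustration under that assumption; `seconds_until_reset` is a hypothetical helper, not part of this pull request.

```python
import time


def seconds_until_reset(headers, now=None) -> float:
    """Return how long to wait before the next request (0.0 if quota remains).

    `headers` is any mapping of response headers; names are matched
    case-insensitively. `x-ratelimit-reset` is a UTC epoch timestamp in seconds.
    """
    now = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}
    # If the header is absent, assume quota remains rather than stalling.
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(lowered.get("x-ratelimit-reset", now))
    return max(0.0, reset_at - now)


if __name__ == "__main__":
    example = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1100"}
    # The caller would sleep for this long before retrying the request.
    print(seconds_until_reset(example, now=1000))
```

Authenticated requests get a much higher hourly quota than anonymous ones, so sending a token with each request is usually the first thing to try before adding waits.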

Rishi-garg03 and others added 2 commits October 11, 2024 20:43
@Rishi-source
Author

@pombredanne The recent changes are working fine. You just have to put your GitHub PAT in the code, and then you can run python manage.py import_nvd_data to start the import. When I ran the script on my local machine, about 25k entries in total were created across the Package Related Vulnerabilities, Package, Vulnerability, and Vulnerability Reference models.

@Rishi-source Rishi-source changed the title from "Data Import from NVD data Script Added" to "Data Import from NVD data via GitHub - Script Added" Oct 12, 2024
@Rishi-source Rishi-source marked this pull request as draft October 17, 2024 16:01
3 participants