Commit

refactor(scraper): remove ParserManager and switch to mdformat for markdown conversion

Removed the `ParserManager` class and its associated file. Updated the `Scraper` class to use `mdformat` for converting HTML to Markdown directly, simplifying the process and reducing dependencies. Updated `requirements.txt` to include necessary `mdformat` packages.
obeone committed Jun 13, 2024
1 parent a42aa30 commit e888329
Showing 3 changed files with 10 additions and 381 deletions.
370 changes: 0 additions & 370 deletions parser_manager.py

This file was deleted.

7 changes: 6 additions & 1 deletion requirements.txt
@@ -2,4 +2,9 @@ beautifulsoup4==4.12.3
 coloredlogs==15.0.1
 tqdm==4.66.4
 requests==2.32.2
-trafilatura==1.8.1
+trafilatura==1.10.0
+mdformat==0.7.17
+mdformat-gfm==0.3.6
+mdformat_footnote==0.1.1
+mdformat_frontmatter==2.0.8
+mdformat_tables==0.4.1
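
The pin changes in this hunk can be double-checked with a small script. The following is only a sketch using the standard library: `parse_pins` and `diff_pins` are hypothetical helper names, not part of this repository, and the `OLD`/`NEW` strings contain only the lines visible in the hunk above (the rest of `requirements.txt` is not shown here). It assumes every requirement is a plain `name==version` pin.

```python
def parse_pins(text):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Simplified name normalization: lowercase, and '_' treated like '-'.
        pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def diff_pins(old, new):
    """Return (added, removed, changed) between two pin mappings."""
    added = {n: v for n, v in new.items() if n not in old}
    removed = {n: v for n, v in old.items() if n not in new}
    changed = {n: (old[n], new[n]) for n in new if n in old and old[n] != new[n]}
    return added, removed, changed


# Only the lines visible in the hunk; not the full requirements file.
OLD = """\
beautifulsoup4==4.12.3
coloredlogs==15.0.1
tqdm==4.66.4
requests==2.32.2
trafilatura==1.8.1
"""

NEW = """\
beautifulsoup4==4.12.3
coloredlogs==15.0.1
tqdm==4.66.4
requests==2.32.2
trafilatura==1.10.0
mdformat==0.7.17
mdformat-gfm==0.3.6
mdformat_footnote==0.1.1
mdformat_frontmatter==2.0.8
mdformat_tables==0.4.1
"""

added, removed, changed = diff_pins(parse_pins(OLD), parse_pins(NEW))

for name, version in sorted(added.items()):
    print(f"added   {name}=={version}")
for name, (before, after) in sorted(changed.items()):
    print(f"changed {name}: {before} -> {after}")
for name, version in sorted(removed.items()):
    print(f"removed {name}=={version}")
```

A version bump such as `trafilatura` shows up under `changed` rather than as one removal plus one addition, which is why the script reports one changed and five added packages while the diff counts 6 additions and 1 deletion.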