A Python package that automatically downloads and manages images referenced in markdown files, storing them locally in an _attachments
folder. This script is particularly useful for maintaining local copies of images in markdown documentation and ensuring consistent image availability.
I originally wrote it for Obsidian's Readwise export.
Previously hosted on GitHub Gist. Moved here to allow for easier maintenance and contributions, if any. Also published to PyPI for convenience.
Requires Python 3.9+.
Install directly from PyPI using pip:
pip install markdown-image-downloader
Run the package from the command line, providing the folder containing your markdown files as an argument:
markdown-image-downloader <folder_name>
For example:
markdown-image-downloader ../Readwise/Articles
This will:
- Scan all markdown files in the `../Readwise/Articles` folder
- Download any images referenced in the markdown files
- Store them in `../Readwise/Articles/_attachments`
- Update the markdown files to reference the local copies
- Uses custom HTTP headers to avoid download blocks
- Downloads images from URLs referenced in markdown files
- Creates local copies of images in an `_attachments` directory
- Automatically updates links in the markdown files with new local image paths
- Compresses large images to reduce storage space
- Supports multithreaded concurrent downloads
- Uses rate limiting to prevent server overload and download blocks
- Shows a progress bar for tracking download status
- Maintains detailed logs of errors
- Sanitizes filenames for cross-platform compatibility
- Supports rerunning the script without re-downloading images
- Scanning: The script scans all `.md` files in the specified folder for image references.
- Downloading: For each image URL found:
  - Downloads the image if it's not already in `_attachments`
  - Compresses images larger than 500KB while maintaining quality
  - Generates unique filenames based on content hash
- Organization: Creates an `_attachments` folder to store all images
- Updating: Updates markdown files to reference the local copies in `_attachments`
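
The scan, name, and update steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the regex, the function names (`find_image_urls`, `hashed_filename`, `rewrite_links`), the 16-character SHA-256 prefix, and the relative `_attachments/` link format are all assumptions.

```python
import hashlib
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches markdown image links of the form ![alt](url), http(s) URLs only.
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def find_image_urls(markdown: str) -> list[str]:
    """Return the remote image URLs referenced in a markdown string."""
    return [m.group(2) for m in IMAGE_LINK.finditer(markdown)]


def hashed_filename(url: str, content: bytes) -> str:
    """Name a file after a hash of its bytes, keeping the URL's extension."""
    # ".img" is a hypothetical fallback for URLs without an extension.
    suffix = PurePosixPath(urlparse(url).path).suffix or ".img"
    return hashlib.sha256(content).hexdigest()[:16] + suffix


def rewrite_links(markdown: str, local_names: dict[str, str]) -> str:
    """Point each downloaded image link at its copy in _attachments."""

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        if url not in local_names:
            return match.group(0)  # leave failed or skipped downloads untouched
        return f"![{alt}](_attachments/{local_names[url]})"

    return IMAGE_LINK.sub(replace, markdown)
```

Because the name comes from the image bytes, the same image always maps to the same file, which is one way a rerun could detect that an image is already present.
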
- Automatically compresses large images
- Maintains reasonable quality through progressive compression
- Converts RGBA images to RGB with white background
- Preserves original filenames
- Sanitizes filenames for cross-platform compatibility
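
As an illustration of what cross-platform sanitization might involve, here is a small sketch. The character set, the reserved-name list, and the `sanitize_filename` helper are assumptions chosen for the example; the package's own rules may differ.

```python
import re

# Characters that Windows rejects in filenames, plus ASCII control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names that Windows reserves regardless of extension.
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def sanitize_filename(name: str) -> str:
    """Replace unsafe characters and avoid names that Windows reserves."""
    # Leading/trailing spaces and dots are also trouble on Windows.
    cleaned = _UNSAFE.sub("_", name).strip(" .")
    if not cleaned:
        cleaned = "image"  # hypothetical fallback for names that end up empty
    stem = cleaned.split(".", 1)[0]
    if stem.upper() in _RESERVED:
        cleaned = "_" + cleaned
    return cleaned
```
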
- Uses ThreadPoolExecutor for parallel downloads
- Includes a progress bar for tracking downloads
- Implements rate limiting to prevent server overload
- Comprehensive logging of all operations
- Graceful handling of download failures
- Skips already processed images
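
To show how parallel downloads, rate limiting, and graceful failure handling can fit together, here is a rough sketch built on `ThreadPoolExecutor`. The `RateLimiter` class, the `download_all` function, the injected `fetch` callable, and the default values are hypothetical; they are not taken from the package.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


class RateLimiter:
    """Allow at most one call per `min_interval` seconds across all threads."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if delay:
            time.sleep(delay)


def download_all(urls, fetch: Callable[[str], bytes], max_workers=4, min_interval=0.1):
    """Fetch every URL in parallel; return (results, failures) instead of raising."""
    limiter = RateLimiter(min_interval)
    results, failures = {}, {}

    def task(url: str) -> bytes:
        limiter.wait()
        return fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one bad URL should not stop the rest
                failures[url] = str(exc)
    return results, failures
```

Passing the downloader in as `fetch` keeps the sketch independent of any particular HTTP library, and the `failures` mapping is the natural place to feed an error log.
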
The script creates detailed logs in a `logs` directory:
- Location: `./logs/image_downloader.log`
- Includes timestamps, operation details, and error messages
- New log file created for each run
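
A logging setup along these lines would produce that kind of file. Treat it as a sketch: the `setup_logging` function, the logger name, the format string, and opening the file in `"w"` mode (so each run starts with a fresh log) are assumptions rather than the package's confirmed behavior.

```python
import logging
from pathlib import Path


def setup_logging(log_dir: str = "logs") -> logging.Logger:
    """Create the log directory and return a logger that writes timestamped lines."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("image_downloader")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.FileHandler(
        Path(log_dir) / "image_downloader.log", mode="w", encoding="utf-8"
    )
    # asctime supplies the timestamp; levelname distinguishes errors from info.
    handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    return logger
```
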
- Only processes image links in markdown format: `![alt](url)`
- Requires internet connection for downloading external images
- Downloads may be rate-limited or outright denied by some servers
- SVG files are downloaded but not compressed
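
The compression rule (images over 500KB get compressed, SVG files do not) could be expressed as a small predicate like the one below. The `should_compress` name, reading "500KB" as 500 × 1024 bytes, and detecting SVG by file extension are assumptions made for this sketch.

```python
from pathlib import PurePath

COMPRESS_THRESHOLD_BYTES = 500 * 1024  # assumption: "500KB" read as 500 KiB
UNCOMPRESSED_SUFFIXES = {".svg"}  # vector images are stored as-is


def should_compress(filename: str, size_bytes: int) -> bool:
    """Return True for non-SVG images above the size threshold."""
    if PurePath(filename).suffix.lower() in UNCOMPRESSED_SUFFIXES:
        return False
    return size_bytes > COMPRESS_THRESHOLD_BYTES
```
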
Feel free to submit issues, fork the repository, and create pull requests for any improvements.
This project is available under the MIT License.