-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data subset pull script is problematic. #114
Comments
Thanks for the suggestion! It would definitely be easier if we could pull huggingface shards through their http server. My experience with this is that the http server eventually rate limits you, which is not the case for git lfs pulls. This was true last year and could be different now though. |
@Jackmin801 So I used this script to help a LOT of people download L3.1 405B when Git LFS timed out for them. You can also pass the HF read token from the env file to prevent rate limiting but I have never seen that with this script even pulling as anonymous. |
Ah ok. Lets change it then. Do you think you could write the PR for this? Otherwise I can do it tmr. |
I'm going to take a shot at the PR as I have integrated it before and use it extensively personally. |
@Jackmin801 I have submitted a PR and fixed a possible "security" issue introduced by the current ~/10B/H100.toml. It is best to verify that the subset logic is working as intended. |
Related to PR #111
The script assumes that the user has
Git LFS
installed but it's not in the documentation on how to install then setup then initializeGit LFS
. For most users this is very likely going to break if they attempt to use said script. A likely much better implementation would be to use this Github gist, https://gist.github.com/padeoe/697678ab8e528b85a2a7bddafea1fa4f#file-hfd-sh, integrated into the project as it uses onlygit
andaria2c/wget
as dependencies. There are also other advantages to integrating this script as you would not need to use a SSH keys as that adds complexity to the project and issues with ports as it's likely that port may be busy as I SSH into my cloud instance to setup prime.The text was updated successfully, but these errors were encountered: