IFU master 2024 01 24 #4

pnunna93 · 2024-01-27T00:38:25Z

This PR pulls upstream changes for 0.42.0 version.

Resolved merge conflicts - conflicts_diff.txt
Updated hipified files for new kernels and ops from upstream
Fixed build errors and ran unittests, summary shown below
Skipped failing unit tests for future review. There are around 200 additional tests added in functional module, which contributed to major chunk of new failures.

Unit test summary:
PreIFU:

Module	Passed	Failed	Skipped
autograd	1616	624	0
cuda_setup_evaluator	0	1	0
functional	270	23	43
linear8bitlt	9	9	0
modules	10	4	0
optim	125	26	26
triton	0	2	0
Total	2030	689	69

PostIFU:

Module	Passed	Failed	Skipped
autograd	1592	648	0
cuda_setup_evaluator	0	1	0
functional	313	233	54
linear8bitlt	9	9	0
modules	14	4	0
optim	124	27	26
triton	0	2	0
generation	8	8	0
linear4bit	32	0	0
Total	2092	932	80

Changed misleading Hardware requirements from "2018 or older" to "2018 or newer"

Added scipy to requirements.txt as it is used but not added to requirements

…yer-device Add `device` parameter to `Linear` subclasses and `Embedding`

Update README.md

Fix parameter name in error message

Add version attribute as per Python convention

…_permissionerror_order Make sure bitsandbytes handles permission errors in the right order

Added scipy to requirements.txt

Fix typo "quanitze"

fix array index out of bounds in kgetColRowStats

…dation#922)

…sbelkada-delete-workflow Delete .github/workflows/delete_doc_commment.yml

@TimDettmers

This PR adds initial FSDP support for training QLoRA models. It enables basic FSDP and CPU Offload support, with low memory training via FSDP.sync_module_states option unsupported. This PR builds off of bitsandbytes-foundation#840 commit 8278fca and BNB FSDP by @TimDettmers and @Titus-von-Koeller. An example of using this PR to finetune QLoRA models with FSDP can be found in the demo repo: AnswerDotAi/fsdp_qlora. * Minimal changes for fp32 4bit storage from BNB commit 8278fca * Params4bit with selectable storage dtype * possible fix for double quantizing linear weight & quant storage dtype * minor fixes in Params4bit for peft tests * remove redundant * add float16 * update test * Remove float16 quant cast as there are fp32, bf16, & fp16 quant kernels --------- Co-authored-by: Kerem Turgutlu <keremturgutlu@gmail.com>

…ytes-foundation#703), Sort compute capabilities sets to select max * Add support for CUDA 12.1 * Update README to include CUDA 12.1 version * Add support for >= 12x Co-authored-by: Jeongseok Kang <jskang@lablup.com> * Temporary version of bitsandbytes PR 527: Sort compute capabilities sets to select max * Modify PR 506 to support C++20 * Add Cuda 12.2 --------- Co-authored-by: PriNova <info@prinova.de> Co-authored-by: PriNova <31413214+PriNova@users.noreply.github.com> Co-authored-by: Jeongseok Kang <jskang@lablup.com>

…tion#975)

Remove redundant key

* Added install requirements to setup * Update setup.py Co-authored-by: Aarni Koskela <akx@iki.fi> --------- Co-authored-by: Aarni Koskela <akx@iki.fi>

…ation#983)

* implicitly skip any test that implicitly uses CUDA on a non-CUDA box * add a `requires_cuda` fixture

amathews-amd

LGTM

@Lzy17 , please review.

rapsealk and others added 30 commits April 25, 2023 17:00

fix: Replace libcudart with pytorch api

97b2567

fix: Get CUDA compiled version through pytorch

eb54c55

fix: Use raw int

9836b0b

fix: Remove unused code

f511026

fix: Get device's compute capability

2b4cc25

Update README.md

dae7041

Fix typo "quanitze"

6b26402

Added lookup table.

b7f04e2

Added debugging functions.

e54d273

Add device parameter to Linear subclasses

9cac5dd

Add device parameter to Embedding

db49ad4

Update README.md

ea0f793

Changed misleading Hardware requirements from "2018 or older" to "2018 or newer"

Added scipy to requirements.txt

237ad49

Added scipy to requirements.txt as it is used but not added to requirements

Initial 4-bit naive batch size 1, 81 vs 185.

f89ff93

Fixed missing Embedding export

c2494a6

Vectorized loads, conflict free NF4; 52 vs 172.

dfe6900

Added bfloat16 quantizations and tests.

02fd80c

Merge branch 'main' into fix/libcuda-to-torch

a24aae3

[BugFix] replace view+continuous with reshape

463630d

Added warp_shuffle indexing 185 vs 54.

7e49b5b

Turning optimization (float accumulation). 185 vs 50.

eefbf60

Added abitrary data types; fixed a bug for small matrices.

4b88d69

Added FP4 fast inference support.

94168d7

Added double quantization support and tests.

0f0390a

Fixed a bug where gemv_4bit would return a wrongly sized tensor.

6a905be

Added test for Param4bit.to() and fixed double quant behavior.

cef519c

Added fp32 compute type for gemv_4bit.

5fab673

Merge pull request bitsandbytes-foundation#469 from shadeMe/linear-la…

196d6f5

…yer-device Add `device` parameter to `Linear` subclasses and `Embedding`

Merge remote-tracking branch 'origin/inference'

5f492d4

Fixed Makefile and added CUDA 12.2 install.

73aa4e0

TimDettmers and others added 24 commits January 1, 2024 18:07

Merge pull request bitsandbytes-foundation#494 from pranavgitb11/main

b26454c

Update README.md

Merge pull request bitsandbytes-foundation#402 from alexrs/patch-1

ef4b079

Update README.md

Merge pull request bitsandbytes-foundation#832 from michaelmior/patch-1

bfb030f

Fix parameter name in error message

Merge pull request bitsandbytes-foundation#710 from rasbt/version-info

e8a42e4

Add version attribute as per Python convention

Merge pull request bitsandbytes-foundation#622 from fozziethebeat/fix…

6ba3e62

…_permissionerror_order Make sure bitsandbytes handles permission errors in the right order

Merge pull request bitsandbytes-foundation#525 from dulalbert/patch-1

8c5c668

Added scipy to requirements.txt

Merge pull request bitsandbytes-foundation#436 from akx/quanitze

947db7c

Fix typo "quanitze"

Merge pull request bitsandbytes-foundation#905 from LucQueen/outofbounds

3e70603

fix array index out of bounds in kgetColRowStats

Fixed bnb input in setup.py. Bumped version for release.

4870580

Move to CPU before attempting to convert to numpy. (bitsandbytes-foun…

d05b508

…dation#922)

initial doc-builder skeleton (bitsandbytes-foundation#965)

bdb2449

Delete .github/workflows/delete_doc_commment.yml

3bb7a1c

Merge pull request bitsandbytes-foundation#967 from TimDettmers/youne…

1b3d311

…sbelkada-delete-workflow Delete .github/workflows/delete_doc_commment.yml

disabled stale-bot, as requested by Tim

64a28d0

Quote folder and filename in find_file_recursive (bitsandbytes-founda…

3cefd82

…tion#975)

Update env_vars.py (bitsandbytes-foundation#951)

407a8d3

Remove redundant key

Added install requirements to setup (bitsandbytes-foundation#488)

1e64210

* Added install requirements to setup * Update setup.py Co-authored-by: Aarni Koskela <akx@iki.fi> --------- Co-authored-by: Aarni Koskela <akx@iki.fi>

Fix Remove redundant dependency keyword parameter (bitsandbytes-found…

53f8af8

…ation#983)

Tests: improve CUDA support detection (bitsandbytes-foundation#985)

f1c7574

* implicitly skip any test that implicitly uses CUDA on a non-CUDA box * add a `requires_cuda` fixture

Merge remote-tracking branch 'upstream/main' into IFU-master-2024-01-24

b1d484a

Update hip files with upstream changes

0e91e48

Skip failing tests for now

1295d53

pnunna93 requested review from Lzy17, amathews-amd, dllehr-amd and jpvillam-amd January 27, 2024 00:38

amathews-amd approved these changes Jan 29, 2024

View reviewed changes

amathews-amd merged commit 48b7fa9 into rocm_enabled Jan 30, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IFU master 2024 01 24 #4

IFU master 2024 01 24 #4

pnunna93 commented Jan 27, 2024 •

edited

Loading

amathews-amd left a comment

IFU master 2024 01 24 #4

IFU master 2024 01 24 #4

Conversation

pnunna93 commented Jan 27, 2024 • edited Loading

amathews-amd left a comment

Choose a reason for hiding this comment

pnunna93 commented Jan 27, 2024 •

edited

Loading