-
@epolewsky did this in https://github.com/epolewski/EricLLM
-
I actually had my own version of this in mind before the recent PR, and I'm not sure which direction to take it right now. I'm a little hesitant because the idea of easily creating Frankenstein models puts me off. I wouldn't be too surprised if HF starts banning merged models soon, since they don't actually have an unlimited budget for hosting all those files. I've yet to hear a compelling reason why Frankenstein models should work in the first place, and I've yet to see an objective (or blind) test showing that they do. But they're definitely crowding the space, so... idk.
-
I really liked the discussion on Reddit's LocalLLaMA about the potential of easily creating Frankenstein models with exllama.
I think that could open up new areas of research without huge compute and memory requirements.
Could we start a discussion on 1) minimal proof-of-concept code modifications to test the ideas and, if the results are positive, 2) how to make the inference efficient?
On the first topic, where in the codebase would I start?
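As a minimal proof of concept that sidesteps exllama entirely, one could sketch a passthrough-style frankenmerge with plain HF transformers by duplicating a slice of decoder layers and saving the result as a new checkpoint. This is only an illustrative sketch under assumptions: the base model name, the layer ranges, and the output path are all arbitrary, and real merges pick the ranges empirically.

```python
# Illustrative sketch only: a "passthrough" frankenmerge built by repeating
# a slice of decoder layers. Model name and layer ranges are assumptions.
import copy

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

layers = model.model.layers  # nn.ModuleList of decoder blocks (32 for 7B)

# Keep blocks 0..23, repeat blocks 8..23 once, then append the rest.
# These ranges are made up for illustration, not a tested recipe.
stacked = (
    list(layers[:24])
    + [copy.deepcopy(block) for block in layers[8:24]]
    + list(layers[24:])
)

model.model.layers = torch.nn.ModuleList(stacked)
model.config.num_hidden_layers = len(stacked)

# Recent transformers versions tag each attention module with a layer_idx
# used by the KV cache, so renumber the duplicated blocks before generating.
for i, block in enumerate(model.model.layers):
    if hasattr(block.self_attn, "layer_idx"):
        block.self_attn.layer_idx = i

model.save_pretrained("llama2-7b-frankenmerge-poc")
```

If the merged checkpoint produces coherent output, it could presumably then be quantized and loaded through exllama like any other model, which would make the "efficient inference" half of the question mostly a non-issue for a first experiment.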