reward-models
Here are 11 public repositories matching this topic...
A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
Updated May 20, 2024 - Python
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
Updated Apr 28, 2023 - Python
[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Updated Oct 12, 2024 - Python
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
Updated Apr 28, 2023 - Python
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
Updated Nov 19, 2024 - Jupyter Notebook
ZYN: Zero-Shot Reward Models with Yes-No Questions
Updated Aug 15, 2023 - Python
A repo for RLHF training and best-of-N (BoN) sampling over LLMs, with support for reward model ensembles.
Updated Mar 9, 2024 - Python
GenRM-CoT: Data release for verification rationales
Updated Oct 16, 2024
Official implementation for "MJ-BENCH: Is Your Multimodal Reward Model Really a Good Judge?"
Updated Jun 7, 2024 - Jupyter Notebook