# reward-models

Here are 11 public repositories matching this topic...

RLHFlow / RLHF-Reward-Modeling

Recipes for training reward models for RLHF.

llm rlhf reward-models llama3

Updated Nov 19, 2024
Python

jackaduma / Vicuna-LoRA-RLHF-PyTorch

A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.

pytorch llama gpt lora finetune ppo peft vicuna llm chatgpt rlhf reward-models vicuna-7b

Updated May 20, 2024
Python

jackaduma / ChatGLM-LoRA-RLHF-PyTorch

A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.

pytorch llama gpt lora finetune ppo peft deepspeed llm chatgpt rlhf reward-models chatglm chatglm-6b

Updated Apr 28, 2023
Python

ExplainableML / ReNO

[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

text-to-image text-to-image-generation stable-diffusion reward-models

Updated Oct 12, 2024
Python

jackaduma / Alpaca-LoRA-RLHF-PyTorch

A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.

pytorch llama gpt lora alpaca finetune ppo peft deepspeed llm chatgpt rlhf reward-models

Updated Apr 28, 2023
Python

MJ-Bench / MJ-Bench

Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"

reward-models multimodal-foundation-model llm-benchmarking llm-as-a-judge multimodal-judge

Updated Nov 19, 2024
Jupyter Notebook

vicgalle / zero-shot-reward-models

ZYN: Zero-Shot Reward Models with Yes-No Questions

reinforcement-learning zero-shot llm rlhf reward-models trlx rlaif

Updated Aug 15, 2023
Python

tlc4418 / llm_optimization

A repo for RLHF training and best-of-n (BoN) sampling over LLMs, with support for reward model ensembles.

deep-learning ensembles best-of-n large-language-models reinforcement-learning-from-human-feedback reward-models

Updated Mar 9, 2024
Python

genrm-star / genrm-critiques

GenRM-CoT: Data release for verification rationales

reasoning llm reward-models

Updated Oct 16, 2024

BillChan226 / MJ-Bench

Official implementation for "MJ-BENCH: Is Your Multimodal Reward Model Really a Good Judge?"

benchmark reward-models multimodal-large-language-models

Updated Jun 7, 2024
Jupyter Notebook

chrisliu298 / Skywork-Reward

Rank 1 and 3 reward models on RewardBench

alignment reward-models

Updated Oct 14, 2024
