This is a group project developed by a team of three individuals.

FineTune-LLM-OnlineRL

Game: Xiangqi (Chinese chess)

Main Idea - Fine-tuning an LLM Agent with Online RL (PPO & LoRA):

  1. A pre-trained LLM is used as the starting policy for the RL agent
  2. Observations from the environment are converted to text
  3. Each text observation triggers an action, and the outcome is then used to update the RL agent's policy
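
The steps above can be illustrated with a minimal sketch. It is not taken from this repository: the board encoding (10 rows of 9 characters, `.` for an empty point), the prompt wording, and the function names `board_to_text` and `ppo_clipped_loss` are all assumptions made for illustration. The first function shows one way an observation could be turned into text; the second computes the standard PPO clipped surrogate loss from per-action log-probabilities, which is the quantity a PPO update would minimize.

```python
import math


def board_to_text(board, side_to_move):
    """Render a board observation as a plain-text prompt (hypothetical format).

    `board` is assumed to be a list of rows, top row first, where each row is
    a list of single-character piece symbols and '.' marks an empty point.
    """
    lines = [f"Xiangqi position, {side_to_move} to move."]
    top_rank = len(board) - 1
    for row_index, row in enumerate(board):
        lines.append(f"rank {top_rank - row_index}: {' '.join(row)}")
    lines.append("Reply with one legal move.")
    return "\n".join(lines)


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over a batch of sampled actions.

    The probability ratio is new_policy / old_policy for each action. The
    objective takes the minimum of the unclipped and clipped terms, and the
    loss is its negation so that it can be minimized.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Example: an assumed simplified board with only the back ranks filled in.
board = [list("rnbakabnr")] + [list(".........") for _ in range(8)] + [list("RNBAKABNR")]
prompt = board_to_text(board, "Red")
```

In a full pipeline the prompt would be fed to the LLM, the generated move applied to the environment, and the resulting rewards turned into advantages for the loss. With LoRA, only the low-rank adapter weights would receive gradients from this loss while the base model stays frozen.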

Other Methods Implemented:

  1. Random
  2. Greedy
  3. DQN
  4. DDQN
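
For readers comparing the last two baselines, the sketch below shows how their bootstrap targets usually differ. It assumes that DDQN here means Double DQN (the README does not say), and the function names and list-based Q-value inputs are hypothetical, not this repository's code. DQN takes the maximum of the target network's Q-values, while Double DQN picks the action with the online network and evaluates it with the target network.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN target: reward plus discounted max Q-value from the target network."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action,
    the target network supplies that action's value."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Example with two actions: the networks disagree about which is best.
online_q = [1.0, 2.0]   # online network prefers action 1
target_q = [5.0, 3.0]   # target network rates action 0 highest
y_dqn = dqn_target(1.0, target_q, gamma=0.5)
y_ddqn = double_dqn_target(1.0, online_q, target_q, gamma=0.5)
```

Decoupling action selection from evaluation is what lets Double DQN reduce the overestimation that the plain max operator tends to introduce.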