This is a group project developed by a team of three individuals.

FineTune-LLM-OnlineRL

Game: Xiangqi

Main Idea - Fine-tuning an LLM Agent with Online RL (PPO & LoRA):

A pre-trained LLM is used as the starting policy for the RL agent
Observations from the environment are converted to text
Each text observation prompts the agent to choose an action, and the outcome is then used to update the RL agent’s policy
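
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the board encoding, the coordinate move format (such as 'h2e2'), and the names START_BOARD, board_to_prompt, parse_move and ppo_clipped_objective are all hypothetical and chosen only for this example. The last function is the standard PPO clipped surrogate for a single action; the LLM call and the LoRA update themselves are left out.

```python
import math
import re
from typing import List, Optional

# Hypothetical encoding (an assumption for this sketch): 10 rows x 9 columns,
# '.' for an empty point, uppercase for Red pieces, lowercase for Black.
START_BOARD: List[str] = [
    "rnbakabnr",
    ".........",
    ".c.....c.",
    "p.p.p.p.p",
    ".........",
    ".........",
    "P.P.P.P.P",
    ".C.....C.",
    ".........",
    "RNBAKABNR",
]


def board_to_prompt(board: List[str], side: str, legal_moves: List[str]) -> str:
    """Convert a board observation into a text prompt for the LLM policy."""
    header = "  a b c d e f g h i"
    # Ranks are labelled 9 (top row) down to 0 (bottom row).
    rows = [f"{9 - i} {' '.join(row)}" for i, row in enumerate(board)]
    return (
        f"You are playing Xiangqi as {side}.\n"
        + header + "\n" + "\n".join(rows) + "\n"
        + "Legal moves: " + ", ".join(legal_moves) + "\n"
        + "Reply with one move, e.g. 'h2e2'."
    )


# A move is written as from-square plus to-square, files a-i and ranks 0-9.
MOVE_RE = re.compile(r"\b([a-i][0-9][a-i][0-9])\b")


def parse_move(reply: str, legal_moves: List[str]) -> Optional[str]:
    """Return the first legal move mentioned in the LLM reply, or None."""
    for candidate in MOVE_RE.findall(reply.lower()):
        if candidate in legal_moves:
            return candidate
    return None


def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate for one action (to be maximised)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    moves = ["h2e2", "b2e2"]
    print(board_to_prompt(START_BOARD, "Red", moves))
    print(parse_move("I will play H2E2.", moves))
```

When parse_move returns None, the training loop has to decide what to do with an unusable reply, for example by penalising it or by falling back to a random legal move; which option the project uses is not stated here.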

Other Methods Implemented:

Random
Greedy
DQN
DDQN
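
As a brief illustration of how the last two baselines differ, the sketch below computes the bootstrap target for one transition. It assumes that DDQN here means Double DQN (not Dueling DQN), and the function names dqn_target and double_dqn_target are hypothetical. Q-values are passed in as plain lists, so no network code is shown.

```python
from typing import Sequence


def dqn_target(reward: float, gamma: float,
               q_target_next: Sequence[float], done: bool) -> float:
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float], done: bool) -> float:
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    q_online = [1.0, 3.0, 2.0]   # online network's Q(s', a)
    q_target = [5.0, 0.5, 4.0]   # target network's Q(s', a)
    print(dqn_target(1.0, 0.5, q_target, done=False))
    print(double_dqn_target(1.0, 0.5, q_online, q_target, done=False))
```

Decoupling action selection from evaluation is what reduces the overestimation that the plain max in DQN tends to produce.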