FineTune-LLM-OnlineRL

This is a group project developed by a team of three.

Game: Xiangqi

Main Idea - Fine-tuning an LLM Agent with Online RL (PPO & LoRA):

A pre-trained LLM is used as the starting policy for the RL agent
Observations from the environment are converted to text
Each text observation prompts the agent to choose an action, and the outcome is then used to update the RL agent’s policy
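
The steps above can be illustrated with a minimal sketch. It is not taken from RL_Project.ipynb: the function names (`observation_to_text`, `ppo_clipped_objective`), the piece letters, the move notation, and the clip value of 0.2 are all assumptions made for illustration. The first function shows one possible way to turn a Xiangqi board into a text prompt; the second computes the standard PPO clipped surrogate objective from log-probabilities that the LLM policy would supply.

```python
import math
from typing import Sequence

# Hypothetical encoding: uppercase = Red, lowercase = Black, "." = empty point.
# A Xiangqi board has 10 ranks of 9 files each.
START_POSITION = [
    "rheakaehr",
    ".........",
    ".c.....c.",
    "p.p.p.p.p",
    ".........",
    ".........",
    "P.P.P.P.P",
    ".C.....C.",
    ".........",
    "RHEAKAEHR",
]


def observation_to_text(
    board: Sequence[str], side_to_move: str, legal_moves: Sequence[str]
) -> str:
    """Render a board observation as a text prompt for the LLM policy."""
    # Label ranks 9 (top, Black's back rank) down to 0 (Red's back rank).
    rows = [f"{9 - rank_index} {' '.join(row)}" for rank_index, row in enumerate(board)]
    files = "  " + " ".join("abcdefghi")
    return "\n".join(
        [
            f"Side to move: {side_to_move}",
            "Board:",
            *rows,
            files,
            "Legal moves: " + ", ".join(legal_moves),
            "Your move:",
        ]
    )


def ppo_clipped_objective(
    new_logps: Sequence[float],
    old_logps: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (a quantity to maximize)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the updated policy and the policy that acted.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In a full pipeline, the prompt would be fed to the LLM, the chosen move would be played in the environment, and the log-probabilities of the chosen moves would feed the objective. With LoRA, only the low-rank adapter weights would typically receive gradients from that objective.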

Other Methods Implemented:
