This is a group project developed by a team of three.
Game: Xiangqi
Main Idea - Fine-tuning an LLM Agent with Online RL (PPO & LoRA):
- A pre-trained LLM is used as the starting policy for the RL agent
- Observations from the environment are converted to text
- Each text observation prompts an action from the agent, and the outcome is then used to update the RL agent’s policy
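
As a rough illustration of the observation-to-text step, here is a minimal sketch. The board encoding (ten strings of nine characters, uppercase for Red, lowercase for Black, `.` for empty), the piece letters, the prompt wording, and the function name `observation_to_text` are all assumptions made for this example; the project's actual format may differ.

```python
# Hypothetical piece letters; not taken from the project itself.
PIECE_NAMES = {
    "K": "general", "A": "advisor", "E": "elephant", "H": "horse",
    "R": "chariot", "C": "cannon", "P": "soldier",
}


def observation_to_text(board, side_to_move):
    """Render a board (10 strings of 9 characters) as a text prompt.

    Assumed encoding: uppercase = Red, lowercase = Black, '.' = empty.
    """
    lines = []
    for rank, row in enumerate(board):
        for file, cell in enumerate(row):
            if cell == ".":
                continue  # skip empty points
            colour = "Red" if cell.isupper() else "Black"
            name = PIECE_NAMES[cell.upper()]
            lines.append(f"{colour} {name} at file {file}, rank {rank}")
    header = f"You are playing Xiangqi as {side_to_move}. Pieces on the board:"
    footer = "Reply with a move as 'file,rank->file,rank'."
    return "\n".join([header, *lines, footer])


# Usage: a nearly empty board with only the two generals.
board = ["." * 9 for _ in range(10)]
board[0] = "....k...."
board[9] = "....K...."
prompt = observation_to_text(board, "Red")
```

The resulting string would be passed to the LLM as its prompt, and the generated reply parsed back into a move for the environment.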
Other Methods Implemented:
- Random
- Greedy
- DQN
- DDQN
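
For readers unfamiliar with the difference between the last two baselines, the sketch below shows the standard bootstrap targets for DQN and Double DQN. It assumes the textbook formulations, with Q-values given as plain lists; the function names and the `gamma` default are illustrative and not taken from the project's code.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects the next action,
    and the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Usage: two legal actions in the next state.
y_dqn = dqn_target(1.0, [0.5, 2.0], gamma=0.5)
y_ddqn = ddqn_target(1.0, [3.0, 1.0], [0.5, 2.0], gamma=0.5)
```

Decoupling action selection from evaluation is what lets Double DQN reduce the overestimation bias that the plain `max` introduces in DQN.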