Sparks of Cooperative Reasoning

Evaluating LLMs as strategic agents in the game of Hanabi.

Research Overview

*Under Review at ICLR 2026 · Accepted to the NeurIPS 2025 LAW & ICML 2025 MAS Workshops*

This research evaluates the ability of Large Language Models (LLMs) to act as strategic agents in Hanabi, a cooperative card game that requires complex Theory of Mind (ToM) reasoning about hidden information.

Key Contributions

  • Benchmarking: Evaluated 15+ state-of-the-art LLMs (including GPT-4, Claude, and Llama) on their ability to cooperate and reason about hidden information.
  • Dataset Release: Created and released a Reinforcement Learning from AI Feedback (RLAIF) dataset of dense move ratings generated by advanced LLMs (a hypothetical record layout is sketched after this list).
  • Prompt Engineering: Demonstrated that prompting agents with explicit deductive reasoning steps significantly improves zero-shot cooperative performance (see the prompt sketch after this list).
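
To make the dataset contribution concrete, the snippet below sketches one plausible record layout pairing a game state with dense per-move ratings. This is a minimal illustration only: the field names (`game_state`, `candidate_moves`, `rating`, `rationale`) and the example values are assumptions, not the released dataset's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class MoveRating:
    """One candidate move scored by an LLM judge (hypothetical schema)."""
    move: str       # e.g. "play slot 3" or "hint red to player 2"
    rating: float   # judge score, assumed to lie in [0, 1]
    rationale: str  # free-text justification from the judge model


@dataclass
class RLAIFRecord:
    """One dataset row: a game state plus ratings for each candidate move."""
    game_state: str                                   # serialized Hanabi observation
    candidate_moves: list[MoveRating] = field(default_factory=list)


# Illustrative record with made-up values.
record = RLAIFRecord(
    game_state="fireworks: R1 G0 B2 | hints left: 6 | my hand: [?, ?, ?, ?, ?]",
    candidate_moves=[
        MoveRating("play slot 3", 0.82, "Slot 3 was hinted as a 1 and red 1 is playable."),
        MoveRating("discard slot 5", 0.31, "Risky: slot 5 could be the last blue 3."),
    ],
)
print(len(record.candidate_moves), "rated moves for this state")
```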
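
The prompt-engineering finding can be pictured with a template like the one below, which forces the agent to deduce from hints before committing to a move. This is a rough sketch under assumptions: the `build_prompt` helper, the step wording, and the observation format are hypothetical and not the prompts used in the paper.

```python
def build_prompt(observation: str, legal_moves: list[str]) -> str:
    """Assemble a Hanabi turn prompt with explicit deductive reasoning steps.

    Minimal sketch only; the step wording is an assumption, not the paper's prompt.
    """
    steps = [
        "1. List what the hints you have received imply about each card in your own hand.",
        "2. State what each teammate can and cannot know about their own hand.",
        "3. Infer what your teammates most likely intend you to do, given their last hints.",
        "4. Only then choose exactly one legal move and justify it in one sentence.",
    ]
    return (
        "You are playing Hanabi; you can see your teammates' cards but not your own.\n\n"
        f"Current observation:\n{observation}\n\n"
        f"Legal moves: {', '.join(legal_moves)}\n\n"
        "Reason step by step:\n" + "\n".join(steps) + "\n\nFinal answer:"
    )


# Example usage with a toy observation string.
prompt = build_prompt(
    observation="fireworks: R1 G0 B2 | hints left: 6 | my hand: [?, ?, ?, ?, ?]",
    legal_moves=["play slot 3", "discard slot 5", "hint red to player 2"],
)
print(prompt)
```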

Tech Stack

  • Large Language Models (LLMs)
  • Game Theory / Multi-Agent Systems
  • Reinforcement Learning (RLAIF)