Code for the paper:
TODO: Add paper title, authors, venue, and link here
TODO: arxiv / proceedings link
This repository implements a multi-player bilateral negotiation benchmark for evaluating AI negotiation agents. Players take turns proposing joint actions to partners in a round-robin schedule. Each proposal is either accepted (if it weakly improves the partner's payoff) or rejected. The benchmark supports exact solvers, heuristic methods, MCTS, dynamic programming lookahead, and LLM-based agents, and includes a procedural game generator for sweeping across diverse negotiation scenarios.
```
.
├── config/
│   └── game_configs.py    # Game generation and satisfaction mask utilities
├── core/
│   ├── equilibrium.py     # Nash equilibrium checking and regret computation
│   ├── game_logic.py      # Payoffs, goal satisfaction, offer generation
│   ├── game_state.py      # NegotiationState class (turn order, policy matrix)
│   └── one_shot_optim.py  # Baseline and MIP-based one-shot optimisation
├── methods/
│   ├── baselines.py       # LLM-based negotiation agent (OpenAI API)
│   ├── mcts.py            # MCTS and DP-based partner selection
│   └── negotiation.py     # Offer search, value estimators, exact solvers
├── experiments/
│   └── runner.py          # Single-game runner and multi-method experiment loop
├── main.py                # Local parallel sweep runner
└── README.md
```
```
pip install numpy cvxpy scipy joblib tqdm pandas openai tenacity
```

MOSEK is required for the MIP-based methods (`optimize_P_via_masks_with_NEandbest_offer_linear_mip`). A free academic licence is available at mosek.com.
For the LLM baseline, set your OpenAI API key:
```
export OPENAI_API_KEY="your-key-here"
```

Each game is defined by:

- `G` — an `(N_GOALS, N_PLAYERS)` matrix where `G[g, p]` is player `p`'s valuation of goal `g`.
- Policy matrix `P` — a binary `(N_PLAYERS, N_ACTIONS)` matrix where `P[p, a] = 1` means player `p` has committed to action `a`. Actions are binding: bits can only flip from 0 → 1, never 1 → 0.
- Satisfaction masks — one binary matrix per goal indicating which `(player, action)` pairs are required to satisfy it.
- Goal types — goals are either linear (satisfaction scales with the fraction of required actions taken) or binary (satisfied only when all required actions are taken).
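The linear/binary distinction above can be sketched in a few lines. This is a minimal illustration (not the repository's actual API — `goal_satisfaction` is a hypothetical helper name) of how satisfaction could be computed from a mask under the definitions above:

```python
import numpy as np

def goal_satisfaction(P, mask, goal_type):
    """Satisfaction of one goal under policy matrix P.

    P         -- binary (N_PLAYERS, N_ACTIONS) policy matrix
    mask      -- binary (N_PLAYERS, N_ACTIONS) required (player, action) pairs
    goal_type -- "linear" or "binary"
    """
    required = mask.sum()
    if required == 0:
        return 1.0  # no required actions: vacuously satisfied
    frac = (P * mask).sum() / required  # fraction of required actions taken
    if goal_type == "binary":
        return 1.0 if frac == 1.0 else 0.0  # all-or-nothing
    return frac  # linear goals scale with the fraction taken

P = np.array([[1, 0], [0, 0]])     # player 0 committed to action 0 only
mask = np.array([[1, 0], [0, 1]])  # goal requires (player 0, action 0) and (player 1, action 1)
print(goal_satisfaction(P, mask, "linear"))  # 0.5
print(goal_satisfaction(P, mask, "binary"))  # 0.0
```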
Players negotiate in a shuffled round-robin order. On each turn, the current proposer selects a partner and proposes a joint action. The partner accepts if the offer weakly improves their estimated terminal payoff; otherwise the turn is rejected. The game ends after all scheduled turns.
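The turn protocol above can be sketched as follows. `propose` and `estimate_value` are hypothetical stand-ins: the first picks a partner and offer for the current proposer, and the second plays the role of the partner's value estimator (where the repository's estimators such as `upper` or `lower_tighter` would plug in):

```python
import random

def play_round_robin(players, propose, estimate_value, n_rounds=1, seed=0):
    order = list(players)
    random.Random(seed).shuffle(order)  # shuffled round-robin order
    accepted = []
    for _ in range(n_rounds):
        for proposer in order:
            partner, offer = propose(proposer)
            # Accept iff the offer weakly improves the partner's estimated payoff
            # over the status quo (offer=None).
            if estimate_value(partner, offer) >= estimate_value(partner, None):
                accepted.append((proposer, partner, offer))
    return accepted

# Toy run: every offer is worth 1.0 to the partner and the status quo 0.0,
# so all three scheduled turns are accepted.
players = [0, 1, 2]
propose = lambda p: ((p + 1) % 3, "joint_action")
estimate_value = lambda partner, offer: 1.0 if offer is not None else 0.0
print(len(play_round_robin(players, propose, estimate_value)))  # 3
```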
| Method | Description |
|---|---|
| `reward` | Greedy offer maximising proposer payoff; random partner selection |
| `upper` | Greedy offer using an optimistic upper-bound value estimator |
| `lower_tighter` | Greedy offer using a tighter pessimistic lower-bound estimator |
| `LLM_full` | LLM agent (GPT-4o-mini) selecting partner and offer from raw game state |
Methods are passed as configuration dicts to the runner. The `how_fallback` key selects the value estimator; MCTS is used for partner selection when `n_sims > 0`.
TODO: Add precise description of what Figure 1 and Table 1 show once the paper link is confirmed.
The results are generated by running the full parameter sweep in `run_cloud.py` (or `run_local.py` for local execution). The sweep covers:
| Parameter | Values |
|---|---|
| Structure type | adversarial, cooperative |
| Binary fraction | 0.0, 0.15, 0.30, 0.50 |
| Latent factors (k) | 5, 15 |
| Zipf complexity | 1.6, 3.0 |
| Payoff shift | negative, positive, balanced |
| Game size | small (exact solver), large (baseline) |
| Seeds | 0–49 |
This produces 9,600 tasks in total.
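The task count can be cross-checked as the Cartesian product of the parameter values in the table above:

```python
from itertools import product

# One list per swept parameter, in table order.
grid = [
    ["adversarial", "cooperative"],        # structure type
    [0.0, 0.15, 0.30, 0.50],               # binary fraction
    [5, 15],                               # latent factors (k)
    [1.6, 3.0],                            # Zipf complexity
    ["negative", "positive", "balanced"],  # payoff shift
    ["small", "large"],                    # game size
    list(range(50)),                       # seeds 0-49
]
tasks = list(product(*grid))
print(len(tasks))  # 9600
```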
```
python run_local.py
```

Results are saved to `./results/<size>_<shift>_games/<uuid>.pkl.gz`. Uses all available CPU cores via joblib.
Results are saved to the negotiation-results-vol Modal Volume. To download:
```
modal volume get negotiation-results-vol /root/cloud_data/<folder> ./local_results
```

Each `.pkl.gz` file contains a dict with a single `"results"` key:

```python
import gzip, pickle

with gzip.open("path/to/file.pkl.gz", "rb") as f:
    data = pickle.load(f)

# data["results"] is a dict keyed by (method_name, game_name)
# Each value contains "payoff_vector", "sum_payoff", "is_equilibrium", etc.
```

To generate a game programmatically:

```python
from config.game_configs import ScenarioProfile, generate_game_config, create_sat_masks

profile = ScenarioProfile(
    structure_type="adversarial",  # or "cooperative"
    binary_fraction=0.2,           # fraction of goals that are binary
    complexity_zipf_a=2.0,         # Zipf shape for goal complexity (must be > 1)
)

game_config = generate_game_config(
    n_players=5,
    country_idx2num_actions={i: 4 for i in range(5)},
    n_goals=10,
    k_factors=3,
    seed=42,
    profile=profile,
    shift="negative",  # "negative", "positive", or None
    inject_pp=False,   # set True to inject a poison-pill scenario
)

sat_masks = create_sat_masks(game_config)
```

To run methods on a custom game:

```python
from experiments.runner import run_experiment

method_configs = [
    {
        "name": "reward",
        "how_fallback": "reward",
        "n_sims": 0,
        "c_ucb": 1.0,
        "use_prior": False,
        "max_changes": 2,
        "dp_k": 0,
        "k": 1,
    },
    {
        "name": "upper",
        "how_fallback": "upper",
        "n_sims": 0,
        "c_ucb": 1.0,
        "use_prior": False,
        "max_changes": 2,
        "dp_k": 0,
        "k": 1,
    },
]

results = run_experiment(
    game_names=["my_game"],
    method_configs=method_configs,
    n_trials=10,
    models={},
    allowed_actions_dict={},
    forbidden_actions_dict={},
    given_configs={"my_game": game_config},  # pass your own config here
)
```

To print a summary table:

```python
from experiments.runner import print_comparison_table

print_comparison_table(results)
```

TODO: Add BibTeX entry here once the paper link is confirmed.

TODO: Add licence information.