Negotiation Benchmark

Code for the paper:

TODO: Add paper title, authors, venue, and arXiv/proceedings link here.


Overview

This repository implements a multi-player bilateral negotiation benchmark for evaluating AI negotiation agents. Players take turns proposing joint actions to partners in a round-robin schedule. Each proposal is either accepted (if it weakly improves the partner's payoff) or rejected. The benchmark supports exact solvers, heuristic methods, MCTS, dynamic programming lookahead, and LLM-based agents, and includes a procedural game generator for sweeping across diverse negotiation scenarios.


Repository Structure

.
├── config/
│   └── game_configs.py          # Game generation and satisfaction mask utilities
├── core/
│   ├── equilibrium.py           # Nash equilibrium checking and regret computation
│   ├── game_logic.py            # Payoffs, goal satisfaction, offer generation
│   ├── game_state.py            # NegotiationState class (turn order, policy matrix)
│   └── one_shot_optim.py        # Baseline and MIP-based one-shot optimisation
├── methods/
│   ├── baselines.py             # LLM-based negotiation agent (OpenAI API)
│   ├── mcts.py                  # MCTS and DP-based partner selection
│   └── negotiation.py           # Offer search, value estimators, exact solvers
├── experiments/
│   └── runner.py                # Single-game runner and multi-method experiment loop
├── main.py                      # Local parallel sweep runner 
└── README.md

Installation

pip install numpy cvxpy scipy joblib tqdm pandas openai tenacity

MOSEK is required for the MIP-based methods (optimize_P_via_masks_with_NE and best_offer_linear_mip). A free academic licence is available at mosek.com.

For the LLM baseline, set your OpenAI API key:

export OPENAI_API_KEY="your-key-here"

Core Concepts

Game representation

Each game is defined by:

  • G — a (N_GOALS, N_PLAYERS) matrix where G[g, p] is player p's valuation of goal g.
  • Policy matrix P — a binary (N_PLAYERS, N_ACTIONS) matrix where P[p, a] = 1 means player p has committed to action a. Actions are binding: bits can only flip from 0 → 1, never 1 → 0.
  • Satisfaction masks — one binary matrix per goal indicating which (player, action) pairs are required to satisfy it.
  • Goal types — goals are either linear (satisfaction scales with the fraction of required actions taken) or binary (satisfied only when all required actions are taken).
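The representation above can be sketched with plain NumPy arrays. This is a toy instance for illustration only: the `satisfaction` helper and the array shapes are assumptions for the sketch, not the repo's `game_logic` implementation.

```python
import numpy as np

# Toy dimensions for illustration.
N_GOALS, N_PLAYERS, N_ACTIONS = 2, 2, 3

# G[g, p]: player p's valuation of goal g.
G = np.array([[3.0, 1.0],
              [2.0, 4.0]])

# Binary policy matrix: P[p, a] = 1 once player p commits to action a.
P = np.zeros((N_PLAYERS, N_ACTIONS), dtype=int)

# One binary (N_PLAYERS, N_ACTIONS) mask per goal: the required (player, action) pairs.
masks = [np.array([[1, 1, 0], [0, 0, 0]]),   # goal 0 needs actions 0 and 1 from player 0
         np.array([[0, 0, 1], [0, 0, 1]])]   # goal 1 needs action 2 from both players

def satisfaction(P, mask, goal_type):
    """Degree to which policy P satisfies a goal (hypothetical helper)."""
    required = mask.sum()
    taken = (P * mask).sum()
    if goal_type == "binary":
        return float(taken == required)   # satisfied only when all required actions are taken
    return taken / required               # linear: scales with the fraction taken

# Commitments are binding: bits flip 0 -> 1 only.
P[0, 0] = 1
assert satisfaction(P, masks[0], "linear") == 0.5
assert satisfaction(P, masks[0], "binary") == 0.0
```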

Turn structure

Players negotiate in a shuffled round-robin order. On each turn, the current proposer selects a partner and proposes a joint action. The partner accepts if the offer weakly improves their estimated terminal payoff; otherwise the turn is rejected. The game ends after all scheduled turns.
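The turn structure can be sketched as follows. All names and callbacks here are illustrative, not the repo's actual API; the only behaviour taken from the text is the shuffled round-robin order and the weak-improvement acceptance rule.

```python
import random

def run_negotiation(players, n_rounds, propose, payoff, apply_offer, state,
                    rng=random):
    """Sketch of the negotiation turn loop (illustrative names)."""
    order = list(players)
    rng.shuffle(order)                       # shuffled round-robin schedule
    for _ in range(n_rounds):
        for proposer in order:
            partner, offer = propose(proposer, state)
            new_state = apply_offer(state, offer)
            # The partner accepts only if the offer weakly improves their
            # estimated terminal payoff; otherwise the turn is rejected.
            if payoff(partner, new_state) >= payoff(partner, state):
                state = new_state
    return state
```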


Negotiation Methods

  • reward — Greedy offer maximising proposer payoff; random partner selection
  • upper — Greedy offer using an optimistic upper-bound value estimator
  • lower_tighter — Greedy offer using a tighter pessimistic lower-bound estimator
  • LLM_full — LLM agent (GPT-4o-mini) selecting partner and offer from the raw game state

Methods are passed as configuration dicts to the runner. The how_fallback key selects the value estimator; MCTS is used for partner selection when n_sims > 0.
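For illustration, a hypothetical configuration that enables MCTS partner selection might look like this. The key names follow the runner examples later in this README; the name and values here are made up.

```python
# Hypothetical method config: n_sims > 0 switches partner selection to MCTS,
# and how_fallback picks the value estimator used for offers.
mcts_config = {
    "name": "mcts_lower",             # made-up label for this configuration
    "how_fallback": "lower_tighter",  # pessimistic lower-bound value estimator
    "n_sims": 200,                    # > 0 enables MCTS partner selection
    "c_ucb": 1.0,                     # UCB exploration constant
    "use_prior": False,
    "max_changes": 2,
    "dp_k": 0,
    "k": 1,
}
```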


Reproducing Figure 1 and Table 1

TODO: Add precise description of what Figure 1 and Table 1 show once the paper link is confirmed.

The results are generated by running the full parameter sweep in run_cloud.py (or run_local.py for local execution). The sweep covers:

  • Structure type — adversarial, cooperative
  • Binary fraction — 0.0, 0.15, 0.30, 0.50
  • Latent factors (k) — 5, 15
  • Zipf complexity — 1.6, 3.0
  • Payoff shift — negative, positive, balanced
  • Game size — small (exact solver), large (baseline)
  • Seeds — 0–49

This produces 9,600 tasks in total.
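The task count follows directly from the sweep grid: 2 × 4 × 2 × 2 × 3 × 2 × 50 = 9,600. The dictionary keys below are illustrative ("size" in particular is an assumed name, not necessarily the repo's):

```python
from itertools import product

# Sweep grid from the table above (key names are illustrative).
sweep = {
    "structure_type": ["adversarial", "cooperative"],
    "binary_fraction": [0.0, 0.15, 0.30, 0.50],
    "k_factors": [5, 15],
    "complexity_zipf_a": [1.6, 3.0],
    "shift": ["negative", "positive", "balanced"],
    "size": ["small", "large"],
    "seed": range(50),
}

tasks = list(product(*sweep.values()))
assert len(tasks) == 9_600  # 2 * 4 * 2 * 2 * 3 * 2 * 50
```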

Running locally

python run_local.py

Results are saved to ./results/<size>_<shift>_games/<uuid>.pkl.gz. The sweep uses all available CPU cores via joblib.

Running on Modal

When running on Modal via run_cloud.py, results are saved to the negotiation-results-vol Modal Volume. To download:

modal volume get negotiation-results-vol /root/cloud_data/<folder> ./local_results

Loading results

Each .pkl.gz file contains a dict with a single "results" key:

import gzip, pickle

with gzip.open("path/to/file.pkl.gz", "rb") as f:
    data = pickle.load(f)

# data["results"] is a dict keyed by (method_name, game_name)
# Each value contains "payoff_vector", "sum_payoff", "is_equilibrium", etc.
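Under the schema above, results from many files can be flattened into a pandas DataFrame for analysis. This is a sketch under the stated schema; `load_results` is a hypothetical helper, not part of the repo.

```python
import gzip
import pickle

import pandas as pd

def load_results(paths):
    """Flatten per-(method, game) result dicts from .pkl.gz files into a DataFrame."""
    rows = []
    for path in paths:
        with gzip.open(path, "rb") as f:
            results = pickle.load(f)["results"]
        for (method, game), res in results.items():
            rows.append({"method": method, "game": game,
                         "sum_payoff": res["sum_payoff"],
                         "is_equilibrium": res["is_equilibrium"]})
    return pd.DataFrame(rows)

# e.g. df = load_results(glob.glob("local_results/**/*.pkl.gz", recursive=True))
#      df.groupby("method")["sum_payoff"].mean()
```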

Generating Custom Games

from config.game_configs import ScenarioProfile, generate_game_config, create_sat_masks

profile = ScenarioProfile(
    structure_type="adversarial",   # or "cooperative"
    binary_fraction=0.2,            # fraction of goals that are binary
    complexity_zipf_a=2.0,          # Zipf shape for goal complexity (must be > 1)
)

game_config = generate_game_config(
    n_players=5,
    country_idx2num_actions={i: 4 for i in range(5)},
    n_goals=10,
    k_factors=3,
    seed=42,
    profile=profile,
    shift="negative",   # "negative", "positive", or None
    inject_pp=False,    # set True to inject a poison-pill scenario
)

sat_masks = create_sat_masks(game_config)

Running a Single Experiment

from experiments.runner import run_experiment

method_configs = [
    {
        "name": "reward",
        "how_fallback": "reward",
        "n_sims": 0,
        "c_ucb": 1.0,
        "use_prior": False,
        "max_changes": 2,
        "dp_k": 0,
        "k": 1,
    },
    {
        "name": "upper",
        "how_fallback": "upper",
        "n_sims": 0,
        "c_ucb": 1.0,
        "use_prior": False,
        "max_changes": 2,
        "dp_k": 0,
        "k": 1,
    },
]

results = run_experiment(
    game_names=["my_game"],
    method_configs=method_configs,
    n_trials=10,
    models={},
    allowed_actions_dict={},
    forbidden_actions_dict={},
    given_configs={"my_game": game_config},   # pass your own config here
)

from experiments.runner import print_comparison_table
print_comparison_table(results)

Citation

TODO: Add BibTeX entry here once the paper link is confirmed.

License

TODO: Add licence information.

About

Repository containing the code associated with the paper "A Benchmark for Multi-Party Negotiation Games from Real Negotiation Data".
