Run MiniMax M2.5 (a 230B-parameter MoE with 10B active parameters) locally on an NVIDIA DGX Spark, served through an OpenAI-compatible API.
```bash
# 1. Download model (~101GB)
huggingface-cli download unsloth/MiniMax-M2.5-GGUF \
  --local-dir ./models --include '*UD-Q3_K_XL*'

# 2. Build and start
cd docker
docker compose build   # First time only
docker compose up -d

# 3. Test
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "minimax-m2.5", "messages": [{"role": "user", "content": "Hello"}]}'
```
```bash
docker compose up -d    # Start
docker compose down     # Stop
docker compose logs -f  # Logs
docker compose ps       # Status
```

Target: NVIDIA DGX Spark (GB10 Grace Blackwell, 128GB unified memory)
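Because the ~101GB model takes a while to load, `docker compose up -d` returns long before the server can answer requests. A small poll against the `/health` endpoint (the same one used for troubleshooting below) makes a convenient readiness gate:

```bash
# Block until the server reports healthy; -f makes curl fail on
# non-2xx responses, so the loop keeps waiting while the model loads
until curl -sf http://localhost:8080/health > /dev/null; do
  echo "still loading..."
  sleep 10
done
echo "server is ready"
```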
Key settings in `docker-compose.yml`:

| Setting | Value | Purpose |
|---|---|---|
| `-ngl 999` | All layers on GPU | Full GPU acceleration |
| `-c 131072` | 128K context | Large context window |
| `-fa on` | Flash Attention | Memory efficiency |
| `--temp 1.0` | MiniMax default | Recommended sampling |
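These flags match llama.cpp's `llama-server` CLI, which is presumably what the container runs. For orientation, a sketch of an equivalent bare-metal invocation under that assumption (the `.gguf` filename and the `-a` alias are illustrative placeholders, not taken from the repo):

```bash
# Hypothetical equivalent of the compose service's command; the model
# filename and alias below are placeholders, not from docker-compose.yml.
llama-server \
  -m ./models/UD-Q3_K_XL/MiniMax-M2.5-UD-Q3_K_XL-00001-of-00002.gguf \
  -ngl 999 -c 131072 -fa on --temp 1.0 \
  --host 0.0.0.0 --port 8080 \
  -a minimax-m2.5
```

The `-a` alias would let the API's `"model": "minimax-m2.5"` field match without clients needing the full file name.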
If the server doesn't respond:

```bash
docker compose logs                # Check errors
ls -lh ../models/UD-Q3_K_XL/       # Verify model exists
curl http://localhost:8080/health  # Health check
```

Model:

- MiniMax M2.5 UD-Q3_K_XL via Unsloth
- 230B total params, 10B active (MoE), 200K context
- 80.2% SWE-Bench Verified
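Since the endpoint is OpenAI-compatible, the usual request fields apply beyond the hello-world test above. For example, streaming a response with the recommended sampling temperature (the field names are standard OpenAI chat-completion parameters, not verified against this particular server build):

```bash
# -N disables curl's output buffering so streamed tokens appear as they arrive
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "minimax-m2.5",
        "temperature": 1.0,
        "stream": true,
        "messages": [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
      }'
```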