MiniMax M2.5 Inference Server for DGX Spark

Run MiniMax M2.5 (230B params, 10B active MoE) locally on NVIDIA DGX Spark with an OpenAI-compatible API.

Quick Start

```bash
# 1. Download model (~101GB)
huggingface-cli download unsloth/MiniMax-M2.5-GGUF \
  --local-dir ./models --include '*UD-Q3_K_XL*'

# 2. Build and start
cd docker
docker compose build   # First time only
docker compose up -d

# 3. Test
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "minimax-m2.5", "messages": [{"role": "user", "content": "Hello"}]}'
```
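Because the server exposes an OpenAI-compatible endpoint, any OpenAI-style client can talk to it. A minimal shell sketch that builds the chat-completion payload for an arbitrary prompt (the model name `minimax-m2.5` and port 8080 follow the Quick Start above; the prompt text is illustrative):

```shell
#!/bin/sh
# Build an OpenAI-style chat completion payload for the local server.
# Assumes the server from Quick Start is listening on localhost:8080.
PROMPT="Write a haiku about unified memory"

PAYLOAD=$(printf '{"model": "minimax-m2.5", "messages": [{"role": "user", "content": "%s"}]}' "$PROMPT")
echo "$PAYLOAD"

# Send it once the server is up:
# curl -s http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```

Note that this naive `printf` templating does not escape quotes or newlines in the prompt; for anything beyond quick tests, build the JSON with a proper tool.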

Commands

```bash
docker compose up -d      # Start
docker compose down       # Stop
docker compose logs -f    # Logs
docker compose ps         # Status
```

Hardware

Target: NVIDIA DGX Spark (GB10 Grace Blackwell, 128GB unified memory)
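A rough memory budget explains why this quant fits: the ~101 GB of Q3_K_XL weights leave room in the Spark's 128 GB unified memory for the KV cache and runtime overhead (figures from this README; actual KV-cache size depends on context length):

```shell
# Rough memory budget (GB), using figures from this README.
TOTAL=128     # DGX Spark unified memory
WEIGHTS=101   # UD-Q3_K_XL download size (approx.)
HEADROOM=$((TOTAL - WEIGHTS))
echo "Headroom for KV cache + runtime: ~${HEADROOM} GB"
```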

Configuration

Key settings in docker-compose.yml:

| Setting | Value | Purpose |
| --- | --- | --- |
| `-ngl 999` | All layers on GPU | Full GPU acceleration |
| `-c 131072` | 128K context | Large context window |
| `-fa on` | Flash Attention | Memory efficiency |
| `--temp 1.0` | MiniMax default | Recommended sampling |
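For orientation, a sketch of how these flags might be wired into a llama-server invocation inside `docker-compose.yml`. This is not the repo's actual file; the service name, build context, and model filename (`MODEL_FILE.gguf`) are placeholders:

```yaml
services:
  minimax:
    build: .                 # placeholder build context
    ports:
      - "8080:8080"
    volumes:
      - ../models:/models
    command: >
      llama-server
        --model /models/UD-Q3_K_XL/MODEL_FILE.gguf
        -ngl 999
        -c 131072
        -fa on
        --temp 1.0
        --host 0.0.0.0 --port 8080
```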

Troubleshooting

```bash
docker compose logs                    # Check errors
ls -lh ../models/UD-Q3_K_XL/           # Verify model exists
curl http://localhost:8080/health      # Health check
```
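The most common startup failure is an incomplete model download, so it can help to verify the GGUF files before `docker compose up`. A small preflight sketch (the `check_model` helper is hypothetical; the default path matches the Troubleshooting commands above):

```shell
#!/bin/sh
# Preflight: verify GGUF files exist before starting the server.
# check_model is a hypothetical helper, not part of this repo.
check_model() {
  dir="${1:-../models/UD-Q3_K_XL}"
  count=$(ls "$dir"/*.gguf 2>/dev/null | wc -l)
  if [ "$count" -gt 0 ]; then
    echo "OK: $count GGUF file(s) in $dir"
    return 0
  else
    echo "ERROR: no GGUF files in $dir - re-run the download step" >&2
    return 1
  fi
}

# Usage: check_model ../models/UD-Q3_K_XL
```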

Model

  • MiniMax M2.5 UD-Q3_K_XL via Unsloth
  • 230B total params, 10B active (MoE), 200K context
  • 80.2% SWE-Bench Verified

About

Running MiniMax-M2.5 on an NVIDIA DGX Spark, with OpenCode connecting to the remote machine.
