- NVIDIA Isaac GR00T
- What's New in GR00T N1.7
- Installation
- Model Checkpoints & Embodiment Tags
- Data Format
- Inference
- Fine-tuning
- Evaluation
- Contributions
- License
- Citation
We just released GR00T N1.7 Early Access, the latest version of GR00T N1 with a new VLM backbone (Cosmos-Reason2-2B / Qwen3-VL) and improved performance.
This is an Early Access (EA) release. You are welcome to download the model, explore the codebase, and begin building on the stack, with the understanding that support and stability guarantees are limited until the GA release.
What's available:
- Pre-trained GR00T N1.7 model weights and reference code
- Fine-tuning and inference with custom robot data or demonstrations
- Experimentation, prototyping, and research use cases
Available at GA:
- Production deployment with commercial support
- Complete benchmarks and a fully validated, stable feature set
- Pull request contributions
We welcome feedback - please feel free to raise issues in this repository.
NVIDIA Isaac GR00T N1.7 is an open vision-language-action (VLA) model for generalized humanoid robot skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments.
GR00T N1.7 is trained on a diverse mixture of robot data including bimanual, semi-humanoid and an expansive humanoid dataset. It is adaptable through post-training for specific embodiments, tasks and environments.
GR00T N1.7 is fully commercially licensable under Apache 2.0. It delivers comparable performance to N1.6, with improved generalization and language-following capabilities driven by the inclusion of 20K hours of EgoScale human video data in pretraining.
The neural network architecture of GR00T N1.7 combines a vision-language foundation model with a diffusion transformer head that denoises continuous actions. Here is a schematic diagram of the architecture:
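To make the "denoises continuous actions" part concrete, here is a toy sketch of iterative action denoising with a stub network (illustrative only; function names, step counts, and the update rule are assumptions, not the actual N1.7 head):

```python
import numpy as np

def stub_denoiser(actions, t, vlm_features):
    # Placeholder for the diffusion transformer head. A real model would
    # attend over vlm_features; this stub just predicts a zero action chunk.
    return np.zeros_like(actions)

def denoise_actions(vlm_features, horizon=8, action_dim=7, num_steps=4, rng=None):
    """Toy iterative denoiser: start from Gaussian noise and repeatedly
    refine an (horizon, action_dim) action chunk toward the network's
    predicted clean actions."""
    rng = rng or np.random.default_rng(0)
    actions = rng.standard_normal((horizon, action_dim))
    for step in range(num_steps):
        t = 1.0 - step / num_steps  # noise level, annealed toward 0
        predicted_clean = stub_denoiser(actions, t, vlm_features)
        # Interpolate toward the prediction; last step lands exactly on it.
        actions = actions + (predicted_clean - actions) * (1.0 / (num_steps - step))
    return actions
```

The real head conditions each denoising step on VLM features from the Cosmos-Reason2-2B backbone; the stub above only shows the control flow.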
- Prepare data — Collect robot demonstrations (video, state, action) and convert them to the GR00T LeRobot format. Demo datasets are included for quick testing.
- Run inference — Try zero-shot inference with the base model on pretrain embodiments, or use a finetuned checkpoint for benchmark tasks.
- Fine-tune — Adapt the model to your robot using `launch_finetune.py` with your own data and modality config.
- Evaluate — Validate with open-loop evaluation, then test in simulation benchmarks or on real hardware via the Policy API.
- Deploy — Connect `Gr00tPolicy` to your robot controller, optionally accelerated with TensorRT.
GR00T N1.7 builds on N1.6 with a new VLM backbone and code-level improvements.
- New VLM backbone: Cosmos-Reason2-2B (Qwen3-VL architecture), replacing the Eagle backbone used in N1.6. Supports flexible resolution and encodes images in their native aspect ratio without padding.
- Simplified data processing pipeline (`processing_gr00t_n1d7.py`).
- Added full pipeline export to ONNX and TensorRT with improved inference frequency.
Inference: 1 GPU with 16 GB+ VRAM (e.g., RTX 4090, L40, H100, Jetson AGX Thor/Orin, DGX Spark).
Fine-tuning: 1 or more GPUs with 40 GB+ VRAM recommended. We recommend H100 or L40 nodes for optimal performance. Other hardware (e.g., A6000) works but may require longer training time. See the Hardware Recommendation Guide for detailed specs.
GR00T relies on submodules for certain dependencies. Include them when cloning:
Note: git-lfs is required to download parquet data files in `/demo_data`. Install it before cloning: `sudo apt install git-lfs && git lfs install`.
```bash
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
```

If you've already cloned without submodules, initialize them separately:

```bash
git submodule update --init --recursive
```

GR00T uses uv for fast, reproducible dependency management. Install uv first:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Install FFmpeg (required by torchcodec, the default video backend):

```bash
sudo apt-get update && sudo apt-get install -y ffmpeg
```

Create the environment and install GR00T:

```bash
uv sync --python 3.10
```

GPU dependencies (flash-attn, TensorRT, etc.) are included in the default install.

Verify the installation:

```bash
uv run python -c "import gr00t; print('GR00T installed successfully')"
```
`flash-attn` message on every `uv run`: You may see `Installing flash-attn...` each time you run `uv run`. This is a known `uv` behavior with URL-pinned wheel sources — `uv` re-validates the cached wheel against the source URL on each invocation. It is not rebuilding from source; the wheel is already cached locally and the operation takes 2-3 seconds. This only affects x86_64 platforms. To suppress it, remove the `flash-attn` entries under `[tool.uv.sources]` in your local `pyproject.toml` after the initial install, but note that this breaks `uv lock` and causes flash-attn to build from source on the next lock regeneration.
Alternative: pip install (without uv)
If you prefer pip/conda over uv, create a Python 3.10 virtualenv and install:
```bash
python3.10 -m venv .venv && source .venv/bin/activate
pip install -e .
```

Note: GPU dependencies (flash-attn, TensorRT) may require manual installation with pip. The uv workflow handles these automatically.
If fine-tuning fails with `CUDA_HOME is unset`: Run `bash scripts/deployment/dgpu/install_deps.sh` once to configure CUDA paths, or manually `export CUDA_HOME=/usr/local/cuda`.
CUDA 13.x Users (Thor, Spark, and other CUDA 13+ platforms): PyTorch 2.7 pins Triton to 3.3.1, which does not recognize CUDA major version 13+. This causes a `RuntimeError` in Triton's `ptx_get_version()`. Run the patch script to fix:

```bash
uv run bash scripts/patch_triton_cuda13.sh
```
GB300 (sm_103) Users: Triton 3.3.1 (pinned by PyTorch 2.7) does not support the GB300 GPU architecture (sm_103), so `torch.compile` will fail on GB300. Use PyTorch eager mode or TensorRT inference instead. Triton 3.5.1+ adds sm_103 support but is not yet compatible with the pinned PyTorch version.
aarch64 Video Backend: On aarch64 platforms (Thor, Orin), `torchcodec` is the required video backend. Pre-built wheels are not available for aarch64, so it is built from source during `install_deps.sh`. If you encounter `NotImplementedError` from the video backend, ensure `torchcodec` was built successfully during setup. Other backends (decord, pyav) are not supported on aarch64.
DGX Spark (tested with DGX Spark GB10)
```bash
bash scripts/deployment/spark/install_deps.sh
source .venv/bin/activate
source scripts/activate_spark.sh
```

See the Spark setup guide for Docker and bare metal details.
Jetson AGX Thor (tested with JetPack 7.1)
flash-attn on older systems (e.g., Ubuntu 20.04 with glibc < 2.35): The pre-built `flash-attn` wheel may fail with `ImportError: glibc_compat.so: cannot open shared object file`. To fix this, build from source:

```bash
uv pip install flash-attn==2.7.4.post1 --no-binary flash-attn --no-cache
```

This compiles locally (~10-30 minutes) and avoids the glibc compatibility issue.
```bash
bash scripts/deployment/thor/install_deps.sh
source .venv/bin/activate
source scripts/activate_thor.sh
```

See the Thor setup guide for Docker and bare metal details.
Jetson Orin (tested with JetPack 6.2)
```bash
bash scripts/deployment/orin/install_deps.sh
source .venv/bin/activate
source scripts/activate_orin.sh
```

See the Orin setup guide for Docker and bare metal details.
For a containerized setup that avoids system-level dependency conflicts, see our Docker Setup Guide.
| Checkpoint | Type | Embodiment Tag | Description |
|---|---|---|---|
| `nvidia/GR00T-N1.7-3B` | Base | See pretrain tags | Base model (3B params) — zero-shot inference on pretrain embodiments, or finetune for new tasks |
| `nvidia/GR00T-N1.7-LIBERO` | Finetuned | `LIBERO_PANDA` | Finetuned on LIBERO benchmark (Franka Panda) |
| `nvidia/GR00T-N1.7-DROID` | Finetuned | `OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT` | Finetuned on DROID dataset |
| `nvidia/GR00T-N1.7-SimplerEnv-Bridge` | Finetuned | `SIMPLER_ENV_WIDOWX` | Finetuned on SimplerEnv Bridge (WidowX) |
| `nvidia/GR00T-N1.7-SimplerEnv-Fractal` | Finetuned | `SIMPLER_ENV_GOOGLE` | Finetuned on SimplerEnv Fractal (Google Robot) |
Older versions: N1.6 checkpoints | N1.5 checkpoints
Every inference or finetuning command requires an --embodiment-tag. The tag determines which modality config (state/action keys, normalization) the model uses. Tags are case-insensitive.
For the full list of pretrain and posttrain tags, see the Policy API Guide — Embodiment Tags.
GR00T uses a flavor of the LeRobot v2 dataset format with an additional meta/modality.json file that describes state/action/video structure. A dataset looks like:
```
my_dataset/
  meta/
    info.json          # dataset metadata
    episodes.jsonl     # episode index and lengths
    tasks.jsonl        # language task descriptions
    modality.json      # state/action/video key mapping (GR00T-specific)
  data/chunk-000/      # parquet files (state, action per timestep)
  videos/chunk-000/    # mp4 video files per episode
```
The modality.json maps how the concatenated state/action arrays split into named fields (e.g., x, y, z, gripper) and which video keys are available. This is what the embodiment tag uses to interpret the data.
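For illustration, a minimal `modality.json` for a single-arm robot might look like the sketch below. Field names such as `single_arm`, `gripper`, and `webcam` are made up for this example; consult the Data Preparation Guide for the exact schema your embodiment requires:

```json
{
  "state": {
    "single_arm": {"start": 0, "end": 5},
    "gripper": {"start": 5, "end": 6}
  },
  "action": {
    "single_arm": {"start": 0, "end": 5},
    "gripper": {"start": 5, "end": 6}
  },
  "video": {
    "webcam": {"original_key": "observation.images.webcam"}
  }
}
```

Each `start`/`end` pair gives the slice of the concatenated state or action array that belongs to the named field.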
Included demo datasets (ready to use, no download needed):
| Dataset | Robot | Embodiment Tag | Use Case |
|---|---|---|---|
| `demo_data/droid_sample` | DROID (3 episodes) | `OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT` | Zero-shot inference with base model |
| `demo_data/libero_demo` | LIBERO Panda (5 episodes) | `LIBERO_PANDA` | Inference with finetuned checkpoint |
| `demo_data/cube_to_bowl_5` | SO100 arm (5 episodes) | `NEW_EMBODIMENT` | Fine-tuning custom embodiment example |
To generate more DROID episodes:
```bash
python scripts/download_droid_sample.py --num-episodes 10
```
Using your own data: Convert your demonstrations to the format above. If coming from LeRobot v3, use the conversion script: python scripts/lerobot_conversion/convert_v3_to_v2.py. See the full Data Preparation Guide for schema details and examples.
The included demo_data/droid_sample dataset works with the base model out of the box — no finetuning or checkpoint download needed:
```bash
uv run python scripts/deployment/standalone_inference_script.py \
    --model-path nvidia/GR00T-N1.7-3B \
    --dataset-path demo_data/droid_sample \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --traj-ids 1 2 \
    --inference-mode pytorch \
    --action-horizon 8
```

This runs open-loop inference on 2 DROID episodes, comparing predicted actions against ground truth. The base model downloads automatically from HuggingFace on first run (~6 GB).
For posttrain embodiments, use a finetuned checkpoint. Most finetuned checkpoints (e.g., DROID, SimplerEnv) have a flat file structure and can be passed directly as a HuggingFace model ID — no manual download needed:
```bash
uv run python scripts/deployment/standalone_inference_script.py \
    --model-path nvidia/GR00T-N1.7-DROID \
    --dataset-path demo_data/droid_sample \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --traj-ids 1 2 \
    --inference-mode pytorch \
    --action-horizon 8
```

Some checkpoints (e.g., LIBERO) use a nested folder structure with model files under a subfolder. HuggingFace does not support nested repo paths in `--model-path`, so you must download first:
```bash
uv run hf download nvidia/GR00T-N1.7-LIBERO \
    --include "libero_10/config.json" "libero_10/embodiment_id.json" \
    "libero_10/model-*.safetensors" "libero_10/model.safetensors.index.json" \
    "libero_10/processor_config.json" "libero_10/statistics.json" \
    --local-dir checkpoints/GR00T-N1.7-LIBERO
```

```bash
uv run python scripts/deployment/standalone_inference_script.py \
    --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
    --dataset-path demo_data/libero_demo \
    --embodiment-tag LIBERO_PANDA \
    --traj-ids 0 1 2 \
    --inference-mode pytorch \
    --action-horizon 8
```

For real-world deployment or simulation evaluation, use the server-client architecture. The policy runs on a GPU server; a lightweight client sends observations and receives actions over ZMQ.
Terminal 1 — Start the policy server:
```bash
uv run python gr00t/eval/run_gr00t_server.py \
    --model-path nvidia/GR00T-N1.7-3B \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --device cuda:0
```

Terminal 2 — Run open-loop evaluation as a client:
```bash
uv run python gr00t/eval/open_loop_eval.py \
    --dataset-path demo_data/droid_sample \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --host 127.0.0.1 \
    --port 5555 \
    --traj-ids 1 2 \
    --action-horizon 8
```

Tip: If you get `ZMQError: Address already in use`, the default port 5555 is occupied. Use `--port <other_port>`.
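If the port conflict recurs, you can check whether a port is free before starting the server. This is a generic Python sketch, not part of the GR00T tooling:

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 on success, i.e. something is listening there
        return s.connect_ex((host, port)) != 0
```

Pass the first free port you find to both the server and the client via `--port`.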
For connecting to a real robot (e.g., DROID hardware), see examples/DROID/README.md. For faster inference with TensorRT, see the Deployment & Inference Guide.
See the complete Policy API Guide for documentation on observation/action formats, batched inference, and troubleshooting.
Each benchmark has a self-contained README with dataset download, finetune, and evaluation commands:
| Benchmark | Embodiment | Guide |
|---|---|---|
| LIBERO | `LIBERO_PANDA` | `examples/LIBERO/README.md` |
| SimplerEnv (Fractal) | `SIMPLER_ENV_GOOGLE` | `examples/SimplerEnv/README.md` |
| SimplerEnv (Bridge) | `SIMPLER_ENV_WIDOWX` | `examples/SimplerEnv/README.md` |
| SO100 | `NEW_EMBODIMENT` | `examples/SO100/README.md` |
To finetune GR00T on your own robot data and configuration, follow the detailed tutorial at getting_started/finetune_new_embodiment.md.
Ensure your input data follows the GR00T LeRobot format, and specify your modality configuration via --modality-config-path.
Single GPU:
```bash
CUDA_VISIBLE_DEVICES=0 uv run python \
    gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7-3B \
    --dataset-path demo_data/cube_to_bowl_5 \
    --embodiment-tag NEW_EMBODIMENT \
    --modality-config-path examples/SO100/so100_config.py \
    --num-gpus 1 \
    --output-dir /tmp/test_finetune \
    --max-steps 2000 \
    --global-batch-size 32 \
    --dataloader-num-workers 4
```

Multi-GPU (e.g., 8xH100):
```bash
uv run torchrun --nproc_per_node=8 --master_port=29500 \
    gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7-3B \
    --dataset-path demo_data/cube_to_bowl_5 \
    --embodiment-tag NEW_EMBODIMENT \
    --modality-config-path examples/SO100/so100_config.py \
    --num-gpus 8 \
    --output-dir /tmp/test_finetune_8gpu \
    --max-steps 2000 \
    --global-batch-size 32 \
    --dataloader-num-workers 4
```

Replace `demo_data/cube_to_bowl_5` and `examples/SO100/so100_config.py` with your own dataset and modality config. See `examples/SO100` for a complete walkthrough.
Note: Use `uv run torchrun` (not bare `torchrun`) to ensure the correct virtual environment is used. Add `--use-wandb` to enable Weights & Biases logging. For more extensive configuration, use `gr00t/experiment/launch_train.py`.
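If you are sizing runs for different hardware, note that the global batch size is split across workers. A small sketch of the arithmetic (it assumes the launcher divides the global batch evenly and that no gradient accumulation is configured; this is not verified against `launch_finetune.py` internals):

```python
def per_gpu_batch_size(global_batch_size: int, num_gpus: int,
                       grad_accum_steps: int = 1) -> int:
    """Samples each GPU processes per forward pass, assuming an even split."""
    per_gpu, remainder = divmod(global_batch_size, num_gpus * grad_accum_steps)
    if remainder:
        raise ValueError("global batch size must divide evenly across workers")
    return per_gpu
```

With the commands above (`--global-batch-size 32`), each of 8 GPUs would see 4 samples per step, while a single GPU would see all 32.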
- Maximize batch size for your hardware and train for a few thousand steps.
- Users may observe 5-6% variance between runs due to non-deterministic image augmentations. Keep this in mind when comparing to reported benchmarks.
`--state_dropout_prob` (default: 0.8 in model config, 0.0 in finetune CLI): Randomly drops state inputs during training to improve generalization and reduce state-dependency. The LIBERO and SimplerEnv finetune scripts set this to 0.8. If your task relies heavily on proprioceptive state, consider lowering this value.
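Conceptually, state dropout works like the sketch below: an illustrative re-implementation (zeroing out the state vector for a sampled fraction of training examples), not the actual code path in `launch_finetune.py`:

```python
import numpy as np

def apply_state_dropout(state: np.ndarray, dropout_prob: float,
                        rng: np.random.Generator) -> np.ndarray:
    """With probability dropout_prob, blank the proprioceptive state for this
    training sample so the policy learns to rely more on visual input."""
    if rng.random() < dropout_prob:
        return np.zeros_like(state)
    return state
```

At `dropout_prob=0.8`, roughly four out of five training samples would see a blanked state vector, which is why state-heavy tasks may want a lower value.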
Compare predicted actions against ground truth from your dataset:
```bash
uv run python gr00t/eval/open_loop_eval.py \
    --dataset-path <DATASET_PATH> \
    --embodiment-tag NEW_EMBODIMENT \
    --model-path <CHECKPOINT_PATH> \
    --traj-ids 0 \
    --action-horizon 16
```

This generates a visualization at `/tmp/open_loop_eval/traj_{traj_id}.jpeg` with ground truth vs. predicted actions and MSE metrics. Use `--save-plot-path <dir>` to save plots to a custom location.
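The reported MSE metric is, conceptually, the mean squared error between predicted and ground-truth action chunks. A minimal sketch of that computation (not the repository's implementation):

```python
import numpy as np

def action_mse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean squared error over all timesteps and action dimensions.

    pred, gt: arrays of shape (horizon, action_dim).
    """
    assert pred.shape == gt.shape, "prediction and ground truth must align"
    return float(np.mean((pred - gt) ** 2))
```

Lower is better; comparing MSE across embodiments is only meaningful if the action spaces are normalized the same way.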
Test your model in simulation or on real hardware using the server-client architecture:
```bash
# Start the policy server
uv run python gr00t/eval/run_gr00t_server.py \
    --embodiment-tag NEW_EMBODIMENT \
    --model-path <CHECKPOINT_PATH> \
    --device cuda:0 \
    --host 0.0.0.0 --port 5555
```

Then query it from a client:

```python
from gr00t.policy.server_client import PolicyClient

policy = PolicyClient(host="localhost", port=5555)
env = YourEnvironment()
obs, info = env.reset()
action, info = policy.get_action(obs)
obs, reward, done, truncated, info = env.step(action)
```

Debugging with ReplayPolicy: To verify your environment setup without a trained model, start the server with `--dataset-path <DATASET_PATH>` (omit `--model-path`) to replay recorded actions from the dataset.
See the complete Policy API Guide for observation/action formats, batched inference, and troubleshooting.
We support evaluation on public benchmarks using a server-client architecture. The policy server reuses the project root's uv environment; simulation clients have individual setup scripts.
Use the verification script to confirm that all dependencies are properly configured.
Zero-shot (evaluate with the base model, no finetuning):
- DROID — real-world DROID robot
Finetuned (evaluate with finetuned checkpoints):
- LIBERO — LIBERO benchmark (Franka Panda)
- SimplerEnv — Google Robot (Fractal) and WidowX (Bridge)
- SO100 — SO100 custom embodiment workflow
Adding a New Sim Benchmark
Each sim benchmark registers its environments under a gym env_name with the format {prefix}/{task_name} (e.g., libero_sim/LIVING_ROOM_SCENE2_put_soup_in_basket). The evaluation framework uses the prefix to look up the corresponding EmbodimentTag via a mapping in gr00t/eval/sim/env_utils.py.
Important: The env_name prefix and the `EmbodimentTag` value are often different. For example, `libero_sim` maps to `EmbodimentTag.LIBERO_PANDA("libero_sim")`. Do not assume they match.
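Resolving the tag from a full env_name amounts to splitting off the prefix before the first `/`. A sketch of that lookup step (the actual logic lives in `gr00t/eval/sim/env_utils.py` and may differ in detail):

```python
def env_prefix(env_name: str) -> str:
    """Extract the benchmark prefix from a gym env_name of the form
    '{prefix}/{task_name}'."""
    prefix, _, task = env_name.partition("/")
    if not task:
        raise ValueError(f"env_name {env_name!r} is not of the form 'prefix/task'")
    return prefix
```

The prefix is then used as the key into the prefix-to-tag mapping described below.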
To add a new benchmark:
- Add an entry to `ENV_PREFIX_TO_EMBODIMENT_TAG` in `gr00t/eval/sim/env_utils.py`:

  ```python
  ENV_PREFIX_TO_EMBODIMENT_TAG = {
      ...
      "my_new_benchmark": EmbodimentTag.MY_ROBOT,
  }
  ```

- If the benchmark has multiple env_name prefixes (e.g., `my_benchmark_v1`, `my_benchmark_v2`), all related prefixes must map to the same `EmbodimentTag`.
- Add corresponding test cases in `tests/gr00t/eval/sim/test_env_utils.py` and update the `test_all_known_prefixes_present` test.
During Early Access we are not accepting pull requests while the codebase stabilizes. If you encounter issues or have suggestions, please open an Issue in this repository.
Support during Early Access is best-effort. We will continue iterating toward a more stable General Availability (GA) release.
- Code: Apache 2.0 — see LICENSE
- Model weights: NVIDIA Open Model License
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```bibtex
@inproceedings{gr00tn1_2025,
  archivePrefix = {arxiv},
  eprint = {2503.14734},
  title = {{GR00T} {N1}: An Open Foundation Model for Generalist Humanoid Robots},
  author = {NVIDIA and Johan Bjorck and Fernando Castañeda and Nikita Cherniadev and Xingye Da and Runyu Ding and Linxi "Jim" Fan and Yu Fang and Dieter Fox and Fengyuan Hu and Spencer Huang and Joel Jang and Zhenyu Jiang and Jan Kautz and Kaushil Kundalia and Lawrence Lao and Zhiqi Li and Zongyu Lin and Kevin Lin and Guilin Liu and Edith Llontop and Loic Magne and Ajay Mandlekar and Avnish Narayan and Soroush Nasiriany and Scott Reed and You Liang Tan and Guanzhi Wang and Zu Wang and Jing Wang and Qi Wang and Jiannan Xiang and Yuqi Xie and Yinzhen Xu and Zhenjia Xu and Seonghyeon Ye and Zhiding Yu and Ao Zhang and Hao Zhang and Yizhou Zhao and Ruijie Zheng and Yuke Zhu},
  month = {March},
  year = {2025},
  booktitle = {ArXiv Preprint},
}
```



