CVAT + YOLOE + SAM3

Custom fork of CVAT with YOLOE Visual Prompt and SAM3 (Segment Anything Model 3) integration for AI-assisted annotation.

✨ Features

Model	Description	Capabilities
YOLOE Visual Prompt	Detection by visual examples	Rectangle, OBB (rotated), Polygon (segmentation)
SAM3	Text-prompted segmentation	Text-to-Segment, Text-to-Detect, Text-to-Track

📋 Requirements

Docker and Docker Compose
NVIDIA GPU with CUDA 12.4+ (minimum 8GB VRAM)
nuctl v1.13.0 (Nuclio CLI)

# Install nuctl
wget https://github.com/nuclio/nuclio/releases/download/1.13.0/nuctl-1.13.0-linux-amd64
chmod +x nuctl-1.13.0-linux-amd64
sudo mv nuctl-1.13.0-linux-amd64 /usr/local/bin/nuctl
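
Before starting, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch: the `missing_tools` helper is hypothetical and not part of this repository, and checking for `nvidia-smi` assumes it was installed along with the NVIDIA driver.

```shell
#!/usr/bin/env bash
# Print each named command that is not found on PATH (hypothetical helper).
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: report anything missing from the requirements list.
# nvidia-smi is assumed to be installed with the NVIDIA driver.
missing=$(missing_tools docker nuctl nvidia-smi)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
fi
```

An empty result only means the commands exist; it does not verify versions such as nuctl v1.13.0 or CUDA 12.4+.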

For SAM3 (optional)

SAM3 requires access to the model on HuggingFace:

# Install HuggingFace CLI
curl -LsSf https://hf.co/cli/install.sh | bash

# Login and download model (requires approval at https://huggingface.co/facebook/sam3)
huggingface-cli login
huggingface-cli download facebook/sam3
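
To check whether the download landed in the local cache, a sketch like the one below may be useful. It assumes the default Hugging Face hub cache layout (`models--<org>--<name>` under the hub cache directory, with `HF_HUB_CACHE` and `HF_HOME` as overrides). The `hf_cache_dir` helper is hypothetical, and how `./zup.sh` itself locates the model is not covered here.

```shell
#!/usr/bin/env bash
# Map a model repo id (e.g. facebook/sam3) to its folder in the Hugging Face hub cache.
# Hypothetical helper; honours HF_HUB_CACHE first, then HF_HOME, then the default location.
hf_cache_dir() {
    local repo="$1"
    local hub="${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}"
    echo "$hub/models--$(echo "$repo" | sed 's|/|--|g')"
}

# Example: downloaded repos keep their files under a snapshots/ subfolder.
dir=$(hf_cache_dir facebook/sam3)
if [ -d "$dir/snapshots" ]; then
    echo "Model found in cache: $dir"
else
    echo "Model not found in cache: $dir"
fi
```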

🚀 Installation

# Clone repository
git clone https://github.com/mvaldi/cvat-yoloe-sam.git
cd cvat-yoloe-sam

# Start CVAT with all models
./zup.sh

# Or YOLOE only (without SAM3)
./zup.sh --no-sam3

# Or SAM3 only (without YOLOE)
./zup.sh --no-yoloe

# Base CVAT only (no AI models)
./zup.sh --no-sam3 --no-yoloe

Access CVAT at: http://localhost:8080
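
Because startup can take a while, a script may want to wait until the web UI responds before continuing. This is a minimal sketch, assuming the UI answers at the root URL once it is ready. The `retry` helper is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Run a command until it succeeds, at most <attempts> times,
# sleeping <delay> seconds between tries (hypothetical helper).
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while ! "$@"; do
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example: poll the default URL for up to about 5 minutes.
# curl -f fails on HTTP errors, -s hides progress, -o /dev/null drops the body.
#   retry 60 5 curl -fs -o /dev/null http://localhost:8080 && echo "CVAT is up"
```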

Custom host (remote server)

# Use the first IP address reported by hostname -I
./zup.sh --host "$(hostname -I | awk '{print $1}')"

🛑 Stop

# Stop containers
./zdown.sh

# Stop and clean Nuclio functions
./zdown.sh --clean

📖 Using the Models

YOLOE Visual Prompt

1. Create a Task and upload images/video
2. Manually annotate some reference frames (minimum 1)
3. Go to AI Tools → YOLOE
4. Select reference frames and click Generate VPE
5. Navigate to an unannotated frame
6. Select Output Type: Rectangle | OBB | Polygon
7. Adjust Confidence and click Detect
8. Review and apply detections

SAM3 (Segment Anything 3)

1. Go to AI Tools → SAM3
2. Enter a text prompt (e.g., "person", "car", "dog")
3. Select a mode:
- Segment: segment a specific object
- Detect: detect all instances
- Track: track an object through a video
4. Adjust the confidence and apply the results

⚠️ Considerations

GPU Memory

Configuration	Required VRAM
YOLOE only	~4 GB
SAM3 only	~6 GB
YOLOE + SAM3	~10 GB

Note: On GPUs with less than 12 GB of VRAM, run only one model at a time (start with --no-sam3 or --no-yoloe).
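
The table and note above can be turned into a small helper that suggests `./zup.sh` flags from the detected VRAM. This is a sketch only: `suggest_zup_flags` is a hypothetical name, the thresholds are read directly from the table and note (12 GB for both models, ~6 GB for SAM3, ~4 GB for YOLOE), and rounding to the nearest GB is an assumption made because drivers often report slightly less than the nominal size.

```shell
#!/usr/bin/env bash
# Suggest ./zup.sh flags from total VRAM in MiB (hypothetical helper).
# Second argument: model to prefer when only one fits, "yoloe" (default) or "sam3".
suggest_zup_flags() {
    local vram_mib="$1" prefer="${2:-yoloe}"
    local gb=$(( (vram_mib + 512) / 1024 ))   # round to the nearest GB
    if [ "$gb" -ge 12 ]; then
        echo ""                               # both models fit
    elif [ "$gb" -ge 6 ] && [ "$prefer" = "sam3" ]; then
        echo "--no-yoloe"                     # SAM3 only (~6 GB)
    elif [ "$gb" -ge 4 ]; then
        echo "--no-sam3"                      # YOLOE only (~4 GB)
    else
        echo "--no-sam3 --no-yoloe"           # base CVAT, no AI models
    fi
}

# Usage (requires nvidia-smi; reads the first GPU only):
#   vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n1)
#   ./zup.sh $(suggest_zup_flags "$vram")
```

The figures in the table are approximate, so treat the suggestion as a starting point rather than a guarantee that the chosen configuration will fit.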

First startup

The first run of ./zup.sh downloads the models and builds the Docker images. This may take 10-30 minutes, depending on your connection.

Troubleshooting

# View server logs
docker logs cvat_server --tail 50

# View YOLOE logs
docker logs nuclio-nuclio-pth-ultralytics-yoloe-visual-prompt --tail 50

# View SAM3 logs
docker logs nuclio-nuclio-pth-facebookresearch-sam3-gpu --tail 50

# Check Nuclio functions
nuctl get function --platform local
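
When reporting a problem, it may be convenient to save the logs of all three containers at once. The sketch below reuses the container names from the commands above. The `collect_logs` helper and the `DOCKER` override variable are hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Container names as they appear in the troubleshooting commands above.
CONTAINERS="cvat_server
nuclio-nuclio-pth-ultralytics-yoloe-visual-prompt
nuclio-nuclio-pth-facebookresearch-sam3-gpu"

# Save the last <lines> log lines of each container to <out_dir>/<name>.log.
# DOCKER may be set to another compatible CLI; this override is an assumption of the sketch.
collect_logs() {
    local out_dir="$1" lines="${2:-50}" name
    mkdir -p "$out_dir"
    for name in $CONTAINERS; do
        "${DOCKER:-docker}" logs "$name" --tail "$lines" >"$out_dir/$name.log" 2>&1 \
            || echo "could not read logs for $name" >&2
    done
}

# Usage: collect_logs ./cvat-logs 200
```

A container that is not running (for example SAM3 after `--no-sam3`) just produces a warning, and its log file holds the error message from the CLI.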

📚 Additional Documentation

For complete CVAT documentation (formats, API, SDK, CLI), see the upstream CVAT documentation.

📄 License

MIT License - See LICENSE for details.

This project includes models with additional licenses:

YOLOE: Ultralytics License
SAM3: Meta AI License
