CVAT Platform

CVAT + YOLOE + SAM3

Custom fork of CVAT with YOLOE Visual Prompt and SAM3 (Segment Anything Model 3) integration for AI-assisted annotation.

✨ Features

Model                 Description                    Capabilities
YOLOE Visual Prompt   Detection by visual examples   Rectangle, OBB (rotated), Polygon (segmentation)
SAM3                  Text-prompted segmentation     Text-to-Segment, Text-to-Detect, Text-to-Track

📋 Requirements

  • Docker and Docker Compose
  • NVIDIA GPU with CUDA 12.4+ (minimum 8GB VRAM)
  • nuctl v1.13.0 (Nuclio CLI)
# Install nuctl
wget https://github.com/nuclio/nuclio/releases/download/1.13.0/nuctl-1.13.0-linux-amd64
chmod +x nuctl-1.13.0-linux-amd64
sudo mv nuctl-1.13.0-linux-amd64 /usr/local/bin/nuctl

For SAM3 (optional)

SAM3 requires access to the model on HuggingFace:

# Install HuggingFace CLI
curl -LsSf https://hf.co/cli/install.sh | bash

# Login and download model (requires approval at https://huggingface.co/facebook/sam3)
huggingface-cli login
huggingface-cli download facebook/sam3
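
If the download is interrupted or you are unsure it completed, the local cache can be inspected (a sketch: `scan-cache` is a standard huggingface-cli subcommand, but the grep pattern is an assumption about how the repo id appears in its output):

```shell
# Look for the SAM3 repo in the local HuggingFace cache; prints a
# fallback message when the CLI is missing or the model is not cached.
huggingface-cli scan-cache 2>/dev/null | grep -i sam3 || echo "sam3 not cached yet"
```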

🚀 Installation

# Clone repository
git clone https://github.com/mvaldi/cvat-yoloe-sam.git
cd cvat-yoloe-sam

# Start CVAT with all models
./zup.sh

# Or YOLOE only (without SAM3)
./zup.sh --no-sam3

# Or SAM3 only (without YOLOE)
./zup.sh --no-yoloe

# Base CVAT only (no AI models)
./zup.sh --no-sam3 --no-yoloe

Access CVAT at: http://localhost:8080

Custom host (remote server)

./zup.sh --host $(hostname -I | awk '{print $1}')
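
To see what the `--host` one-liner resolves to: `hostname -I` prints all of the host's IP addresses separated by spaces, and awk keeps only the first field. Illustrated with a fixed string so it runs anywhere:

```shell
# awk '{print $1}' selects the first whitespace-separated field,
# i.e. the first IP address reported by `hostname -I`.
echo "192.168.1.50 172.17.0.1" | awk '{print $1}'   # prints: 192.168.1.50
```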

🛑 Stop

# Stop containers
./zdown.sh

# Stop and clean Nuclio functions
./zdown.sh --clean

📖 Using the Models

YOLOE Visual Prompt

  1. Create a Task and upload images/video
  2. Manually annotate some reference frames (minimum 1)
  3. Go to AI Tools → YOLOE
  4. Select reference frames and click Generate VPE
  5. Navigate to an unannotated frame
  6. Select Output Type: Rectangle | OBB | Polygon
  7. Adjust Confidence and click Detect
  8. Review and apply detections

SAM3 (Segment Anything 3)

  1. Go to AI Tools → SAM3
  2. Enter a text prompt (e.g., "person", "car", "dog")
  3. Select mode:
    • Segment: segment a specific object
    • Detect: detect all instances in the frame
    • Track: track an object across video frames
  4. Adjust confidence and apply results

⚠️ Considerations

GPU Memory

Configuration   Required VRAM
YOLOE only      ~4 GB
SAM3 only       ~6 GB
YOLOE + SAM3    ~10 GB

Note: with GPUs under 12 GB of VRAM, run only one model at a time.
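
The table maps onto a few lines of shell for picking startup flags (a sketch: `pick_flags` is a hypothetical helper, the thresholds are this README's estimates rather than hard limits, and the nvidia-smi query in the comment assumes an NVIDIA driver is installed):

```shell
# Hypothetical helper: choose zup.sh flags from available VRAM in GB.
pick_flags() {
  vram_gb=$1
  if [ "$vram_gb" -ge 10 ]; then
    echo ""                      # both models fit
  elif [ "$vram_gb" -ge 6 ]; then
    echo "--no-yoloe"            # SAM3 only
  elif [ "$vram_gb" -ge 4 ]; then
    echo "--no-sam3"             # YOLOE only
  else
    echo "--no-sam3 --no-yoloe"  # base CVAT only
  fi
}

# With an NVIDIA driver, the real number can be queried (in MiB):
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
# Example: ./zup.sh $(pick_flags 8)
pick_flags 8   # prints: --no-yoloe
```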

First startup

The first run of ./zup.sh downloads the models and builds the Docker images; expect 10-30 minutes depending on your connection.

Troubleshooting

# View server logs
docker logs cvat_server --tail 50

# View YOLOE logs
docker logs nuclio-nuclio-pth-ultralytics-yoloe-visual-prompt --tail 50

# View SAM3 logs
docker logs nuclio-nuclio-pth-facebookresearch-sam3-gpu --tail 50

# Check Nuclio functions
nuctl get function --platform local
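
The per-container checks above can be swept in one loop (a sketch: `check` is a hypothetical helper; it prints "not found" when Docker, the daemon, or the container is absent):

```shell
# Hypothetical helper: report a container's state ("running", "exited",
# ...), or "not found" if Docker or the container is missing.
check() {
  docker inspect -f '{{.State.Status}}' "$1" 2>/dev/null || echo "not found"
}

# Container names are the ones used in the log commands above.
for c in cvat_server \
         nuclio-nuclio-pth-ultralytics-yoloe-visual-prompt \
         nuclio-nuclio-pth-facebookresearch-sam3-gpu; do
  printf '%-52s %s\n' "$c" "$(check "$c")"
done
```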

📚 Additional Documentation

For complete CVAT documentation (formats, API, SDK, CLI), see the upstream CVAT docs.

📄 License

MIT License - See LICENSE for details.

This project includes models (YOLOE, SAM3) that are distributed under their own additional licenses.
