| 1 | +# Data Migration |
| 2 | + |
| 3 | +Migrates data from **Azure AI Search**, **Cosmos DB** (MongoDB API), and **Azure Blob Storage** |
| 4 | +in a source resource group into target services in a new resource group. |
| 5 | + |
| 6 | +The script automatically handles RBAC role assignment/revocation, retry logic, pagination, |
| 7 | +content-type preservation for blobs, and Windows long-path support. |
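The retry behaviour mentioned above can be pictured with a minimal sketch of exponential backoff with jitter. This is not taken from `migrate.py`: the function name `with_retry`, its parameters, and the choice of delays are all hypothetical and only illustrate the general technique.

```python
import random
import time


def with_retry(operation, max_attempts=4, base_delay=1.0,
               retry_on=(Exception,), sleep=time.sleep):
    """Call operation() and retry on failure with exponential backoff.

    Hypothetical helper, not the script's actual implementation.
    The wait before retry k (0-based) is base_delay * 2**k plus up to
    0.1 s of random jitter.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; a caller would normally leave it at the default.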
| 8 | + |
| 9 | +## Prerequisites |
| 10 | + |
| 11 | +- Azure CLI installed and logged in (`az login`) |
| 12 | +- Cosmos DB MongoDB connection strings for source and/or target |
| 13 | +- Your Azure AD account needs **Owner** or **User Access Administrator** on the source/target |
| 14 | + resource groups so the script can temporarily assign (and later revoke) the required RBAC roles |
| 15 | + |
| 16 | + |
| 17 | +## Setup |
| 18 | + |
| 19 | +```bash |
| 20 | +cd Deployment/data_migration |
| 21 | + |
| 22 | +# Create virtual environment |
| 23 | +python -m venv .venv |
| 24 | + |
| 25 | +# Activate virtual environment |
| 26 | +# Windows (PowerShell) |
| 27 | +.venv\Scripts\Activate.ps1 |
| 28 | + |
| 29 | +# Windows (Command Prompt) |
| 30 | +.venv\Scripts\activate.bat |
| 31 | + |
| 32 | +# macOS / Linux |
| 33 | +source .venv/bin/activate |
| 34 | + |
| 35 | +# Install dependencies |
| 36 | +pip install -r requirements.txt |
| 37 | +``` |
| 38 | + |
| 39 | +## Commands for migration |
| 40 | + |
| 41 | +The script prompts interactively for required endpoints, connection strings, resource groups, |
| 42 | +and subscription ID based on the chosen command and flags. |
| 43 | + |
| 44 | +```bash |
| 45 | +# Export from source resource group |
| 46 | +python migrate.py export |
| 47 | + |
| 48 | +# Import into target resource group |
| 49 | +python migrate.py import |
| 50 | + |
| 51 | +# Full migration (both steps) |
| 52 | +python migrate.py export-import |
| 53 | +``` |
| 54 | + |
| 55 | +**Optional flags:** `--search-only`, `--cosmos-only`, `--blob-only`, `--verbose` |
| 56 | + |
| 57 | +### Example — full export |
| 58 | + |
| 59 | +``` |
| 60 | +$ python migrate.py export |
| 61 | +Enter SOURCE Search Endpoint (e.g. https://<name>.search.windows.net): https://my-source.search.windows.net |
| 62 | +Enter SOURCE Cosmos DB connection string: mongodb://... |
| 63 | +Enter SOURCE Storage Account name: mysourcestorage |
| 64 | +Enter your Azure Subscription ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
| 65 | +Enter the resource group for source search service 'my-source': rg-source |
| 66 | +Enter the resource group for source storage account 'mysourcestorage': rg-source |
| 67 | +... |
| 68 | +``` |
| 69 | + |
| 70 | +### Example — blob-only import |
| 71 | + |
| 72 | +``` |
| 73 | +$ python migrate.py import --blob-only |
| 74 | +Enter TARGET Storage Account name: mytargetstorage |
| 75 | +Enter your Azure Subscription ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
| 76 | +Enter the resource group for target storage account 'mytargetstorage': rg-target |
| 77 | +... |
| 78 | +``` |
| 79 | + |
| 80 | +### Obtaining Azure Credentials |
| 81 | + |
| 82 | +| Value | How to Find | |
| 83 | +|---|---| |
| 84 | +| **Search Endpoint** | Azure Portal → **AI Search** service → **Overview** → **URL** (e.g. `https://<name>.search.windows.net`) | |
| 85 | +| **Cosmos DB Connection String** | Azure Portal → **Azure Cosmos DB** account → **Settings** → **Connection strings** → **Primary Connection String** | |
| 86 | +| **Storage Account Name** | Azure Portal → **Storage accounts** → the account name shown in the list (e.g. `mystorage123`, not a URL) | |
| 87 | +| **Subscription ID** | Azure Portal → **Subscriptions** → copy the value from the **Subscription ID** column | |
| 88 | +| **Resource Group** | Azure Portal → the resource's **Overview** page → **Resource group** field | |
| 89 | + |
| 90 | +## What Gets Migrated |
| 91 | + |
| 92 | +| Service | Data | |
| 93 | +|---|---| |
| 94 | +| Azure AI Search | All index schemas + all documents (including embeddings) | |
| 95 | +| Cosmos DB | `DPS.ChatHistory` and `DPS.Documents` collections | |
| 96 | +| Azure Blob Storage | All containers and blobs with content-type metadata preserved | |
| 97 | + |
| 98 | + |
| 99 | +## Export Format |
| 100 | + |
| 101 | +``` |
| 102 | +exported_data/ |
| 103 | +├── search/ |
| 104 | +│ ├── <index>_schema.json # Index definition |
| 105 | +│ └── <index>_documents.jsonl # One JSON document per line |
| 106 | +├── cosmos/ |
| 107 | +│ ├── ChatHistory.jsonl |
| 108 | +│ ├── ChatHistory.checksum |
| 109 | +│ ├── Documents.jsonl |
| 110 | +│ └── Documents.checksum |
| 111 | +└── blobstorage/ |
| 112 | + ├── <container>/ |
| 113 | + │ ├── __blob_metadata__.json # Content-type sidecar |
| 114 | + │ └── <blob files...> # Original directory structure preserved |
| 115 | + └── <container>/ |
| 116 | + └── ... |
| 117 | +``` |
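Because the `.jsonl` files hold one JSON document per line, an export can be sanity-checked with a few lines of Python. The sketch below is an illustration only: the helper names `iter_jsonl` and `count_documents` are hypothetical, and it assumes the `search/` and `cosmos/` layout shown above. It does not verify the `.checksum` files, since their format is not described here.

```python
import json
from pathlib import Path


def iter_jsonl(path):
    """Yield one parsed JSON document per non-empty line of a .jsonl file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc


def count_documents(export_dir):
    """Return {relative file path: document count} for search/ and cosmos/."""
    root = Path(export_dir)
    counts = {}
    for subfolder in ("search", "cosmos"):
        for path in sorted((root / subfolder).glob("*.jsonl")):
            counts[path.relative_to(root).as_posix()] = sum(
                1 for _ in iter_jsonl(path)
            )
    return counts
```

For example, `count_documents("exported_data")` would return a mapping such as `cosmos/ChatHistory.jsonl` to the number of documents in that file, which can be compared against the source before running an import.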
| 118 | + |
| 119 | +## Configuration |
| 120 | + |
| 121 | +The script prompts for these values interactively based on the command: |
| 122 | + |
| 123 | +| Prompt | When Asked | Description | |
| 124 | +|---|---|---| |
| 125 | +| Source Search Endpoint | export | Source Azure AI Search endpoint URL | |
| 126 | +| Source Cosmos DB connection string | export | Source Cosmos DB MongoDB connection string | |
| 127 | +| Source Storage Account name | export | Source Azure Blob Storage account name | |
| 128 | +| Target Search Endpoint | import | Target Azure AI Search endpoint URL | |
| 129 | +| Target Cosmos DB connection string | import | Target Cosmos DB MongoDB connection string | |
| 130 | +| Target Storage Account name | import | Target Azure Blob Storage account name | |
| 131 | +| Azure Subscription ID | export/import (Search or Blob) | Subscription for RBAC role management | |
| 132 | +| Resource group | export/import (Search or Blob) | Resource group for each Search/Storage service | |
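The table above amounts to a small decision rule, which can be sketched as follows. This is a hypothetical illustration, not code from `migrate.py`: the function `required_prompts`, the service keys, and the prompt ordering are assumptions, as is the idea that `export-import` asks for both the source and the target values.

```python
# Hypothetical mapping from service key to the prompt text in the table.
SERVICE_PROMPTS = {
    "search": "Search Endpoint",
    "cosmos": "Cosmos DB connection string",
    "blob": "Storage Account name",
}

# Assumption: export-import needs both sides.
SIDES = {
    "export": ["Source"],
    "import": ["Target"],
    "export-import": ["Source", "Target"],
}


def required_prompts(command, services=("search", "cosmos", "blob")):
    """List the prompts implied by the table for a command and service set."""
    prompts = [
        f"{side} {SERVICE_PROMPTS[service]}"
        for side in SIDES[command]
        for service in services
    ]
    # Subscription ID and resource group are only needed for RBAC role
    # management, which applies to Search and Blob but not Cosmos DB.
    if any(service in ("search", "blob") for service in services):
        prompts += ["Azure Subscription ID", "Resource group"]
    return prompts
```

Under these assumptions, `required_prompts("import", ["blob"])` yields the target storage account name, the subscription ID, and the resource group, matching the blob-only import example earlier in this README.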
| 133 | + |
| 134 | +--- |
| 135 | +For complete deployment instructions, refer to the [Deployment Guide](../../docs/DeploymentGuide.md). |