Commit 6597ce5

Create script to migrate data from existing deployment to new deployment
1 parent 54983ec commit 6597ce5

4 files changed

Lines changed: 1253 additions & 0 deletions

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
# Data Migration

Migrates data from **Azure AI Search**, **Cosmos DB** (MongoDB API), and **Azure Blob Storage**
in a source resource group into target services in a new resource group.

The script automatically handles RBAC role assignment/revocation, retry logic, pagination,
content-type preservation for blobs, and Windows long-path support.

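The retry behaviour mentioned above is not spelled out in this README. As a rough illustration only, a retry wrapper with exponential backoff could look like the sketch below; the function name `with_retries`, its parameters, and the backoff schedule are assumptions, not the actual code in `migrate.py`.

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.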
## Prerequisites

- Azure CLI installed and logged in (`az login`)
- Cosmos DB MongoDB connection strings for source and/or target
- Your Azure AD account needs **Owner** or **User Access Administrator** on the source/target
  resource groups so the script can temporarily assign the required RBAC roles (see below)

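To make the "temporarily assign" step concrete, here is a minimal sketch of granting a role for the duration of a block and revoking it afterwards through the Azure CLI. The helper names, the resource-group-level scope, and the role used in the usage note are assumptions; this README does not list the roles or scopes the script actually uses.

```python
import subprocess
from contextlib import contextmanager


def resource_group_scope(subscription_id, resource_group):
    """Build the ARM scope string for a resource group."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def role_assignment_cmd(action, assignee, role, scope):
    """Build an `az role assignment create|delete` argument list."""
    if action not in ("create", "delete"):
        raise ValueError(f"unsupported action: {action}")
    return ["az", "role", "assignment", action,
            "--assignee", assignee, "--role", role, "--scope", scope]


@contextmanager
def temporary_role(assignee, role, scope, run=subprocess.run):
    """Assign `role` on entry and revoke it on exit, even if the body fails."""
    run(role_assignment_cmd("create", assignee, role, scope), check=True)
    try:
        yield
    finally:
        run(role_assignment_cmd("delete", assignee, role, scope), check=True)
```

For example, `with temporary_role(user, "Storage Blob Data Contributor", scope): ...` would wrap a blob export, assuming that role is the one required.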
## Setup

```bash
cd Deployment/data_migration

# Create virtual environment
python -m venv .venv

# Activate virtual environment (run only the line for your shell)
# Windows (PowerShell)
.venv\Scripts\Activate.ps1

# Windows (Command Prompt)
.venv\Scripts\activate.bat

# macOS / Linux
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

## Commands for migration

The script prompts interactively for the required endpoints, connection strings, resource groups,
and subscription ID based on the chosen command and flags.

```bash
# Export from source resource group
python migrate.py export

# Import into target resource group
python migrate.py import

# Full migration (both steps)
python migrate.py export-import
```

**Optional flags:** `--search-only`, `--cosmos-only`, `--blob-only`, `--verbose`

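For readers who want to see how such a command line might be wired up, the following is a minimal `argparse` sketch that accepts the three commands and four flags listed above. Treating the `--*-only` flags as mutually exclusive and the helper names `build_parser` and `selected_services` are assumptions, not a description of `migrate.py` itself.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented commands and flags."""
    parser = argparse.ArgumentParser(prog="migrate.py")
    parser.add_argument("command", choices=["export", "import", "export-import"])
    # Assumption: at most one "--*-only" flag may be given at a time.
    only = parser.add_mutually_exclusive_group()
    only.add_argument("--search-only", action="store_true")
    only.add_argument("--cosmos-only", action="store_true")
    only.add_argument("--blob-only", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    return parser


def selected_services(args):
    """Return the set of services a parsed command line applies to."""
    if args.search_only:
        return {"search"}
    if args.cosmos_only:
        return {"cosmos"}
    if args.blob_only:
        return {"blob"}
    return {"search", "cosmos", "blob"}
```

With this sketch, `build_parser().parse_args(["import", "--blob-only"])` selects only the blob service for the `import` command.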
### Example: full export

```
$ python migrate.py export
Enter SOURCE Search Endpoint (e.g. https://<name>.search.windows.net): https://my-source.search.windows.net
Enter SOURCE Cosmos DB connection string: mongodb://...
Enter SOURCE Storage Account name: mysourcestorage
Enter your Azure Subscription ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Enter the resource group for source search service 'my-source': rg-source
Enter the resource group for source storage account 'mysourcestorage': rg-source
...
```

### Example: blob-only import

```
$ python migrate.py import --blob-only
Enter TARGET Storage Account name: mytargetstorage
Enter your Azure Subscription ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Enter the resource group for target storage account 'mytargetstorage': rg-target
...
```

### Obtaining Azure Credentials

| Value | How to Find |
|---|---|
| **Search Endpoint** | Azure Portal → **AI Search** service → **Overview** → **URL** (e.g. `https://<name>.search.windows.net`) |
| **Cosmos DB Connection String** | Azure Portal → **Azure Cosmos DB** account → **Settings** → **Connection strings** → **Primary Connection String** |
| **Storage Account Name** | Azure Portal → **Storage accounts** → the account name shown in the list (e.g. `mystorage123`, not a URL) |
| **Subscription ID** | Azure Portal → **Subscriptions** → copy the **Subscription ID** column |
| **Resource Group** | Azure Portal → the resource's **Overview** page → **Resource group** field |

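The export example shows the search service name (`my-source`) in a later prompt, which suggests it is derived from the endpoint URL. One plausible way to do that is sketched here; the function name `search_service_name` and the strict host check are assumptions.

```python
from urllib.parse import urlparse

SEARCH_SUFFIX = ".search.windows.net"


def search_service_name(endpoint):
    """Extract '<name>' from 'https://<name>.search.windows.net'."""
    host = urlparse(endpoint.strip()).hostname or ""
    name = host[: -len(SEARCH_SUFFIX)] if host.endswith(SEARCH_SUFFIX) else ""
    if not name:
        raise ValueError(f"not an Azure AI Search endpoint: {endpoint!r}")
    return name
```

A value without a scheme, such as a bare host name, is rejected here, which matches the table's guidance to copy the full URL from the portal.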
## What Gets Migrated

| Service | Data |
|---|---|
| Azure AI Search | All index schemas + all documents (including embeddings) |
| Cosmos DB | `DPS.ChatHistory` and `DPS.Documents` collections |
| Azure Blob Storage | All containers and blobs with content-type metadata preserved |

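Files on disk do not carry a blob's content type, so preserving it requires storing it next to the exported data. The sketch below shows one way a per-container sidecar could work; the JSON layout (blob name mapped to content type) and the helper names are assumptions, since this README only names the file `__blob_metadata__.json`.

```python
import json
from pathlib import Path

SIDECAR_NAME = "__blob_metadata__.json"


def write_sidecar(container_dir, content_types):
    """Save a {blob name: content type} mapping next to the exported blobs."""
    path = Path(container_dir) / SIDECAR_NAME
    path.write_text(json.dumps(content_types, indent=2, sort_keys=True),
                    encoding="utf-8")
    return path


def content_type_for(container_dir, blob_name,
                     default="application/octet-stream"):
    """Look up the recorded content type, falling back to a generic default."""
    path = Path(container_dir) / SIDECAR_NAME
    if not path.exists():
        return default
    recorded = json.loads(path.read_text(encoding="utf-8"))
    return recorded.get(blob_name, default)
```

On import, the recorded value would be passed back to the storage service when each blob is uploaded.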
## Export Format

```
exported_data/
├── search/
│   ├── <index>_schema.json        # Index definition
│   └── <index>_documents.jsonl    # One JSON document per line
├── cosmos/
│   ├── ChatHistory.jsonl
│   ├── ChatHistory.checksum
│   ├── Documents.jsonl
│   └── Documents.checksum
└── blobstorage/
    ├── <container>/
    │   ├── __blob_metadata__.json # Content-type sidecar
    │   └── <blob files...>        # Original directory structure preserved
    └── <container>/
        └── ...
```

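As an illustration of the `.jsonl` plus `.checksum` pairing in the tree above, this sketch writes one JSON document per line and stores a digest of the file beside it. SHA-256, the hex text format of the checksum file, and the function names are assumptions; the README does not state which algorithm the script uses.

```python
import hashlib
import json
from pathlib import Path


def write_jsonl_with_checksum(documents, jsonl_path):
    """Write documents as JSON Lines and a sibling '<name>.checksum' file."""
    jsonl_path = Path(jsonl_path)
    digest = hashlib.sha256()
    with jsonl_path.open("wb") as handle:
        for doc in documents:
            # default=str keeps non-JSON types (e.g. dates) serialisable
            line = (json.dumps(doc, ensure_ascii=False, default=str) + "\n")
            data = line.encode("utf-8")
            handle.write(data)
            digest.update(data)
    checksum_path = jsonl_path.with_suffix(".checksum")
    checksum_path.write_text(digest.hexdigest(), encoding="utf-8")
    return checksum_path


def verify_checksum(jsonl_path):
    """Return True if the file still matches its recorded checksum."""
    jsonl_path = Path(jsonl_path)
    expected = jsonl_path.with_suffix(".checksum").read_text(encoding="utf-8")
    actual = hashlib.sha256(jsonl_path.read_bytes()).hexdigest()
    return actual == expected.strip()
```

Verifying before import catches files that were truncated or edited between the export and import steps.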
## Configuration

The script prompts for these values interactively based on the command:

| Prompt | When Asked | Description |
|---|---|---|
| Source Search Endpoint | export | Source Azure AI Search endpoint URL |
| Source Cosmos DB connection string | export | Source Cosmos DB MongoDB connection string |
| Source Storage Account name | export | Source Azure Blob Storage account name |
| Target Search Endpoint | import | Target Azure AI Search endpoint URL |
| Target Cosmos DB connection string | import | Target Cosmos DB MongoDB connection string |
| Target Storage Account name | import | Target Azure Blob Storage account name |
| Azure Subscription ID | export/import (Search or Blob) | Subscription for RBAC role management |
| Resource group | export/import (Search or Blob) | Resource group for each Search/Storage service |

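The table can be read as a small decision rule: the command picks the source or target side, and the subscription and resource group are only needed when Search or Blob is involved. The sketch below encodes that reading; the function name `prompts_for`, the prompt order, and the assumption that `export-import` asks for both sides are illustrative, not taken from `migrate.py`.

```python
LABELS = {
    "search": "Search Endpoint",
    "cosmos": "Cosmos DB connection string",
    "blob": "Storage Account name",
}
SIDES = {
    "export": ["Source"],
    "import": ["Target"],
    "export-import": ["Source", "Target"],  # assumption: both sides are asked
}


def prompts_for(command, services=("search", "cosmos", "blob")):
    """List the prompts implied by the table for a command and service set."""
    chosen = set(services)
    prompts = [f"{side} {LABELS[service]}"
               for side in SIDES[command]
               for service in LABELS
               if service in chosen]
    # RBAC-related prompts apply only when Search or Blob is migrated.
    if chosen & {"search", "blob"}:
        prompts += ["Azure Subscription ID", "Resource group"]
    return prompts
```

For instance, a Cosmos-only export needs just the source connection string, because Cosmos DB access here goes through the connection string rather than RBAC.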
---
For complete deployment instructions, refer to the [Deployment Guide](../../docs/DeploymentGuide.md).
