Commit f99e2c1

Revise DGX Spark documentation
1 parent e35c3e5 commit f99e2c1

1 file changed

Lines changed: 7 additions & 13 deletions

docs/dgx_spark.md

@@ -43,14 +43,8 @@ COPY requirements.txt .
 RUN pip install -r requirements.txt

 # Install other major Python libraries in separate layers for better caching
-RUN pip install --upgrade torch torchvision
 RUN pip install "jax[cuda13-local]==0.7.2"

-# Install Transformer Engine and its dependency pybind11
-RUN pip install pybind11 && \
-export NVTE_FRAMEWORK=jax && \
-pip install --no-build-isolation 'transformer_engine[jax]==2.6.0'
-
 # --- Application Code Layer ---
 # Now, copy your application source code. This layer is rebuilt only when your code changes.
 COPY . .
@@ -125,9 +119,9 @@ The script will initialize, use the models from your mounted cache, and begin th

 ## Part 4: Accessing Your Generated Image

-The generation script saves the final image to its working directory (/app) inside the container. Here is the complete workflow to get that image onto your Mac.
+The generation script saves the final image to its working directory (/app) inside the container. Here is the complete workflow to get that image onto your Laptop.

-### Step 1: Copy the Image from Container to VM
+### Step 1: Copy the Image from Container to DGX Spark

 Open a new terminal window. Do not close the terminal where the container is running.
 First, find your container's ID:
@@ -140,22 +134,22 @@ Look for the container with the image maxdiffusion-arm-gpu and note its ID (e.g.
 Now, copy the image from the container to a temporary location on DGX Spark and fix its permissions.

 ```bash
-# Copy the file to the /tmp/ directory on the VM
+# Copy the file to the /tmp/ directory on DGX Spark
 docker cp 9049895399fc:/app/flux_0.png /tmp/flux_0.png

 # Change the file's owner to your user to avoid permission errors
 sudo chown username:username /tmp/flux_0.png
 ```

-### Step 2: Copy the Image from VM to Your MAC
+### Step 2: Copy the Image from DGX Spark to Your Laptop

 Now, open the Terminal app on your Laptop and use the scp (secure copy) command to download the file from DGX Spark.

 ```bash
 scp username@spark:/tmp/flux_0.png .
 ```

-This command will download flux_0.png to the current directory on your Mac. You can now view your generated image!
+This command will download flux_0.png to the current directory on your Laptop. You can now view your generated image!

 ## Troubleshooting and Common Pitfalls

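
Note on the hunk above: the two Step 1 commands (docker cp, then chown) could be wrapped in a small helper so the ownership fix is never skipped. The following is a minimal sketch, not something taken from docs/dgx_spark.md. The names `run` and `fetch_image` and the `DRY_RUN` switch are assumptions introduced purely for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: wraps the two commands from Step 1.
# `run`, `fetch_image`, and DRY_RUN are hypothetical, not part of the docs.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# fetch_image CONTAINER_ID SRC_PATH DEST_PATH
# Copies SRC_PATH out of the container, then hands ownership of the copy
# to the current user so a later scp does not hit "permission denied".
fetch_image() {
  local container="$1" src="$2" dest="$3"
  run docker cp "${container}:${src}" "${dest}"
  run sudo chown "$(id -un):$(id -gn)" "${dest}"
}
```

With a running container, a call such as `fetch_image "$(docker ps -q --filter ancestor=maxdiffusion-arm-gpu)" /app/flux_0.png /tmp/flux_0.png` would look up the ID by image name, assuming exactly one container matches. Setting `DRY_RUN=1` first prints the commands without touching Docker.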

@@ -171,14 +165,14 @@ Here are solutions to common issues you might encounter:
 - **Solution**: This is solved by launching the container with the `-v ~/.cache/huggingface:/root/.cache/huggingface` flag, which gives the container access to your local model cache.
 - Error: `open ... permission denied` when trying to access a copied file.
 - **Cause**: Files copied from a Docker container with docker cp are owned by the root user by default.
-- **Solution**: After copying the file to the VM, immediately run `sudo chown your_user:your_user /path/to/file` to take ownership before trying to access or transfer it.
+- **Solution**: After copying the file to the DGX Spark, immediately run `sudo chown your_user:your_user /path/to/file` to take ownership before trying to access or transfer it.
 - Can't find the generated image.
 - **Cause**: The script may not be saving the image to the directory specified by the output_dir argument.
 - **Solution**: Always check the script's source code to confirm the final save location. As we discovered, generate_flux.py saves to the current working directory (/app), not /tmp. Knowing this allows you to copy the file from the correct location.
 - If a process requires more memory than the available RAM, your system will crash with an "Out-of-Memory" (OOM) error.
 - `Swap memory is your safety net.` It's a designated space on your hard drive that the operating system uses as a "virtual" extension of your RAM. When RAM is full, the system moves less active data to the slower swap space, freeing up RAM for the immediate task. While it's slower than RAM, it's infinitely better than a system crash, ensuring your long-running training or generation jobs can complete successfully. For a machine with 119GB of RAM, adding 64GB of swap provides a robust buffer for memory-intensive operations.
 - Step 1: Create a 64GB Swap File
-- Run these commands on your spark-1c91 VM to create, format, and enable a permanent 64GB swap file.
+- Run these commands on your DGX Spark to create, format, and enable a permanent 64GB swap file.

 ```bash
 # Instantly allocate a 64GB file
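
Note on the hunk above: the diff context ends at the first comment of the swap block, so the commands the document actually uses are not visible here. For orientation, the following is a minimal sketch of the conventional Linux sequence for creating and enabling a swap file. The path `/swapfile`, the helper names `run` and `make_swap`, and the `DRY_RUN` switch are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: the conventional swap-file sequence on Linux.
# `run`, `make_swap`, DRY_RUN, and the /swapfile path are hypothetical.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# make_swap SIZE PATH   (for example: make_swap 64G /swapfile)
make_swap() {
  local size="$1" path="$2"
  run sudo fallocate -l "${size}" "${path}"  # reserve the space without writing zeros
  run sudo chmod 600 "${path}"               # restrict access to root
  run sudo mkswap "${path}"                  # write the swap signature
  run sudo swapon "${path}"                  # enable it for the current boot
}
```

Running `DRY_RUN=1 make_swap 64G /swapfile` previews the four commands. Enabling swap this way lasts only until the next reboot. Making it permanent usually means adding an `/etc/fstab` entry such as `/swapfile none swap sw 0 0`, and `swapon --show` or `free -h` can confirm the result. On filesystems where `swapon` rejects a file created with `fallocate`, writing the file with `dd` is the common fallback.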
