After installation completes, run the training script.
- For GPU runs, enabling the cudnn_te_flash attention kernel is recommended for optimal performance.
- Best performance is achieved with batch parallelism, which can be enabled via the ici_fsdp_batch_parallelism axis. Note that this parallelism strategy does not support fractional batch sizes.
- ici_fsdp_batch_parallelism and ici_fsdp_parallelism can be combined to allow fractional batch sizes. However, padding is not currently supported for the cudnn_te_flash attention kernel, so the sequence length must be divisible by the number of devices in the ici_fsdp_parallelism axis.
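The divisibility requirement above can be validated before launching a run. Below is a minimal sketch; the helper name and its parameters are illustrative, not part of the repo's API:

```python
def check_seq_len_for_cudnn_te_flash(seq_len: int, ici_fsdp_parallelism: int) -> None:
    """Verify the sequence length shards evenly across the ici_fsdp_parallelism
    axis, since the cudnn_te_flash kernel does not support padding."""
    if seq_len % ici_fsdp_parallelism != 0:
        raise ValueError(
            f"seq_len={seq_len} is not divisible by "
            f"ici_fsdp_parallelism={ici_fsdp_parallelism}; "
            "cudnn_te_flash does not support padding."
        )

# Example: 2048 tokens sharded across 8 devices divides evenly, so this passes.
check_seq_len_for_cudnn_te_flash(2048, 8)
```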
- For benchmarking training performance across different data dimensions without downloading or re-processing the dataset, a synthetic data iterator is supported.
- Set dataset_type='synthetic' and synthetic_num_samples=null to enable the synthetic data iterator.
- The following overrides on data dimensions are supported: