
Commit 92f55e3

Claim support for Engram and mHC

1 parent: 781d30a

2 files changed: 5 additions & 0 deletions

README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -41,6 +41,7 @@ See our guide on running MaxText in decoupled mode, without any GCP dependencies
 
 ## 🔥 Latest news 🔥
 
+* \[March 6, 2026\] New features from DeepSeek-AI are now supported: Conditional Memory via Scalable Lookup ([Engram](https://arxiv.org/abs/2601.07372)) and Manifold-Constrained Hyper-Connections ([mHC](https://arxiv.org/abs/2512.24880)). Try them out with our [deepseek-custom](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/configs/models/deepseek-custom.yml) starter config.
 * \[March 5, 2026\] New `tpu-post-train` [target in PyPI](https://pypi.org/project/maxtext). Please also use this installation option for running vllm_decode. See the [MaxText installation instructions](https://maxtext.readthedocs.io/en/latest/install_maxtext.html) for more info.
 * \[March 5, 2026\] [Qwen3-Next](https://github.com/AI-Hypercomputer/maxtext/blob/7656eb8d1c9eb0dd91e617a6fdf6ad805221221a/tests/end_to_end/tpu/qwen/next/run_qwen3_next.md) is now supported.
 * \[February 27, 2026\] New MaxText structure! MaxText has been restructured according to [RESTRUCTURE.md](https://github.com/AI-Hypercomputer/maxtext/blob/1b9e38aa0a19b6018feb3aed757406126b6953a1/RESTRUCTURE.md). Please feel free to share your thoughts and feedback.
```
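As a rough intuition for the Manifold-Constrained Hyper-Connections (mHC) feature named above: hyper-connections widen the residual pathway into several parallel streams that are mixed by a learned matrix between layers, and mHC constrains that mixing matrix to a well-behaved manifold. The sketch below uses Sinkhorn normalization to push the mixing matrix toward the doubly stochastic matrices. All names, shapes, and the choice of projection are illustrative assumptions for intuition only, not MaxText's or DeepSeek's actual implementation.

```python
# Conceptual sketch (NOT MaxText's implementation): mix n parallel residual
# streams with a mixing matrix constrained toward the doubly stochastic
# manifold via Sinkhorn normalization.
import numpy as np

def sinkhorn(logits: np.ndarray, n_iters: int = 20) -> np.ndarray:
    """Push a square matrix of logits toward a doubly stochastic matrix
    by alternately normalizing rows and columns."""
    m = np.exp(logits)
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

def mix_streams(streams: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Mix n residual streams of shape (n, d) with a constrained (n, n) matrix."""
    return sinkhorn(logits) @ streams

rng = np.random.default_rng(0)
streams = rng.normal(size=(4, 8))  # 4 residual streams, hidden dim 8
logits = rng.normal(size=(4, 4))   # learned mixing logits (illustrative)
mixed = mix_streams(streams, logits)
```

Because the (approximately) doubly stochastic matrix redistributes rather than amplifies the streams, the mixed representation stays on the same scale as the input, which is the kind of stability property a manifold constraint is meant to provide.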

src/maxtext/configs/models/deepseek-custom.yml

Lines changed: 4 additions & 0 deletions

```diff
@@ -13,6 +13,10 @@
 # limitations under the License.
 
 # Small model config for testing (derived from DeepSeek V3.2 - 671B)
+# Included modules: DeepSeek Sparse Attention, Engram, mHC
+
+# Example command:
+# python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=demo model_name=deepseek-custom scan_layers=True attention=flash use_tokamax_splash=True enable_checkpointing=false async_checkpointing=false dataset_type=synthetic steps=5 per_device_batch_size=4 max_target_length=1024 dtype=bfloat16 weight_dtype=bfloat16 tokenizer_type=huggingface tokenizer_path=deepseek-ai/DeepSeek-V3.2 hf_access_token=${HF_TOKEN}
 
 base_emb_dim: 1024 # Reduced from 7168
 base_num_query_heads: 16 # Reduced from 128
```
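For intuition on the Engram module the config lists ("Conditional Memory via Scalable Lookup"): one way such a mechanism can work is to hash the recent token n-gram into a large embedding table and gate the retrieved vector into the hidden state, so memory is conditioned on local context without attention cost. The sketch below is a loose illustration under that assumption; the hash, table, gate, and all names are hypothetical and not taken from MaxText or the Engram paper.

```python
# Conceptual sketch (NOT MaxText's implementation): hash the last n tokens
# into a bucket of a large lookup table and gate the retrieved embedding
# into the hidden state.
import numpy as np

TABLE_SIZE, DIM = 4096, 8  # illustrative sizes

def ngram_hash(tokens, n=2, table_size=TABLE_SIZE):
    """Deterministically hash the last n tokens to a table bucket."""
    h = 0
    for t in tokens[-n:]:
        h = (h * 1_000_003 + int(t)) % table_size
    return h

rng = np.random.default_rng(0)
memory = rng.normal(size=(TABLE_SIZE, DIM))  # learned lookup table
gate = 0.5                                   # learned scalar gate (illustrative)

tokens = [17, 42, 7]
hidden = rng.normal(size=(DIM,))
retrieved = memory[ngram_hash(tokens)]
hidden = hidden + gate * retrieved           # conditional memory injection
```

The lookup is O(1) per token regardless of table size, which is what makes this style of memory "scalable" compared to attending over a long context.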
