
Commit 92f55e3

Claim support for Engram and mHC

1 parent: 781d30a

2 files changed: 5 additions & 0 deletions

README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -41,6 +41,7 @@ See our guide on running MaxText in decoupled mode, without any GCP dependencies
 
 ## 🔥 Latest news 🔥
 
+* \[March 6, 2026\] New features from DeepSeek-AI are now supported: Conditional Memory via Scalable Lookup ([Engram](https://arxiv.org/abs/2601.07372)) and Manifold-Constrained Hyper-Connections ([mHC](https://arxiv.org/abs/2512.24880)). Try them out with our [deepseek-custom](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/configs/models/deepseek-custom.yml) starter config.
 * \[March 5, 2026\] New `tpu-post-train` [target in PyPI](https://pypi.org/project/maxtext). Please also use this installation option for running vllm_decode. See the [MaxText installation instructions](https://maxtext.readthedocs.io/en/latest/install_maxtext.html) for more info.
 * \[March 5, 2026\] [Qwen3-Next](https://github.com/AI-Hypercomputer/maxtext/blob/7656eb8d1c9eb0dd91e617a6fdf6ad805221221a/tests/end_to_end/tpu/qwen/next/run_qwen3_next.md) is now supported.
 * \[February 27, 2026\] New MaxText structure! MaxText has been restructured according to [RESTRUCTURE.md](https://github.com/AI-Hypercomputer/maxtext/blob/1b9e38aa0a19b6018feb3aed757406126b6953a1/RESTRUCTURE.md). Please feel free to share your thoughts and feedback.
```
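As a rough intuition for the Manifold-Constrained Hyper-Connections (mHC) feature named above: hyper-connections widen the residual pathway into several parallel streams that are mixed by a learned matrix between layers, and mHC constrains that mixing matrix to a well-behaved manifold. The sketch below uses Sinkhorn normalization to push the mixing matrix toward the doubly stochastic matrices. All names, shapes, and the choice of projection are illustrative assumptions for intuition only, not MaxText's or DeepSeek's actual implementation.

```python
# Conceptual sketch (NOT MaxText's implementation): mix n parallel residual
# streams with a mixing matrix constrained toward the doubly stochastic
# manifold via Sinkhorn normalization.
import numpy as np

def sinkhorn(logits: np.ndarray, n_iters: int = 20) -> np.ndarray:
    """Push a square matrix of logits toward a doubly stochastic matrix
    by alternately normalizing rows and columns."""
    m = np.exp(logits)
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

def mix_streams(streams: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Mix n residual streams of shape (n, d) with a constrained (n, n) matrix."""
    return sinkhorn(logits) @ streams

rng = np.random.default_rng(0)
streams = rng.normal(size=(4, 8))  # 4 residual streams, hidden dim 8
logits = rng.normal(size=(4, 4))   # learned mixing logits (illustrative)
mixed = mix_streams(streams, logits)
```

Because the (approximately) doubly stochastic matrix redistributes rather than amplifies the streams, the mixed representation stays on the same scale as the input, which is the kind of stability property a manifold constraint is meant to provide.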

src/maxtext/configs/models/deepseek-custom.yml

Lines changed: 4 additions & 0 deletions

```diff
@@ -13,6 +13,10 @@
 # limitations under the License.
 
 # Small model config for testing (derived from DeepSeek V3.2 - 671B)
+# Included modules: DeepSeek Sparse Attention, Engram, mHC
+
+# Example command:
+# python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=demo model_name=deepseek-custom scan_layers=True attention=flash use_tokamax_splash=True enable_checkpointing=false async_checkpointing=false dataset_type=synthetic steps=5 per_device_batch_size=4 max_target_length=1024 dtype=bfloat16 weight_dtype=bfloat16 tokenizer_type=huggingface tokenizer_path=deepseek-ai/DeepSeek-V3.2 hf_access_token=${HF_TOKEN}
 
 base_emb_dim: 1024 # Reduced from 7168
 base_num_query_heads: 16 # Reduced from 128
```
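For intuition on the Engram module the config lists ("Conditional Memory via Scalable Lookup"): one way such a mechanism can work is to hash the recent token n-gram into a large embedding table and gate the retrieved vector into the hidden state, so memory is conditioned on local context without attention cost. The sketch below is a loose illustration under that assumption; the hash, table, gate, and all names are hypothetical and not taken from MaxText or the Engram paper.

```python
# Conceptual sketch (NOT MaxText's implementation): hash the last n tokens
# into a bucket of a large lookup table and gate the retrieved embedding
# into the hidden state.
import numpy as np

TABLE_SIZE, DIM = 4096, 8  # illustrative sizes

def ngram_hash(tokens, n=2, table_size=TABLE_SIZE):
    """Deterministically hash the last n tokens to a table bucket."""
    h = 0
    for t in tokens[-n:]:
        h = (h * 1_000_003 + int(t)) % table_size
    return h

rng = np.random.default_rng(0)
memory = rng.normal(size=(TABLE_SIZE, DIM))  # learned lookup table
gate = 0.5                                   # learned scalar gate (illustrative)

tokens = [17, 42, 7]
hidden = rng.normal(size=(DIM,))
retrieved = memory[ngram_hash(tokens)]
hidden = hidden + gate * retrieved           # conditional memory injection
```

The lookup is O(1) per token regardless of table size, which is what makes this style of memory "scalable" compared to attending over a long context.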
