Fetch from nvidia Megatron-LM#5

Open
RaymondLi0 wants to merge 7237 commits into ElementAI:load-iter from NVIDIA:main

Conversation

@RaymondLi0

No description provided.

Akshat8510 and others added 29 commits April 3, 2026 17:15
Signed-off-by: Akshat Kumar <akshat230405@gmail.com>
…#4084)

Signed-off-by: Deyu Fu <deyuf@nvidia.com>
Co-authored-by: Tom Long <tolong@oci-hsg-cs-001-vscode-02.cm.cluster>
Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Deyu Fu <deyuf@nvidia.com>
…dance (#4035)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…4140)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dlock (#4139)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Signed-off-by: meg miranda <mmiranda@nvidia.com>
…ubscriptable`) by not saving a checkpoint after a transient NaN / Inf (#3981)

Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Signed-off-by: Cory Ye <cye@nvidia.com>
Co-authored-by: Cory Ye <cye@nvidia.com>
Co-authored-by: conver334 <conver334@gmail.com>
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
…graphs) (#4085)

Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
… workflow (#4199)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ment-wise distributed optimizer (#4138)

Signed-off-by: Hao Wu <skyw@nvidia.com>
maanug-nv and others added 30 commits May 4, 2026 16:41
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: guihong-nv <guihongl@nvidia.com>
Co-authored-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: Deepak Narayanan <deepakn94@gmail.com>
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
… and re-simplify ep_sync accidentally reverted by #4306 (#4587)

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Co-authored-by: Siddharth Singh <sidsingh@nvidia.com>
Co-authored-by: root <root@nvl72098-T11.cm.cluster>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: root <root@nvl72078-T18.cm.cluster>
Co-authored-by: William Dykas <wdykas@oci-hsg-cs-001-vscode-03.cm.cluster>
Co-authored-by: root <root@nvl72102-T05.cm.cluster>
Signed-off-by: meg miranda <mmiranda@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…n_dev (#4639)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>