# Zero to AI Researcher

This repository contains a comprehensive course (in the `_course` folder) and scientific research on LLMs, including training language models from scratch: a complete learning journey that takes you from absolute beginner to conducting cutting-edge AI research. YouTube videos and self-study materials work together to form the roadmap.
## What You'll Learn

- Python Programming: Master Python from variables to advanced OOP, preparing you for AI/ML development with NumPy, Pandas, and PyTorch
- AI Mathematics: Build intuition for functions, derivatives, gradients, vectors, and matrices - the mathematical foundation that powers all AI
- Neural Networks: Understand and implement everything from single neurons to complex multi-layer networks, including backpropagation
- Transformers: Build complete transformer architectures from scratch, including attention mechanisms, positional encoding, and feedforward layers
- Research Skills: Learn to design controlled experiments, conduct ablation studies, and analyze results like a professional AI researcher
- State-of-the-Art Techniques: Implement cutting-edge architectures including DeepSeek's latent attention and GLM-4's Mixture of Experts
- Practical Implementation: Write production-ready code, handle real datasets, and optimize model performance through systematic experimentation
- Start Here - Course introduction and learning philosophy
## Module 1: Python Programming

Master the programming language that powers AI
- Python Basics - Variables, data types, and functions
- Control Flow and Loops - If statements, loops, and program flow
- Lists and Data Structures - Lists, dictionaries, tuples, and sets
- File Handling and Modules - Working with files and Python modules
- Error Handling and Debugging - Exception handling and debugging techniques
- Object-Oriented Programming - Classes, inheritance, and OOP concepts
- Advanced Python Features - Decorators, generators, and context managers
- Preparing for AI/ML - NumPy, Pandas, Matplotlib, and Scikit-learn
- Python Best Practices - Code quality, testing, and performance
## Module 2: AI Mathematics

The mathematical foundations of AI, explained simply
- Functions - Understanding mathematical functions
- Derivatives - The foundation of optimization
- Gradients - Multi-dimensional derivatives
- Vectors - Vector operations and properties
- Matrices - Matrix operations and linear algebra
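As a taste of what this module builds toward, a gradient can be checked numerically with finite differences (a NumPy sketch; the function `f` is just an illustrative example):

```python
import numpy as np

def f(v):
    """Example function f(x, y) = x^2 + 3y, whose gradient is (2x, 3)."""
    x, y = v
    return x**2 + 3*y

def numerical_gradient(f, v, eps=1e-6):
    """Central-difference approximation of the gradient of f at v."""
    grad = np.zeros_like(v, dtype=float)
    for i in range(len(v)):
        step = np.zeros_like(v, dtype=float)
        step[i] = eps
        grad[i] = (f(v + step) - f(v - step)) / (2 * eps)
    return grad

print(numerical_gradient(f, np.array([2.0, 1.0])))  # close to [4.0, 3.0]
```

This finite-difference trick is also how hand-written backpropagation code is usually sanity-checked.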
## Module 3: PyTorch

Master the deep learning framework
- Creating Tensors - The building blocks of deep learning
- Tensor Addition - Basic tensor operations
- Matrix Multiplication - The core operation of neural networks
- Transposing - Reshaping tensors for operations
- Reshaping Tensors - Changing tensor dimensions
- Indexing and Slicing - Accessing tensor elements
- Concatenation - Combining tensors
- Creating Special Tensors - Ones, zeros, and random tensors
- Tokenization and Embeddings - Converting text to numbers
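The operations listed above can be previewed in a few lines (a sketch, assuming PyTorch is installed; the values are arbitrary):

```python
import torch

a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # creating + reshaping
b = torch.ones(3, 2)                                    # a special tensor
c = a @ b                     # matrix multiplication: (2, 3) @ (3, 2) -> (2, 2)
d = c + c                     # tensor addition
e = torch.cat([a, a], dim=0)  # concatenation -> (4, 3)
first_row = a[0]              # indexing and slicing
t = a.T                       # transposing -> (3, 2)
print(c)  # each row of `a` summed against ones: [[3., 3.], [12., 12.]]
```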
## Module 4: The Neuron

Understanding the basic building block of AI
- What is a Neuron? - The fundamental unit of neural networks
- The Linear Step - Weighted sum computation
- The Activation Function - Adding non-linearity
- Building a Neuron in Python - Implementation from scratch
- Making a Prediction - Using neurons for inference
- The Concept of Loss - Measuring prediction errors
- The Concept of Learning - How neurons improve over time
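The ideas in this module fit in a dozen lines (a minimal NumPy sketch; the weights, input, and target are made up for illustration):

```python
import numpy as np

class Neuron:
    def __init__(self, weights, bias):
        self.w = np.asarray(weights, dtype=float)
        self.b = float(bias)

    def forward(self, x):
        z = self.w @ np.asarray(x, dtype=float) + self.b  # the linear step
        return max(z, 0.0)                                # ReLU activation

neuron = Neuron(weights=[0.5, -0.2], bias=0.1)
prediction = neuron.forward([1.0, 2.0])   # 0.5*1 - 0.2*2 + 0.1 = 0.2
loss = (prediction - 1.0) ** 2            # squared-error loss against target 1.0
print(prediction, loss)
```

Learning then means nudging `w` and `b` in the direction that reduces `loss`, which is exactly what the later modules on gradients and backpropagation formalize.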
## Module 5: Activation Functions

The non-linear functions that make neural networks powerful
- ReLU - Rectified Linear Unit
- Sigmoid - The S-shaped function
- Tanh - Hyperbolic tangent
- SiLU - Sigmoid Linear Unit
- SwiGLU - Swish-Gated Linear Unit
- Softmax - Probability distributions
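Most of these functions are one-liners in NumPy (a sketch; in SwiGLU the `W` and `V` matrices stand in for learned weights):

```python
import numpy as np

def relu(x):    return np.maximum(x, 0.0)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def silu(x):    return x * sigmoid(x)            # sigmoid-weighted input
def swiglu(x, W, V):
    return silu(x @ W) * (x @ V)                 # gated unit with learned W, V
def softmax(x):
    e = np.exp(x - np.max(x))                    # shift by the max for stability
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))           # [0. 0. 3.]
print(softmax(x).sum())  # 1.0 -- softmax outputs a probability distribution
```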
## Module 6: Neural Networks

Building complete networks and understanding backpropagation
- Architecture of a Network - How layers connect
- Building a Layer - Implementing network layers
- Implementing a Network - Complete network implementation
- The Chain Rule - Mathematical foundation of backpropagation
- Calculating Gradients - Computing derivatives
- Backpropagation in Action - How networks learn
- Implementing Backpropagation - Code implementation
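To make the chain rule concrete, here is gradient descent on a single linear layer with the gradients written out by hand (a sketch; the data, weights, and learning rate are arbitrary):

```python
import numpy as np

x = np.array([[1.0, 2.0]])      # one training example, shape (1, 2)
target = np.array([[1.0]])
W = np.array([[0.3], [-0.1]])   # weights, shape (2, 1)

for step in range(50):
    y = x @ W                              # forward pass
    loss = float(((y - target) ** 2).mean())
    dL_dy = 2.0 * (y - target)             # chain rule: d(loss)/dy
    dL_dW = x.T @ dL_dy                    # d(loss)/dW = x^T * d(loss)/dy
    W -= 0.05 * dL_dW                      # gradient descent update

print(loss)  # shrinks toward 0 as W learns to map x to the target
```

In a multi-layer network the same chain-rule step is applied layer by layer, from the loss back to the input; that is all backpropagation is.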
## Module 7: Attention

The breakthrough that revolutionized AI
- What is Attention? - Understanding the attention concept
- Self-Attention from Scratch - Building attention step by step
- Calculating Attention Scores - Query, key, and value operations
- Applying Attention Weights - Weighted combinations
- Multi-Head Attention - Multiple attention mechanisms
- Attention in Code - Complete implementation
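The steps above compose into scaled dot-product attention (a single-head NumPy sketch; the shapes are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # attention scores
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, weights = attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query
```

Multi-head attention simply runs several copies of this with different learned projections and concatenates the results.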
## Module 8: Feedforward Networks and Mixture of Experts

The feedforward layers and Mixture of Experts
- The Feedforward Layer - Standard MLP layers
- What is Mixture of Experts? - Introduction to MoE
- The Expert - Individual expert networks
- The Gate - Expert selection mechanism
- Combining Experts - Weighted expert outputs
- MoE in a Transformer - Integration with attention
- MoE in Code - Implementation
- The DeepSeek MLP - Advanced MLP design
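Put together, the expert, the gate, and the weighted combination look roughly like this (a NumPy sketch with linear experts for brevity; real MoE layers use full MLP experts):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMoE:
    def __init__(self, d, n_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # the experts
        self.W_gate = rng.normal(size=(d, n_experts))                       # the gate
        self.top_k = top_k

    def forward(self, x):
        logits = x @ self.W_gate
        chosen = np.argsort(logits)[-self.top_k:]     # route to the top-k experts
        gates = softmax(logits[chosen])               # renormalize their scores
        return sum(g * (x @ self.experts[i]) for g, i in zip(gates, chosen))

moe = TinyMoE(d=8)
y = moe.forward(np.ones(8))
print(y.shape)  # (8,): same shape in and out, but only top_k of the experts ran
```

The appeal is that parameter count grows with the number of experts while per-token compute grows only with `top_k`.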
## Module 9: The Full Transformer

Assembling the complete architecture
- Transformer Architecture - High-level overview
- RoPE Positional Encoding - Rotary positional embeddings
- Building a Transformer Block - Attention + feedforward
- The Final Linear Layer - Output projection
- Full Transformer in Code - Complete implementation
- Training a Transformer - Training process overview
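A block in this style can be sketched with PyTorch built-ins (the pre-norm layout here is an assumption, and RoPE is omitted for brevity):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Transformer block: self-attention + feedforward, each with a residual."""
    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        x = x + self.mlp(self.norm2(x))                    # residual around the MLP
        return x

x = torch.randn(2, 10, 32)  # (batch, sequence, d_model)
print(Block()(x).shape)     # shape is preserved, so blocks stack cleanly
```

A full transformer is a stack of such blocks between an embedding layer and the final linear output projection.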
## Module 10: DeepSeek Latent Attention

Advanced attention mechanisms from DeepSeek models
- What is Latent Attention? - Understanding latent attention
- DeepSeek Attention Architecture - DeepSeek's specific design
- Implementation in Code - Building DeepSeek attention
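The core idea can be hinted at with low-rank projections: instead of caching full keys and values, cache a small latent and expand it on the fly (the dimensions and projection names here are illustrative, not DeepSeek's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq_len = 32, 8, 6
W_down = rng.normal(size=(d_model, d_latent))  # compress hidden states to a latent
W_uk = rng.normal(size=(d_latent, d_model))    # expand latent into keys
W_uv = rng.normal(size=(d_latent, d_model))    # expand latent into values

x = rng.normal(size=(seq_len, d_model))
latent = x @ W_down                     # this small tensor is what gets cached
K, V = latent @ W_uk, latent @ W_uv
print(latent.shape, K.shape, V.shape)   # (6, 8) (6, 32) (6, 32)
```

Because only `latent` is cached per token, the KV cache shrinks roughly in proportion to `d_latent / d_model`.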
## Module 11: GLM-4 Mixture of Experts

State-of-the-art MoE implementation
- Revisiting Mixture of Experts - MoE fundamentals recap
- The GLM-4 MoE Architecture - GLM-4's MoE design
- Implementation in Code - Building GLM-4 MoE
After mastering the fundamentals, this course teaches you how to conduct real AI research through hands-on experiments.
- Hypothesis Formation: Start with clear, testable hypotheses
- Controlled Experiments: Isolate variables to understand their effects
- Ablation Studies: Systematically remove components to understand contributions
- Baseline Comparisons: Always compare against established baselines
Our research experiments follow a structured approach:
### Experiment 1: Simplified Ablation Study

- Purpose: Compare different architectural components at a manageable scale
- Models: 5 variants (baseline, MLP, attention+MLP, MoE, attention+MoE)
- Scale: 512 hidden dimensions for efficient experimentation
- Evaluation: HellaSwag benchmark integration
- Key Learning: Understanding how different components contribute to performance
### Experiment 2: Learning Rate Search

- Purpose: Find optimal learning rates for different architectures
- Focus: DeepSeek attention + MLP combinations
- Method: Systematic learning rate exploration
- Metrics: Validation loss, accuracy, perplexity
- Key Learning: How hyperparameters affect different architectures
### Experiment 3: Expert Configuration Search

- Purpose: Optimize MoE configurations
- Focus: DeepSeek attention + GLM4 MoE
- Variables: Expert count, learning rates, top-k values
- Method: Grid search with validation
- Key Learning: How to scale MoE models effectively
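The search loop behind an experiment like this is conceptually simple (a sketch; `train_and_eval` is a hypothetical stand-in for the real training run, and the grid values are illustrative):

```python
from itertools import product

def train_and_eval(n_experts, lr, top_k):
    """Hypothetical placeholder: returns a fake validation loss for illustration."""
    return abs(n_experts - 8) * 0.1 + abs(lr - 3e-4) * 100 + abs(top_k - 2) * 0.05

grid = {"n_experts": [4, 8, 16], "lr": [1e-4, 3e-4, 1e-3], "top_k": [1, 2, 4]}
results = {cfg: train_and_eval(*cfg) for cfg in product(*grid.values())}
best = min(results, key=results.get)
print(best)  # the configuration with the lowest (fake) validation loss
```

The real scripts replace the placeholder with actual training runs and log validation loss, accuracy, and perplexity per configuration.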
- Experimental Design: Creating meaningful, controlled experiments
- Data Analysis: Interpreting results and drawing conclusions
- Benchmarking: Using standard evaluation metrics
- Reproducibility: Writing code that others can replicate
- Documentation: Communicating research findings clearly
## Running the Experiments

```bash
# Experiment 1: Simplified Ablation Study
cd experiments/exp1_simplified_ablation_study
python exp1_trainer.py

# Experiment 2: Learning Rate Search
cd experiments/exp2_deepseek_attn_mlp_lr_search
python lr_search.py

# Experiment 3: Expert Configuration Search
cd experiments/exp3_deepseek_attn_glm4_moe_lr_expert_search
python expert_search.py
```

## Getting Started

1. Clone and install:

   ```bash
   git clone <repository-url> && cd zero-to-ai-researcher
   pip install -r requirements.txt
   ```

2. Start learning: Begin with Start Here
3. Follow the path: Complete modules 1-11 in order, then run the research experiments
## Contributing

We welcome contributions to improve this course:
- Content Improvements: Better explanations, examples, or exercises
- New Modules: Additional topics or advanced concepts
- Research Experiments: New experimental designs
- Documentation: Clearer instructions or additional resources
- Bug Fixes: Code corrections or improvements
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- DeepSeek: For the advanced attention architecture
- GLM-4: For the MoE implementation inspiration
- HuggingFace: For the transformer library foundation
- PyTorch: For the deep learning framework
- Google: For the transformer architecture and the attention mechanism
- OpenAI: For popularizing decoder-only language models
## Getting Help

- GitHub Issues: Report problems or suggest improvements
- Discussions: Connect with other learners
- Code Reviews: Get feedback on your implementations
- Research Collaboration: Work together on experiments
Ready to start your journey from zero to AI researcher? Begin with Start Here and remember: every expert was once a beginner. Take your time, practice regularly, and don't hesitate to experiment!
Happy Learning and Researching! 🚀🧠