# Zero to AI Researcher

This repository contains a comprehensive course (in the `_course` folder) and scientific research on LLMs, including training language models from scratch: a complete learning journey that takes you from absolute beginner to conducting cutting-edge AI research. YouTube videos and self-study materials work together to form the roadmap.
## What You'll Learn

- Python Programming: Master Python from variables to advanced OOP, preparing you for AI/ML development with NumPy, Pandas, and PyTorch
- AI Mathematics: Build intuition for functions, derivatives, gradients, vectors, and matrices - the mathematical foundation that powers all AI
- Neural Networks: Understand and implement everything from single neurons to complex multi-layer networks, including backpropagation
- Transformers: Build complete transformer architectures from scratch, including attention mechanisms, positional encoding, and feedforward layers
- Research Skills: Learn to design controlled experiments, conduct ablation studies, and analyze results like a professional AI researcher
- State-of-the-Art Techniques: Implement cutting-edge architectures including DeepSeek's latent attention and GLM-4's Mixture of Experts
- Practical Implementation: Write production-ready code, handle real datasets, and optimize model performance through systematic experimentation
- Start Here - Course introduction and learning philosophy
## Module 1: Python Programming

Master the programming language that powers AI
- Python Basics - Variables, data types, and functions
- Control Flow and Loops - If statements, loops, and program flow
- Lists and Data Structures - Lists, dictionaries, tuples, and sets
- File Handling and Modules - Working with files and Python modules
- Error Handling and Debugging - Exception handling and debugging techniques
- Object-Oriented Programming - Classes, inheritance, and OOP concepts
- Advanced Python Features - Decorators, generators, and context managers
- Preparing for AI/ML - NumPy, Pandas, Matplotlib, and Scikit-learn
- Python Best Practices - Code quality, testing, and performance
## Module 2: AI Mathematics

The mathematical foundations of AI, explained simply
- Functions - Understanding mathematical functions
- Derivatives - The foundation of optimization
- Gradients - Multi-dimensional derivatives
- Vectors - Vector operations and properties
- Matrices - Matrix operations and linear algebra
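As a taste of what this module builds toward, a gradient can be checked numerically with finite differences (a NumPy sketch; the function `f` is just an illustrative example):

```python
import numpy as np

def f(v):
    """Example function f(x, y) = x^2 + 3y, whose gradient is (2x, 3)."""
    x, y = v
    return x**2 + 3*y

def numerical_gradient(f, v, eps=1e-6):
    """Central-difference approximation of the gradient of f at v."""
    grad = np.zeros_like(v, dtype=float)
    for i in range(len(v)):
        step = np.zeros_like(v, dtype=float)
        step[i] = eps
        grad[i] = (f(v + step) - f(v - step)) / (2 * eps)
    return grad

print(numerical_gradient(f, np.array([2.0, 1.0])))  # close to [4.0, 3.0]
```

This finite-difference trick is also how hand-written backpropagation code is usually sanity-checked.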
## Module 3: PyTorch

Master the deep learning framework
- Creating Tensors - The building blocks of deep learning
- Tensor Addition - Basic tensor operations
- Matrix Multiplication - The core operation of neural networks
- Transposing - Reshaping tensors for operations
- Reshaping Tensors - Changing tensor dimensions
- Indexing and Slicing - Accessing tensor elements
- Concatenation - Combining tensors
- Creating Special Tensors - Ones, zeros, and random tensors
- Tokenization and Embeddings - Converting text to numbers
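The operations listed above can be previewed in a few lines (a sketch, assuming PyTorch is installed; the values are arbitrary):

```python
import torch

a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # creating + reshaping
b = torch.ones(3, 2)                                    # a special tensor
c = a @ b                     # matrix multiplication: (2, 3) @ (3, 2) -> (2, 2)
d = c + c                     # tensor addition
e = torch.cat([a, a], dim=0)  # concatenation -> (4, 3)
first_row = a[0]              # indexing and slicing
t = a.T                       # transposing -> (3, 2)
print(c)  # each row of `a` summed against ones: [[3., 3.], [12., 12.]]
```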
## Module 4: The Neuron

Understanding the basic building block of AI
- What is a Neuron? - The fundamental unit of neural networks
- The Linear Step - Weighted sum computation
- The Activation Function - Adding non-linearity
- Building a Neuron in Python - Implementation from scratch
- Making a Prediction - Using neurons for inference
- The Concept of Loss - Measuring prediction errors
- The Concept of Learning - How neurons improve over time
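The ideas in this module fit in a dozen lines (a minimal NumPy sketch; the weights, input, and target are made up for illustration):

```python
import numpy as np

class Neuron:
    def __init__(self, weights, bias):
        self.w = np.asarray(weights, dtype=float)
        self.b = float(bias)

    def forward(self, x):
        z = self.w @ np.asarray(x, dtype=float) + self.b  # the linear step
        return max(z, 0.0)                                # ReLU activation

neuron = Neuron(weights=[0.5, -0.2], bias=0.1)
prediction = neuron.forward([1.0, 2.0])   # 0.5*1 - 0.2*2 + 0.1 = 0.2
loss = (prediction - 1.0) ** 2            # squared-error loss against target 1.0
print(prediction, loss)
```

Learning then means nudging `w` and `b` in the direction that reduces `loss`, which is exactly what the later modules on gradients and backpropagation formalize.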
## Module 5: Activation Functions

The non-linear functions that make neural networks powerful
- ReLU - Rectified Linear Unit
- Sigmoid - The S-shaped function
- Tanh - Hyperbolic tangent
- SiLU - Sigmoid Linear Unit
- SwiGLU - Swish-Gated Linear Unit
- Softmax - Probability distributions
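Most of these functions are one-liners in NumPy (a sketch; in SwiGLU the `W` and `V` matrices stand in for learned weights):

```python
import numpy as np

def relu(x):    return np.maximum(x, 0.0)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def silu(x):    return x * sigmoid(x)            # sigmoid-weighted input
def swiglu(x, W, V):
    return silu(x @ W) * (x @ V)                 # gated unit with learned W, V
def softmax(x):
    e = np.exp(x - np.max(x))                    # shift by the max for stability
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))           # [0. 0. 3.]
print(softmax(x).sum())  # 1.0 -- softmax outputs a probability distribution
```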
## Module 6: Neural Networks

Building complete networks and understanding backpropagation
- Architecture of a Network - How layers connect
- Building a Layer - Implementing network layers
- Implementing a Network - Complete network implementation
- The Chain Rule - Mathematical foundation of backpropagation
- Calculating Gradients - Computing derivatives
- Backpropagation in Action - How networks learn
- Implementing Backpropagation - Code implementation
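To make the chain rule concrete, here is gradient descent on a single linear layer with the gradients written out by hand (a sketch; the data, weights, and learning rate are arbitrary):

```python
import numpy as np

x = np.array([[1.0, 2.0]])      # one training example, shape (1, 2)
target = np.array([[1.0]])
W = np.array([[0.3], [-0.1]])   # weights, shape (2, 1)

for step in range(50):
    y = x @ W                              # forward pass
    loss = float(((y - target) ** 2).mean())
    dL_dy = 2.0 * (y - target)             # chain rule: d(loss)/dy
    dL_dW = x.T @ dL_dy                    # d(loss)/dW = x^T * d(loss)/dy
    W -= 0.05 * dL_dW                      # gradient descent update

print(loss)  # shrinks toward 0 as W learns to map x to the target
```

In a multi-layer network the same chain-rule step is applied layer by layer, from the loss back to the input; that is all backpropagation is.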
## Module 7: Attention

The breakthrough that revolutionized AI
- What is Attention? - Understanding the attention concept
- Self-Attention from Scratch - Building attention step by step
- Calculating Attention Scores - Query, key, and value operations
- Applying Attention Weights - Weighted combinations
- Multi-Head Attention - Multiple attention mechanisms
- Attention in Code - Complete implementation
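The steps above compose into scaled dot-product attention (a single-head NumPy sketch; the shapes are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # attention scores
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, weights = attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query
```

Multi-head attention simply runs several copies of this with different learned projections and concatenates the results.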
## Module 8: Feedforward Networks and Mixture of Experts

The feedforward layers and Mixture of Experts
- The Feedforward Layer - Standard MLP layers
- What is Mixture of Experts? - Introduction to MoE
- The Expert - Individual expert networks
- The Gate - Expert selection mechanism
- Combining Experts - Weighted expert outputs
- MoE in a Transformer - Integration with attention
- MoE in Code - Implementation
- The DeepSeek MLP - Advanced MLP design
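Put together, the expert, the gate, and the weighted combination look roughly like this (a NumPy sketch with linear experts for brevity; real MoE layers use full MLP experts):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMoE:
    def __init__(self, d, n_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # the experts
        self.W_gate = rng.normal(size=(d, n_experts))                       # the gate
        self.top_k = top_k

    def forward(self, x):
        logits = x @ self.W_gate
        chosen = np.argsort(logits)[-self.top_k:]     # route to the top-k experts
        gates = softmax(logits[chosen])               # renormalize their scores
        return sum(g * (x @ self.experts[i]) for g, i in zip(gates, chosen))

moe = TinyMoE(d=8)
y = moe.forward(np.ones(8))
print(y.shape)  # (8,): same shape in and out, but only top_k of the experts ran
```

The appeal is that parameter count grows with the number of experts while per-token compute grows only with `top_k`.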
## Module 9: The Full Transformer

Assembling the complete architecture
- Transformer Architecture - High-level overview
- RoPE Positional Encoding - Rotary positional embeddings
- Building a Transformer Block - Attention + feedforward
- The Final Linear Layer - Output projection
- Full Transformer in Code - Complete implementation
- Training a Transformer - Training process overview
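A block in this style can be sketched with PyTorch built-ins (the pre-norm layout here is an assumption, and RoPE is omitted for brevity):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Transformer block: self-attention + feedforward, each with a residual."""
    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        x = x + self.mlp(self.norm2(x))                    # residual around the MLP
        return x

x = torch.randn(2, 10, 32)  # (batch, sequence, d_model)
print(Block()(x).shape)     # shape is preserved, so blocks stack cleanly
```

A full transformer is a stack of such blocks between an embedding layer and the final linear output projection.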
## Module 10: DeepSeek Latent Attention

Advanced attention mechanisms from DeepSeek models
- What is Latent Attention? - Understanding latent attention
- DeepSeek Attention Architecture - DeepSeek's specific design
- Implementation in Code - Building DeepSeek attention
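The core idea can be hinted at with low-rank projections: instead of caching full keys and values, cache a small latent and expand it on the fly (the dimensions and projection names here are illustrative, not DeepSeek's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq_len = 32, 8, 6
W_down = rng.normal(size=(d_model, d_latent))  # compress hidden states to a latent
W_uk = rng.normal(size=(d_latent, d_model))    # expand latent into keys
W_uv = rng.normal(size=(d_latent, d_model))    # expand latent into values

x = rng.normal(size=(seq_len, d_model))
latent = x @ W_down                     # this small tensor is what gets cached
K, V = latent @ W_uk, latent @ W_uv
print(latent.shape, K.shape, V.shape)   # (6, 8) (6, 32) (6, 32)
```

Because only `latent` is cached per token, the KV cache shrinks roughly in proportion to `d_latent / d_model`.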
## Module 11: GLM-4 Mixture of Experts

State-of-the-art MoE implementation
- Revisiting Mixture of Experts - MoE fundamentals recap
- The GLM-4 MoE Architecture - GLM-4's MoE design
- Implementation in Code - Building GLM-4 MoE
After mastering the fundamentals, this course teaches you how to conduct real AI research through hands-on experiments.
- Hypothesis Formation: Start with clear, testable hypotheses
- Controlled Experiments: Isolate variables to understand their effects
- Ablation Studies: Systematically remove components to understand contributions
- Baseline Comparisons: Always compare against established baselines
Our research experiments follow a structured approach:
### Experiment 1: Simplified Ablation Study

- Purpose: Compare different architectural components at a manageable scale
- Models: 5 variants (baseline, MLP, attention+MLP, MoE, attention+MoE)
- Scale: 512 hidden dimensions for efficient experimentation
- Evaluation: HellaSwag benchmark integration
- Key Learning: Understanding how different components contribute to performance
### Experiment 2: Learning Rate Search

- Purpose: Find optimal learning rates for different architectures
- Focus: DeepSeek attention + MLP combinations
- Method: Systematic learning rate exploration
- Metrics: Validation loss, accuracy, perplexity
- Key Learning: How hyperparameters affect different architectures
### Experiment 3: Expert Configuration Search

- Purpose: Optimize MoE configurations
- Focus: DeepSeek attention + GLM4 MoE
- Variables: Expert count, learning rates, top-k values
- Method: Grid search with validation
- Key Learning: How to scale MoE models effectively
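The search loop behind an experiment like this is conceptually simple (a sketch; `train_and_eval` is a hypothetical stand-in for the real training run, and the grid values are illustrative):

```python
from itertools import product

def train_and_eval(n_experts, lr, top_k):
    """Hypothetical placeholder: returns a fake validation loss for illustration."""
    return abs(n_experts - 8) * 0.1 + abs(lr - 3e-4) * 100 + abs(top_k - 2) * 0.05

grid = {"n_experts": [4, 8, 16], "lr": [1e-4, 3e-4, 1e-3], "top_k": [1, 2, 4]}
results = {cfg: train_and_eval(*cfg) for cfg in product(*grid.values())}
best = min(results, key=results.get)
print(best)  # the configuration with the lowest (fake) validation loss
```

The real scripts replace the placeholder with actual training runs and log validation loss, accuracy, and perplexity per configuration.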
- Experimental Design: Creating meaningful, controlled experiments
- Data Analysis: Interpreting results and drawing conclusions
- Benchmarking: Using standard evaluation metrics
- Reproducibility: Writing code that others can replicate
- Documentation: Communicating research findings clearly
## Running the Experiments

```bash
# Experiment 1: Simplified Ablation Study
cd experiments/exp1_simplified_ablation_study
python exp1_trainer.py

# Experiment 2: Learning Rate Search
cd experiments/exp2_deepseek_attn_mlp_lr_search
python lr_search.py

# Experiment 3: Expert Configuration Search
cd experiments/exp3_deepseek_attn_glm4_moe_lr_expert_search
python expert_search.py
```

## Getting Started

1. Clone and install:

   ```bash
   git clone <repository-url> && cd zero-to-ai-researcher
   pip install -r requirements.txt
   ```

2. Start learning: Begin with Start Here
3. Follow the path: Complete modules 1-11 in order, then run the research experiments
## Contributing

We welcome contributions to improve this course:
- Content Improvements: Better explanations, examples, or exercises
- New Modules: Additional topics or advanced concepts
- Research Experiments: New experimental designs
- Documentation: Clearer instructions or additional resources
- Bug Fixes: Code corrections or improvements
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- DeepSeek: For the advanced attention architecture
- GLM-4: For the MoE implementation inspiration
- HuggingFace: For the transformer library foundation
- PyTorch: For the deep learning framework
- Google: For the transformer architecture and the attention mechanism
- OpenAI: For popularizing decoder-only language models
## Getting Help

- GitHub Issues: Report problems or suggest improvements
- Discussions: Connect with other learners
- Code Reviews: Get feedback on your implementations
- Research Collaboration: Work together on experiments
Ready to start your journey from zero to AI researcher? Begin with Start Here and remember: every expert was once a beginner. Take your time, practice regularly, and don't hesitate to experiment!
Happy Learning and Researching! 🚀🧠