
Zero to AI Researcher: A Complete Learning Journey

Discord

This repository contains a comprehensive course in the _course folder, along with scientific research on LLMs, including training language models from scratch. It is a complete learning journey that takes you from absolute beginner to conducting cutting-edge AI research.

YouTube videos and self-study materials work together to create the ultimate Zero to AI Researcher roadmap.

🎬 YouTube Course Coming Soon

🎯 What You'll Learn

  • Python Programming: Master Python from variables to advanced OOP, preparing you for AI/ML development with NumPy, Pandas, and PyTorch
  • AI Mathematics: Build intuition for functions, derivatives, gradients, vectors, and matrices - the mathematical foundation that powers all AI
  • Neural Networks: Understand and implement everything from single neurons to complex multi-layer networks, including backpropagation
  • Transformers: Build complete transformer architectures from scratch, including attention mechanisms, positional encoding, and feedforward layers
  • Research Skills: Learn to design controlled experiments, conduct ablation studies, and analyze results like a professional AI researcher
  • State-of-the-Art Techniques: Implement cutting-edge architectures including DeepSeek's latent attention and GLM-4's Mixture of Experts
  • Practical Implementation: Write production-ready code, handle real datasets, and optimize model performance through systematic experimentation

📚 Complete Curriculum

🚀 Getting Started

  • Start Here - Course introduction and learning philosophy

📖 Module 1: Python Fundamentals

Master the programming language that powers AI

🧮 Module 2: Math Not Scary

The mathematical foundations of AI, explained simply

  • Functions - Understanding mathematical functions
  • Derivatives - The foundation of optimization
  • Gradients - Multi-dimensional derivatives
  • Vectors - Vector operations and properties
  • Matrices - Matrix operations and linear algebra
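To make gradients concrete, here is a minimal sketch (not from the course materials) that checks an analytic gradient against a finite-difference approximation; the function `f` is just an illustration:

```python
def f(x, y):
    # f(x, y) = x^2 + 3y; the analytic gradient is (2x, 3)
    return x**2 + 3*y

def numerical_grad(fn, x, y, h=1e-6):
    # Central finite differences approximate the partial derivatives
    dfdx = (fn(x + h, y) - fn(x - h, y)) / (2 * h)
    dfdy = (fn(x, y + h) - fn(x, y - h)) / (2 * h)
    return dfdx, dfdy

gx, gy = numerical_grad(f, 2.0, 1.0)  # analytic answer: (4, 3)
```

This "gradient check" trick is also how practitioners debug hand-written backpropagation code.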

🔥 Module 3: PyTorch Fundamentals

Master the deep learning framework

🧠 Module 4: Neuron from Scratch

Understanding the basic building block of AI
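A single artificial neuron is just a weighted sum plus a bias, passed through an activation. A minimal illustrative sketch (not the module's actual code):

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs, then a sigmoid activation
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))

out = neuron([0.5, -0.2], [0.8, 0.4], 0.1)  # a value between 0 and 1
```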

⚡ Module 5: Activation Functions

The non-linear functions that make neural networks powerful

  • ReLU - Rectified Linear Unit
  • Sigmoid - The S-shaped function
  • Tanh - Hyperbolic tangent
  • SiLU - Sigmoid Linear Unit
  • SwiGLU - Swish-Gated Linear Unit
  • Softmax - Probability distributions
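A minimal NumPy sketch of a few of these activations (SwiGLU shown in its gated form with bias terms omitted; exact formulations vary between papers):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def silu(x):
    # SiLU / Swish: x * sigmoid(x)
    return x * sigmoid(x)

def swiglu(x, W, V):
    # Gated form: SiLU(xW) elementwise-times xV
    return silu(x @ W) * (x @ V)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))  # sums to 1
```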

🕸️ Module 6: Neural Network from Scratch

Building complete networks and understanding backpropagation
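The core of backpropagation is the chain rule. A hand-worked sketch of one gradient-descent step for a single sigmoid neuron with a squared-error loss (an illustration, not the module's code):

```python
import math

# One gradient-descent step for a single sigmoid neuron
x, target = 1.5, 0.0
w, b, lr = 0.8, 0.1, 0.5

z = w * x + b
y = 1 / (1 + math.exp(-z))   # forward pass
loss = (y - target) ** 2

# Backward pass via the chain rule:
dloss_dy = 2 * (y - target)
dy_dz = y * (1 - y)          # derivative of the sigmoid
dw = dloss_dy * dy_dz * x
db = dloss_dy * dy_dz

w -= lr * dw                 # gradient-descent update
b -= lr * db
```

Rerunning the forward pass with the updated `w` and `b` gives a smaller loss, which is the whole point of the update.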

🎯 Module 7: Attention Mechanism

The breakthrough that revolutionized AI
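At its core, attention is softmax(QK^T / sqrt(d)) V. A minimal single-head NumPy sketch (illustrative only; the course builds the full mechanism):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

Q = np.array([[1.0, 0.0], [0.0, 1.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[1.0, 2.0], [3.0, 4.0]])
out, w = attention(Q, K, V)  # each output row is a weighted mix of V's rows
```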

🔄 Module 8: Transformer Feedforward

The feedforward layers and Mixture of Experts
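The idea behind Mixture of Experts is that a router sends each token to only a few expert feedforward networks. A toy top-k routing sketch (hypothetical shapes and a tanh "expert" stand in for the real FFNs):

```python
import numpy as np

def moe_layer(x, experts_w, gate_w, top_k=2):
    # Router picks the top_k experts per token and mixes their outputs
    logits = x @ gate_w                       # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        gate = np.exp(chosen - chosen.max())
        gate /= gate.sum()                    # softmax over the chosen experts
        for g, e in zip(gate, top[t]):
            out[t] += g * np.tanh(x[t] @ experts_w[e])  # toy expert FFN
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # 4 tokens, 8-dim hidden
experts_w = rng.normal(size=(4, 8, 8))       # 4 experts
gate_w = rng.normal(size=(8, 4))
y = moe_layer(x, experts_w, gate_w)
```

Only `top_k` of the experts do work per token, which is what lets MoE models grow parameter count without growing compute proportionally.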

🏗️ Module 9: Building a Transformer

Assembling the complete architecture

🚀 Module 10: DeepSeek Latent Attention

Advanced attention mechanisms from DeepSeek models

🎭 Module 11: GLM-4 Mixture of Experts

State-of-the-art MoE implementation

🔬 Research Methodology

After mastering the fundamentals, this course teaches you how to conduct real AI research through hands-on experiments.

Research Design Principles

  • Hypothesis Formation: Start with clear, testable hypotheses
  • Controlled Experiments: Isolate variables to understand their effects
  • Ablation Studies: Systematically remove components to understand contributions
  • Baseline Comparisons: Always compare against established baselines

Experimental Framework

Our research experiments follow a structured approach:

Experiment 1: Simplified Ablation Study

  • Purpose: Compare different architectural components at a manageable scale
  • Models: 5 variants (baseline, MLP, attention+MLP, MoE, attention+MoE)
  • Scale: 512 hidden dimensions for efficient experimentation
  • Evaluation: HellaSwag benchmark integration
  • Key Learning: Understanding how different components contribute to performance

Experiment 2: Learning Rate Search

  • Purpose: Find optimal learning rates for different architectures
  • Focus: DeepSeek attention + MLP combinations
  • Method: Systematic learning rate exploration
  • Metrics: Validation loss, accuracy, perplexity
  • Key Learning: How hyperparameters affect different architectures
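The logic of a learning-rate search can be illustrated on a toy problem (a sketch only; the actual experiment trains full models and compares validation metrics):

```python
# Toy learning-rate sweep: minimize f(w) = (w - 3)^2 with SGD
def final_loss(lr, steps=50):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)   # df/dw
        w -= lr * grad
    return (w - 3) ** 2

candidates = [1e-3, 1e-2, 1e-1, 1.1]  # the last rate diverges on this problem
results = {lr: final_loss(lr) for lr in candidates}
best_lr = min(results, key=results.get)
```

The pattern is the same at scale: too small converges slowly, too large diverges, and the sweep finds the sweet spot empirically.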

Experiment 3: Expert Configuration Search

  • Purpose: Optimize MoE configurations
  • Focus: DeepSeek attention + GLM4 MoE
  • Variables: Expert count, learning rates, top-k values
  • Method: Grid search with validation
  • Key Learning: How to scale MoE models effectively
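A grid search over MoE configurations can be sketched like this; `train_and_eval` is a hypothetical stand-in for the real training run, and the candidate values are illustrative:

```python
from itertools import product

# Hypothetical search grid
expert_counts = [4, 8]
top_ks = [1, 2]
lrs = [3e-4, 1e-3]

def train_and_eval(n_experts, top_k, lr):
    # Placeholder score; the real experiment returns validation loss
    return abs(n_experts - 8) + abs(top_k - 2) + abs(lr - 1e-3)

results = {
    cfg: train_and_eval(*cfg)
    for cfg in product(expert_counts, top_ks, lrs)
}
best_cfg = min(results, key=results.get)  # lowest validation loss wins
```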

Research Skills You'll Develop

  • Experimental Design: Creating meaningful, controlled experiments
  • Data Analysis: Interpreting results and drawing conclusions
  • Benchmarking: Using standard evaluation metrics
  • Reproducibility: Writing code that others can replicate
  • Documentation: Communicating research findings clearly

How to Run the Research Experiments

# Run each experiment from the repository root

# Experiment 1: Simplified Ablation Study
cd experiments/exp1_simplified_ablation_study
python exp1_trainer.py
cd ../..

# Experiment 2: Learning Rate Search
cd experiments/exp2_deepseek_attn_mlp_lr_search
python lr_search.py
cd ../..

# Experiment 3: Expert Configuration Search
cd experiments/exp3_deepseek_attn_glm4_moe_lr_expert_search
python expert_search.py

🚀 Getting Started

  1. Clone and install:
git clone <repository-url> && cd zero-to-ai-researcher
pip install -r requirements.txt
  2. Start learning: Begin with Start Here

  3. Follow the path: Complete modules 1-11 in order, then run the research experiments

🤝 Contributing

We welcome contributions to improve this course:

  • Content Improvements: Better explanations, examples, or exercises
  • New Modules: Additional topics or advanced concepts
  • Research Experiments: New experimental designs
  • Documentation: Clearer instructions or additional resources
  • Bug Fixes: Code corrections or improvements

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • DeepSeek: For the advanced attention architecture
  • GLM-4: For the MoE implementation inspiration
  • HuggingFace: For the transformer library foundation
  • PyTorch: For the deep learning framework
  • OpenAI: For the transformer architecture
  • Google: For the attention mechanism

📞 Support and Community

  • GitHub Issues: Report problems or suggest improvements
  • Discussions: Connect with other learners
  • Code Reviews: Get feedback on your implementations
  • Research Collaboration: Work together on experiments

Ready to start your journey from zero to AI researcher? Begin with Start Here and remember: every expert was once a beginner. Take your time, practice regularly, and don't hesitate to experiment!

Happy Learning and Researching! 🚀🧠

About

Research on training an LLM with DeepSeek & Kimi architecture
