Python → AI · ML · DL

6-Month Complete Roadmap

A structured learning path from beginner Python to production AI — with free courses, verified links, and specialized tracks for Researchers and Developers

6 Months · 5 Phases · 40+ Topics · 50+ Resources · 2 Career Tracks
  • Month 1: Python basics + OOP
  • Month 2: NumPy, Pandas, Matplotlib + math
  • Month 3: Machine learning with scikit-learn
  • Month 4: Deep learning with PyTorch / TF
  • Month 5: CNNs, RNNs, Transformers
  • Month 6: Specialize (Research or Developer track)
📈 Skills You'll Build
Python Programming
Data Analysis
Data Visualization
Classical ML
Deep Learning
NLP/Transformers
MLOps/Deployment
📌 How to Use This Roadmap
  • Follow the phases in order — each builds on the last
  • Spend 2–4 hours/day studying and coding
  • Complete a mini-project at the end of each phase
  • Audit courses for free on Coursera (choose the Audit option)
  • Use Google Colab for free GPU access
  • In Month 6, branch into the Research or Developer track
  • Build your GitHub portfolio throughout
  • Join communities: Kaggle, the fast.ai forums, Hugging Face
🐍

Phase 1 — Python Fundamentals

Month 1 · ~10–12 hrs/week · Start here, zero experience needed

🔤 Basics & Syntax

  • Variables, data types, operators
  • Strings and string methods
  • Input/output, f-strings
  • Comments, indentation

🔁 Control Flow

  • if / elif / else
  • for loops and while loops
  • break, continue, pass
  • List comprehensions
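A quick sketch tying these pieces together — a loop with a condition, then the same logic as a list comprehension (the variable names are just illustrative):

```python
# Classic loop: collect squares of the even numbers in 0-9
squares = []
for n in range(10):
    if n % 2 == 0:
        squares.append(n * n)

# The same logic as a one-line list comprehension
squares_comp = [n * n for n in range(10) if n % 2 == 0]

print(squares_comp)  # [0, 4, 16, 36, 64]
```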

📦 Data Structures

  • Lists and list methods
  • Tuples — immutable sequences
  • Dictionaries — key-value pairs
  • Sets — unique collections

⚙️ Functions

  • Defining and calling functions
  • Parameters, *args, **kwargs
  • Return values and scope
  • Lambda functions
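A minimal sketch of these function features — `*args`/`**kwargs` packing and a lambda as a sort key (names here are made up for illustration):

```python
def describe(name, *args, **kwargs):
    """Extra positional args arrive as a tuple, keyword args as a dict."""
    return f"{name}: args={args}, kwargs={kwargs}"

print(describe("demo", 1, 2, mode="fast"))

# Lambdas: small anonymous functions, often used as sort keys
words = ["banana", "fig", "apple"]
words_by_length = sorted(words, key=lambda w: len(w))
print(words_by_length)  # ['fig', 'apple', 'banana']
```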

🗂️ Files & Exceptions

  • Reading/writing files
  • try / except / finally
  • Custom exceptions
  • Context managers (with)
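These ideas combine naturally: a `with` block guarantees the file is closed, and `try/except/finally` handles failure paths. A small sketch (the temp-file name is arbitrary):

```python
import os
import tempfile

# Write a file with a context manager (the file closes automatically)
path = os.path.join(tempfile.gettempdir(), "demo_roadmap.txt")
with open(path, "w") as f:
    f.write("hello\n")

try:
    with open(path) as f:
        content = f.read()
except FileNotFoundError:
    content = ""          # fall back gracefully if the file is missing
finally:
    os.remove(path)       # clean up the temp file either way

print(content.strip())
```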

🏗️ Object-Oriented Programming

  • Classes and objects
  • __init__, self, methods
  • Inheritance and polymorphism
  • Encapsulation
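The core OOP ideas above in one small sketch — a base class, a subclass overriding a method (polymorphism), and `__init__`/`self` in action:

```python
class Animal:
    def __init__(self, name):
        self.name = name              # instance attribute set in the constructor

    def speak(self):                  # method meant to be overridden
        return f"{self.name} makes a sound"

class Dog(Animal):                    # Dog inherits from Animal
    def speak(self):                  # polymorphism: same method, subclass behavior
        return f"{self.name} says woof"

animals = [Animal("Generic"), Dog("Rex")]
print([a.speak() for a in animals])
```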

📚 Intermediate Python

  • Modules and packages, pip
  • Generators and iterators
  • Decorators and closures
  • Virtual environments
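Generators and decorators are the two intermediate features people find hardest; a minimal example of each (the function names are illustrative):

```python
import functools

def countdown(n):
    """Generator: yields values lazily instead of building a list up front."""
    while n > 0:
        yield n
        n -= 1

def logged(func):
    """Decorator: wraps a function, recording each call via a closure."""
    calls = []
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(args)
        return func(*args, **kwargs)
    wrapper.calls = calls             # expose the closed-over call log
    return wrapper

@logged
def add(a, b):
    return a + b

print(list(countdown(3)))   # [3, 2, 1]
print(add(2, 5))            # 7
```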

🔢 Python for Data

  • Working with JSON & CSV
  • Regular expressions (re)
  • datetime module
  • os and sys modules
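A quick tour of the standard-library tools above — JSON round-tripping, a regex, and `datetime` arithmetic (the sample data is made up):

```python
import json
import re
from datetime import datetime, timedelta

# JSON round-trip: Python dict -> JSON string -> dict
record = {"name": "Ada", "scores": [95, 88]}
restored = json.loads(json.dumps(record))

# Regular expressions: pull all numbers out of a string
nums = re.findall(r"\d+", "train for 6 months, 2 tracks")

# datetime arithmetic with timedelta
start = datetime(2025, 1, 1)
end = start + timedelta(days=30)

print(restored, nums, (end - start).days)
```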
💡 Phase 1 Tips

Practice by writing small scripts daily. Use Google Colab (free) — no installation needed. Don't memorize syntax — focus on problem-solving logic. Build at least one mini-project: a calculator, word frequency counter, or simple quiz app.

📊

Phase 2 — Data Science Libraries

Month 2 · NumPy · Pandas · Matplotlib · Math & Statistics

🔢 NumPy

  • Creating arrays (1D, 2D, nD)
  • Indexing, slicing, reshaping
  • Mathematical operations
  • Broadcasting rules
  • Linear algebra (dot, matrix mult)
  • Random number generation
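Broadcasting is the NumPy concept that trips people up most; a small sketch of it next to a matrix product:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)       # 2x3 matrix: [[0,1,2],[3,4,5]]
col = np.array([[10], [20]])         # 2x1 column vector

# Broadcasting: the 2x1 column stretches across the 3 columns of `a`
b = a + col                          # [[10,11,12],[23,24,25]]

# Linear algebra: matrix product of a (2x3) with its transpose (3x2)
gram = a @ a.T                       # 2x2 result

print(b)
print(gram)
```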

🐼 Pandas

  • Series and DataFrames
  • Reading CSV, Excel, JSON
  • Data selection and filtering
  • Handling missing values (fillna, dropna)
  • Groupby and aggregation
  • Merging, joining, concatenating
  • Time series basics
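The daily-bread Pandas pattern — load, fill missing values, group, aggregate — in one toy sketch (the city/sales data is invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "sales": [100, np.nan, 300, 200],
})

# Fill the missing value with the column mean, then group and sum
df["sales"] = df["sales"].fillna(df["sales"].mean())
totals = df.groupby("city")["sales"].sum()

print(totals)
```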

📈 Matplotlib

  • Line, bar, scatter, pie charts
  • Subplots and figure layout
  • Customizing: labels, colors, styles
  • Histograms and box plots
  • Saving figures

🎨 Seaborn

  • Statistical visualizations
  • Heatmaps and pairplots
  • Distribution plots (KDE, violin)
  • Regression plots
  • Categorical plots

🧮 Linear Algebra for ML

  • Vectors and vector operations
  • Matrix multiplication
  • Transpose, inverse
  • Eigenvalues & eigenvectors
  • Singular Value Decomposition (SVD)
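You can check all of these concepts numerically with NumPy; for a diagonal matrix the answers are easy to verify by hand:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# Eigendecomposition: for a diagonal matrix the eigenvalues are the diagonal
eigvals, eigvecs = np.linalg.eig(A)

# SVD: A = U @ diag(s) @ Vt; here the singular values equal the eigenvalues
U, s, Vt = np.linalg.svd(A)

print(sorted(eigvals.tolist()))   # [2.0, 3.0]
print(s)                          # singular values come back in descending order
```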

📐 Calculus & Optimization

  • Derivatives and gradients
  • Partial derivatives
  • Chain rule
  • Gradient descent intuition
  • Convex vs. non-convex functions
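Gradient descent intuition fits in a few lines: for f(x) = (x − 3)², the derivative is f′(x) = 2(x − 3), and repeatedly stepping against it walks toward the minimum at x = 3:

```python
# Minimize f(x) = (x - 3)^2 by following the negative gradient f'(x) = 2(x - 3)
x = 0.0
lr = 0.1                      # learning rate: step size along the gradient
for _ in range(100):
    grad = 2 * (x - 3)        # derivative at the current point
    x -= lr * grad            # step downhill

print(round(x, 4))  # converges toward the minimum at x = 3
```

Try a learning rate above 1.0 to see divergence: the convexity of f is what makes this simple recipe reliable.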

📊 Probability & Statistics

  • Probability axioms, Bayes' theorem
  • Random variables, distributions
  • Normal, Bernoulli, Poisson
  • Expected value, variance, std
  • Hypothesis testing basics
  • Correlation and covariance
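Bayes' theorem is worth computing once by hand. With assumed toy numbers (1% prevalence, 99% sensitivity, 95% specificity), a positive test is still more likely a false alarm than a true positive:

```python
# Bayes' theorem: P(disease | positive test), with made-up illustrative numbers
p_d = 0.01                  # P(disease): prevalence
p_pos_d = 0.99              # P(positive | disease): sensitivity
p_pos_nd = 0.05             # P(positive | no disease) = 1 - specificity

p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)   # law of total probability
p_d_pos = p_pos_d * p_d / p_pos                # Bayes' theorem

print(round(p_d_pos, 3))  # ~0.167: most positives are false alarms
```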

🧪 EDA (Exploratory Data Analysis)

  • Data profiling and inspection
  • Handling outliers
  • Feature distributions
  • Correlation matrices
  • Missing data analysis
💡 Phase 2 Tips

Work with real datasets from Kaggle. Practice loading, cleaning, and visualizing CSV files daily. Math doesn't need to be perfect before moving on — build intuition with 3Blue1Brown's visual explanations, then deepen as needed. Do a full EDA project on a dataset of your choice.

🤖

Phase 3 — Machine Learning

Month 3 · Classical ML · scikit-learn · Model evaluation

📐 Supervised Learning

  • Linear Regression
  • Logistic Regression
  • Decision Trees
  • Random Forests
  • Support Vector Machines (SVM)
  • K-Nearest Neighbors (KNN)
  • Naive Bayes
  • Gradient Boosting / XGBoost

🔍 Unsupervised Learning

  • K-Means Clustering
  • Hierarchical Clustering
  • DBSCAN
  • Principal Component Analysis (PCA)
  • t-SNE visualization
  • Anomaly Detection

⚙️ Model Evaluation

  • Train/Test/Validation splits
  • Cross-validation (k-fold)
  • Accuracy, Precision, Recall, F1
  • ROC curve and AUC
  • Confusion matrix
  • MSE, RMSE, MAE, R²
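These metrics are easy to compute by hand from a confusion matrix, which is the best way to internalize them (scikit-learn's `metrics` module does the same thing in one call):

```python
# Classification metrics computed by hand on toy predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)        # of predicted positives, how many were right
recall    = tp / (tp + fn)        # of actual positives, how many were found
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, round(f1, 3))
```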

🛠️ Feature Engineering

  • Feature scaling (StandardScaler)
  • One-hot encoding, label encoding
  • Handling missing values
  • Feature selection methods
  • Polynomial features
  • scikit-learn Pipelines

🎛️ Hyperparameter Tuning

  • GridSearchCV, RandomizedSearch
  • Bias-variance tradeoff
  • Overfitting vs. underfitting
  • Regularization (L1/L2, Ridge, Lasso)
  • Learning curves analysis

🏆 Ensemble Methods

  • Bagging and Boosting
  • Random Forests (in depth)
  • AdaBoost, Gradient Boosting
  • XGBoost, LightGBM, CatBoost
  • Stacking and blending
💡 Phase 3 Tips

Apply each ML algorithm with scikit-learn first, then work through the math behind it. Enter a Kaggle competition — Titanic or House Prices are great starters. Andrew Ng's Machine Learning Specialization is the gold standard for building intuition. Read the scikit-learn user guide for each algorithm.

🧠

Phase 4 — Deep Learning Foundations

Month 4 · Neural Networks · PyTorch · CNNs · RNNs

🧩 Neural Network Basics

  • Perceptrons and neurons
  • Activation functions (ReLU, Sigmoid, Softmax)
  • Forward pass computation
  • Loss functions (MSE, CrossEntropy)
  • Backpropagation algorithm
  • Gradient descent (SGD, Adam, RMSprop)
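The whole forward-pass/loss/backprop loop can be seen end to end on a single sigmoid neuron. This is a bare NumPy sketch (one training example, MSE loss, hand-derived chain rule), not a framework implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = np.array([1.0, 2.0]), 1.0      # input vector and target
w, b = np.array([0.1, -0.2]), 0.0     # initial weights and bias
lr = 0.5                              # learning rate

for _ in range(200):
    z = w @ x + b                     # forward: linear step
    a = sigmoid(z)                    # forward: activation
    # backward: chain rule through MSE loss L = (a - y)^2
    dL_da = 2 * (a - y)
    da_dz = a * (1 - a)               # sigmoid derivative
    grad_w = dL_da * da_dz * x        # dL/dw
    grad_b = dL_da * da_dz            # dL/db
    w -= lr * grad_w                  # gradient descent update
    b -= lr * grad_b

print(float(sigmoid(w @ x + b)))      # prediction moves toward the target 1.0
```

Autograd in PyTorch computes exactly these gradients for you; writing them once by hand makes `loss.backward()` much less mysterious.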

🏗️ Building Neural Networks

  • Multi-layer perceptrons (MLP)
  • Weight initialization (Xavier, He)
  • Batch normalization
  • Dropout regularization
  • Early stopping
  • Learning rate scheduling

⚡ PyTorch (Primary)

  • Tensors and autograd
  • torch.nn module
  • DataLoader and custom datasets
  • Training/validation loops
  • GPU training with CUDA
  • Saving and loading models

🖼️ CNNs

  • Conv layers, pooling layers
  • Feature maps and filters
  • Classic architectures (LeNet, VGG)
  • ResNet & skip connections
  • Transfer learning
  • Image augmentation

🔄 RNNs & LSTMs

  • Sequence modeling problems
  • Vanilla RNNs & vanishing gradients
  • LSTM (Long Short-Term Memory)
  • GRU (Gated Recurrent Unit)
  • Bidirectional RNNs
  • Time series forecasting

🟢 TensorFlow/Keras (Alternative)

  • Keras Sequential API
  • Functional API for complex models
  • Model compilation
  • Callbacks (ModelCheckpoint, TB)
  • TF data pipelines
💡 Phase 4 Tips

PyTorch is now dominant in research and, increasingly, in industry. Implement backpropagation from scratch once to really understand it. Use Google Colab for free GPU access. fast.ai's approach (top-down, code-first) is excellent if you learn better by doing before understanding theory.

🔭

Phase 5 — Advanced Deep Learning

Month 5 · Transformers · NLP · Computer Vision · GenAI

⚡ Attention & Transformers

  • Self-attention mechanism
  • Multi-head attention
  • Positional encoding
  • Encoder-decoder architecture
  • "Attention Is All You Need" paper
  • BERT, GPT, T5 architectures
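Scaled dot-product self-attention reduces to a few matrix operations. A single-head NumPy sketch with random toy weights (no masking, no multi-head splitting):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how much each token attends to each other
    weights = softmax(scores, axis=-1)   # each row is a distribution summing to 1
    return weights @ V, weights          # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))              # 4 tokens, embedding dim 8
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
out, weights = self_attention(X, Wq, Wk, Wv)

print(out.shape, weights.sum(axis=-1))   # (4, 8), rows of weights sum to 1
```

Multi-head attention runs several of these in parallel on smaller slices of the embedding and concatenates the results.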

💬 NLP Tasks

  • Text preprocessing & tokenization
  • Word embeddings (Word2Vec, GloVe)
  • Sentiment analysis
  • Named entity recognition (NER)
  • Text classification
  • Machine translation, summarization
  • Question answering

👁️ Advanced Computer Vision

  • Object detection (YOLO, Faster R-CNN)
  • Semantic segmentation (U-Net)
  • Vision Transformers (ViT)
  • CLIP (vision-language models)
  • Image generation with GANs
  • Diffusion models overview

🤗 Hugging Face

  • Transformers library
  • Pre-trained model hub
  • Datasets library
  • Tokenizers
  • Fine-tuning BERT/GPT
  • Inference pipelines
  • Gradio for demos

🎓 Generative AI

  • Generative Adversarial Networks (GAN)
  • Variational Autoencoders (VAE)
  • Diffusion models (DDPM, stable diffusion)
  • Large Language Models (LLMs)
  • Prompt engineering
  • In-context / few-shot learning

🧪 Training Best Practices

  • Experiment tracking (W&B, MLflow)
  • Mixed precision training (FP16)
  • Gradient accumulation & checkpointing
  • Efficient fine-tuning (LoRA, QLoRA)
  • Model evaluation and benchmarks
  • RLHF overview
💡 Phase 5 Tips

The Transformer architecture is the foundation of modern AI. Spend real time on it. 3Blue1Brown's visual explanation of Transformers (at 3blue1brown.com/topics/neural-networks) is excellent. After understanding the theory, fine-tune a BERT or GPT-2 model on a custom task using Hugging Face.

🔬 Research Track — Month 6+

For those pursuing AI research, academia, or building novel AI systems. Focus on deep theory, reading papers, and contributing to the research community.

📐 Step 1 — Deepen Mathematical Foundations

Months 6–7: Rigorous math for research-level ML

  1. Advanced Linear Algebra: eigendecomposition, SVD, matrix factorization, spectral theory, positive definite matrices
  2. Convex Optimization: convex sets and functions, Lagrangian methods, KKT conditions, gradient-based optimization
  3. Probabilistic Graphical Models: Bayesian networks, MRFs, variational inference, expectation-maximization (EM), MCMC
  4. Information Theory: entropy, KL divergence, mutual information, rate-distortion theory

📄 Step 2 — Read Research Papers

Build systematic paper-reading skills

  1. Foundational Papers (Must-Read): Attention Is All You Need (2017) · ResNet (2015) · GAN (2014) · BERT (2018) · GPT-3 (2020)
  2. Paper Reading Strategy: Pass 1 (title, abstract, conclusion), Pass 2 (figures, method), Pass 3 (full detail + reproduce)
  3. Track Current Research: follow arXiv (cs.LG, cs.CL), Papers with Code, Yannic Kilcher on YouTube, AI Twitter/X
  4. Reproduce a Paper: implement a simpler paper from scratch; document and share it on GitHub / Papers with Code

🔬 Step 3 — Specialize

Choose a specific research direction

  A. NLP / Large Language Models: pre-training, RLHF, alignment, reasoning, long context, multimodal LLMs, agents
  B. Computer Vision: diffusion models, CLIP, NeRF, 3D vision, video understanding, multimodal
  C. Reinforcement Learning: policy gradients, PPO, model-based RL, multi-agent RL, offline RL, RLHF
  D. Generative Models: diffusion models, score-based models, flow matching, normalizing flows

📚 Research Track Resources

💻 Developer Track — Month 6+

For those building production AI systems, ML engineering, LLM applications, and deploying models at scale.

🏗️ Step 1 — ML Engineering Basics

Month 6: Build production-ready ML skills

  1. Software Engineering for ML: clean code, testing (pytest), logging, Git/GitHub, code documentation
  2. Data Pipelines: ETL processes, data validation, feature stores, Apache Airflow basics
  3. Experiment Tracking: MLflow, Weights & Biases (W&B), model versioning, reproducible experiments
  4. Model Serving: FastAPI REST endpoints, Docker containers, model serialization (ONNX, pickle)

🚀 Step 2 — MLOps & Cloud Deployment

Deploy models to production reliably

  1. CI/CD for ML: GitHub Actions workflows, automated model testing, pipeline automation
  2. Cloud Platforms: AWS SageMaker, Google Cloud AI, Azure ML, Hugging Face Spaces for demos
  3. Model Monitoring: data drift detection, performance monitoring, retraining pipelines
  4. Inference Optimization: quantization, ONNX export, TensorRT, knowledge distillation, efficient batching

🤖 Step 3 — LLM Application Development

Build real-world AI applications with LLMs

  1. LLM APIs & Prompt Engineering: OpenAI API, Anthropic API, system prompts, CoT prompting, few-shot examples
  2. LangChain / LlamaIndex: chains, agents, tools, memory, document loaders, vector databases
  3. RAG Systems: embeddings, vector DBs (FAISS, Pinecone, Chroma), chunking strategies, RAG evaluation
  4. Fine-tuning LLMs: LoRA, QLoRA, PEFT library, instruction tuning datasets, evaluating fine-tuned models
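The heart of a RAG system is retrieval: embed your chunks, embed the query, return the nearest chunk by cosine similarity. A toy sketch using a hand-rolled bag-of-words "embedding" over a tiny made-up vocabulary (real systems use a learned embedding model and a vector DB like FAISS):

```python
import numpy as np

# Toy "embedding": word counts over a tiny vocabulary, normalized to unit length
vocab = ["python", "loops", "tensors", "pytorch", "pandas", "dataframe"]

def embed(text):
    words = text.lower().split()
    v = np.array([words.count(w) for w in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm else v

chunks = [
    "python loops and python basics",
    "pytorch tensors on the gpu",
    "pandas dataframe cleaning",
]
index = np.stack([embed(c) for c in chunks])   # stand-in for the vector DB

query = "how do tensors work in pytorch"
scores = index @ embed(query)                  # cosine similarity: unit vectors
best = chunks[int(np.argmax(scores))]          # retrieve the closest chunk

print(best)
```

In a full RAG pipeline, `best` (usually the top-k chunks) is pasted into the LLM prompt as context for answering the query.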

📚 Developer Track Resources

🛠️ Portfolio Projects by Phase

Build real projects as you learn. Every project should be on GitHub with a clean README. Quality over quantity.

Month 1–2: Python & Data Projects

📊 Exploratory Data Analysis (EDA) · Easy

Pick a Kaggle dataset (Titanic, Wine Quality, Iris). Load with Pandas, handle missing values, create visualizations with Seaborn, write a summary of findings. Share as a Kaggle notebook.

📈 Stock Price Visualizer · Easy

Use yfinance library to fetch stock data. Create candlestick charts, moving averages, and compare multiple stocks. Practice all your Pandas and Matplotlib skills.

Month 3: Machine Learning Projects

🏠 House Price Prediction · Easy

Kaggle House Prices dataset. Full ML pipeline: EDA → feature engineering → multiple models (Linear, RF, XGBoost) → cross-validation → Kaggle submission. Learn the complete ML workflow.

🛒 Customer Segmentation · Medium

Apply K-Means and PCA to a customer dataset. Visualize clusters with 2D PCA plots, interpret business value of each segment, try different K values and evaluate with silhouette score.

🚢 Titanic Survival (Kaggle Competition) · Easy

The classic intro competition. Engineer features, handle missing values, try multiple classifiers, submit to Kaggle and iterate to improve your score. Great for learning the full pipeline.

Month 4–5: Deep Learning Projects

🖼️ Image Classifier (CNN) · Medium

Build a CNN in PyTorch for CIFAR-10 or custom images. Implement from scratch, then use ResNet18 transfer learning. Apply augmentation, batch norm, dropout. Compare performance.

💬 Sentiment Analysis App · Medium

Fine-tune DistilBERT on IMDB reviews using Hugging Face Transformers. Deploy with Gradio. Share on Hugging Face Spaces. Excellent portfolio piece showing end-to-end NLP skills.

📈 LSTM Time Series Forecasting · Medium

Forecast stock prices or weather. Build LSTM in PyTorch, compare to ARIMA/Prophet baseline, visualize predictions vs. actual. Great for demonstrating sequence modeling skills.

Month 6: Capstone Projects

🤖 RAG-Powered Document Q&A Chatbot · Hard

Build a chatbot that answers questions from uploaded PDFs using LangChain, FAISS vector store, and an LLM (OpenAI or local Llama). Serve with FastAPI. Best developer capstone project.

🔬 Paper Reproduction Project · Hard

Reproduce a seminal paper (Word2Vec, VAE, DCGAN) from scratch in PyTorch. Document code, write a blog post explaining your learnings, submit to Papers with Code. Best research portfolio piece.

💡 Portfolio Tips

Every GitHub project needs: a clear README (problem, approach, results), working code, visualizations, and a brief write-up. Share projects on LinkedIn as you complete them. Kaggle notebooks shared publicly get views and help your credibility. Aim for 5 high-quality projects over 20 mediocre ones.