| 1 | +<div align="center"> |
| 2 | + |
| 3 | +# 🎓 Getting Admission in College Prediction |
| 4 | + |
| 5 | +[Python](https://www.python.org/) |
| 6 | +[scikit-learn](https://scikit-learn.org/) |
| 7 | +[Jupyter](https://jupyter.org/) |
| 8 | +[Dataset: Kaggle Graduate Admissions](https://www.kaggle.com/mohansacharya/graduate-admissions) |
| 10 | +[License](../LICENSE.md) |
| 11 | + |
| 12 | +> Predicts a student's **probability of graduate college admission** (as a continuous value between 0 and 1) from 7 academic and profile features — using a `GridSearchCV`-powered model comparison across 6 regression algorithms. |
| 13 | + |
| 14 | +[🔙 Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects) |
| 15 | + |
| 16 | +</div> |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## 📌 Table of Contents |
| 21 | + |
| 22 | +- [About the Project](#-about-the-project) |
| 23 | +- [Dataset](#-dataset) |
| 24 | +- [Features](#-features) |
| 25 | +- [Methodology](#-methodology) |
| 26 | +- [Model Comparison Results](#-model-comparison-results) |
| 27 | +- [Final Model Performance](#-final-model-performance) |
| 28 | +- [Sample Predictions](#-sample-predictions) |
| 29 | +- [Project Structure](#-project-structure) |
| 30 | +- [Getting Started](#-getting-started) |
| 31 | +- [Tech Stack](#-tech-stack) |
| 32 | + |
| 33 | +--- |
| 34 | + |
| 35 | +## 🔬 About the Project |
| 36 | + |
| 37 | +Getting into a good graduate program is one of the most competitive processes for students worldwide. This project builds a **regression model** that predicts the probability of admission based on a student's GRE score, TOEFL score, CGPA, university rating, SOP, LOR, and research experience. |
| 38 | + |
| 39 | +Six regression algorithms are trained and compared using **GridSearchCV with 5-fold cross-validation** via a custom `find_best_model()` function. The best-performing model is then evaluated on a held-out test set. |
| 40 | + |
| 41 | +**What this project covers:** |
| 42 | +- Exploratory data analysis on 500 graduate applicant profiles |
| 43 | +- Custom `find_best_model()` with GridSearchCV across 6 regressors |
| 44 | +- Feature importance and correlation analysis |
| 45 | +- Linear Regression selected as the final model, with **R² = 0.8215** on the held-out test set |
| 46 | + |
| 47 | +--- |
| 48 | + |
| 49 | +## 📊 Dataset |
| 50 | + |
| 51 | +| Property | Details | |
| 52 | +|----------|---------| |
| 53 | +| **File** | `admission_predict.csv` | |
| 54 | +| **Source** | [Kaggle — Graduate Admissions](https://www.kaggle.com/mohansacharya/graduate-admissions) | |
| 55 | +| **Rows** | 500 student records | |
| 56 | +| **Columns** | 9 (including Serial No. and target) | |
| 57 | +| **Task** | Regression — predict `Chance of Admit` ∈ [0, 1] | |
| 58 | +| **Missing Values** | None | |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +## 🔬 Features |
| 63 | + |
| 64 | +| Column | Type | Range | Description | |
| 65 | +|--------|------|:-----:|-------------| |
| 66 | +| `GRE Score` | Integer | 290–340 | Graduate Record Examination score | |
| 67 | +| `TOEFL Score` | Integer | 92–120 | Test of English as a Foreign Language score | |
| 68 | +| `University Rating` | Integer | 1–5 | Prestige rating of undergraduate university | |
| 69 | +| `SOP` | Float | 1.0–5.0 | Strength of Statement of Purpose | |
| 70 | +| `LOR` | Float | 1.0–5.0 | Strength of Letter of Recommendation | |
| 71 | +| `CGPA` | Float | 6.8–9.92 | Undergraduate GPA (out of 10) | |
| 72 | +| `Research` | Binary | 0 / 1 | Research experience (0 = No, 1 = Yes) | |
| 73 | +| `Chance of Admit` ⭐ | Float | 0.34–0.97 | **Target variable** — probability of admission | |
| 74 | + |
| 75 | +> `Serial No.` is dropped before training as it carries no predictive information. |
| 76 | + |
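The column handling described above can be sketched as follows. This is a minimal illustration, not the notebook's code: the two rows are made up purely to mimic the column layout, and the column names are assumed to match the table above exactly (the raw Kaggle CSV may carry stray whitespace in some headers, in which case `df.columns.str.strip()` may be needed first).

```python
import pandas as pd

# Two invented rows that only mimic the column layout described above.
# In the project the frame would come from pd.read_csv("admission_predict.csv").
df = pd.DataFrame({
    "Serial No.": [1, 2],
    "GRE Score": [330, 310],
    "TOEFL Score": [115, 104],
    "University Rating": [4, 3],
    "SOP": [4.5, 3.0],
    "LOR": [4.0, 3.5],
    "CGPA": [9.3, 8.2],
    "Research": [1, 0],
    "Chance of Admit": [0.90, 0.65],
})

# Serial No. is only a row counter, so it is dropped before modelling.
df = df.drop(columns=["Serial No."])

# Seven predictors and the continuous target.
X = df.drop(columns=["Chance of Admit"])
y = df["Chance of Admit"]

print(X.shape)  # (rows, 7)
```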
| 77 | +--- |
| 78 | + |
| 79 | +## ⚙️ Methodology |
| 80 | + |
| 81 | +``` |
| 82 | +Load admission_predict.csv (500 × 9) |
| 83 | + │ |
| 84 | + ▼ |
| 85 | +EDA + Correlation Analysis |
| 86 | +(heatmap, pairplots, distributions) |
| 87 | + │ |
| 88 | + ▼ |
| 89 | +Drop 'Serial No.' column |
| 90 | +Define X (7 features) and y ('Chance of Admit') |
| 91 | + │ |
| 92 | + ▼ |
| 93 | +find_best_model(X, y) |
| 94 | +└── GridSearchCV (cv=5) over 6 models |
| 95 | + │ |
| 96 | + ▼ |
| 97 | +Select best model → Linear Regression (normalize=True) |
| 98 | + │ |
| 99 | + ▼ |
| 100 | +Train/Test Split (80/20, random_state=5) |
| 101 | +→ 400 train samples, 100 test samples |
| 102 | + │ |
| 103 | + ▼ |
| 104 | +Fit LinearRegression(normalize=True) |
| 105 | +Evaluate on test set → R² = 0.8215 |
| 106 | + │ |
| 107 | + ▼ |
| 108 | +Sample Predictions |
| 109 | +``` |
| 110 | + |
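The split, fit, and evaluate steps at the end of the flow above might look like the sketch below. It runs on synthetic data, since the CSV is not bundled here, so the printed score says nothing about the real dataset. It also uses plain `LinearRegression()`, because the `normalize` argument shown in the diagram was removed in scikit-learn 1.2; for unregularized least squares, feature scaling does not change the predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 500 x 7 feature matrix and the target (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))
true_weights = np.linspace(0.01, 0.07, 7)
y = 0.7 + X @ true_weights + rng.normal(scale=0.02, size=500)

# 80/20 split with the same random_state as in the flow above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5
)

model = LinearRegression()
model.fit(X_train, y_train)

# For regressors, .score() returns R² on the given data.
test_r2 = model.score(X_test, y_test)
print(X_train.shape, X_test.shape)
print(f"test R2 on synthetic data: {test_r2:.3f}")
```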
| 111 | +--- |
| 112 | + |
| 113 | +## 📈 Model Comparison Results |
| 114 | + |
| 115 | +All 6 models evaluated using `GridSearchCV(cv=5)` via the custom `find_best_model()` function: |
| 116 | + |
| 117 | +| Model | Best Parameters | CV R² Score | |
| 118 | +|-------|----------------|:-----------:| |
| 119 | +| **Linear Regression** ✅ | `{'normalize': True}` | **0.8108** | |
| 120 | +| Random Forest | `{'n_estimators': 15}` | 0.7689 | |
| 121 | +| KNN | `{'n_neighbors': 20}` | 0.7230 | |
| 122 | +| SVR | `{'gamma': 'scale'}` | 0.6541 | |
| 123 | +| Decision Tree | `{'criterion': 'mse', 'splitter': 'random'}` | 0.5868 | |
| 124 | +| Lasso | `{'alpha': 1, 'selection': 'random'}` | 0.2151 | |
| 125 | + |
| 126 | +> ✅ **Linear Regression** selected as the final model — highest cross-validation R² score of **0.8108**. |
| 127 | + |
| 128 | +> Lasso performed poorly (R² = 0.2151), most likely because the selected `alpha = 1` is a very strong L1 penalty for a target that only spans 0.34–0.97: it shrinks most coefficients to or near zero, even though all 7 features are genuinely correlated with admission probability. |
| 129 | + |
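A comparison helper in the spirit of `find_best_model()` could be sketched as below. The notebook's actual implementation and parameter grids are not shown in this README, so the grids here are assumptions chosen for illustration, and the demo data is synthetic. The sketch also sticks to parameters that current scikit-learn accepts: `normalize` and `criterion='mse'` from the table above were removed in version 1.2.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def find_best_model(X, y, cv=5):
    """Grid-search each candidate regressor and tabulate its best CV R²."""
    # Hypothetical grids; the notebook's real grids may differ.
    candidates = {
        "linear_regression": (LinearRegression(), {"fit_intercept": [True, False]}),
        "lasso": (Lasso(), {"alpha": [1, 2], "selection": ["random", "cyclic"]}),
        "svr": (SVR(), {"gamma": ["auto", "scale"]}),
        "decision_tree": (
            DecisionTreeRegressor(random_state=0),
            {"splitter": ["best", "random"]},
        ),
        "random_forest": (
            RandomForestRegressor(random_state=0),
            {"n_estimators": [5, 10, 15, 20]},
        ),
        "knn": (KNeighborsRegressor(), {"n_neighbors": [2, 5, 10, 20]}),
    }
    rows = []
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="r2")
        search.fit(X, y)
        rows.append({
            "model": name,
            "best_params": search.best_params_,
            "cv_r2": search.best_score_,
        })
    # Best model first.
    table = pd.DataFrame(rows).sort_values("cv_r2", ascending=False)
    return table.reset_index(drop=True)


# Demo on synthetic, roughly linear data (not the admissions dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 7))
y_demo = 0.7 + X_demo @ np.linspace(0.01, 0.07, 7) + rng.normal(scale=0.02, size=200)

results = find_best_model(X_demo, y_demo)
print(results)
```

Returning a sorted table rather than a single estimator keeps the full ranking visible, which is what the comparison table above reports.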
| 130 | +--- |
| 131 | + |
| 132 | +## 🏆 Final Model Performance |
| 133 | + |
| 134 | +| Metric | Value | |
| 135 | +|--------|:-----:| |
| 136 | +| Model | `LinearRegression(normalize=True)` | |
| 137 | +| 5-Fold Cross-Validation R² | **≈ 0.81** | |
| 138 | +| Train samples | 400 | |
| 139 | +| Test samples | 100 | |
| 140 | +| **Test R² Score** | **0.8215** | |
| 141 | + |
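For readers unfamiliar with the metric, R² is defined as 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The short sketch below computes it by hand on four invented values; the numbers are illustrative only and unrelated to the project's results.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented values for illustration.
y_true = [0.60, 0.70, 0.80, 0.90]
y_pred = [0.62, 0.69, 0.78, 0.91]

r2 = r2_score_manual(y_true, y_pred)
print(f"{r2:.2f}")  # SS_res = 0.001, SS_tot = 0.05, so this prints 0.98
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean.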
| 142 | +--- |
| 143 | + |
| 144 | +## 🔮 Sample Predictions |
| 145 | + |
| 146 | +```python |
| 147 | +# Input order: [GRE, TOEFL, Univ Rating, SOP, LOR, CGPA, Research] |
| 148 | +# `model` is the fitted regressor; predict() returns a one-element array, shown below as a percentage. |
| 149 | +model.predict([[337, 118, 4, 4.5, 4.5, 9.65, 0]]) |
| 150 | +# → Chance of getting into UCLA is 92.855% |
| 151 | + |
| 152 | +model.predict([[320, 113, 2, 2.0, 2.5, 8.64, 1]]) |
| 153 | +# → Chance of getting into UCLA is 73.627% |
| 154 | +``` |
| 155 | + |
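Because linear regression is unbounded, a raw prediction can in principle fall slightly below 0 or above 1 for unusual inputs. One possible way to turn a prediction into the percentage strings shown above is sketched here; `format_chance` is a hypothetical helper and is not part of the notebook.

```python
def format_chance(prediction, decimals=3):
    """Clip a raw regression output to [0, 1] and render it as a percentage."""
    clipped = min(max(float(prediction), 0.0), 1.0)
    return f"{clipped * 100:.{decimals}f}%"


print(format_chance(0.73627))  # 73.627%
print(format_chance(1.07))     # clipped to 100.000%
```

With a fitted model, the call would look like `format_chance(model.predict([[...]])[0])`.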
| 156 | +--- |
| 157 | + |
| 158 | +## 📁 Project Structure |
| 159 | + |
| 160 | +``` |
| 161 | +Getting Admission in College Prediction/ |
| 162 | +│ |
| 163 | +├── Admission_prediction.ipynb # Main notebook — EDA, model comparison, training |
| 164 | +├── admission_predict.csv # Dataset (500 student records) |
| 165 | +├── requirements.txt # Python dependencies |
| 166 | +└── README.md # You are here |
| 167 | +``` |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +## 🚀 Getting Started |
| 172 | + |
| 173 | +### 1. Clone the repository |
| 174 | + |
| 175 | +```bash |
| 176 | +git clone https://github.com/shsarv/Machine-Learning-Projects.git |
| 177 | +cd "Machine-Learning-Projects/Getting Admission in College Prediction" |
| 178 | +``` |
| 179 | + |
| 180 | +### 2. Set up environment |
| 181 | + |
| 182 | +```bash |
| 183 | +python -m venv venv |
| 184 | +source venv/bin/activate # Linux / macOS |
| 185 | +venv\Scripts\activate # Windows |
| 186 | + |
| 187 | +pip install -r requirements.txt |
| 188 | +``` |
| 189 | + |
| 190 | +### 3. Launch the notebook |
| 191 | + |
| 192 | +```bash |
| 193 | +jupyter notebook Admission_prediction.ipynb |
| 194 | +``` |
| 195 | + |
| 196 | +--- |
| 197 | + |
| 198 | +## 🛠️ Tech Stack |
| 199 | + |
| 200 | +| Layer | Technology | |
| 201 | +|-------|-----------| |
| 202 | +| Language | Python 3.7.4 | |
| 203 | +| ML Library | scikit-learn | |
| 204 | +| Model Selection | `GridSearchCV`, `cross_val_score` | |
| 205 | +| Models | `LinearRegression`, `Lasso`, `SVR`, `DecisionTreeRegressor`, `RandomForestRegressor`, `KNeighborsRegressor` | |
| 206 | +| Data Processing | Pandas, NumPy | |
| 207 | +| Visualization | Matplotlib | |
| 208 | +| Notebook | Jupyter | |
| 209 | + |
| 210 | +--- |
| 211 | + |
| 212 | +<div align="center"> |
| 213 | + |
| 214 | +Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv) |
| 215 | + |
| 216 | +⭐ Star the main repo if this helped you! |
| 217 | + |
| 218 | +</div> |