<div align="center">

# 🎓 Getting Admission in College Prediction

[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
[![scikit-learn](https://img.shields.io/badge/scikit--learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)](https://scikit-learn.org/)
[![Jupyter](https://img.shields.io/badge/Jupyter-Notebook-F37626?style=for-the-badge&logo=jupyter&logoColor=white)](https://jupyter.org/)
[![Dataset](https://img.shields.io/badge/Dataset-Kaggle-20BEFF?style=for-the-badge&logo=kaggle&logoColor=white)](https://www.kaggle.com/mohansacharya/graduate-admissions)
[![Best R²](https://img.shields.io/badge/Best%20R²-0.821-brightgreen?style=for-the-badge)]()
[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](../LICENSE.md)

> Predicts a student's **probability of graduate college admission** (a continuous target between 0 and 1) from 7 academic and profile features, using a `GridSearchCV`-powered comparison of 6 regression algorithms.

[🔙 Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects)

</div>

---

## 📌 Table of Contents

- [About the Project](#-about-the-project)
- [Dataset](#-dataset)
- [Features](#-features)
- [Methodology](#-methodology)
- [Model Comparison Results](#-model-comparison-results)
- [Final Model Performance](#-final-model-performance)
- [Sample Predictions](#-sample-predictions)
- [Project Structure](#-project-structure)
- [Getting Started](#-getting-started)
- [Tech Stack](#-tech-stack)

---

## 🔬 About the Project

Getting into a good graduate program is one of the most competitive processes students face worldwide. This project builds a **regression model** that predicts the probability of admission from a student's GRE score, TOEFL score, CGPA, university rating, SOP, LOR, and research experience.

Six regression algorithms are tuned and compared with **GridSearchCV using 5-fold cross-validation** inside a custom `find_best_model()` function. The best-performing model is then fitted on a training split and evaluated on a held-out test set.

**What this project covers:**

- Exploratory data analysis on 500 graduate applicant profiles
- A custom `find_best_model()` that runs GridSearchCV across 6 regressors
- Feature importance and correlation analysis
- Linear Regression selected as the final model, reaching **R² = 0.821** on the test set

---

## 📊 Dataset

| Property | Details |
|----------|---------|
| **File** | `admission_predict.csv` |
| **Source** | [Kaggle: Graduate Admissions](https://www.kaggle.com/mohansacharya/graduate-admissions) |
| **Rows** | 500 student records |
| **Columns** | 9 (including `Serial No.` and the target) |
| **Task** | Regression: predict `Chance of Admit` ∈ [0, 1] |
| **Missing Values** | None |
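
A quick way to check the shape and the "no missing values" claim in the table above is sketched below with pandas. `load_admissions` and `summarize` are hypothetical helper names, and the header-stripping step is a defensive assumption: some copies of the Kaggle CSV may carry trailing spaces after headers such as `LOR` and `Chance of Admit`.

```python
import pandas as pd


def load_admissions(source):
    """Read the admissions CSV (a path or file-like object) and tidy the headers.

    Assumption: headers may contain stray surrounding whitespace, so it is stripped.
    """
    df = pd.read_csv(source)
    df.columns = df.columns.str.strip()
    return df


def summarize(df):
    """Return (rows, columns, total missing cells) as a quick sanity check."""
    return df.shape[0], df.shape[1], int(df.isna().sum().sum())
```

If the file matches the table, `summarize(load_admissions("admission_predict.csv"))` should report 500 rows, 9 columns and 0 missing cells.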

---

## 🔬 Features

| Column | Type | Range | Description |
|--------|------|:-----:|-------------|
| `GRE Score` | Integer | 290–340 | Graduate Record Examination score |
| `TOEFL Score` | Integer | 92–120 | Test of English as a Foreign Language score |
| `University Rating` | Integer | 1–5 | Prestige rating of the undergraduate university |
| `SOP` | Float | 1.0–5.0 | Strength of the Statement of Purpose |
| `LOR` | Float | 1.0–5.0 | Strength of the Letter of Recommendation |
| `CGPA` | Float | 6.8–9.92 | Undergraduate GPA (out of 10) |
| `Research` | Binary | 0 / 1 | Research experience (0 = No, 1 = Yes) |
| `Chance of Admit` | Float | 0.34–0.97 | **Target variable**: probability of admission |

> `Serial No.` is dropped before training as it carries no predictive information.
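
Dropping `Serial No.`, separating the 7 features from the target, and ranking features by their correlation with the target might look like this. It is a sketch assuming the headers match the table above exactly; `split_features_target` and `rank_by_correlation` are hypothetical names, and Pearson correlation (the pandas default) is an assumed choice.

```python
import pandas as pd

FEATURES = ["GRE Score", "TOEFL Score", "University Rating", "SOP", "LOR", "CGPA", "Research"]
TARGET = "Chance of Admit"


def split_features_target(df):
    """Drop the row identifier and return (X, y) using the column names above."""
    df = df.drop(columns=["Serial No."])
    return df[FEATURES], df[TARGET]


def rank_by_correlation(X, y):
    """Pearson correlation of each feature with the target, strongest first."""
    return X.corrwith(y).sort_values(ascending=False)
```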

---

## ⚙️ Methodology

```
Load admission_predict.csv (500 × 9)
        │
        ▼
EDA + Correlation Analysis
(heatmap, pairplots, distributions)
        │
        ▼
Drop 'Serial No.' column
Define X (7 features) and y ('Chance of Admit')
        │
        ▼
find_best_model(X, y)
  └── GridSearchCV (cv=5) over 6 models
        │
        ▼
Select best model → Linear Regression (normalize=True)
        │
        ▼
Train/Test Split (80/20, random_state=5)
  → 400 train samples, 100 test samples
        │
        ▼
Fit LinearRegression(normalize=True)
Evaluate on test set → R² = 0.821
        │
        ▼
Sample Predictions
```
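
The split, fit and evaluate steps at the end of the pipeline can be sketched as below, assuming `X` and `y` are already loaded. `fit_and_evaluate` is a hypothetical helper, and `normalize=True` is dropped because scikit-learn 1.2 removed that parameter.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_and_evaluate(X, y, test_size=0.2, random_state=5):
    """80/20 split with the README's random_state, then fit OLS and score the test part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # Plain LinearRegression: `normalize=True` no longer exists in scikit-learn >= 1.2.
    model = LinearRegression().fit(X_train, y_train)
    test_r2 = r2_score(y_test, model.predict(X_test))
    return model, test_r2, len(X_train), len(X_test)
```

With the 500-row dataset, `fit_and_evaluate(X, y)` yields 400 training and 100 test samples, matching the split described above.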

---

## 📈 Model Comparison Results

All 6 models were evaluated with `GridSearchCV(cv=5)` through the custom `find_best_model()` function:

| Model | Best Parameters | CV R² Score |
|-------|----------------|:-----------:|
| **Linear Regression** | `{'normalize': True}` | **0.8108** |
| Random Forest | `{'n_estimators': 15}` | 0.7689 |
| KNN | `{'n_neighbors': 20}` | 0.7230 |
| SVR | `{'gamma': 'scale'}` | 0.6541 |
| Decision Tree | `{'criterion': 'mse', 'splitter': 'random'}` | 0.5868 |
| Lasso | `{'alpha': 1, 'selection': 'random'}` | 0.2151 |

> **Linear Regression** was selected as the final model because it has the highest cross-validation R² score (**0.8108**). Note that `normalize` (Linear Regression) and `criterion='mse'` (Decision Tree) were deprecated in scikit-learn 1.0 and removed in 1.2; on current versions, drop `normalize` and use `criterion='squared_error'`.

> Lasso performed poorly (R² = 0.2151), most likely because `alpha=1` is a heavy penalty relative to the scale of the target (0.34–0.97): the L1 term drives most coefficients to exactly zero, so the model underfits even though all 7 features are correlated with admission probability.
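
The Lasso explanation can be illustrated on synthetic data. This is a minimal sketch with made-up standardised features and a target centred near 0.7 (not the admission dataset); `lasso_alpha_effect` is a hypothetical helper. With `alpha=1` every coefficient should be exactly zero, leaving a constant prediction, while a small alpha keeps all seven features.

```python
import numpy as np
from sklearn.linear_model import Lasso


def lasso_alpha_effect(alphas=(1.0, 0.001), seed=0):
    """Fit Lasso at several alphas on synthetic data with a target on a 0-1 scale.

    The data are made up for illustration. Returns
    {alpha: (number of non-zero coefficients, training R²)}.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(500, 7))
    # Every feature contributes a little to a target centred near 0.7.
    y = 0.7 + X @ np.full(7, 0.02) + rng.normal(scale=0.03, size=500)
    results = {}
    for alpha in alphas:
        model = Lasso(alpha=alpha).fit(X, y)
        results[alpha] = (int(np.count_nonzero(model.coef_)), model.score(X, y))
    return results
```

On the real, unscaled features the cut-off also depends on each column's spread (GRE varies far more than CGPA), so this illustrates the mechanism rather than reproducing the 0.2151 score.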

---

## 🏆 Final Model Performance

| Metric | Value |
|--------|:-----:|
| Model | `LinearRegression(normalize=True)` |
| 5-Fold Cross-Validation R² | **0.810** |
| Train samples | 400 |
| Test samples | 100 |
| **Test R² Score** | **0.8215** |
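
Dropping `normalize=True` on a current scikit-learn should not change these numbers: for ordinary least squares with an intercept, rescaling a feature only rescales its coefficient, so predictions stay the same up to floating-point error. The check below is a sketch on synthetic data (made-up columns loosely shaped like GRE, CGPA and Research), with `StandardScaler` standing in for the old option; both are per-feature rescalings, which is all the argument needs. `ols_predictions_match_after_scaling` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def ols_predictions_match_after_scaling(X, y):
    """Return True if OLS predictions are unchanged by standardising the features."""
    raw = LinearRegression().fit(X, y).predict(X)
    X_scaled = StandardScaler().fit_transform(X)
    scaled = LinearRegression().fit(X_scaled, y).predict(X_scaled)
    return bool(np.allclose(raw, scaled))
```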

---

## 🔮 Sample Predictions

```python
# Feature order: [GRE, TOEFL, University Rating, SOP, LOR, CGPA, Research]
# `model` is the LinearRegression fitted in the notebook; predict() returns an array.
chance = model.predict([[337, 118, 4, 4.5, 4.5, 9.65, 0]])[0]
print(f"Chance of getting into UCLA is {chance * 100:.3f}%")
# → Chance of getting into UCLA is 92.855%

chance = model.predict([[320, 113, 2, 2.0, 2.5, 8.64, 1]])[0]
print(f"Chance of getting into UCLA is {chance * 100:.3f}%")
# → Chance of getting into UCLA is 73.627%
```
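
If the model was fitted on a pandas DataFrame, recent scikit-learn versions warn when `predict()` receives a plain list without feature names. A small wrapper that builds a one-row DataFrame in the documented column order avoids that and prevents features from being passed in the wrong order. `predict_chance` is a hypothetical helper, and its keyword names (column names with spaces replaced by underscores) are an assumed convention.

```python
import pandas as pd

FEATURES = ["GRE Score", "TOEFL Score", "University Rating", "SOP", "LOR", "CGPA", "Research"]


def predict_chance(model, **applicant):
    """Predict admission probability for one applicant given as keyword arguments.

    Keys are the column names with spaces replaced by underscores, e.g. GRE_Score=337.
    A missing key raises KeyError instead of silently shifting the feature order.
    """
    row = {name: applicant[name.replace(" ", "_")] for name in FEATURES}
    X = pd.DataFrame([row], columns=FEATURES)
    return float(model.predict(X)[0])
```

For example: `predict_chance(model, GRE_Score=337, TOEFL_Score=118, University_Rating=4, SOP=4.5, LOR=4.5, CGPA=9.65, Research=0)`.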

---

## 📁 Project Structure

```
Getting Admission in College Prediction/
│
├── Admission_prediction.ipynb   # Main notebook: EDA, model comparison, training
├── admission_predict.csv        # Dataset (500 student records)
├── requirements.txt             # Python dependencies
└── README.md                    # You are here
```

---

## 🚀 Getting Started

### 1. Clone the repository

```bash
git clone https://github.com/shsarv/Machine-Learning-Projects.git
cd "Machine-Learning-Projects/Getting Admission in College Prediction"
```

### 2. Set up environment

```bash
python -m venv venv

# Linux / macOS
source venv/bin/activate

# Windows
venv\Scripts\activate

pip install -r requirements.txt
```

### 3. Launch the notebook

```bash
jupyter notebook Admission_prediction.ipynb
```

---

## 🛠️ Tech Stack

| Layer | Technology |
|-------|-----------|
| Language | Python 3.7.4 |
| ML Library | scikit-learn (below 1.2 to run the notebook as written, since it uses `normalize=True` and `criterion='mse'`) |
| Model Selection | `GridSearchCV`, `cross_val_score` |
| Models | `LinearRegression`, `Lasso`, `SVR`, `DecisionTreeRegressor`, `RandomForestRegressor`, `KNeighborsRegressor` |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib |
| Notebook | Jupyter |

---

<div align="center">

Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv)

⭐ Star the main repo if this helped you!

</div>
