Commit 23afd01 · Create README.md · `Human Activity Detection/README.md` (+300 lines)

<div align="center">

# 🏃 Human Activity Recognition — 2D Pose + LSTM RNN

[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
[![TensorFlow](https://img.shields.io/badge/TensorFlow-1.x-FF6F00?style=for-the-badge&logo=tensorflow&logoColor=white)](https://www.tensorflow.org/)
[![LSTM](https://img.shields.io/badge/LSTM-2%20Stacked%20Layers-9B59B6?style=for-the-badge)]()
[![Accuracy](https://img.shields.io/badge/Accuracy->90%25-brightgreen?style=for-the-badge)]()
[![ngrok](https://img.shields.io/badge/Deployed-ngrok-1F8ACB?style=for-the-badge)]()
[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](../LICENSE.md)

> Classifies **6 human activities** from **2D pose time series** (OpenPose keypoints) using a **2-layer stacked LSTM RNN** built in TensorFlow 1.x — achieving **>90% accuracy** in ~7 minutes of training. Deployed via ngrok with a Flask web app and `sample_video.mp4` demo.

[🔙 Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects)

</div>

---

## 📌 Table of Contents

- [About the Project](#-about-the-project)
- [Key Idea — Why 2D Pose?](#-key-idea--why-2d-pose)
- [Dataset](#-dataset)
- [LSTM Architecture](#-lstm-architecture)
- [Training Configuration](#-training-configuration)
- [Results & Findings](#-results--findings)
- [Project Structure](#-project-structure)
- [Getting Started](#-getting-started)
- [Tech Stack](#-tech-stack)
- [References](#-references)

---

## 🔬 About the Project

This experiment classifies human activities using **2D pose time series data** and a **stacked LSTM RNN**. Rather than feeding raw RGB images or expensive 3D pose data into the network, it uses **2D (x, y) keypoints** extracted from video frames via OpenPose — a much lighter and more accessible input representation.

The core research questions:

- Can **2D pose** match **3D pose** accuracy for activity recognition? (removes the need for RGBD cameras)
- Can **2D pose** match **raw RGB image** accuracy? (smaller input = smaller model = better with limited data)
- Does this approach generalise to **animal** behaviour classification for robotics applications?

The network architecture is based on Guillaume Chevalier's *LSTMs for Human Activity Recognition (2016)*, with a key modification for large, class-ordered datasets: random batch sampling without replacement.

---

## 🧠 Key Idea — Why 2D Pose?

```
Raw Video Frame (640×480 RGB)
              │
              ▼
OpenPose Inference
18 body keypoints × (x, y) coords
              │
              ▼
36-dimensional feature vector per frame
              │
              ▼  (32 frames = 1 time window)
LSTM RNN → Activity Class
```

| Input Type | Pros | Cons |
|------------|------|------|
| Raw RGB images | High information | Large models, lots of data needed |
| 3D pose (RGBD) | Rich spatial info | Needs depth sensors |
| **2D pose (x, y)** | Lightweight, RGB-only camera, small model | Some spatial ambiguity |

> Limiting the feature vector to 2D pose keypoints allows for a **smaller LSTM model** that generalises better on limited datasets — particularly relevant for future animal behaviour recognition tasks.
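
As a concrete (hypothetical) illustration of this pipeline, the sketch below flattens per-frame OpenPose keypoints into 36-dimensional vectors and groups them into 32-frame windows with 50% overlap, matching the input shape used by the LSTM; the `keypoints_per_frame` placeholder is not real data.

```python
import numpy as np

# Hypothetical input: one (18, 2) array of OpenPose (x, y) keypoints per video frame.
keypoints_per_frame = [np.random.rand(18, 2) for _ in range(211)]   # placeholder frames

n_steps = 32            # timesteps per window
stride = n_steps // 2   # 50% overlap between consecutive windows

# Flatten each frame into a 36-dimensional feature vector (18 keypoints × x, y).
frames = np.stack([kp.reshape(-1) for kp in keypoints_per_frame])   # (n_frames, 36)

# Slide a 32-frame window over the sequence to build LSTM inputs.
windows = np.stack([frames[i:i + n_steps]
                    for i in range(0, len(frames) - n_steps + 1, stride)])
print(windows.shape)    # (n_windows, 32, 36)
```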

---

## 📊 Dataset

| Property | Details |
|----------|---------|
| **Source** | Berkeley Multimodal Human Action Database (MHAD) — 2D poses extracted via OpenPose |
| **Download** | `RNN-HAR-2D-Pose-database.zip` (~19.2 MB, Google Drive) |
| **Subjects** | 12 |
| **Angles** | 4 camera angles |
| **Repetitions** | 5 per subject per action |
| **Total videos** | 1,438 (2 missing from original 1,440) |
| **Total frames** | 211,200 |
| **Training windows** | 22,625 (32 timesteps each, 50% overlap) |
| **Test windows** | 5,751 |
| **Input shape** | `(22625, 32, 36)` → windows × timesteps × features |
| **Preprocessing** | ❌ None — raw, unnormalized pose coordinates |

### Activity Classes (6)

| Label | Activity |
|-------|----------|
| `JUMPING` | Vertical jumps |
| `JUMPING_JACKS` | Jumping jacks |
| `BOXING` | Boxing motions |
| `WAVING_2HANDS` | Waving with both hands |
| `WAVING_1HAND` | Waving with one hand |
| `CLAPPING_HANDS` | Clapping hands |

### Data Files

```
RNN-HAR-2D-Pose-database/
├── X_train.txt   # training data: one row per timestep (36 comma-separated floats), 32 rows per window, 22,625 windows
├── X_test.txt    # test data: same layout, 5,751 windows
├── Y_train.txt   # training labels (0–5)
└── Y_test.txt    # test labels (0–5)
```
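
A minimal loading sketch, assuming the row layout noted above (one row per timestep, consecutive blocks of 32 rows per window); the file names come from the listing, the helper functions are illustrative.

```python
import numpy as np

n_steps = 32   # timesteps per window

def load_X(path):
    # Each row holds the 36 keypoint coordinates of one timestep;
    # consecutive blocks of 32 rows form one window.
    rows = np.loadtxt(path, delimiter=",")           # (n_windows * 32, 36)
    return rows.reshape(-1, n_steps, rows.shape[1])  # (n_windows, 32, 36)

def load_y(path):
    # Assumes one integer label (0–5) per window.
    return np.loadtxt(path, dtype=int).reshape(-1)

X_train = load_X("RNN-HAR-2D-Pose-database/X_train.txt")
y_train = load_y("RNN-HAR-2D-Pose-database/Y_train.txt")
X_test = load_X("RNN-HAR-2D-Pose-database/X_test.txt")
y_test = load_y("RNN-HAR-2D-Pose-database/Y_test.txt")
print(X_train.shape, y_train.shape)   # expected: (22625, 32, 36) (22625,)
```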

---

## 🏗️ LSTM Architecture

```
Input: (batch_size, 32 timesteps, 36 features)
                  │
                  ▼
Linear projection: 36 → 34 (ReLU)
                  │
                  ▼
┌──────────────────────────────────┐
│ BasicLSTMCell(34, forget_bias=1) │ ← Layer 1
├──────────────────────────────────┤
│ BasicLSTMCell(34, forget_bias=1) │ ← Layer 2
└──────────────────────────────────┘
  tf.contrib.rnn.MultiRNNCell (stacked)
  tf.contrib.rnn.static_rnn (many-to-one)
                  │  (last output only)
                  ▼
Linear: 34 → 6
Softmax → Activity class
```

> **Why n_hidden = 34?** Testing across a range of hidden unit counts showed best generalisation when hidden units ≈ n_input (36). 34 was found to be optimal.

> **Many-to-one classifier** — only the last LSTM output (timestep 32) is used for classification, not the full sequence output.
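
A condensed TensorFlow 1.x sketch of this graph, using the hyperparameters above; variable names are illustrative and the actual notebook may differ in detail.

```python
import tensorflow as tf   # TensorFlow 1.x

n_steps, n_input, n_hidden, n_classes = 32, 36, 34, 6

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# Linear projection 36 -> 34 with ReLU, applied to every timestep.
w_in = tf.Variable(tf.random_normal([n_input, n_hidden]))
b_in = tf.Variable(tf.random_normal([n_hidden]))
hidden = tf.reshape(tf.transpose(x, [1, 0, 2]), [-1, n_input])   # (n_steps * batch, 36)
hidden = tf.nn.relu(tf.matmul(hidden, w_in) + b_in)
hidden = tf.split(hidden, n_steps, 0)                            # list of (batch, 34) tensors

# Two stacked LSTM layers; many-to-one, so only the last output is kept.
cells = [tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0) for _ in range(2)]
outputs, _ = tf.contrib.rnn.static_rnn(tf.contrib.rnn.MultiRNNCell(cells),
                                       hidden, dtype=tf.float32)

# Final linear layer 34 -> 6; softmax is applied inside the loss / at prediction time.
w_out = tf.Variable(tf.random_normal([n_hidden, n_classes]))
b_out = tf.Variable(tf.random_normal([n_classes]))
logits = tf.matmul(outputs[-1], w_out) + b_out
```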

---

## ⚙️ Training Configuration

| Parameter | Value |
|-----------|-------|
| Framework | TensorFlow 1.x (`%tensorflow_version 1.x`) |
| Timesteps (`n_steps`) | 32 |
| Input features (`n_input`) | 36 (18 keypoints × x, y) |
| Hidden units (`n_hidden`) | 34 |
| Classes (`n_classes`) | 6 |
| Epochs | 300 |
| Batch size | 512 |
| Optimizer | Adam |
| Initial learning rate | 0.005 |
| LR decay | Exponential — `0.96` per 100,000 steps |
| Loss | Softmax cross-entropy + L2 regularization |
| L2 lambda | 0.0015 |
| Batch strategy | Random sampling **without replacement** (prevents class-order bias) |
| Training time | ~7 minutes (Google Colab) |

**L2 regularization formula:**
```python
lambda_loss_amount = 0.0015   # L2 lambda (see table above)

# Sum of L2 norms over every trainable variable, added to the mean cross-entropy.
l2 = lambda_loss_amount * sum(
    tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables()
)
cost = tf.reduce_mean(softmax_cross_entropy) + l2   # softmax_cross_entropy: per-window cross-entropy loss
```

**Decayed learning rate:**
```python
# Exponential decay schedule (Python ** exponentiation, not ^):
learning_rate = init_lr * decay_rate ** (global_step / decay_steps)
#             = 0.005   * 0.96       ** (global_step / 100000)
```
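
A sketch of how these pieces could be wired together: an Adam step on the decayed learning rate plus random batching without replacement. It builds on the loading and model sketches above and is not the notebook's exact code.

```python
# Decayed Adam optimizer (0.005 initial LR, 0.96 decay per 100,000 steps).
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(0.005, global_step,
                                           decay_steps=100000, decay_rate=0.96)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(cost, global_step=global_step)

# Random batches without replacement: shuffle the window indices once per epoch and
# walk through them in chunks, so class-ordered data never reaches the network in order.
batch_size = 512
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(300):
        order = np.random.permutation(len(X_train))
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            sess.run(train_op, feed_dict={x: X_train[idx],
                                          y: np.eye(n_classes)[y_train[idx]]})
```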

---

## 📈 Results & Findings

| Metric | Value |
|--------|:-----:|
| **Final Accuracy** | **> 90%** |
| Training time | ~7 minutes |

**Confusion pairs observed** (see the evaluation sketch below):

- `CLAPPING_HANDS` ↔ `BOXING` — similar upper-body motion pattern
- `JUMPING_JACKS` ↔ `WAVING_2HANDS` — symmetric arm movements

**Key conclusions:**

- 2D pose achieves >90% accuracy, validating its use over more expensive 3D pose or raw RGB inputs
- Hidden units ≈ n_input (34 ≈ 36) gives the best generalisation
- Random batch sampling without replacement is **critical** — ordered class batches degrade training significantly
- The approach is promising for future animal behaviour estimation with autonomous mobile robots
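
The confusion pairs above can be checked with a standard confusion matrix. A hedged evaluation sketch using scikit-learn (not part of the original notebook), reusing names from the sketches above and run while the training session is still open:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Class order assumed to follow the Activity Classes table above.
LABELS = ["JUMPING", "JUMPING_JACKS", "BOXING",
          "WAVING_2HANDS", "WAVING_1HAND", "CLAPPING_HANDS"]

# Predicted class = argmax over the logits for every test window.
y_pred = sess.run(tf.argmax(logits, 1), feed_dict={x: X_test})

print("accuracy:", accuracy_score(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
# Rows = true class, columns = predicted class; off-diagonal cells show
# pairs such as CLAPPING_HANDS vs BOXING that the model tends to confuse.
for label, row in zip(LABELS, cm):
    print(f"{label:15s}", row)
```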

---

## 📁 Project Structure

```
Human Activity Detection/
│
├── 📂 images/                                               # Result plots and visualizations
├── 📂 models/                                               # Saved LSTM model weights
├── 📂 src/                                                  # Helper source scripts
├── 📂 templates/                                            # HTML templates (Flask app)
│
├── Human_Activity_Recogination.ipynb                        # Main notebook — dataset, LSTM, training
├── Human_Action_Classification_deployment_with_ngrok.ipynb  # Flask + ngrok deployment notebook
├── lstm_train.ipynb                                         # Standalone LSTM training notebook
├── app.py                                                   # Flask web application
├── sample_video.mp4                                         # Sample video for live demo
└── requirements.txt                                         # Python dependencies
```

---

## 🚀 Getting Started

### 1. Clone the repository

```bash
git clone https://github.com/shsarv/Machine-Learning-Projects.git
cd "Machine-Learning-Projects/Human Activity Detection"
```

### 2. Set up environment

```bash
python -m venv venv
source venv/bin/activate   # Linux / macOS
venv\Scripts\activate      # Windows

pip install -r requirements.txt
```

> ⚠️ **TensorFlow 1.x required.** The LSTM uses `tf.contrib.rnn` and `tf.placeholder` APIs from TF1.
> ```bash
> pip install tensorflow==1.15.0
> ```

### 3. Download the dataset

The dataset is downloaded automatically in the notebook:

```python
!wget -O RNN-HAR-2D-Pose-database.zip \
  https://drive.google.com/u/1/uc?id=1IuZlyNjg6DMQE3iaO1Px6h1yLKgatynt
!unzip RNN-HAR-2D-Pose-database.zip
```

### 4. Run on Google Colab (recommended)

1. Open `Human_Activity_Recogination.ipynb` in Google Colab
2. Runtime → Change runtime type → GPU (optional, speeds up training)
3. Run all cells — training completes in ~7 minutes
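
The notebook pins the TF1 runtime with the Colab magic listed in the Training Configuration table. A minimal first cell would look like the sketch below; note that newer Colab runtimes may no longer offer a TF1 image, in which case a local TensorFlow 1.15 environment is needed.

```python
# First notebook cell: select the TF1 runtime in Colab (may be unavailable on newer runtimes).
%tensorflow_version 1.x
import tensorflow as tf
print(tf.__version__)   # expected: 1.15.x
```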

### 5. Deploy with ngrok

1. Open `Human_Action_Classification_deployment_with_ngrok.ipynb`
2. Follow the ngrok setup cells to expose the Flask app publicly
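
For illustration only (this is not the repo's `app.py`), a minimal sketch of the deployment pattern: a Flask app exposed through a public ngrok tunnel. The `pyngrok` package used here is an assumption; the deployment notebook may open the tunnel differently.

```python
from flask import Flask, jsonify
from pyngrok import ngrok   # assumed helper package for opening the tunnel from Python

app = Flask(__name__)

@app.route("/")
def index():
    # Placeholder endpoint; the real app serves templates/ and runs pose extraction + LSTM inference.
    return jsonify(status="ok", n_classes=6)

public_url = ngrok.connect(5000)   # expose local port 5000 at a public ngrok URL
print("Public URL:", public_url)
app.run(port=5000)
```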

---

## 🛠️ Tech Stack

| Layer | Technology |
|-------|-----------|
| Language | Python 3.7+ |
| Deep Learning | TensorFlow 1.x (`tf.contrib.rnn`) |
| Model | 2-layer stacked LSTM (`BasicLSTMCell`) |
| Pose Extraction | OpenPose (CMU Perceptual Computing Lab) |
| Data Processing | NumPy |
| Visualization | Matplotlib |
| Web Framework | Flask |
| Deployment | ngrok (tunnel) |
| Notebook | Jupyter / Google Colab |

---

## 📚 References

- Guillaume Chevalier (2016). *LSTMs for Human Activity Recognition.* [github.com/guillaume-chevalier](https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition) — MIT License
- [Berkeley MHAD Dataset](http://tele-immersion.citris-uc.org/berkeley_mhad)
- [OpenPose — CMU Perceptual Computing Lab](https://github.com/CMU-Perceptual-Computing-Lab/openpose)
- Goodfellow et al. *"It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model..."* — basis for small batch strategy
- [Andrej Karpathy — The Unreasonable Effectiveness of RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) — referenced for many-to-one classifier design

---

<div align="center">

Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv)

⭐ Star the main repo if this helped you!

</div>
