
Human Action Classification: Reproducible Research Baselines
Hey r/MachineLearning! I built reproducible baselines for human action recognition that I wish existed when I started.
🎯 What This Is
This is not an attempt to beat or compare with SOTA. It's a reference baseline for research and development. Most repos I found are unmaintained, have irreproducible results, and ship no pretrained models. This repo provides:
- ✅ Reproducible training pipeline
- ✅ Pretrained models on HuggingFace
- ✅ Complete documentation
- ✅ Two approaches: Video (temporal) + Image (pose-based)
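
For the video approach, loading one of the pretrained models follows the standard torchvision workflow. Here's a minimal sketch; the checkpoint filename is a placeholder (grab the real file from the HuggingFace links below), and the repo's own loading helpers may differ:

```python
# Minimal sketch: load a fine-tuned MC3-18 and classify one clip.
# "mc3_18_ucf101.pth" is a placeholder filename -- download the actual
# checkpoint from the HuggingFace model page linked below.
import torch
from torchvision.models.video import mc3_18

NUM_CLASSES = 101  # UCF-101

model = mc3_18(weights=None)  # architecture only, no Kinetics weights
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
model.load_state_dict(torch.load("mc3_18_ucf101.pth", map_location="cpu"))
model.eval()

# Dummy clip shaped (batch, channels, frames, height, width)
clip = torch.randn(1, 3, 16, 112, 112)
with torch.no_grad():
    pred = model(clip).argmax(dim=1)
print(pred)
```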
📊 Results
Video Models (UCF-101 – 101 classes):
- MC3-18: 87.05% accuracy (published: 85.0%)
- R3D-18: 83.80% accuracy (published: 82.8%)
Image Models (Stanford40 – 40 classes):
- ResNet50: 88.5% accuracy
- Real-time: 90 FPS with pose estimation
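
The image-based model is a plain ResNet50 classifier (the pose-estimation stage that feeds it is omitted here). A rough single-image inference sketch, assuming a placeholder checkpoint name and standard ImageNet preprocessing:

```python
# Rough sketch of single-image inference for the Stanford40 model.
# Checkpoint filename and preprocessing are assumptions, not the repo's exact code.
import torch
from torchvision import models, transforms
from PIL import Image

NUM_CLASSES = 40  # Stanford40 action classes

model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
model.load_state_dict(torch.load("resnet50_stanford40.pth", map_location="cpu"))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    top5 = model(img).softmax(dim=1).topk(5)
print(top5)
```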
🎬 Demo (created using test samples)
https://i.redd.it/diopygguk72g1.gif
🔗 Links
- GitHub: https://github.com/dronefreak/human-action-classification
- HuggingFace Models:
💡 Why I Built This
Every video classification paper cites UCF-101, but finding working code is painful:
- Repos abandoned 3+ years ago
- TensorFlow 1.x dependencies
- Missing training scripts
- No pretrained weights
This repo is what I needed: a clean starting point with modern PyTorch, complete training code, and published pretrained models.
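
If you just want the shape of the training pipeline, it's the usual modern-PyTorch fine-tuning pattern. This is an illustrative sketch only; the dataset path, hyperparameters, and ImageNet initialization are my assumptions, not the repo's exact script:

```python
# Illustrative fine-tuning loop, not the repo's actual training script.
# Assumes an ImageFolder-style Stanford40 split at "stanford40/train" (placeholder path).
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder("stanford40/train", transform=train_tf)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)

# Start from ImageNet weights and replace the classifier head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Linear(model.fc.in_features, len(train_ds.classes))
model = model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in train_dl:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last-batch loss {loss.item():.3f}")
```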
🤝 Contributions Welcome
Looking for help with:
- Additional datasets (Kinetics, AVA, etc.)
- Two-stream fusion models
- Mobile deployment guides
- Better augmentation strategies
License: Apache 2.0 – use it however you want!
Happy to answer questions!