I've always been intrigued by active learning, so I'm looking forward to applying it here to efficiently sample frames for manual labeling. But I've never witnessed it in industry, and have only ever encountered pessimistic takes on active learning in general (not the concept ofc, but the degree to which it outperforms random sampling).
As an extra layer of complexity – it seems like a manual labeler (likely myself) would have to enter labels through a browser GUI. Ideally, the labeler should produce labels concurrently as the model trains on its labels-thus-far and considers unlabeled frames to send to the labeler. Suddenly my training pipeline gets complicated!
My current plan:
* Sample training frames for labeling according to variance in predictions between adjacent frames, or perhaps MC-dropout uncertainty. Higher uncertainty should mean worse predictions, so those frames are worth labeling first
* For the holdout val+test sets (split by video), sample frames truly at random
* In the labeling GUI, display the model's current prediction so I can just drag the skeleton into place instead of annotating from scratch
* Don't bother with concurrent labeling+training; way too much work. I care more about hours spent labeling than calendar time at this point.
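To make the first bullet concrete, here's a minimal sketch of the temporal-variance heuristic I have in mind. Everything here (the function name, the `(T, J, 2)` keypoint layout, scoring a frame by its adjacent-frame jumps) is my own assumption, not an established API:

```python
import numpy as np

def select_frames_by_temporal_variance(preds: np.ndarray, k: int) -> np.ndarray:
    """Rank frames for labeling by how much the predicted skeleton
    jumps between adjacent frames (a cheap proxy for model uncertainty).

    preds: (T, J, 2) array of predicted (x, y) keypoints per frame.
    Returns the indices of the k highest-scoring frames.
    """
    # Per-joint displacement between consecutive frames: (T-1, J)
    disp = np.linalg.norm(np.diff(preds, axis=0), axis=-1)
    # Average over joints to get one jump magnitude per frame gap
    jump = disp.mean(axis=1)  # (T-1,)
    # Score each frame by the larger of its two adjacent jumps
    # (endpoint frames only have one neighbor)
    score = np.zeros(len(preds))
    score[:-1] = jump
    score[1:] = np.maximum(score[1:], jump)
    # Highest-scoring ("jitteriest") frames first
    return np.argsort(score)[::-1][:k]
```

In practice I'd probably also enforce a minimum index gap between selected frames, so a single jittery segment doesn't eat the whole labeling budget with near-duplicate frames.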
I'd love to know whether it's worth all the fuss. I'm curious to hear about any cases where active learning succeeded or flopped in an industry/applied setting.
- In practice, when does active learning give a clear win over random? When will it probably be murkier?
- Recommended batch sizes/cadence and stopping criteria?
- Common pitfalls (uncertainty miscalibration, sampling bias, annotator fatigue)?