UNC Crowd Scene Analysis Dataset

Overview video

Synthetic Videos
Labels: High density, Max Pedestrian Count: 382, Good Light Condition, Indoor Environment, Simulated Background, Crowd Behavior: Aggressive, Congestion

The main goal of our work is to produce a rich labeled dataset for machine learning. In order to learn high-level features from videos or images, for example performing crowd counting and motion segmentation, a ground truth is often necessary for training or verifies the results. Eventually, we will produce over a million videos, and over 20 million images with ground truth labeled. Our video dataset is generated with a diversity of variations, including:

DensityPedestrian CountLighting Condition Camera ViewpointEnvironment (Indoor vs Outdoor)Background
(Real vs Simulated)
Crowd Behaviour (Aggresive vs Shy)