Why so many people? Explaining Nonhabitual Transport Overcrowding With Internet Data



Public transport smartcard data can be used for detection of large crowds. By comparing observed counts against statistics on habitual behavior (e.g., average by time of day), one can specifically identify nonhabitual crowds, which are often very problematic for transport systems. While habitual overcrowding (e.g., peak hour) is well understood both by traffic managers and travelers, nonhabitual overcrowding hotspots can become even more disruptive and unpleasant because they are generally unexpected. By quickly understanding such cases, a transport manager can react and mitigate transport system disruptions. We propose a probabilistic data analysis model that breaks each nonhabitual overcrowding hotspot into a set of explanatory components. The potential explanatory components are initially retrieved from social networks and special events websites and then processed through text-analysis techniques. Finally, for each such component, the probabilistic model estimates a specific share in the total overcrowding counts. We first validate with synthetic data and then test our model with real data from the public transport system (EZLink) of Singapore, focused on three case study areas. We demonstrate that it is able to generate explanations that are intuitively plausible and consistent both locally (correlation coefficient, i.e., CC, from 85% to 99% for the three areas) and globally (CC from 41.2% to 83.9%). This model is directly applicable to any other domain sensitive to crowd formation due to large social events (e.g., communications, water, energy, waste).
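
The idea of identifying nonhabitual crowds by comparison with habitual statistics can be illustrated with a minimal Python sketch. This is not the paper's probabilistic model: the function name `nonhabitual_excess`, the mean-plus-z-standard-deviations threshold, and the sample counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def nonhabitual_excess(history, observed, z=2.0):
    """Flag time slots whose observed count exceeds the habitual baseline.

    history:  dict mapping a time slot to a list of counts on past days
    observed: dict mapping a time slot to the count on the day under study
    z:        hypothetical threshold, in standard deviations above the mean

    Returns a dict mapping each flagged slot to its excess over the baseline.
    """
    flagged = {}
    for slot, count in observed.items():
        past = history[slot]
        baseline = mean(past)                    # habitual level for this slot
        threshold = baseline + z * stdev(past)   # assumed cut-off, not from the paper
        if count > threshold:
            flagged[slot] = count - baseline     # overcrowding left to explain
    return flagged


# Made-up tap counts for one station, for illustration only.
history = {
    "18:00": [100, 110, 90, 105, 95],
    "21:00": [40, 42, 38, 41, 39],
}
observed = {"18:00": 108, "21:00": 95}

excess = nonhabitual_excess(history, observed)
print(excess)
```

In this example the 18:00 peak stays within its habitual range and is not flagged, while the 21:00 count is. The flagged excess is the kind of quantity that would then be shared out among candidate explanations such as nearby events.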


Information extraction, machine learning, smartcards, special events, travel demand modeling, web mining


ITS, Web mining, Machine Learning, Demand modeling

Related Project

InfoCrowds - Social Web Information Retrieval for crowds mobility management


IEEE Transactions on Intelligent Transportation Systems, January 2015


