Estimating disaggregated employment size from Points-of-Interest and census data: From mining the web to model implementation and visualization



The global spread of internet access and the ubiquity of internet-capable devices have led to an increased online presence of companies and businesses, particularly in collaborative platforms called local directories, where Points-of-Interest (POIs) are usually classified with a set of categories and tags. Such information can be extremely useful, especially if aggregated under a common (shared) taxonomy.
This article proposes a complete framework for the urban planning task of disaggregated employment size estimation based on collaborative online POI data collected using web mining techniques. To make the analysis possible, we present a machine learning approach that automatically classifies POIs into a common taxonomy, the North American Industry Classification System (NAICS). This hierarchical taxonomy is applied in many areas, particularly in urban planning, because it allows the data to be analyzed at different levels of detail, depending on the practical application at hand. The classified POIs are then used to estimate disaggregated employment size, at a finer level than previously possible, using a maximum likelihood estimator. We empirically show that the automatically classified online POIs are competitive with proprietary gold-standard POI data. This finding is then supported by a set of new visualizations that show the spatial distribution of the classification error and its relation to the employment size error.


Data mining, Points of Interest, GIS, Urban planning


POI mining, machine learning, urban planning

Related Project

Crowds - Understanding urban land use from digital footprints of crowds


International Journal on Advanced Intelligent Systems, Vol. 7, #4, December 2013


Cited by

Year 2018 : 4 citations

 Folch, D. C., Spielman, S. E., & Manduca, R. (2018). Fast food data: Where user-generated content works and where it does not. Geographical Analysis, 50(2), 125-140.

 Novack, T., Peters, R., & Zipf, A. (2018). Graph-Based Matching of Points-of-Interest from Collaborative Geo-Datasets. ISPRS International Journal of Geo-Information, 7(3), 117.

 Gervasoni, L., Fenet, S., Perrier, R., & Sturm, P. (2018, October). Convolutional neural networks for disaggregated population mapping using open data. In IEEE International Conference on Data Science and Advanced Analytics (DSAA).

 Gervasoni, L., Fenet, S., & Sturm, P. (2018, January). Une méthode pour l’estimation désagrégée de données de population à l’aide de données ouvertes. In 18ème Conférence Internationale sur l'Extraction et la Gestion des Connaissances.

Year 2017 : 1 citation

 Touya, G., Antoniou, V., Olteanu-Raimond, A. M., & Van Damme, M. D. (2017). Assessing crowdsourced POI quality: Combining methods based on reference data, history, and spatial relations. ISPRS International Journal of Geo-Information, 6(3), 80.

Year 2016 : 1 citation

 Jonietz, D.; Zipf, A. Defining Fitness-for-Use for Crowdsourced Points of Interest (POI). ISPRS Int. J. Geo-Inf. 2016, 5, 149. doi:10.3390/ijgi5090149

Year 2015 : 1 citation

 DRAFT, S. 2015, Why so many people? Explaining non-habitual transport overcrowding with internet data.

Year 2014 : 4 citations

 Montini, L., Rieser-Schüssler, N., Horni, A., & Axhausen, K. (2014). Trip purpose identification from GPS tracks. Transportation Research Record: Journal of the Transportation Research Board, (2405), 16-23.

 Montini, L., & Rieser, N. (2014). Implementation and pretest of the trip purpose detection.

 Bakillah, M., Liang, S., Mobasheri, A., Jokar Arsanjani, J., & Zipf, A. (2014). Fine-resolution population mapping using OpenStreetMap points-of-interest. International Journal of Geographical Information Science, 28(9).

 Yang, Y., Herrera-Yagüe, C., Eagle, N., & González, M. C. (2014). Limits of Predictability in Commuting Flows in the Absence of Data for Calibration. Scientific Reports, 4, Article number 5662. doi:10.1038/srep05662

Year 2013 : 1 citation

 S Jiang, GA Fiore, Y Yang, J Ferreira Jr…, A review of urban computing for mobile phone traces: current methods, challenges and opportunities, Proceedings of the 2nd …, 2013