Using Natural Language Processing to Detect Privacy Violations in Online Contracts



As information systems handle contracts and documents in essential services, organizations lack mechanisms to help them protect the data subjects involved. In this paper, we evaluate the use of named entity recognition (NER) as a way to identify, monitor, and validate personally identifiable information (PII). In our experiments, we use three of the most well-known Natural Language Processing tools (NLTK, Stanford CoreNLP, and spaCy). First, the effectiveness of the tools is evaluated on a generic dataset. Then, the tools are applied to datasets built from contracts that contain personally identifiable information. The results show that the models performed well at accurately classifying both the generic data and the contract data. Furthermore, we discuss how our proposal can effectively act as a Privacy Enhancing Technology.
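The abstract does not state how the tools' effectiveness is measured. A common choice for NER is exact-match, entity-level precision, recall, and F1. The sketch below assumes that metric. The `Entity` type and the `entity_prf` function are hypothetical names, not part of NLTK, Stanford CoreNLP, or spaCy. Predicted spans could come from any of the three tools. In spaCy, for example, each span in `doc.ents` exposes `start_char`, `end_char`, and `label_`.

```python
from typing import Iterable, NamedTuple, Tuple


class Entity(NamedTuple):
    """A labelled text span (hypothetical helper type for this sketch)."""
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    label: str  # entity type, e.g. "PERSON"


def entity_prf(gold: Iterable[Entity],
               predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1.

    A prediction counts as correct only if its start, end and label
    all match a gold entity.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative contract sentence, not taken from the paper's datasets.
    text = "John Smith, residing at Lisbon, signs this contract."
    gold = [Entity(0, 10, "PERSON"), Entity(24, 30, "GPE")]
    # The tool finds the name but mislabels the city as an organization.
    pred = [Entity(0, 10, "PERSON"), Entity(24, 30, "ORG")]
    print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Exact matching is strict: a span that is off by one character or carries the wrong type counts as both a false positive and a false negative. Partial-overlap scoring is a more lenient alternative when the boundaries of PII such as addresses are ambiguous.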


Privacy Violations, Online Contracts, Natural Language Processing, Named Entity Recognition, Personally Identifiable Information



Related Project

PoSeID-on, EU H2020 IA – Protection and control of Secured Information by means of a privacy enhanced Dashboard


The 35th ACM/SIGAPP Symposium on Applied Computing (SAC ’20), March 2020

