Toward Characterizing HTML Defects on the Web



HTML is widely used as an interface for delivering services to users. Web developers produce and change sites at a high pace while trying to support the latest HTML standards. In this context, it is common to find websites that do not comply with the standards and fail to be correctly processed by browsers. Given this dynamic environment and the increasingly large diversity of frequently updated browsers, defects in web pages are common, sometimes severe, and hard to track. In this short communication, we describe the initial design of an approach that will be used to obtain information on the characteristics of HTML documents on the Web and to extract indicators of representative errors made by their developers. Preliminary results show that nearly 90% of the pages analyzed have at least one type of error and that a small number of error types are prevalent.


HTML, HTML Defects, Web, HTML Validation, Standards Compliance


Software: Practice and Experience, October 2017


