Towards Automated eGovernment Monitoring

Paper F - Benchmarking and improving the quality of Norwegian municipality web sites.

Author: Morten Goodwin


This chapter was originally published at the 5th International Workshop on Automated Specification and Verification of Web Systems 2009. Please see the original source for the complete paper.1

Original paper authors: Morten Goodwin Olsen, Annika Nietzio, Mikael Snaprud, and Frank Fardal

Morten Goodwin Olsen and Mikael Snaprud are with Tingtun AS, Kirkekleiva 1, 4790 Lillesand, Norway, email:

Annika Nietzio is with Forschungsinstitut Technologie und Behinderung (FTB) der Evangelischen Stiftung Volmarstein, Grundschötteler Str. 40 58300 Wetter (Ruhr), Germany.

Frank Fardal is with Agency for Public Management and eGovernment (DIFI) P.O. Box 8115 Dep, N-0032 Oslo, Norway


Automatic benchmarking can provide a reliable first insight into the accessibility status of a web site. The eGovMon project has developed a tool which can assess web sites according to a statistically sound sampling procedure. Additionally, the tool supports detailed evaluation of single web pages. This paper describes the process of data acquisition for the case of large scale accessibility benchmarking of Norwegian public web sites. An important contribution is the elaborated approach to communicating the results to the public web site owners, which can help them to improve the quality of their web sites. An on-line interface enables them to perform evaluations of single web pages and receive immediate feedback. The close collaboration with the municipalities has led to an overall higher quality of the Norwegian public web sites, of the eGovMon tool, and of the underlying methodology.

Automated evaluation alone cannot capture the whole picture, and should rather be seen as a complement to manual web accessibility evaluations. The Norwegian Agency for Public Management and eGovernment (DIFI) carries out an annual web quality survey that includes manual assessment of web accessibility. We present a comparison between the results of this survey and the data collected by the eGovMon tool and verify the statistical correlation.



The amount of information and the number of public services available on-line have been growing steadily over the past few years. Many administrative processes can be carried out via the Internet today.

However, a recent study [1] shows that there is still a large number of citizens who are not using eGovernment services. On the one hand the reasons are that the citizens feel no need to use eGovernment because they are not frequent users of the Internet in general or because they prefer to use other channels when interacting with public administration. On the other hand there are citizens who would like to use eGovernment applications but are prevented from doing so by poor accessibility and usability or because relevant content is missing or difficult to locate.

The latter group can benefit from improved quality and more user oriented design. In the long run this can help to increase take-up and facilitate the use also by the former group of citizens.

The Norwegian project eGovMon 2 is pursuing a two-fold strategy to advance accessibility and other attributes of web site quality.

Figure 1: eGovMon System Architecture


Large scale accessibility benchmarking.

The first part of the strategy consists of large scale accessibility evaluation. Frequently updated data on the accessibility status of Norwegian public web sites provides a bird's eye view of the situation and progress. Equal conditions under the evaluation ensure comparability of the results from different web sites and across time. This process is visualised by the solid arrows in Figure 1.

On-line accessibility checker.

The second part of the strategy addresses the web site maintainers directly. Detailed results for each web page are provided together with explanations and improvement suggestions. The pilot municipalities participating in the project have expressed a strong demand for practical support. Often the web site maintainers are not aware of accessibility problems in their web sites or do not know how to resolve them. Therefore the eGovMon project tries to provide feedback and improvement suggestions that are easy to understand, and that can facilitate the communication with technical staff, software vendors and web developers.

For this user group the eGovMon project has developed an easy to use on-line tool that can check single web pages. The checker provides detailed information on the identified accessibility barriers and suggests potential solutions. This application is shown by the dashed arrows in Figure 1.

Collaboration with other initiatives.

In Norway, the Agency for Public Management and eGovernment (DIFI) carries out an annual systematic evaluation of the quality of Norwegian public web sites [2]. The criteria of this assessment address accessibility, usability and relevance of the provided content and services. The assessments are carried out manually by trained experts.

Automatic evaluations can support experts in their work, provide intermediary results between the annual evaluations, and support policy making and awareness raising. The eGovMon project is currently developing such automatic tools in collaboration with a group of 20 Norwegian municipalities, DIFI, and several additional government agencies and research partners from Norway and across Europe.

The remainder of this paper is organised as follows: Section Automated evaluation of web accessibility in eGovMon presents the eGovMon tool for automatic evaluation of accessibility and explains the methodology used by the evaluations. Section Results and comparison compares the manually retrieved results from the DIFI survey with the automatically retrieved results from eGovMon. Finally, Section Communication of results presents the eGovMon approach to communicating the results to Norwegian municipalities.

Automated evaluation of web accessibility in eGovMon


The large scale benchmarking approach applied in eGovMon is based on the Unified Web Evaluation Methodology (UWEM) version 1.2 [3], which was developed by the European Web Accessibility Benchmarking Cluster (WAB Cluster). The eGovMon system is an implementation of the "fully automated monitoring" application scenario described in UWEM.


Web accessibility checking can be carried out in several ways along the same international standards. The evaluation methodologies used by evaluation and certification organisations in several European countries differ in subtle but meaningful ways [4], even though they are usually based on the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) [5]. UWEM offers test descriptions to evaluate WCAG 1.0 conformance covering level AA, a clear sampling scheme, and several reporting options, including score cards and other instruments to help communicate the results of evaluations. UWEM was developed as the basis for web accessibility evaluation, policy support and possible certification in Europe [6].

Sampling of web pages


The eGovMon system does not evaluate all web pages within a web site. Instead, it selects a random uniform sample from each web site. A random sample can only be drawn if the underlying population is known. Therefore each web site is explored by a web crawler trying to identify as many URLs as possible before the actual evaluation starts. The crawler follows a multithreaded breadth first search strategy and stores all discovered URLs in a URL database. The number of downloaded pages is constrained by the available download capacity. The URL discovery phase stops when 6000 web pages have been found.3

In our experiments 85% of the web sites were crawled exhaustively. The remaining 15% of the sites are often considerably larger (sometimes consisting of up to several million single web pages).

In the next phase, 600 pages are randomly selected from the URL database, allowing the accessibility evaluation to be both representative of the web site as well as workable in practice.

Choosing sample size.

There is a trade-off between system performance and accuracy of the results. Clearly, a large sample size would provide more precise results. The most accurate result could be achieved by evaluating every web page from the site. However, this is impossible in practice.

The sample size of 600 has been selected experimentally. Based on a number of test runs, the average standard deviation of the UWEM score within a web site could be estimated to $\sigma=0.25$. Taking into account the potential values of the precision parameter, $d_1 = 0.05$ and $d_2 = 0.02$, and the desired confidence levels of 95% (i.e. $z_1=1.96$) or 99% (i.e. $z_2=2.58$), we calculate the sample size as:

$$n = \frac{z^2\sigma^2}{d^2}$$

In our setup, the evaluation speed is approximately 1.42 seconds per page, or 0.7 pages per second. The number of web sites which can be evaluated daily is therefore

$$N = \frac{1}{n} \cdot 0.7\frac{\textrm{pages}}{\textrm{sec.}} \cdot 86400\frac{\textrm{sec.}}{\textrm{day}}$$

This gives us the sample sizes presented in Table 1. A sample size of 600 pages per site allows the evaluation of approximately 100 web sites daily -- an acceptable trade-off between precision and performance.

Table 1: Sample Size Calculations
             $z_1=1.96$ (95%)               $z_2=2.58$ (99%)
$d_1=0.05$   $n\approx96$, $N\approx630$    $n\approx166$, $N\approx364$
$d_2=0.02$   $n\approx600$, $N\approx100$   $n\approx1040$, $N\approx58$

Web accessibility testing

The system contains 23 web accessibility tests, which are all derived from the "fully automatable" tests in UWEM. The implementation is built on the Relaxed framework [7], which uses Java to parse the HTML source file into an HTML tree representation. Subsequently, this tree is assessed with a number of Schematron rules, which were developed specifically for UWEM.

Some restrictions apply when creating automatic measurements. Most significantly, many of the UWEM tests require human judgment. In fact only 26 of the 141 tests in UWEM are marked as automatable. As an example, automatic testing can find images without alternative text. However, to claim that an existing alternative text represents the corresponding image well, human judgment is needed. Thus, automatic evaluation can only be used to find barriers, not to claim that a web site is accessible. Note that the automatic evaluation results can to some degree be used to predict manual evaluation results [8].


The Schematron language 4 is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. The context of a rule is described using XPath,5 which provides a unique identification of all elements within the HTML tree. The test part of a Schematron rule consists of an XSLT statement that returns a boolean value. The rules are used to extract information about the different features of the HTML tree, e.g. presence or absence of attributes and elements or the relationship of parent, child, and sibling elements.

Schematron is only an intermediary step that provides direct access to the HTML structure. Other tests are then conducted based on the extracted data. This includes for instance operations such as matching strings with regular expressions or comparing to features that are not part of the HTML tree structure (e.g. information from the HTTP header or general information like the URL of the inspected page).

The Schematron approach provides an accurate and flexible implementation of the UWEM tests. However, problems can arise if the HTML source has severe parsing errors which make it impossible to construct the HTML tree in the first place. In these cases no accessibility evaluation can be carried out.

Initially we applied HTML Tidy6 to clean up malformed HTML [9]. However, preprocessing the HTML with HTML Tidy led to several issues. First of all, even though Tidy was configured to make minimal changes, it was not transparent to the users how it changed the HTML. Users could claim that the HTML evaluated was not part of the web page but something adjusted by HTML Tidy. Additionally, using HTML Tidy made it difficult to identify the location of a barrier within the HTML source. Finally, even with the use of HTML Tidy, several web pages still could not be parsed. Because of these issues, the use of HTML Tidy was discarded in the final version of the implementation.7


Each UWEM test is applied on each selected page. There are two possible outcomes: fail (barrier detected) and pass (no barrier detected). An example of a fail result is an image without an alternative description. This is a barrier because people who are unable to see images 8 rely on the alternative text to understand the image. When such alternative text is not present, the information conveyed in the image is lost to these users. All results are reported in the Evaluation and Report Language (EARL) [10].
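As an illustration of such a pass/fail test, the missing-alternative-text check can be sketched with Python's standard HTML parser (a simplified stand-in for the Schematron-based implementation, not the actual eGovMon code):

```python
from html.parser import HTMLParser

class ImgAltChecker(HTMLParser):
    """Flag <img> elements that carry no alt attribute at all."""
    def __init__(self):
        super().__init__()
        self.barriers = []  # (line, column) of each failing <img>

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.barriers.append(self.getpos())

checker = ImgAltChecker()
checker.feed('<p><img src="a.png" alt="Logo"><img src="b.png"></p>')
print(len(checker.barriers))  # 1: only the second image fails the test
```

Note that, as discussed above, this only detects the absence of alternative text; judging whether an existing text describes the image well still requires a human.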

Storing results.

For each web site, the EARL output is incorporated into an RDF graph representing the web site. In addition to the accessibility results in the EARL reports, the RDF graph contains information about the downloaded web pages, such as HTTP headers, language, and technologies used. The data is stored in an RDF database which was developed specifically according to the project needs, since the existing RDF database technologies were not able to provide sufficient speed for our application.

An Extract Transform Load (ETL) component reads the RDF data and stores the results in a data warehouse. The data warehouse [11] supports analysis with regard to multiple research questions. The outcome of the single tests is summarised into a web site accessibility score for the whole site. The score is calculated as the ratio of failed tests to applied tests. The larger this ratio (percentage) of barriers detected, the less accessible the web site is. In a completely accessible web site there will not be any barriers and the percentage of detected barriers will be 0%. If half the tests detected barriers, the percentage of detected barriers would be 50%, and so on. In addition to these high level results, eGovMon also presents detailed results on page level.
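Under the scoring rule described above, the site score reduces to a simple ratio (a sketch; the handling of sites with zero applicable tests is our assumption):

```python
def uwem_site_score(failed: int, applied: int) -> float:
    """Fraction of barriers: failed tests divided by applied tests."""
    if applied == 0:
        return 0.0  # assumption: no applicable tests means no detected barriers
    return failed / applied

print(uwem_site_score(0, 230))    # 0.0 -> no barriers detected
print(uwem_site_score(115, 230))  # 0.5 -> half of the applied tests failed
```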


The Norwegian Agency for Public Management and eGovernment (DIFI) conducts a yearly survey on the quality of Norwegian public web sites [2], commonly referred to by the name of the web site where the results are published. The survey covers 34 indicators organised into three categories: accessibility, usability and relevance.

Web Accessibility testing.

Twelve of the indicators address accessibility. Out of these, seven are directly related to WCAG 1.0 priority 1 and 2, and three are related to WCAG 1.0 priority 3. The remaining two are not directly related to WCAG 1.0, but target other document formats such as PDF.

The evaluation is run in September and October each year and includes approximately 700 web sites at governmental and municipal level. The evaluations are carried out manually9 by trained experts. On average, the review of one web site takes about one hour.

Sampling and score.

Most tests are applied to two or three pages from the site. Sometimes the whole site is searched for certain features (e.g. data tables, documents in other formats). A failing test scores zero points. The maximum number of points for a test ranges from two up to five. The overall rating reports the percentage of the maximum number of points that has been achieved. These percentage values are then mapped to stars. The threshold values for one to five stars are based on a Gaussian distribution, with six stars as an "extra level" for exceptionally good web sites. The threshold values are presented in Table 2.

Table 2: Star rating
Stars       1 star   2 stars   3 stars   4 stars   5 stars   6 stars
Percentage  0-30%    31-42%    43-57%    58-69%    70-79%    80-100%
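The mapping from percentage to stars in Table 2 can be expressed as a small lookup (a sketch assuming integer percentages, as in the table):

```python
def stars(percentage: int) -> int:
    """Map achieved percentage of points to the Table 2 star rating."""
    for lower_bound, rating in [(80, 6), (70, 5), (58, 4), (43, 3), (31, 2)]:
        if percentage >= lower_bound:
            return rating
    return 1  # 0-30% gives one star

print([stars(p) for p in (30, 31, 57, 58, 79, 80)])  # [1, 2, 3, 4, 5, 6]
```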

Results and comparison


Using the eGovMon tool, we evaluated the accessibility of web sites from 414 of the 430 Norwegian municipalities in January 2009. The remaining municipalities either had no web site or their web site was not available during the evaluation10.

The eGovMon tool is based on WCAG 1.0 level AA and UWEM. A web site not conforming to WCAG will most likely contain barriers preventing some users with disabilities from using the web site. It is worth noticing that the people benefiting from accessible web sites are diverse, and what may be a barrier for one user may not be a barrier for others - even within the same disability group. Because of this, detecting and eliminating false positives would be very challenging. There has been some work trying to find false positive results in automatic accessibility measurements [12]. However, to the best of our knowledge, no such work has been carried out for UWEM.

Accessibility of Norwegian Municipality Web Sites

The eGovMon results for single web sites indicate the percentage of barriers, while the results on county level are averages of the municipality web site results within the county. The accessibility results on county level are presented as a map of Norway in Figure 2. The results in this evaluation range from 10% to 37% of barriers detected by the tests. A darker colour means more barriers detected, while a lighter colour means fewer barriers detected. The county with the fewest detected barriers is the Norwegian capital Oslo. Even here, 10% of the eGovMon tests detected barriers. This shows that the public Norwegian web sites are far from being accessible. Additionally, our findings show that some barriers are more common than others.


Figure 2: Map of Norway showing the eGovMon accessibility results from January 2009. A darker colour means more accessibility barriers found.


  1. Invalid or deprecated (X)HTML and/or CSS was the most common barrier found by the eGovMon tool and occurred in 99% of the evaluated web pages. (X)HTML and CSS are the most used technologies for web pages. The most recent versions of these technologies are built with accessibility in mind, which means assistive technologies can more easily and successfully present the web page content when the latest (X)HTML and/or CSS are used correctly.
  2. Links with the same title but different targets occurred in 31% of the evaluated pages. Often links do not describe the target pages well. A typical example is having links with the text "read more", which does not explain anything about the target page. Links should be more descriptive, such as "read more about the economic crisis" or simply "the economic crisis". For fast and efficient navigation, some assistive technologies present all links within a web page to the user. However, if all links have the same text, such as "read more", this is not helpful.
  3. Graphical elements without textual alternative were detected in 24% of the evaluated pages. The most common example of this is the use of images without alternative text, which causes problems for people with visual impairments who are unable to see the pictures. Any information conveyed in an image is lost to these users whenever a textual alternative is missing.
  4. Form elements without labels occurred in 24% of the evaluated pages. An example of misuse would be not to correctly mark a search button as "search". The fact that the web site is searchable is sometimes conveyed by the context around the search field, such as a magnifying glass nearby. People with visual impairments and dyslexia sometimes have the web page text read out loud using screen readers, and may be unable to see the corresponding magnifying glass. If a text field is not clearly marked, it is challenging to know that it is intended for searching the web site.
  5. Mouse required occurred in 11% of the evaluated pages. Web sites requiring the use of a mouse cause problems for people with motor impairments, who often have challenges using such devices. An example is web sites with menu items which can only be accessed by clicking with a mouse but not by keyboard. Often, people with motor impairments are not able to use such web sites at all.
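The second barrier type in the list above (identical link texts pointing at different targets) can likewise be sketched with the standard library parser (an illustrative stand-in, not the eGovMon implementation):

```python
from collections import defaultdict
from html.parser import HTMLParser

class LinkTextChecker(HTMLParser):
    """Collect link texts and flag texts reused for different href targets."""
    def __init__(self):
        super().__init__()
        self.targets = defaultdict(set)  # link text -> set of href values
        self._in_link = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.targets["".join(self._text).strip()].add(self._href)
            self._in_link = False

    def ambiguous_links(self):
        return [text for text, hrefs in self.targets.items() if len(hrefs) > 1]

c = LinkTextChecker()
c.feed('<a href="/news/1">read more</a>'
       '<a href="/news/2">read more</a>'
       '<a href="/crisis">the economic crisis</a>')
print(c.ambiguous_links())  # ['read more']
```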

Results from eGovMon compared to International Surveys

None of the evaluated sites passed all eGovMon tests. This result does not agree with the web accessibility surveys existing in the literature.

The study "Assessment of the Status of eAccessibility in Europe (MeAC)" [13], which has received much attention since it was published, shows that 12.5% of the evaluated web sites passed all automatic accessibility tests, and 5.3% passed all manual accessibility tests. Additionally, the United Nations Global Audit on Web Accessibility [14] indicates that 3% of the evaluated web sites pass all accessibility tests. It should be noted that the eGovMon survey only includes results from Norwegian web sites, whereas the surveys from MeAC and the United Nations evaluated web sites from the EU member states and the United Nations member countries respectively. The eGovMon evaluation shows that none of the evaluated sites pass, which is clearly worse than the web accessibility survey results presented above. This discrepancy may be explained by the fact that eGovMon evaluates according to WCAG 1.0 level AA, while both the MeAC and United Nations surveys only included results from WCAG 1.0 level A. Additionally, eGovMon evaluated up to 600 pages from each site while MeAC included only 25. A more detailed comparison can be found in [15].

Results from eGovMon compared to National Survey

We compared the results of eGovMon to the DIFI survey. This comparison includes only the accessibility part of the survey. The survey results have been discretised into six levels using the predefined threshold values shown in Table 2. The more stars a web site has received, the more accessible it is.

The UWEM score is not defined to match any distribution. Instead it presents the percentage of barriers detected within the applied tests. A low value means that few barriers were detected which indicates that the corresponding web site is accessible, while a high value indicates that the corresponding web site is inaccessible.

The two methodologies have few similarities. The survey evaluations are conducted manually by trained experts, sometimes supported by tools, while the current eGovMon runs completely automatically. Furthermore, there exists only one test which is identical in both surveys (valid (X)HTML), and only two tests which are partially overlapping.

Furthermore, the survey evaluations were carried out in September/October 2008, while the eGovMon evaluations were carried out in the beginning of January 2009. It is expected that several of the web sites have been updated in this period, which will cause some inconsistency in the data.

Figure 3: Comparison between accessibility results from the DIFI survey and eGovMon


Figure 3 presents the correlation between the expert evaluation results from the DIFI survey and the automatically retrieved results from eGovMon. Despite the methodological differences, the figure shows that there exists a correlation between these two evaluations. This indicates a solid dependency between the results and suggests that both methodologies are measuring accessibility. Web sites which perform well in one survey are very likely to get a similar result in the other (and vice versa).

We can clearly see that the average eGovMon web site score is better (fewer barriers detected) the more stars the survey has awarded to the web site. This is true for all groups of stars except for the web sites which received six stars and have been categorised as exceptionally good. These web sites receive a slightly worse score from eGovMon than the web sites which received five stars. This indicates that identification of good accessibility (six stars) cannot be done by automatic evaluation alone, but needs to be supported by manual assessment.

In addition, Figure 3 shows that of the 414 evaluated web sites there are twelve outliers. In ten of these sites, the eGovMon tool detected a large amount of deprecated elements and attributes, which in eGovMon and UWEM has a significant impact on the web site results. In contrast, deprecated elements or attributes are not part of the survey evaluations. The remaining two outliers received a very good score from eGovMon while they only got one star in the survey. The reason for this discrepancy is not known, but the two web sites could have been updated between the two evaluations.

Communication of results


The results of benchmarking studies are often eagerly awaited. The municipalities are interested in using them to compare and improve their web sites. To enable more targeted use of the results, more detailed information is needed.

To supplement the large scale web accessibility results, the eAccessibility Checker has been developed.11

The users themselves can use this online tool to evaluate and detect barriers on a single web page by entering a URL. Figure 4 shows an example of results presented by the eAccessibility Checker, which include additional information about each detected barrier.

This allows web developers to get immediate feedback on their implementation, including how the barriers can be fixed. In the future, the tool could be integrated more tightly in a web development cycle as suggested in [16].

The tool can also be used by web site editors and owners. However, this approach poses a challenge for editors who are not very familiar with (X)HTML and CSS. It is not always easy to understand which problems can be fixed by the web editors and which barriers are located in the templates of the content management systems and therefore need to be fixed by the web developers.

Most existing content management systems (CMS) require the editors to have expert knowledge on accessibility to produce accessible web content, while only few facilitate accessibility [17].

Coming back to the example of alternative text for images, existing CMS handle this quite differently. On the one hand, there are systems where it is not at all possible to enter alternative texts for images. On the other hand, some systems force the editors to add alternative text whenever images are uploaded. In a third group of CMS, editors can choose whether to add alternative texts to images or not. The editors need to be aware of the web accessibility features of the CMS.

How the barrier can be removed depends on the CMS that is used. There is no universal solution for the problem. We plan to set up a wiki where developers and experts of different content management systems can submit descriptions. The information on how to fix the barriers will be linked to the results presented by the tool. A similar approach is implemented in the Dutch "Web Guidelines Quality Model" [18].

Figure 4: The eAccessibility Checker presents a list of results. Each result is linked to further details.


Conclusion and future work

The eGovMon tool has a two-fold strategy for presenting accessibility results. It provides both survey results from large scale evaluations and an interface for detecting barriers in single web pages. This strategy makes it possible both to provide data on a high level -- e.g. how accessible is my county compared to others -- and to find individual barriers on the evaluated pages.

DIFI provides a yearly benchmarking survey on the quality of public Norwegian web sites, including accessibility. In contrast to eGovMon, these measurements are done manually by trained experts.

Even though the two methodologies both aim at measuring accessibility, there are many differences. Indeed, there are in total only three overlapping tests. Despite this, we have shown that there is a correlation between the results produced by the two different methodologies. Web sites which receive a good or bad result in one of the surveys are very likely to get a similar result in the other.


1. [The paper has been published in the proceedings of the 5th International Workshop on Automated Specification and Verification of Web Systems 2009: 115-128.]

2. [The eGovMon project is co-funded by the Research Council of Norway under the VERDIKT program. Project no.: Verdikt183392/S10]

3. [For performance reasons eGovMon discovers URLs rather than downloading web pages. Discovering means detecting and finding any URL within the web site, and only downloading enough pages to detect 6000 URLs. For any web site with more than 6000 pages, there is a significant performance improvement gained by only detecting URLs compared to downloading.]

4. []

5. []

6. []

7. [Web pages which could not be parsed are deliberately removed from the evaluation since we do not have any reliable results for these. Note that there is no direct indication that web pages we cannot parse are inaccessible.]

8. [People who are unable to see images include for example people with visual impairments, people using mobile phones with images turned off to reduce data traffic, or people using web browsers without graphical user interface.]

9. [Some tests rely on tool support, e.g. tests for sufficient colour contrast and validity of (X)HTML.]

10. [Some municipality web sites deliberately prevent access by tools which are not known, with the help of the robots exclusion standard.]

11. [The checker is available at .]


[1] European Commission, DG Information Society and Media. Study on user satisfaction and impact in EU27. Draft final report.

[2] Norwegian Agency for Public Management and eGovernment (Difi). Quality of public web sites. Retrieved September 2009.

[3] Web Accessibility Benchmarking Cluster, 2007. Retrieved November 4th, 2009.

[4] Mikael Snaprud and Agata Sawicka. Large Scale Web Accessibility Evaluation - A European Perspective. HCI (7), pages 150-159, 2007.

[5] World Wide Web Consortium. Web Content Accessibility Guidelines 1.0. W3C Recommendation, 5 May 1999. Retrieved November 4th, 2009.

[6] Annika Nietzio, Christophe Strobbe and Eric Velleman. The Unified Web Evaluation Methodology (UWEM) 1.2 for WCAG 1.0. ICCHP, pages 394-401, 2008.

[7] Petr Nalevka. Relaxed -- on the way towards true validation of compound documents. Proceedings of the 15th International Conference on World Wide Web.

[8] Is It Possible to Predict the Manual Web Accessibility Result Using the Automatic Result? Universal Access in Human-Computer Interaction. Applications and Services, pages 645-653, 2009.

[9] Architecture for large-scale automatic web accessibility evaluation based on the UWEM methodology. Norwegian Conference for Informatics (NIK), November. ISBN 978-8-251-923-866.

[10] World Wide Web Consortium. Retrieved November 4th, 2009.

[11] Christian Thomsen. Building a Web Warehouse for Accessibility Data. 2006. Retrieved 2008.

[12] Giorgio Brajnik and Raffaella Lomuscio. SAMBA: a semi-automatic method for measuring barriers of accessibility. ASSETS, pages 43-50, 2007.

[13] Kevin Cullen, Lutz Kubitschke and Ingo Meyer. Retrieved March 19th, 2010.

[14] Nomensa. Retrieved 2007.

[15] Christian Bühler. Monitoring Accessibility of Governmental Web Sites in Europe. International Conference on Computers Helping People with Special Needs (ICCHP), pages 410-417, 2008.

[16] Giorgio Brajnik. Comparing accessibility evaluation tools: a method for tool effectiveness. Universal Access in the Information Society, 3(3):252-263, 2004. Berlin, Heidelberg. ISSN 1615-5289.

[17] Dietmar Nedbal and Gerald Petz. A Software Solution for Accessible E-Government Portals. ICCHP '08: Proceedings of the 11th International Conference on Computers Helping People with Special Needs, pages 338-345, 2008. Linz, Austria. ISBN 978-3-540-70539-0.

[18] Overheid heeft Antwoord. The Web Guidelines quality model. Retrieved 1st of February 2009.

The author of this document is:
Morten Goodwin
E-mail address is:
morten.goodwin __at__
Phone is:
+47 95 24 86 79