Scalability Issues for large scale Web Accessibility Evaluation


General Information


Title: Scalability Issues for large scale Web Accessibility Evaluation.
Author(s): Mikael Snaprud, Nils Ulltveit-Moe, Morten Goodwin, Torben Bach Pedersen, Christian Thomsen, Anand B. Pillai, Terje Gjøsæter, Helene Unander.
Published date: June 2006.
Published at: Second Workshop on Web Accessibility and Metamodelling 2006

Abstract


To enable large-scale benchmarking of barriers encountered on web sites, a measurement machinery for a prototype European Internet Accessibility Observatory (EIAO) has been designed.
The first prototype Observatory (release 1.0) demonstrates assessments covering 145 web sites, using a single-threaded crawler. This first release has focused on correct operation rather than on performance. This paper discusses the experience gained so far and outlines possible ways of addressing performance bottlenecks in order to scale up to large-scale evaluation. The final version of the prototype Observatory will publish updated measurements from 10,000 web sites every month.
The main elements of the first release of the Observatory are briefly discussed below. They consist of a crawler, Web Accessibility Metrics (WAM) modules that perform accessibility assessments, and a data warehouse that will be used to support aggregation and presentation of data from the large-scale assessments.


Each step in the production line takes time. The crawler needs to sample simulated use scenarios by performing random walks on the web site. This process involves downloading and parsing web pages, and updating the in-link degree (n) of the URLs that are visited. All downloaded web pages are assessed by a set of Web Accessibility Metrics modules, which return EARL reports that are stored in the RDF repository. Profiling during the test phase has shown several possibilities for improvement and parallelisation at the crawler, database and WAM levels, and this paper discusses the approaches that seem most efficient for scaling up the Observatory to the required size.
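The random-walk sampling described above can be illustrated with a minimal sketch. It is not the EIAO crawler: the link graph is a hypothetical in-memory dictionary standing in for downloaded and parsed pages, and the names `random_walk`, `links` and `in_degree` are assumptions made for this illustration. The restart-on-dead-end rule is likewise an assumption.

```python
import random
from collections import Counter


def random_walk(links, start, steps, seed=0):
    """Walk `steps` links from `start` over a link graph.

    `links` maps a URL to the list of URLs it links to (a stand-in for
    downloading and parsing the page). Returns the visited path and the
    in-link degree observed for each URL.
    """
    rng = random.Random(seed)
    in_degree = Counter()
    parsed = set()
    path = [start]
    current = start
    for _ in range(steps):
        outgoing = links.get(current, [])
        # Count a page's outgoing links only the first time it is parsed,
        # so revisits do not inflate the in-link degree.
        if current not in parsed:
            parsed.add(current)
            in_degree.update(outgoing)
        # Assumed rule: restart from the start page on a dead end.
        current = rng.choice(outgoing) if outgoing else start
        path.append(current)
    return path, in_degree


# Hypothetical three-page site.
site = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/a"]}
path, in_degree = random_walk(site, "/", steps=20)
```

In a real crawler each step would also incur the download and parsing time mentioned above, which is what makes the walk a candidate for parallelisation.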
Several approaches to attacking the bottlenecks will be explored, such as distribution, clustering, threading, asynchronous event handling and possible optimisations of the underlying database structures for RDF handling.
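Of the approaches listed, threading is the simplest to sketch. The example below assumes a hypothetical assessment function, `assess_page`, that counts images without an `alt` attribute, and fans the pages out over a thread pool from the Python standard library. It is an illustration of the idea only, not one of the Observatory's WAM modules, and it returns plain counts rather than EARL reports.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


class MissingAltCounter(HTMLParser):
    """Counts <img> tags that have no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.missing += 1


def assess_page(page):
    """Hypothetical assessment: return (url, number of images without alt)."""
    url, html = page
    counter = MissingAltCounter()
    counter.feed(html)
    counter.close()
    return url, counter.missing


def assess_all(pages, workers=4):
    """Assess (url, html) pairs concurrently and collect results per URL."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(assess_page, pages))


# Hypothetical input: two already-downloaded pages.
pages = [
    ("http://example.org/a", '<img src="x.png"><img src="y.png" alt="y">'),
    ("http://example.org/b", "<p>No images here</p>"),
]
results = assess_all(pages)
```

In CPython, threads mainly help with I/O-bound work such as downloading pages; CPU-bound parsing would more likely benefit from separate processes or from distributing the work across machines.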

The paper also addresses the challenges posed by the large amount of storage needed for the data produced by the Observatory, and other potential limitations such as network bandwidth and the processing required for web pages and EARL reports.

BIBTEX


@inproceedings{wwamscalable2006,
  author = {Mikael Snaprud and Nils Ulltveit-Moe and Morten Goodwin Olsen and Torben Bach Pedersen and Christian Thomsen and Anand B. Pillai and Terje Gjøsæter and Helene Unander},
  title = {Scalability Issues for large scale Web Accessibility Evaluation},
  booktitle = {WWAM},
  year = {2006},
  month = {June}
}

The author of this document is:
Morten Goodwin
E-mail address is:
morten.goodwin __at__ tingtun.no
Phone is:
+47 95 24 86 79
