Adapting to feasible polling rates in an incremental web crawler using learning automata

General Information

Download Adapting to feasible polling rates in an incremental web crawler using learning automata as PDF (216 KB).

Title: Adapting to feasible polling rates in an incremental web crawler using learning automata.
Author(s): Morten Goodwin.
Published date: April 2005.
Published at: Workshop on Web Accessibility and Metamodelling 2005

Abstract


In this poster we propose a schema for detecting as many updates as possible on a fixed number of monitored web pages when the available polling capacity is limited. The model uses connected fixed-structure deterministic learning automata that adapt each page's polling frequency so that as many updates as possible are detected.
The proposed schema can be implemented as part of an incremental crawler monitoring any given number of web pages, when it is feasible to poll and download only content that is out of date compared to the local repository. At the heart of the model, the polling frequency of each monitored web page is controlled by its own learning automaton. Each automaton increases the polling frequency when an update is detected and decreases it when an expected update has not occurred. In this way the automata adapt the polling frequencies toward the change frequencies of their corresponding web pages.
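The per-page behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the poster: the class name PollingAutomaton, the number of states, and the mapping from state to polls per day are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PollingAutomaton:
    """Fixed-structure deterministic automaton for one monitored page.

    The state is an integer level; a higher level means more frequent polling.
    The number of levels and the level-to-frequency mapping are assumptions
    made for this sketch.
    """
    levels: int = 8   # hypothetical number of states
    level: int = 0    # start at the lowest polling frequency

    def polls_per_day(self) -> int:
        # Hypothetical mapping: level 0 -> 1 poll/day, level 1 -> 2, ...
        return self.level + 1

    def feedback(self, update_detected: bool) -> None:
        """Move one state up on a detected update, one state down otherwise."""
        if update_detected:
            self.level = min(self.level + 1, self.levels - 1)
        else:
            self.level = max(self.level - 1, 0)


automaton = PollingAutomaton()
for changed in [True, True, True, False]:
    automaton.feedback(changed)
print(automaton.polls_per_day())  # prints 3
```

Because the transitions are deterministic and the state space is fixed, the automaton needs no per-page statistics beyond its current level.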
By connecting the automata so that they can only decrease the polling frequency when the maximal capacity is exceeded, and only increase it when the capacity is not exceeded, the schema quickly adapts to using exactly the available capacity.
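One possible reading of this capacity coupling is sketched below in Python. The function name coupled_step, the use of polls per period as the unit, and the max_rate bound are assumptions introduced for illustration; the poster does not specify them.

```python
def coupled_step(rates, updates, capacity, max_rate=16):
    """Apply one round of feedback to all polling rates.

    rates    -- current polls per period for each page
    updates  -- True where the last poll of that page found a change
    capacity -- maximum total polls per period (the shared budget)
    max_rate -- hypothetical upper bound on a single page's rate
    """
    over_budget = sum(rates) > capacity
    new_rates = []
    for rate, updated in zip(rates, updates):
        if over_budget and not updated:
            # Capacity exceeded: only decreases are allowed.
            rate = max(rate - 1, 1)
        elif not over_budget and updated:
            # Capacity not exceeded: only increases are allowed.
            rate = min(rate + 1, max_rate)
        new_rates.append(rate)
    return new_rates


rates = coupled_step([1, 1, 1], [True, False, True], capacity=6)
print(rates)  # prints [2, 1, 2]
```

Under this reading the total polling rate rises while the budget has slack and falls once it is exceeded, so it hovers around the available capacity.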
We show, through experiments, that an incremental crawler with learning automata utilizes far more of the available capacity than a traditional batch crawler. Our experiments show that the proposed schema makes up to twice as many polls to modified pages as a batch crawler.

BIBTEX


@inproceedings{wwam2006,
  author = {Morten Goodwin Olsen},
  title = {Adapting to feasible polling rates in an incremental web crawler using learning automata},
  booktitle = {WWAM},
  year = {2005},
  month = {April}
}

The author of this document is:
Morten Goodwin
E-mail address is:
morten.goodwin circle-a uia.no
Phone is:
+47 95 24 86 79