A solution to the exact match on rare item searches. Introduction of the Lost Sheep algorithm

General Information

Read the publication: A solution to the exact match on rare item searches. Introduction of the Lost Sheep algorithm.

Title: A solution to the exact match on rare item searches. Introduction of the Lost Sheep algorithm.
Author(s): Morten Goodwin.
Published date: May 2011.
Published at: International Conference on Web Intelligence, Mining and Semantics (WIMS) 2011

Abstract


This paper proposes an approach for finding a single web page in a large web site or a cloud of web pages. We formalize this problem and map it to the exact match on rare item searches (EMRIS). The EMRIS has received little attention in the literature, but many closely related problems exist. This paper presents a state-of-the-art survey of related problems in the fields of information retrieval, web page classification and directed search.

As a solution to the EMRIS, this paper presents an innovative algorithm called the lost sheep. The lost sheep is specifically designed to work in web sites consisting of links, link texts and web pages. It works as a pre-classifier on link texts to decide whether a web page is a candidate for further evaluation.
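
The idea of pre-classifying link texts before downloading a page can be illustrated with a minimal sketch. This is not the lost sheep algorithm as published; it is a simplified stand-in that assumes a keyword-overlap score for link texts, a fixed threshold, and an in-memory site in place of real HTTP downloads. The names `Page`, `link_score`, `find_page`, `SITE` and all example URLs are hypothetical.

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class Page:
    """A simulated web page: its text and its outgoing (link_text, url) pairs."""
    text: str
    links: list = field(default_factory=list)


def link_score(link_text, keywords):
    """Assumed pre-classifier: fraction of query keywords found in the link text."""
    tokens = set(link_text.lower().split())
    return len(tokens & keywords) / len(keywords)


def find_page(site, start, keywords, is_target, threshold=0.5):
    """Best-first search that only downloads pages whose link text looks promising.

    Returns (url of the target page or None, number of pages downloaded).
    """
    downloaded = 0
    seen = {start}
    queue = [(0.0, start)]  # entries are (negated score, url); best score pops first
    while queue:
        _, url = heapq.heappop(queue)
        page = site[url]  # stands in for downloading the page
        downloaded += 1
        if is_target(page):  # full evaluation happens only on downloaded pages
            return url, downloaded
        for link_text, target in page.links:
            if target in seen or target not in site:
                continue
            score = link_score(link_text, keywords)
            if score >= threshold:  # the link text decides candidacy
                seen.add(target)
                heapq.heappush(queue, (-score, target))
    return None, downloaded


# Hypothetical four-page site used only for illustration.
SITE = {
    "/": Page("Welcome", [("About us", "/about"),
                          ("Contact information", "/contact"),
                          ("News", "/news")]),
    "/about": Page("About the municipality", [("Contact information", "/contact")]),
    "/news": Page("Latest news", []),
    "/contact": Page("Phone and e-mail of the mayor", []),
}

result = find_page(SITE, "/", {"contact"}, lambda p: "e-mail" in p.text)
```

In this toy site, `result` is `("/contact", 2)`: the start page and the contact page are downloaded, while the pages behind "About us" and "News" are never fetched because their link texts score below the threshold. The trade-off is that a target page reachable only through uninformative link texts would be missed.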

This paper also defines sound metrics to evaluate the EMRIS. The lost sheep outperforms all comparable algorithms, both in maximizing accuracy and in minimizing the number of downloaded pages.

The author of this document is:
Morten Goodwin
E-mail address is:
morten.goodwin ASCII 64 uia.no
Phone is:
+47 95 24 86 79