Web Content Mining


General Information

Download Web Content Mining as PDF (427 KB).

Title: Web Content Mining
Author(s): Sigbjørn Tvedt, Christian Kroken
Published date: December 2006
Published at: Web-Mining and Data Analysis 2006

Abstract


The goal of this project is to create a crawler and classifier that downloads the images on a web page and classifies the content of each image into categories such as mathematical formulas, logos, and buttons. The focus is on automatically detecting image usage that reduces the accessibility of a web page.
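The abstract does not describe how the images are collected or classified. As a rough illustration only, the following Python sketch covers the crawling step (extracting every `<img>` element from a page and resolving its URL) and replaces the classifier with a naive, hypothetical keyword match on the image path. The names `audit_images`, `ImageCollector`, `guess_category`, and `CATEGORY_KEYWORDS` are illustrative assumptions, not part of the project, and a real classifier would analyse the image content itself. The accessibility check, which flags `<img>` elements that have no `alt` attribute at all, is likewise an assumed example of image usage that reduces accessibility.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical keyword table: a stand-in for a real image classifier,
# which would inspect the pixels rather than the URL.
CATEGORY_KEYWORDS = {
    "formula": ("formula", "math", "latex", "equation"),
    "logo": ("logo",),
    "button": ("button", "btn"),
}


class ImageCollector(HTMLParser):
    """Collects (absolute_url, has_alt) pairs for every <img> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src:
            # Resolve relative paths against the page URL and record whether
            # an alt attribute is present at all (alt="" still counts).
            self.images.append((urljoin(self.base_url, src), "alt" in attributes))


def guess_category(url):
    """Naive substring match on the URL path; returns 'unknown' if nothing matches."""
    path = urlsplit(url).path.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in path for keyword in keywords):
            return category
    return "unknown"


def audit_images(html, base_url):
    """Returns one record per image: absolute URL, guessed category, alt flag."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return [
        {"url": url, "category": guess_category(url), "missing_alt": not has_alt}
        for url, has_alt in collector.images
    ]
```

For example, `audit_images('<img src="btn_next.gif">', "http://example.org/docs/")` yields a single record whose category is `button` and whose `missing_alt` flag is set, because the image has no `alt` attribute. Fetching the page itself (for instance with `urllib.request.urlopen`) is left out so the sketch runs offline.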

The author of this page is:
Morten Goodwin
E-mail:
morten.goodwin __at__ tingtun.no
Phone:
+47 95 24 86 79
