Web Content Mining

General Information

Title: Web Content Mining
Authors: Sigbjørn Tvedt, Christian Kroken
Published date: December 2006
Published at: Web-Mining and Data Analysis 2006

Abstract


The goal of this project is to create a crawler/classifier that downloads the images in a web page and classifies the content of each image into categories such as mathematical formulas, logos, and buttons. The focus is on automatically detecting image usage that reduces the accessibility of a web page.
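
As a rough illustration of the first step described in the abstract, the sketch below collects the images referenced by a page and notes which ones lack an alternative text. It is not the authors' implementation. The names ImageCollector, collect_images and guess_category are hypothetical, and the file-name heuristic is only a stand-in for a real classifier that would analyse the image content.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URL and alt text of every <img> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []  # list of (absolute_url, alt_text_or_None)

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... /> tags, because the default
        # handle_startendtag() delegates to handle_starttag().
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src:
            url = urljoin(self.base_url, src)
            self.images.append((url, attributes.get("alt")))


def collect_images(html, base_url):
    """Return (url, alt) pairs for all images found in the HTML string."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.images


def guess_category(url):
    """Hypothetical placeholder: guess a category from the file name only.

    A real classifier would download the image and inspect its pixels.
    """
    file_name = url.rsplit("/", 1)[-1].lower()
    for category in ("logo", "button", "formula"):
        if category in file_name:
            return category
    return "unknown"


if __name__ == "__main__":
    page = '<p><img src="/img/logo.png" alt="Company logo"><img src="eq1.gif"/></p>'
    for url, alt in collect_images(page, "http://example.org/docs/page.html"):
        status = "has alt text" if alt else "missing alt text"
        print(url, guess_category(url), status)
```

Fetching the page and the image files is left out so that the sketch runs without network access. In a crawler, the HTML string would come from an HTTP response and each collected URL would be downloaded before classification.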

Author of this document: Morten Goodwin
E-mail: morten.goodwin@tingtun.no
Phone: +47 95 24 86 79
