Applied Math & Computer Science Lab
Data Analysis, Optimization & Mathematical Modeling, Artificial Intelligence, Neural Net For Everyday Life Applications

Website User Modeling with Perl

1. Mathematical description of website users
Website user modeling is useful for serving dynamic content to users, predicting sales, and in many other situations. This article shows how web user modeling can be implemented with Perl and vector space theory.
Let's start with a simple example. Suppose we have a website that shows content on different programming languages. Users can then be described by the programming languages they are interested in. Let's say the website provides content on the following programming languages:

Perl, VB, Java, C++, C#, ASP.NET

Mathematically, we can describe a user by a vector such as
(0,1,0,1,0,1)

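where each 0 means that the user is not interested in the corresponding programming language and each 1 means that the user is interested in it. For convenience we write the vector as a row rather than as a column, as it formally should be. The vector above therefore describes a user interested in VB, C++ and ASP.NET.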

Thus the user (1,0,0,0,0,0) is interested only in Perl,
while the user (1,1,0,0,0,1) is interested in Perl, VB and ASP.NET,
and the user (0,0,0,1,1,0) in C++ and C#.
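As a minimal sketch, assuming the language order listed above is fixed, the mapping between a user's interests and the 0/1 vector can be written in Perl. The helper names interests_to_vector and vector_to_interests are hypothetical and are not part of the article's code.

```perl
use strict;
use warnings;

# Fixed order of topics: position $i in a user vector refers to $LANGUAGES[$i].
my @LANGUAGES = ('Perl', 'VB', 'Java', 'C++', 'C#', 'ASP.NET');

# Hypothetical helper: turn a list of language names into a 0/1 interest vector.
sub interests_to_vector {
    my %wanted;
    $wanted{$_} = 1 for @_;
    return map { $wanted{$_} ? 1 : 0 } @LANGUAGES;
}

# Hypothetical helper: the inverse mapping, from a 0/1 vector back to names.
sub vector_to_interests {
    my @v = @_;
    return map { $LANGUAGES[$_] } grep { $v[$_] } 0 .. $#LANGUAGES;
}

my @user = interests_to_vector('Perl', 'VB', 'ASP.NET');
print '(', join(',', @user), ")\n";                        # prints (1,1,0,0,0,1)
print join(', ', vector_to_interests(0, 0, 0, 1, 1, 0)), "\n";   # prints C++, C#
```

Keeping the language list in a single array guarantees that every vector uses the same component order, which any comparison of two user vectors relies on.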

2. Similarity measures
Users with similar interests will have similar vectors. The cosine of the angle between two vectors v1 and v2 is very often used as a similarity measure in search engines (SE), information retrieval (IR) and related fields:

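$$\cos(v_1, v_2) = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert}$$

where $v_1 \cdot v_2 = v_{11} v_{21} + v_{12} v_{22} + \dots + v_{1n} v_{2n}$ is the dot product ($v_{1i}$ is the i-th component of $v_1$) and $\lVert v \rVert = \sqrt{v \cdot v}$ is the Euclidean length of $v$. For 0/1 vectors the cosine ranges from 0 (no shared interests) to 1 (identical interests); it is undefined when either vector is all zeros.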
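The formula translates directly into Perl. This sketch assumes vectors are passed as array references of equal length; the names dot, norm and cosine are hypothetical and are not the functions from vs.pm. Returning undef for an all-zero vector is a design choice made here, since the angle (and therefore the cosine) is not defined in that case.

```perl
use strict;
use warnings;

# Hypothetical helper names; not the functions from vs.pm.

# Dot product: v11*v21 + v12*v22 + ... + v1n*v2n
sub dot {
    my ($v1, $v2) = @_;            # array references of equal length
    my $s = 0;
    $s += $v1->[$_] * $v2->[$_] for 0 .. $#$v1;
    return $s;
}

# Euclidean length ||v|| = sqrt(v . v)
sub norm {
    my ($v) = @_;
    return sqrt(dot($v, $v));
}

# cos(v1,v2) = (v1 . v2) / (||v1|| * ||v2||); undef if either vector is all zeros.
sub cosine {
    my ($v1, $v2) = @_;
    my $den = norm($v1) * norm($v2);
    return $den == 0 ? undef : dot($v1, $v2) / $den;
}

my @u1 = (1, 1, 0, 0, 0, 1);   # Perl, VB, ASP.NET
my @u2 = (1, 0, 0, 0, 0, 1);   # Perl, ASP.NET
my @u3 = (0, 0, 0, 1, 1, 0);   # C++, C#
printf "%.3f\n", cosine(\@u1, \@u2);   # 2 / (sqrt(3) * sqrt(2)), prints 0.816
printf "%.3f\n", cosine(\@u1, \@u3);   # no shared interests, prints 0.000
```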


Another way to measure similarity is the distance (difference) between two vectors:

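$$\mathrm{diff}(v_1, v_2) = |v_{11} - v_{21}| + |v_{12} - v_{22}| + \dots + |v_{1n} - v_{2n}|$$

where $v_{1i}$ is the i-th component of $v_1$. For 0/1 vectors this counts the languages on which two users disagree, so 0 means identical interests and a smaller value means more similar users.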

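This formula has the advantage that it involves no division, so there is no need to worry about division by 0 (the cosine is undefined for a user whose vector is all zeros, i.e. a user interested in none of the languages). There are also other similarity measures.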
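A minimal Perl sketch of the distance, again assuming equal-length array references (the name diff is hypothetical). It takes the absolute value of each component difference so that differences of opposite sign cannot cancel out. Unlike the cosine, this is a distance: 0 means identical vectors and larger values mean less similar users.

```perl
use strict;
use warnings;

# Hypothetical helper name.
# diff(v1,v2) = |v11-v21| + |v12-v22| + ... + |v1n-v2n|
# 0 means identical interests; larger values mean less similar users.
sub diff {
    my ($v1, $v2) = @_;          # array references of equal length
    my $d = 0;
    $d += abs($v1->[$_] - $v2->[$_]) for 0 .. $#$v1;
    return $d;
}

my @u1 = (1, 1, 0, 0, 0, 1);   # Perl, VB, ASP.NET
my @u2 = (1, 0, 0, 0, 0, 1);   # Perl, ASP.NET
my @u3 = (0, 0, 0, 0, 0, 0);   # no interests at all

printf "%d\n", diff(\@u1, \@u2);   # prints 1
printf "%d\n", diff(\@u1, \@u3);   # prints 3; the all-zero vector needs no special case
```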

3. Perl source code for vector-space modeling
The Perl functions needed to calculate the cosine are provided below.
The module vs.pm has two functions. The function get_cos calculates the cosine of two vectors.
The second function calculates the cosines between a query vector and every user vector and returns a two-dimensional array in which one column holds the cosine and the other holds the index of the user vector.
The array is sorted by cosine in descending order. The script vs3.cgi shows how to use these functions.
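The source of vs.pm is not reproduced here, so the following is only a sketch of what the two functions described above might look like, not the author's code. get_cos takes its name from the text; rank_by_cos is a hypothetical name for the second function, and skipping all-zero user vectors (whose cosine is undefined) is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical sketch; not the original vs.pm.

# Cosine of two vectors given as array references; undef if either is all zeros.
sub get_cos {
    my ($v1, $v2) = @_;
    my ($dot, $n1, $n2) = (0, 0, 0);
    for my $i (0 .. $#$v1) {
        $dot += $v1->[$i] * $v2->[$i];
        $n1  += $v1->[$i] ** 2;
        $n2  += $v2->[$i] ** 2;
    }
    return undef if $n1 == 0 || $n2 == 0;
    return $dot / (sqrt($n1) * sqrt($n2));
}

# Hypothetical name for the second function: compare a query vector with every
# user vector and return [cosine, index] rows sorted by cosine, highest first.
# Users with an undefined cosine (all-zero vectors) are skipped.
sub rank_by_cos {
    my ($query, @users) = @_;
    my @rows;
    for my $i (0 .. $#users) {
        my $c = get_cos($query, $users[$i]);
        push @rows, [$c, $i] if defined $c;
    }
    return sort { $b->[0] <=> $a->[0] } @rows;
}

my @users = (
    [1, 0, 0, 0, 0, 0],   # 0: Perl
    [1, 1, 0, 0, 0, 1],   # 1: Perl, VB, ASP.NET
    [0, 0, 0, 1, 1, 0],   # 2: C++, C#
);
my @query = (1, 1, 0, 0, 0, 0);   # a visitor interested in Perl and VB

for my $row (rank_by_cos(\@query, @users)) {
    printf "user %d: cos = %.3f\n", $row->[1], $row->[0];
}
```

For this query the loop prints user 1 (cos = 0.816), then user 0 (0.707), then user 2 (0.000): user 1 shares both Perl and VB with the visitor and ranks first.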