Three new research papers (for TPDL’13) about user demographics and recommender evaluations, sponsored recommendations, and recommender persistence

After three demo-papers were accepted for JCDL 2013, we just received notice that another three posters were accepted for presentation at TPDL 2013 in Malta in September 2013. They cover some novel aspects of recommender systems: showing the same recommendations multiple times, considering user demographics when evaluating recommender systems, and investigating the effect of labelling recommendations. You can read the papers yourself, as we publish them as pre-prints:

Paper 1: The Impact of Users’ Demographics (Age and Gender) and other Characteristics on Evaluating Recommender Systems (Download PDF | Doc)

In this paper we show the importance of considering demographics and other user characteristics when evaluating (research paper) recommender systems. We analyzed 37,572 recommendations delivered to 1,028 users and found that older users clicked on recommendations more often than younger ones. For instance, users aged 20 to 24 achieved click-through rates (CTR) of 2.73% on average, while the CTR for users aged 50 to 54 was 9.26%. Gender had only a marginal impact (CTR males 6.88%; females 6.67%), but other user characteristics, such as whether a user was registered (CTR: 6.95%) or not (4.97%), had a strong impact. Based on these results, we argue that future research articles on recommender systems should report demographic data to make results more comparable.


Evaluations in Information Retrieval: Click Through Rate (CTR) vs. Mean Absolute Error (MAE) vs. (Root) Mean Squared Error (MSE / RMSE) vs. Precision

As you may know, Docear offers literature recommendations, and as you may also know, it’s part of my PhD to find out how to make these recommendations as good as possible. To accomplish this I need to know what a ‘good’ recommendation is. So far we have been using Click Through Rates (CTR) to evaluate different recommendation algorithms. CTR is a common performance measure in online advertising. For instance, if a recommendation is shown 1000 times and clicked 12 times, then the CTR is 1.2% (12/1000). That means if an algorithm A has a CTR of 1% and algorithm B has a CTR of 2%, B is better.
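The CTR arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `click_through_rate` is made up for this example and is not part of Docear.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return the fraction of shown recommendations that were clicked."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# The example from the text: shown 1000 times, clicked 12 times.
ctr = click_through_rate(clicks=12, impressions=1000)
print(f"CTR: {ctr:.1%}")
```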

Recently, we submitted a paper to a conference. The paper summarized the results of some evaluations we did with different recommendation algorithms. The paper was rejected. Among other things, a reviewer criticized CTR as too simple an evaluation metric. We should rather use metrics that are common in information retrieval such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), or Precision (i.e. Mean Average Precision, MAP).

The funny thing is, CTR, MAE, MSE, RMSE and Precision are basically all the same, at least in a binary classification problem (recommendation relevant / clicked vs. recommendation irrelevant / not clicked). The table shows an example. Assume you show ten recommendations to users (Rec1…Rec10). The ‘Estimate’ for each recommendation is then ‘1’, i.e. we predict that it will be clicked by a user. The ‘Actual’ value describes whether a user actually clicked on a recommendation (‘1’) or not (‘0’). The ‘Error’ is either 0 (if the recommendation actually was clicked) or 1 (if it was not clicked). The mean absolute error (MAE) is simply the sum of all errors (6 in the example) divided by the total number of recommendations (10 in the example). Since we have only zeros and ones, it makes no difference whether they are squared or not. Consequently, the mean squared error (MSE) is identical to MAE. In addition, precision and mean average precision (MAP) are identical to CTR; precision (and CTR) is exactly 1-MAE (or 1-MSE); and RMSE ranks algorithms in the same order as the other values because it’s simply the square root of MSE (or MAE).

Click Through Rate (CTR) vs. Mean Absolute Error (MAE) vs Mean Squared Error (MSE) vs Root Mean Squared Error (RMSE) vs Precision

In a binary evaluation (relevant / not relevant) in information retrieval, Click Through Rate (CTR), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Precision all carry the same information.
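The equivalence can be checked numerically with a minimal Python sketch. The click pattern below is an assumption made for illustration: the example only fixes that 6 of the 10 recommendations were not clicked, not which ones. Precision is computed here as plain precision over the shown recommendations, not as MAP.

```python
import math

# Hypothetical 'Actual' values for Rec1..Rec10 (1 = clicked, 0 = not clicked).
# Which recommendations were clicked is an assumption; only the count
# (4 clicked, 6 not clicked) comes from the example.
actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

# Every shown recommendation is estimated to be relevant, i.e. clicked.
estimate = [1] * len(actual)

n = len(actual)
errors = [abs(e - a) for e, a in zip(estimate, actual)]

ctr = sum(actual) / n                        # clicked / shown
mae = sum(errors) / n                        # mean absolute error
mse = sum(err ** 2 for err in errors) / n    # squaring 0/1 changes nothing
rmse = math.sqrt(mse)                        # monotonic in MSE
true_positives = sum(1 for e, a in zip(estimate, actual) if e == 1 and a == 1)
precision = true_positives / sum(estimate)   # relevant among recommended

print(f"CTR={ctr:.2f} MAE={mae:.2f} MSE={mse:.2f} "
      f"RMSE={rmse:.3f} Precision={precision:.2f}")
```

Because all errors are 0 or 1, MAE and MSE coincide, and CTR and precision both equal 1 - MAE. RMSE takes a different value but, as the square root of MSE, it never changes the ranking of two algorithms.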
