Howto: Import references from webpages (e.g. PubMed, IEEE, ACM, …)

Compared to several other reference managers, Docear lacks a feature to directly import references from the Web. For instance, if you visit the detail page of a research article on a publisher’s website, you might wish to import the bibliographic data of that article directly into Docear. Many publishers offer export options for reference managers such as EndNote, RefWorks, or Zotero. So, how do you do it with Docear?

Fortunately, Docear uses the BibTeX format to store references. BibTeX is a de facto standard for references that is supported by almost every publisher and reference manager. So, read on to learn how to import bibliographic data from webpages in two steps!
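In case you have never seen the format: a BibTeX record, as offered by many publisher pages under an “Export citation” or “BibTeX” option, looks roughly like the following (the entry is purely illustrative, not taken from a real page):

```bibtex
@article{smith2013example,
  author  = {Smith, Jane and Doe, John},
  title   = {An Illustrative Article Title},
  journal = {Journal of Example Studies},
  year    = {2013},
  volume  = {42},
  number  = {3},
  pages   = {100--110},
  doi     = {10.1000/xyz123}
}
```

Such a snippet can be copied or downloaded as a .bib file and then added to the BibTeX file that Docear manages.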

(more…)

Docear 1.1.1 Beta with Academic Search Feature

As you may know, Docear features a recommender system for academic literature. To find out which papers you might be interested in, the recommender system parses your mind maps and compares them to our digital library, which currently contains about 1.8 million academic articles. While this is helpful and might point you to papers relevant to your general research goals, you will sometimes need to find information on a specific topic and hence want to search directly.

Based on our knowledge about recommender systems and some user requests, we decided to implement a direct search feature for our digital library. I am very grateful to Keystone, who supported me in visiting Dr. Georgia Kapitsaki at the University of Cyprus (UCY) in Nicosia for a full month to work on this idea. Dr. Kapitsaki has already supported us in our work on Docear’s recommender system in July 2013. Her knowledge about the inner mechanics and her ideas on the search engine were essential for the implementation and the research part of the project.

How to use it

You can access the search feature from Docear’s ribbon bar (“Search and Filter > Documents > Online search”) or by double-clicking the “Online search” entry in Docear’s workspace panel. Since both the recommender system and the personalized search engine make use of your mind maps, you need to enable the recommendation service in Docear.

[Screenshot: accessing the online search from Docear’s ribbon bar and workspace panel]

After opening the search page, you will see

  • a text box for your search query,
  • a “Search” button, and
  • several buttons below the text box reflecting search terms you might be interested in. If Docear does not have enough data to decide about your interests, this part remains empty.

[Screenshot: Docear’s online search interface]

(more…)

Docear’s users donate $434 in two years (i.e. ~4 cents per user)

As you probably know, Docear is free and open source. As you might know as well, we do accept donations. Today, we would like to share some statistics with you about the donations we have received. In the past two years, we received 434 US$ (~340 €) from 33 donors. That’s not a lot, given that Docear has several thousand active users. However, it’s also no surprise, and to be honest, we ourselves hardly ever donate to other software tools, so we cannot blame anyone for not donating to Docear (even heavy users).

The average donation was $13.16 (median: $10), the highest donation was $50, the smallest $1, and the standard deviation was $11.04. The following chart shows the individual and cumulative donations. Sometimes we don’t receive any donations for several months; sometimes we get multiple ones within a week or so.

(more…)

On the popularity of reference managers, and their rise and fall

This weekend, I had some spare time and wondered which reference manager is the most popular (and how Docear is doing in comparison). So, I took a list of reference managers from Wikipedia and checked some statistics on Alexa, Google Trends, and Google Keyword Planner. Since I had the data anyway, I thought I’d share it with you :-). Please note that this is a quick and dirty analysis. I cannot guarantee that one or two reference managers aren’t missing (I just took the list from Wikipedia), and, of course, there are many alternatives to Alexa and Google for measuring the popularity of a reference manager.

(more…)

Photos from the TPDL 2013

The 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013) is almost over. There were many interesting presentations, great weather, and awesome food :-). I took some pictures, which you can also find on Facebook, Google+, or as a single-file download on Dropbox.

New paper: “A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation”

Yesterday, we published a pre-print on the shortcomings of current research-paper recommender system evaluations. One of the findings was that the results of offline and online experiments sometimes contradict each other. We analyzed this issue in more detail and wrote a new paper about it. More specifically, we conducted a comprehensive evaluation of a set of recommendation algorithms using (a) an offline evaluation and (b) an online evaluation. We compared the results of the two evaluation methods to determine whether and when they contradict each other. Subsequently, we discuss the differences between the evaluation methods and their validity, focusing on research paper recommender systems. The goal was to identify which of the evaluation methods is most authoritative, or whether some methods are unsuitable in general. By ‘authoritative’, we mean which evaluation method one should trust when the results of different methods contradict each other.

Bibliographic data: Beel, J., Langer, S., Genzmehr, M., Gipp, B. and Nürnberger, A. 2013. A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation. Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys) (2013), 7–14.

Our current results cast doubt on the meaningfulness of offline evaluations. We showed that offline evaluations often could not predict the results of online experiments (measured by click-through rate, CTR), and we identified two possible reasons.

The first reason for the lacking predictive power of offline evaluations is that they ignore human factors. These factors may strongly influence whether users are satisfied with recommendations, regardless of the recommendations’ relevance. We argue that it probably will never be possible to determine when and how influential human factors are in practice. Thus, it is impossible to determine when offline evaluations have predictive power and when they do not. Assuming that the only purpose of offline evaluations is to predict results in real-world settings, the plausible consequence is to abandon offline evaluations entirely.
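To make the notion of ‘contradicting results’ concrete, here is a minimal sketch of how the rankings produced by an offline and an online evaluation could be compared. The algorithm names and numbers are made up for illustration; this is not the code or data used in the paper.

```python
# Compare how an offline metric (precision) and an online metric (CTR)
# rank the same recommendation algorithms. All values are illustrative.

offline_precision = {"algo_A": 0.32, "algo_B": 0.27, "algo_C": 0.21}     # offline evaluation
online_ctr        = {"algo_A": 0.009, "algo_B": 0.015, "algo_C": 0.012}  # online evaluation

def ranking(scores):
    """Return algorithm names ordered from best to worst score."""
    return [name for name, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]

offline_rank = ranking(offline_precision)
online_rank = ranking(online_ctr)

print("Offline ranking:", offline_rank)   # ['algo_A', 'algo_B', 'algo_C']
print("Online ranking: ", online_rank)    # ['algo_B', 'algo_C', 'algo_A']
print("Rankings agree: ", offline_rank == online_rank)  # False: the two methods contradict each other
```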

(more…)

New pre-print: “Research Paper Recommender System Evaluation: A Quantitative Literature Survey”

As you might know, Docear has a recommender system for research papers, and we are putting a lot of effort into improving it. Actually, the development of the recommender system is part of my PhD research. When I began my work on the recommender system some years ago, I became quite frustrated because there were so many different approaches for recommending research papers, but I had no clue which one would be most promising for Docear. I read many, many papers (far more than 100), and although they presented many interesting ideas, the evaluations… well, most of them were poor. Consequently, I just did not know which approaches to use in Docear.

Meanwhile, we reviewed all these papers more carefully and analyzed how exactly authors conducted their evaluations. More precisely, we analyzed the papers for the following questions.

  1. To what extent do authors perform user studies, online evaluations, and offline evaluations?
  2. How many participants do user studies have?
  3. Against which baselines are approaches compared?
  4. Do authors provide information about the algorithms’ runtime and computational complexity?
  5. Which metrics are used for algorithm evaluation, and do different metrics provide similar rankings of the algorithms?
  6. Which datasets are used for offline evaluations?
  7. Are results comparable among different evaluations based on different datasets?
  8. How consistent are online and offline evaluations? Do they provide the same, or at least similar, rankings of the evaluated approaches?
  9. Do authors provide sufficient information to re-implement their algorithms or replicate their experiments?

(more…)

Three new research papers (for TPDL’13) about user demographics and recommender evaluations, sponsored recommendations, and recommender persistence

After three demo papers were accepted for JCDL 2013, we just received notice that another three posters have been accepted for presentation at TPDL 2013 in Malta in September 2013. They cover some novel aspects of recommender systems relating to re-showing recommendations multiple times, considering user demographics when evaluating recommender systems, and investigating the effect of labelling recommendations. You can read the papers yourself, as we have published them as pre-prints:

Paper 1: The Impact of Users’ Demographics (Age and Gender) and other Characteristics on Evaluating Recommender Systems (Download PDF | Doc)

In this paper we show the importance of considering demographics and other user characteristics when evaluating (research paper) recommender systems. We analyzed 37,572 recommendations delivered to 1,028 users and found that older users clicked on recommendations more often than younger ones. For instance, users aged 20 to 24 had an average click-through rate (CTR) of 2.73%, while the CTR for users aged 50 to 54 was 9.26%. Gender had only a marginal impact (CTR for males: 6.88%; females: 6.67%), but other user characteristics, such as whether a user was registered (CTR: 6.95%) or not (4.97%), had a strong impact. Based on these results, we argue that future research articles on recommender systems should report demographic data to make results more comparable.

(more…)

Docear at JCDL 2013 in Indianapolis (USA), three demo papers, proof-reading wanted

Three of our submissions to the ACM/IEEE Joint Conference on Digital Libraries (JCDL) were accepted. They relate to recommender systems, reference management, and PDF metadata extraction:

Docear4Word: Reference Management for Microsoft Word based on BibTeX and the Citation Style Language (CSL)

In this demo paper we introduce Docear4Word. Docear4Word enables researchers to insert and format their references and bibliographies in Microsoft Word, based on BibTeX and the Citation Style Language (CSL). Docear4Word features over 1,700 citation styles (Harvard, IEEE, ACM, etc.), is published as an open source tool on http://docear.org, and runs with Microsoft Word 2002 and later on Windows XP and later. Docear4Word is similar to the MS Word add-ons that reference managers like EndNote, Zotero, or Citavi offer, with the difference that it is developed to work with the de facto standard BibTeX and hence with almost any reference manager.

(more…)

List of 6513 stop-words for 17 languages (English, German, French, Italian, and many others)

To optimize Docear’s research paper recommender system, I was looking for an extensive stop word list – a list of words that are ignored in the analysis of your mind maps and research papers (for instance ‘the’, ‘and’, ‘or’, …). It’s easy to find some lists for some languages, but I couldn’t find one extensive list covering several languages. So I created one based on the stop word lists from

  • http://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html
  • http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop
  • http://members.unine.ch/jacques.savoy/clef/
  • http://norm.al/2009/04/14/list-of-english-stop-words/
  • http://snowball.tartarus.org/algorithms/english/stop.txt
  • http://solariz.de/649/deutsche-stopwords.htm
  • http://www.lextek.com/manuals/onix/
  • http://www.ranks.nl/resources/stopwords.html
  • http://www.textfixer.com/resources/common-english-words.php
  • http://www.translatum.gr/forum/index.php?topic=2476.0

In case anyone else needs such a stop word list: Here it is, 6513 stop words for English, French, German, Catalan, Czech, Danish, Dutch, Finnish, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish, and Turkish. I believe that some words have an encoding problem. If you discover an error, please let me know and I will correct it. Also, I wouldn’t be surprised to learn that a stop word in one language is an important word in another language. If you discover some words in the list that should not be ignored by our research paper recommender system… please let us know 🙂
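If you want to use the list in your own code, here is a minimal sketch of how such a stop word list is typically applied before text analysis. The file name and the example sentence are placeholders; this is not Docear’s actual implementation.

```python
# Minimal sketch: filter stop words out of a text before further analysis.
# "stopwords.txt" is a placeholder for the downloaded list (one word per line, UTF-8).
import re

def load_stop_words(path):
    """Read a stop word list, one word per line, ignoring empty lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def remove_stop_words(text, stop_words):
    """Tokenize the text naively and drop all stop words."""
    tokens = re.findall(r"\w+", text.lower(), flags=re.UNICODE)
    return [token for token in tokens if token not in stop_words]

stop_words = load_stop_words("stopwords.txt")
print(remove_stop_words("The quick brown fox and the lazy dog", stop_words))
# e.g. ['quick', 'brown', 'fox', 'lazy', 'dog']
```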

(more…)

Evaluations in Information Retrieval: Click Through Rate (CTR) vs. Mean Absolute Error (MAE) vs. (Root) Mean Squared Error (MSE / RMSE) vs. Precision

As you may know, Docear offers literature recommendations, and as you may know further, it’s part of my PhD to find out how to make these recommendations as good as possible. To accomplish this, I need to know what a ‘good’ recommendation is. So far, we have been using click-through rates (CTR) to evaluate different recommendation algorithms. CTR is a common performance measure in online advertisement. For instance, if a recommendation is shown 1000 times and clicked 12 times, then the CTR is 1.2% (12/1000). That means if algorithm A has a CTR of 1% and algorithm B has a CTR of 2%, B is better.

Recently, we submitted a paper to a conference. The paper summarized the results of some evaluations we did with different recommendation algorithms. The paper was rejected. Among other things, a reviewer criticized CTR as too simple an evaluation metric. We should rather use metrics that are common in information retrieval, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), or Precision (i.e. Mean Average Precision, MAP).

The funny thing is, CTR, MAE, MSE, RMSE, and Precision are basically all the same, at least in a binary classification problem (recommendation relevant / clicked vs. recommendation irrelevant / not clicked). The table shows an example. Assume you show ten recommendations to users (Rec1…Rec10). Then the ‘Estimate’ for each recommendation is ‘1’, i.e. it is expected to be clicked by a user. The ‘Actual’ value describes whether a user actually clicked on a recommendation (‘1’) or not (‘0’). The ‘Error’ is 0 if the recommendation was actually clicked and 1 if it was not. The mean absolute error (MAE) is simply the sum of all errors (6 in the example) divided by the total number of recommendations (10 in the example). Since we have only zeros and ones, it makes no difference whether they are squared or not; consequently, the mean squared error (MSE) is identical to MAE. In addition, precision and mean average precision (MAP) are identical to CTR; precision (and CTR) is exactly 1-MAE (or 1-MSE), and RMSE also correlates perfectly with the other values because it is simply the square root of MSE (or MAE).

[Table: example with ten recommendations comparing Click Through Rate (CTR), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Precision]

In a binary evaluation (relevant / not relevant) in information retrieval, Click Through Rate (CTR), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Precision all carry the same information; none is more meaningful than the others.
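A small sketch that reproduces this argument with made-up click data (the numbers are illustrative, not from our evaluations):

```python
# Illustrative only: with binary outcomes, CTR, MAE, MSE, RMSE, and Precision
# carry the same information. 1 = recommendation clicked, 0 = not clicked.
import math

clicks = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]   # ten shown recommendations, four clicked
estimates = [1] * len(clicks)             # every shown recommendation is predicted as relevant

errors = [abs(est - act) for est, act in zip(estimates, clicks)]

ctr = sum(clicks) / len(clicks)                       # 4 / 10 = 0.4
mae = sum(errors) / len(errors)                       # 6 / 10 = 0.6
mse = sum(err ** 2 for err in errors) / len(errors)   # identical to MAE for 0/1 errors
rmse = math.sqrt(mse)                                 # a monotone function of MSE
precision = sum(clicks) / len(estimates)              # clicked / recommended = CTR

print(f"CTR={ctr}, Precision={precision}, MAE={mae}, MSE={mse}, RMSE={rmse:.3f}")
print("CTR equals Precision:", ctr == precision)
print("Precision equals 1 - MAE (up to rounding):", math.isclose(precision, 1 - mae))
```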

(more…)