Keyphrase counts and their effect on clickthrough rates (CTR)

Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library Recommender Systems

Our paper “Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library Recommender Systems” was accepted for publication at the ACM/IEEE Joint Conference on Digital Libraries (JCDL). Many recommendation algorithms are available to operators of recommender systems in digital libraries. The effectiveness of algorithms in real-world systems is… Read more…

RARD I: The Related-Article Recommender-System Dataset


We are proud to announce the release of ‘RARD’, the related-article recommendation dataset from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains information about 57.4 million recommendations that were displayed to the users of Sowiport. The information includes details on which recommendation approaches were used (e.g. content-based… Read more…
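
To give a rough idea of how such a dataset could be analysed, here is a minimal, purely illustrative Python sketch. It assumes the recommendation logs are available as a CSV file with columns named recommendation_algorithm and clicked; the file name and both column names are our own placeholders and may well differ from the actual RARD release.

    # Illustrative sketch only; the file name and column names are assumptions,
    # not the documented RARD schema.
    import pandas as pd

    # Load only the columns needed; with ~57 million rows this keeps memory usage down.
    df = pd.read_csv("rard_recommendations.csv",
                     usecols=["recommendation_algorithm", "clicked"])

    # Delivered recommendations and click-through rate per recommendation approach.
    stats = (df.groupby("recommendation_algorithm")["clicked"]
               .agg(impressions="count", ctr="mean")
               .sort_values("ctr", ascending=False))
    print(stats)

A per-approach breakdown like this (impressions and click-through rate) is the kind of comparison such recommendation logs are meant to support.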

Paper accepted at ISI conference in Berlin: “Stereotype and Most-Popular Recommendations in the Digital Library Sowiport”

Our paper titled “Stereotype and Most-Popular Recommendations in the Digital Library Sowiport” has been accepted for publication at the 15th International Symposium on Information Science (ISI) in Berlin. Abstract: Stereotype and most-popular recommendations are widely neglected in the research-paper recommender-system and digital-library community. In other domains, such as movie recommendations and hotel… Read more…

New pre-print: “Research Paper Recommender System Evaluation: A Quantitative Literature Survey”

As you might know, Docear has a recommender system for research papers, and we are putting a lot of effort into improving it. In fact, the development of the recommender system is part of my PhD research. When I began working on it some years ago, I became quite frustrated: there were so many different approaches for recommending research papers, but I had no clue which one would be most promising for Docear. I read many papers (far more than 100), and although they presented many interesting ideas, the evaluations… well, most of them were poor. Consequently, I simply did not know which approaches to use in Docear.

Meanwhile, we have reviewed all these papers more carefully and analyzed exactly how the authors conducted their evaluations. More precisely, we examined the papers with respect to the following questions.

  1. To what extent do authors perform user studies, online evaluations, and offline evaluations?
  2. How many participants do user studies have?
  3. Against which baselines are approaches compared?
  4. Do authors provide information about the algorithms’ runtime and computational complexity?
  5. Which metrics are used for algorithm evaluation, and do different metrics provide similar rankings of the algorithms?
  6. Which datasets are used for offline evaluations?
  7. Are results comparable among different evaluations based on different datasets?
  8. How consistent are online and offline evaluations? Do they provide the same, or at least similar, rankings of the evaluated approaches? (See the sketch after this list for one way such ranking consistency could be measured.)
  9. Do authors provide sufficient information to re-implement their algorithms or replicate their experiments?
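
Questions 5 and 8 both ask whether different evaluation methods rank the algorithms similarly. One simple way to quantify this is a rank-correlation coefficient such as Kendall’s tau between the offline scores and the online scores of the same approaches. The sketch below is purely illustrative; the approach names and all numbers are made up for the example and are not results from the survey.

    # Illustrative sketch (not from the survey): quantify how consistently an
    # offline metric and an online metric rank the same recommendation approaches.
    from scipy.stats import kendalltau

    approaches   = ["content-based", "collaborative", "stereotype", "most-popular"]
    offline_ndcg = [0.42, 0.38, 0.21, 0.19]      # hypothetical offline scores
    online_ctr   = [0.011, 0.014, 0.009, 0.010]  # hypothetical online click-through rates

    for name, off, on in zip(approaches, offline_ndcg, online_ctr):
        print(f"{name}: offline nDCG={off}, online CTR={on}")

    # Tau close to 1 means both evaluations rank the approaches similarly;
    # values near 0 (or negative) mean the rankings disagree.
    tau, p_value = kendalltau(offline_ndcg, online_ctr)
    print(f"Kendall's tau = {tau:.2f} (p = {p_value:.2f})")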

(more…)

We need your help (i.e. a server) to build a repository for academic PDF files

A while ago we started crawling the Web for academic PDFs to index them and use them for Docear’s research-paper recommender system. Meanwhile, we have collected quite a few PDFs. Unfortunately, our servers’ disks will be full in the foreseeable future, and the load on our servers is already too high (that’s why you sometimes won’t get recommendations in Docear: our servers are simply too busy).

Since our budget is tight and we don’t want to spend too much time on server administration either, we are asking for your help: do you have a server you could spare? What we need is the following:

(more…)