Howto: Import references from webpages (e.g. PubMed, IEEE, ACM, …)

Compared to several other reference managers, Docear lacks a feature to directly import references from the Web. For instance, if you visit the detail page of a research article on a publisher’s website, you might wish to import the bibliographic data of that article directly into Docear. Many publishers offer export options for reference managers such as EndNote, RefWorks, or Zotero. So, how do you do it with Docear?

Fortunately, Docear uses the BibTeX format to store references. BibTeX is a de facto standard for references that is supported by almost every publisher and reference manager. So, read on to learn how to import bibliographic data from webpages in two steps!
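In case you have never seen the format: a BibTeX record, as offered by many publisher pages under an “Export citation” or “BibTeX” option, looks roughly like the following (the entry is purely illustrative, not taken from a real page):

```bibtex
@article{smith2013example,
  author  = {Smith, Jane and Doe, John},
  title   = {An Illustrative Article Title},
  journal = {Journal of Example Studies},
  year    = {2013},
  volume  = {42},
  number  = {3},
  pages   = {100--110},
  doi     = {10.1000/xyz123}
}
```

Such a snippet can be copied or downloaded as a .bib file and then added to the BibTeX file that Docear manages.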

(more…)

Docear 1.1.1 Beta with Academic Search Feature

As you may know, Docear features a recommender system for academic literature. To find out which papers you might be interested in, the recommender system parses your mind maps and compares them to our digital library, which currently contains about 1.8 million academic articles. While this is helpful and might point you to papers relevant to your general research goals, you will sometimes need to find information on a specific topic and hence want to search directly.

Based on our knowledge about recommender systems and some user requests, we decided to implement a direct search feature for our digital library. I am very grateful to Keystone, who supported me in visiting Dr. Georgia Kapitsaki at the University of Cyprus (UCY) in Nicosia for a full month to work on this idea. Dr. Kapitsaki has already supported us in our work on Docear’s recommender system in July 2013. Her knowledge about the inner mechanics and her ideas on the search engine were essential for the implementation and the research part of the project.

How to use it

You can access the search feature from Docear’s ribbon bar (“Search and Filter > Documents > Online search”) or by double-clicking the “Online search” entry in Docear’s workspace panel. Since both the recommender system and the personalized search engine make use of your mind maps, you need to enable the recommendation service in Docear.

[Screenshot: accessing the online search from Docear’s ribbon bar and workspace panel]

After opening the search page, you will see

  • a text box for your search query,
  • a “Search” button, and
  • several buttons below the text box reflecting search terms you might be interested in. If Docear does not have enough data to decide about your interests, this part remains empty.

[Screenshot: Docear’s online search interface]

(more…)

Docear’s users donate $434 in two years (i.e. ~4 cents per user)

As you probably know, Docear is free and open source. As you might know as well, we do accept donations. Today, we would like to share some statistics with you about the donations we have received. In the past two years, we received 434 US$ (~340 €) from 33 donors. That’s not a lot, given that Docear has several thousand active users. However, it’s also no surprise, and to be honest, we ourselves hardly ever donate to other software tools, so we cannot blame anyone for not donating to Docear (even heavy users).

The average donation was $13.16 (median: $10), the highest donation was $50, the smallest $1, and the standard deviation was $11.04. The following chart shows the individual and cumulative donations. Sometimes we don’t receive any donations for several months; sometimes we get multiple ones within a week or so.

(more…)

On the popularity of reference managers, and their rise and fall

This weekend, I had some spare time and wondered which reference manager is the most popular (and how Docear is doing in comparison). So, I took a list of reference managers from Wikipedia and checked some statistics on Alexa, Google Trends, and Google Keyword Planner. Since I had the data anyway, I thought I’d share it with you :-). Please note that this is a quick and dirty analysis. I cannot guarantee that one or two reference managers aren’t missing (I just took the list from Wikipedia), and, of course, there are many alternatives to Alexa and Google for measuring the popularity of a reference manager.

(more…)

Photos from the TPDL 2013

The 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013) is almost over. There were many interesting presentations, great weather, and awesome food :-). I took some pictures, which you can also find on Facebook, Google+, or as a single-file download on Dropbox.

New paper: “A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation”

Yesterday, we published a pre-print on the shortcomings of current research-paper recommender system evaluations. One of the findings was that the results of offline and online experiments sometimes contradict each other. We analyzed this issue in more detail and wrote a new paper about it. More specifically, we conducted a comprehensive evaluation of a set of recommendation algorithms using (a) an offline evaluation and (b) an online evaluation. We compared the results of the two evaluation methods to determine whether and when they contradict each other. Subsequently, we discuss the differences between the evaluation methods and their validity, focusing on research paper recommender systems. The goal was to identify which of the evaluation methods is most authoritative, or whether some methods are unsuitable in general. By ‘authoritative’, we mean which evaluation method one should trust when the results of different methods contradict each other.

Bibliographic data: Beel, J., Langer, S., Genzmehr, M., Gipp, B. and Nürnberger, A. 2013. A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation. Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys) (2013), 7–14.

Our current results cast doubt on the meaningfulness of offline evaluations. We showed that offline evaluations often could not predict the results of online experiments (measured by click-through rate, CTR), and we identified two possible reasons.

The first reason for the lacking predictive power of offline evaluations is that they ignore human factors. These factors may strongly influence whether users are satisfied with recommendations, regardless of the recommendations’ relevance. We argue that it probably will never be possible to determine when and how influential human factors are in practice. Thus, it is impossible to determine when offline evaluations have predictive power and when they do not. Assuming that the only purpose of offline evaluations is to predict results in real-world settings, the plausible consequence is to abandon offline evaluations entirely.
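To make the notion of ‘contradicting results’ concrete, here is a minimal sketch of how the rankings produced by an offline and an online evaluation could be compared. The algorithm names and numbers are made up for illustration; this is not the code or data used in the paper.

```python
# Compare how an offline metric (precision) and an online metric (CTR)
# rank the same recommendation algorithms. All values are illustrative.

offline_precision = {"algo_A": 0.32, "algo_B": 0.27, "algo_C": 0.21}     # offline evaluation
online_ctr        = {"algo_A": 0.009, "algo_B": 0.015, "algo_C": 0.012}  # online evaluation

def ranking(scores):
    """Return algorithm names ordered from best to worst score."""
    return [name for name, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]

offline_rank = ranking(offline_precision)
online_rank = ranking(online_ctr)

print("Offline ranking:", offline_rank)   # ['algo_A', 'algo_B', 'algo_C']
print("Online ranking: ", online_rank)    # ['algo_B', 'algo_C', 'algo_A']
print("Rankings agree: ", offline_rank == online_rank)  # False: the two methods contradict each other
```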

(more…)

New pre-print: “Research Paper Recommender System Evaluation: A Quantitative Literature Survey”

As you might know, Docear has a recommender system for research papers, and we are putting a lot of effort into improving it. Actually, the development of the recommender system is part of my PhD research. When I began my work on the recommender system some years ago, I became quite frustrated because there were so many different approaches for recommending research papers, but I had no clue which one would be most promising for Docear. I read many, many papers (far more than 100), and although they presented many interesting ideas, the evaluations… well, most of them were poor. Consequently, I just did not know which approaches to use in Docear.

Meanwhile, we reviewed all these papers more carefully and analyzed how exactly authors conducted their evaluations. More precisely, we analyzed the papers for the following questions.

  1. To what extent do authors perform user studies, online evaluations, and offline evaluations?
  2. How many participants do user studies have?
  3. Against which baselines are approaches compared?
  4. Do authors provide information about the algorithms’ runtime and computational complexity?
  5. Which metrics are used for algorithm evaluation, and do different metrics provide similar rankings of the algorithms?
  6. Which datasets are used for offline evaluations?
  7. Are results comparable among different evaluations based on different datasets?
  8. How consistent are online and offline evaluations? Do they provide the same, or at least similar, rankings of the evaluated approaches?
  9. Do authors provide sufficient information to re-implement their algorithms or replicate their experiments?

(more…)

Three new research papers (for TPDL’13) about user demographics and recommender evaluations, sponsored recommendations, and recommender persistence

After three demo papers were accepted for JCDL 2013, we just received notice that another three posters have been accepted for presentation at TPDL 2013 in Malta in September 2013. They cover some novel aspects of recommender systems relating to re-showing recommendations multiple times, considering user demographics when evaluating recommender systems, and investigating the effect of labelling recommendations. You can read the papers yourself, as we have published them as pre-prints:

Paper 1: The Impact of Users’ Demographics (Age and Gender) and other Characteristics on Evaluating Recommender Systems (Download PDF | Doc)

In this paper we show the importance of considering demographics and other user characteristics when evaluating (research paper) recommender systems. We analyzed 37,572 recommendations delivered to 1,028 users and found that older users clicked on recommendations more often than younger ones. For instance, users aged 20 to 24 had an average click-through rate (CTR) of 2.73%, while the CTR for users aged 50 to 54 was 9.26%. Gender had only a marginal impact (CTR for males: 6.88%; females: 6.67%), but other user characteristics, such as whether a user was registered (CTR: 6.95%) or not (4.97%), had a strong impact. Based on these results, we argue that future research articles on recommender systems should report demographic data to make results more comparable.

(more…)

Docear at JCDL 2013 in Indianapolis (USA), three demo papers, proof-reading wanted

Three of our submissions to the ACM/IEEE Joint Conference on Digital Libraries (JCDL) were accepted. They relate to recommender systems, reference management, and PDF metadata extraction:

Docear4Word: Reference Management for Microsoft Word based on BibTeX and the Citation Style Language (CSL)

In this demo paper we introduce Docear4Word. Docear4Word enables researchers to insert and format their references and bibliographies in Microsoft Word, based on BibTeX and the Citation Style Language (CSL). Docear4Word features over 1,700 citation styles (Harvard, IEEE, ACM, etc.), is published as an open source tool on http://docear.org, and runs with Microsoft Word 2002 and later on Windows XP and later. Docear4Word is similar to the MS Word add-ons that reference managers like EndNote, Zotero, or Citavi offer, with the difference that it is developed to work with the de facto standard BibTeX and hence with almost any reference manager.

(more…)

List of 6513 stop-words for 17 languages (English, German, French, Italian, and many others)

To optimize Docear’s research paper recommender system, I was looking for an extensive stop word list – a list of words that are ignored in the analysis of your mind maps and research papers (for instance ‘the’, ‘and’, ‘or’, …). It’s easy to find some lists for some languages, but I couldn’t find one extensive list covering several languages. So I created one based on the stop word lists from

  • http://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html
  • http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop
  • http://members.unine.ch/jacques.savoy/clef/
  • http://norm.al/2009/04/14/list-of-english-stop-words/
  • http://snowball.tartarus.org/algorithms/english/stop.txt
  • http://solariz.de/649/deutsche-stopwords.htm
  • http://www.lextek.com/manuals/onix/
  • http://www.ranks.nl/resources/stopwords.html
  • http://www.textfixer.com/resources/common-english-words.php
  • http://www.translatum.gr/forum/index.php?topic=2476.0

In case anyone else needs such a stop word list: Here it is, 6513 stop words for English, French, German, Catalan, Czech, Danish, Dutch, Finnish, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish, and Turkish. I believe that some words have an encoding problem. If you discover an error, please let me know and I will correct it. Also, I wouldn’t be surprised to learn that a stop word in one language is an important word in another language. If you discover some words in the list that should not be ignored by our research paper recommender system… please let us know 🙂
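If you want to use the list in your own code, here is a minimal sketch of how such a stop word list is typically applied before text analysis. The file name and the example sentence are placeholders; this is not Docear’s actual implementation.

```python
# Minimal sketch: filter stop words out of a text before further analysis.
# "stopwords.txt" is a placeholder for the downloaded list (one word per line, UTF-8).
import re

def load_stop_words(path):
    """Read a stop word list, one word per line, ignoring empty lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def remove_stop_words(text, stop_words):
    """Tokenize the text naively and drop all stop words."""
    tokens = re.findall(r"\w+", text.lower(), flags=re.UNICODE)
    return [token for token in tokens if token not in stop_words]

stop_words = load_stop_words("stopwords.txt")
print(remove_stop_words("The quick brown fox and the lazy dog", stop_words))
# e.g. ['quick', 'brown', 'fox', 'lazy', 'dog']
```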

(more…)

Evaluations in Information Retrieval: Click Through Rate (CTR) vs. Mean Absolute Error (MAE) vs. (Root) Mean Squared Error (MSE / RMSE) vs. Precision

As you may know, Docear offers literature recommendations, and as you may know further, it’s part of my PhD to find out how to make these recommendations as good as possible. To accomplish this, I need to know what a ‘good’ recommendation is. So far, we have been using click-through rates (CTR) to evaluate different recommendation algorithms. CTR is a common performance measure in online advertisement. For instance, if a recommendation is shown 1000 times and clicked 12 times, then the CTR is 1.2% (12/1000). That means if algorithm A has a CTR of 1% and algorithm B has a CTR of 2%, B is better.

Recently, we submitted a paper to a conference. The paper summarized the results of some evaluations we did with different recommendation algorithms. The paper was rejected. Among other things, a reviewer criticized CTR as too simple an evaluation metric. We should rather use metrics that are common in information retrieval, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), or Precision (i.e. Mean Average Precision, MAP).

The funny thing is, CTR, MAE, MSE, RMSE, and Precision are basically all the same, at least in a binary classification problem (recommendation relevant / clicked vs. recommendation irrelevant / not clicked). The table shows an example. Assume you show ten recommendations to users (Rec1…Rec10). Then the ‘Estimate’ for each recommendation is ‘1’, i.e. it is expected to be clicked by a user. The ‘Actual’ value describes whether a user actually clicked on a recommendation (‘1’) or not (‘0’). The ‘Error’ is 0 if the recommendation was actually clicked and 1 if it was not. The mean absolute error (MAE) is simply the sum of all errors (6 in the example) divided by the total number of recommendations (10 in the example). Since we have only zeros and ones, it makes no difference whether they are squared or not; consequently, the mean squared error (MSE) is identical to MAE. In addition, precision and mean average precision (MAP) are identical to CTR; precision (and CTR) is exactly 1-MAE (or 1-MSE), and RMSE also correlates perfectly with the other values because it is simply the square root of MSE (or MAE).

[Table: example with ten recommendations comparing Click Through Rate (CTR), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Precision]

In a binary evaluation (relevant / not relevant) in information retrieval, Click Through Rate (CTR), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Precision all carry the same information; none is more meaningful than the others.
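A small sketch that reproduces this argument with made-up click data (the numbers are illustrative, not from our evaluations):

```python
# Illustrative only: with binary outcomes, CTR, MAE, MSE, RMSE, and Precision
# carry the same information. 1 = recommendation clicked, 0 = not clicked.
import math

clicks = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]   # ten shown recommendations, four clicked
estimates = [1] * len(clicks)             # every shown recommendation is predicted as relevant

errors = [abs(est - act) for est, act in zip(estimates, clicks)]

ctr = sum(clicks) / len(clicks)                       # 4 / 10 = 0.4
mae = sum(errors) / len(errors)                       # 6 / 10 = 0.6
mse = sum(err ** 2 for err in errors) / len(errors)   # identical to MAE for 0/1 errors
rmse = math.sqrt(mse)                                 # a monotone function of MSE
precision = sum(clicks) / len(estimates)              # clicked / recommended = CTR

print(f"CTR={ctr}, Precision={precision}, MAE={mae}, MSE={mse}, RMSE={rmse:.3f}")
print("CTR equals Precision:", ctr == precision)
print("Precision equals 1 - MAE (up to rounding):", math.isclose(precision, 1 - mae))
```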

(more…)