Today, I got the new Google Pixel 5 in my mailbox. To me, the camera of a mobile phone is of particular importance. Given the excellent feedback in the media on Google's phone cameras, I was excited to try it out myself. I compared the Google Pixel 5 with my Read more…
The growth of Coronavirus cases in Germany is basically identical to the development in Italy, just ~1 week delayed (no machine learning needed to see this)
Usually, I do research in the fields of recommender systems, algorithm selection, and machine learning. But to see how Coronavirus cases develop in Germany, and to predict how they will continue to develop, no machine learning is needed. The chart below shows how Covid-19 cases recently developed in Italy and Read more…
Short version: 1. Opodo insists that a canceled flight had flown as scheduled. 2. Opodo’s Twitter support ignores all requests, the web chat refers me to the hotline, the hotline refers me to sending an email, and the email system refers me to a contact form that does not exist. Read more…
I do not usually write on this blog about my experience with software, but today I am making an exception to spare others the bad experience that I had with Google File Stream, the enterprise version of Google Drive / GDrive. My research group is using Google File Read more…
As you probably know, Docear is free and open source. As you might know as well, we do accept donations. Today, we would like to share some statistics with you about the amount of donations we received. In the past two years, we received 434 US$ (~340€) from 33 donors. That’s not a lot, given that Docear has several thousand active users. However, it’s also no surprise and, to be honest, we ourselves hardly ever donate to other software tools, so we cannot blame anyone for not donating to Docear (even someone who uses it heavily).
The average donation we received was 13.16$ (median was 10$), the highest donation was 50$, the smallest 1$, and the standard deviation 11.04$. The following chart shows the individual and cumulative donations. Sometimes we don’t receive any donations for several months; sometimes we get multiple ones within a week or so.
This weekend, I had some spare time and I wondered which was the most popular reference manager (and how Docear is doing in comparison). So, I took a list of reference managers from Wikipedia, and checked some statistics on Alexa, Google Trends, and Google Keyword Planner. Since I had the data anyway, I thought I would share it with you :-). Please note that this is a quick and dirty analysis. I cannot guarantee that one or two reference managers are not missing (I just took the list from Wikipedia), and, of course, there are many alternatives to Alexa and Google for measuring the popularity of a reference manager.
Update 2013-11-11: For some statistical data read On the popularity of reference managers, and their rise and fall
Update 2014-01-15: For a detailed review of Docear and other tools, read Comprehensive Comparison of Reference Managers: Mendeley vs. Zotero vs. Docear
At the time of writing these lines, there are 31 reference management tools listed on Wikipedia and there are many attempts to identify the best ones, or even the best one (e.g. here, here, here, here, here, here, here, here, … ). Typically, reviewers gather a list of features and analyze which reference managers offer most of these features, and hence are the best ones. Unfortunately, each reviewer has their own preferences about which features are important, and so do you: Are many export formats more important than a mobile version? Is it more important to have metadata extraction for PDF files than an import for bibliographic data from academic search engines? Would a thorough manual be more important than free support? How important is a large number of citation styles? Do you need a Search & Replace function? Do you want to create synonyms for term lists (whatever that means)? …?
Let’s face the truth: it’s impossible to determine which of the hundred potential features you really need.
So how can you find the best reference manager? Recently we had an ironic look at the question of what the best reference managers are. Today we want a more serious analysis, and propose to first identify the bad reference managers instead of looking for the very best ones. Once the bad reference managers are found, it should be easier to identify the best one(s) among the few remaining.
What makes a bad – or evil – reference manager? We believe that there are three no-go ‘features’ that make a reference manager so bad (i.e. so harmful in the long run) that you should not use it, even if it possesses all the other features you might need.
1. A “lock-in feature” that prevents you from ever switching to a competitor tool
A reference manager might offer exactly the features you need, but what about in a few years? Maybe your needs change, other reference managers become better than your current tool, or your boss tells you that you have to use a specific tool. In this case it is crucial that your current reference manager doesn’t lock you in and allows you to switch to your new favorite reference manager. Otherwise, you will have a serious problem. You might have had the perfect reference manager for the past one or two years, but then you are bound to the now not-so-perfect tool for the rest of your academic life. For you to be able to switch to another reference manager, your current one should offer at least one of the following three functions (ideally the first one).
- Your data should be stored in a standard format that other reference managers can read
- Your reference manager should be able to export your data in a standard format
- Your reference manager should allow direct access to your data, so other developers can write import filters for it.
Update 2013-10-14: For a more serious analysis read What makes a bad reference manager?
Update 2013-11-11: For some statistical data read On the popularity of reference managers, and their rise and fall
Update 2014-01-15: For a detailed review, read Comprehensive Comparison of Reference Managers: Mendeley vs. Zotero vs. Docear
<irony>Have you ever wondered what the best reference management software is? Well, today I found the answer on RefWorks’ web site: The best reference manager is RefWorks! Look at the picture below. It might be a little bit confusing, but we did the math: RefWorks is the best and beats EndNote, EndNote Web, Reference Manager, Zotero, and Mendeley in virtually all categories.
As a Docear user you probably did some research before you decided to use Docear, and maybe you stumbled upon the reference manager Mendeley. Mendeley definitely has some nice features and has become one of the top reference management tools in the past few years (besides the fact that they don’t use mind maps for literature management, the main reason I wouldn’t use Mendeley is that they store the annotations you make in PDFs in a proprietary format; this locks you in to Mendeley and makes it really hard, if not impossible, to switch to another tool). Two days ago TechCrunch reported that the well-known publisher Elsevier is interested in buying Mendeley for presumably 100,000,000 US$. That’s right: 100 million US$. Considering that Mendeley is supposed to have 2 million users, that would be 50$ per user (and I don’t know if the 2 million users are really active users). As far as I remember, the shareholders of Facebook paid about 100 dollars per user when Facebook shares first became available on the stock market. Not bad :-).
What do you think? Is Mendeley worth 100 million dollars? Is it a smart move for Elsevier to buy Mendeley? And what are the consequences for Mendeley’s users, since Elsevier is known for a very harsh publishing policy (which led to a boycott of Elsevier and lots of criticism from many academics)?
To optimize Docear’s research paper recommender system I was looking for an extensive stop word list – a list of words that is ignored for the analysis of your mind maps and research papers (for instance ‘the’, ‘and’, ‘or’, …). It’s easy to find some lists for some languages but I couldn’t find one extensive list for several languages. So I created one based on the stop lists from
In case anyone else needs such a stop word list: Here it is, 6513 stop words for English, French, German, Catalan, Czech, Danish, Dutch, Finnish, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish, and Turkish. I believe that some words have an encoding problem. If you discover an error, please let me know and I will correct it. Also, I wouldn’t be surprised to learn that a stop word from one language is an important word in another language. If you discover some words in the list that should not be ignored by our research paper recommender system… please let us know 🙂
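To illustrate how such a list is typically applied, here is a minimal sketch in Python. It is not Docear's actual implementation: the tiny `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical stand-ins, and a real setup would load the full multilingual list from a file.

```python
import re

# Hypothetical excerpt of a stop word list; the real list has thousands of entries.
STOP_WORDS = {"the", "and", "or", "a", "of", "for"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stop_words]

terms = remove_stop_words("The evaluation of recommender systems and search engines")
print(terms)
```

Because a stop word in one language can be meaningful in another, it may be safer to keep one set per language and pick the set that matches the document's language.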
Evaluations in Information Retrieval: Click Through Rate (CTR) vs. Mean Absolute Error (MAE) vs. (Root) Mean Squared Error (MSE / RMSE) vs. Precision
As you may know, Docear offers literature recommendations and, as you may know further, it’s part of my PhD to find out how to make these recommendations as good as possible. To accomplish this I need to know what a ‘good’ recommendation is. So far we have been using Click Through Rates (CTR) to evaluate different recommendation algorithms. CTR is a common performance measure in online advertising. For instance, if a recommendation is shown 1000 times and clicked 12 times, then the CTR is 1.2% (12/1000). That means if an algorithm A has a CTR of 1% and algorithm B has a CTR of 2%, B is better.
Recently, we submitted a paper to a conference. The paper summarized the results of some evaluations we did with different recommendation algorithms. The paper was rejected. Among other things, a reviewer criticized CTR as too simple an evaluation metric. We should rather use metrics that are common in information retrieval, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), or Precision (i.e. Mean Average Precision, MAP).
The funny thing is, CTR, MAE, MSE, RMSE and Precision are basically all the same, at least in a binary classification problem (recommendation relevant / clicked vs. recommendation irrelevant / not clicked). The table shows an example. Assume you show ten recommendations to users (Rec1…Rec10). Then the ‘Estimate’ for each recommendation is ‘1’, i.e. it is expected to be clicked by a user. The ‘Actual’ value describes whether a user actually clicked on a recommendation (‘1’) or not (‘0’). The ‘Error’ is either 0 (if the recommendation actually was clicked) or 1 (if it was not clicked). The mean absolute error (MAE) is simply the sum of all errors (6 in the example) divided by the total number of recommendations (10 in the example). Since we have only zeros and ones, it makes no difference whether they are squared or not. Consequently, the mean squared error (MSE) is identical to MAE. In addition, precision and mean average precision (MAP) are identical to CTR; precision (and CTR) is exactly 1-MAE (or 1-MSE), and RMSE also perfectly correlates with the other values because it’s simply the square root of MSE (or MAE).
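The relationship can be checked with a few lines of Python. This is only a sketch: the `actual` click vector is hypothetical and merely chosen to match the example's totals (ten recommendations, six of them not clicked).

```python
import math

# Hypothetical click data: 1 = clicked, 0 = not clicked (4 clicks, 6 misses).
actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
# Every shown recommendation is estimated to be clicked.
estimate = [1] * len(actual)

n = len(actual)
errors = [abs(e - a) for e, a in zip(estimate, actual)]

ctr = sum(actual) / n                       # clicks / impressions
mae = sum(errors) / n                       # mean absolute error
mse = sum(err ** 2 for err in errors) / n   # squaring 0s and 1s changes nothing
rmse = math.sqrt(mse)
# Precision: recommendations predicted relevant that actually were clicked.
hits = sum(1 for e, a in zip(estimate, actual) if e == 1 and a == 1)
precision = hits / sum(estimate)

print(f"CTR={ctr}, precision={precision}, MAE={mae}, MSE={mse}, RMSE={rmse:.4f}")
```

With these numbers, CTR and precision are both 0.4, MAE and MSE are both 0.6, and RMSE is the square root of 0.6, so ranking algorithms by any one of the metrics gives the same order as ranking by the others.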
This post has nothing to do with Docear, but if you are interested in online marketing, it might be of interest to you. A few days ago, LinkedIn sent me a 50$ voucher for their new “LinkedIn Ads” program. LinkedIn Ads is similar to Google Adwords and allows organizations (such as Docear) to advertise on the profile pages of LinkedIn members (see screenshot).
I was curious how effective LinkedIn Ads would be and started a campaign. In addition, I started a campaign with Google Adwords (see screenshot below), which is Google's advertising program. Both campaigns were rather similar and had similar ads. However, the results differed greatly.
I just wondered which email provider students and scientists prefer. To find out, I wrote a little script that analyzed the domain names of SciPlore MindMapping's newsletter subscribers (there are 1375 of them). And the answer is: Gmail (Google Mail) is the most popular email provider. 42% of all subscribers Read more…
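A script of this kind could look roughly like the following Python sketch. It is not the original script: the `count_domains` helper and the sample addresses are hypothetical, and the real input would be the newsletter's subscriber list.

```python
from collections import Counter

def count_domains(addresses):
    """Count email domains case-insensitively, skipping malformed entries."""
    domains = []
    for address in addresses:
        local, sep, domain = address.strip().rpartition("@")
        if sep and local and domain:
            domains.append(domain.lower())
    return Counter(domains)

# Hypothetical sample data, for illustration only.
subscribers = ["alice@gmail.com", "bob@GMAIL.com", "carol@example.edu", "not-an-address"]

counts = count_domains(subscribers)
total = sum(counts.values())
for domain, n in counts.most_common():
    print(f"{domain}: {n} ({n / total:.0%})")
```

Note that one provider can appear under several domains (for example `gmail.com` and `googlemail.com`), so those would need to be merged before comparing providers.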