Our reference management software Docear has not been actively developed for a few years, and recently, the add-on to fetch metadata from Google Scholar stopped working. Fortunately, one of our users (Li Yang) adjusted Docear’s Google Scholar Parser, and now everything works as usual. However, we have not yet integrated Read more…
I love Google Scholar. It’s an amazing search engine for academics, and I have done quite a bit of research on it, including an analysis of Google Scholar’s ranking algorithm and the effect of citations on ranking, Google Scholar’s vulnerability against spam, and guidelines for ‘academic search engine optimization Read more…
Update 2018-07-31: We updated the Dropbox link. Google Scholar recently changed its layout, and as a consequence, Docear could no longer fetch metadata from Google Scholar for PDF files. Fortunately, one of our users (“Silberzwiebel”) adjusted Docear’s Google Scholar Parser, and now everything works as usual. However, we have not yet integrated Read more…
After releasing the Beta some weeks ago, we made some minor adjustments, and we now consider the current version 1.2 stable. There are two major improvements and two pieces of bad news: Various improvements in the PDF metadata retrieval function for Google Scholar. If you had some problems in the previous Docear versions with retrieving metadata Read more…
Docear 1.2 Beta is now available and has two major improvements: A new add-on to import any kind of highlighted text from PDFs. This new add-on is a true milestone in the Docear development. Until now, you could only import highlighted text from PDF editors that copied the highlighted text Read more…
Thanks to all the generous donors, our student Christoph was able to work on an improved PDF metadata retrieval for Docear, and today it’s time to present the first preview. The new Docear 1.1 (preview) is able to extract the title of a PDF and fetch appropriate metadata from Google Scholar. Whenever you select a PDF in your mind-map and choose “Create or Update reference”, the following new dialog appears.
The dialog shows the file name of your PDF file and the extracted title. In the background, the extracted title is sent to Google Scholar, and metadata for the first three search results is shown in the dialog. If the title was extracted incorrectly, you can manually correct it. You may also choose to use the PDF’s file name for the search. For instance, if you have already named your PDF according to the title, select the radio button with the file name, and the file name is sent as the search query to Google Scholar (you may also manually correct the file name before it’s sent to Google Scholar). Of course, all other options you already know are still available, such as creating a blank entry or importing the XMP data of PDFs. By the way, Docear remembers your choice, i.e. when you select to create a blank entry, that option will be pre-selected when you open the dialog the next time. Google Scholar might block your IP when you use the service too frequently. If this happens, a captcha should appear, and after solving it, you should be able to proceed. We have not yet tested this thoroughly. Please let us know your experiences.
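The choice of search query described above can be sketched in a few lines of Python. This is only an illustration, not Docear’s actual code: the function names `build_search_query` and `first_results` and their parameters are made up for this example, and the request to Google Scholar itself is left out.

```python
from pathlib import Path
from typing import Optional, Sequence


def build_search_query(
    extracted_title: str,
    pdf_path: str,
    use_file_name: bool = False,
    manual_correction: Optional[str] = None,
) -> str:
    """Pick the text that would be sent as the search query (hypothetical helper)."""
    # A manual correction by the user overrides both automatic options.
    if manual_correction is not None and manual_correction.strip():
        return manual_correction.strip()
    # Radio button "file name": use the file name without its extension.
    if use_file_name:
        return Path(pdf_path).stem.strip()
    # Default: use the title extracted from the PDF.
    return extracted_title.strip()


def first_results(results: Sequence[str], limit: int = 3) -> list:
    """Keep only the first few search results, as the dialog shows three."""
    return list(results[:limit])


query = build_search_query(
    extracted_title="On the Robustness of Google Scholar against Spam",
    pdf_path="papers/robustness-paper.pdf",
)
print(query)
```

If the extracted title is wrong but the PDF is named after its title, calling `build_search_query(..., use_file_name=True)` would return the file name without the `.pdf` extension instead.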
The precision of our metadata tool depends on two factors: A) the precision of the title extraction and B) the coverage of Google Scholar. According to a recent experiment, the title extraction accuracy of our tool is around 70%. However, the final result very much depends on the format of your research articles. In my research field (i.e. recommender systems), I would say that our tool extracts the title correctly for about 90% of the articles in my personal library. In addition, almost all articles that are relevant for my research are indexed by Google Scholar (I would estimate more than 90%). This means that for around 80% of my PDFs (90% × 90% ≈ 81%), the correct metadata is retrieved fully automatically. If I provide the title manually, metadata can be retrieved for more than 90% of my PDFs. Please let us know your experience (and your research field). (more…)
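To make the estimate explicit, here is a small sketch of the calculation. It assumes that title extraction and Google Scholar coverage are independent of each other, which is a simplification; the function name `automatic_retrieval_rate` is made up for this example.

```python
def automatic_retrieval_rate(title_extraction_rate: float, scholar_coverage: float) -> float:
    """Share of PDFs for which metadata is retrieved without manual help.

    Assumes the two factors are independent, so the rates simply multiply.
    """
    return title_extraction_rate * scholar_coverage


# Figures from my personal library: about 90% correct titles, about 90% coverage.
fully_automatic = automatic_retrieval_rate(0.90, 0.90)
print(f"fully automatic: {fully_automatic:.0%}")  # 81%

# With a manually provided title, extraction no longer fails,
# so the result is limited only by Google Scholar's coverage.
with_manual_title = automatic_retrieval_rate(1.0, 0.90)
print(f"with manual title: {with_manual_title:.0%}")  # 90%
```

Plugging in your own estimates for your research field shows quickly which of the two factors limits the result more.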
Done! We’ve got all the money we need, thank you very much!!!!!!!! Read on here…
One of Docear’s biggest disadvantages, compared to other reference managers, is the rather poor PDF metadata extraction capability. As such, it is no surprise that the second most popular feature request is to add decent PDF metadata extraction and file renaming to Docear. However, adding such a function is a lot of work and we currently do not really have the manpower for this. Fortunately, one of our best students – i.e. Christoph, who already did a lot of work for us – wants a paid job for his semester breaks. If we could pay him 1,800 Euros, he would love to implement the PDF metadata extraction method in his semester breaks, and we have no doubts that he is capable of doing it. The problem is, we don’t have the funds to pay him.
Therefore, we would like to start a call for donations: If you want decent PDF metadata extraction in Docear, please donate before February 28, 2014. We need 1,800 Euros to pay Christoph for four weeks of almost full-time work, starting at the end of February.
Are you using Google Scholar? For finding scientific literature? For obtaining citation counts and publication lists of researchers? Have you ever thought about how trustworthy the information is you get on Google Scholar? My colleague and I performed several tests with Google Scholar and found out that it is really Read more…
I am currently in Toronto presenting our new paper titled “On the Robustness of Google Scholar against Spam” at Hypertext 2010. The paper is about some experiments we did on Google Scholar to find out how reliable its citation data etc. are. The paper will soon be downloadable from our publication page, but for now I will post a pre-print version of the paper here on the blog:
In this research-in-progress paper we present the current results of several experiments in which we analyzed whether spamming Google Scholar is possible. Our results show that it is possible: We ‘improved’ the ranking of articles by manipulating their citation counts, and we made articles appear in searches for keywords the articles did not originally contain by placing invisible text in modified versions of the articles.
Researchers should have an interest in having their articles indexed by Google Scholar and other academic search engines such as CiteSeer(X). The inclusion of their articles in the index makes it easier to make those articles available to the academic community. In addition, authors should not only be concerned about whether their articles are indexed, but also about where they are displayed in the result list. As with all ranked search results, articles displayed in top positions are more likely to be read.
In recent studies we researched the ranking algorithm of Google Scholar [1-3] and gave advice to researchers on how to optimize their scholarly literature for Google Scholar. However, there are reservations in the academic community about what we called “Academic Search Engine Optimization”. There is the concern that some researchers might use the knowledge about ranking algorithms to ‘over-optimize’ their papers in order to push their articles’ rankings in illegitimate ways.
We conducted some experiments to find out how robust Google Scholar is against spamming. The experiments are not all completed yet but those that are completed show interesting results which are presented in this paper. (more…)
In January we published our article about Academic Search Engine Optimization (ASEO). As expected, feedback varied strongly. Here are some of the opinions on ASEO:
Search engine optimization (SEO) has a golden age in this internet era, but to use it in academic research, it sounds quite strange for me. After reading this publication (pdf) focusing on this issue, my opinion changed.
[…] on first impressions it sounds like the stupidest idea I’ve ever heard.
ASEO sounds good to me. I think it’s a good idea.
As you have probably guessed from the above criticisms, I thought that the article was a piece of crap.
In my opinion, being interested in how (academic) search engines function and how scientific papers are indexed and, of course, responding to these… well… circumstances of the scientific citing business is just natural.
Check out the following blogs to read more about it (some are in German and Dutch) (more…)
The Journal of Scholarly Publishing just published our article Academic Search Engine Optimization (ASEO): Optimizing Scholarly Literature for Google Scholar and Co. The article introduces and discusses the concept of what we call “academic search engine optimization” (ASEO), which we define as: “Academic search engine optimization is the creation, publication, and Read more…