JabRef + automatic metadata extraction from PDF files (like Mendeley)

Most users of SciPlore MindMapping (including me) use JabRef to manage their references. However, I was always thinking about switching to Mendeley because it offers automatic extraction of metadata from PDFs, which saves a lot of time when creating your bibliography. But Mendeley is not that compatible with SciPlore MindMapping and has some other shortcomings, so I always stuck with JabRef and accepted the time-consuming and annoying task of typing titles, author names etc. manually.

But now this will change: our team just created a modified version of JabRef which is able to extract metadata from PDFs. What does that mean? Well, whenever you find a PDF on the internet, you store it on your hard drive and drag&drop it into JabRef. JabRef will then automatically find the right metadata (authors, title, journal, year, page numbers, …) and create a new BibTeX entry which is linked to the PDF file.
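
To make the end result concrete, here is a minimal Python sketch of what such a linked entry could look like. It is only an illustration, not the code in our JabRef version: the function name, the sample metadata and the PDF path are made up, and the `file` field layout (description:path:type) is an assumption about how JabRef stores file links.

```python
def bibtex_entry(key, meta, pdf_path):
    """Render a BibTeX @article entry that carries a link to a PDF file."""
    fields = dict(meta)
    # Assumption: JabRef-style file link in the form description:path:type.
    fields["file"] = f":{pdf_path}:PDF"
    lines = [f"@article{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical metadata, as it might come back from the extraction step.
sample = {"author": "Doe, Jane", "title": "An Example Paper", "year": "2010"}
print(bibtex_entry("doe2010", sample, "papers/example.pdf"))
```

The sketch uses a relative path on purpose; absolute Windows paths contain colons, which would need escaping in a colon-separated field.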

And this is how it works:

1. Go to www.mr-dlib.org (more information about this new project is coming soon), download our version of JabRef and install it.

2. Open the software and drag&drop one or several PDFs anywhere onto the table which lists your BibTeX entries.

3. A dialog will open in which you select “fetch meta data from Mr. dLib”.

4. The next steps should be self-explanatory 🙂

If you need a PDF for testing, take this one. This should definitely work. If not, please contact us.

And maybe the best thing: our modified version of JabRef also accepts drag&drop directly from SciPlore MindMapping. That means you can drag&drop a PDF from SciPlore MindMapping into JabRef, where the metadata is extracted and a BibTeX entry is created. You can then access the BibTeX data directly in SciPlore MindMapping. This will dramatically improve your workflow (if you don’t know how to use SciPlore MindMapping and JabRef for managing your academic literature and drafting papers, read here or check out this video).

Some words about how all this works in detail: The metadata extraction does not take place on your computer. Instead, JabRef transfers your PDF to our server, where it is analyzed. Our server then returns the extracted metadata. In most cases (I would assume something around 80%) you should get at least the title. And if your PDF is an article in the field of computer science, you have a good chance of getting much more information. We are constantly improving our algorithms and database, so results should get better over time. And by the way, we will not store your PDF or any information from it on our servers. Once we have analyzed it and returned the metadata to you, it is deleted from our server.
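
For readers who want to picture the client side of this round trip, here is a rough Python sketch. Everything specific in it is an assumption: the URL is a placeholder on example.org, not the real Mr. dLib address, and the real request and response formats are not described in this post. The sketch only builds the upload request and does not send anything.

```python
import urllib.request

# Hypothetical endpoint: the real service URL and response format are not given here.
EXTRACT_URL = "https://example.org/extract"


def build_extraction_request(pdf_bytes, url=EXTRACT_URL):
    """Prepare (but do not send) a POST request carrying the raw PDF bytes."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


request = build_extraction_request(b"%PDF-1.4 placeholder bytes")
print(request.get_method(), request.full_url, len(request.data))
```

A real client would pass the prepared request to `urllib.request.urlopen` and parse whatever the server returns into BibTeX fields.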

UPDATE 2010-09-26

We have talked to the JabRef team and our new features will be integrated into the official JabRef version soon 🙂

Categories: jabref, Software & Projects

Tags: jabref, metadata extraction, pdf management, SciPlore MindMapping

Joeran Beel

Please visit https://isg.beel.org/people/joeran-beel/ for more details about me.

20 Comments

Tim · 15th May 2013 at 13:44

I just started using JabRef, but cannot get it to load a pdf file at all. When I try to drag and drop, it just gives me the empty sign. When I used the load pdf ps command, it said “No pdf or ps defined”, even though it never offered me a Windows file load screen to pick one. When I try to load a file, there are only two options, “bib” and “all files”, but no pdf or ps. I need help badly, thanks.

Fred · 28th February 2012 at 14:18

Hi there,
How is the JabRef metadata extraction code coming along?

filippo · 6th July 2011 at 09:33

Hi, could you please let us know where things stand regarding the possibility of interacting with Mendeley from SciPlore?
Your answer will help me decide whether I need to invest time in this or not.
Thanks
Fil

Joeran · 6th July 2011 at 11:25

Mendeley can continuously export a BibTeX file which can be read with SciPlore (there are some bugs which will be fixed in the next Beta). Other interaction, such as drag&drop between Mendeley and SciPlore MindMapping, is not possible at the moment. But I like the idea and will ask the Mendeley team whether they are interested in cooperating on this point.

Nicky · 15th March 2011 at 15:35

I cannot get the pdf metadata import to work with your modified JabRef version. I get stuck at step 2: when I drag & drop a pdf file into the JabRef table, nothing happens. I am running Kubuntu 10.10. Does it have something to do with the SciPlore/FreeMind configuration?

When starting JabRef from terminal and after an import failure, I get this :

Exception in thread “AWT-EventQueue-0” java.lang.NoClassDefFoundError: freemind/controller/MindMapNodesSelection
at net.sf.jabref.groups.EntryTableTransferHandler.importData(EntryTableTransferHandler.java:152)
at javax.swing.TransferHandler.importData(TransferHandler.java:772)
at javax.swing.TransferHandler$DropHandler.drop(TransferHandler.java:1495)
at java.awt.dnd.DropTarget.drop(DropTarget.java:446)
at javax.swing.TransferHandler$SwingDropTarget.drop(TransferHandler.java:1220)
at sun.awt.dnd.SunDropTargetContextPeer.processDropMessage(SunDropTargetContextPeer.java:529)
at sun.awt.X11.XDropTargetContextPeer.processDropMessage(XDropTargetContextPeer.java:183)
at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchDropEvent(SunDropTargetContextPeer.java:842)
at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(SunDropTargetContextPeer.java:766)
at sun.awt.dnd.SunDropTargetEvent.dispatch(SunDropTargetEvent.java:48)
at java.awt.Component.dispatchEventImpl(Component.java:4419)
at java.awt.Container.dispatchEventImpl(Container.java:2163)
at java.awt.Component.dispatchEvent(Component.java:4390)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4461)
at java.awt.LightweightDispatcher.processDropTargetEvent(Container.java:4196)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4050)
at java.awt.Container.dispatchEventImpl(Container.java:2149)
at java.awt.Window.dispatchEventImpl(Window.java:2478)
at java.awt.Component.dispatchEvent(Component.java:4390)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:649)
at java.awt.EventQueue.access$000(EventQueue.java:96)
at java.awt.EventQueue$1.run(EventQueue.java:608)
at java.awt.EventQueue$1.run(EventQueue.java:606)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:116)
at java.awt.EventQueue$2.run(EventQueue.java:622)
at java.awt.EventQueue$2.run(EventQueue.java:620)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:619)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:275)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:200)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:190)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:185)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:177)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:138)
Caused by: java.lang.ClassNotFoundException: freemind.controller.MindMapNodesSelection
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
… 37 more

Joeran · 17th March 2011 at 07:26

Thank you for reporting this bug. We will have a look at this.

Joeran · 20th March 2011 at 20:50

We will have a look at this, give us a few days…

carly · 6th March 2011 at 13:26

Hello,

I tried to use your version of JabRef. Everything works fine until after the pdf has been detected by Mr. dLib.
In the Mr. dLib window all data is displayed correctly, but the generated BibTeX key contains only weird characters.

I also tried JabRef 2.7B. How is the function activated there?

Feel free to answer in German/English.

Joeran · 17th March 2011 at 07:26

The functionality is not yet integrated into the official JabRef. We hope it will be in the next release.

Airbird · 12th December 2010 at 19:15

Hi,

There is no “your version of JabRef” available at the link you provided in this article.

Another question: the video you provided shows very convenient features for a big PDF database, but in reality there are no such functions in the latest version. What can we expect in the future? Do you have a fixed date for the full implementation of those functions?

Thanks a lot for this nice work!

Joeran · 18th December 2010 at 09:42

There is a link to our current version on http://www.mr-dlib.org.

It will take a few more months before SciPlore can rename files.

Jice · 11th February 2011 at 09:53

I am really waiting for it… I dislike so much renaming my PDF files when I download them ;-D

Costas · 18th October 2010 at 02:39

There is a standalone program that does a similar thing, though it does require input from the user. It reads the pdf itself and allows you to choose / correct the output.

http://www.molspaces.com/d_cb2bib-overview.php

It is free as well!

I don’t know how well your solution will work. Some pdfs are quite large, and imagine if hundreds of users try to upload their thousands of pdfs at the same time!

Andrewp · 18th September 2010 at 20:48

I have been using Sciplore Mind Mapping to manage PDF files for my Masters Degree for the past year. I have found that the reality is very few pdf files from the various online databases have metadata, not even the title. So I am not sure how useful this will be for those in non Computer Science fields. What I do is:

1. Download pdf files that I find from online databases into a directory that is monitored from my Sciplore Mind Map
2. Change the pdf filename into the title of the pdf paper (the document title)
3. Create bookmarks as needed within the pdf file
4. Download the citation information for the pdf that is usually available on the online database, and import it into Endnote for referencing.

Now what would be useful is:

1. If Sciplore Mindmap could retrieve the title field from those pdfs which do have it in metadata, and display that title in the mindmap rather than the original filename, this would save step 2 above which can be a pain with long titles.

2. If the same downloaded file that contains the citation information (for importing into Endnote) could also be drag and dropped into the relevant pdf in the MindMap, that would be sensational.

While I would consider moving from Endnote to JabRef if it could give me additional functionality over Endnote, it seems to me that Endnote is fully featured, intuitive and straightforward to use (and the uni provides it for free), so there would need to be a compelling reason.

Joeran · 23rd September 2010 at 17:20

Thank you for your suggestions. Regarding the metadata extraction: I am not talking about XMP metadata (metadata written by the online databases into the file). I am talking about extracting the data directly from the PDF’s fulltext. This can be done with any PDF which is not a scanned image but contains “real” text. Accordingly, we will be able to provide at least some metadata for almost any PDF. Of course, our algorithms are far from perfect, but I estimate that the title is extracted correctly in 70-80% of cases, and the author(s) and the abstract in about 50%.
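
To give a feeling for what extracting from the fulltext can mean, here is a small Python sketch of one simple heuristic: take the text set in the largest font on the first page as the title. This is an illustration under stated assumptions, not a description of the actual server code. The function name and the sample data are made up, and the input is assumed to be a list of (text, font size) pairs that some PDF parser has already produced.

```python
def guess_title(spans):
    """Return the text set in the largest font among first-page spans.

    spans: list of (text, font_size) tuples in reading order.
    Returns None when there is no non-empty text at all.
    """
    candidates = [(text.strip(), size) for text, size in spans if text.strip()]
    if not candidates:
        return None
    largest = max(size for _, size in candidates)
    # Join every span set at the largest size so multi-line titles survive.
    parts = [text for text, size in candidates if size == largest]
    return " ".join(parts)


# Hypothetical first-page spans: running header, two title lines, author line.
first_page = [
    ("Journal of Examples, Vol. 1", 9.0),
    ("A Hypothetical Paper", 18.0),
    ("About Title Extraction", 18.0),
    ("Jane Doe", 12.0),
]
print(guess_title(first_page))
```

A heuristic like this fails on scanned images, which contain no text spans at all, and on pages where something other than the title is set in the largest font.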

Mr. Gunn · 17th September 2010 at 02:02

Hi, it’s William Gunn from Mendeley. We’re sorry to hear that Mendeley doesn’t work well with the Sciplore MindMapping tool, but there’s no point in re-inventing the wheel. I’m just thinking now that if we worked together we could add our strength in reference extraction and you could continue to focus on what you do best. Would you like to drop me an email to discuss this further?

Joeran · 19th September 2010 at 21:24

Hello William,

thank you for your interest in SciPlore MindMapping and our webservice. There are two problems SciPlore MindMapping has with Mendeley:

1. Mendeley is using a proprietary data format for references. I know, Mendeley can automatically create a BibTeX file but SciPlore MindMapping cannot write into this BibTeX file because it would be overwritten by Mendeley the next time.

2. Mendeley is using a proprietary data format for PDF bookmarks, so SciPlore MindMapping cannot import these bookmarks.

For these reasons we recommend JabRef. We also have the tools for extracting metadata anyway because of our search engine http://www.SciPlore.org, so integrating them into JabRef is no big effort for us.

Stephen Fujiwara · 15th April 2011 at 05:55

I’ve recently started using both SciPlore and Mendeley.

I think it would be great if both teams worked together!

Nikolay Karelin · 16th September 2010 at 17:36

It’s frustrating:

Why do you create a customized JabRef, not a plugin? It will require additional effort from the JabRef developers to decide and integrate… By the way, have you contacted the JabRef team to discuss your update?

The other thing is the metadata extraction server. You see, not all research teams always have a good internet connection, and what’s more important, some teams are simply not allowed to share their paper collection with 3rd parties. There is a rather interesting ‘queue of requests’ on the Zotero site – http://forums.zotero.org/discussion/3574/ – lots of requests and just a few answers about a local server.

Joeran · 19th September 2010 at 21:28

Hello,

we are talking with the JabRef team and it looks like our changes will be integrated into the official JabRef. So there is no further need for a plugin or special version.

I understand that you would prefer having the metadata extracted locally. But:

– Our tools for analysing PDFs are not platform independent and therefore could not be easily integrated into JabRef.

– We are constantly improving our algorithms (for instance, in the last 48 hours we made one bug fix and one other improvement). With a webservice we just have to change our tools on the server and that’s it. If our tools were directly integrated into JabRef, we would have to release new versions of JabRef every few days.

– We have not developed our tools for JabRef only but for our project Mr. dLib, and the main concept of Mr. dLib is to offer metadata and services for academic websites and tools via a webservice. So we have little intention to release libraries etc., for which we would have to write detailed documentation before others could use them. Instead we will offer an easy-to-use webservice that can be used by JabRef and others.