Docear Beta 7 with PDF Metadata Extraction

Beta 7 is out and has one major new feature: (semi-)automatic extraction of bibliographic metadata from PDF files. This means that when you create a new reference, you no longer have to type everything manually; bibliographic information such as title, authors, year, and journal is provided to you automatically. Here is how it works:

Right-click a node that links to a PDF and select the option shown in the picture

[Figure: Docear's bibliographic PDF metadata extraction, selecting Docear's digital library]

Select Docear’s digital library to retrieve data from

Provide the document’s title if it was not extracted correctly from the PDF (the title should be extracted correctly in about 80% of cases). Clicking “Yes” sends the title to Docear’s digital library, which returns all metadata for the documents with this title

We have looooooots of data in our database, so the chance of getting the correct metadata is really high

Done 🙂
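The title lookup in the steps above can be pictured as a title-keyed search that returns every matching record, since several documents may share the same title and the user then picks the right one. The following Python sketch is purely illustrative: `normalize_title`, `TitleIndex`, and the record fields are hypothetical, and the normalization rules (case, accents, punctuation) are an assumption, not a description of how Docear’s digital library actually matches titles.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Hypothetical normalization: lower-case, drop accents and punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKD", title)
    # Remove combining marks left over from decomposed accented characters.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


class TitleIndex:
    """Toy in-memory stand-in for a title-keyed metadata store (hypothetical)."""

    def __init__(self) -> None:
        self._records: dict[str, list[dict]] = defaultdict(list)

    def add(self, record: dict) -> None:
        # Records are grouped under their normalized title.
        self._records[normalize_title(record["title"])].append(record)

    def lookup(self, title: str) -> list[dict]:
        # Return all records with a matching title; .get() avoids creating empty keys.
        return list(self._records.get(normalize_title(title), []))


if __name__ == "__main__":
    index = TitleIndex()
    index.add({"title": "A Study of Things", "year": 2010})
    index.add({"title": "A study of things.", "year": 2011})
    for record in index.lookup("a STUDY of things!"):
        print(record)
```

Returning a list rather than a single record mirrors the post’s wording that metadata is returned “for the documents with this title”, leaving the final choice to the user.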

You may remember that we already had a similar function that retrieved data from Mr. DLib. However, our new function is much better. First, instead of the entire PDF, only the PDF’s title and a hash value are sent. That means only a few KB are transferred instead of perhaps 1 MB or more, which speeds up the entire process dramatically. Second, Mr. DLib had a rather small database, whereas Docear’s digital library is filled with metadata from various sources, so chances are really high that we have the correct metadata available. There is one downside, though: currently, each user can make only 15 requests per day to Docear’s digital library, but we are confident that we can raise this limit very soon. In addition, the function is only available to registered users.
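To illustrate why sending a title and a hash is so much lighter than uploading the whole PDF, here is a minimal Python sketch. It assumes SHA-256 as the hash function and a JSON payload with hypothetical `title` and `hash` fields; the post does not say which hash function or request format Docear actually uses, and `build_lookup_request` is an illustrative name only.

```python
import hashlib
import json


def build_lookup_request(pdf_bytes: bytes, title: str) -> bytes:
    """Build a compact lookup payload from a PDF's title and a hash of its bytes.

    Hypothetical sketch: SHA-256 and the field names are assumptions,
    not Docear's actual wire format.
    """
    digest = hashlib.sha256(pdf_bytes).hexdigest()  # always 64 hex characters
    payload = {"title": title, "hash": digest}
    return json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    # Roughly 1 MB stand-in for a real PDF file.
    fake_pdf = b"%PDF-1.4\n" + b"\x00" * 1_000_000
    request = build_lookup_request(fake_pdf, "An Example Paper Title")
    print(f"PDF size: {len(fake_pdf)} bytes, request size: {len(request)} bytes")
```

Because a SHA-256 hex digest always has 64 characters, the request size in this sketch depends only on the length of the title, not on the size of the PDF.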

Download Docear Beta 7

The change-log in detail:

New features include:

#621 PDF Metadata Extraction

#627 Action to automatically export Windows registry and send to Docear for bug fixing

Feature enhancements include:

#661 Added information to the dialog on how to resolve duplicate entries in BibTeX files created by Mendeley

#651 Use one method to open all library maps

#625 Double-check PDF-XCV compatibility settings

#656 Change default preferences of JabRef

#612 Updated to the latest source code of Freeplane

#565/#587 Recommendations improved

Bug fixes include:

#613 Installation path in the About dialog was wrong

#674 Typo in welcome map

#650 Recommended documents stored on an FTP server could not be downloaded

#637 SciPlore MindMapping files were converted without permission when opened in the background

#657 Null Pointer Exception occurred after starting Docear with completely new settings

#668 Hyperlinks to open folders did not always work

Other changes:

#667 Smart PDF viewer selection for Skim and Preview on Mac OS removed

2021 Update for Docear’s “Google Scholar Parser” Library to Fetch Metadata for PDF files

Our reference management software Docear has not been actively developed for a few years, and recently, the add-on to fetch metadata from Google Scholar stopped working. Fortunately, one of our users (Li Yang) adjusted Docear’s Read more…

GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing [pre-print]

This is the pre-print of: Mark Grennan, Martin Schibel, Andrew Collins, and Joeran Beel. “GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing.” In 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Read more…

Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers

Our paper “Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers” was recently accepted and will be presented at the Joint Conference on Digital Libraries 2018. Abstract: Bibliographic Read more…

1 Comment

Link Roundup #3 | Personal Knowledge Management for Academia & Librarians · 20th December 2012 at 16:00

[…] the academic research management suite, has added automatic data extraction from PDFs. This is a big addition for their beta version. Docear is loosely descended from […]

Published by Joeran Beel on 21st November 2012
