ISG Siegen

PDF metadata extraction

Machine Learning

ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers

Our manuscript “ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers” was accepted for publication at the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS). It is an extended version of our recently presented poster “ParsRec: Meta-Learning Recommendations for Bibliographic Reference Parsing” at the ACM RecSys conference. The Read more…

By Joeran Beel, 7 years ago
Google Scholar
Docear

Update for Docear’s “Google Scholar Parser” Library to Fetch Metadata for PDF files

Update 2018-07-31: We updated the Dropbox link. Google Scholar recently changed its layout, and as a consequence, Docear could no longer fetch metadata from Google Scholar for PDF files. Fortunately, one of our users (“Silberzwiebel”) adjusted Docear’s Google Scholar Parser, and now everything works as usual. However, we have not yet integrated Read more…

By Joeran Beel, 8 years ago (5th October 2017)
Information Extraction

Docear 1.1 stable released with strongly improved PDF metadata extraction

Finally, after releasing the alpha and beta, today we release Docear 1.1 stable. If you have already tried one of the previous versions, there is not much new. Otherwise, read on.

Thanks to all the generous donors, our student Christoph could work on an improved PDF metadata retrieval for Docear. The new Docear 1.1 is able to extract the title of a PDF and fetch metadata from Google Scholar for that title. To do so, select a PDF in your mind map and choose “Create or Update reference”, …

… and the following new dialog appears. The dialog shows the file name of your PDF file and the extracted title. In the background, the extracted title is sent to Google Scholar, and metadata for the first two search results is shown in the dialog. If the title was extracted incorrectly, you can correct it manually. You may also choose to use the PDF’s file name for the search. For instance, if you have already named your PDF according to its title, select the radio button with the file name, and the file name is sent as the search query to Google Scholar (you may also manually correct the file name before it’s sent). Of course, all other options you already know are still available, such as creating a blank entry or importing the XMP data of PDFs. By the way, Docear remembers your choice: if you choose to create a blank entry, that option will be pre-selected the next time you open the dialog.

Google Scholar may block your IP if you use the service too frequently. If this happens, a captcha should appear, and after solving it, you should be able to proceed. We have not yet tested this thoroughly. Please let us know your experiences.

The precision of our metadata tool depends on two factors: A) the precision of the title extraction and B) the coverage of Google Scholar. According to a recent experiment, the accuracy of our tool’s title extraction is around 70%. However, the final result very much depends on the format of your research articles. In my research field (i.e. recommender systems), I would say that our tool extracts the title correctly for about 90% of the articles in my personal library. In addition, almost all articles that are relevant for my research are indexed by Google Scholar (I would estimate more than 90%). This means that for around 80% of my PDFs (90% × 90% ≈ 81%), the correct metadata is retrieved fully automatically. If I provide the title manually, the metadata can be retrieved for more than 90%. Please let us know your experience (and your research field). (more…)
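The estimate above can be reproduced with a short calculation. This is only a sketch: it assumes that title extraction and Google Scholar coverage are independent, so that their rates can simply be multiplied, and the function name `automatic_retrieval_rate` is made up for illustration. The percentages are the rough estimates from the paragraph above, not measured results.

```python
def automatic_retrieval_rate(title_precision: float, scholar_coverage: float) -> float:
    """Estimated share of PDFs for which correct metadata is retrieved automatically.

    Assumption: a PDF succeeds only if the title is extracted correctly AND
    Google Scholar indexes the article, and the two events are independent.
    """
    return title_precision * scholar_coverage


# Rough estimates for the author's personal library (recommender systems).
fully_automatic = automatic_retrieval_rate(title_precision=0.90, scholar_coverage=0.90)

# With a manually provided title, extraction no longer fails,
# so only Google Scholar's coverage limits the result.
manual_title = automatic_retrieval_rate(title_precision=1.00, scholar_coverage=0.90)

print(f"Fully automatic: {fully_automatic:.0%}")
print(f"Title provided manually: {manual_title:.0%}")
```

With these inputs the fully automatic rate comes out at roughly 81%, which matches the "around 80%" given above. In a field with less consistently formatted articles, both factors may be lower.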

By Joeran Beel, 11 years ago
Docear

Docear 1.1 Beta Released: New PDF Metadata Extraction, Better Zotero and Mendeley BibTeX support, and Bug Fixes

If you have tested the Preview of Docear 1.1 you may already know about some of Docear’s new features. With your feedback and the mind maps, log files and BibTeX files you shared with us, these features have matured. We are proud to introduce the first (and hopefully only) Beta release of Docear 1.1.

The new key features of Docear 1.1

Improved metadata retrieval

Thanks to your donations, our student Christoph greatly enhanced Docear’s PDF metadata retrieval. For us, it works really well, and with Docear 1.1 Beta the last bugs have been fixed. By the way, if you like what Christoph did and you are using LibreOffice or OpenOffice, please also read our call for donation to develop an add-on for these two word processors.

Improved support for Zotero / Mendeley BibTeX files

(more…)

By Joeran Beel, 11 years ago
Docear

Preview of Docear 1.1 with PDF Metadata Retrieval from Google Scholar

Thanks to all the generous donors, our student Christoph could work on an improved PDF metadata retrieval for Docear, and today it’s time to present the first preview. The new Docear 1.1 (preview) is able to extract the title of a PDF and fetch appropriate metadata from Google Scholar. Whenever you select a PDF in your mind map and choose “Create or Update reference”, the following new dialog appears.

The dialog shows the file name of your PDF file and the extracted title. In the background, the extracted title is sent to Google Scholar, and metadata for the first three search results is shown in the dialog. If the title was extracted incorrectly, you can correct it manually. You may also choose to use the PDF’s file name for the search. For instance, if you have already named your PDF according to its title, select the radio button with the file name, and the file name is sent as the search query to Google Scholar (you may also manually correct the file name before it’s sent). Of course, all other options you already know are still available, such as creating a blank entry or importing the XMP data of PDFs. By the way, Docear remembers your choice: if you choose to create a blank entry, that option will be pre-selected the next time you open the dialog.

Google Scholar may block your IP if you use the service too frequently. If this happens, a captcha should appear, and after solving it, you should be able to proceed. We have not yet tested this thoroughly. Please let us know your experiences.
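The way the dialog picks a search query can be summarised in a few lines. The following is a hypothetical sketch, not Docear’s actual code (Docear is written in Java, and its implementation is not shown here). The function `build_search_query` and its parameters are invented for illustration, and replacing underscores with spaces in the file name is an added assumption.

```python
from pathlib import Path
from typing import Optional


def build_search_query(
    pdf_path: str,
    extracted_title: str,
    use_file_name: bool = False,
    manual_correction: Optional[str] = None,
) -> str:
    """Pick the text that would be sent to the search engine as the query.

    Priority (hypothetical): a manual correction wins, then the file name
    if the user selected that option, otherwise the extracted title.
    """
    if manual_correction:
        return manual_correction.strip()
    if use_file_name:
        # Drop the directory and the ".pdf" extension; treating underscores
        # as spaces is an assumption made for this sketch.
        return Path(pdf_path).stem.replace("_", " ").strip()
    return extracted_title.strip()


# The extracted title is garbled, so the user selects the file name instead.
query = build_search_query(
    "papers/Research_Paper_Recommender_Systems.pdf",
    extracted_title="Rese arch Pap er",
    use_file_name=True,
)
print(query)
```

The priority order simply mirrors the dialog described above: the extracted title is the default, the file name is an alternative, and either one can be corrected by hand before the search.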

The precision of our metadata tool depends on two factors: A) the precision of the title extraction and B) the coverage of Google Scholar. According to a recent experiment, the accuracy of our tool’s title extraction is around 70%. However, the final result very much depends on the format of your research articles. In my research field (i.e. recommender systems), I would say that our tool extracts the title correctly for about 90% of the articles in my personal library. In addition, almost all articles that are relevant for my research are indexed by Google Scholar (I would estimate more than 90%). This means that for around 80% of my PDFs (90% × 90% ≈ 81%), the correct metadata is retrieved fully automatically. If I provide the title manually, the metadata can be retrieved for more than 90%. Please let us know your experience (and your research field). (more…)

By Joeran Beel, 11 years ago
Docear

Call for donation was successful: 1800 Euros donated to improve Docear’s PDF metadata retrieval function

  One month ago, we started a call for donation and asked our users for money so we could pay our student Christoph to improve Docear’s PDF metadata retrieval. We asked for 1800 Euros (~2500 US$) and today we achieved our goal. We would like to thank all donors who Read more…

By Joeran Beel, 11 years ago
Call for donation

Call for Donation: (Automatic) PDF Metadata Extraction and Renaming


Done! We’ve got all the money we need, thank you very much!!!!!!!! Read on here…


 
One of Docear’s biggest disadvantages, compared to other reference managers, is its rather poor PDF metadata extraction. As such, it is no surprise that the second most popular feature request is to add decent PDF metadata extraction and file renaming to Docear. However, adding such a function is a lot of work, and we currently do not really have the manpower for it. Fortunately, one of our best students, Christoph, who has already done a lot of work for us, wants a paid job for his semester break. If we could pay him 1,800 Euros, he would love to implement the PDF metadata extraction during that break, and we have no doubt that he is capable of doing it. The problem is, we don’t have the funds to pay him.

Therefore, we would like to start a call for donation: If you want decent PDF metadata extraction in Docear, please donate before February 28, 2014. We need 1,800 Euros to pay Christoph for four weeks of almost full-time work, starting at the end of February.

 

 
(more…)

By Joeran Beel, 11 years ago