Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and CORA [pre-print]

ABSTRACT Citation parsing, particularly with deep neural networks, suffers from a lack of training data, as available datasets typically contain only a few thousand training instances. Manually labelling citation strings is very time-consuming, so synthetically created training data could be a solution. However, as of now, it is unknown if Read more…

ParsRec: Meta-Learning Recommendations for Bibliographic Reference Parsing (Pre-Print)

We are delighted to announce that our poster “ParsRec: Meta-Learning Recommendations for Bibliographic Reference Parsing” has been accepted for presentation at the 12th ACM Recommender Systems Conference (RecSys) in Vancouver, Canada. The pre-print is available on arXiv and here on our blog. Abstract: Bibliographic reference parsers extract metadata (e.g. author names, Read more…