Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and CORA [pre-print]

Abstract: Citation parsing, particularly with deep neural networks, suffers from a lack of training data, as available datasets typically contain only a few thousand training instances. Manually labelling citation strings is very time-consuming, hence synthetically created training data could be a solution. However, as of now, it is unknown if …

GIANT 2019, Reference Parsing, Deep Citation Parsing, Dataset, Cover

GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing [pre-print]

This is the pre-print of: Mark Grennan, Martin Schibel, Andrew Collins, and Joeran Beel. “GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing.” In 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, 101–112, 2019. Final publication: http://aics2019.datascienceinstitute.ie/papers/aics_25.pdf Abstract: Extracting and parsing reference strings from research articles …

Keyphrase counts and their effect on clickthrough rates (CTR)

Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library Recommender Systems

Our paper “Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library Recommender Systems” was accepted for publication at the ACM/IEEE Joint Conference on Digital Libraries. 1 Introduction Many recommendation algorithms are available to operators of recommender systems in digital libraries. The effectiveness of algorithms in real-world systems is …

Click-through rate (CTR) and number of delivered recommendations in JabRef for Mr. DLib’s (MDL) and CORE’s recommendation engines and in total

Mr. DLib’s Living Lab for Scholarly Recommendations (preprint)

We published a manuscript on arXiv about the first living lab for scholarly recommender systems. This lab allows recommender-system researchers to conduct online evaluations of their novel algorithms for scholarly recommendations, i.e., research papers, citations, conferences, research grants, etc. Recommendations are delivered through the living lab’s API in platforms such …

The results of the comparison of 10 open-source bibliographic reference parsers

Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers

Our paper “Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers” was recently accepted and will be presented at the Joint Conference on Digital Libraries 2018. Abstract: Bibliographic reference parsing refers to extracting machine-readable metadata, such as the names of the authors, the …

The workflow of author contributions extraction

Who Did What? Identifying Author Contributions in Biomedical Publications using Naïve Bayes

Our paper “Who Did What? Identifying Author Contributions in Biomedical Publications using Naïve Bayes” was recently accepted and will be presented at the Joint Conference on Digital Libraries 2018. Abstract: Creating scientific publications is a complex process. It is composed of a number of different activities, such as designing the experiments, …

RARD I: The Related-Article Recommender-System Dataset

RARD: The Related-Article Recommendation Dataset

We are proud to announce the release of ‘RARD’, the related-article recommendation dataset from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains information about 57.4 million recommendations that were displayed to the users of Sowiport. Information includes details on which recommendation approaches were used (e.g. content-based …

Several new publications: Mr. DLib, Lessons Learned, Choice Overload, Bibliometrics (Mendeley Readership Statistics), Apache Lucene, CC-IDF, TF-IDuF

In the past few weeks, we published (or received acceptance notices for) a number of papers related to Mr. DLib, research-paper recommender systems, and recommendations-as-a-service. Many of them were written during our time at the NII or in collaboration with the NII. Here is the list of publications: Beel, Joeran, Bela Gipp, …

Paper accepted at ISI conference in Berlin: “Stereotype and Most-Popular Recommendations in the Digital Library Sowiport”

Our paper titled “Stereotype and Most-Popular Recommendations in the Digital Library Sowiport” was accepted for publication at the 15th International Symposium on Information Science (ISI) in Berlin. Abstract: Stereotype and most-popular recommendations are widely neglected in the research-paper recommender-system and digital-library communities. In other domains, such as movie recommendations and hotel …

Enhanced re-ranking in our recommender system based on Mendeley’s readership statistics

Content-based filtering recommendations suffer from the problem that no human quality assessments are taken into account. This means a poorly written paper p_poor would be considered equally relevant for a given input paper p_input as a high-quality paper p_quality, if p_quality and p_poor contain the same words. We alleviate this problem by using Mendeley’s readership data …
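The idea above can be sketched in a few lines of Python. This is an illustrative sketch only, not Mr. DLib’s actual implementation: the function name, the candidate format, and the log-damped weighting scheme are all assumptions.

```python
import math

def rerank(candidates, readership):
    """Re-rank content-based candidates (doc_id, similarity) pairs by
    combining text similarity with log-scaled Mendeley readership counts."""
    def score(doc_id, similarity):
        readers = readership.get(doc_id, 0)
        # log1p damping keeps very popular papers from dominating entirely
        return similarity * (1 + math.log1p(readers))
    return sorted(candidates, key=lambda c: score(c[0], c[1]), reverse=True)

# Two papers with identical text similarity to the input paper …
candidates = [("p_poor", 0.9), ("p_quality", 0.9), ("p_other", 0.5)]
# … but very different readership: the widely read paper wins the tie.
readership = {"p_quality": 120, "p_poor": 2}
print(rerank(candidates, readership))
```

With pure content-based filtering, `p_poor` and `p_quality` would tie; the readership signal breaks the tie in favour of the widely read paper.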

Howto: Import references from webpages (e.g. PubMed, IEEE, ACM, …)

Compared to several other reference managers, Docear lacks a feature to directly import references from the Web. For instance, if you visit the detail page of a research article on a publisher’s website, you might wish to directly import the bibliographic data of that article to Docear. Many publishers offer export options for reference managers such as Endnote, RefWorks, or Zotero. So, how do you do it with Docear?

Fortunately, Docear uses the BibTeX format to store references. BibTeX is a de-facto standard for references that is supported by almost any publisher and any reference manager. So, read on to learn how to import bibliographic data from web-pages in two steps!
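For illustration, a BibTeX entry as exported by a publisher page typically looks like the following (the entry key, authors, and field values here are made up, not taken from a real export):

```bibtex
@article{smith2014example,
  author  = {John Smith and Jane Doe},
  title   = {An Example Article Title},
  journal = {Journal of Examples},
  year    = {2014},
  volume  = {12},
  pages   = {1--10}
}
```

Because Docear stores references in exactly this format, such an export can be added to your Docear BibTeX file without conversion.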


Docear 1.1.1 Beta with Academic Search Feature

As you may know, Docear features a recommender system for academic literature. To find out which papers you might be interested in, the recommender system parses your mind maps and compares them to our digital library with currently about 1.8 million academic articles. While this is helpful and might point you to papers relevant for your general research goals, you will sometimes have to find information on a specific topic and hence search directly.

Based on our knowledge about recommender systems and some user requests, we decided to implement a direct search feature on our digital library. I am very grateful to Keystone, who supported me in visiting Dr. Georgia Kapitsaki at the University of Cyprus (UCY) in Nicosia for a full month to work on this idea. Dr. Kapitsaki has already supported us in our work on Docear’s recommender system in July 2013. Her knowledge about the inner mechanics and her ideas on the search engine were essential for the implementation and the research part of the project.

How to use it

You can access the search feature from Docear’s ribbon bar (“Search and Filter > Documents > Online search”) or by double-clicking the “Online search” entry in Docear’s workspace panel. Since both the recommender system and the personalized search engine make use of your mind maps, you need to enable the recommendation service in Docear.


After opening the search page, you will see

  • a text box for your search query,
  • a “Search” button, and
  • several buttons below the text box reflecting search terms you might be interested in. If Docear does not have enough data to decide about your interests, this part remains empty.
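The suggested search-term buttons can be thought of as the most frequent terms mined from your mind maps. The following is a minimal sketch of that idea, not Docear’s actual code; the function name, stop-word list, and plain frequency counting are assumptions for illustration.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "with"}

def suggest_terms(node_texts, k=3):
    """Derive candidate search terms from mind-map node texts by
    counting word frequencies, ignoring common stop words."""
    words = []
    for text in node_texts:
        words += [w for w in re.findall(r"[a-z]+", text.lower())
                  if w not in STOPWORDS]
    return [term for term, _ in Counter(words).most_common(k)]

nodes = ["citation parsing", "deep citation parsing with CRF",
         "evaluation of citation parsers"]
print(suggest_terms(nodes))
```

If your mind maps contain too little text for such counts to be meaningful, no terms can be suggested, which matches the empty state described above.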

(Screenshot: Docear’s online search interface)
