Our Ph.D. students Tobias Vente and Lukas Wegmeth will attend the ACM RecSys 2023 conference in Singapore to present their latest work on Automated Recommender Systems (AutoRecSys) with four papers: two in the Doctoral Symposium, one in the PERSPECTIVES 2023 Workshop, and one in the Demo track.
We already posted a teaser about the two papers that Tobias and Lukas will present in the Doctoral Symposium. We are delighted to announce that two more of their papers were accepted. Tobias and Lukas worked tirelessly to ensure that their papers met the high standards of the recommender systems community. Their work discusses and solves significant challenges in AutoRecSys and in recommender systems evaluation in general.
Their Doctoral Symposium papers are extended abstracts that summarize the goals of their Ph.D. work. Both papers address the challenges of AutoRecSys. Lukas focuses on meta-learning model selection in his work “Improving Recommender Systems Through the Automation of Design Decisions”. Tobias focuses on model selection through efficient hyperparameter optimization and ensembling in his work “Advancing Automation of Design Decisions in Recommender System Pipelines”. They will each give a brief presentation of their work during the Doctoral Symposium on Monday, September 18, and extended presentations with posters during the main conference on Thursday, September 21. Both papers will be published in the Conference Proceedings after the live event. You can read the abstracts of both papers below.
Improving Recommender Systems Through the Automation of Design Decisions

Recommender systems developers are constantly faced with difficult design decisions. Additionally, the number of options that a recommender systems developer has to consider continually grows over time with new innovations. The machine learning community is in a similar situation and has come together to tackle the problem. It invented concepts and tools to make machine learning development both easier and faster. These developments are categorized as automated machine learning (AutoML). As a result, the AutoML community formed and continuously innovates new approaches. Inspired by AutoML, the recommender systems community has recently recognized the need for automation and has, so far only sparsely, introduced AutoRecSys. The goal of AutoRecSys is not to replace recommender systems developers but to improve performance through the automation of design decisions. With AutoRecSys, recommender systems engineers no longer have to focus on easy but time-consuming tasks and are free to pursue difficult engineering tasks instead. Additionally, AutoRecSys enables easier access to recommender systems for beginners, as it reduces the amount of knowledge required to get started with the development of recommender systems. AutoRecSys, like AutoML, is still early in its development and does not yet cover the whole development pipeline. Additionally, it is not yet clear under which circumstances AutoML approaches can be transferred to recommender systems. Our research intends to close this gap by improving AutoRecSys, both through the transfer of AutoML and through novel approaches. Furthermore, we focus specifically on the development of novel automation approaches for data processing and training. We note that the realization of AutoRecSys is going to be a community effort.
Our part in this effort is to research AutoRecSys fundamentals, build practical tools for the community, raise awareness of the advantages of automation, and catalyze AutoRecSys development.
Advancing Automation of Design Decisions in Recommender System Pipelines

Recommender systems have become essential in domains like streaming services, social media platforms, and e-commerce websites. However, the development of a recommender system involves a complex pipeline with preprocessing, data splitting, algorithm and model selection, and postprocessing stages. Every stage of the recommender systems pipeline requires design decisions that influence the performance of the recommender system. To ease these design decisions, automated machine learning (AutoML) techniques have been adapted to the field of recommender systems, resulting in various AutoRecSys libraries. Nevertheless, these libraries limit flexibility in integrating automation techniques. In response, our research aims to enhance the usability of AutoML techniques for design decisions in recommender system pipelines. We focus on developing flexible and library-independent automation techniques for the algorithm selection, model selection, and postprocessing steps. By enabling developers to make informed choices and easing the recommender system development process, we decrease the developer’s effort while improving the performance of the recommender systems. Moreover, we want to analyze the cost-to-benefit ratio of automation techniques in recommender systems, evaluating the computational overhead and the resulting improvements in predictive performance. Our objective is to leverage AutoML concepts to automate design decisions in recommender system pipelines, reduce manual effort, and enhance the overall performance and usability of recommender systems.
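To give a flavor of what library-independent automation of design decisions can look like, here is a minimal sketch of a joint random search over algorithms and their hyperparameters. Everything in it is hypothetical illustration: the search space, the algorithm names, and the surrogate `evaluate()` function (which stands in for training and validating a real recommender) are not from the paper or from any AutoRecSys library.

```python
import math
import random

# Hypothetical search space: algorithm name -> hyperparameter ranges.
# Integer ranges are sampled uniformly, float ranges log-uniformly.
SEARCH_SPACE = {
    "item_knn": {"n_neighbors": (5, 100)},
    "matrix_factorization": {"factors": (16, 256), "reg": (1e-4, 1e-1)},
}

def sample_config(rng):
    """Pick an algorithm, then sample each of its hyperparameters."""
    algo = rng.choice(list(SEARCH_SPACE))
    params = {}
    for name, (lo, hi) in SEARCH_SPACE[algo].items():
        if isinstance(lo, int):
            params[name] = rng.randint(lo, hi)  # uniform integer
        else:
            # log-uniform sampling for scale-sensitive parameters
            params[name] = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
    return algo, params

def evaluate(algo, params):
    """Stand-in for 'train on the training split, score on validation'.
    A deterministic toy surrogate so the sketch runs without real data."""
    if algo == "item_knn":
        return 0.30 - abs(params["n_neighbors"] - 40) / 1000
    return 0.33 - abs(params["factors"] - 64) / 2000 - abs(params["reg"] - 0.01)

def random_search(n_trials=25, seed=42):
    """Jointly search over algorithms and hyperparameters; keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        algo, params = sample_config(rng)
        score = evaluate(algo, params)
        if best is None or score > best[0]:
            best = (score, algo, params)
    return best

score, algo, params = random_search()
print(f"best: {algo} {params} -> {score:.4f}")
```

Because the search loop only needs a configuration sampler and an evaluation function, the same skeleton works regardless of which recommender library sits behind `evaluate()`, which is the kind of flexibility the abstract argues for.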
Furthermore, Lukas will present the workshop paper, “The Effect of Random Seeds for Data Splitting on Recommendation Accuracy”, co-authored with his colleagues at ISG Siegen, in the PERSPECTIVES 2023 Workshop. You can already watch a video teaser and read the full paper. Lukas will discuss the paper during the workshop on Tuesday, September 19. You can read the abstract below.
The Effect of Random Seeds for Data Splitting on Recommendation Accuracy

The evaluation of recommender system algorithms depends on randomness, e.g., when randomly splitting data into training and testing data. We suspect that failing to account for randomness in this scenario may lead to misrepresenting the predictive accuracy of recommendation algorithms. To understand the community’s view of the importance of randomness, we conducted a paper study on 39 full papers published at the ACM RecSys 2022 conference. We found that the authors of 26 papers used some variation of a holdout split that requires a random seed. However, only five papers explicitly repeated experiments and averaged their results over different random seeds. This potentially problematic research practice motivated us to analyze the effect of data split random seeds on recommendation accuracy. Therefore, we train three common algorithms on nine public data sets with 20 data split random seeds, evaluate them on two ranking metrics with three different ranking cutoff values 𝑘, and compare the results. In the extreme case with 𝑘 = 1, we show that depending on the data split random seed, the accuracy with traditional recommendation algorithms deviates by up to ∼6.3% from the mean accuracy achieved on the data set. Hence, we show that an algorithm may significantly over- or under-perform when maliciously or negligently selecting a random seed for splitting the data. To showcase a mitigation strategy and better research practice, we compare holdout to cross-validation and show that, again, for 𝑘 = 1, the accuracy of algorithms evaluated with cross-validation deviates only up to ∼2.3% from the mean accuracy achieved on the data set. Furthermore, we found that the deviation becomes smaller the higher the value of 𝑘 for both holdout and cross-validation.
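The seed effect the abstract measures is easy to reproduce in miniature. The sketch below (not the paper’s experimental code) evaluates a toy most-popular-item recommender on synthetic implicit-feedback data, once with holdout splits under 20 different seeds and once with 5-fold cross-validation, and reports how far each run drifts from its mean. The data generator, recommender, and hit-rate metric are all illustrative stand-ins.

```python
import numpy as np

# Synthetic implicit-feedback data: 200 users, 50 items,
# popularity-skewed item probabilities, 10 interactions per user.
rng_data = np.random.default_rng(0)
n_users, n_items = 200, 50
item_pop = rng_data.dirichlet(np.ones(n_items) * 0.3)
interactions = [
    (u, int(i))
    for u in range(n_users)
    for i in rng_data.choice(n_items, size=10, replace=False, p=item_pop)
]

def hit_rate_at_1(train, test):
    """Recommend the single most popular training item to everyone;
    count how many test users actually interacted with it."""
    counts = np.bincount([i for _, i in train], minlength=n_items)
    top_item = int(np.argmax(counts))
    test_by_user = {}
    for u, i in test:
        test_by_user.setdefault(u, set()).add(i)
    hits = sum(1 for items in test_by_user.values() if top_item in items)
    return hits / len(test_by_user)

def holdout_accuracy(seed, test_frac=0.2):
    """One 80/20 holdout split controlled by the random seed."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(interactions))
    cut = int(len(idx) * test_frac)
    test = [interactions[j] for j in idx[:cut]]
    train = [interactions[j] for j in idx[cut:]]
    return hit_rate_at_1(train, test)

def cv_accuracy(seed, folds=5):
    """5-fold cross-validation: average the metric over all folds."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(interactions)), folds)
    scores = []
    for f in range(folds):
        test = [interactions[j] for j in parts[f]]
        train = [interactions[j] for g, p in enumerate(parts) if g != f for j in p]
        scores.append(hit_rate_at_1(train, test))
    return float(np.mean(scores))

holdout = np.array([holdout_accuracy(s) for s in range(20)])
cv = np.array([cv_accuracy(s) for s in range(20)])
print(f"holdout:   max deviation from mean = {abs(holdout - holdout.mean()).max():.4f}")
print(f"5-fold CV: max deviation from mean = {abs(cv - cv.mean()).max():.4f}")
```

On data like this, averaging over folds tends to damp the seed-to-seed spread relative to a single holdout split, which is the mitigation strategy the paper advocates; the exact numbers depend entirely on the synthetic setup above.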
Tobias will present the demo paper, “Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit”, on Friday, September 22, in Poster Session 3. The paper is joint work with Michael D. Ekstrand and introduces LensKit-Auto, a fully automated recommender system toolkit based on the LensKit library. The paper was accepted for the Demo track and will be published in the Conference Proceedings after the live event. You can read the abstract below.
Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit

LensKit is one of the first and most popular Recommender System libraries. While LensKit offers a wide variety of features, it does not include any optimization strategies or guidelines on how to select and tune LensKit algorithms. LensKit developers have to manually include third-party libraries into their experimental setup or implement optimization strategies by hand to optimize hyperparameters. We found that 63.6% (21 out of 33) of papers using LensKit algorithms for their experiments did not select algorithms or tune hyperparameters. Non-optimized models represent poor baselines and produce less meaningful research results. This demo introduces LensKit-Auto. LensKit-Auto automates the entire Recommender System pipeline and enables LensKit developers to automatically select, optimize, and ensemble LensKit algorithms.
We are excited to discuss our work with the recommender systems community at the venue. Feel free to visit us at one of our posters to chat with us and discuss AutoRecSys and the evaluation of recommender systems.