Recently, we were delighted to learn that our paper “Revealing the Hidden Impact of Top-N Metrics on Optimization in Recommender Systems” was accepted to the “Full Paper” track at ECIR 2024. The paper, co-authored by members of our chair, the Intelligent Systems Group, will be presented by one of its authors at ECIR 2024 in Glasgow between March 25th and March 27th.

The 13-page paper is already available as a pre-print on arXiv. Its journey started in early 2023, when we noticed an interesting effect in related experiments that raised an unanswered research question: “Does the selection of items other than the top-n during the evaluation of recommender systems yield improved predictive accuracy for specific algorithms, domains, or data sets?” Our paper answers this question with a research study. You may read the abstract below.

The hyperparameters of recommender systems for top-n predictions are typically optimized to enhance the predictive performance of algorithms. The optimization algorithm, e.g., grid search or random search, searches for the best hyperparameter configuration according to an optimization-target metric, like nDCG or Precision. In contrast, the optimized algorithm internally minimizes a different loss function during training, like squared error or cross-entropy. To tackle this discrepancy, recent work has focused on generating loss functions better suited for recommender systems. Yet, when evaluating an algorithm using a top-n metric during optimization, another discrepancy between the optimization-target metric and the training loss has so far been ignored. During optimization, the top-n items are selected to compute a top-n metric, ignoring that these items come from the recommendations of a model trained with an entirely different loss function. Item recommendations suitable for optimization-target metrics could lie outside the top-n recommended items, thus having a hidden impact on the optimization performance. Therefore, we were motivated to analyze whether the top-n items are optimal for optimization-target top-n metrics. In pursuit of an answer, we exhaustively evaluate the predictive performance of 250 selection strategies other than selecting the top-n. We extensively evaluate each selection strategy over twelve implicit feedback and eight explicit feedback data sets with eleven recommender systems algorithms. Our results show that there exist selection strategies other than top-n that increase predictive performance for various algorithms and recommendation domains. However, the performance of the top ~43% of selection strategies is not significantly different. We discuss the impact of our findings on optimization and re-ranking in recommender systems and outline feasible solutions.

Revealing the Hidden Impact of Top-N Metrics on Optimization in Recommender Systems (Lukas Wegmeth, Tobias Vente, Lennart Purucker)
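To make the discrepancy concrete, here is a minimal, hypothetical sketch (not the paper’s code) of how a selection strategy other than top-n changes the value of an optimization-target metric such as nDCG. The names `ndcg_for_selection` and `skip_first` are illustrative assumptions; `skip_first` is just one stand-in for the kind of alternative selection strategies the paper evaluates.

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain of items in the given order."""
    rel = np.asarray(relevances, dtype=float)
    return float(np.sum((2.0 ** rel - 1.0) / np.log2(np.arange(2, rel.size + 2))))

def ndcg_for_selection(scores, relevance, selection, n):
    """nDCG@n when `selection` picks n positions from the score-ranked item list."""
    ranked = np.argsort(scores)[::-1]           # items ordered by predicted score
    picked = ranked[selection(n)]               # positions chosen by the strategy
    ideal = dcg(np.sort(relevance)[::-1][:n])   # best achievable DCG@n
    return dcg(relevance[picked]) / ideal if ideal > 0 else 0.0

top_n = lambda n: np.arange(n)               # the standard strategy: ranks 1..n
skip_first = lambda n: np.arange(1, n + 1)   # a hypothetical alternative: ranks 2..n+1

scores = np.array([0.9, 0.8, 0.7, 0.6])      # toy model predictions
relevance = np.array([3.0, 2.0, 1.0, 0.0])   # toy ground-truth relevance
print(ndcg_for_selection(scores, relevance, top_n, 2))       # 1.0 on this toy data
print(ndcg_for_selection(scores, relevance, skip_first, 2))  # lower on this toy data
```

In this toy example top-n wins, but the paper’s finding is that for certain algorithms and data sets an alternative selection can yield a higher metric value, which is invisible if optimization always scores the top-n items.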

We initially submitted the paper to the “Short Paper” track of the ACM RecSys 2023, where it received decent acclaim and rigorous feedback from multiple reviewers. Though it was rejected, we used the reviewers’ excellent feedback to extend the paper for submission to ECIR 2024. Again, we received rigorous feedback, which we eagerly incorporated since ECIR 2024 generously allows an additional page after acceptance.

We are eager to learn what the broader recommender systems community thinks about our results, especially since the paper provides evidence that researchers and engineers can use top-n recommendation metrics (nDCG, Precision, etc.) for optimization without worrying about the confounding effects of such metrics.

We also commend the excellent ECIR 2024 reviewers and chairs for accepting the paper even though it presents a negative result. We believe this is a decisive step toward improving future research and underscores the importance of peer reviewing and publishing such results.

