On this page, you will find our publications sorted by research area. Some publications are listed in multiple sections. To see a chronological list of our publications, please visit our publication page.
Recommender Systems
AutoRecSys (Automated Recommender Systems)
Vente, Tobias
Advancing Automation of Design Decisions in Recommender System Pipelines Proceedings Article
In: Proceedings of the 17th ACM Conference on Recommender Systems, pp. 1355-1360, 2023.
@inproceedings{Vente2023,
title = {Advancing Automation of Design Decisions in Recommender System Pipelines},
author = {Tobias Vente},
doi = {10.1145/3604915.3608886},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 17th ACM Conference on Recommender Systems},
pages = {1355-1360},
abstract = {Recommender systems have become essential in domains like streaming services, social media platforms, and e-commerce websites. However, the development of a recommender system involves a complex pipeline with preprocessing, data splitting, algorithm and model selection, and postprocessing stages. Every stage of the recommender systems pipeline requires design decisions that influence the performance of the recommender system. To ease design decisions, automated machine learning (AutoML) techniques have been adapted to the field of recommender systems, resulting in various AutoRecSys libraries. Nevertheless, these libraries limit flexibility in integrating automation techniques. In response, our research aims to enhance the usability of AutoML techniques for design decisions in recommender system pipelines. We focus on developing flexible and library-independent automation techniques for algorithm selection, model selection, and postprocessing steps. By enabling developers to make informed choices and ease the recommender system development process, we decrease the developer’s effort while improving the performance of the recommender systems. Moreover, we want to analyze the cost-to-benefit ratio of automation techniques in recommender systems, evaluating the computational overhead and the resulting improvements in predictive performance. Our objective is to leverage AutoML concepts to automate design decisions in recommender system pipelines, reduce manual effort, and enhance the overall performance and usability of recommender systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Wegmeth, Lukas
Improving Recommender Systems Through the Automation of Design Decisions Proceedings Article
In: Proceedings of the 17th ACM Conference on Recommender Systems, pp. 1332-1338, 2023.
@inproceedings{Wegmeth2023a,
title = {Improving Recommender Systems Through the Automation of Design Decisions},
author = {Lukas Wegmeth},
url = {https://dl.acm.org/doi/pdf/10.1145/3604915.3608877},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 17th ACM Conference on Recommender Systems},
pages = {1332-1338},
abstract = {Recommender systems developers are constantly faced with difficult design decisions. Additionally, the number of options that a recommender systems developer has to consider continually grows over time with new innovations. The machine learning community is in a similar situation and has come together to tackle the problem. They invented concepts and tools to make machine learning development both easier and faster. These developments are categorized as automated machine learning (AutoML). As a result, the AutoML community formed and continuously innovates new approaches. Inspired by AutoML, the recommender systems community has recently understood the need for automation and sparsely introduced AutoRecSys. The goal of AutoRecSys is not to replace recommender systems developers but to improve performance through the automation of design decisions. With AutoRecSys, recommender systems engineers do not have to focus on easy but time-consuming tasks and are free to pursue difficult engineering tasks instead. Additionally, AutoRecSys enables easier access to recommender systems for beginners as it reduces the amount of knowledge required to get started with the development of recommender systems. AutoRecSys, like AutoML, is still early in its development and does not yet cover the whole development pipeline. Additionally, it is not yet clear under which circumstances AutoML approaches can be transferred to recommender systems. Our research intends to close this gap by improving AutoRecSys both with regard to the transfer of AutoML and novel approaches. Furthermore, we focus specifically on the development of novel automation approaches for data processing and training. We note that the realization of AutoRecSys is going to be a community effort. Our part in this effort is to research AutoRecSys fundamentals, build practical tools for the community, raise awareness of the advantages of automation, and catalyze AutoRecSys development.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Vente, Tobias; Ekstrand, Michael; Beel, Joeran
Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit Proceedings Article
In: Proceedings of the 17th ACM Conference on Recommender Systems, pp. 1212-1216, 2023.
@inproceedings{Vente2023a,
title = {Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit},
author = {Tobias Vente and Michael Ekstrand and Joeran Beel},
url = {https://dl.acm.org/doi/10.1145/3604915.3610656},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 17th ACM Conference on Recommender Systems},
pages = {1212-1216},
abstract = {LensKit is one of the first and most popular Recommender System libraries. While LensKit offers a wide variety of features, it does not include any optimization strategies or guidelines on how to select and tune LensKit algorithms. LensKit developers have to manually include third-party libraries into their experimental setup or implement optimization strategies by hand to optimize hyperparameters. We found that 63.6% (21 out of 33) of papers using LensKit algorithms for their experiments did not select algorithms or tune hyperparameters. Non-optimized models represent poor baselines and produce less meaningful research results. This demo introduces LensKit-Auto. LensKit-Auto automates the entire Recommender System pipeline and enables LensKit developers to automatically select, optimize, and ensemble LensKit algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
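The abstract describes automatically selecting and tuning recommendation algorithms. As a rough illustration of that idea only, the Python sketch below runs a seeded random search over a joint space of algorithms and hyperparameters and keeps the configuration with the lowest validation error. It does not use LensKit-Auto's API or search strategy; the algorithm names, parameter ranges, and the `validation_error` function are hypothetical stand-ins for training and scoring real LensKit models.

```python
import random

# Hypothetical search space: each algorithm name maps to a sampler for its hyperparameters.
SEARCH_SPACE = {
    "item_knn": lambda rng: {"neighbors": rng.randint(5, 100)},
    "matrix_factorization": lambda rng: {
        "factors": rng.choice([16, 32, 64, 128]),
        "regularization": 10 ** rng.uniform(-4, -1),
    },
    "popularity": lambda rng: {},
}


def validation_error(algorithm, params):
    """Hypothetical stand-in for fitting on a training split and scoring on a validation split.

    Returns a deterministic toy error so the sketch runs without any recommender library.
    """
    if algorithm == "item_knn":
        return 0.90 + abs(params["neighbors"] - 40) / 1000
    if algorithm == "matrix_factorization":
        return 0.85 + abs(params["factors"] - 64) / 1000 + params["regularization"]
    return 1.00  # popularity baseline


def random_search(n_trials, seed=0):
    """Jointly sample an algorithm and its hyperparameters; keep the lowest-error configuration."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        algorithm = rng.choice(sorted(SEARCH_SPACE))
        params = SEARCH_SPACE[algorithm](rng)
        error = validation_error(algorithm, params)
        if best is None or error < best[0]:
            best = (error, algorithm, params)
    return best


error, algorithm, params = random_search(n_trials=50)
print(algorithm, params, round(error, 4))
```

Random search is used here only because it is short; an AutoRecSys tool could swap in a model-based optimizer and additionally ensemble the best configurations it finds.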
Wegmeth, Lukas; Vente, Tobias; Beel, Joeran
The Challenges of Algorithm Selection and Hyperparameter Optimization for Recommender Systems Journal Article
In: COSEAL Workshop 2023, 2023.
@article{Wegmeth2023b,
title = {The Challenges of Algorithm Selection and Hyperparameter Optimization for Recommender Systems},
author = {Lukas Wegmeth and Tobias Vente and Joeran Beel},
url = {http://dx.doi.org/10.13140/RG.2.2.24089.19049},
year = {2023},
date = {2023-01-01},
journal = {COSEAL Workshop 2023},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Wegmeth, Lukas; Beel, Joeran
CaMeLS: Cooperative Meta-Learning Service for Recommender Systems Proceedings Article
In: Proceedings of the 2nd Perspectives on the Evaluation of Recommender Systems Workshop, pp. 10–18, 2022.
@inproceedings{Wegmeth2022,
title = {CaMeLS: Cooperative Meta-Learning Service for Recommender Systems},
author = {Lukas Wegmeth and Joeran Beel},
url = {https://ceur-ws.org/Vol-3228/paper2.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the 2nd Perspectives on the Evaluation of Recommender Systems Workshop},
pages = {10–18},
abstract = {We present CaMeLS, a proof of concept of a cooperative meta-learning service for recommender systems. CaMeLS leverages the computing power of recommender systems users by uploading their metadata and algorithm evaluation scores to a centralized environment. Through the resulting database, CaMeLS then offers meta-learning services for everyone. Additionally, users may access evaluations of common data sets immediately to know the best-performing algorithms for those data sets. The metadata table may also be used for other purposes, e.g., to perform benchmarks. In the initial version discussed in this paper, CaMeLS implements automatic algorithm selection through meta-learning over two recommender systems libraries. Automatic algorithm selection saves users time and computing power and does not require expertise, as the best algorithm is automatically found over multiple libraries. The CaMeLS database contains 20 metadata sets by default. We show that the automatic algorithm selection service is already on par with the single best algorithm in this default scenario. CaMeLS only requires a few seconds to predict a suitable algorithm, rather than potentially hours or days if performed manually, depending on the data set. The code is publicly available on our GitHub: https://camels.recommender-systems.com.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
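In CaMeLS, meta-learning maps the metadata of a new data set to an algorithm recommendation learned from previously uploaded evaluations. A minimal sketch of that idea, assuming a 1-nearest-neighbour meta-learner over three log-scaled metafeatures (number of users, number of items, density), is shown below. The metafeatures, algorithm names, and scores are invented toy values, and the meta-learner and metadata schema actually used by CaMeLS may differ.

```python
import math

# Hypothetical metadata table: data set metafeatures plus per-algorithm evaluation scores
# (higher is better). All names and numbers are invented toy values.
META_TABLE = [
    {"features": {"users": 1_000, "items": 1_700, "density": 0.063},
     "scores": {"item_knn": 0.31, "als": 0.28, "popularity": 0.20}},
    {"features": {"users": 140_000, "items": 27_000, "density": 0.005},
     "scores": {"item_knn": 0.22, "als": 0.27, "popularity": 0.15}},
    {"features": {"users": 6_000, "items": 3_700, "density": 0.045},
     "scores": {"item_knn": 0.29, "als": 0.30, "popularity": 0.18}},
]


def to_vector(features):
    """Log-scale the metafeatures so large counts do not dominate the distance."""
    return [math.log10(features["users"]),
            math.log10(features["items"]),
            math.log10(features["density"])]


def recommend_algorithm(new_features, meta_table=META_TABLE):
    """1-nearest-neighbour meta-learner: reuse the best algorithm of the most similar data set."""
    query = to_vector(new_features)
    nearest = min(meta_table, key=lambda row: math.dist(query, to_vector(row["features"])))
    return max(nearest["scores"], key=nearest["scores"].get)


print(recommend_algorithm({"users": 2_000, "items": 2_000, "density": 0.05}))
```

Every newly uploaded evaluation becomes another row of the table, which is what makes the cooperative setting attractive: the selector improves as more users contribute metadata.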
Wegmeth, Lukas; Beel, Joeran
Cooperative Meta-Learning Service for Recommender Systems Journal Article
In: COSEAL Workshop 2022, 2022.
@article{Wegmeth2022a,
title = {Cooperative Meta-Learning Service for Recommender Systems},
author = {Lukas Wegmeth and Joeran Beel},
url = {http://dx.doi.org/10.13140/RG.2.2.10667.41768},
year = {2022},
date = {2022-01-01},
journal = {COSEAL Workshop 2022},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Vente, Tobias; Purucker, Lennart; Beel, Joeran
The Feasibility of Greedy Ensemble Selection for Automated Recommender Systems Journal Article
In: COSEAL Workshop 2022, 2022.
@article{Vente2022,
title = {The Feasibility of Greedy Ensemble Selection for Automated Recommender Systems},
author = {Tobias Vente and Lennart Purucker and Joeran Beel},
url = {http://dx.doi.org/10.13140/RG.2.2.16277.29921},
year = {2022},
date = {2022-01-01},
journal = {COSEAL Workshop 2022},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Gupta, Srijan; Beel, Joeran
Auto-CaseRec: Automatically Selecting and Optimizing Recommendation-Systems Algorithms Journal Article
In: OSF Preprints, DOI: 10.31219/osf.io/4znmd, 2020.
@article{Gupta2020,
title = {Auto-CaseRec: Automatically Selecting and Optimizing Recommendation-Systems Algorithms},
author = {Srijan Gupta and Joeran Beel},
doi = {10.31219/osf.io/4znmd},
year = {2020},
date = {2020-01-01},
journal = {OSF Preprints},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Anand, Rohan; Beel, Joeran
Auto-Surprise: An Automated Recommender-System (AutoRecSys) Library with Tree of Parzens Estimator (TPE) Optimization Proceedings Article
In: 14th ACM Conference on Recommender Systems (RecSys), pp. 1–4, 2020.
@inproceedings{Anand2020,
title = {Auto-Surprise: An Automated Recommender-System (AutoRecSys) Library with Tree of Parzens Estimator (TPE) Optimization},
author = {Rohan Anand and Joeran Beel},
url = {https://arxiv.org/abs/2008.13532},
year = {2020},
date = {2020-01-01},
booktitle = {14th ACM Conference on Recommender Systems (RecSys)},
pages = {1–4},
abstract = {We introduce Auto-Surprise, an Automated Recommender System library. Auto-Surprise is an extension of the Surprise recommender system library and eases the algorithm selection and configuration process. Compared to the out-of-the-box Surprise library, Auto-Surprise performs better when evaluated with the MovieLens, Book-Crossing, and Jester datasets. It may also result in the selection of an algorithm with significantly lower runtime. Compared to Surprise's grid search, Auto-Surprise performs equally well or slightly better in terms of RMSE, and is notably faster in finding the optimum hyperparameters.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Arambakam, Mukesh; Beel, Joeran
Federated Meta-Learning: Democratizing Algorithm Selection Across Disciplines and Software Libraries Proceedings Article
In: 7th ICML Workshop on Automated Machine Learning, pp. 1–8, 2020.
@inproceedings{Arambakam2020,
title = {Federated Meta-Learning: Democratizing Algorithm Selection Across Disciplines and Software Libraries},
author = {Mukesh Arambakam and Joeran Beel},
url = {https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_39.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {7th ICML Workshop on Automated Machine Learning},
pages = {1–8},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collins, Andrew; Tierney, Laura; Beel, Joeran
Per-Instance Algorithm Selection for Recommender Systems via Instance Clustering Journal Article
In: arXiv, no. 2012.15151, 2020.
@article{Collins2020,
title = {Per-Instance Algorithm Selection for Recommender Systems via Instance Clustering},
author = {Andrew Collins and Laura Tierney and Joeran Beel},
url = {https://arxiv.org/abs/2012.15151},
year = {2020},
date = {2020-01-01},
journal = {arXiv},
number = {2012.15151},
abstract = {Recommendation algorithms perform differently if the users, recommendation contexts, applications, and user interfaces vary even slightly. It is similarly observed in other fields, such as combinatorial problem solving, that algorithms perform differently for each instance presented. In those fields, meta-learning is successfully used to predict an optimal algorithm for each instance, to improve overall system performance. Per-instance algorithm selection has thus far been unsuccessful for recommender systems. In this paper we propose a per-instance meta-learner that clusters data instances and predicts the best algorithm for unseen instances according to cluster membership. We test our approach using 10 collaborative- and 4 content-based filtering algorithms, for varying clustering parameters, and find a significant improvement over the best performing base algorithm at alpha=0.053 (MAE: 0.7107 vs LightGBM 0.7214; t-test). We also explore the performances of our base algorithms on a ratings dataset and empirically show that the error of a perfect algorithm selector monotonically decreases for larger pools of algorithms. To the best of our knowledge, this is the first effective meta-learning technique for per-instance algorithm selection in recommender systems.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
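The clustering idea in the abstract can be sketched as follows. This is a minimal illustration, assuming plain k-means (with deterministic farthest-first initialisation) on per-instance feature vectors and a per-cluster choice of the algorithm with the lowest mean absolute error. The paper's actual features, clustering parameters, and base algorithms are not reproduced here, and the toy data are invented. The last lines also compute the error of a perfect (oracle) selector and of the single best algorithm for comparison.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means with farthest-first initialisation; returns the centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [[sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


def fit_selector(features, errors, k):
    """features: one vector per training instance.
    errors: one dict per instance mapping algorithm name -> absolute error on that instance.
    Returns the centroids and the lowest-mean-error algorithm for each cluster."""
    centroids = kmeans(features, k)
    per_cluster = [[] for _ in range(k)]
    for x, err in zip(features, errors):
        per_cluster[nearest(x, centroids)].append(err)
    algorithms = sorted(errors[0])
    choice = []
    for members in per_cluster:
        members = members or errors  # empty cluster: fall back to all instances
        choice.append(min(algorithms, key=lambda a: sum(m[a] for m in members) / len(members)))
    return centroids, choice


def select(x, centroids, choice):
    """Predict the algorithm for an unseen instance from its cluster membership."""
    return choice[nearest(x, centroids)]


# Invented toy data: two groups of instances, each favouring a different algorithm.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
errors = [{"knn": 0.5, "svd": 0.9}] * 3 + [{"knn": 0.9, "svd": 0.4}] * 3

centroids, choice = fit_selector(features, errors, k=2)
selector_mae = sum(e[select(x, centroids, choice)] for x, e in zip(features, errors)) / len(errors)
# Perfect (oracle) selector: the lowest-error algorithm for every single instance.
oracle_mae = sum(min(e.values()) for e in errors) / len(errors)
# Single best algorithm: one algorithm used for all instances.
single_best_mae = min(sum(e[a] for e in errors) / len(errors) for a in errors[0])
print(selector_mae, oracle_mae, single_best_mae)
```

Because the oracle takes a minimum over the algorithm pool for each instance, adding algorithms to the pool can never increase its error, which matches the trend the abstract reports for larger pools.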
Collins, Andrew; Beel, Joeran
A First Analysis of Meta-Learned Per-Instance Algorithm Selection in Scholarly Recommender Systems Proceedings Article
In: Workshop on Recommendation in Complex Scenarios, 13th ACM Conference on Recommender Systems (RecSys), 2019.
@inproceedings{Collins2019a,
title = {A First Analysis of Meta-Learned Per-Instance Algorithm Selection in Scholarly Recommender Systems},
author = {Andrew Collins and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Workshop on Recommendation in Complex Scenarios, 13th ACM Conference on Recommender Systems (RecSys)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Kotthoff, Lars
Preface: The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR) Proceedings Article
In: Proceedings of The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR), pp. 1–9, CEUR-WS, 2019.
@inproceedings{Beel2019a,
title = {Preface: The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
author = {Joeran Beel and Lars Kotthoff},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
volume = {2431},
pages = {1–9},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Kotthoff, Lars
Proposal for the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR) Proceedings Article
In: Azzopardi, L.; Stein, B.; Fuhr, N.; Mayr, P.; Hauff, C.; Hiemstra, D. (Ed.): Proceedings of the 41st European Conference on Information Retrieval (ECIR), pp. 383–388, Springer, 2019.
@inproceedings{Beel2019c,
title = {Proposal for the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
author = {Joeran Beel and Lars Kotthoff},
editor = {L. Azzopardi and B. Stein and N. Fuhr and P. Mayr and C. Hauff and D. Hiemstra},
doi = {10.1007/978-3-030-15719-7_53},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 41st European Conference on Information Retrieval (ECIR)},
volume = {11438},
pages = {383–388},
publisher = {Springer},
series = {Lecture Notes in Computer Science (LNCS)},
abstract = {The algorithm selection problem describes the challenge of identifying
the best algorithm for a given problem space. In many domains, particularly
artificial intelligence, the algorithm selection problem is well
studied, and various approaches and tools exist to tackle it in practice.
Especially through meta-learning, impressive performance improvements
have been achieved. The information retrieval (IR) community, however,
has paid little attention to the algorithm selection problem, although
the problem is highly relevant in information retrieval. This workshop
will bring together researchers from the fields of algorithm selection
and meta-learning as well as information retrieval. We aim to raise
the awareness in the IR community of the algorithm selection problem;
identify the potential for automatic algorithm selection in information
retrieval; and explore possible solutions for this context. In particular,
we will explore to what extent existing solutions to the algorithm
selection problem from other domains can be applied in information
retrieval, and also how techniques from IR can be used for automated
algorithm selection and meta-learning.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
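For readers new to the topic, the algorithm selection problem mentioned in the abstract is commonly formalized following Rice's classic formulation; the notation below is a standard textbook sketch, not taken from the workshop proposal. Assume a pool of algorithms $\mathcal{A}$, a set of problem instances $\mathcal{X}$, instance features $f(x) \in \mathcal{F}$, and a performance measure $m(a, x)$ where lower is better:

```latex
\begin{align}
  \text{goal:}\quad & \min_{s:\,\mathcal{F}\to\mathcal{A}} \; \sum_{x \in \mathcal{X}} m\big(s(f(x)),\, x\big) \\
  \text{single best algorithm:}\quad & m_{\text{SBS}} = \min_{a \in \mathcal{A}} \; \sum_{x \in \mathcal{X}} m(a, x) \\
  \text{oracle (perfect selector):}\quad & m_{\text{oracle}} = \sum_{x \in \mathcal{X}} \min_{a \in \mathcal{A}} m(a, x) \;\le\; m_{\text{SBS}}
\end{align}
```

A learned selector $s$ is worthwhile when its total error falls below $m_{\text{SBS}}$; the gap down to $m_{\text{oracle}}$ is the remaining headroom that meta-learning can try to close.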
Collins, Andrew; Tkaczyk, Dominika; Beel, Joeran
A Novel Approach to Recommendation Algorithm Selection using Meta-Learning Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 210–219, CEUR-WS, 2018.
@inproceedings{Collins2018a,
title = {A Novel Approach to Recommendation Algorithm Selection using Meta-Learning},
author = {Andrew Collins and Dominika Tkaczyk and Joeran Beel},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
pages = {210–219},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tkaczyk, Dominika; Sheridan, Paraic; Beel, Joeran
ParsRec: A Meta-Learning Recommender System for Bibliographic Reference Parsing Tools Proceedings Article
In: Proceedings of the 12th ACM Conference on Recommender Systems (RecSys), pp. 387–388, ACM, Fort Worth, Texas, USA, 2018.
@inproceedings{Tkaczyk2018,
title = {ParsRec: A Meta-Learning Recommender System for Bibliographic Reference Parsing Tools},
author = {Dominika Tkaczyk and Paraic Sheridan and Joeran Beel},
url = {http://doi.acm.org/10.1145/3197026.3203907},
doi = {10.1145/3197026.3203907},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 12th ACM Conference on Recommender Systems (RecSys)},
pages = {387–388},
publisher = {ACM},
address = {Fort Worth, Texas, USA},
series = {JCDL '18},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tkaczyk, Dominika; Gupta, Rohit; Cinti, Riccardo; Beel, Joeran
ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 162–173, CEUR-WS, 2018.
@inproceedings{Tkaczyk2018b,
title = {ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers},
author = {Dominika Tkaczyk and Rohit Gupta and Riccardo Cinti and Joeran Beel},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
number = {1},
pages = {162–173},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran
A Macro/Micro Recommender System for Recommendation Algorithms [Proposal] Journal Article
In: ResearchGate, https://www.researchgate.net/publication/322138236_A_MacroMicro_Recommender_System_for_Recommendation_Algorithms_Proposal, 2017.
@article{Beel2017b,
title = {A Macro/Micro Recommender System for Recommendation Algorithms [Proposal]},
author = {Joeran Beel},
doi = {10.13140/RG.2.2.14978.79047},
year = {2017},
date = {2017-01-01},
journal = {ResearchGate},
url = {https://www.researchgate.net/publication/322138236_A_MacroMicro_Recommender_System_for_Recommendation_Algorithms_Proposal},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Evaluation, Reproducibility, and RecSys Pipelines
Wegmeth, Lukas; Vente, Tobias; Purucker, Lennart; Beel, Joeran
The Effect of Random Seeds for Data Splitting on Recommendation Accuracy Proceedings Article
In: Proceedings of the 3rd Perspectives on the Evaluation of Recommender Systems Workshop, 2023.
@inproceedings{Wegmeth2023,
title = {The Effect of Random Seeds for Data Splitting on Recommendation Accuracy},
author = {Lukas Wegmeth and Tobias Vente and Lennart Purucker and Joeran Beel},
url = {https://ceur-ws.org/Vol-3476/paper4.pdf},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 3rd Perspectives on the Evaluation of Recommender Systems Workshop},
abstract = {The evaluation of recommender system algorithms depends on randomness, e.g., during randomly splitting data into training and testing data. We suspect that failing to account for randomness in this scenario may lead to misrepresenting the predictive accuracy of recommendation algorithms. To understand the community’s view of the importance of randomness, we conducted a paper study on 39 full papers published at the ACM RecSys 2022 conference. We found that the authors of 26 papers used some variation of a holdout split that requires a random seed. However, only five papers explicitly repeated experiments and averaged their results over different random seeds. This potentially problematic research practice motivated us to analyze the effect of data split random seeds on recommendation accuracy. Therefore, we train three common algorithms on nine public data sets with 20 data split random seeds, evaluate them on two ranking metrics with three different ranking cutoff values k, and compare the results. In the extreme case with k = 1, we show that depending on the data split random seed, the accuracy with traditional recommendation algorithms deviates by up to ∼6.3% from the mean accuracy achieved on the data set. Hence, we show that an algorithm may significantly over- or under-perform when maliciously or negligently selecting a random seed for splitting the data. To showcase a mitigation strategy and better research practice, we compare holdout to cross-validation and show that, again, for k = 1, the accuracy of algorithms evaluated with cross-validation deviates only up to ∼2.3% from the mean accuracy achieved on the data set. Furthermore, we found that the deviation becomes smaller the higher the value of k for both holdout and cross-validation.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
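The seed-sensitivity argument in the abstract above can be illustrated with a minimal Python sketch. The helper names `holdout_split`, `kfold_splits`, and `max_relative_deviation` are hypothetical and are not taken from the paper's code. The sketch shows three things: a holdout split is fully determined by its random seed, k-fold cross-validation lets every record serve as test data exactly once, and a "deviation from the mean accuracy" figure such as ∼6.3% can be computed from per-seed scores.

```python
import random
import statistics
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(data: Sequence[T], test_fraction: float, seed: int) -> Tuple[List[T], List[T]]:
    """Randomly split data into (train, test); the result depends only on `seed`."""
    rng = random.Random(seed)  # isolated RNG, so the split is reproducible
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def kfold_splits(data: Sequence[T], n_folds: int, seed: int) -> List[Tuple[List[T], List[T]]]:
    """Shuffle once, then let each fold serve as the test set exactly once."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, test in enumerate(folds):
        # Training data is the union of all other folds.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


def max_relative_deviation(scores: Sequence[float]) -> float:
    """Largest |score - mean| / mean: how far a single seed can drift from the mean."""
    mean = statistics.mean(scores)
    return max(abs(s - mean) for s in scores) / mean
```

In use, one would call a (hypothetical) `evaluate(train, test)` function once per seed, e.g. `scores = [evaluate(*holdout_split(interactions, 0.2, seed)) for seed in range(20)]`, and report `statistics.mean(scores)` together with `max_relative_deviation(scores)` instead of the score of a single seed.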
Wegmeth, Lukas
The Impact of Feature Quantity on Recommendation Algorithm Performance: A Movielens-100K Case Study Proceedings Article
In: arXiv:2207.08713, 2022.
@inproceedings{Wegmeth2022b,
title = {The Impact of Feature Quantity on Recommendation Algorithm Performance: A Movielens-100K Case Study},
author = {Lukas Wegmeth},
url = {https://arxiv.org/pdf/2207.08713.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {arXiv:2207.08713},
abstract = {Recent model-based Recommender Systems (RecSys) algorithms emphasize the use of features, also called side information, in their design, similar to algorithms in Machine Learning (ML). In contrast, some of the most popular and traditional algorithms for RecSys solely focus on a given user-item-rating relation without including side information. An important category of these is matrix factorization-based algorithms, e.g., Singular Value Decomposition and Alternating Least Squares, which are known to have high performance on RecSys data sets. The goal of this case study is to provide a performance comparison and assessment of RecSys and ML algorithms when side information is included. We chose the Movielens-100K data set since it is a standard for comparing RecSys algorithms. We compared six different feature sets with varying quantities of features, which were generated from the baseline data and evaluated on a total of 19 RecSys algorithms, baseline ML algorithms, Automated Machine Learning (AutoML) pipelines, and state-of-the-art RecSys algorithms that incorporate side information. The results show that additional features benefit all algorithms we evaluated. However, the correlation between feature quantity and performance is not monotonic for AutoML and RecSys. In these categories, an analysis of feature importance revealed that the quality of features matters more than quantity. Throughout our experiments, the average performance on the feature set with the lowest number of features is ∼6% worse than on the feature set with the highest number of features, in terms of Root Mean Squared Error. An interesting observation is that AutoML outperforms matrix factorization-based RecSys algorithms when additional features are used. Almost all algorithms that can include side information have higher performance when using the highest quantity of features. In the other cases, the performance difference is negligible (<1%). The results show a clear positive trend for the effect of feature quantity as well as the important effects of feature quality on the evaluated algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
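The ∼6% figure in the abstract above is a relative difference in Root Mean Squared Error (RMSE). A minimal sketch of both quantities follows; the function names `rmse` and `relative_increase` are illustrative, and the paper's exact averaging over algorithms and feature sets is not reproduced here.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error between true and predicted ratings."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def relative_increase(worse: float, better: float) -> float:
    """Relative growth of one error over another, e.g. 0.06 means 6% worse."""
    return (worse - better) / better
```

Because RMSE is an error, a larger value is worse; comparing the mean RMSE on the smallest feature set with that on the largest via `relative_increase` yields a percentage of the kind reported above.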
Scheidt, Teresa; Beel, Joeran
Time-dependent Evaluation of Recommender Systems Proceedings Article
In: Perspectives on the Evaluation of Recommender Systems Workshop, ACM RecSys Conference, 2021.
@inproceedings{Scheidt2021,
title = {Time-dependent Evaluation of Recommender Systems},
author = {Teresa Scheidt and Joeran Beel},
url = {https://ceur-ws.org/Vol-2955/paper10.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Perspectives on the Evaluation of Recommender Systems Workshop, ACM RecSys Conference},
abstract = {Evaluation of recommender systems is an actively discussed topic in the recommender system community. However, some aspects of evaluation have received little to no attention, one of them being whether evaluating recommender system algorithms with single-number metrics is sufficient. When presenting results as a single number, the only possible assumption is a stable performance over time regardless of changes in the datasets, while it intuitively seems more likely that the performance changes over time. We suggest presenting results over time, making it possible to identify trends and changes in performance as the dataset grows and changes. In this paper, we conduct an analysis of 6 algorithms on 10 datasets over time to identify the need for a time-dependent evaluation. To enable this evaluation over time, we split the datasets based on the provided timesteps into smaller subsets. At every tested timepoint, we use all available data up to this timepoint, simulating a growing dataset as encountered in the real world. Our results show that for 90% of the datasets the performance changes over time and in 60% even the ranking of algorithms changes over time.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
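The evaluation protocol in the abstract above (at every timepoint, use all data up to that point) amounts to an expanding-window split. The sketch below assumes interactions stored as `(user, item, rating, timestamp)` tuples and evenly spaced cutoffs between the first and last timestamp; the paper instead splits on the timesteps provided by each dataset, so the cutoff rule and the function name `expanding_window_subsets` are assumptions.

```python
from typing import Iterator, List, Sequence, Tuple

# Assumed record layout: (user, item, rating, timestamp).
Interaction = Tuple[str, str, float, int]


def expanding_window_subsets(
    interactions: Sequence[Interaction], n_timepoints: int
) -> Iterator[Tuple[int, List[Interaction]]]:
    """Yield (cutoff, all interactions with timestamp <= cutoff) for each timepoint.

    Cutoffs are spaced evenly between the first and last timestamp, so every
    subset contains the previous one, simulating a dataset that grows over time.
    """
    if not interactions or n_timepoints < 1:
        return
    ordered = sorted(interactions, key=lambda record: record[3])
    t_min, t_max = ordered[0][3], ordered[-1][3]
    for step in range(1, n_timepoints + 1):
        cutoff = t_min + (t_max - t_min) * step // n_timepoints
        yield cutoff, [record for record in ordered if record[3] <= cutoff]
```

Training and evaluating each algorithm on every yielded subset gives one score per timepoint, which can be plotted as a curve rather than collapsed into a single number.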
Beel, Joeran; Brunel, Victor
Data Pruning in Recommender Systems Research: Best-Practice or Malpractice? Proceedings Article
In: 13th ACM Conference on Recommender Systems (RecSys), pp. 26–30, CEUR-WS, 2019.
@inproceedings{Beel2019d,
title = {Data Pruning in Recommender Systems Research: Best-Practice or Malpractice?},
author = {Joeran Beel and Victor Brunel},
year = {2019},
date = {2019-01-01},
booktitle = {13th ACM Conference on Recommender Systems (RecSys)},
volume = {2431},
pages = {26–30},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Breitinger, Corinna; Langer, Stefan; Lommatzsch, Andreas; Gipp, Bela
Towards Reproducibility in Recommender-Systems Research Journal Article
In: User Modeling and User-Adapted Interaction (UMUAI), vol. 26, no. 1, pp. 69-101, 2016.
@article{Beel2016,
title = {Towards Reproducibility in Recommender-Systems Research},
author = {Joeran Beel and Corinna Breitinger and Stefan Langer and Andreas Lommatzsch and Bela Gipp},
doi = {10.1007/s11257-016-9174-x},
year = {2016},
date = {2016-01-01},
journal = {User Modeling and User-Adapted Interaction (UMUAI)},
volume = {26},
number = {1},
pages = {69-101},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Langer, Stefan
A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems Proceedings Article
In: Kapidakis, Sarantos; Mazurek, Cezary; Werla, Marcin (Ed.): Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries (TPDL), pp. 153-168, 2015.
@inproceedings{Beel2015a,
title = {A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems},
author = {Joeran Beel and Stefan Langer},
editor = {Sarantos Kapidakis and Cezary Mazurek and Marcin Werla},
doi = {10.1007/978-3-319-24592-8_12},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries (TPDL)},
volume = {9316},
pages = {153-168},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Langer, Stefan; Beel, Joeran
The Comparability of Recommender System Evaluations and Characteristics of Docear's Users Proceedings Article
In: Proceedings of the Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD) at the 2014 ACM Conference Series on Recommender Systems (RecSys), pp. 1–6, CEUR-WS, 2014.
@inproceedings{Langer2014,
title = {The Comparability of Recommender System Evaluations and Characteristics of Docear's Users},
author = {Stefan Langer and Joeran Beel},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD) at the 2014 ACM Conference Series on Recommender Systems (RecSys)},
pages = {1–6},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Nürnberger, Andreas; Genzmehr, Marcel
The Impact of Demographics (Age and Gender) and Other User Characteristics on Evaluating Recommender Systems Proceedings Article
In: Aalberg, Trond; Dobreva, Milena; Papatheodorou, Christos; Tsakonas, Giannis; Farrugia, Charles (Ed.): Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 400–404, Springer, Valletta, Malta, 2013.
@inproceedings{Beel2013f,
title = {The Impact of Demographics (Age and Gender) and Other User Characteristics on Evaluating Recommender Systems},
author = {Joeran Beel and Stefan Langer and Andreas Nürnberger and Marcel Genzmehr},
editor = {Trond Aalberg and Milena Dobreva and Christos Papatheodorou and Giannis Tsakonas and Charles Farrugia},
year = {2013},
date = {2013-09-01},
booktitle = {Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013)},
pages = {400–404},
publisher = {Springer},
address = {Valletta, Malta},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Gipp, Bela; Nürnberger, Andreas
A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation Proceedings Article
In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys), pp. 7-14, 2013.
@inproceedings{Beel2013d,
title = {A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Bela Gipp and Andreas Nürnberger},
doi = {10.1145/2532508.2532511},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys)},
pages = {7-14},
series = {ACM International Conference Proceedings Series (ICPS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran
Evaluations in Information Retrieval: Click Through Rate (CTR) vs. Mean Absolute Error (MAE) vs. (Root) Mean Squared Error (MSE / RMSE) vs. Precision electronic
2012.
@electronic{Beel2012,
title = {Evaluations in Information Retrieval: Click Through Rate (CTR) vs. Mean Absolute Error (MAE) vs. (Root) Mean Squared Error (MSE / RMSE) vs. Precision},
author = {Joeran Beel},
url = {http://www.docear.org/2012/09/21/evaluations-in-information-retrieval-click-through-rate-ctr-vs-mean-absolute-error-mae-vs-root-mean-squared-error-mse-rmse-vs-precision/},
year = {2012},
date = {2012-09-01},
organization = {Docear},
howpublished = {Blog},
keywords = {},
pubstate = {published},
tppubtype = {electronic}
}
Algorithms for Recommender Systems, User Modelling, and Term Weighting
Wegmeth, Lukas; Vente, Tobias; Purucker, Lennart; Beel, Joeran
The Effect of Random Seeds for Data Splitting on Recommendation Accuracy Proceedings Article
In: Proceedings of the 3rd Perspectives on the Evaluation of Recommender Systems Workshop, 2023.
@inproceedings{Wegmeth2023,
title = {The Effect of Random Seeds for Data Splitting on Recommendation Accuracy},
author = {Lukas Wegmeth and Tobias Vente and Lennart Purucker and Joeran Beel},
url = {https://ceur-ws.org/Vol-3476/paper4.pdf},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 3rd Perspectives on the Evaluation of Recommender Systems Workshop},
abstract = {The evaluation of recommender system algorithms depends on randomness, e.g., during randomly splitting data into training and testing data. We suspect that failing to account for randomness in this scenario may lead to misrepresenting the predictive accuracy of recommendation algorithms. To understand the community’s view of the importance of randomness, we conducted a paper study on 39 full papers published at the ACM RecSys 2022 conference. We found that the authors of 26 papers used some variation of a holdout split that requires a random seed. However, only five papers explicitly repeated experiments and averaged their results over different random seeds. This potentially problematic research practice motivated us to analyze the effect of data split random seeds on recommendation accuracy. Therefore, we train three common algorithms on nine public data sets with 20 data split random seeds, evaluate them on two ranking metrics with three different ranking cutoff values k, and compare the results. In the extreme case with k = 1, we show that depending on the data split random seed, the accuracy with traditional recommendation algorithms deviates by up to ∼6.3% from the mean accuracy achieved on the data set. Hence, we show that an algorithm may significantly over- or under-perform when maliciously or negligently selecting a random seed for splitting the data. To showcase a mitigation strategy and better research practice, we compare holdout to cross-validation and show that, again, for k = 1, the accuracy of algorithms evaluated with cross-validation deviates only up to ∼2.3% from the mean accuracy achieved on the data set. Furthermore, we found that the deviation becomes smaller the higher the value of k for both holdout and cross-validation.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Scheidt, Teresa; Beel, Joeran
Time-dependent Evaluation of Recommender Systems Proceedings Article
In: Perspectives on the Evaluation of Recommender Systems Workshop, ACM RecSys Conference, 2021.
@inproceedings{Scheidt2021,
title = {Time-dependent Evaluation of Recommender Systems},
author = {Teresa Scheidt and Joeran Beel},
url = {https://ceur-ws.org/Vol-2955/paper10.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Perspectives on the Evaluation of Recommender Systems Workshop, ACM RecSys Conference},
abstract = {Evaluation of recommender systems is an actively discussed topic in the recommender system community. However, some aspects of evaluation have received little to no attention, one of them being whether evaluating recommender system algorithms with single-number metrics is sufficient. When presenting results as a single number, the only possible assumption is a stable performance over time regardless of changes in the datasets, while it intuitively seems more likely that the performance changes over time. We suggest presenting results over time, making it possible to identify trends and changes in performance as the dataset grows and changes. In this paper, we conduct an analysis of 6 algorithms on 10 datasets over time to identify the need for a time-dependent evaluation. To enable this evaluation over time, we split the datasets based on the provided timesteps into smaller subsets. At every tested timepoint, we use all available data up to this timepoint, simulating a growing dataset as encountered in the real world. Our results show that for 90% of the datasets the performance changes over time and in 60% even the ranking of algorithms changes over time.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Marwah, Divyanshu; Beel, Joeran
Term-Recency for TF-IDF, BM25 and USE Term Weighting Proceedings Article
In: Proceedings of the 8th International Workshop on Mining Scientific Publications, pp. 36–41, Association for Computational Linguistics, Wuhan, China, 2020.
@inproceedings{Marwah2020,
title = {Term-Recency for TF-IDF, BM25 and USE Term Weighting},
author = {Divyanshu Marwah and Joeran Beel},
url = {https://aclanthology.org/2020.wosp-1.5.pdf
https://www.aclweb.org/anthology/2020.wosp-1.5},
year = {2020},
date = {2020-08-01},
booktitle = {Proceedings of the 8th International Workshop on Mining Scientific Publications},
pages = {36–41},
publisher = {Association for Computational Linguistics},
address = {Wuhan, China},
abstract = {Effectiveness of a recommendation in an Information Retrieval (IR) system is determined by the relevance scores of retrieved results. Term weighting is responsible for computing the relevance scores and consequently differentiating between the terms in a document. However, current term-weighting formulas (TF-IDF, for instance) weigh terms only based on term frequency and inverse document frequency, irrespective of other important factors. This results in ambiguity when the TF and IDF values are the same for more than one document, hence resulting in identical TF-IDF values. In this paper, we propose a modification of TF-IDF and other term-weighting schemes that weighs the terms based on their recency and usage in the corpus. We have tested the performance of our algorithm against existing term-weighting schemes: TF-IDF, BM25, and the USE text embedding model. We have indexed three datasets from different domains to validate the premises of our algorithm. On evaluating the algorithms using Precision, Recall, F1 score, and NDCG, we found that time-normalized TF-IDF outperformed the classic TF-IDF with a significant difference in all metrics and datasets. The time-based USE model performed better than the standard USE model on two out of three datasets. However, the time-based BM25 model did not perform well for some of the input queries compared to the standard BM25 model.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
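To make the tie-breaking argument in the abstract above concrete, the sketch below computes classic TF-IDF and then scales each document's weights by a recency factor. The exponential half-life decay, the `half_life_days` parameter, and both function names are a swapped-in, hypothetical choice; the formula actually proposed in the paper is not given on this page and may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Classic TF-IDF: raw term count times log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def recency_weighted_tf_idf(
    docs: List[List[str]], ages_in_days: List[float], half_life_days: float = 365.0
) -> List[Dict[str, float]]:
    """TF-IDF scaled by an exponential decay on document age (assumed scheme).

    A document that is `half_life_days` old keeps half of its weight, so two
    documents with identical TF-IDF vectors no longer tie: the newer one wins.
    """
    weighted = []
    for vector, age in zip(tf_idf(docs), ages_in_days):
        decay = 0.5 ** (age / half_life_days)
        weighted.append({term: w * decay for term, w in vector.items()})
    return weighted
```

With plain TF-IDF, two documents containing the same terms receive identical weights; after the recency factor is applied, the more recent document scores higher for the same query terms.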
Molloy, Paul; Beel, Joeran; Aizawa, Akiko
Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents Proceedings Article
In: Proceedings of the 8th International Workshop on Mining Scientific Publications, pp. 1–8, Association for Computational Linguistics, Wuhan, China, 2020.
@inproceedings{Molloy2020,
title = {Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents},
author = {Paul Molloy and Joeran Beel and Akiko Aizawa},
url = {https://www.aclweb.org/anthology/2020.wosp-1.1},
year = {2020},
date = {2020-08-01},
booktitle = {Proceedings of the 8th International Workshop on Mining Scientific Publications},
pages = {1–8},
publisher = {Association for Computational Linguistics},
address = {Wuhan, China},
abstract = {The relatedness of research articles, patents, court rulings, web pages, and other document types is often calculated with citation- or hyperlink-based approaches like co-citation (proximity) analysis. The main limitation of citation-based approaches is that they cannot be used for documents that receive few or no citations. We propose Virtual Citation Proximity (VCP), a Siamese Neural Network architecture that combines the advantages of co-citation proximity analysis (diverse notions of relatedness / high recommendation performance) with the advantage of content-based filtering (high coverage). VCP is trained on a corpus of documents with textual features, with real citation proximity as ground truth. VCP then predicts for any two documents, based on their title and abstract, in what proximity the two documents would be co-cited if they were indeed co-cited. The prediction can be used in the same way as real citation proximity to calculate document relatedness, even for uncited documents. In our evaluation with 2 million co-citations from Wikipedia articles, VCP achieves an MAE of 0.0055, i.e., an improvement of 20% over the baseline, though the learning curve suggests that more work is needed.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
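The Siamese structure described in the abstract above (one shared encoder applied to both documents, trained to regress real co-citation proximity) can be sketched in plain Python. Everything concrete here is hypothetical: the hashed bag-of-words encoder standing in for a neural encoder, the dot-product head with a sigmoid, the dimension `DIM`, and all function names. The training loop is omitted. The sketch only shows the structure: both inputs share one set of weights, the score is symmetric in the two documents, and quality is measured as MAE against real proximity values.

```python
import math
import zlib
from typing import List, Sequence, Tuple

DIM = 64  # assumed embedding size


def encode(text: str, weights: Sequence[float]) -> List[float]:
    """Shared encoder (hypothetical): normalized hashed bag of words, scaled by weights."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = zlib.crc32(token.encode("utf-8")) % DIM  # stable hash, unlike built-in hash()
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [w * v / norm for w, v in zip(weights, vec)]


def predict_proximity(doc_a: str, doc_b: str, weights: Sequence[float], bias: float) -> float:
    """Siamese scoring: both documents pass through the same encoder and weights;
    the dot product makes the score symmetric in (doc_a, doc_b)."""
    ea, eb = encode(doc_a, weights), encode(doc_b, weights)
    z = sum(x * y for x, y in zip(ea, eb)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # squash into (0, 1) like a proximity score


def mean_absolute_error(
    pairs: Sequence[Tuple[str, str, float]], weights: Sequence[float], bias: float
) -> float:
    """MAE of predictions against real co-citation proximity (the ground truth)."""
    errors = [abs(predict_proximity(a, b, weights, bias) - y) for a, b, y in pairs]
    return sum(errors) / len(errors)
```

In a real model, `weights` and `bias` would be learned by minimizing this kind of error on co-cited pairs, and the trained scorer could then be applied to pairs where one or both documents have never been cited.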
Carroll, Oisín; Beel, Joeran
Finite Group Equivariant Neural Networks for Games Journal Article
In: arXiv, no. 2009.05027, pp. 1–8, 2020.
@article{Carroll2020,
title = {Finite Group Equivariant Neural Networks for Games},
author = {Oisín Carroll and Joeran Beel},
url = {https://arxiv.org/abs/2009.05027},
year = {2020},
date = {2020-01-01},
journal = {arXiv},
number = {2009.05027},
pages = {1–8},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Al-Rawi, Mohammed; Beel, Joeran
Towards an Interoperable Data Protocol Aimed at Linking the Fashion Industry with AI Companies Journal Article
In: arXiv:2009.03005, 2020.
@article{AlRawi2020,
title = {Towards an Interoperable Data Protocol Aimed at Linking the Fashion Industry with AI Companies},
author = {Mohammed Al-Rawi and Joeran Beel},
url = {https://arxiv.org/abs/2009.03005},
year = {2020},
date = {2020-01-01},
journal = {arXiv:2009.03005},
abstract = {The fashion industry is looking forward to using artificial intelligence technologies to enhance its processes, services, and applications. Although the amount of fashion data currently in use is increasing, there is a large gap in data exchange between the fashion industry and the related AI companies, not to mention the different structures used for each fashion dataset. As a result, AI companies are relying on manually annotated fashion data to build different applications. Furthermore, as of this writing, the terminology, vocabulary, and methods of data representation used to denote fashion items are still ambiguous and confusing. Hence, it is clear that the fashion industry and AI companies will benefit from a protocol that allows them to exchange and organise fashion information in a unified way. To achieve this goal, we aim (1) to define a protocol called DDOIF that will allow interoperability of fashion data; (2) for DDOIF to contain diverse entities, including extensive information on clothing and accessories attributes in the form of text and various media formats; and (3) to design and implement an API that includes, among other things, functions for importing and exporting a file built according to the DDOIF protocol that stores all information about a single item of clothing. To this end, we identified over 1000 class and subclass names used to name fashion items and used them to build the DDOIF dictionary. We make DDOIF publicly available to all interested users and developers and look forward to engaging more collaborators to improve and enrich it.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
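The abstract describes an API with functions for importing and exporting a file that stores all information about a single item of clothing. The round-trip idea can be sketched in Python with JSON. This is a minimal sketch under stated assumptions: the record class `FashionItem`, its fields (`item_id`, `class_name`, `subclass_name`, `attributes`, `media`), and the JSON serialization are hypothetical placeholders, not the actual DDOIF schema or API.

```python
import json
from dataclasses import dataclass, field, asdict


# Hypothetical record for one clothing item; NOT the real DDOIF schema.
@dataclass
class FashionItem:
    item_id: str
    class_name: str            # e.g. a top-level class from a fashion dictionary
    subclass_name: str         # e.g. a finer-grained subclass
    attributes: dict = field(default_factory=dict)  # textual attributes
    media: list = field(default_factory=list)       # paths/URIs to images, video, etc.


def export_item(item: FashionItem) -> str:
    """Serialize a single item to a JSON string (hypothetical file format)."""
    return json.dumps(asdict(item), ensure_ascii=False, sort_keys=True)


def import_item(text: str) -> FashionItem:
    """Parse a JSON string produced by export_item back into a FashionItem."""
    return FashionItem(**json.loads(text))


if __name__ == "__main__":
    shirt = FashionItem(
        item_id="example-001",
        class_name="top",
        subclass_name="shirt",
        attributes={"color": "blue", "sleeve": "long"},
        media=["images/example-001-front.jpg"],
    )
    restored = import_item(export_item(shirt))
    print(restored == shirt)  # True: the round trip preserves every field
```

A self-describing text format keeps one item per file, which matches the abstract's "single item of clothing" granularity; a real protocol would also need a fixed vocabulary for the class and subclass names.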
Scharpf, Philipp; Mackerracher, Ian; Schubotz, Moritz; Beel, Joeran; Breitinger, Corinna; Gipp, Bela
AnnoMathTeX - a Formula Annotation Recommender System for STEM Documents Proceedings Article
In: 13th ACM Conference on Recommender Systems (RecSys), 2019.
@inproceedings{Scharpf2019,
title = {AnnoMathTeX - a Formula Annotation Recommender System for STEM Documents},
author = {Philipp Scharpf and Ian Mackerracher and Moritz Schubotz and Joeran Beel and Corinna Breitinger and Bela Gipp},
year = {2019},
date = {2019-01-01},
booktitle = {13th ACM Conference on Recommender Systems (RecSys)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Edenhofer, Gordian; Collins, Andrew; Aizawa, Akiko; Beel, Joeran
Augmenting the DonorsChoose.org Corpus for Meta-Learning Proceedings Article
In: Proceedings of The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR), pp. 32–38, CEUR-WS, 2019.
@inproceedings{Edenhofer2019,
title = {Augmenting the DonorsChoose.org Corpus for Meta-Learning},
author = {Gordian Edenhofer and Andrew Collins and Akiko Aizawa and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
pages = {32–38},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hassan, Hebatallah A. Mohamed; Sansonetti, Giuseppe; Gasparetti, Fabio; Micarelli, Alessandro; Beel, Joeran
BERT, ELMo, USE and InferSent Sentence Encoders: The Panacea for Research-Paper Recommendation? Proceedings Article
In: 13th ACM Conference on Recommender Systems (RecSys), pp. 6–10, CEUR-WS, 2019.
@inproceedings{Hassan2019,
title = {BERT, ELMo, USE and InferSent Sentence Encoders: The Panacea for Research-Paper Recommendation?},
author = {Hebatallah A. Mohamed Hassan and Giuseppe Sansonetti and Fabio Gasparetti and Alessandro Micarelli and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {13th ACM Conference on Recommender Systems (RecSys)},
volume = {2431},
pages = {6–10},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beierle, Felix; Aizawa, Akiko; Collins, Andrew; Beel, Joeran
Choice overload and recommendation effectiveness in related-article recommendations Journal Article
In: International Journal on Digital Libraries (IJDL), pp. 1–16, 2019.
@article{Beierle2019,
title = {Choice overload and recommendation effectiveness in related-article recommendations},
author = {Felix Beierle and Akiko Aizawa and Andrew Collins and Joeran Beel},
doi = {10.1007/s00799-019-00270-7},
year = {2019},
date = {2019-01-01},
journal = {International Journal on Digital Libraries (IJDL)},
pages = {1–16},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Tunstead, Keith; Beel, Joeran
Combating Stagnation in Reinforcement Learning Through 'Guided Learning' With 'Taught-Response Memory' Proceedings Article
In: 3rd International Tutorial & Workshop on Interactive Adaptive Learning (IAL2019) at the ECML PKDD Conference, 2019.
@inproceedings{Tunstead2019,
title = {Combating Stagnation in Reinforcement Learning Through 'Guided Learning' With 'Taught-Response Memory'},
author = {Keith Tunstead and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {3rd International Tutorial & Workshop on Interactive Adaptive Learning (IAL2019) at the ECML PKDD Conference},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Griffin, Alan; O'Shey, Conor
Darwin & Goliath: Recommendations-As-a-Service with Automated Algorithm-Selection and White-Labels Proceedings Article
In: Azzopardi, Leif; Stein, Benno; Fuhr, Norbert; Mayr, Philipp; Hauff, Claudia; Hiemstra, Djoerd (Ed.): 13th ACM Conference on Recommender Systems (RecSys), pp. 213–219, 2019.
@inproceedings{Beel2019b,
title = {Darwin & Goliath: Recommendations-As-a-Service with Automated Algorithm-Selection and White-Labels},
author = {Joeran Beel and Alan Griffin and Conor O'Shey},
editor = {Leif Azzopardi and Benno Stein and Norbert Fuhr and Philipp Mayr and Claudia Hauff and Djoerd Hiemstra},
doi = {10.1007/978-3-030-15719-7_27},
year = {2019},
date = {2019-01-01},
booktitle = {13th ACM Conference on Recommender Systems (RecSys)},
volume = {11438},
pages = {213–219},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Brunel, Victor
Data Pruning in Recommender Systems Research: Best-Practice or Malpractice? Proceedings Article
In: 13th ACM Conference on Recommender Systems (RecSys), pp. 26–30, CEUR-WS, 2019.
@inproceedings{Beel2019d,
title = {Data Pruning in Recommender Systems Research: Best-Practice or Malpractice?},
author = {Joeran Beel and Victor Brunel},
year = {2019},
date = {2019-01-01},
booktitle = {13th ACM Conference on Recommender Systems (RecSys)},
volume = {2431},
pages = {26–30},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collins, Andrew; Beel, Joeran
Document Embeddings vs. Keyphrases vs. Terms: A Large-Scale Online Evaluation in Digital Library Recommender Systems Proceedings Article
In: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2019.
@inproceedings{Collins2019,
title = {Document Embeddings vs. Keyphrases vs. Terms: A Large-Scale Online Evaluation in Digital Library Recommender Systems},
author = {Andrew Collins and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collier, Mark; Beel, Joeran
Memory-Augmented Neural Networks for Machine Translation Proceedings Article
In: Proceedings of the Machine Translation (MT) Summit, 2019.
@inproceedings{Collier2019,
title = {Memory-Augmented Neural Networks for Machine Translation},
author = {Mark Collier and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the Machine Translation (MT) Summit},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Smyth, Barry; Collins, Andrew
RARD II: The 94 Million Related-Article Recommendation Dataset Proceedings Article
In: Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR), pp. 39–55, CEUR-WS, 2019.
@inproceedings{Beel2019e,
title = {RARD II: The 94 Million Related-Article Recommendation Dataset},
author = {Joeran Beel and Barry Smyth and Andrew Collins},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
pages = {39–55},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collier, Mark; Beel, Joeran
Implementing Neural Turing Machines Proceedings Article
In: Kůrková, Věra; Manolopoulos, Yannis; Hammer, Barbara; Iliadis, Lazaros; Maglogiannis, Ilias (Ed.): 27th International Conference on Artificial Neural Networks (ICANN), pp. 94–104, Springer International Publishing, Cham, 2018, ISBN: 978-3-030-01424-7.
@inproceedings{Collier2018,
title = {Implementing Neural Turing Machines},
author = {Mark Collier and Joeran Beel},
editor = {Věra Kůrková and Yannis Manolopoulos and Barbara Hammer and Lazaros Iliadis and Ilias Maglogiannis},
doi = {10.1007/978-3-030-01424-7_10},
isbn = {978-3-030-01424-7},
year = {2018},
date = {2018-01-01},
booktitle = {27th International Conference on Artificial Neural Networks (ICANN)},
pages = {94–104},
publisher = {Springer International Publishing},
address = {Cham},
series = {Lecture Notes in Computer Science},
abstract = {Neural Turing Machines (NTMs) are an instance of Memory Augmented
Neural Networks, a new class of recurrent neural networks which decouple
computation from memory by introducing an external memory unit. NTMs
have demonstrated superior performance over Long Short-Term Memory
Cells in several sequence learning tasks. A number of open source
implementations of NTMs exist but are unstable during training and/or
fail to replicate the reported performance of NTMs. This paper presents
the details of our successful implementation of an NTM. Our implementation
learns to solve three sequential learning tasks from the original
NTM paper. We find that the choice of memory contents initialization
scheme is crucial in successfully implementing an NTM. Networks with
memory contents initialized to small constant values converge on
average 2 times faster than the next best memory contents initialization
scheme.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
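To make "memory contents initialization" concrete, the sketch below builds an NTM memory matrix with either a small constant or random values and then performs a content-based read in the style of the original NTM formulation (cosine similarity scaled by a key strength beta, then a softmax over rows, then a weighted sum). The memory size, the constant 1e-6, and the random scheme's scale are illustrative assumptions, not the settings used in the paper, and pure Python lists stand in for tensors.

```python
import math
import random


def init_memory(n_rows: int, width: int, scheme: str, const: float = 1e-6, seed: int = 0):
    """Return an n_rows x width memory matrix (list of lists).

    'constant' fills every cell with a small constant; 'random' samples from a
    normal distribution. Both parameter values are illustrative assumptions.
    """
    if scheme == "constant":
        return [[const] * width for _ in range(n_rows)]
    if scheme == "random":
        rng = random.Random(seed)
        return [[rng.gauss(0.0, 0.5) for _ in range(width)] for _ in range(n_rows)]
    raise ValueError(f"unknown scheme: {scheme}")


def cosine(u, v, eps: float = 1e-8) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def content_weights(memory, key, beta: float):
    """Content-based addressing: softmax over beta * cosine(key, row)."""
    scores = [beta * cosine(key, row) for row in memory]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read(memory, weights):
    """Read vector = weighted sum of memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]


if __name__ == "__main__":
    mem = init_memory(n_rows=8, width=4, scheme="constant")
    w = content_weights(mem, key=[1.0, 0.0, 0.0, 0.0], beta=5.0)
    print(w)  # every row is identical, so each row gets weight 1/8
    print(read(mem, w))
```

Note that with constant initialization every row starts identical, so the first content-based lookups are uniform over memory; this is an observation about the sketch, not the paper's explanation of why constant initialization converged faster.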
Collins, Andrew; Tkaczyk, Dominika; Aizawa, Akiko; Beel, Joeran
Position Bias in Recommender Systems for Digital Libraries Proceedings Article
In: Proceedings of the iConference, pp. 335-344, Springer, 2018.
@inproceedings{Collins2018,
title = {Position Bias in Recommender Systems for Digital Libraries},
author = {Andrew Collins and Dominika Tkaczyk and Akiko Aizawa and Joeran Beel},
doi = {10.1007/978-3-319-78105-1_37},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the iConference},
volume = {10766},
pages = {335-344},
publisher = {Springer},
series = {Lecture Notes in Computer Science (LNCS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Collins, Andrew; Aizawa, Akiko
The Architecture of Mr. DLib’s Scientific Recommender-System API Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 78–89, CEUR-WS, 2018.
@inproceedings{Beel2018,
title = {The Architecture of Mr. DLib’s Scientific Recommender-System API},
author = {Joeran Beel and Andrew Collins and Akiko Aizawa},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
pages = {78–89},
publisher = {CEUR-WS},
abstract = {Recommender systems in academia are not widely available. This may
be in part due to the difficulty and cost of developing and maintaining
recommender systems. Many operators of academic products such as
digital libraries and reference managers avoid this effort, although
a recommender system could provide significant benefits to their
users. In this paper, we introduce Mr. DLib’s “Recommendations-as-a-Service”
(RaaS) API that allows operators of academic products to easily integrate
a scientific recommender system into their products. Mr. DLib generates
recommendations for research articles, but in the future, recommendations
may include calls for papers, grants, etc. Operators of academic products
can request recommendations from Mr. DLib and display these recommendations
to their users. Mr. DLib can be integrated in just a few hours or
days; creating an equivalent recommender system from scratch would
require several months for an academic operator. Mr. DLib has been
used by GESIS’ Sowiport and by the reference manager JabRef. Mr.
DLib is open source and its goal is to facilitate the application
of, and research on, scientific recommender systems. In this paper,
we present the motivation for Mr. DLib, the architecture and details
about the effectiveness. Mr. DLib has delivered 94m recommendations
over a span of two years with an average click-through rate of 0.12%.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
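The effectiveness figures in this abstract can be sanity-checked with a little arithmetic. Click-through rate (CTR) is clicks divided by delivered recommendations, so, assuming "94m" and "0.12%" are the rounded figures as reported, the implied number of clicks is roughly 94,000,000 × 0.0012 ≈ 112,800. The helper names `ctr` and `implied_clicks` are illustrative and not part of Mr. DLib.

```python
def ctr(clicks: int, delivered: int) -> float:
    """Click-through rate as a fraction: clicks / delivered recommendations."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return clicks / delivered


def implied_clicks(delivered: int, ctr_fraction: float) -> int:
    """Approximate click count implied by a reported CTR (rounded to the nearest int)."""
    return round(delivered * ctr_fraction)


if __name__ == "__main__":
    n = 94_000_000                      # "94m" recommendations, rounded as reported
    clicks = implied_clicks(n, 0.0012)  # reported average CTR of 0.12%
    print(clicks)                       # 112800
    print(f"{ctr(clicks, n):.2%}")      # 0.12%
```

Because both inputs are rounded, the derived click count is an estimate, not a figure reported by the paper.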
Langer, Stefan; Beel, Joeran
Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned Proceedings Article
In: 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR), pp. 85-92, 2017.
@inproceedings{Langer2017,
title = {Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned},
author = {Stefan Langer and Joeran Beel},
year = {2017},
date = {2017-01-01},
booktitle = {5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR)},
pages = {85-92},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Breitinger, Corinna; Langer, Stefan
Evaluating the CC-IDF citation-weighting scheme: How effectively can 'Inverse Document Frequency' (IDF) be applied to references? Proceedings Article
In: Proceedings of the 12th iConference, 2017.
@inproceedings{Beel2017,
title = {Evaluating the CC-IDF citation-weighting scheme: How effectively can 'Inverse Document Frequency' (IDF) be applied to references?},
author = {Joeran Beel and Corinna Breitinger and Stefan Langer},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 12th iConference},
abstract = {In the domain of academic search engines and research-paper recommender
systems, CC-IDF is a common citation-weighting scheme that is used
to calculate semantic relatedness between documents. CC-IDF adopts
the principles of the popular term-weighting scheme TF-IDF and assumes
that if a rare academic citation is shared by two documents then
this occurrence should receive a higher weight than if the citation
is shared among a large number of documents. Although CC-IDF is in
common use, we found no empirical evaluation and comparison of CC-IDF
with plain citation weight (CC-Only). Therefore, we conducted such
an evaluation and present the results in this paper. The evaluation
was conducted with real users of the recommender system Docear. The
effectiveness of CC-IDF and CC-Only was measured using click-through
rate (CTR). For 238,681 delivered recommendations, CC-IDF had about
the same effectiveness as CC-Only (CTR of 6.15% vs. 6.23%). In other
words, CC-IDF was not more effective than CC-Only, which is a surprising
result. We provide a number of potential reasons and suggest conducting
further research to understand the principles of CC-IDF in more detail.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
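The difference between the two weighting schemes compared in this abstract can be sketched as follows, with each document represented by the set of references it cites. Under CC-Only, every shared citation contributes 1 to the relatedness score; under CC-IDF, each shared citation is weighted like an IDF term, so rare citations count more. This is a minimal sketch assuming the standard log(N / df) IDF form and a simple sum over shared citations; the function names are hypothetical and the weighting actually used in Docear may differ in detail.

```python
import math
from collections import Counter


def citation_df(corpus: dict[str, set[str]]) -> Counter:
    """Number of documents in the corpus that cite each reference."""
    df = Counter()
    for refs in corpus.values():
        df.update(refs)  # refs is a set, so each reference counts once per document
    return df


def cc_only(a: set[str], b: set[str]) -> float:
    """CC-Only: every shared citation contributes weight 1."""
    return float(len(a & b))


def cc_idf(a: set[str], b: set[str], df: Counter, n_docs: int) -> float:
    """CC-IDF (assumed form): shared citations weighted by log(N / df)."""
    return sum(math.log(n_docs / df[c]) for c in a & b)


if __name__ == "__main__":
    corpus = {
        "d1": {"rare", "popular"},
        "d2": {"rare", "popular"},
        "d3": {"popular"},
        "d4": {"popular", "other"},
    }
    df = citation_df(corpus)
    n = len(corpus)
    # d1/d2 share a rare and a popular citation; d3/d4 share only the popular one.
    print(cc_only(corpus["d1"], corpus["d2"]), cc_only(corpus["d3"], corpus["d4"]))  # 2.0 1.0
    print(round(cc_idf(corpus["d1"], corpus["d2"], df, n), 3))  # log(4/2) + log(4/4) = 0.693
```

In this toy corpus, a citation shared by every document receives zero weight under CC-IDF, which is exactly the assumption the paper set out to test empirically.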
Beierle, Felix; Aizawa, Akiko; Beel, Joeran
Exploring Choice Overload in Related-Article Recommendations in Digital Libraries Proceedings Article
In: 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR), pp. 51–61, 2017.
@inproceedings{Beierle2017,
title = {Exploring Choice Overload in Related-Article Recommendations in Digital Libraries},
author = {Felix Beierle and Akiko Aizawa and Joeran Beel},
year = {2017},
date = {2017-01-01},
booktitle = {5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR)},
pages = {51–61},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Feyer, Stefan; Siebert, Sophie; Gipp, Bela; Aizawa, Akiko; Beel, Joeran
Integration of the Scientific Recommender System Mr. DLib into the Reference Manager JabRef Proceedings Article
In: Proceedings of the 39th European Conference on Information Retrieval (ECIR), pp. 770–774, 2017.
@inproceedings{Feyer2017,
title = {Integration of the Scientific Recommender System Mr. DLib into the Reference Manager JabRef},
author = {Stefan Feyer and Sophie Siebert and Bela Gipp and Akiko Aizawa and Joeran Beel},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 39th European Conference on Information Retrieval (ECIR)},
pages = {770–774},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Aizawa, Akiko; Breitinger, Corinna; Gipp, Bela
Mr. DLib: Recommendations-as-a-service (RaaS) for Academia Proceedings Article
In: Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries, pp. 313–314, IEEE Press, Toronto, Ontario, Canada, 2017, ISBN: 978-1-5386-3861-3.
@inproceedings{Beel2017c,
title = {Mr. DLib: Recommendations-as-a-service (RaaS) for Academia},
author = {Joeran Beel and Akiko Aizawa and Corinna Breitinger and Bela Gipp},
url = {http://dl.acm.org/citation.cfm?id=3200334.3200389},
isbn = {978-1-5386-3861-3},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries},
pages = {313–314},
publisher = {IEEE Press},
address = {Toronto, Ontario, Canada},
series = {JCDL '17},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Dinesh, Siddharth
Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them Proceedings Article
In: Mayr, Philipp; Frommholz, Ingo; Cabanac, Guillaume (Ed.): Proceedings of the Fifth Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 39th European Conference on Information Retrieval (ECIR 2017), pp. 6-17, 2017.
@inproceedings{Beel2017e,
title = {Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them},
author = {Joeran Beel and Siddharth Dinesh},
editor = {Philipp Mayr and Ingo Frommholz and Guillaume Cabanac},
url = {http://ceur-ws.org/Vol-1823/paper1.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the Fifth Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 39th European Conference on Information Retrieval (ECIR 2017)},
volume = {1823},
pages = {6-17},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Dinesh, Siddharth
Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them [Long Version] Proceedings Article
In: arXiv preprint. https://arxiv.org/abs/1704.00156, Harvard Dataverse, 2017.
@inproceedings{Beel2017f,
title = {Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them [Long Version]},
author = {Joeran Beel and Siddharth Dinesh},
url = {http://dx.doi.org/10.7910/DVN/HFIV1A},
doi = {10.7910/DVN/HFIV1A},
year = {2017},
date = {2017-01-01},
booktitle = {arXiv preprint. https://arxiv.org/abs/1704.00156},
publisher = {Harvard Dataverse},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Dinesh, Siddharth; Mayr, Philipp; Carevic, Zeljko; Raghvendra, Jain
Stereotype and Most-Popular Recommendations in the Digital Library Sowiport Proceedings Article
In: Proceedings of the 15th International Symposium of Information Science (ISI), pp. 96–108, 2017.
@inproceedings{Beel2017d,
title = {Stereotype and Most-Popular Recommendations in the Digital Library Sowiport},
author = {Joeran Beel and Siddharth Dinesh and Philipp Mayr and Zeljko Carevic and Jain Raghvendra},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 15th International Symposium of Information Science (ISI)},
volume = {23},
number = {7/8},
pages = {96–108},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Gipp, Bela
TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users' Personal Document Collections Proceedings Article
In: Proceedings of the 12th iConference, 2017.
@inproceedings{Beel2017a,
title = {TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users' Personal Document Collections},
author = {Joeran Beel and Stefan Langer and Bela Gipp},
doi = {10.13140/RG.2.2.18759.39842},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 12th iConference},
abstract = {TF-IDF is one of the most popular term-weighting schemes, and is applied
by search engines, recommender systems, and user modeling engines.
With regard to user modeling and recommender systems, we see two
shortcomings of TF-IDF. First, calculating IDF requires access to
the document corpus from which recommendations are made. Such access
is not always given in a user-modeling or recommender system. Second,
TF-IDF ignores information from a user’s personal document
collection, which could – so we hypothesize – enhance
the user modeling process. In this paper, we introduce TF-IDuF as
a term-weighting scheme that does not require access to the general
document corpus and that considers information from the users’
personal document collections. We evaluated the effectiveness of
TF-IDuF compared to TF-IDF and TF-Only and found that TF-IDF and
TF-IDuF perform similarly (click-through rates (CTR) of 5.09% vs.
5.14%), and both are around 25% more effective than TF-Only (CTR
of 4.06%) for recommending research papers. Consequently, we conclude
that TF-IDuF could be a promising term-weighting scheme, especially
when access to the document corpus for recommendations is not possible,
and thus classic IDF cannot be computed. It is also notable that
TF-IDuF and TF-IDF are not mutually exclusive, so both metrics may be
combined into a more effective term-weighting scheme.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
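The TF-IDuF idea from this abstract can be sketched as follows: the inverse-frequency factor is computed over the user's own document collection instead of over the corpus from which recommendations are drawn. This is a minimal sketch assuming IDuF(t) = log(N_u / n_u(t)), where N_u is the number of documents in the user's personal collection and n_u(t) is the number of those documents containing term t; the function names and the handling of unseen terms are hypothetical choices, not taken from the paper.

```python
import math
from collections import Counter


def tf(terms: list[str]) -> Counter:
    """Raw term frequencies for the terms describing a user."""
    return Counter(terms)


def iduf(term: str, personal_docs: list[set[str]]) -> float:
    """Assumed IDuF: IDF computed over the user's own document collection only."""
    if not personal_docs:
        raise ValueError("the personal document collection is empty")
    n_u = len(personal_docs)
    containing = sum(1 for doc in personal_docs if term in doc)
    # Assumption: treat a term absent from the collection as maximally rare
    # (as if it occurred once) instead of dividing by zero.
    return math.log(n_u / max(containing, 1))


def tf_iduf(user_terms: list[str], personal_docs: list[set[str]]) -> dict[str, float]:
    """Weight each user-model term by TF x IDuF."""
    return {t: f * iduf(t, personal_docs) for t, f in tf(user_terms).items()}


if __name__ == "__main__":
    docs = [
        {"recommender", "evaluation", "ctr"},
        {"recommender", "mind", "map"},
        {"recommender", "citation"},
        {"mind", "map", "user"},
    ]
    user_terms = ["recommender", "recommender", "mind", "citation"]
    weights = tf_iduf(user_terms, docs)
    for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{term}: {w:.3f}")  # citation: 1.386, mind: 0.693, recommender: 0.575
```

No access to the recommendation corpus is needed here, which is the property the abstract highlights. For reference, the reported CTRs put TF-IDF at (5.09 − 4.06) / 4.06 ≈ 25.4% and TF-IDuF at (5.14 − 4.06) / 4.06 ≈ 26.6% above TF-Only, consistent with "around 25%".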
Beel, Joeran; Gipp, Bela; Langer, Stefan; Breitinger, Corinna
Research Paper Recommender Systems: A Literature Survey Journal Article
In: International Journal on Digital Libraries, no. 4, pp. 305–338, 2016, ISSN: 1432-5012.
@article{Beel2016a,
title = {Research Paper Recommender Systems: A Literature Survey},
author = {Joeran Beel and Bela Gipp and Stefan Langer and Corinna Breitinger},
doi = {10.1007/s00799-015-0156-0},
issn = {1432-5012},
year = {2016},
date = {2016-01-01},
journal = {International Journal on Digital Libraries},
number = {4},
pages = {305–338},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Breitinger, Corinna; Langer, Stefan; Lommatzsch, Andreas; Gipp, Bela
Towards Reproducibility in Recommender-Systems Research Journal Article
In: User Modeling and User-Adapted Interaction (UMUAI), vol. 26, no. 1, pp. 69-101, 2016.
@article{Beel2016,
title = {Towards Reproducibility in Recommender-Systems Research},
author = {Joeran Beel and Corinna Breitinger and Stefan Langer and Andreas Lommatzsch and Bela Gipp},
doi = {10.1007/s11257-016-9174-x},
year = {2016},
date = {2016-01-01},
journal = {User Modeling and User-Adapted Interaction (UMUAI)},
volume = {26},
number = {1},
pages = {69-101},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Langer, Stefan
A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems Proceedings Article
In: Kapidakis, Sarantos; Mazurek, Cezary; Werla, Marcin (Ed.): Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries (TPDL), pp. 153-168, 2015.
@inproceedings{Beel2015a,
title = {A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems},
author = {Joeran Beel and Stefan Langer},
editor = {Sarantos Kapidakis and Cezary Mazurek and Marcin Werla},
doi = {10.1007/978-3-319-24592-8_12},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries (TPDL)},
volume = {9316},
pages = {153-168},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Kapitsaki, Georgia M.; Breitinger, Corinna; Gipp, Bela
Exploring the Potential of User Modeling based on Mind Maps Proceedings Article
In: Ricci, Francesco; Bontcheva, Kalina; Conlan, Owen; Lawless, Séamus (Ed.): Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP), pp. 3-17, Springer, 2015.
@inproceedings{Beel2015b,
title = {Exploring the Potential of User Modeling based on Mind Maps},
author = {Joeran Beel and Stefan Langer and Georgia M. Kapitsaki and Corinna Breitinger and Bela Gipp},
editor = {Francesco Ricci and Kalina Bontcheva and Owen Conlan and Séamus Lawless},
doi = {10.1007/978-3-319-20267-9_1},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP)},
volume = {9146},
pages = {3-17},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran
Towards Effective Research-Paper Recommender Systems and User Modeling based on Mind Maps PhD Thesis
Otto-von-Guericke Universität Magdeburg, 2015.
@phdthesis{Beel2015,
title = {Towards Effective Research-Paper Recommender Systems and User Modeling based on Mind Maps},
author = {Joeran Beel},
school = {Otto-von-Guericke Universität Magdeburg},
year = {2015},
date = {2015-01-01},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
Beel, Joeran; Langer, Stefan; Gipp, Bela; Nuernberger, Andreas
The Architecture and Datasets of Docear's Research Paper Recommender System Journal Article
In: D-Lib Magazine, vol. 20, no. 11/12, 2014.
@article{Beel2014,
title = {The Architecture and Datasets of Docear's Research Paper Recommender System},
author = {Joeran Beel and Stefan Langer and Bela Gipp and Andreas Nuernberger},
doi = {10.1045/november14-beel},
year = {2014},
date = {2014-01-01},
journal = {D-Lib Magazine},
volume = {20},
number = {11/12},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Langer, Stefan; Beel, Joeran
The Comparability of Recommender System Evaluations and Characteristics of Docear's Users Proceedings Article
In: Proceedings of the Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD) at the 2014 ACM Conference Series on Recommender Systems (RecSys), pp. 1–6, CEUR-WS, 2014.
@inproceedings{Langer2014,
title = {The Comparability of Recommender System Evaluations and Characteristics of Docear's Users},
author = {Stefan Langer and Joeran Beel},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD) at the 2014 ACM Conference Series on Recommender Systems (RecSys)},
pages = {1–6},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Gipp, Bela
Utilizing Mind-Maps for Information Retrieval and User Modelling Proceedings Article
In: Dimitrova, Vania; Kuflik, Tsvi; Chin, David; Ricci, Francesco; Dolog, Peter; Houben, Geert-Jan (Ed.): Proceedings of the 22nd Conference on User Modelling, Adaptation, and Personalization (UMAP), pp. 301-313, Springer, 2014.
@inproceedings{Beel2014a,
title = {Utilizing Mind-Maps for Information Retrieval and User Modelling},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Bela Gipp},
editor = {Vania Dimitrova and Tsvi Kuflik and David Chin and Francesco Ricci and Peter Dolog and Geert-Jan Houben},
doi = {10.1007/978-3-319-08786-3_26},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 22nd Conference on User Modelling, Adaptation, and Personalization (UMAP)},
volume = {8538},
pages = {301-313},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Nürnberger, Andreas
Persistence in Recommender Systems: Giving the Same Recommendations to the Same Users Multiple Times Proceedings Article
In: Aalberg, Trond; Dobreva, Milena; Papatheodorou, Christos; Tsakonas, Giannis; Farrugia, Charles (Ed.): Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 390–394, Springer, Valletta, Malta, 2013.
@inproceedings{Beel2013e,
title = {Persistence in Recommender Systems: Giving the Same Recommendations to the Same Users Multiple Times},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Andreas Nürnberger},
editor = {Trond Aalberg and Milena Dobreva and Christos Papatheodorou and Giannis Tsakonas and Charles Farrugia},
year = {2013},
date = {2013-09-01},
booktitle = {Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013)},
volume = {8092},
pages = {390–394},
publisher = {Springer},
address = {Valletta, Malta},
series = {Lecture Notes in Computer Science (LNCS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel
Sponsored vs. Organic (Research Paper) Recommendations and the Impact of Labeling Proceedings Article
In: Aalberg, Trond; Dobreva, Milena; Papatheodorou, Christos; Tsakonas, Giannis; Farrugia, Charles (Ed.): Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 395–399, Valletta, Malta, 2013.
@inproceedings{Beel2013a,
title = {Sponsored vs. Organic (Research Paper) Recommendations and the Impact of Labeling},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr},
editor = {Trond Aalberg and Milena Dobreva and Christos Papatheodorou and Giannis Tsakonas and Charles Farrugia},
year = {2013},
date = {2013-09-01},
booktitle = {Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013)},
pages = {395–399},
address = {Valletta, Malta},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Nürnberger, Andreas; Genzmehr, Marcel
The Impact of Demographics (Age and Gender) and Other User Characteristics on Evaluating Recommender Systems Proceedings Article
In: Aalberg, Trond; Dobreva, Milena; Papatheodorou, Christos; Tsakonas, Giannis; Farrugia, Charles (Ed.): Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 400–404, Springer, Valletta, Malta, 2013.
@inproceedings{Beel2013f,
title = {The Impact of Demographics (Age and Gender) and Other User Characteristics on Evaluating Recommender Systems},
author = {Joeran Beel and Stefan Langer and Andreas Nürnberger and Marcel Genzmehr},
editor = {Trond Aalberg and Milena Dobreva and Christos Papatheodorou and Giannis Tsakonas and Charles Farrugia},
year = {2013},
date = {2013-09-01},
booktitle = {Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013)},
pages = {400–404},
publisher = {Springer},
address = {Valletta, Malta},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Gipp, Bela; Nürnberger, Andreas
A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation Proceedings Article
In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys), pp. 7-14, 2013.
@inproceedings{Beel2013d,
title = {A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Bela Gipp and Andreas Nürnberger},
doi = {10.1145/2532508.2532511},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys)},
pages = {7-14},
series = {ACM International Conference Proceedings Series (ICPS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Nuernberger, Andreas
Introducing Docear's Research Paper Recommender System Proceedings Article
In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13), pp. 459-460, ACM, 2013.
@inproceedings{Beel2013c,
title = {Introducing Docear's Research Paper Recommender System},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Andreas Nuernberger},
doi = {10.1145/2467696.2467786},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13)},
pages = {459-460},
publisher = {ACM},
series = {ACM International Conference Proceedings Series (ICPS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran
Evaluations in Information Retrieval: Click Through Rate (CTR) vs. Mean Absolute Error (MAE) vs. (Root) Mean Squared Error (MSE / RMSE) vs. Precision electronic
2012.
@electronic{Beel2012,
title = {Evaluations in Information Retrieval: Click Through Rate (CTR) vs. Mean Absolute Error (MAE) vs. (Root) Mean Squared Error (MSE / RMSE) vs. Precision},
author = {Joeran Beel},
url = {http://www.docear.org/2012/09/21/evaluations-in-information-retrieval-click-through-rate-ctr-vs-mean-absolute-error-mae-vs-root-mean-squared-error-mse-rmse-vs-precision/},
year = {2012},
date = {2012-09-01},
organization = {Docear},
howpublished = {Blog},
keywords = {},
pubstate = {published},
tppubtype = {electronic}
}
Beel, Joeran
Research paper recommendations based on mind maps Book Section
In: Arndt, Hans-Knud; Krcmar, Helmut (Ed.): Very Large Business Applications (VLBA): System Landscapes of the Future, pp. 66–75, Shaker Verlag, 2011.
@incollection{Beel2011a,
title = {Research paper recommendations based on mind maps},
author = {Joeran Beel},
editor = {Hans-Knud Arndt and Helmut Krcmar},
year = {2011},
date = {2011-08-01},
booktitle = {Very Large Business Applications (VLBA): System Landscapes of the Future},
pages = {66–75},
publisher = {Shaker Verlag},
series = {Berichte aus der Wirtschaftsinformatik},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
Beel, Joeran
SciPlore MindMapping now provides literature recommendations (Beta 15) electronic
2011.
@electronic{Beel2011,
title = {SciPlore MindMapping now provides literature recommendations (Beta 15)},
author = {Joeran Beel},
url = {http://www.sciplore.org/2011/sciplore-mindmapping-now-provides-literature-recommendations-beta-15/},
year = {2011},
date = {2011-04-01},
howpublished = {Blog},
keywords = {},
pubstate = {published},
tppubtype = {electronic}
}
Gipp, Bela; Taylor, Adriana; Beel, Joeran
Link Proximity Analysis - Clustering Websites by Examining Link Proximity Proceedings Article
In: Lalmas, M.; Jose, J.; Rauber, A.; Sebastiani, F.; Frommholz, I. (Ed.): Proceedings of the 14th European Conference on Digital Libraries (ECDL'10): Research and Advanced Technology for Digital Libraries, pp. 449–452, Springer, 2010, (Available at: http://sciplore.org/pub).
@inproceedings{Gipp2010b,
title = {Link Proximity Analysis - Clustering Websites by Examining Link Proximity},
author = {Bela Gipp and Adriana Taylor and Joeran Beel},
editor = {M. Lalmas and J. Jose and A. Rauber and F. Sebastiani and I. Frommholz},
year = {2010},
date = {2010-09-01},
booktitle = {Proceedings of the 14th European Conference on Digital Libraries (ECDL'10): Research and Advanced Technology for Digital Libraries},
volume = {6273},
pages = {449–452},
publisher = {Springer},
series = {Lecture Notes in Computer Science (LNCS)},
note = {Available at: http://sciplore.org/pub},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Gipp, Bela
Detection of a similarity of documents by Citation Proximity Analysis Patent
2010, (WO/2010/078857).
@patent{Beel2010f,
title = {Detection of a similarity of documents by Citation Proximity Analysis},
author = {Joeran Beel and Bela Gipp},
year = {2010},
date = {2010-01-01},
abstract = {(DE) Die Erfindung betrifft ein computer-implementiertes Verfahren
zum Ermitteln einer Ähnlichkeit zwischen zumindest einem Eingabedokument
und einer Anzahl von Dokumenten. Es werden erste Dokumente und zweite
Dokumente ermittelt, welche direkt oder indirekt von dem Eingabedokument
referenziert werden oder das Eingabedokument referenzieren. Für
jedes ermittelte Dokument wird mindestens ein vorläufiger Ähnlichkeitswert
berechnet. Ist zu einem Dokument mehr als ein vorläufiger Ähnlichkeitswert
berechnet worden, wird aus den vorläufigen Ähnlichkeitswerten
ein endgültiger Ähnlichkeitswert berechnet. Das Verfahren
kann wiederum auf die ermittelten ersten Dokumente und zweiten Dokumente
angewandt werden, um weitere ähnliche Dokumente zu dem Eingabedokument
zu ermitteln und deren Ähnlichkeitswerte zu dem Eingabedokument
zu berechnen. (EN) The invention relates to a computer-implemented
method for detecting a similarity between at least one input document
and a plurality of documents. First documents and second documents
are detected which are directly or indirectly cited by the input
document or which directly or indirectly cite the input document.
At least one preliminary similarity value is calculated for every
detected document. If more than one preliminary similarity value
has been calculated for a document, a final similarity value is calculated
from the preliminary similarity values. The method can then be applied
to the detected first documents and second documents to detect further
documents that are similar to the input document and to calculate
their similarity values to the input document. (FR) L'invention concerne
un procédé informatisé servant à déterminer une similarité
entre au moins un document d'entrée et un certain nombre de documents.
Ce procédé consiste à déterminer des premiers et des
deuxièmes documents qui sont directement ou indirectement référencés
par le document d'entrée ou qui référencent le document
d'entrée. Au moins une valeur de similarité provisoire est
calculée pour chaque document déterminé. Si plus d'une valeur
de similarité provisoire a été calculée pour un document,
une valeur de similarité définitive est calculée à partir
des valeurs de similarité provisoire. Ce procédé peut encore
être appliqué aux premiers et aux deuxièmes documents déterminés
pour déterminer d'autres documents similaires au document d'entrée
et pour calculer leurs valeurs de similarité par rapport au document
d'entrée.},
howpublished = {Patent Application},
note = {WO/2010/078857},
keywords = {},
pubstate = {published},
tppubtype = {patent}
}
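The patent application describes the method only in general terms: collect the documents that directly or indirectly cite, or are cited by, an input document, give each of them one or more preliminary similarity values, and merge multiple values into a final one. The following is a minimal Python sketch of that flow. The function name `citation_similarity`, the per-hop decay (`decay ** (depth - 1)`), and summation as the merge rule are illustrative assumptions, not the claimed implementation.

```python
from collections import defaultdict


def citation_similarity(input_doc, cites, max_depth=2, decay=0.5):
    """Score documents linked to input_doc by citations in either direction.

    cites maps each document ID to the set of document IDs it cites.
    Assumptions (not taken from the patent): each citation path of length d
    yields a preliminary value of decay ** (d - 1), and the final value of a
    document is the sum of its preliminary values.
    """
    # Reverse index: which documents cite a given document.
    cited_by = defaultdict(set)
    for src, targets in cites.items():
        for dst in targets:
            cited_by[dst].add(src)

    preliminary = defaultdict(list)  # document -> preliminary values
    frontier = [input_doc]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for doc in frontier:
            # Documents cited by doc plus documents citing doc.
            neighbours = cites.get(doc, set()) | cited_by.get(doc, set())
            for other in neighbours:
                if other == input_doc:
                    continue
                preliminary[other].append(decay ** (depth - 1))
                next_frontier.append(other)
        frontier = next_frontier

    # Merge several preliminary values into one final value per document.
    return {doc: sum(values) for doc, values in preliminary.items()}


cites = {"X": {"A", "B"}, "A": {"D"}, "B": {"D"}, "C": {"X"}}
print(citation_similarity("X", cites))
```

In this toy graph, D is two hops away from X but is reached through both A and B, so under the summing assumption it ends up with the same final score as the direct neighbours.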
Gipp, Bela; Beel, Joeran
Integrating Citation Proximity Analysis into Google Books and Google Scholar Proceedings Article
In: Google Inc. Mountain View (USA), 2010, (Invited Talk.).
@inproceedings{Gipp2010,
title = {Integrating Citation Proximity Analysis into Google Books and Google Scholar},
author = {Bela Gipp and Joeran Beel},
year = {2010},
date = {2010-01-01},
address = {Mountain View (USA)},
organization = {Google Inc.},
note = {Invited Talk.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Gipp, Bela
Link analysis in mind maps: a new approach to determining document relatedness Proceedings Article
In: Lalmas, M; Jose, J; Rauber, A; Sebastiani, R; Frommholz, I (Ed.): Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication (ICUIMC '10), pp. 38:1–38:5, ACM Springer, Glasgow (UK), 2010, (Doctoral Consortium).
@inproceedings{Beel2010b,
title = {Link analysis in mind maps: a new approach to determining document relatedness},
author = {Joeran Beel and Bela Gipp},
editor = {M Lalmas and J Jose and A Rauber and R Sebastiani and I Frommholz},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication (ICUIMC '10)},
volume = {6273},
pages = {38:1–38:5},
publisher = {Springer},
address = {Glasgow (UK)},
organization = {ACM},
series = {Lecture Notes in Computer Science (LNCS)},
note = {Doctoral Consortium},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran
Retrieving Data from Mind Maps to Enhance Search Applications Journal Article
In: Bulletin of IEEE Technical Committee on Digital Libraries, vol. 6, no. 2, 2010, (Available at http://docear.org).
@article{Beel2010d,
title = {Retrieving Data from Mind Maps to Enhance Search Applications},
author = {Joeran Beel},
year = {2010},
date = {2010-01-01},
journal = {Bulletin of IEEE Technical Committee on Digital Libraries},
volume = {6},
number = {2},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Gipp, Bela; Stiller, Jan-Olaf
Information Retrieval on Mind Maps - What could it be good for? Proceedings Article
In: Proceedings of the 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09), pp. 1–4, IEEE, Washington (USA), 2009, (Available at http://docear.org).
@inproceedings{Beel2009f,
title = {Information Retrieval on Mind Maps - What could it be good for?},
author = {Joeran Beel and Bela Gipp and Jan-Olaf Stiller},
url = {https://ieeexplore.ieee.org/abstract/document/5364172
https://isg.beel.org/pubs/Information Retrieval on Mind Maps - What could it be good for –preprint.pdf},
year = {2009},
date = {2009-11-01},
booktitle = {Proceedings of the 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09)},
pages = {1–4},
publisher = {IEEE},
address = {Washington (USA)},
abstract = {Mind maps are used by millions of people. In this paper, we present how information retrieval on mind maps could be used to enhance expert search, document summarization, keyword-based search engines, document recommender systems, and the determination of word relatedness. For instance, the words in a mind map could be used to create a skill profile of the mind map's author and hence enhance expert search. This is a research-in-progress paper, which means that it presents ideas rather than research results.},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
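The idea of turning the words in a mind map into a profile of its author can be sketched as a plain term-frequency model. This is a hypothetical illustration: `mind_map_profile`, the ASCII-only tokenizer, the stop-word handling, and the top-k cut-off are assumptions, and the Docear papers listed on this page may use a different user-modelling approach.

```python
import re
from collections import Counter


def mind_map_profile(node_texts, stopwords=frozenset(), top_k=5):
    """Term-frequency profile of a mind map's author (hypothetical sketch).

    node_texts: the plain-text labels of the mind-map nodes.
    Returns the top_k most frequent terms with their relative frequencies.
    """
    counts = Counter()
    for text in node_texts:
        # Lowercase ASCII word tokens only; a real system would need a
        # proper tokenizer for non-English node labels.
        for term in re.findall(r"[a-z]+", text.lower()):
            if term not in stopwords:
                counts[term] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.most_common(top_k)}


nodes = ["Recommender systems", "Evaluation of recommender systems", "Mind maps"]
print(mind_map_profile(nodes, stopwords={"of"}, top_k=2))
```

Relative frequencies are computed over all kept terms, not just the top k, so a profile's weights stay comparable across mind maps of different sizes.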
Gipp, Bela; Beel, Joeran
Identifying Related Documents For Research Paper Recommender By CPA And COA Proceedings Article
In: Ao, S. I.; Douglas, C.; Grundfest, W. S.; Burgstone, J. (Ed.): Proceedings of The World Congress on Engineering and Computer Science 2009, pp. 636–639, International Association of Engineers (IAENG) Newswood Limited, Berkeley (USA), 2009, ISBN: 978-988-17012-6-8, (Available at: http://sciplore.org/pub).
@inproceedings{Gipp2009,
title = {Identifying Related Documents For Research Paper Recommender By CPA And COA},
author = {Bela Gipp and Joeran Beel},
editor = {S. I. Ao and C. Douglas and W. S. Grundfest and J. Burgstone},
url = {https://www.iaeng.org/publication/WCECS2009/WCECS2009_pp636-639.pdf},
isbn = {978-988-17012-6-8},
year = {2009},
date = {2009-10-01},
booktitle = {Proceedings of The World Congress on Engineering and Computer Science 2009},
volume = {1},
pages = {636–639},
publisher = {Newswood Limited},
address = {Berkeley (USA)},
organization = {International Association of Engineers (IAENG)},
series = {Lecture Notes in Engineering and Computer Science},
abstract = {This work-in-progress paper introduces two new approaches called Citation Proximity Analysis (CPA) and Citation Order Analysis (COA). They can be applied to identify related documents for the purpose of research paper recommender systems. CPA is a variant of co-citation analysis that additionally considers the proximity of citations to each other within an article’s full-text. The underlying idea is that the closer citations are to each other in a document, the more likely it is that the cited documents are related. For example, citations listed in the same sentence are more likely to express related thoughts than citations listed only in the same section. In COA, the order of citations is considered, which allows, for example, the identification of a text that has been translated from language A to language B, since the citations would still occur in the same order. However, it is also shown that CPA and COA cannot replace text analysis and existing citation analysis approaches for research paper recommender systems, since each approach has its own strengths and weaknesses.},
note = {Available at: http://sciplore.org/pub},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
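To make the CPA and COA ideas concrete, here is a small Python sketch. The proximity weights (1.0 for the same sentence, 0.5 for the same section, 0.25 otherwise), the nested list layout of a citing document, and the longest-common-subsequence measure used for COA are illustrative assumptions; the papers' own weighting scheme and order measure may differ.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative proximity weights (assumed, not the papers' exact values):
# the closer two citations are, the higher the weight.
WEIGHTS = {"sentence": 1.0, "section": 0.5, "document": 0.25}


def cpa_scores(citing_docs):
    """Proximity-weighted co-citation scores for pairs of cited papers.

    Each citing document is a list of sections, each section a list of
    sentences, and each sentence a list of cited paper IDs.
    """
    scores = defaultdict(float)
    for doc in citing_docs:
        # (section index, sentence index, cited ID) for every citation.
        positions = [(sec, sent, ref)
                     for sec, section in enumerate(doc)
                     for sent, sentence in enumerate(section)
                     for ref in sentence]
        closest = {}  # pair -> highest weight seen in this document
        for (s1, t1, a), (s2, t2, b) in combinations(positions, 2):
            if a == b:
                continue
            if (s1, t1) == (s2, t2):
                weight = WEIGHTS["sentence"]
            elif s1 == s2:
                weight = WEIGHTS["section"]
            else:
                weight = WEIGHTS["document"]
            pair = tuple(sorted((a, b)))
            closest[pair] = max(closest.get(pair, 0.0), weight)
        for pair, weight in closest.items():
            scores[pair] += weight
    return dict(scores)


def coa_similarity(order_a, order_b):
    """Citation Order Analysis sketch: share of citations occurring in the
    same relative order in two texts, via the longest common subsequence."""
    m, n = len(order_a), len(order_b)
    if max(m, n) == 0:
        return 0.0
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if order_a[i] == order_b[j]:
                lcs[i + 1][j + 1] = lcs[i][j] + 1
            else:
                lcs[i + 1][j + 1] = max(lcs[i][j + 1], lcs[i + 1][j])
    return lcs[m][n] / max(m, n)


paper_1 = [[["A", "B"], ["C"]], [["D"]]]  # two sections
paper_2 = [[["A", "C"]]]
print(cpa_scores([paper_1, paper_2])[("A", "C")])  # → 1.5
print(coa_similarity(["A", "B", "C", "D"], ["A", "C", "B", "D"]))  # → 0.75
```

Within one citing document a pair keeps only its closest co-occurrence, and the per-document weights are summed across documents, so with every weight set to 1 the CPA score reduces to a plain co-citation count.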
Gipp, Bela; Beel, Joeran
Citation Proximity Analysis (CPA) - A new approach for identifying related work based on Co-Citation Analysis Proceedings Article
In: Larsen, Birger; Leta, Jacqueline (Ed.): Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI'09), pp. 571–575, International Society for Scientometrics and Informetrics, Rio de Janeiro (Brazil), 2009, (ISSN 2175-1935. Available at: http://sciplore.org/pub).
@inproceedings{Gipp2009a,
title = {Citation Proximity Analysis (CPA) - A new approach for identifying related work based on Co-Citation Analysis},
author = {Bela Gipp and Joeran Beel},
editor = {Birger Larsen and Jacqueline Leta},
year = {2009},
date = {2009-07-01},
booktitle = {Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI'09)},
volume = {2},
pages = {571–575},
publisher = {International Society for Scientometrics and Informetrics},
address = {Rio de Janeiro (Brazil)},
note = {ISSN 2175-1935. Available at: http://sciplore.org/pub},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Gipp, Bela; Beel, Joeran; Hentschel, Christian
Scienstein: A Research Paper Recommender System Proceedings Article
In: Proceedings of the International Conference on Emerging Trends in Computing (ICETiC'09), pp. 309–315, Kamaraj College of Engineering and Technology India IEEE, Virudhunagar (India), 2009, (Available at: http://sciplore.org/pub).
@inproceedings{Gipp2009b,
title = {Scienstein: A Research Paper Recommender System},
author = {Bela Gipp and Joeran Beel and Christian Hentschel},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the International Conference on Emerging Trends in Computing (ICETiC'09)},
pages = {309–315},
publisher = {IEEE},
address = {Virudhunagar (India)},
organization = {Kamaraj College of Engineering and Technology India},
note = {Available at: http://sciplore.org/pub},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Literature Surveys
Beel, Joeran; Gipp, Bela; Langer, Stefan; Breitinger, Corinna
Research Paper Recommender Systems: A Literature Survey Journal Article
In: International Journal on Digital Libraries, vol. 17, no. 4, pp. 305–338, 2016, ISSN: 1432-5012.
@article{Beel2016a,
title = {Research Paper Recommender Systems: A Literature Survey},
author = {Joeran Beel and Bela Gipp and Stefan Langer and Corinna Breitinger},
doi = {10.1007/s00799-015-0156-0},
issn = {1432-5012},
year = {2016},
date = {2016-01-01},
journal = {International Journal on Digital Libraries},
volume = {17},
number = {4},
pages = {305–338},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Gipp, Bela; Nürnberger, Andreas
A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation Proceedings Article
In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys), pp. 7-14, 2013.
@inproceedings{Beel2013d,
title = {A Comparative Analysis of Offline and Online Evaluations and Discussion of Research Paper Recommender System Evaluation},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Bela Gipp and Andreas Nürnberger},
doi = {10.1145/2532508.2532511},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys)},
pages = {7-14},
series = {ACM International Conference Proceedings Series (ICPS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
RaaS (Recommendations-as-a-Service) & APIs
Wegmeth, Lukas; Beel, Joeran
CaMeLS: Cooperative Meta-Learning Service for Recommender Systems Proceedings Article
In: Proceedings of the 2nd Perspectives on the Evaluation of Recommender Systems Workshop, pp. 10–18, 2022.
@inproceedings{Wegmeth2022,
title = {CaMeLS: Cooperative Meta-Learning Service for Recommender Systems},
author = {Lukas Wegmeth and Joeran Beel},
url = {https://ceur-ws.org/Vol-3228/paper2.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the 2nd Perspectives on the Evaluation of Recommender Systems Workshop},
pages = {10–18},
abstract = {We present CaMeLS, a proof of concept of a cooperative meta-learning service for recommender systems. CaMeLS leverages the computing power of recommender systems users by uploading their metadata and algorithm evaluation scores to a centralized environment. Through the resulting database, CaMeLS then offers meta-learning services for everyone. Additionally, users may access evaluations of common data sets immediately to know the best-performing algorithms for those data sets. The metadata table may also be used for other purposes, e.g., to perform benchmarks. In the initial version discussed in this paper, CaMeLS implements automatic algorithm selection through meta-learning over two recommender systems libraries. Automatic algorithm selection saves users time and computing power and does not require expertise, as the best algorithm is automatically found over multiple libraries. The CaMeLS database contains 20 metadata sets by default. We show that the automatic algorithm selection service is already on par with the single best algorithm in this default scenario. CaMeLS only requires a few seconds to predict a suitable algorithm, rather than potentially hours or days if performed manually, depending on the data set. The code is publicly available on our GitHub https://camels.recommender-systems.com.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
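The CaMeLS abstract describes automatic algorithm selection through meta-learning over shared data-set metadata and evaluation scores. One common way to do this is a k-nearest-neighbour meta-learner, sketched below; the meta-learner CaMeLS actually uses may differ. The function `select_algorithm`, the two metadata features, the algorithm labels, and the scores are all hypothetical, and the features are chosen on similar scales because this sketch does no normalization.

```python
import math


def select_algorithm(new_meta, meta_db, k=2):
    """Pick an algorithm for a new data set from shared evaluation results.

    meta_db: list of (metadata_vector, {algorithm: score}) pairs, one per data
    set uploaded by users; higher scores are assumed to be better.
    Assumption: a k-nearest-neighbour meta-learner in metadata space.
    """
    # Sort stored data sets by Euclidean distance to the new data set.
    ranked = sorted(meta_db, key=lambda entry: math.dist(new_meta, entry[0]))
    totals, counts = {}, {}
    for _, scores in ranked[:k]:
        for algo, score in scores.items():
            totals[algo] = totals.get(algo, 0.0) + score
            counts[algo] = counts.get(algo, 0) + 1
    # Highest mean score among the k most similar data sets wins.
    return max(totals, key=lambda algo: totals[algo] / counts[algo])


# Hypothetical metadata: (log10 of #interactions, density in %).
meta_db = [
    ((5.0, 1.2), {"ItemKNN": 0.30, "ALS": 0.25}),
    ((5.2, 1.0), {"ItemKNN": 0.28, "ALS": 0.22}),
    ((7.1, 0.1), {"ItemKNN": 0.10, "ALS": 0.35}),
]
print(select_algorithm((5.1, 1.1), meta_db))       # → ItemKNN
print(select_algorithm((7.0, 0.1), meta_db, k=1))  # → ALS
```

Once the metadata table exists, a prediction like this costs only a distance computation per stored data set, which is consistent with the abstract's point that selection takes seconds rather than the hours or days of evaluating every algorithm manually.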
Wegmeth, Lukas; Beel, Joeran
Cooperative Meta-Learning Service for Recommender Systems Journal Article
In: COSEAL Workshop 2022, 2022.
@article{Wegmeth2022a,
title = {Cooperative Meta-Learning Service for Recommender Systems},
author = {Lukas Wegmeth and Joeran Beel},
url = {http://dx.doi.org/10.13140/RG.2.2.10667.41768},
year = {2022},
date = {2022-01-01},
journal = {COSEAL Workshop 2022},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Arambakam, Mukesh; Beel, Joeran
Federated Meta-Learning: Democratizing Algorithm Selection Across Disciplines and Software Libraries Proceedings Article
In: 7th ICML Workshop on Automated Machine Learning, pp. 1–8, 2020.
@inproceedings{Arambakam2020,
title = {Federated Meta-Learning: Democratizing Algorithm Selection Across Disciplines and Software Libraries},
author = {Mukesh Arambakam and Joeran Beel},
url = {https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_39.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {7th ICML Workshop on Automated Machine Learning},
pages = {1–8},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Griffin, Alan; O'Shey, Conor
Darwin & Goliath: Recommendations-As-a-Service with Automated Algorithm-Selection and White-Labels Proceedings Article
In: Azzopardi, Leif; Stein, Benno; Fuhr, Norbert; Mayr, Philipp; Hauff, Claudia; Hiemstra, Djoerd (Ed.): 13th ACM Conference on Recommender Systems (RecSys), pp. 213–219, 2019.
@inproceedings{Beel2019b,
title = {Darwin & Goliath: Recommendations-As-a-Service with Automated Algorithm-Selection and White-Labels},
author = {Joeran Beel and Alan Griffin and Conor O'Shey},
editor = {Leif Azzopardi and Benno Stein and Norbert Fuhr and Philipp Mayr and Claudia Hauff and Djoerd Hiemstra},
doi = {10.1007/978-3-030-15719-7_27},
year = {2019},
date = {2019-01-01},
booktitle = {13th ACM Conference on Recommender Systems (RecSys)},
volume = {11438},
pages = {213–219},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Collins, Andrew; Aizawa, Akiko
The Architecture of Mr. DLib’s Scientific Recommender-System API Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 78–89, CEUR-WS, 2018.
@inproceedings{Beel2018,
title = {The Architecture of Mr. DLib’s Scientific Recommender-System API},
author = {Joeran Beel and Andrew Collins and Akiko Aizawa},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
pages = {78–89},
publisher = {CEUR-WS},
abstract = {Recommender systems in academia are not widely available. This may
be in part due to the difficulty and cost of developing and maintaining
recommender systems. Many operators of academic products such as
digital libraries and reference managers avoid this effort, although
a recommender system could provide significant benefits to their
users. In this paper, we introduce Mr. DLib’s “Recommendations-as-a-Service”
(RaaS) API that allows operators of academic products to easily integrate
a scientific recommender system into their products. Mr. DLib generates
recommendations for research articles, but in the future, recommendations
may include calls for papers, grants, etc. Operators of academic products
can request recommendations from Mr. DLib and display these recommendations
to their users. Mr. DLib can be integrated in just a few hours or
days; creating an equivalent recommender system from scratch would
require several months for an academic operator. Mr. DLib has been
used by GESIS’ Sowiport and by the reference manager JabRef. Mr.
DLib is open source and its goal is to facilitate the application
of, and research on, scientific recommender systems. In this paper,
we present the motivation for Mr. DLib, its architecture, and details
about its effectiveness. Mr. DLib has delivered 94m recommendations
over a span of two years with an average click-through rate of 0.12\%.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Aizawa, Akiko; Breitinger, Corinna; Gipp, Bela
Mr. DLib: Recommendations-as-a-service (RaaS) for Academia Proceedings Article
In: Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries, pp. 313–314, IEEE Press, Toronto, Ontario, Canada, 2017, ISBN: 978-1-5386-3861-3.
@inproceedings{Beel2017c,
title = {Mr. DLib: Recommendations-as-a-service (RaaS) for Academia},
author = {Joeran Beel and Akiko Aizawa and Corinna Breitinger and Bela Gipp},
url = {http://dl.acm.org/citation.cfm?id=3200334.3200389},
isbn = {978-1-5386-3861-3},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries},
pages = {313–314},
publisher = {IEEE Press},
address = {Toronto, Ontario, Canada},
series = {JCDL '17},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Gipp, Bela; Nuernberger, Andreas
The Architecture and Datasets of Docear's Research Paper Recommender System Journal Article
In: D-Lib Magazine, vol. 20, no. 11/12, 2014.
@article{Beel2014,
title = {The Architecture and Datasets of Docear's Research Paper Recommender System},
author = {Joeran Beel and Stefan Langer and Bela Gipp and Andreas Nuernberger},
doi = {10.1045/november14-beel},
year = {2014},
date = {2014-01-01},
journal = {D-Lib Magazine},
volume = {20},
number = {11/12},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Real-World Recommender Systems
Vente, Tobias; Ekstrand, Michael; Beel, Joeran
Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit Proceedings Article
In: Proceedings of the 17th ACM Conference on Recommender Systems, pp. 1212-1216, 2023.
@inproceedings{Vente2023a,
title = {Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit},
author = {Tobias Vente and Michael Ekstrand and Joeran Beel},
url = {https://dl.acm.org/doi/10.1145/3604915.3610656},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 17th ACM Conference on Recommender Systems},
pages = {1212-1216},
abstract = {LensKit is one of the first and most popular Recommender System libraries. While LensKit offers a wide variety of features, it does not include any optimization strategies or guidelines on how to select and tune LensKit algorithms. LensKit developers have to manually include third-party libraries into their experimental setup or implement optimization strategies by hand to optimize hyperparameters. We found that 63.6\% (21 out of 33) of papers using LensKit algorithms for their experiments did not select algorithms or tune hyperparameters. Non-optimized models represent poor baselines and produce less meaningful research results. This demo introduces LensKit-Auto. LensKit-Auto automates the entire Recommender System pipeline and enables LensKit developers to automatically select, optimize, and ensemble LensKit algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
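The LensKit-Auto abstract describes automatically selecting and tuning recommender algorithms. As a rough illustration of that idea (often called combined algorithm selection and hyperparameter optimization, CASH), the sketch below runs plain random search over a hypothetical search space. The algorithm names, hyperparameter ranges, and the `evaluate` function are invented stand-ins; this is not LensKit-Auto's API or search strategy, and it omits the ensembling step the abstract also mentions.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]

# Hypothetical search space: algorithm name -> function sampling its hyperparameters.
SEARCH_SPACE: Dict[str, Callable[[random.Random], Params]] = {
    "item_knn": lambda rng: {"neighbors": rng.randint(5, 100)},
    "biased_mf": lambda rng: {"features": rng.randint(10, 200),
                              "reg": 10 ** rng.uniform(-4, -1)},
}


def evaluate(algo: str, params: Params) -> float:
    """Hypothetical validation error (lower is better).

    A real pipeline would train `algo` on a training split and score it on a
    held-out split; this toy formula only stands in for that step."""
    if algo == "item_knn":
        return 1.0 + abs(params["neighbors"] - 40) / 100
    return 0.9 + abs(params["features"] - 50) / 500 + params["reg"]


def random_search_cash(n_trials: int, seed: int = 0) -> Tuple[str, Params, float]:
    """Jointly choose an algorithm and its hyperparameters by random search."""
    rng = random.Random(seed)
    best: Tuple[str, Params, float] = ("", {}, float("inf"))
    for _ in range(n_trials):
        algo = rng.choice(sorted(SEARCH_SPACE))   # 1) sample an algorithm
        params = SEARCH_SPACE[algo](rng)          # 2) sample its hyperparameters
        error = evaluate(algo, params)            # 3) score the configuration
        if error < best[2]:                       # 4) keep the best one seen so far
            best = (algo, params, error)
    return best


algo, params, error = random_search_cash(n_trials=50)
print(algo, params, round(error, 3))
```

Random search is used here only because it is the simplest CASH baseline; a real toolkit would plug in a model-based optimizer and a proper train/validation split.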
Wegmeth, Lukas; Beel, Joeran
CaMeLS: Cooperative Meta-Learning Service for Recommender Systems Proceedings Article
In: Proceedings of the 2nd Perspectives on the Evaluation of Recommender Systems Workshop, pp. 10–18, 2022.
@inproceedings{Wegmeth2022,
title = {CaMeLS: Cooperative Meta-Learning Service for Recommender Systems},
author = {Lukas Wegmeth and Joeran Beel},
url = {https://ceur-ws.org/Vol-3228/paper2.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the 2nd Perspectives on the Evaluation of Recommender Systems Workshop},
pages = {10–18},
abstract = {We present CaMeLS, a proof of concept of a cooperative meta-learning service for recommender systems. CaMeLS leverages the computing power of recommender systems users by uploading their metadata and algorithm evaluation scores to a centralized environment. Through the resulting database, CaMeLS then offers meta-learning services for everyone. Additionally, users may access evaluations of common data sets immediately to know the best-performing algorithms for those data sets. The metadata table may also be used for other purposes, e.g., to perform benchmarks. In the initial version discussed in this paper, CaMeLS implements automatic algorithm selection through meta-learning over two recommender systems libraries. Automatic algorithm selection saves users time and computing power and does not require expertise, as the best algorithm is automatically found over multiple libraries. The CaMeLS database contains 20 metadata sets by default. We show that the automatic algorithm selection service is already on par with the single best algorithm in this default scenario. CaMeLS only requires a few seconds to predict a suitable algorithm, rather than potentially hours or days if performed manually, depending on the data set. The code is publicly available on our GitHub https://camels.recommender-systems.com.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
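CaMeLS predicts a suitable algorithm for a new data set from metadata and evaluation scores that other users have uploaded. A minimal sketch of that kind of meta-learning, assuming a k-nearest-neighbour vote over data set meta-features; the meta-learner CaMeLS actually uses is not described here, so the method, the `MetaRecord` layout, the meta-features, and all values below are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MetaRecord:
    """One uploaded row: meta-features of a data set plus per-algorithm scores."""
    meta_features: List[float]   # hypothetical, e.g. [log10 #users, log10 #items, density]
    scores: Dict[str, float]     # algorithm -> evaluation score (higher is better)


def best_algorithm(record: MetaRecord) -> str:
    return max(record.scores, key=lambda name: record.scores[name])


def distance(a: List[float], b: List[float]) -> float:
    # Plain Euclidean distance; a real system would normalise each meta-feature first.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend_algorithm(db: List[MetaRecord], new_meta: List[float], k: int = 3) -> str:
    """Majority vote over the best algorithms of the k most similar data sets."""
    neighbours = sorted(db, key=lambda r: distance(r.meta_features, new_meta))[:k]
    votes: Dict[str, int] = {}
    for record in neighbours:
        winner = best_algorithm(record)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=lambda name: votes[name])


# Hypothetical shared database (toy values, not CaMeLS data).
db = [
    MetaRecord([3.0, 3.5, 0.050], {"item_knn": 0.31, "als": 0.28}),
    MetaRecord([3.2, 3.4, 0.040], {"item_knn": 0.29, "als": 0.27}),
    MetaRecord([5.0, 4.8, 0.002], {"item_knn": 0.12, "als": 0.18}),
    MetaRecord([5.2, 5.1, 0.001], {"item_knn": 0.10, "als": 0.16}),
]
print(recommend_algorithm(db, [5.1, 4.9, 0.0015]))  # prints: als
```

Because the prediction is only a lookup and a vote over stored scores, it takes milliseconds, which mirrors the abstract's point that selection costs seconds rather than the hours a manual evaluation could take.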
Gupta, Srijan; Beel, Joeran
Auto-CaseRec: Automatically Selecting and Optimizing Recommendation-Systems Algorithms Journal Article
In: OSF Preprints, 2020.
@article{Gupta2020,
title = {Auto-CaseRec: Automatically Selecting and Optimizing Recommendation-Systems Algorithms},
author = {Srijan Gupta and Joeran Beel},
doi = {10.31219/osf.io/4znmd},
year = {2020},
date = {2020-01-01},
journal = {OSF Preprints},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Anand, Rohan; Beel, Joeran
Auto-Surprise: An Automated Recommender-System (AutoRecSys) Library with Tree of Parzens Estimator (TPE) Optimization Proceedings Article
In: 14th ACM Conference on Recommender Systems (RecSys), pp. 1–4, 2020.
@inproceedings{Anand2020,
title = {Auto-Surprise: An Automated Recommender-System (AutoRecSys) Library with Tree of Parzens Estimator (TPE) Optimization},
author = {Rohan Anand and Joeran Beel},
url = {https://arxiv.org/abs/2008.13532},
year = {2020},
date = {2020-01-01},
booktitle = {14th ACM Conference on Recommender Systems (RecSys)},
pages = {1–4},
abstract = {We introduce Auto-Surprise, an Automated Recommender System library. Auto-Surprise is an extension of the Surprise recommender system library and eases the algorithm selection and configuration process. Compared to the out-of-the-box Surprise library, Auto-Surprise performs better when evaluated with the MovieLens, Book-Crossing, and Jester datasets. It may also result in the selection of an algorithm with significantly lower runtime. Compared to Surprise's grid search, Auto-Surprise performs equally well or slightly better in terms of RMSE, and is notably faster in finding the optimum hyperparameters.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
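The abstract credits Auto-Surprise's speed-up to TPE optimization (the technique is usually expanded as Tree-structured Parzen Estimator). TPE splits past trials into a small "good" fraction and the rest, fits a Parzen (kernel density) estimate l(x) to the good points and g(x) to the others, and evaluates the candidate that maximizes l(x)/g(x). The sketch below is a simplified one-dimensional version with Gaussian kernels and a fixed bandwidth on a toy objective; it is not Auto-Surprise's code, and the function names and defaults are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Tuple


def parzen_density(x: float, centers: List[float], bandwidth: float) -> float:
    """Parzen estimate: average of Gaussian kernels centred on past observations."""
    if not centers:
        return 0.0
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return norm * total / len(centers)


def tpe_minimize(objective: Callable[[float], float], low: float, high: float,
                 n_init: int = 10, n_iter: int = 40, gamma: float = 0.25,
                 n_candidates: int = 24, seed: int = 0) -> Tuple[float, float]:
    """Minimise a 1-D objective on [low, high] with a simplified TPE loop."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0          # fixed bandwidth (simplification)
    trials: List[Tuple[float, float]] = []   # (x, loss) pairs

    # 1) Random warm-up trials.
    for _ in range(n_init):
        x = rng.uniform(low, high)
        trials.append((x, objective(x)))

    for _ in range(n_iter):
        # 2) Split past trials into the best gamma-fraction ("good") and the rest.
        ranked = sorted(trials, key=lambda t: t[1])
        n_good = max(1, math.ceil(gamma * len(ranked)))
        good = [x for x, _ in ranked[:n_good]]
        bad = [x for x, _ in ranked[n_good:]]

        # 3) Draw candidates from l(x) by perturbing good points; clip to the bounds.
        candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                      for _ in range(n_candidates)]

        # 4) Evaluate the candidate with the largest l(x) / g(x) ratio.
        def score(x: float) -> float:
            return parzen_density(x, good, bandwidth) / (parzen_density(x, bad, bandwidth) + 1e-12)

        x_next = max(candidates, key=score)
        trials.append((x_next, objective(x_next)))

    return min(trials, key=lambda t: t[1])


best_x, best_loss = tpe_minimize(lambda x: (x - 3.0) ** 2, low=0.0, high=10.0)
print(round(best_x, 3), round(best_loss, 5))
```

Unlike grid search, each new trial here is chosen from the evidence of earlier trials, which is the property behind the faster hyperparameter search reported in the abstract. Production TPE implementations also adapt the bandwidth and handle multiple hyperparameters, including categorical and conditional ones.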
Beel, Joeran; Griffin, Alan; O'Shey, Conor
Darwin & Goliath: Recommendations-As-a-Service with Automated Algorithm-Selection and White-Labels Proceedings Article
In: Azzopardi, Leif; Stein, Benno; Fuhr, Norbert; Mayr, Philipp; Hauff, Claudia; Hiemstra, Djoerd (Ed.): 13th ACM Conference on Recommender Systems (RecSys), pp. 213–219, 2019.
@inproceedings{Beel2019b,
title = {Darwin \& Goliath: Recommendations-As-a-Service with Automated Algorithm-Selection and White-Labels},
author = {Joeran Beel and Alan Griffin and Conor O'Shey},
editor = {Leif Azzopardi and Benno Stein and Norbert Fuhr and Philipp Mayr and Claudia Hauff and Djoerd Hiemstra},
doi = {10.1007/978-3-030-15719-7_27},
year = {2019},
date = {2019-01-01},
booktitle = {13th ACM Conference on Recommender Systems (RecSys)},
volume = {11438},
pages = {213–219},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Smyth, Barry; Collins, Andrew
RARD II: The 94 Million Related-Article Recommendation Dataset Proceedings Article
In: Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR), pp. 39–55, CEUR-WS, 2019.
@inproceedings{Beel2019e,
title = {RARD II: The 94 Million Related-Article Recommendation Dataset},
author = {Joeran Beel and Barry Smyth and Andrew Collins},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
pages = {39–55},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Collins, Andrew; Aizawa, Akiko
The Architecture of Mr. DLib’s Scientific Recommender-System API Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 78–89, CEUR-WS, 2018.
@inproceedings{Beel2018,
title = {The Architecture of Mr. DLib’s Scientific Recommender-System API},
author = {Joeran Beel and Andrew Collins and Akiko Aizawa},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
pages = {78–89},
publisher = {CEUR-WS},
abstract = {Recommender systems in academia are not widely available. This may
be in part due to the difficulty and cost of developing and maintaining
recommender systems. Many operators of academic products such as
digital libraries and reference managers avoid this effort, although
a recommender system could provide significant benefits to their
users. In this paper, we introduce Mr. DLib’s “Recommendations-as-a-Service”
(RaaS) API that allows operators of academic products to easily integrate
a scientific recommender system into their products. Mr. DLib generates
recommendations for research articles, but in the future, recommendations
may include calls for papers, grants, etc. Operators of academic products
can request recommendations from Mr. DLib and display these recommendations
to their users. Mr. DLib can be integrated in just a few hours or
days; creating an equivalent recommender system from scratch would
require several months for an academic operator. Mr. DLib has been
used by GESIS’ Sowiport and by the reference manager JabRef. Mr.
DLib is open source and its goal is to facilitate the application
of, and research on, scientific recommender systems. In this paper,
we present the motivation for Mr. DLib, its architecture, and details
about its effectiveness. Mr. DLib has delivered 94m recommendations
over a span of two years with an average click-through rate of 0.12\%.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Langer, Stefan; Beel, Joeran
Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned Proceedings Article
In: 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR), pp. 85-92, 2017.
@inproceedings{Langer2017,
title = {Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned},
author = {Stefan Langer and Joeran Beel},
year = {2017},
date = {2017-01-01},
booktitle = {5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR)},
pages = {85-92},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Feyer, Stefan; Siebert, Sophie; Gipp, Bela; Aizawa, Akiko; Beel, Joeran
Integration of the Scientific Recommender System Mr. DLib into the Reference Manager JabRef Proceedings Article
In: Proceedings of the 39th European Conference on Information Retrieval (ECIR), pp. 770–774, 2017.
@inproceedings{Feyer2017,
title = {Integration of the Scientific Recommender System Mr. DLib into the Reference Manager JabRef},
author = {Stefan Feyer and Sophie Siebert and Bela Gipp and Akiko Aizawa and Joeran Beel},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 39th European Conference on Information Retrieval (ECIR)},
pages = {770–774},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Aizawa, Akiko; Breitinger, Corinna; Gipp, Bela
Mr. DLib: Recommendations-as-a-service (RaaS) for Academia Proceedings Article
In: Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries, pp. 313–314, IEEE Press, Toronto, Ontario, Canada, 2017, ISBN: 978-1-5386-3861-3.
@inproceedings{Beel2017c,
title = {Mr. DLib: Recommendations-as-a-service (RaaS) for Academia},
author = {Joeran Beel and Akiko Aizawa and Corinna Breitinger and Bela Gipp},
url = {http://dl.acm.org/citation.cfm?id=3200334.3200389},
isbn = {978-1-5386-3861-3},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries},
pages = {313–314},
publisher = {IEEE Press},
address = {Toronto, Ontario, Canada},
series = {JCDL '17},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Dinesh, Siddharth
Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them Proceedings Article
In: Mayr, Philipp; Frommholz, Ingo; Cabanac, Guillaume (Ed.): Proceedings of the Fifth Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 39th European Conference on Information Retrieval (ECIR 2017), pp. 6-17, 2017.
@inproceedings{Beel2017e,
title = {Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them},
author = {Joeran Beel and Siddharth Dinesh},
editor = {Philipp Mayr and Ingo Frommholz and Guillaume Cabanac},
url = {http://ceur-ws.org/Vol-1823/paper1.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the Fifth Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 39th European Conference on Information Retrieval (ECIR 2017)},
volume = {1823},
pages = {6-17},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Dinesh, Siddharth
Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them [Long Version] Proceedings Article
In: arXiv preprint, https://arxiv.org/abs/1704.00156, Harvard Dataverse, 2017.
@inproceedings{Beel2017f,
title = {Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them [Long Version]},
author = {Joeran Beel and Siddharth Dinesh},
url = {http://dx.doi.org/10.7910/DVN/HFIV1A},
doi = {10.7910/DVN/HFIV1A},
year = {2017},
date = {2017-01-01},
booktitle = {arXiv preprint, https://arxiv.org/abs/1704.00156},
publisher = {Harvard Dataverse},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Gipp, Bela; Nuernberger, Andreas
The Architecture and Datasets of Docear's Research Paper Recommender System Journal Article
In: D-Lib Magazine, vol. 20, no. 11/12, 2014.
@article{Beel2014,
title = {The Architecture and Datasets of Docear's Research Paper Recommender System},
author = {Joeran Beel and Stefan Langer and Bela Gipp and Andreas Nuernberger},
doi = {10.1045/november14-beel},
year = {2014},
date = {2014-01-01},
journal = {D-Lib Magazine},
volume = {20},
number = {11/12},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Nuernberger, Andreas
Introducing Docear's Research Paper Recommender System Proceedings Article
In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13), pp. 459-460, ACM, 2013.
@inproceedings{Beel2013c,
title = {Introducing Docear's Research Paper Recommender System},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Andreas Nuernberger},
doi = {10.1145/2467696.2467786},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13)},
pages = {459-460},
publisher = {ACM},
series = {ACM International Conference Proceedings Series (ICPS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran
SciPlore MindMapping now provides literature recommendations (Beta 15) Miscellaneous
Blog post, http://www.sciplore.org/2011/sciplore-mindmapping-now-provides-literature-recommendations-beta-15/, 2011.
@misc{Beel2011,
title = {SciPlore MindMapping now provides literature recommendations (Beta 15)},
author = {Joeran Beel},
url = {http://www.sciplore.org/2011/sciplore-mindmapping-now-provides-literature-recommendations-beta-15/},
year = {2011},
date = {2011-04-01},
howpublished = {Blog post},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Recommender Systems for Digital Libraries
Molloy, Paul; Beel, Joeran; Aizawa, Akiko
Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents Proceedings Article
In: Proceedings of the 8th International Workshop on Mining Scientific Publications, pp. 1–8, Association for Computational Linguistics, Wuhan, China, 2020.
@inproceedings{Molloy2020,
title = {Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents},
author = {Paul Molloy and Joeran Beel and Akiko Aizawa},
url = {https://www.aclweb.org/anthology/2020.wosp-1.1},
year = {2020},
date = {2020-08-01},
booktitle = {Proceedings of the 8th International Workshop on Mining Scientific Publications},
pages = {1–8},
publisher = {Association for Computational Linguistics},
address = {Wuhan, China},
abstract = {The relatedness of research articles, patents, court rulings, web
pages, and other document types is often calculated with citation
or hyperlink-based approaches like co-citation (proximity) analysis.
The main limitation of citation-based approaches is that they cannot
be used for documents that receive few or no citations. We propose
Virtual Citation Proximity (VCP), a Siamese Neural Network architecture,
which combines the advantages of co-citation proximity analysis (diverse
notions of relatedness / high recommendation performance), with the
advantage of content-based filtering (high coverage). VCP is trained
on a corpus of documents with textual features, and with real citation
proximity as ground truth. VCP then predicts for any two documents,
based on their title and abstract, in what proximity the two documents
would be co-cited, if they were indeed co-cited. The prediction can
be used in the same way as real citation proximity to calculate document
relatedness, even for uncited documents. In our evaluation with 2
million co-citations from Wikipedia articles, VCP achieves an MAE
of 0.0055, i.e., an improvement of 20\% over the baseline, though
the learning curve suggests that more work is needed.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beierle, Felix; Aizawa, Akiko; Collins, Andrew; Beel, Joeran
Choice overload and recommendation effectiveness in related-article recommendations Journal Article
In: International Journal on Digital Libraries (IJDL), pp. 1–16, 2019.
@article{Beierle2019,
title = {Choice overload and recommendation effectiveness in related-article recommendations},
author = {Felix Beierle and Akiko Aizawa and Andrew Collins and Joeran Beel},
doi = {10.1007/s00799-019-00270-7},
year = {2019},
date = {2019-01-01},
journal = {International Journal on Digital Libraries (IJDL)},
pages = {1–16},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Collins, Andrew; Beel, Joeran
Document Embeddings vs. Keyphrases vs. Terms: A Large-Scale Online Evaluation in Digital Library Recommender Systems Proceedings Article
In: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2019.
@inproceedings{Collins2019,
title = {Document Embeddings vs. Keyphrases vs. Terms: A Large-Scale Online Evaluation in Digital Library Recommender Systems},
author = {Andrew Collins and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Smyth, Barry; Collins, Andrew
RARD II: The 94 Million Related-Article Recommendation Dataset Proceedings Article
In: Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR), pp. 39–55, CEUR-WS, 2019.
@inproceedings{Beel2019e,
title = {RARD II: The 94 Million Related-Article Recommendation Dataset},
author = {Joeran Beel and Barry Smyth and Andrew Collins},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
pages = {39–55},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collins, Andrew; Tkaczyk, Dominika; Aizawa, Akiko; Beel, Joeran
Position Bias in Recommender Systems for Digital Libraries Proceedings Article
In: Proceedings of the iConference, pp. 335-344, Springer, 2018.
@inproceedings{Collins2018,
title = {Position Bias in Recommender Systems for Digital Libraries},
author = {Andrew Collins and Dominika Tkaczyk and Akiko Aizawa and Joeran Beel},
doi = {10.1007/978-3-319-78105-1_37},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the iConference},
volume = {10766},
pages = {335-344},
publisher = {Springer},
series = {Lecture Notes in Computer Science (LNCS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Collins, Andrew; Aizawa, Akiko
The Architecture of Mr. DLib’s Scientific Recommender-System API Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 78–89, CEUR-WS, 2018.
@inproceedings{Beel2018,
title = {The Architecture of Mr. DLib’s Scientific Recommender-System API},
author = {Joeran Beel and Andrew Collins and Akiko Aizawa},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
pages = {78–89},
publisher = {CEUR-WS},
abstract = {Recommender systems in academia are not widely available. This may
be in part due to the difficulty and cost of developing and maintaining
recommender systems. Many operators of academic products such as
digital libraries and reference managers avoid this effort, although
a recommender system could provide significant benefits to their
users. In this paper, we introduce Mr. DLib’s “Recommendations-as-a-Service”
(RaaS) API that allows operators of academic products to easily integrate
a scientific recommender system into their products. Mr. DLib generates
recommendations for research articles, but in the future, recommendations
may include calls for papers, grants, etc. Operators of academic products
can request recommendations from Mr. DLib and display these recommendations
to their users. Mr. DLib can be integrated in just a few hours or
days; creating an equivalent recommender system from scratch would
require several months for an academic operator. Mr. DLib has been
used by GESIS’ Sowiport and by the reference manager JabRef. Mr.
DLib is open source and its goal is to facilitate the application
of, and research on, scientific recommender systems. In this paper,
we present the motivation for Mr. DLib, its architecture, and details
about its effectiveness. Mr. DLib has delivered 94m recommendations
over a span of two years with an average click-through rate of 0.12\%.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Breitinger, Corinna; Langer, Stefan
Evaluating the CC-IDF citation-weighting scheme: How effectively can 'Inverse Document Frequency' (IDF) be applied to references? Proceedings Article
In: Proceedings of the 12th iConference, 2017.
@inproceedings{Beel2017,
title = {Evaluating the CC-IDF citation-weighting scheme: How effectively can 'Inverse Document Frequency' (IDF) be applied to references?},
author = {Joeran Beel and Corinna Breitinger and Stefan Langer},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 12th iConference},
abstract = {In the domain of academic search engines and research-paper recommender
systems, CC-IDF is a common citation-weighting scheme that is used
to calculate semantic relatedness between documents. CC-IDF adopts
the principles of the popular term-weighting scheme TF-IDF and assumes
that if a rare academic citation is shared by two documents then
this occurrence should receive a higher weight than if the citation
is shared among a large number of documents. Although CC-IDF is in
common use, we found no empirical evaluation and comparison of CC-IDF
with plain citation weight (CC-Only). Therefore, we conducted such
an evaluation and present the results in this paper. The evaluation
was conducted with real users of the recommender system Docear. The
effectiveness of CC-IDF and CC-Only was measured using click-through
rate (CTR). For 238,681 delivered recommendations, CC-IDF had about
the same effectiveness as CC-Only (CTR of 6.15% vs. 6.23%). In other
words, CC-IDF was not more effective than CC-Only, which is a surprising
result. We provide a number of potential reasons and suggest conducting
further research to understand the principles of CC-IDF in more detail.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
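The difference between the two weighting schemes compared in this paper can be illustrated with a minimal Python sketch. It assumes that CC-Only scores a document pair by the number of citations the two documents share, and that CC-IDF sums log(N / df) over the shared citations, where N is the number of documents and df is the number of documents citing a given reference. The exact formula used in Docear may differ; the toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter


def document_frequencies(corpus):
    """Count in how many documents each reference is cited."""
    df = Counter()
    for references in corpus.values():
        df.update(set(references))
    return df


def cc_only(refs_a, refs_b):
    """CC-Only: every shared citation counts as 1."""
    return len(set(refs_a) & set(refs_b))


def cc_idf(refs_a, refs_b, df, n_docs):
    """CC-IDF (assumed form): shared citations weighted by log(N / df)."""
    shared = set(refs_a) & set(refs_b)
    return sum(math.log(n_docs / df[ref]) for ref in shared)


# Hypothetical toy corpus: document id -> references it cites
corpus = {
    "d1": ["r1", "r2", "r3"],
    "d2": ["r1", "r3"],
    "d3": ["r1", "r4"],
    "d4": ["r1", "r2"],
}
df = document_frequencies(corpus)
n_docs = len(corpus)

print(cc_only(corpus["d1"], corpus["d2"]))                       # 2
print(round(cc_idf(corpus["d1"], corpus["d2"], df, n_docs), 3))  # 0.693
print(cc_only(corpus["d1"], corpus["d3"]))                       # 1
print(cc_idf(corpus["d1"], corpus["d3"], df, n_docs))            # 0.0
```

In this toy corpus, r1 is cited by every document, so its IDF weight is log(4/4) = 0: sharing it adds nothing under CC-IDF, while CC-Only counts it like any other shared citation.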
Beierle, Felix; Aizawa, Akiko; Beel, Joeran
Exploring Choice Overload in Related-Article Recommendations in Digital Libraries Proceedings Article
In: 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR), pp. 51–61, 2017.
@inproceedings{Beierle2017,
title = {Exploring Choice Overload in Related-Article Recommendations in Digital Libraries},
author = {Felix Beierle and Akiko Aizawa and Joeran Beel},
year = {2017},
date = {2017-01-01},
booktitle = {5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR)},
pages = {51–61},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Gipp, Bela; Langer, Stefan; Breitinger, Corinna
Research Paper Recommender Systems: A Literature Survey Journal Article
In: International Journal on Digital Libraries, no. 4, pp. 305–338, 2016, ISSN: 1432-5012.
@article{Beel2016a,
title = {Research Paper Recommender Systems: A Literature Survey},
author = {Joeran Beel and Bela Gipp and Stefan Langer and Corinna Breitinger},
doi = {10.1007/s00799-015-0156-0},
issn = {1432-5012},
year = {2016},
date = {2016-01-01},
journal = {International Journal on Digital Libraries},
number = {4},
pages = {305–338},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Langer, Stefan
A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems Proceedings Article
In: Kapidakis, Sarantos; Mazurek, Cezary; Werla, Marcin (Ed.): Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries (TPDL), pp. 153-168, 2015.
@inproceedings{Beel2015a,
title = {A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems},
author = {Joeran Beel and Stefan Langer},
editor = {Sarantos Kapidakis and Cezary Mazurek and Marcin Werla},
doi = {10.1007/978-3-319-24592-8_12},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries (TPDL)},
volume = {9316},
pages = {153-168},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Gipp, Bela; Nuernberger, Andreas
The Architecture and Datasets of Docear's Research Paper Recommender System Journal Article
In: D-Lib Magazine, vol. 20, no. 11/12, 2014.
@article{Beel2014,
title = {The Architecture and Datasets of Docear's Research Paper Recommender System},
author = {Joeran Beel and Stefan Langer and Bela Gipp and Andreas Nuernberger},
doi = {10.1045/november14-beel},
year = {2014},
date = {2014-01-01},
journal = {D-Lib Magazine},
volume = {20},
number = {11/12},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Novel Aspects Beyond Accuracy
Persistence
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Nürnberger, Andreas
Persistence in Recommender Systems: Giving the Same Recommendations to the Same Users Multiple Times Proceedings Article
In: Aalberg, Trond; Dobreva, Milena; Papatheodorou, Christos; Tsakonas, Giannis; Farrugia, Charles (Ed.): Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 390–394, Springer, Valletta, Malta, 2013.
@inproceedings{Beel2013e,
title = {Persistence in Recommender Systems: Giving the Same Recommendations to the Same Users Multiple Times},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Andreas Nürnberger},
editor = {Trond Aalberg and Milena Dobreva and Christos Papatheodorou and Giannis Tsakonas and Charles Farrugia},
year = {2013},
date = {2013-09-01},
booktitle = {Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013)},
volume = {8092},
pages = {390–394},
publisher = {Springer},
address = {Valletta, Malta},
series = {Lecture Notes in Computer Science (LNCS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Choice Overload
Beierle, Felix; Aizawa, Akiko; Collins, Andrew; Beel, Joeran
Choice overload and recommendation effectiveness in related-article recommendations Journal Article
In: International Journal on Digital Libraries (IJDL), pp. 1–16, 2019.
@article{Beierle2019,
title = {Choice overload and recommendation effectiveness in related-article recommendations},
author = {Felix Beierle and Akiko Aizawa and Andrew Collins and Joeran Beel},
doi = {10.1007/s00799-019-00270-7},
year = {2019},
date = {2019-01-01},
journal = {International Journal on Digital Libraries (IJDL)},
pages = {1–16},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beierle, Felix; Aizawa, Akiko; Beel, Joeran
Exploring Choice Overload in Related-Article Recommendations in Digital Libraries Proceedings Article
In: 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR), pp. 51–61, 2017.
@inproceedings{Beierle2017,
title = {Exploring Choice Overload in Related-Article Recommendations in Digital Libraries},
author = {Felix Beierle and Akiko Aizawa and Joeran Beel},
year = {2017},
date = {2017-01-01},
booktitle = {5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR)},
pages = {51–61},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Position Bias
Collins, Andrew; Tkaczyk, Dominika; Aizawa, Akiko; Beel, Joeran
Position Bias in Recommender Systems for Digital Libraries Proceedings Article
In: Proceedings of the iConference, pp. 335-344, Springer, 2018.
@inproceedings{Collins2018,
title = {Position Bias in Recommender Systems for Digital Libraries},
author = {Andrew Collins and Dominika Tkaczyk and Akiko Aizawa and Joeran Beel},
doi = {10.1007/978-3-319-78105-1_37},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the iConference},
volume = {10766},
pages = {335-344},
publisher = {Springer},
series = {Lecture Notes in Computer Science (LNCS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Labeling
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel
Sponsored vs. Organic (Research Paper) Recommendations and the Impact of Labeling Proceedings Article
In: Aalberg, Trond; Dobreva, Milena; Papatheodorou, Christos; Tsakonas, Giannis; Farrugia, Charles (Ed.): Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 395–399, Valletta, Malta, 2013.
@inproceedings{Beel2013a,
title = {Sponsored vs. Organic (Research Paper) Recommendations and the Impact of Labeling},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr},
editor = {Trond Aalberg and Milena Dobreva and Christos Papatheodorou and Giannis Tsakonas and Charles Farrugia},
year = {2013},
date = {2013-09-01},
booktitle = {Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013)},
pages = {395–399},
address = {Valletta, Malta},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Graphical User Interfaces
Beel, Joeran; Dixon, Haley
The ‘Unreasonable’ Effectiveness of Graphical User Interfaces for Recommender Systems Proceedings Article
In: Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, pp. 22–28, Association for Computing Machinery, New York, NY, USA, 2021, ISBN: 9781450383677.
@inproceedings{Beel2021,
title = {The ‘Unreasonable’ Effectiveness of Graphical User Interfaces for Recommender Systems},
author = {Joeran Beel and Haley Dixon},
url = {https://dl.acm.org/doi/fullHtml/10.1145/3450614.3461682
https://www.researchgate.net/publication/352672485_The_'Unreasonable'_Effectiveness_of_Graphical_User_Interfaces_for_Recommender_Systems},
doi = {10.1145/3450614.3461682},
isbn = {9781450383677},
year = {2021},
date = {2021-01-01},
booktitle = {Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization},
pages = {22–28},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
abstract = {The impact of Graphical User Interfaces (GUI) for recommender systems is a little-explored area. Therefore, we conduct an empirical study in which we create, deploy, and evaluate seven different GUI variations. We use these variations to display 68,260 related-blog-post recommendations to 10,595 unique visitors of our blog. The study shows that the GUIs have a strong effect on the recommender systems’ performance, measured in click-through rate (CTR). The best performing GUI achieved a 66% higher CTR than the worst performing GUI (statistically significant with p<0.05). In other words, with a few days of work to develop different GUIs, a recommender-system operator could increase CTR notably – maybe even more than by tuning the recommendation algorithm. In analogy to the ‘unreasonable effectiveness of data’ discussion by Google and others, we conclude that the effectiveness of graphical user interfaces for recommender systems is equally ‘unreasonable’. Hence, the recommender system community should spend more time on researching GUIs for recommender systems. In addition, we conduct a survey and find that the ACM Recommender Systems Conference has a strong focus on algorithms – 81% of all short and full papers published in 2019 and 2020 relate to algorithm development, and none to GUIs for recommender systems. We also surveyed the recommender systems of 50 blogs. While most displayed a thumbnail (86%) and had a mouseover interaction (62%), other design elements were rare. Only a few highlighted top recommendations (8%), displayed rankings or relevance scores (6%), or offered a ‘view more’ option (4%).},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
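The abstract reports the 66% CTR difference between the best and worst GUI as significant at p<0.05 but does not name the test here. A two-proportion z-test is one common way to check whether two click-through rates differ; the sketch below assumes that test, and the click and impression counts are hypothetical, chosen only to give a similar relative lift.

```python
import math


def two_proportion_z_test(clicks_a, shown_a, clicks_b, shown_b):
    """Two-sided z-test for the difference between two click-through rates."""
    ctr_a = clicks_a / shown_a
    ctr_b = clicks_b / shown_b
    # Pooled CTR under the null hypothesis that both GUIs perform equally
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (ctr_a - ctr_b) / std_err
    # Two-sided p-value from the standard normal distribution: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value


# Hypothetical counts: best GUI vs. worst GUI
ctr_best, ctr_worst, z, p = two_proportion_z_test(150, 10_000, 90, 10_000)
lift = ctr_best / ctr_worst - 1
print(f"CTR {ctr_best:.2%} vs. {ctr_worst:.2%}, lift {lift:.0%}, z = {z:.2f}, p = {p:.5f}")
```

With these made-up numbers, the relative lift is about 67% and p is well below 0.05; with far fewer impressions, the same relative lift would not necessarily be significant.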
Demographics
Langer, Stefan; Beel, Joeran
The Comparability of Recommender System Evaluations and Characteristics of Docear's Users Proceedings Article
In: Proceedings of the Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD) at the 2014 ACM Conference Series on Recommender Systems (RecSys), pp. 1–6, CEUR-WS, 2014.
@inproceedings{Langer2014,
title = {The Comparability of Recommender System Evaluations and Characteristics of Docear's Users},
author = {Stefan Langer and Joeran Beel},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD) at the 2014 ACM Conference Series on Recommender Systems (RecSys)},
pages = {1–6},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Nuernberger, Andreas; Genzmehr, Marcel
The Impact of Demographics (Age and Gender) and Other User Characteristics on Evaluating Recommender Systems Proceedings Article
In: Aalberg, Trond; Dobreva, Milena; Papatheodorou, Christos; Tsakonas, Giannis; Farrugia, Charles (Ed.): Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 400–404, Springer, Valletta, Malta, 2013.
@inproceedings{Beel2013f,
title = {The Impact of Demographics (Age and Gender) and Other User Characteristics on Evaluating Recommender Systems},
author = {Joeran Beel and Stefan Langer and Andreas Nuernberger and Marcel Genzmehr},
editor = {Trond Aalberg and Milena Dobreva and Christos Papatheodorou and Giannis Tsakonas and Charles Farrugia},
year = {2013},
date = {2013-09-01},
booktitle = {Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013)},
pages = {400–404},
publisher = {Springer},
address = {Valletta, Malta},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Machine Learning
AutoML (Automated Machine Learning), Meta-Learning & Algorithm Selection
Vente, Tobias; Beel, Joeran
The Potential of AutoML for Recommender Systems Journal Article
In: arXiv, pp. 18, 2024.
@article{Vente2024,
title = {The Potential of AutoML for Recommender Systems},
author = {Tobias Vente and Joeran Beel},
url = {https://arxiv.org/abs/2402.04453},
doi = {10.48550/arXiv.2402.04453},
year = {2024},
date = {2024-01-01},
journal = {arXiv},
pages = {18},
abstract = {Automated Machine Learning (AutoML) has greatly advanced applications of Machine Learning (ML) including model compression, machine translation, and computer vision. Recommender Systems (RecSys) can be seen as an application of ML. Yet, AutoML has found little attention in the RecSys community; nor has RecSys found notable attention in the AutoML community. Only a few, relatively simple Automated Recommender Systems (AutoRecSys) libraries exist that adopt AutoML techniques. However, these libraries are based on student projects and do not offer the features and thorough development of AutoML libraries. We set out to determine how AutoML libraries perform in the scenario of an inexperienced user who wants to implement a recommender system. We compared the predictive performance of 60 AutoML, AutoRecSys, ML, and RecSys algorithms from 15 libraries, including a mean predictor baseline, on 14 explicit feedback RecSys datasets. To simulate the perspective of an inexperienced user, the algorithms were evaluated with default hyperparameters. We found that AutoML and AutoRecSys libraries performed best. AutoML libraries performed best for six of the 14 datasets (43%), but it was not always the same AutoML library performing best. The single-best library was the AutoRecSys library Auto-Surprise, which performed best on five datasets (36%). On three datasets (21%), AutoML libraries performed poorly, and RecSys libraries with default parameters performed best. However, although RecSys algorithms obtained 50% of all placements in the top five per dataset, they fall behind AutoML on average. ML algorithms generally performed the worst.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
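The mean predictor baseline mentioned in this abstract can be sketched in a few lines. This sketch assumes the baseline predicts the global mean training rating for every user-item pair and that predictive performance is measured with RMSE; the abstract does not state the metric here, and the ratings are made up.

```python
import math

# Hypothetical explicit-feedback data as (user, item, rating) triples
train = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0), ("u3", "i3", 4.0)]
test = [("u2", "i2", 5.0), ("u3", "i1", 3.0)]


def fit_mean_predictor(ratings):
    """Return the global mean rating of the training data."""
    return sum(r for _, _, r in ratings) / len(ratings)


def rmse(predictions, ratings):
    """Root mean squared error between predictions and true ratings."""
    errors = [(p - r) ** 2 for p, (_, _, r) in zip(predictions, ratings)]
    return math.sqrt(sum(errors) / len(errors))


global_mean = fit_mean_predictor(train)
# The baseline ignores user and item and predicts the same value for every pair
predictions = [global_mean for _ in test]
print(global_mean, rmse(predictions, test))  # 4.0 1.0
```

A recommender that cannot beat this constant prediction on the same split is not learning anything useful from users or items.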
Purucker, Lennart; Beel, Joeran
A first Look at Meta-Learning Algorithm Selection for Post Hoc Ensembling in AutoML Proceedings Article
In: Poster Track of the COSEAL Workshop, 2023.
@inproceedings{Purucker2023a,
title = {A first Look at Meta-Learning Algorithm Selection for Post Hoc Ensembling in AutoML},
author = {Lennart Purucker and Joeran Beel},
year = {2023},
date = {2023-01-01},
booktitle = {Poster Track of the COSEAL Workshop},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Purucker, Lennart; Beel, Joeran
CMA-ES for Post Hoc Ensembling in AutoML: A Great Success and Salvageable Failure Proceedings Article
In: 2nd International Conference on Automated Machine Learning (AutoML), pp. 1–23, 2023.
@inproceedings{Purucker2023,
title = {CMA-ES for Post Hoc Ensembling in AutoML: A Great Success and Salvageable Failure},
author = {Lennart Purucker and Joeran Beel},
url = {https://openreview.net/pdf?id=MeCwOxob8jfl},
year = {2023},
date = {2023-01-01},
booktitle = {2nd International Conference on Automated Machine Learning (AutoML)},
pages = {1–23},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Purucker, Lennart; Schneider, Lennart; Anastacio, Marie; Beel, Joeran; Bischl, Bernd; Hoos, Holger
Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML Proceedings Article
In: 2nd International Conference on Automated Machine Learning (AutoML), pp. 1–34, 2023.
@inproceedings{Purucker2023b,
title = {Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML},
author = {Lennart Purucker and Lennart Schneider and Marie Anastacio and Joeran Beel and Bernd Bischl and Holger Hoos},
url = {https://openreview.net/pdf?id=zvV7hemQmtLl},
year = {2023},
date = {2023-01-01},
booktitle = {2nd International Conference on Automated Machine Learning (AutoML)},
pages = {1–34},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Purucker, Lennart Oswald; Beel, Joeran
Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML Proceedings Article
In: International Conference on Automated Machine Learning, Late-Breaking Results Track, pp. 1–18, 2022.
@inproceedings{Purucker2022,
title = {Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML},
author = {Lennart Oswald Purucker and Joeran Beel},
url = {https://2022.automl.cc/wp-content/uploads/2022/08/assembled_openml_creating_effi-Main-Paper-And-Supplementary-Material.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {International Conference on Automated Machine Learning, Late-Breaking Results Track},
pages = {1–18},
abstract = {Automated Machine Learning (AutoML) frameworks regularly use ensembles. Developers need to compare different ensemble techniques to select appropriate techniques for an AutoML framework from the many potential techniques. So far, the comparison of ensemble techniques is often computationally expensive, because many base models must be trained and evaluated one or multiple times. Therefore, we present Assembled-OpenML. Assembled-OpenML is a Python tool, which builds meta-datasets for ensembles using OpenML. A meta-dataset, called Metatask, consists of the data of an OpenML task, the task's dataset, and prediction data from model evaluations for the task. We can make the comparison of ensemble techniques computationally cheaper by using the predictions stored in a metatask instead of training and evaluating base models. To introduce Assembled-OpenML, we describe the first version of our tool. Moreover, we present an example of using Assembled-OpenML to compare a set of ensemble techniques. For this example comparison, we built a benchmark using Assembled-OpenML and implemented ensemble techniques expecting predictions instead of base models as input. In our example comparison, we gathered the prediction data of 1523 base models for 31 datasets. Obtaining the prediction data for all base models using Assembled-OpenML took ∼1 hour in total. In comparison, obtaining the prediction data by training and evaluating just one base model on the most computationally expensive dataset took ∼37 minutes.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
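The core idea of Assembled-OpenML, comparing ensemble techniques on stored predictions instead of training base models again, can be illustrated with greedy ensemble selection (Caruana et al., 2004), a common post hoc ensembling technique that needs only predictions as input. This is a hypothetical sketch and not the Assembled-OpenML API: the dictionary of stored validation predictions stands in for the prediction data a metatask would provide, and accuracy is an assumed selection metric.

```python
from collections import Counter


def argmax(row):
    """Index of the largest value in a row of class probabilities."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(probabilities, labels):
    """Fraction of rows whose most probable class equals the label."""
    hits = sum(argmax(row) == label for row, label in zip(probabilities, labels))
    return hits / len(labels)


def average(members, stored):
    """Average the stored predictions of the selected members (repeats allowed)."""
    n_rows = len(stored[members[0]])
    n_classes = len(stored[members[0]][0])
    return [
        [sum(stored[m][i][c] for m in members) / len(members) for c in range(n_classes)]
        for i in range(n_rows)
    ]


def greedy_ensemble_selection(stored, labels, n_iterations):
    """Greedy ensemble selection with replacement, using only stored predictions."""
    selected = []
    for _ in range(n_iterations):
        # Add the base model whose inclusion maximizes ensemble accuracy
        best = max(stored, key=lambda name: accuracy(average(selected + [name], stored), labels))
        selected.append(best)
    return Counter(selected)  # how often each base model was picked acts as its weight


# Hypothetical stored validation predictions (class probabilities) of three base models
y_val = [0, 1, 1, 0]
stored_predictions = {
    "A": [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.45, 0.55]],
    "B": [[0.45, 0.55], [0.2, 0.8], [0.4, 0.6], [0.8, 0.2]],
    "C": [[0.2, 0.8], [0.7, 0.3], [0.6, 0.4], [0.3, 0.7]],
}

weights = greedy_ensemble_selection(stored_predictions, y_val, n_iterations=2)
print(weights)  # Counter({'A': 1, 'B': 1})
```

Neither A nor B alone exceeds 75% accuracy on these four rows, but averaging their stored predictions classifies all four correctly, and no base model had to be retrained to find that out. Real implementations typically use a probabilistic metric and more iterations; here, accuracy ties are broken by dictionary order.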
Buskulic, Nathan; Bergman, Edward; Beel, Joeran
Online Neural Architecture Search (ONAS): Adapting neural network architecture search in a continuously evolving domain. [Proposal] Journal Article
In: OSF Preprints, pp. 1-4, 2021.
@article{Buskulic2021,
title = {Online Neural Architecture Search (ONAS): Adapting neural network architecture search in a continuously evolving domain. [Proposal]},
author = {Nathan Buskulic and Edward Bergman and Joeran Beel},
url = {https://osf.io/suqxr},
doi = {10.31219/osf.io/suqxr},
year = {2021},
date = {2021-01-01},
journal = {OSF Preprints},
pages = {1-4},
publisher = {OSF Preprint},
abstract = {Neural Architecture Search research has been limited to fixed datasets and as such does not provide the flexibility needed to deal with real-world, constantly evolving data. This is why we propose the basis of Online Neural Architecture Search (ONAS) to deal with complex, evolving data distributions. We formalise ONAS as a minimisation problem in which both the weights and the architecture of the neural network need to be optimised for the data up until a time $t_i$. To solve this problem, we adapt a DARTS optimisation process, combined with an early stopping scheme, by using the supernet optimised on previous data as a warm-up initial state. This allows the architecture of the neural network to evolve as the data distribution evolves while limiting the computational burden. This work aims at building the initial mathematical formalism of the problem as well as developing a framework in which NAS methods could be used to solve this problem. Finally, several possible next steps are presented to show the potential of Online Neural Architecture Search.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Tyrrell, Bryan; Bergman, Edward; Jones, Gareth; Beel, Joeran
‘Algorithm-Performance Personas’ for Siamese Meta-Learning and Automated Algorithm Selection Proceedings Article
In: 7th ICML Workshop on Automated Machine Learning, pp. 1–16, 2020.
@inproceedings{Tyrrell2020,
title = {‘Algorithm-Performance Personas’ for Siamese Meta-Learning and Automated Algorithm Selection},
author = {Bryan Tyrrell and Edward Bergman and Gareth Jones and Joeran Beel},
url = {https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_48.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {7th ICML Workshop on Automated Machine Learning},
pages = {1–16},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Arambakam, Mukesh; Beel, Joeran
Federated Meta-Learning: Democratizing Algorithm Selection Across Disciplines and Software Libraries Proceedings Article
In: 7th ICML Workshop on Automated Machine Learning, pp. 1–8, 2020.
@inproceedings{Arambakam2020,
title = {Federated Meta-Learning: Democratizing Algorithm Selection Across Disciplines and Software Libraries},
author = {Mukesh Arambakam and Joeran Beel},
url = {https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_39.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {7th ICML Workshop on Automated Machine Learning},
pages = {1–8},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Tyrrell, Bryan; Bergman, Edward; Collins, Andrew; Nagoor, Shahad
Siamese Meta-Learning and Algorithm Selection with ‘Algorithm-Performance Personas’ [Proposal] Journal Article
In: arXiv:2006.12328 [cs.LG], pp. 600–603, 2020.
@article{Beel2020,
title = {Siamese Meta-Learning and Algorithm Selection with ‘Algorithm-Performance Personas’ [Proposal]},
author = {Joeran Beel and Bryan Tyrrell and Edward Bergman and Andrew Collins and Shahad Nagoor},
url = {https://doi.org/10.1145/3383313.3411522},
doi = {10.1145/3383313.3411522},
year = {2020},
date = {2020-01-01},
booktitle = {Fourteenth ACM Conference on Recommender Systems},
journal = {arXiv:2006.12328 [cs.LG]},
pages = {600–603},
publisher = {Association for Computing Machinery},
address = {Virtual Event, Brazil},
series = {RecSys '20},
abstract = {We introduce Recommender-Systems.com (RS_c) as a central platform
for the recommender-systems community. RS_c provides regular news
on important events in the community as well as curated lists of
recommender-system resources including datasets, algorithms, jobs,
software, and learning materials. Based on a survey with 28 participants
– mostly authors at the RecSys 2019 conference – 91% agree that RS_c
could be a major contribution to the community. Participants consider
it currently particularly difficult to find best practice guidelines
(45%); researchers, freelancers and employers (45%); and curated
lists of state-of-the-art algorithms, software, and datasets (36%).
Notably, only 19% consider it (very) easy to find material relating
to diversity, equality and anti-discrimination.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Edenhofer, Gordian; Collins, Andrew; Aizawa, Akiko; Beel, Joeran
Augmenting the DonorsChoose.org Corpus for Meta-Learning Proceedings Article
In: Proceedings of The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR), pp. 32–38, CEUR-WS, 2019.
@inproceedings{Edenhofer2019,
title = {Augmenting the DonorsChoose.org Corpus for Meta-Learning},
author = {Gordian Edenhofer and Andrew Collins and Akiko Aizawa and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
pages = {32–38},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran
Federated Meta-Learning: Democratizing Algorithm Selection Across Disciplines and Software Libraries (Proposal) Journal Article
In: ResearchGate, 2019.
@article{Beel2019,
title = {Federated Meta-Learning: Democratizing Algorithm Selection Across Disciplines and Software Libraries (Proposal)},
author = {Joeran Beel},
doi = {10.13140/RG.2.2.25744.35844},
year = {2019},
date = {2019-01-01},
journal = {ResearchGate},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Kotthoff, Lars
Preface: The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR) Proceedings Article
In: Proceedings of The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR), pp. 1–9, CEUR-WS, 2019.
@inproceedings{Beel2019a,
title = {Preface: The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
author = {Joeran Beel and Lars Kotthoff},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
volume = {2431},
pages = {1–9},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Kotthoff, Lars
Proposal for the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR) Proceedings Article
In: Azzopardi, L.; Stein, B.; Fuhr, N.; Mayr, P.; Hauff, C.; Hiemstra, D. (Ed.): Proceedings of the 41st European Conference on Information Retrieval (ECIR), pp. 383–388, Springer, 2019.
@inproceedings{Beel2019c,
title = {Proposal for the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
author = {Joeran Beel and Lars Kotthoff},
editor = {L. Azzopardi and B. Stein and N. Fuhr and P. Mayr and C. Hauff and D. Hiemstra},
doi = {10.1007/978-3-030-15719-7_53},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 41st European Conference on Information Retrieval (ECIR)},
volume = {11438},
pages = {383–388},
publisher = {Springer},
series = {Lecture Notes in Computer Science (LNCS)},
abstract = {The algorithm selection problem describes the challenge of identifying
the best algorithm for a given problem space. In many domains, particularly
artificial intelligence, the algorithm selection problem is well
studied, and various approaches and tools exist to tackle it in practice.
Especially through meta-learning impressive performance improvements
have been achieved. The information retrieval (IR) community, however,
has paid little attention to the algorithm selection problem, although
the problem is highly relevant in information retrieval. This workshop
will bring together researchers from the fields of algorithm selection
and meta-learning as well as information retrieval. We aim to raise
the awareness in the IR community of the algorithm selection problem;
identify the potential for automatic algorithm selection in information
retrieval; and explore possible solutions for this context. In particular,
we will explore to what extent existing solutions to the algorithm
selection problem from other domains can be applied in information
retrieval, and also how techniques from IR can be used for automated
algorithm selection and meta-learning.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
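The AMIR proposal frames algorithm selection as identifying the best algorithm for a given problem. One common meta-learning baseline, shown here only as an illustration and not as the workshop's method, describes each past problem instance by meta-features and picks the algorithm that won on its nearest neighbours. The meta-features, algorithm names and values below are hypothetical.

```python
import math
from collections import Counter

# Hypothetical meta-dataset: meta-features of past problem instances
# (e.g. log corpus size, mean query length) paired with the best algorithm on each.
META_DATA = [
    ((3.0, 2.1), "BM25"),
    ((3.2, 2.4), "BM25"),
    ((4.5, 3.0), "TF-IDF"),
    ((5.8, 7.9), "neural_ranker"),
    ((6.1, 8.3), "neural_ranker"),
]


def select_algorithm(meta_features, meta_data=META_DATA, k=3):
    """Per-instance algorithm selection: majority vote of the k nearest past instances."""
    nearest = sorted(meta_data, key=lambda item: math.dist(meta_features, item[0]))[:k]
    votes = Counter(algorithm for _, algorithm in nearest)
    return votes.most_common(1)[0][0]


print(select_algorithm((3.1, 2.2)))  # → BM25
print(select_algorithm((6.0, 8.0)))  # → neural_ranker
```

Unlike always using the single best algorithm on average, the selector can return a different algorithm per instance, which is where the performance gains from meta-learning typically come from.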
Machine Learning Algorithms
Grennan, Mark; Beel, Joeran
Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and CORA Proceedings Article
In: Proceedings of the 8th International Workshop on Mining Scientific Publications, pp. 27–35, Association for Computational Linguistics, Wuhan, China, 2020.
@inproceedings{Grennan2020,
title = {Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and CORA},
author = {Mark Grennan and Joeran Beel},
url = {https://aclanthology.org/2020.wosp-1.4.pdf
https://www.aclweb.org/anthology/2020.wosp-1.4},
year = {2020},
date = {2020-08-01},
booktitle = {Proceedings of the 8th International Workshop on Mining Scientific Publications},
pages = {27–35},
publisher = {Association for Computational Linguistics},
address = {Wuhan, China},
abstract = {Citation parsing, particularly with deep neural networks, suffers
from a lack of training data as available datasets typically contain
only a few thousand training instances. Manually labelling citation
strings is very time-consuming, hence synthetically created training
data could be a solution. However, as of now, it is unknown if synthetically
created reference-strings are suitable to train machine learning
algorithms for citation parsing. To find out, we train Grobid, which
uses Conditional Random Fields, with a) human-labelled reference
strings from `real' bibliographies and b) synthetically created
reference strings from the GIANT dataset. We find that both synthetic
and organic reference strings are equally suited for training Grobid (F1 = 0.74). We additionally find that retraining Grobid has a notable
impact on its performance, for both synthetic and real data (+30%
in F1). Having as many types of labelled fields as possible during
training also improves effectiveness, even if these fields are not
available in the evaluation data (+13.5% F1). We conclude that
synthetic data is suitable for training (deep) citation parsing models.
We further suggest that in future evaluations of reference parsers
both evaluation data similar and dissimilar to the training data
should be used for more meaningful evaluations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Marwah, Divyanshu; Beel, Joeran
Term-Recency for TF-IDF, BM25 and USE Term Weighting Proceedings Article
In: Proceedings of the 8th International Workshop on Mining Scientific Publications, pp. 36–41, Association for Computational Linguistics, Wuhan, China, 2020.
@inproceedings{Marwah2020,
title = {Term-Recency for TF-IDF, BM25 and USE Term Weighting},
author = {Divyanshu Marwah and Joeran Beel},
url = {https://aclanthology.org/2020.wosp-1.5.pdf
https://www.aclweb.org/anthology/2020.wosp-1.5},
year = {2020},
date = {2020-08-01},
booktitle = {Proceedings of the 8th International Workshop on Mining Scientific Publications},
pages = {36–41},
publisher = {Association for Computational Linguistics},
address = {Wuhan, China},
abstract = {Effectiveness of a recommendation in an Information Retrieval (IR)
system is determined by the relevance scores of retrieved results. Term
weighting is responsible for computing the relevance scores and consequently
differentiating between the terms in a document. However, current
term-weighting formulas (TF-IDF, for instance) weigh terms only
based on term frequency and inverse document frequency, irrespective
of other important factors. This results in ambiguity when
both the TF and IDF values are the same for more than one document,
resulting in identical TF-IDF values. In this paper, we propose a modification
of TF-IDF and other term-weighting schemes that weighs terms
based on their recency and usage in the corpus. We compared
the performance of our algorithm with existing term-weighting schemes:
TF-IDF, BM25 and USE text embedding model. We have indexed three
different datasets with different domains to validate the premises
for our algorithm. On evaluating the algorithms using Precision,
Recall, F1 score, and NDCG, we found that time normalized TF-IDF
outperformed the classic TF-IDF with a significant difference in
all the metrics and datasets. Time-based USE model performed better
than the standard USE model in two out of three datasets. But the
time-based BM25 model did not perform well in some of the input queries
as compared to standard BM25 model.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Molloy, Paul; Beel, Joeran; Aizawa, Akiko
Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents Proceedings Article
In: Proceedings of the 8th International Workshop on Mining Scientific Publications, pp. 1–8, Association for Computational Linguistics, Wuhan, China, 2020.
@inproceedings{Molloy2020,
title = {Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents},
author = {Paul Molloy and Joeran Beel and Akiko Aizawa},
url = {https://www.aclweb.org/anthology/2020.wosp-1.1},
year = {2020},
date = {2020-08-01},
booktitle = {Proceedings of the 8th International Workshop on Mining Scientific Publications},
pages = {1–8},
publisher = {Association for Computational Linguistics},
address = {Wuhan, China},
abstract = {The relatedness of research articles, patents, court rulings, web
pages, and other document types is often calculated with citation
or hyperlink-based approaches like co-citation (proximity) analysis.
The main limitation of citation-based approaches is that they cannot
be used for documents that receive few or no citations. We propose
Virtual Citation Proximity (VCP), a Siamese Neural Network architecture
that combines the advantages of co-citation proximity analysis (diverse
notions of relatedness / high recommendation performance) with the
advantage of content-based filtering (high coverage). VCP is trained
on a corpus of documents with textual features, and with real citation
proximity as ground truth. VCP then predicts for any two documents,
based on their title and abstract, in what proximity the two documents
would be co-cited, if they were indeed co-cited. The prediction can
be used in the same way as real citation proximity to calculate document
relatedness, even for uncited documents. In our evaluation with 2
million co-citations from Wikipedia articles, VCP achieves an MAE
of 0.0055, i.e. an improvement of 20% over the baseline, though
the learning curve suggests that more work is needed.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
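The key structural idea in VCP is the Siamese setup: both documents pass through the same encoder, and the pair is mapped to a predicted co-citation proximity that is evaluated with MAE. A minimal, untrained sketch of that structure follows; the hashed bag-of-words encoder, the pair features and the weights are hypothetical stand-ins, not the architecture from the paper.

```python
import math
import zlib

DIM = 16  # hypothetical embedding size


def encode(text):
    """Shared encoder (hypothetical stand-in): L2-normalised hashed bag of words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def predict_proximity(doc_a, doc_b, w_diff, w_prod, bias=0.0):
    """Siamese prediction: both documents go through the SAME encoder, and the
    pair features |e_a - e_b| and e_a * e_b are symmetric in (a, b)."""
    ea, eb = encode(doc_a), encode(doc_b)
    z = bias
    z += sum(w * abs(a - b) for w, a, b in zip(w_diff, ea, eb))
    z += sum(w * a * b for w, a, b in zip(w_prod, ea, eb))
    return 1.0 / (1.0 + math.exp(-z))  # squash to a proximity score in (0, 1)


def mean_absolute_error(predicted, actual):
    """MAE, the metric reported in the VCP evaluation."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Untrained, hypothetical weights; a real model would learn them from citation proximity.
W_DIFF = [-1.0] * DIM
W_PROD = [2.0] * DIM

doc_x = "siamese networks for document relatedness"
doc_y = "co-citation proximity analysis of wikipedia articles"
p_xy = predict_proximity(doc_x, doc_y, W_DIFF, W_PROD)
p_yx = predict_proximity(doc_y, doc_x, W_DIFF, W_PROD)
print(p_xy, p_yx)
```

Sharing one encoder keeps the prediction symmetric, and because it needs only text, a proximity score can be produced for documents that have never been cited.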
Carroll, Oisín; Beel, Joeran
Finite Group Equivariant Neural Networks for Games Journal Article
In: arXiv, no. 2009.05027, pp. 1–8, 2020.
@article{Carroll2020,
title = {Finite Group Equivariant Neural Networks for Games},
author = {Oisín Carroll and Joeran Beel},
url = {https://arxiv.org/abs/2009.05027},
year = {2020},
date = {2020-01-01},
journal = {arXiv},
number = {2009.05027},
pages = {1–8},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Hassan, Hebatallah A. Mohamed; Sansonetti, Giuseppe; Gasparetti, Fabio; Micarelli, Alessandro; Beel, Joeran
BERT, ELMo, USE and InferSent Sentence Encoders: The Panacea for Research-Paper Recommendation? Proceedings Article
In: 13th ACM Conference on Recommender Systems (RecSys), pp. 6–10, CEUR-WS, 2019.
@inproceedings{Hassan2019,
title = {BERT, ELMo, USE and InferSent Sentence Encoders: The Panacea for Research-Paper Recommendation?},
author = {Hebatallah A. Mohamed Hassan and Giuseppe Sansonetti and Fabio Gasparetti and Alessandro Micarelli and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {13th ACM Conference on Recommender Systems (RecSys)},
volume = {2431},
pages = {6–10},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tunstead, Keith; Beel, Joeran
Combating Stagnation in Reinforcement Learning Through 'Guided Learning' With 'Taught-Response Memory' Proceedings Article
In: 3rd International Tutorial & Workshop on Interactive Adaptive Learning (IAL2019) at the ECML PKDD Conference, 2019.
@inproceedings{Tunstead2019,
title = {Combating Stagnation in Reinforcement Learning Through 'Guided Learning' With 'Taught-Response Memory'},
author = {Keith Tunstead and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {3rd International Tutorial & Workshop on Interactive Adaptive Learning (IAL2019) at the ECML PKDD Conference},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collins, Andrew; Beel, Joeran
Document Embeddings vs. Keyphrases vs. Terms: A Large-Scale Online Evaluation in Digital Library Recommender Systems Proceedings Article
In: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2019.
@inproceedings{Collins2019,
title = {Document Embeddings vs. Keyphrases vs. Terms: A Large-Scale Online Evaluation in Digital Library Recommender Systems},
author = {Andrew Collins and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collier, Mark; Beel, Joeran
Memory-Augmented Neural Networks for Machine Translation Proceedings Article
In: Proceedings of the Machine Translation (MT) Summit, 2019.
@inproceedings{Collier2019,
title = {Memory-Augmented Neural Networks for Machine Translation},
author = {Mark Collier and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the Machine Translation (MT) Summit},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Bonello, Nicholas; Debattista, Jeremy; Beel, Joeran; Lawless, Seamus
Multi-stream Data Analytics for Enhanced Performance Prediction in Fantasy Football Proceedings Article
In: 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, 2019.
@inproceedings{Bonello2019,
title = {Multi-stream Data Analytics for Enhanced Performance Prediction in Fantasy Football},
author = {Nicholas Bonello and Jeremy Debattista and Joeran Beel and Seamus Lawless},
year = {2019},
date = {2019-01-01},
booktitle = {27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
O'Sullivan, Conor; Beel, Joeran
Predicting the Outcome of Judicial Decisions made by the European Court of Human Rights Proceedings Article
In: 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, 2019.
@inproceedings{OSullivan2019,
title = {Predicting the Outcome of Judicial Decisions made by the European Court of Human Rights},
author = {Conor O'Sullivan and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collier, Mark; Beel, Joeran
An Empirical Comparison of Syllabuses for Curriculum Learning Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 150–161, CEUR-WS, 2018.
@inproceedings{Collier2018a,
title = {An Empirical Comparison of Syllabuses for Curriculum Learning},
author = {Mark Collier and Joeran Beel},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
pages = {150–161},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collier, Mark; Beel, Joeran
Implementing Neural Turing Machines Proceedings Article
In: Kůrková, Věra; Manolopoulos, Yannis; Hammer, Barbara; Iliadis, Lazaros; Maglogiannis, Ilias (Ed.): 27th International Conference on Artificial Neural Networks (ICANN), pp. 94–104, Springer International Publishing, Cham, 2018, ISBN: 978-3-030-01424-7.
@inproceedings{Collier2018,
title = {Implementing Neural Turing Machines},
author = {Mark Collier and Joeran Beel},
editor = {Věra Kůrková and Yannis Manolopoulos and Barbara Hammer and Lazaros Iliadis and Ilias Maglogiannis},
doi = {10.1007/978-3-030-01424-7_10},
isbn = {978-3-030-01424-7},
year = {2018},
date = {2018-01-01},
booktitle = {27th International Conference on Artificial Neural Networks (ICANN)},
pages = {94–104},
publisher = {Springer International Publishing},
address = {Cham},
series = {Lecture Notes in Computer Science},
abstract = {Neural Turing Machines (NTMs) are an instance of Memory Augmented
Neural Networks, a new class of recurrent neural networks which decouple
computation from memory by introducing an external memory unit. NTMs
have demonstrated superior performance over Long Short-Term Memory
Cells in several sequence learning tasks. A number of open source
implementations of NTMs exist but are unstable during training and/or
fail to replicate the reported performance of NTMs. This paper presents
the details of our successful implementation of an NTM. Our implementation
learns to solve three sequential learning tasks from the original
NTM paper. We find that the choice of memory contents initialization
scheme is crucial in successfully implementing an NTM. Networks with
memory contents initialized to small constant values converge on
average twice as fast as the next best memory contents initialization
scheme.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
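The abstract singles out the memory-contents initialisation scheme. The sketch below contrasts a constant initialisation with a random one and applies the content-based addressing step from the original NTM design (softmax over a key strength times cosine similarity) to both. The memory size, the constant 1e-6, the Gaussian standard deviation and the key are illustrative assumptions.

```python
import math
import random


def constant_memory(n_slots, width, value=1e-6):
    """Constant initialisation: every memory slot holds the same small value."""
    return [[value] * width for _ in range(n_slots)]


def random_memory(n_slots, width, std=0.5, seed=0):
    """Random Gaussian initialisation, for comparison."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(n_slots)]


def cosine(u, v, eps=1e-12):
    """Cosine similarity with a tiny epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norms + eps)


def content_weights(key, memory, beta):
    """Content-based addressing: softmax over beta * cosine(key, M_i) for every slot i."""
    scores = [beta * cosine(key, row) for row in memory]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


key = [1.0, 2.0, 3.0, 4.0]
w_const = content_weights(key, constant_memory(8, 4), beta=5.0)
w_rand = content_weights(key, random_memory(8, 4), beta=5.0)
print(w_const)
print(w_rand)
```

With constant memory every slot is identical, so the first content-based read attends uniformly to all slots, whereas random memory yields an arbitrary, seed-dependent initial focus.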
Tkaczyk, Dominika; Collins, Andrew; Sheridan, Paraic; Beel, Joeran
Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers Proceedings Article
In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, pp. 99–108, ACM, Fort Worth, Texas, USA, 2018, ISBN: 978-1-4503-5178-2.
@inproceedings{Tkaczyk2018a,
title = {Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers},
author = {Dominika Tkaczyk and Andrew Collins and Paraic Sheridan and Joeran Beel},
url = {http://doi.acm.org/10.1145/3197026.3197048},
doi = {10.1145/3197026.3197048},
isbn = {978-1-4503-5178-2},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries},
pages = {99–108},
publisher = {ACM},
address = {Fort Worth, Texas, USA},
series = {JCDL '18},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Brennan, Rob; Beel, Joeran; Byrne, Ruth; Debattista, Jeremy
Preface: The 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2018) Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 1–7, CEUR-WS, 2018.
@inproceedings{Brennan2018,
title = {Preface: The 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2018)},
author = {Rob Brennan and Joeran Beel and Ruth Byrne and Jeremy Debattista},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
pages = {1–7},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Dinesh, Siddharth; Mayr, Philipp; Carevic, Zeljko; Jain, Raghvendra
Stereotype and Most-Popular Recommendations in the Digital Library Sowiport Proceedings Article
In: Proceedings of the 15th International Symposium of Information Science (ISI), pp. 96–108, 2017.
@inproceedings{Beel2017d,
title = {Stereotype and Most-Popular Recommendations in the Digital Library Sowiport},
author = {Joeran Beel and Siddharth Dinesh and Philipp Mayr and Zeljko Carevic and Raghvendra Jain},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 15th International Symposium of Information Science (ISI)},
volume = {23},
number = {7/8},
pages = {96–108},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Gipp, Bela
TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users' Personal Document Collections Proceedings Article
In: Proceedings of the 12th iConference, 2017.
@inproceedings{Beel2017a,
title = {TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users' Personal Document Collections},
author = {Joeran Beel and Stefan Langer and Bela Gipp},
doi = {10.13140/RG.2.2.18759.39842},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 12th iConference},
abstract = {TF-IDF is one of the most popular term-weighting schemes, and is applied
by search engines, recommender systems, and user modeling engines.
With regard to user modeling and recommender systems, we see two
shortcomings of TF-IDF. First, calculating IDF requires access to
the document corpus from which recommendations are made. Such access
is not always given in a user-modeling or recommender system. Second,
TF-IDF ignores information from a user’s personal document
collection, which could – so we hypothesize – enhance
the user modeling process. In this paper, we introduce TF-IDuF as
a term-weighting scheme that does not require access to the general
document corpus and that considers information from the users’
personal document collections. We evaluated the effectiveness of
TF-IDuF compared to TF-IDF and TF-Only and found that TF-IDF and
TF-IDuF perform similarly (click-through rates (CTR) of 5.09% vs.
5.14%), and both are around 25% more effective than TF-Only (CTR
of 4.06%) for recommending research papers. Consequently, we conclude
that TF-IDuF could be a promising term-weighting scheme, especially
when access to the document corpus for recommendations is not possible,
and thus classic IDF cannot be computed. It is also notable that
TF-IDuF and TF-IDF are not exclusive, so that both metrics may be
combined into a more effective term-weighting scheme.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
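The abstract above describes TF-IDuF only informally. Below is a minimal sketch that assumes IDuF is computed like classic IDF, but over the documents in the user's own collection, with each document represented as a set of terms. The function names and the +1 smoothing are our own assumptions and do not come from the published definition.

```python
import math
from collections import Counter


def idu_f(term, user_docs):
    """Inverse document frequency over the user's personal collection (assumed form)."""
    n_total = len(user_docs)
    n_with_term = sum(1 for doc in user_docs if term in doc)
    # +1 smoothing is our assumption; it avoids division by zero for unseen terms.
    return math.log((n_total + 1) / (n_with_term + 1))


def tf_iduf(user_model_terms, user_docs):
    """Weight each user-model term by its frequency times its IDuF value."""
    tf = Counter(user_model_terms)
    return {term: count * idu_f(term, user_docs) for term, count in tf.items()}


user_docs = [{"recommender", "mind", "map"}, {"recommender", "citation"}, {"recommender", "tf", "idf"}]
weights = tf_iduf(["recommender", "recommender", "citation", "mind"], user_docs)
print(weights)
```

A term that appears in every document of the user's collection (here "recommender") gets weight 0. Rarer terms are boosted. No access to the global corpus is needed.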
Beel, Joeran; Langer, Stefan; Kapitsaki, Georgia M.; Breitinger, Corinna; Gipp, Bela
Exploring the Potential of User Modeling based on Mind Maps Proceedings Article
In: Ricci, Francesco; Bontcheva, Kalina; Conlan, Owen; Lawless, Séamus (Ed.): Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP), pp. 3-17, Springer, 2015.
@inproceedings{Beel2015b,
title = {Exploring the Potential of User Modeling based on Mind Maps},
author = {Joeran Beel and Stefan Langer and Georgia M. Kapitsaki and Corinna Breitinger and Bela Gipp},
editor = {Francesco Ricci and Kalina Bontcheva and Owen Conlan and Séamus Lawless},
doi = {10.1007/978-3-319-20267-9_1},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP)},
volume = {9146},
pages = {3-17},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran
Towards Effective Research-Paper Recommender Systems and User Modeling based on Mind Maps PhD Thesis
Otto-von-Guericke Universität Magdeburg, 2015.
@phdthesis{Beel2015,
title = {Towards Effective Research-Paper Recommender Systems and User Modeling based on Mind Maps},
author = {Joeran Beel},
school = {Otto-von-Guericke Universität Magdeburg},
year = {2015},
date = {2015-01-01},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
Beel, Joeran
Research paper recommendations based on mind maps Book Section
In: Arndt, Hans-Knud; Krcmar, Helmut (Ed.): Very Large Business Applications (VLBA): System Landscapes of the Future, pp. 66–75, Shaker Verlag, 2011.
@incollection{Beel2011a,
title = {Research paper recommendations based on mind maps},
author = {Joeran Beel},
editor = {Hans-Knud Arndt and Helmut Krcmar},
year = {2011},
date = {2011-08-01},
booktitle = {Very Large Business Applications (VLBA): System Landscapes of the Future},
pages = {66–75},
publisher = {Shaker Verlag},
series = {Berichte aus der Wirtschaftsinformatik},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
Beel, Joeran; Gipp, Bela
Detection of a similarity of documents by Citation Proximity Analysis Patent
2010, (WO/2010/078857).
@patent{Beel2010f,
title = {Detection of a similarity of documents by Citation Proximity Analysis},
author = {Joeran Beel and Bela Gipp},
year = {2010},
date = {2010-01-01},
abstract = {The invention relates to a computer-implemented
method for detecting a similarity between at least one input document
and a plurality of documents. First documents and second documents
are detected which are directly or indirectly cited by the input
document or which directly or indirectly cite the input document.
At least one preliminary similarity value is calculated for every
detected document. If more than one preliminary similarity value
has been calculated for a document, a final similarity value is calculated
from the preliminary similarity values. The method can then be applied
to the detected first documents and second documents to detect further
documents that are similar to the input document and to calculate
their similarity values to the input document.},
howpublished = {Patent Application},
note = {WO/2010/078857},
keywords = {},
pubstate = {published},
tppubtype = {patent}
}
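The claimed method can be illustrated with a small sketch. The `cites` mapping is hypothetical: it maps each document to the set of documents it cites. Every citation link reached within `max_depth` steps of the input document, in either direction, contributes one preliminary similarity value, and that value decays with distance. The preliminary values of each document are then summed into its final value. The decay factor, the depth limit, and the summation are our own illustrative choices. The patent abstract does not specify how the values are calculated or combined.

```python
from collections import defaultdict


def neighbours(doc, cites, cited_by):
    """Documents the given document cites, plus documents that cite it."""
    return cites.get(doc, set()) | cited_by.get(doc, set())


def citation_similarity(input_doc, cites, max_depth=2, decay=0.5):
    """Hypothetical sketch: preliminary values per link, summed into final values."""
    cited_by = defaultdict(set)
    for src, targets in cites.items():
        for target in targets:
            cited_by[target].add(src)

    preliminary = defaultdict(list)  # document -> list of preliminary similarity values
    frontier = [input_doc]
    seen = {input_doc}
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for doc in frontier:
            for other in neighbours(doc, cites, cited_by):
                if other == input_doc:
                    continue
                # Each link reaching `other` at this depth adds one preliminary value;
                # deeper (more indirect) links count less.
                preliminary[other].append(decay ** depth)
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)  # re-apply the method to newly found documents
        frontier = next_frontier
    # Combine the preliminary values of each document into a final value.
    return {doc: sum(values) for doc, values in preliminary.items()}


cites = {"A": {"B", "C"}, "D": {"B", "C"}, "E": {"D"}}
print(citation_similarity("A", cites))
```

Here D never cites A directly. It still becomes similar to A because it cites the same documents B and C, so it receives two preliminary values of 0.25.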
Beel, Joeran; Gipp, Bela
Link analysis in mind maps: a new approach to determining document relatedness Proceedings Article
In: Lalmas, M; Jose, J; Rauber, A; Sebastiani, R; Frommholz, I (Ed.): Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication (ICUIMC '10), pp. 38:1–38:5, ACM Springer, Glasgow (UK), 2010, (Doctoral Consortium).
@inproceedings{Beel2010b,
title = {Link analysis in mind maps: a new approach to determining document relatedness},
author = {Joeran Beel and Bela Gipp},
editor = {M Lalmas and J Jose and A Rauber and R Sebastiani and I Frommholz},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication (ICUIMC '10)},
volume = {6273},
pages = {38:1–38:5},
publisher = {Springer},
address = {Glasgow (UK)},
organization = {ACM},
series = {Lecture Notes of Computer Science (LNCS)},
note = {Doctoral Consortium},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Gipp, Bela; Beel, Joeran
Identifying Related Documents For Research Paper Recommender By CPA And COA Proceedings Article
In: Ao, S. I.; Douglas, C.; Grundfest, W. S.; Burgstone, J. (Ed.): Proceedings of The World Congress on Engineering and Computer Science 2009, pp. 636–639, International Association of Engineers (IAENG) Newswood Limited, Berkeley (USA), 2009, ISBN: 978-988-17012-6-8, (Available at: http://sciplore.org/pub).
@inproceedings{Gipp2009,
title = {Identifying Related Documents For Research Paper Recommender By CPA And COA},
author = {Bela Gipp and Joeran Beel},
editor = {S. I. Ao and C. Douglas and W. S. Grundfest and J. Burgstone},
url = {https://www.iaeng.org/publication/WCECS2009/WCECS2009_pp636-639.pdf},
isbn = {978-988-17012-6-8},
year = {2009},
date = {2009-10-01},
booktitle = {Proceedings of The World Congress on Engineering and Computer Science 2009},
volume = {1},
pages = {636–639},
publisher = {Newswood Limited},
address = {Berkeley (USA)},
organization = {International Association of Engineers (IAENG)},
series = {Lecture Notes in Engineering and Computer Science},
abstract = {This work-in-progress paper introduces two new approaches called Citation Proximity Analysis (CPA) and Citation Order Analysis (COA). They can be applied to identify related documents for the purpose of research paper recommender systems. CPA is a variant of co-citation analysis that additionally considers the proximity of citations to each other within an article’s full-text. The underlying idea is that the closer citations are to each other in a document, the more likely it is that the cited documents are related. For example, citations listed in the same sentence are more likely to express related thoughts than citations listed only in the same section. In COA, the order of citations is considered. This allows, for example, a text that has been translated from language A to language B to be identified as similar to its original, since the citations still occur in the same order. However, it is also shown that CPA and COA cannot replace text analysis and existing citation analysis approaches for research paper recommender systems, since each approach has its own strengths and weaknesses.},
note = {Available at: http://sciplore.org/pub},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
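To make CPA and COA concrete, here is a hedged sketch. Each citing document is given as a list of `(cited_document, (section, paragraph, sentence))` entries. A pair of co-cited documents scores higher the closer their citations appear. The weights are illustrative and are our assumption, not the paper's exact values: 1 for the same sentence, 1/2 for the same paragraph, 1/4 for the same section, and 1/8 otherwise. For COA, we also assume, for illustration only, that order agreement is measured by the longest common subsequence of two citation sequences.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative proximity weights (assumption): closer citations count more.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}


def proximity_level(a, b):
    """a and b are (section, paragraph, sentence) positions of two in-text citations."""
    if a == b:
        return "sentence"
    if a[:2] == b[:2]:
        return "paragraph"
    if a[0] == b[0]:
        return "section"
    return "document"


def cpa_scores(citing_docs):
    """Sum, over all citing documents, of the proximity weight of each co-cited pair."""
    scores = defaultdict(float)
    for citations in citing_docs:
        closest = {}  # per citing document, keep only the closest co-occurrence of a pair
        for (doc_a, pos_a), (doc_b, pos_b) in combinations(citations, 2):
            if doc_a == doc_b:
                continue
            pair = tuple(sorted((doc_a, doc_b)))
            weight = WEIGHTS[proximity_level(pos_a, pos_b)]
            closest[pair] = max(closest.get(pair, 0.0), weight)
        for pair, weight in closest.items():
            scores[pair] += weight
    return dict(scores)


def coa_similarity(order_a, order_b):
    """Share of citations appearing in the same relative order (LCS-based assumption)."""
    m, n = len(order_a), len(order_b)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if order_a[i] == order_b[j]:
                lcs[i + 1][j + 1] = lcs[i][j] + 1
            else:
                lcs[i + 1][j + 1] = max(lcs[i][j + 1], lcs[i + 1][j])
    return lcs[m][n] / max(m, n) if max(m, n) else 0.0


paper1 = [("X", (1, 1, 1)), ("Y", (1, 1, 1)), ("Z", (2, 1, 1))]
paper2 = [("X", (1, 2, 3)), ("Y", (1, 4, 1))]
print(cpa_scores([paper1, paper2]))
print(coa_similarity(["A", "C", "B", "D"], ["A", "B", "C", "D"]))
```

X and Y are cited in the same sentence in one paper and in the same section in another, so they end up far more related than X and Z. Plain co-citation counting would not make this distinction.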
Gipp, Bela; Beel, Joeran
Citation Proximity Analysis (CPA) - A new approach for identifying related work based on Co-Citation Analysis Proceedings Article
In: Larsen, Birger; Leta, Jacqueline (Ed.): Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI'09), pp. 571–575, International Society for Scientometrics and Informetrics, Rio de Janeiro (Brazil), 2009, (ISSN 2175-1935. Available at: http://sciplore.org/pub).
@inproceedings{Gipp2009a,
title = {Citation Proximity Analysis (CPA) - A new approach for identifying related work based on Co-Citation Analysis},
author = {Bela Gipp and Joeran Beel},
editor = {Birger Larsen and Jacqueline Leta},
year = {2009},
date = {2009-07-01},
booktitle = {Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI'09)},
volume = {2},
pages = {571–575},
publisher = {International Society for Scientometrics and Informetrics},
address = {Rio de Janeiro (Brazil)},
note = {ISSN 2175-1935. Available at: http://sciplore.org/pub},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Applications of Machine Learning
Grennan, Mark; Beel, Joeran
Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and CORA Proceedings Article
In: Proceedings of the 8th International Workshop on Mining Scientific Publications, pp. 27–35, Association for Computational Linguistics, Wuhan, China, 2020.
@inproceedings{Grennan2020,
title = {Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and CORA},
author = {Mark Grennan and Joeran Beel},
url = {https://aclanthology.org/2020.wosp-1.4.pdf
https://www.aclweb.org/anthology/2020.wosp-1.4},
year = {2020},
date = {2020-08-01},
booktitle = {Proceedings of the 8th International Workshop on Mining Scientific Publications},
pages = {27–35},
publisher = {Association for Computational Linguistics},
address = {Wuhan, China},
abstract = {Citation parsing, particularly with deep neural networks, suffers
from a lack of training data as available datasets typically contain
only a few thousand training instances. Manually labelling citation
strings is very time-consuming, hence synthetically created training
data could be a solution. However, as of now, it is unknown if synthetically
created reference-strings are suitable to train machine learning
algorithms for citation parsing. To find out, we train Grobid, which
uses Conditional Random Fields, with a) human-labelled reference
strings from `real' bibliographies and b) synthetically created
reference strings from the GIANT dataset. We find that both synthetic
and organic reference strings are equally suited for training Grobid (F1 = 0.74). We additionally find that retraining Grobid has a notable
impact on its performance, for both synthetic and real data (+30%
in F1). Having as many types of labelled fields as possible during
training also improves effectiveness, even if these fields are not
available in the evaluation data (+13.5% F1). We conclude that
synthetic data is suitable for training (deep) citation parsing models.
We further suggest that in future evaluations of reference parsers
both evaluation data similar and dissimilar to the training data
should be used for more meaningful evaluations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Carroll, Oisín; Beel, Joeran
Finite Group Equivariant Neural Networks for Games Journal Article
In: arXiv, no. 2009.05027, pp. 1–8, 2020.
@article{Carroll2020,
title = {Finite Group Equivariant Neural Networks for Games},
author = {Oisín Carroll and Joeran Beel},
url = {https://arxiv.org/abs/2009.05027},
year = {2020},
date = {2020-01-01},
journal = {arXiv},
number = {2009.05027},
pages = {1–8},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Tunstead, Keith; Beel, Joeran
Combating Stagnation in Reinforcement Learning Through 'Guided Learning' With 'Taught-Response Memory' Proceedings Article
In: 3rd International Tutorial & Workshop on Interactive Adaptive Learning (IAL2019) at the ECML PKDD Conference, 2019.
@inproceedings{Tunstead2019,
title = {Combating Stagnation in Reinforcement Learning Through 'Guided Learning' With 'Taught-Response Memory'},
author = {Keith Tunstead and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {3rd International Tutorial & Workshop on Interactive Adaptive Learning (IAL2019) at the ECML PKDD Conference},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collier, Mark; Beel, Joeran
Memory-Augmented Neural Networks for Machine Translation Proceedings Article
In: Proceedings of the Machine Translation (MT) Summit, 2019.
@inproceedings{Collier2019,
title = {Memory-Augmented Neural Networks for Machine Translation},
author = {Mark Collier and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the Machine Translation (MT) Summit},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Bonello, Nicholas; Debattista, Jeremy; Beel, Joeran; Lawless, Seamus
Multi-stream Data Analytics for Enhanced Performance Prediction in Fantasy Football Proceedings Article
In: 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, 2019.
@inproceedings{Bonello2019,
title = {Multi-stream Data Analytics for Enhanced Performance Prediction in Fantasy Football},
author = {Nicholas Bonello and Jeremy Debattista and Joeran Beel and Seamus Lawless},
year = {2019},
date = {2019-01-01},
booktitle = {27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
O'Sullivan, Conor; Beel, Joeran
Predicting the Outcome of Judicial Decisions made by the European Court of Human Rights Proceedings Article
In: 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, 2019.
@inproceedings{OSullivan2019,
title = {Predicting the Outcome of Judicial Decisions made by the European Court of Human Rights},
author = {Conor O'Sullivan and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collier, Mark; Beel, Joeran
An Empirical Comparison of Syllabuses for Curriculum Learning Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 150–161, CEUR-WS, 2018.
@inproceedings{Collier2018a,
title = {An Empirical Comparison of Syllabuses for Curriculum Learning},
author = {Mark Collier and Joeran Beel},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
pages = {150–161},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Collier, Mark; Beel, Joeran
Implementing Neural Turing Machines Proceedings Article
In: Kůrková, Věra; Manolopoulos, Yannis; Hammer, Barbara; Iliadis, Lazaros; Maglogiannis, Ilias (Ed.): 27th International Conference on Artificial Neural Networks (ICANN), pp. 94–104, Springer International Publishing, Cham, 2018, ISBN: 978-3-030-01424-7.
@inproceedings{Collier2018,
title = {Implementing Neural Turing Machines},
author = {Mark Collier and Joeran Beel},
editor = {Věra Kůrková and Yannis Manolopoulos and Barbara Hammer and Lazaros Iliadis and Ilias Maglogiannis},
doi = {10.1007/978-3-030-01424-7_10},
isbn = {978-3-030-01424-7},
year = {2018},
date = {2018-01-01},
booktitle = {27th International Conference on Artificial Neural Networks (ICANN)},
pages = {94–104},
publisher = {Springer International Publishing},
address = {Cham},
series = {Lecture Notes in Computer Science},
abstract = {Neural Turing Machines (NTMs) are an instance of Memory Augmented
Neural Networks, a new class of recurrent neural networks which decouple
computation from memory by introducing an external memory unit. NTMs
have demonstrated superior performance over Long Short-Term Memory
Cells in several sequence learning tasks. A number of open source
implementations of NTMs exist but are unstable during training and/or
fail to replicate the reported performance of NTMs. This paper presents
the details of our successful implementation of an NTM. Our implementation
learns to solve three sequential learning tasks from the original
NTM paper. We find that the choice of memory contents initialization
scheme is crucial in successfully implementing an NTM. Networks with
memory contents initialized to small constant values converge on
average 2 times faster than the next best memory contents initialization
scheme.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
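The abstract attributes faster convergence to initializing memory contents with small constant values. Below is a minimal, dependency-free sketch of how the two kinds of initialization differ. It also shows the standard NTM read operation r = Σ w(i) M(i), which returns a weighted sum of memory rows. The function names, the constant 1e-6, and the Gaussian alternative are our assumptions, not the paper's code.

```python
import random


def init_memory(rows, cols, scheme, constant=1e-6, seed=0):
    """Return a rows x cols memory matrix (list of lists) initialized with `scheme`."""
    rng = random.Random(seed)
    if scheme == "constant":
        # Small constant values: the kind of scheme the abstract reports as converging fastest.
        return [[constant] * cols for _ in range(rows)]
    if scheme == "random":
        # Hypothetical alternative for comparison: Gaussian noise with std 0.5.
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]
    raise ValueError(f"unknown scheme: {scheme}")


def read(memory, weights):
    """NTM read: weighted sum of memory rows, r = sum_i w(i) * M(i)."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]


memory = init_memory(4, 3, "constant")
print(read(memory, [0.25, 0.25, 0.25, 0.25]))
```

With constant initialization, every early read returns the same small, well-defined vector regardless of where the attention weights point. Random initialization instead injects noise into every read before anything has been written.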
Tkaczyk, Dominika; Collins, Andrew; Sheridan, Paraic; Beel, Joeran
Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers Proceedings Article
In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, pp. 99–108, ACM, Fort Worth, Texas, USA, 2018, ISBN: 978-1-4503-5178-2.
@inproceedings{Tkaczyk2018a,
title = {Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers},
author = {Dominika Tkaczyk and Andrew Collins and Paraic Sheridan and Joeran Beel},
url = {http://doi.acm.org/10.1145/3197026.3197048},
doi = {10.1145/3197026.3197048},
isbn = {978-1-4503-5178-2},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries},
pages = {99–108},
publisher = {ACM},
address = {Fort Worth, Texas, USA},
series = {JCDL '18},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Brennan, Rob; Beel, Joeran; Byrne, Ruth; Debattista, Jeremy
Preface: The 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2018) Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 1–7, CEUR-WS, 2018.
@inproceedings{Brennan2018,
title = {Preface: The 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2018)},
author = {Rob Brennan and Joeran Beel and Ruth Byrne and Jeremy Debattista},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
pages = {1–7},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Machine Translation
Collier, Mark; Beel, Joeran
Memory-Augmented Neural Networks for Machine Translation Proceedings Article
In: Proceedings of the Machine Translation (MT) Summit, 2019.
@inproceedings{Collier2019,
title = {Memory-Augmented Neural Networks for Machine Translation},
author = {Mark Collier and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the Machine Translation (MT) Summit},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Datasets for Recommender Systems, Machine Learning and Information Extraction
CITREC Citations
The CITREC dataset combines the data of two formerly separate collections for citation analysis and provides the tools needed to evaluate similarity measures. The first collection is the PubMed Central Open Access Subset (PMC OAS); the second is the collection used for the Genomics Tracks at the Text REtrieval Conferences (TREC) ’06 and ’07 (see the overview paper for the TREC Genomics collection). CITREC extends the PMC OAS and TREC Genomics collections by providing:
- citation and reference information that includes the position of in-text citations for documents in both collections;
- code and pre-computed scores for 35 citation-based and text-based similarity measures;
- two gold standards based on Medical Subject Headings (MeSH) descriptors and the relevance feedback gathered for the TREC Genomics collection;
- a web-based system (Literature Recommendation Evaluator – LRE) that allows evaluating similarity measures on their ability to identify documents that are relevant to user-defined information needs;
- tools to statistically analyze and compare the scores that individual similarity measures yield.
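CITREC's own code for its 35 measures is not reproduced here. As a hedged illustration of what a citation-based similarity measure computes, the sketch below implements two classic measures, co-citation and bibliographic coupling. It works over a hypothetical `cites` mapping from each document to the set of documents it references. We do not claim these functions match CITREC's implementations.

```python
def co_citation(a, b, cites):
    """Number of documents that cite both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


def bibliographic_coupling(a, b, cites):
    """Number of references that documents a and b share."""
    return len(cites.get(a, set()) & cites.get(b, set()))


cites = {"P1": {"A", "B"}, "P2": {"A", "B", "C"}, "A": {"R1", "R2"}, "B": {"R2", "R3"}}
print(co_citation("A", "B", cites), bibliographic_coupling("A", "B", cites))
```

Co-citation looks at who cites a pair, while bibliographic coupling looks at what the pair cites. A gold standard such as MeSH descriptors can then be used to judge which measure better identifies related documents.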
RARD — Related Articles
RARD — the Related-Article Recommendation Dataset — is based on the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains information about 57.4 million recommendations that were displayed to the users of Sowiport. Information includes details on which recommendation approaches were used (e.g. content-based filtering, stereotype, most popular), what types of features were used in content-based filtering (simple terms vs. keyphrases), where the features were extracted from (title or abstract), and the time when recommendations were delivered and clicked. In addition, the dataset contains an implicit item-item rating matrix that was created based on the recommendation click logs. RARD enables researchers to train machine learning algorithms for research-paper recommendations, perform offline evaluations, and do research on data from Mr. DLib’s recommender system, without implementing a recommender system themselves. In the field of scientific recommender systems, our dataset is unique: to the best of our knowledge, no other available dataset contains more (implicit) ratings or as many variations of recommendation algorithms. The dataset is published under the “Creative Commons Attribution 3.0 Unported (CC-BY)” license.
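RARD's exact file layout is not described on this page. The sketch below assumes a hypothetical log format of `(source_item, recommended_item, clicked)` rows. From such rows, an implicit item-item matrix can be derived by marking each recommended pair as 1 if it was clicked at least once and 0 if it was only displayed. This binary scheme is our assumption and not necessarily the one RARD uses.

```python
from collections import defaultdict


def implicit_item_item_matrix(log):
    """Sparse item-item matrix from (source_item, recommended_item, clicked) rows.

    Rating = 1 if the pair was ever clicked, else 0 (hypothetical binary scheme).
    """
    matrix = defaultdict(dict)
    for source, recommended, clicked in log:
        previous = matrix[source].get(recommended, 0)
        matrix[source][recommended] = max(previous, int(clicked))
    return {src: dict(row) for src, row in matrix.items()}


log = [("d1", "d2", False), ("d1", "d2", True), ("d1", "d3", False), ("d4", "d1", False)]
print(implicit_item_item_matrix(log))
```

Keeping displayed-but-unclicked pairs as explicit zeros, rather than dropping them, preserves the negative signal that click logs provide for offline evaluation.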
Docear RecSys
We released a dataset based on the recommender system in our reference management software Docear. The dataset contains the following sub-datasets.
Research Papers — The research papers dataset contains information about the research papers that Docear’s PDF Spider crawled and their citations. This includes information about 9.4 million documents, including 7.95 million citations and 1.8 million URLs to freely available academic PDFs on the Web. The dataset also provides citation positions, i.e. where in a document a citation occurs. This leads to 19.3 million entries in the dataset.
Mind-Maps / User libraries — The mind maps dataset contains information on 390,613 revisions of 52,202 mind-maps created by 12,038 users. The mind-maps themselves are not included in the dataset for privacy reasons. Information includes the number of nodes, the documents that are linked, the ID of the user who created the mind-map, and when mind-maps were created.
Users — The user dataset contains information about 8,059 of 21,439 registered users, namely those who activated recommendations and agreed to have their data analyzed and published. Among others, the dataset includes information about the users’ date of registration, gender and age (if provided during registration), usage intensity of Docear, when Docear was last started, when recommendations were last received, the number of created mind-maps, number of papers, how recommendations were labeled, the number of received recommendations, and click-through rates (CTR) on recommendations.
Recommendations — The recommendation dataset contains information on 308,146 recommendations that were delivered to 3,470 users between March 2013 and March 2014. This includes the date of creation and delivery, the time required to generate recommendations and corresponding user models, and information on the algorithm that generated the recommendations. The algorithm information is extensive: we stored whether stop words were removed, which weighting scheme was applied, whether terms and/or citations were used for the user modelling process, and 28 other variables that are described in more detail in the dataset’s readme file.
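Click-through rate (CTR) is the number of clicked recommendations divided by the number of delivered recommendations. The sketch below compares algorithm variants by CTR. It assumes each recommendation is a dictionary with a `clicked` flag and a variable such as `weighting_scheme`. These are hypothetical field names; the dataset's readme file documents the real ones.

```python
from collections import defaultdict


def ctr_by(recommendations, key):
    """CTR (clicks / delivered recommendations) grouped by one algorithm variable."""
    delivered = defaultdict(int)
    clicked = defaultdict(int)
    for rec in recommendations:
        group = rec[key]
        delivered[group] += 1
        clicked[group] += int(rec["clicked"])
    return {group: clicked[group] / delivered[group] for group in delivered}


recs = [
    {"weighting_scheme": "TF-IDF", "clicked": True},
    {"weighting_scheme": "TF-IDF", "clicked": False},
    {"weighting_scheme": "TF-Only", "clicked": False},
    {"weighting_scheme": "TF-Only", "clicked": False},
]
print(ctr_by(recs, "weighting_scheme"))
```

The same function works for any of the stored variables, such as stop-word removal, by changing `key`.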
We are interested in many research areas and are always open to exploring exciting new ones, including deep learning, precision medicine, bias in AI, explainability, and many more. Feel free to also visit our colleagues’ web pages relating to machine learning at the University of Siegen.
Donorschoose.org
RARD
RARD — the Related-Article Recommendation Dataset — is based on the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains information about 57.4 million recommendations that were displayed to the users of Sowiport. Information includes details on which recommendation approaches were used (e.g. content-based filtering, stereotype, most popular), what types of features were used in content-based filtering (simple terms vs. keyphrases), where the features were extracted from (title or abstract), and the time when recommendations were delivered and clicked. In addition, the dataset contains an implicit item-item rating matrix that was created based on the recommendation click logs. RARD enables researchers to train machine learning algorithms for research-paper recommendations, perform offline evaluations, and do research on data from Mr. DLib’s recommender system, without implementing a recommender system themselves. In the field of scientific recommender systems, our dataset is unique. To the best of our knowledge, there is no dataset with more (implicit) ratings available, and that many variations of recommendation algorithms. The dataset is published under the “Creative Commons Attribution 3.0 Unported (CC-BY)” license.
Docear RecSys
We released a dataset based on the recommender system in our reference management software Docear. The dataset contains the following sub-datasets.
Research Papers — The research papers dataset contains information about the research papers that Docear’s PDF Spider crawled, and about their citations. This includes information about 9.4 million documents, including 7.95 million citations and 1.8 million URLs to freely available academic PDFs on the Web. The dataset also provides citation positions, i.e., where in a document a citation occurs, which results in 19.3 million entries in the dataset.
Mind-Maps / User libraries — The mind-maps dataset contains information on 390,613 revisions of 52,202 mind-maps created by 12,038 users. For privacy reasons, the mind-maps themselves are not included in the dataset. Information includes the number of nodes, the linked documents, the ID of the user who created each mind-map, and when the mind-maps were created.
Users — The user dataset contains information about 8,059 of 21,439 registered users, namely about those who activated recommendations and agreed to have their data analyzed and published. Among others, the dataset includes information about the users’ date of registration, gender and age (if provided during registration), usage intensity of Docear, when Docear was last started, when recommendations were last received, the number of created mind-maps, number of papers, how recommendations were labeled, the number of received recommendations, and click-through rates (CTR) on recommendations.
Recommendations — The recommendation dataset contains information on 308,146 recommendations that were delivered to 3,470 users between March 2013 and March 2014. This includes the date of creation and delivery, the time required to generate the recommendations and the corresponding user models, and detailed information on the algorithm that generated the recommendations. For instance, we stored whether stop words were removed, which weighting scheme was applied, whether terms and/or citations were used for user modelling, and 28 other variables that are described in more detail in the dataset’s readme file.
Other Domains
(Academic) Search & Search Engine Optimization (ASEO)
Beel, Joeran; Gipp, Bela
Academic search engine spam and Google Scholar's resilience against it Journal Article
In: Journal of Electronic Publishing, vol. 13, no. 3, 2010.
@article{Beel2010,
title = {Academic search engine spam and Google Scholar's resilience against it},
author = {Joeran Beel and Bela Gipp},
doi = {10.3998/3336451.0013.305},
year = {2010},
date = {2010-12-01},
journal = {Journal of Electronic Publishing},
volume = {13},
number = {3},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Gipp, Bela
Enhancing search applications by utilizing mind maps Proceedings Article
In: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia (HT'10), pp. 303–304, ACM, Toronto (CA), 2010, (Available at http://docear.org).
@inproceedings{Beel2010e,
title = {Enhancing search applications by utilizing mind maps},
author = {Joeran Beel and Bela Gipp},
year = {2010},
date = {2010-06-01},
booktitle = {Proceedings of the 21st ACM Conference on Hypertext and Hypermedia (HT'10)},
pages = {303–304},
publisher = {ACM},
address = {Toronto (CA)},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Gipp, Bela
On the Robustness of Google Scholar Against Spam Proceedings Article
In: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia (HT'10), pp. 297–298, ACM, Toronto (CA), 2010.
@inproceedings{Beel2010a,
title = {On the Robustness of Google Scholar Against Spam},
author = {Joeran Beel and Bela Gipp},
doi = {10.1145/1810617.1810683},
year = {2010},
date = {2010-06-01},
booktitle = {Proceedings of the 21st ACM Conference on Hypertext and Hypermedia (HT'10)},
pages = {297–298},
publisher = {ACM},
address = {Toronto (CA)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Gipp, Bela; Wilde, Erik
Academic Search Engine Optimization (ASEO): Optimizing Scholarly Literature for Google Scholar and Co. Journal Article
In: Journal of Scholarly Publishing, vol. 41, no. 2, pp. 176–190, 2010, (University of Toronto Press. Available at http://docear.org).
@article{Beel2010h,
title = {Academic Search Engine Optimization (ASEO): Optimizing Scholarly Literature for Google Scholar and Co.},
author = {Joeran Beel and Bela Gipp and Erik Wilde},
doi = {10.3138/jsp.41.2.176},
year = {2010},
date = {2010-01-01},
journal = {Journal of Scholarly Publishing},
volume = {41},
number = {2},
pages = {176–190},
note = {University of Toronto Press. Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran
Retrieving Data from Mind Maps to Enhance Search Applications Journal Article
In: Bulletin of IEEE Technical Committee on Digital Libraries, vol. 6, no. 2, 2010, (Available at http://docear.org).
@article{Beel2010d,
title = {Retrieving Data from Mind Maps to Enhance Search Applications},
author = {Joeran Beel},
year = {2010},
date = {2010-01-01},
journal = {Bulletin of IEEE Technical Committee on Digital Libraries},
volume = {6},
number = {2},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Gipp, Bela; Stiller, Jan-Olaf
Information Retrieval on Mind Maps - What could it be good for? Proceedings Article
In: Proceedings of the 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09), pp. 1–4, IEEE, Washington (USA), 2009, (Available at http://docear.org).
@inproceedings{Beel2009f,
title = {Information Retrieval on Mind Maps - What could it be good for?},
author = {Joeran Beel and Bela Gipp and Jan-Olaf Stiller},
url = {https://ieeexplore.ieee.org/abstract/document/5364172
https://isg.beel.org/pubs/Information Retrieval on Mind Maps - What could it be good for –preprint.pdf},
year = {2009},
date = {2009-11-01},
booktitle = {Proceedings of the 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09)},
pages = {1–4},
publisher = {IEEE},
address = {Washington (USA)},
abstract = {Mind maps are used by millions of people. In this paper, we present how information retrieval on mind maps could be used to enhance expert search, document summarization, keyword-based search engines, document recommender systems, and the determination of word relatedness. For instance, the words in a mind map could be used to create a skill profile of the mind map's author and hence enhance expert search. This is a research-in-progress paper, which means that it presents only ideas, not research results.},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Mind maps are used by millions of people. In this paper, we present how information retrieval on mind maps could be used to enhance expert search, document summarization, keyword-based search engines, document recommender systems, and the determination of word relatedness. For instance, the words in a mind map could be used to create a skill profile of the mind map's author and hence enhance expert search. This is a research-in-progress paper, which means that it presents only ideas, not research results.
Beel, Joeran; Gipp, Bela; Stiller, Jan-Olaf
Could Mind Maps Be Used To Improve Academic Search Engines? Proceedings Article
In: Ao, S. I.; Douglas, C.; Grundfest, W. S.; Burgstone, J. (Ed.): International Conference on Machine Learning and Data Analysis (ICMLDA'09), pp. 832–834, International Association of Engineers (IAENG) Newswood Limited, Berkeley (USA), 2009, (Available at http://docear.org).
@inproceedings{Beel2009e,
title = {Could Mind Maps Be Used To Improve Academic Search Engines?},
author = {Joeran Beel and Bela Gipp and Jan-Olaf Stiller},
editor = {S. I. Ao and C. Douglas and W. S. Grundfest and J. Burgstone},
url = {http://www.iaeng.org/publication/WCECS2009/WCECS2009_pp832-834.pdf},
year = {2009},
date = {2009-10-01},
booktitle = {International Conference on Machine Learning and Data Analysis (ICMLDA'09)},
volume = {2},
pages = {832–834},
publisher = {Newswood Limited},
address = {Berkeley (USA)},
organization = {International Association of Engineers (IAENG)},
series = {Lecture Notes in Engineering and Computer Science},
abstract = {In this paper, the idea of mind map mining is presented. We propose that information retrieved from mind maps could improve academic search engines. The basic idea is that keywords retrieved from a mind map’s text can describe the research articles referenced by the mind map. So far, we have not conducted any research on mind map mining. Therefore, this paper should be seen only as an early research-in-progress paper that outlines the ideas and aims to stimulate a discussion. We start the discussion by presenting some challenges that mind map mining is likely to face.},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In this paper, the idea of mind map mining is presented. We propose that information retrieved from mind maps could improve academic search engines. The basic idea is that keywords retrieved from a mind map’s text can describe the research articles referenced by the mind map. So far, we have not conducted any research on mind map mining. Therefore, this paper should be seen only as an early research-in-progress paper that outlines the ideas and aims to stimulate a discussion. We start the discussion by presenting some challenges that mind map mining is likely to face.
Beel, Joeran
Information Retrieval in Mind Maps zum Verbessern von Suchapplikationen Proceedings Article
In: Arndt, H. -K.; Krcmar, H. (Ed.): Very Large Business Applications (VLBA): Systemlandschaften der Zukunft, pp. 139–152, Shaker Verlag, Magdeburg, 2009.
@inproceedings{Beel2009,
title = {Information Retrieval in Mind Maps zum Verbessern von Suchapplikationen},
author = {Joeran Beel},
editor = {H. -K. Arndt and H. Krcmar},
year = {2009},
date = {2009-10-01},
booktitle = {Very Large Business Applications (VLBA): Systemlandschaften der Zukunft},
volume = {3},
pages = {139–152},
publisher = {Shaker Verlag},
address = {Magdeburg},
series = {Workshop des Centers for Very Large Business Applications (CVLBA)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Gipp, Bela
Google Scholar's Ranking Algorithm: An Introductory Overview Proceedings Article
In: Larsen, Birger; Leta, Jacqueline (Ed.): Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI'09), pp. 230–241, International Society for Scientometrics and Informetrics, Rio de Janeiro (Brazil), 2009, (Available at http://docear.org).
@inproceedings{Beel2009b,
title = {Google Scholar's Ranking Algorithm: An Introductory Overview},
author = {Joeran Beel and Bela Gipp},
editor = {Birger Larsen and Jacqueline Leta},
url = {https://isg.beel.org/pubs/Google Scholar's Ranking Algorithm – An Introductory Overview – preprint.pdf},
year = {2009},
date = {2009-07-01},
booktitle = {Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI'09)},
volume = {1},
pages = {230–241},
publisher = {International Society for Scientometrics and Informetrics},
address = {Rio de Janeiro (Brazil)},
abstract = {Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. We performed the first steps toward reverse-engineering Google Scholar’s ranking algorithm and present the results in this research-in-progress paper. The results are as follows: citation count is the most heavily weighted factor in Google Scholar’s ranking algorithm. Therefore, highly cited articles are found significantly more often in higher positions than articles that have been cited less often. As a consequence, Google Scholar seems to be more suitable for finding standard literature than for finding gems or articles by authors advancing a new or different view from the mainstream. However, interesting exceptions occurred for some search queries. Moreover, the occurrence of a search term in an article’s title seems to have a strong impact on the article’s ranking. The impact of search term frequency in an article’s full text is weak: it makes no difference to an article’s ranking whether the article contains the query terms only once or multiple times. We further investigated whether the name of an author or journal has an impact on the ranking and whether differences exist between the ranking algorithms of the different search modes that Google Scholar offers. The answer in both cases was "yes". The results of our research may help authors optimize their articles for Google Scholar and enable researchers to estimate the usefulness of Google Scholar with respect to their search intention, and hence the need to use further academic search engines or databases.},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. We performed the first steps toward reverse-engineering Google Scholar’s ranking algorithm and present the results in this research-in-progress paper. The results are as follows: citation count is the most heavily weighted factor in Google Scholar’s ranking algorithm. Therefore, highly cited articles are found significantly more often in higher positions than articles that have been cited less often. As a consequence, Google Scholar seems to be more suitable for finding standard literature than for finding gems or articles by authors advancing a new or different view from the mainstream. However, interesting exceptions occurred for some search queries. Moreover, the occurrence of a search term in an article’s title seems to have a strong impact on the article’s ranking. The impact of search term frequency in an article’s full text is weak: it makes no difference to an article’s ranking whether the article contains the query terms only once or multiple times. We further investigated whether the name of an author or journal has an impact on the ranking and whether differences exist between the ranking algorithms of the different search modes that Google Scholar offers. The answer in both cases was "yes". The results of our research may help authors optimize their articles for Google Scholar and enable researchers to estimate the usefulness of Google Scholar with respect to their search intention, and hence the need to use further academic search engines or databases.
Beel, Joeran; Gipp, Bela
Google Scholar's Ranking Algorithm: The Impact of Articles' Age (An Empirical Study) Proceedings Article
In: Latifi, Shahram (Ed.): Proceedings of the 6th International Conference on Information Technology: New Generations (ITNG'09), pp. 160–164, IEEE, Las Vegas (USA), 2009, (Available at http://docear.org).
@inproceedings{Beel2009c,
title = {Google Scholar's Ranking Algorithm: The Impact of Articles' Age (An Empirical Study)},
author = {Joeran Beel and Bela Gipp},
editor = {Shahram Latifi},
url = {https://ieeexplore.ieee.org/abstract/document/5070610
https://isg.beel.org/pubs/Google Scholar's Ranking Algorithm - The Impact of Articles' Age (An Empirical Study) – preprint.pdf},
doi = {10.1109/ITNG.2009.317},
year = {2009},
date = {2009-04-01},
booktitle = {Proceedings of the 6th International Conference on Information Technology: New Generations (ITNG'09)},
pages = {160–164},
publisher = {IEEE},
address = {Las Vegas (USA)},
abstract = {Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. In recent studies, we partly reverse-engineered the algorithm. This paper presents the results of our third study. While the first study provided a broad overview and the second study focused on the impact of citation counts, the current study analyzes the correlation between an article's age and its ranking in Google Scholar. In other words, we analyzed whether older or more recently published articles are more or less likely to appear in a top position in Google Scholar's result lists. For our study, the ages and rankings of 1,099,749 articles retrieved via 2,100 search queries were analyzed. The analysis revealed that an article's age seems to play no significant role in Google Scholar's ranking algorithm. We also discuss why this might lead to a suboptimal ranking.},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. In recent studies, we partly reverse-engineered the algorithm. This paper presents the results of our third study. While the first study provided a broad overview and the second study focused on the impact of citation counts, the current study analyzes the correlation between an article's age and its ranking in Google Scholar. In other words, we analyzed whether older or more recently published articles are more or less likely to appear in a top position in Google Scholar's result lists. For our study, the ages and rankings of 1,099,749 articles retrieved via 2,100 search queries were analyzed. The analysis revealed that an article's age seems to play no significant role in Google Scholar's ranking algorithm. We also discuss why this might lead to a suboptimal ranking.
Beel, Joeran; Gipp, Bela
Google Scholar's Ranking Algorithm: The Impact of Citation Counts (An Empirical Study) Proceedings Article
In: Proceedings of the 3rd IEEE International Conference on Research Challenges in Information Science (RCIS'09), pp. 439–446, IEEE, Fez (Morocco), 2009, (Available at http://docear.org).
@inproceedings{Beel2009a,
title = {Google Scholar's Ranking Algorithm: The Impact of Citation Counts (An Empirical Study)},
author = {Joeran Beel and Bela Gipp},
url = {https://ieeexplore.ieee.org/document/5089308
https://isg.beel.org/pubs/Google Scholar's Ranking Algorithm - The Impact of Articles' Age (An Empirical Study) – preprint.pdf},
doi = {10.1109/RCIS.2009.5089308},
year = {2009},
date = {2009-04-01},
booktitle = {Proceedings of the 3rd IEEE International Conference on Research Challenges in Information Science (RCIS'09)},
pages = {439–446},
publisher = {IEEE},
address = {Fez (Morocco)},
abstract = {Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. In a recent study, we partly reverse-engineered the algorithm. This paper presents the results of our second study. While the previous study provided a broad overview, the current study focused on analyzing the correlation between an article's citation count and its ranking in Google Scholar. For this study, the citation counts and rankings of 1,364,757 articles were analyzed. Some results of our first study were confirmed: citation count is the most heavily weighted factor in Google Scholar's ranking algorithm. Highly cited articles are found significantly more often in higher positions than articles that are cited less often. Therefore, Google Scholar seems to be more suitable for searching for standard literature than for gems or articles by authors advancing a view different from the mainstream. However, interesting exceptions occurred for some search queries. In some cases, no correlation existed; in others, bizarre patterns were recognizable, suggesting that citation counts sometimes have no impact at all on articles' rankings.},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. In a recent study, we partly reverse-engineered the algorithm. This paper presents the results of our second study. While the previous study provided a broad overview, the current study focused on analyzing the correlation between an article's citation count and its ranking in Google Scholar. For this study, the citation counts and rankings of 1,364,757 articles were analyzed. Some results of our first study were confirmed: citation count is the most heavily weighted factor in Google Scholar's ranking algorithm. Highly cited articles are found significantly more often in higher positions than articles that are cited less often. Therefore, Google Scholar seems to be more suitable for searching for standard literature than for gems or articles by authors advancing a view different from the mainstream. However, interesting exceptions occurred for some search queries. In some cases, no correlation existed; in others, bizarre patterns were recognizable, suggesting that citation counts sometimes have no impact at all on articles' rankings.
Reference Management
Beel, Joeran; Smyth, Barry; Collins, Andrew
RARD II: The 94 Million Related-Article Recommendation Dataset Proceedings Article
In: Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR), pp. 39–55, CEUR-WS, 2019.
@inproceedings{Beel2019e,
title = {RARD II: The 94 Million Related-Article Recommendation Dataset},
author = {Joeran Beel and Barry Smyth and Andrew Collins},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval (AMIR)},
pages = {39–55},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Feyer, Stefan; Siebert, Sophie; Gipp, Bela; Aizawa, Akiko; Beel, Joeran
Integration of the Scientific Recommender System Mr. DLib into the Reference Manager JabRef Proceedings Article
In: Proceedings of the 39th European Conference on Information Retrieval (ECIR), pp. 770–774, 2017.
@inproceedings{Feyer2017,
title = {Integration of the Scientific Recommender System Mr. DLib into the Reference Manager JabRef},
author = {Stefan Feyer and Sophie Siebert and Bela Gipp and Akiko Aizawa and Joeran Beel},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 39th European Conference on Information Retrieval (ECIR)},
pages = {770–774},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Gipp, Bela; Nuernberger, Andreas
The Architecture and Datasets of Docear's Research Paper Recommender System Journal Article
In: D-Lib Magazine, vol. 20, no. 11/12, 2014.
@article{Beel2014,
title = {The Architecture and Datasets of Docear's Research Paper Recommender System},
author = {Joeran Beel and Stefan Langer and Bela Gipp and Andreas Nuernberger},
doi = {10.1045/november14-beel},
year = {2014},
date = {2014-01-01},
journal = {D-Lib Magazine},
volume = {20},
number = {11/12},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Nuernberger, Andreas
Introducing Docear's Research Paper Recommender System Proceedings Article
In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13), pp. 459-460, ACM, 2013.
@inproceedings{Beel2013c,
title = {Introducing Docear's Research Paper Recommender System},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Andreas Nuernberger},
doi = {10.1145/2467696.2467786},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13)},
pages = {459-460},
publisher = {ACM},
series = {ACM International Conference Proceedings Series (ICPS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran
On the popularity of reference managers, and their rise and fall Journal Article
In: Docear Blog. https://www.docear.org/2013/11/11/on-the-popularity-of-reference-managers-and-their-rise-and-fall/, 2013.
@article{Beel2013,
title = {On the popularity of reference managers, and their rise and fall},
author = {Joeran Beel},
year = {2013},
date = {2013-01-01},
journal = {Docear Blog. https://www.docear.org/2013/11/11/on-the-popularity-of-reference-managers-and-their-rise-and-fall/},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran
SciPlore MindMapping now provides literature recommendations (Beta 15) Journal Article
In: http://www.sciplore.org/2011/sciplore-mindmapping-now-provides-literature-recommendations-beta-15/, 2011.
@article{Beel2011,
title = {SciPlore MindMapping now provides literature recommendations (Beta 15)},
author = {Joeran Beel},
year = {2011},
date = {2011-04-01},
journal = {http://www.sciplore.org/2011/sciplore-mindmapping-now-provides-literature-recommendations-beta-15/},
howpublished = {Blog},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Gipp, Bela; Langer, Stefan; Genzmehr, Marcel; Wilde, Erik; Nuernberger, Andreas; Pitman, Jim
Introducing Mr. DLib, a Machine-readable Digital Library Proceedings Article
In: Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL'11), pp. 463–464, ACM, 2011, (Available at http://docear.org).
@inproceedings{Beel2011b,
title = {Introducing Mr. DLib, a Machine-readable Digital Library},
author = {Joeran Beel and Bela Gipp and Stefan Langer and Marcel Genzmehr and Erik Wilde and Andreas Nuernberger and Jim Pitman},
doi = {10.1145/1998076.1998187},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL'11)},
pages = {463–464},
publisher = {ACM},
series = {JCDL '11},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Gipp, Bela; Mueller, Christoph
'SciPlore MindMapping' - A Tool for Creating Mind Maps Combined with PDF and Reference Management Journal Article
In: D-Lib Magazine, vol. 15, no. 11, 2009, (Available at http://docear.org).
@article{Beel2009d,
title = {'SciPlore MindMapping' - A Tool for Creating Mind Maps Combined with PDF and Reference Management},
author = {Joeran Beel and Bela Gipp and Christoph Mueller},
url = {https://isg.beel.org/pubs/SciPlore_MindMapping_–_in_brief–preprint.pdf},
doi = {10.1045/november2009-inbrief},
year = {2009},
date = {2009-11-01},
journal = {D-Lib Magazine},
volume = {15},
number = {11},
abstract = {Mind maps are useful tools for researchers, who can use them, for example, to manage their literature or to draft research papers. Dozens of tools exist to create mind maps, for instance, FreeMind, MindManager, and XMind. However, researchers need special features, such as the ability to access their bibliographic databases (e.g., BibTeX) directly from within the mind mapping software. Therefore, we developed SciPlore MindMapping, the first mind mapping tool focused solely on researchers' needs. It offers all the features one would expect from standard mind mapping software, plus three additional features for researchers.},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Mind maps are useful tools for researchers, who can use them, for example, to manage their literature or to draft research papers. Dozens of tools exist to create mind maps, for instance, FreeMind, MindManager, and XMind. However, researchers need special features, such as the ability to access their bibliographic databases (e.g., BibTeX) directly from within the mind mapping software. Therefore, we developed SciPlore MindMapping, the first mind mapping tool focused solely on researchers' needs. It offers all the features one would expect from standard mind mapping software, plus three additional features for researchers.
Event Detection
Weiler, Andreas; Beel, Joeran; Gipp, Bela; Grossniklaus, Michael
Stability Evaluation of Event Detection Techniques for Twitter Book Chapter
In: Boström, Henrik; Knobbe, Arno; Soares, Carlos; Papapetrou, Panagiotis (Ed.): Advances in Intelligent Data Analysis XV, pp. 368–380, Springer, 2016, ISBN: 978-3-319-46348-3.
@inbook{Weiler2016,
title = {Stability Evaluation of Event Detection Techniques for Twitter},
author = {Andreas Weiler and Joeran Beel and Bela Gipp and Michael Grossniklaus},
editor = {Henrik Boström and Arno Knobbe and Carlos Soares and Panagiotis Papapetrou},
doi = {10.1007/978-3-319-46349-0},
isbn = {978-3-319-46348-3},
year = {2016},
date = {2016-01-01},
booktitle = {Advances in Intelligent Data Analysis XV},
pages = {368–380},
publisher = {Springer},
series = {Lecture Notes in Computer Science (LNCS)},
keywords = {},
pubstate = {published},
tppubtype = {inbook}
}
Scholarly Communication & Reviewing
Beel, Joeran; Breuer, Timo; Crescenzi, Anita; Fuhr, Norbert; Li, Meije
Results-blind Reviewing Proceedings Article
In: Bauer, Christine; Carterette, Ben; Ferro, Nicola; Fuhr, Norbert; Faggioli, Guglielmo (Ed.): Frontiers of Information Access Experimentation for Research and Education (Dagstuhl Seminar 23031), pp. 68-154, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
@inproceedings{Beel2023,
title = {Results-blind Reviewing},
author = {Joeran Beel and Timo Breuer and Anita Crescenzi and Norbert Fuhr and Meije Li},
editor = {Christine Bauer and Ben Carterette and Nicola Ferro and Norbert Fuhr and Guglielmo Faggioli},
url = {https://isg.beel.org/pubs/2023-Results-Blind-Reviewing-Beel-et-al.pdf},
doi = {10.4230/DagRep.13.1.68},
year = {2023},
date = {2023-01-01},
booktitle = {Frontiers of Information Access Experimentation for Research and Education (Dagstuhl Seminar 23031)},
volume = {13},
number = {1},
pages = {68-154},
publisher = {Schloss Dagstuhl - Leibniz-Zentrum für Informatik},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Gipp, Bela; Meuschke, Norman; Beel, Joeran; Breitinger, Corinna
Using the Blockchain of Cryptocurrencies for Timestamping Digital Cultural Heritage Journal Article
In: Bulletin of IEEE Technical Committee on Digital Libraries (TCDL), pp. 12–14, 2017.
@article{Gipp2017,
title = {Using the Blockchain of Cryptocurrencies for Timestamping Digital Cultural Heritage},
author = {Bela Gipp and Norman Meuschke and Joeran Beel and Corinna Breitinger},
doi = {10.1109/JCDL.2017.7991588},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)},
journal = {Bulletin of IEEE Technical Committee on Digital Libraries (TCDL)},
pages = {12–14},
abstract = {The proportion of information that is exclusively available online is continuously increasing. Unlike physical print media, online news outlets, magazines, or blogs are not immune to retrospective modification. Even significant edits to text in online news sources can easily go unnoticed. This poses a challenge to the preservation of digital cultural heritage. It is nearly impossible for regular readers to verify whether the textual content they encounter online has been modified from its initial state, and if so, when and to what extent. In this paper, we propose a web-based platform that allows users to submit the URL of any web content they wish to track for changes. The system automatically creates a trusted timestamp, stored in the blockchain of the cryptocurrency Bitcoin, for the hash of the HTML content available at the user-specified URL. Because trusted timestamping secures a ‘snapshot’ of online information as it existed at a specific time, any subsequent changes made to the content can be identified.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Gipp, Bela; Wilde, Erik
Academic Search Engine Optimization (ASEO): Optimizing Scholarly Literature for Google Scholar and Co. Journal Article
In: Journal of Scholarly Publishing, vol. 41, no. 2, pp. 176–190, 2010, (University of Toronto Press. Available at http://docear.org).
@article{Beel2010h,
title = {Academic Search Engine Optimization (ASEO): Optimizing Scholarly Literature for Google Scholar and Co.},
author = {Joeran Beel and Bela Gipp and Erik Wilde},
doi = {10.3138/jsp.41.2.176},
year = {2010},
date = {2010-01-01},
journal = {Journal of Scholarly Publishing},
volume = {41},
number = {2},
pages = {176–190},
note = {University of Toronto Press. Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Beel, Joeran; Gipp, Bela
The Potential of Collaborative Document Evaluation for Science Proceedings Article
In: Buchanan, George; Masoodian, Masood; Cunningham, Sally Jo (Ed.): 11th International Conference on Asia-Pacific Digital Libraries (ICADL'08) Proceedings, pp. 375–378, Springer, Heidelberg (Germany), 2008, ISBN: 978-3-540-89532-9.
@inproceedings{Beel2008,
title = {The Potential of Collaborative Document Evaluation for Science},
author = {Joeran Beel and Bela Gipp},
editor = {George Buchanan and Masood Masoodian and Sally Jo Cunningham},
url = {https://link.springer.com/chapter/10.1007/978-3-540-89533-6_48
https://isg.beel.org/pubs/The Potential of Collaborative Document Evaluation for Science - public preprint.pdf},
doi = {10.1007/978-3-540-89533-6_48},
isbn = {978-3-540-89532-9},
year = {2008},
date = {2008-12-01},
booktitle = {11th International Conference on Asia-Pacific Digital Libraries (ICADL'08) Proceedings},
volume = {5362},
pages = {375–378},
publisher = {Springer},
address = {Heidelberg (Germany)},
series = {Lecture Notes in Computer Science (LNCS)},
abstract = {Peer review and citation analysis are the two most common approaches for quality evaluations of scientific publications, although they are subject to criticism for various reasons. This paper outlines the problems of citation analysis and peer review and introduces Collaborative Document Evaluation as a supplement or possibly even a substitute. Collaborative Document Evaluation aims to enable the readers of publications to act as peer reviewers and share their evaluations in the form of ratings, annotations, links and classifications via the internet. In addition, Collaborative Document Evaluation might well enhance the search for publications. In this paper the implications of Collaborative Document Evaluation for the scientific community are discussed and questions are asked as to how to create incentives for scientists to participate.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Gipp, Bela
Collaborative Document Evaluation: An Alternative Approach to Classic Peer Review Conference
Proceedings of the 5th International Conference on Digital Libraries (ICDL'08), vol. 31, pp. 410–413, Vienna (Austria), 2008, (Available at http://docear.org).
@conference{Beel2008a,
title = {Collaborative Document Evaluation: An Alternative Approach to Classic Peer Review},
author = {Joeran Beel and Bela Gipp},
url = {https://isg.beel.org/pubs/Collaborative Document Evaluation - An Alternative Approach to Classic Peer Review – Public Version.pdf
https://www.researchgate.net/publication/224059183_Collaborative_Document_Evaluation_An_Alternative_Approach_to_Classic_Peer_Review},
year = {2008},
date = {2008-08-01},
booktitle = {Proceedings of the 5th International Conference on Digital Libraries (ICDL'08)},
volume = {31},
pages = {410–413},
address = {Vienna (Austria)},
abstract = {Research papers are usually evaluated via peer review. However, peer review has limitations in evaluating research papers. In this paper, Scienstein and the new idea of 'collaborative document evaluation' are presented. Scienstein is a project to evaluate scientific papers collaboratively based on ratings, links, annotations and classifications by the scientific community using the internet. In this paper, critical success factors of collaborative document evaluation are analyzed: the scientists' motivation to participate as reviewers, the reviewers' competence and the reviewers' trustworthiness. It is shown that if these factors are ensured, collaborative document evaluation may prove to be a more objective, faster and less resource-intensive approach to scientific document evaluation than the classical peer review process. It is also shown that collaborative document evaluation offers additional advantages: it supports interdisciplinary work, allows continuous post-publication quality assessment and enables the implementation of academic recommendation engines. In the long term, it seems possible that collaborative document evaluation will gradually replace peer review and decrease the need for journals.},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
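The ICDL'08 abstract names the reviewers' competence and trustworthiness as critical success factors for collaborative document evaluation. One way to let those factors count when aggregating reader ratings is a trust-weighted mean. This is a minimal sketch under stated assumptions: the 1-5 rating scale, the per-reviewer trust weights, and the names `Rating` and `weighted_score` are hypothetical and are not taken from Scienstein.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    """One reader's rating of a paper (hypothetical 1-5 scale)."""

    reviewer: str
    score: float


def weighted_score(ratings: list[Rating], trust: dict[str, float]) -> float:
    """Trust-weighted mean of the ratings.

    `trust` maps each reviewer to a non-negative weight (a stand-in for
    competence and trustworthiness); reviewers without an entry get weight 0.
    """
    total_weight = sum(trust.get(r.reviewer, 0.0) for r in ratings)
    if total_weight == 0:
        raise ValueError("no rating carries any weight")
    weighted_sum = sum(trust.get(r.reviewer, 0.0) * r.score for r in ratings)
    return weighted_sum / total_weight


if __name__ == "__main__":
    ratings = [Rating("alice", 5), Rating("bob", 2), Rating("mallory", 1)]
    trust = {"alice": 0.9, "bob": 0.6, "mallory": 0.1}
    # About 3.6, versus an unweighted mean of about 2.7.
    print(weighted_score(ratings, trust))
```

How the trust weights themselves would be earned (for example from a reviewer's track record) is left open here; the sketch only shows that a low-trust reviewer has little influence on the aggregate.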
Document Engineering, Information Extraction & Citation Parsing
Scharpf, Philipp; Mackerracher, Ian; Schubotz, Moritz; Beel, Joeran; Breitinger, Corinna; Gipp, Bela
AnnoMathTeX - a Formula Annotation Recommender System for STEM Documents Proceedings Article
In: 13th ACM Conference on Recommender Systems (RecSys), 2019.
@inproceedings{Scharpf2019,
title = {AnnoMathTeX - a Formula Annotation Recommender System for STEM Documents},
author = {Philipp Scharpf and Ian Mackerracher and Moritz Schubotz and Joeran Beel and Corinna Breitinger and Bela Gipp},
year = {2019},
date = {2019-01-01},
booktitle = {13th ACM Conference on Recommender Systems (RecSys)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Grennan, Mark; Schibel, Martin; Collins, Andrew; Beel, Joeran
GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing Proceedings Article
In: 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, pp. 101–112, 2019.
@inproceedings{Grennan2019,
title = {GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing},
author = {Mark Grennan and Martin Schibel and Andrew Collins and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science},
pages = {101–112},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tkaczyk, Dominika; Collins, Andrew; Beel, Joeran
NaïveRole: Author-Contribution Extraction from Biomedical Publications Proceedings Article
In: 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, 2019.
@inproceedings{Tkaczyk2019,
title = {NaïveRole: Author-Contribution Extraction from Biomedical Publications},
author = {Dominika Tkaczyk and Andrew Collins and Joeran Beel},
year = {2019},
date = {2019-01-01},
booktitle = {27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tkaczyk, Dominika; Collins, Andrew; Sheridan, Paraic; Beel, Joeran
Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers Proceedings Article
In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, pp. 99–108, ACM, Fort Worth, Texas, USA, 2018, ISBN: 978-1-4503-5178-2.
@inproceedings{Tkaczyk2018a,
title = {Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers},
author = {Dominika Tkaczyk and Andrew Collins and Paraic Sheridan and Joeran Beel},
url = {http://doi.acm.org/10.1145/3197026.3197048},
doi = {10.1145/3197026.3197048},
isbn = {978-1-4503-5178-2},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries},
pages = {99–108},
publisher = {ACM},
address = {Fort Worth, Texas, USA},
series = {JCDL '18},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tkaczyk, Dominika; Sheridan, Paraic; Beel, Joeran
ParsRec: A Meta-Learning Recommender System for Bibliographic Reference Parsing Tools Proceedings Article
In: Proceedings of the 12th ACM Conference on Recommender Systems (RecSys), pp. 387–388, ACM, Fort Worth, Texas, USA, 2018.
@inproceedings{Tkaczyk2018,
title = {ParsRec: A Meta-Learning Recommender System for Bibliographic Reference Parsing Tools},
author = {Dominika Tkaczyk and Paraic Sheridan and Joeran Beel},
url = {http://doi.acm.org/10.1145/3197026.3203907},
doi = {10.1145/3197026.3203907},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 12th ACM Conference on Recommender Systems (RecSys)},
pages = {387–388},
publisher = {ACM},
address = {Fort Worth, Texas, USA},
series = {JCDL '18},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tkaczyk, Dominika; Gupta, Rohit; Cinti, Riccardo; Beel, Joeran
ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers Proceedings Article
In: Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS), pp. 162–173, CEUR-WS, 2018.
@inproceedings{Tkaczyk2018b,
title = {ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers},
author = {Dominika Tkaczyk and Rohit Gupta and Riccardo Cinti and Joeran Beel},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 26th Irish Conference on Artificial Intelligence and Cognitive Science (AICS)},
volume = {2259},
number = {1},
pages = {162–173},
publisher = {CEUR-WS},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Breitinger, Corinna; Langer, Stefan
Evaluating the CC-IDF citation-weighting scheme: How effectively can 'Inverse Document Frequency' (IDF) be applied to references? Proceedings Article
In: Proceedings of the 12th iConference, 2017.
@inproceedings{Beel2017,
title = {Evaluating the CC-IDF citation-weighting scheme: How effectively can 'Inverse Document Frequency' (IDF) be applied to references?},
author = {Joeran Beel and Corinna Breitinger and Stefan Langer},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 12th iConference},
abstract = {In the domain of academic search engines and research-paper recommender
systems, CC-IDF is a common citation-weighting scheme that is used
to calculate semantic relatedness between documents. CC-IDF adopts
the principles of the popular term-weighting scheme TF-IDF and assumes
that if a rare academic citation is shared by two documents then
this occurrence should receive a higher weight than if the citation
is shared among a large number of documents. Although CC-IDF is in
common use, we found no empirical evaluation and comparison of CC-IDF
with plain citation weight (CC-Only). Therefore, we conducted such
an evaluation and present the results in this paper. The evaluation
was conducted with real users of the recommender system Docear. The
effectiveness of CC-IDF and CC-Only was measured using click-through
rate (CTR). For 238,681 delivered recommendations, CC-IDF had about
the same effectiveness as CC-Only (CTR of 6.15% vs. 6.23%). In other
words, CC-IDF was not more effective than CC-Only, which is a surprising
result. We provide a number of potential reasons and suggest conducting
further research to understand the principles of CC-IDF in more detail.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
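The CC-IDF abstract describes carrying the TF-IDF idea over to citations: a reference shared by two documents should weigh more the fewer documents in the corpus cite it, whereas CC-Only treats every shared citation alike. A minimal sketch, assuming the classic IDF form log(N / df) (the abstract does not give the exact formula), representing each document as a set of cited-reference IDs, and reading "plain citation weight" as "each shared reference counts 1"; the function names are hypothetical.

```python
import math
from collections import Counter


def citation_idf(corpus: dict[str, set[str]]) -> dict[str, float]:
    """IDF of every cited reference: log(N / number of documents that cite it)."""
    n_docs = len(corpus)
    doc_freq = Counter(ref for refs in corpus.values() for ref in refs)
    return {ref: math.log(n_docs / df) for ref, df in doc_freq.items()}


def cc_only(a: set[str], b: set[str]) -> float:
    """CC-Only (as read here): every reference shared by both documents counts 1."""
    return float(len(a & b))


def cc_idf(a: set[str], b: set[str], idf: dict[str, float]) -> float:
    """CC-IDF: each shared reference counts its IDF, so rare shared citations weigh more."""
    return sum((idf[ref] for ref in a & b), 0.0)


if __name__ == "__main__":
    corpus = {
        "d1": {"r1", "r2"},
        "d2": {"r1", "r2"},
        "d3": {"r2"},
        "d4": {"r2", "r3"},
    }
    idf = citation_idf(corpus)
    # r2 is cited by every document, so its IDF is log(4/4) = 0 and it adds nothing.
    print(cc_only(corpus["d1"], corpus["d2"]))      # 2.0
    print(cc_idf(corpus["d1"], corpus["d2"], idf))  # log(2), about 0.693
```

The toy corpus shows the intended effect: CC-Only scores d1 and d2 on two shared references, while CC-IDF discounts r2 entirely because every document cites it. The study's finding that this reweighting did not raise click-through rate is an empirical result the sketch does not speak to.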
Beel, Joeran; Langer, Stefan; Kapitsaki, Georgia M.; Breitinger, Corinna; Gipp, Bela
Exploring the Potential of User Modeling based on Mind Maps Proceedings Article
In: Ricci, Francesco; Bontcheva, Kalina; Conlan, Owen; Lawless, Séamus (Ed.): Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP), pp. 3-17, Springer, 2015.
@inproceedings{Beel2015b,
title = {Exploring the Potential of User Modeling based on Mind Maps},
author = {Joeran Beel and Stefan Langer and Georgia M. Kapitsaki and Corinna Breitinger and Bela Gipp},
editor = {Francesco Ricci and Kalina Bontcheva and Owen Conlan and Séamus Lawless},
doi = {10.1007/978-3-319-20267-9_1},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP)},
volume = {9146},
pages = {3-17},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Gipp, Bela
Utilizing Mind-Maps for Information Retrieval and User Modelling Proceedings Article
In: Dimitrova, Vania; Kuflik, Tsvi; Chin, David; Ricci, Francesco; Dolog, Peter; Houben, Geert-Jan (Ed.): Proceedings of the 22nd Conference on User Modelling, Adaptation, and Personalization (UMAP), pp. 301-313, Springer, 2014.
@inproceedings{Beel2014a,
title = {Utilizing Mind-Maps for Information Retrieval and User Modelling},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Bela Gipp},
editor = {Vania Dimitrova and Tsvi Kuflik and David Chin and Francesco Ricci and Peter Dolog and Geert-Jan Houben},
doi = {10.1007/978-3-319-08786-3_26},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 22nd Conference on User Modelling, Adaptation, and Personalization (UMAP)},
volume = {8538},
pages = {301-313},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan; Genzmehr, Marcel; Müller, Christoph
Docear's PDF Inspector: Title Extraction from PDF files Proceedings Article
In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13), pp. 443-444, ACM, 2013.
@inproceedings{Beel2013b,
title = {Docear's PDF Inspector: Title Extraction from PDF files},
author = {Joeran Beel and Stefan Langer and Marcel Genzmehr and Christoph Müller},
doi = {10.1145/2467696.2467789},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13)},
pages = {443-444},
publisher = {ACM},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Lipinski, Mario; Yao, Kevin; Breitinger, Corinna; Beel, Joeran; Gipp, Bela
Evaluation of Header Metadata Extraction Approaches and Tools for Scientific PDF Documents Proceedings Article
In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13), pp. 385-386, 2013.
@inproceedings{Lipinski2013,
title = {Evaluation of Header Metadata Extraction Approaches and Tools for Scientific PDF Documents},
author = {Mario Lipinski and Kevin Yao and Corinna Breitinger and Joeran Beel and Bela Gipp},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13)},
pages = {385-386},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Langer, Stefan
An Exploratory Analysis of Mind Maps Proceedings Article
In: Proceedings of the 11th ACM Symposium on Document Engineering (DocEng'11), pp. 81-84, ACM, 2011, (Available at http://docear.org).
@inproceedings{Beel2011d,
title = {An Exploratory Analysis of Mind Maps},
author = {Joeran Beel and Stefan Langer},
doi = {10.1145/2034691.2034709},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 11th ACM Symposium on Document Engineering (DocEng'11)},
pages = {81-84},
publisher = {ACM},
note = {Available at http://docear.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Beel, Joeran; Gipp, Bela; Shaker, Ammar; Friedrich, Nick
SciPlore Xtract: Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size) Proceedings Article
In: Lalmas, M.; Jose, J.; Rauber, A.; Sebastiani, F.; Frommholz, I. (Ed.): Research and Advanced Technology for Digital Libraries, Proceedings of the 14th European Conference on Digital Libraries (ECDL'10), pp. 413–416, Springer, Glasgow (UK), 2010.
@inproceedings{Beel2010g,
title = {SciPlore Xtract: Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size)},
author = {Joeran Beel and Bela Gipp and Ammar Shaker and Nick Friedrich},
editor = {M. Lalmas and J. Jose and A. Rauber and F. Sebastiani and I. Frommholz},
year = {2010},
date = {2010-09-01},
booktitle = {Research and Advanced Technology for Digital Libraries, Proceedings of the 14th European Conference on Digital Libraries (ECDL'10)},
volume = {6273},
pages = {413–416},
publisher = {Springer},
address = {Glasgow (UK)},
series = {Lecture Notes in Computer Science (LNCS)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
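The SciPlore Xtract title names its heuristic: recognise the title of a scientific PDF by its style, specifically its font size. A minimal sketch, assuming the first page's text lines have already been extracted together with their font sizes by some PDF library; the `TextLine` record, `extract_title`, and the 0.1 pt tolerance are hypothetical, and this is not the tool's actual implementation, which may handle further cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextLine:
    """A text line from the first PDF page and its font size in points."""

    text: str
    font_size: float


def extract_title(lines: list[TextLine], tolerance: float = 0.1) -> Optional[str]:
    """Return the text set in the largest font, or None if the page has no text.

    Consecutive lines within `tolerance` points of the largest size are joined,
    because long titles often wrap onto several lines.
    """
    text_lines = [ln for ln in lines if ln.text.strip()]
    if not text_lines:
        return None
    largest = max(ln.font_size for ln in text_lines)
    parts: list[str] = []
    for ln in text_lines:
        if abs(ln.font_size - largest) <= tolerance:
            parts.append(ln.text.strip())
        elif parts:
            break  # the first smaller line after the title block ends the title
    return " ".join(parts)


if __name__ == "__main__":
    page = [
        TextLine("Journal of Examples, Vol. 3", 8.0),
        TextLine("Extracting Titles from", 18.0),
        TextLine("Scientific PDF Documents", 18.0),
        TextLine("Jane Doe, John Roe", 11.0),
        TextLine("Abstract", 10.0),
    ]
    print(extract_title(page))  # Extracting Titles from Scientific PDF Documents
```

Joining only the first run of largest-font lines is a design choice: it keeps a large-font running header elsewhere on the page from being appended to the title.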
Plagiarism Detection
Gipp, Bela; Meuschke, Norman; Beel, Joeran
Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches using GuttenPlag Proceedings Article
In: Proceedings of the 11th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'11), pp. 255–258, ACM, Ottawa, Canada, 2011, (Available at http://sciplore.org/pub).
@inproceedings{Gipp2011a,
title = {Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches using GuttenPlag},
author = {Bela Gipp and Norman Meuschke and Joeran Beel},
doi = {10.1145/1998076.1998124},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 11th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'11)},
pages = {255–258},
publisher = {ACM},
address = {Ottawa, Canada},
abstract = {Various approaches for plagiarism detection exist. All are based on
more or less sophisticated text analysis methods such as string matching,
fingerprinting or style comparison. In this paper a new approach
called Citation-based Plagiarism Detection is evaluated using a doctoral
thesis, in which a volunteer crowd-sourcing project called GuttenPlag
identified substantial amounts of plagiarism through careful manual
inspection. This new approach is able to identify similar and plagiarized
documents based on the citations used in the text. It is shown that
citation-based plagiarism detection performs significantly better
than text-based procedures in identifying strong paraphrasing, translation
and some idea plagiarism. Detection rates can be improved by combining
citation-based with text-based plagiarism detection.},
note = {Available at http://sciplore.org/pub},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
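The GuttenPlag study evaluates an approach that compares documents by the citations used in the text rather than by their wording, which is why it can still match strong paraphrases and translations. One simple citation-pattern measure is the length of the longest common subsequence of two documents' in-text citation sequences (order preserved, gaps allowed). This is a minimal sketch, assuming citations have already been resolved to shared reference identifiers; the function name and the citation keys are hypothetical, and it is not claimed to be the paper's exact algorithm.

```python
def longest_common_citation_sequence(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two citation sequences.

    `a` and `b` list the references in the order they are cited in each
    document's text (repeats allowed). Classic O(len(a) * len(b)) dynamic
    programming, keeping only the previous row of the table.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the match ending before both items
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip an item from a or b
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # In-text citation order of a source and a suspicious document (made-up keys).
    source = ["Smith01", "Lee05", "Kim09", "Lee05", "Park12"]
    suspect = ["Lee05", "Kim09", "Diaz10", "Park12"]
    print(longest_common_citation_sequence(source, suspect))  # 3
```

Because only the citation keys are compared, rewording or translating the surrounding text leaves the score unchanged. A normalised score, such as the subsequence length divided by the length of the shorter sequence, could then be compared against a threshold, which would need tuning on labelled cases.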
Gipp, Bela; Beel, Joeran
Citation Based Plagiarism Detection - a New Approach to Identify Plagiarized Work Language Independently Proceedings Article
In: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, pp. 273–274, ACM, Toronto, Ontario, Canada, 2010, (Available at http://sciplore.org/pub).
@inproceedings{Gipp2010a,
title = {Citation Based Plagiarism Detection - a New Approach to Identify Plagiarized Work Language Independently},
author = {Bela Gipp and Joeran Beel},
doi = {10.1145/1810617.1810671},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 21st ACM Conference on Hypertext and Hypermedia},
pages = {273–274},
publisher = {ACM},
address = {Toronto, Ontario, Canada},
note = {Available at http://sciplore.org/pub},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Blockchain
Gipp, Bela; Meuschke, Norman; Beel, Joeran; Breitinger, Corinna
Using the Blockchain of Cryptocurrencies for Timestamping Digital Cultural Heritage Journal Article
In: Bulletin of the IEEE Technical Committee on Digital Libraries (TCDL), to appear in early 2017, pp. 12–14, 2017.
@article{Gipp2017,
title = {Using the Blockchain of Cryptocurrencies for Timestamping Digital Cultural Heritage},
author = {Bela Gipp and Norman Meuschke and Joeran Beel and Corinna Breitinger},
doi = {10.1109/JCDL.2017.7991588},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)},
journal = {Bulletin of IEEE Technical Committee on Digital Libraries (TCDL). To appear in early 2017.},
pages = {12–14},
abstract = {The proportion of information that is exclusively available online
is continuously increasing. Unlike physical print media, online news
outlets, magazines, or blogs are not immune to retrospective modification.
Even significant editing of text in online news sources can easily
go unnoticed. This poses a challenge to the preservation of digital
cultural heritage. It is nearly impossible for regular readers to
verify whether the textual content they encounter online has at one
point been modified from its initial state, and at what time or to
what extent the text was modified to its current version. In this
paper, we propose a web-based platform that allows users to submit
the URL for any web content they wish to track for changes. The system
automatically creates a trusted timestamp stored in the blockchain
of the cryptocurrency Bitcoin for the hash of the HTML content available
at the user-specified URL. By using trusted timestamping to secure
a ‘snapshot’ of online information as it existed at
a specific time, any subsequent changes made to the content can be
identified.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
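The abstract describes the core mechanism: hash the HTML content found at a user-specified URL and store a trusted timestamp for that hash in the Bitcoin blockchain, so that later edits become detectable. Writing to the blockchain requires a timestamping service and is left out; this minimal sketch covers only the hashing and the later comparison. SHA-256 is an assumption (the abstract does not name the hash function), fetching the HTML (for example with `urllib.request.urlopen`) is left to the caller, and `Snapshot`, `take_snapshot`, and `is_unchanged` are hypothetical names.

```python
import hashlib
from datetime import datetime, timezone
from typing import NamedTuple


class Snapshot(NamedTuple):
    """Digest of a page's HTML and the time it was taken (hypothetical record)."""

    url: str
    sha256_hex: str
    taken_at: datetime


def take_snapshot(url: str, html: bytes) -> Snapshot:
    """Hash the HTML; only this digest would need to be anchored in the blockchain."""
    return Snapshot(url, hashlib.sha256(html).hexdigest(), datetime.now(timezone.utc))


def is_unchanged(snapshot: Snapshot, current_html: bytes) -> bool:
    """True if the current HTML hashes to the digest recorded in the snapshot."""
    return hashlib.sha256(current_html).hexdigest() == snapshot.sha256_hex


if __name__ == "__main__":
    snap = take_snapshot("https://example.org/article", b"<p>Original wording</p>")
    print(snap.sha256_hex)
    print(is_unchanged(snap, b"<p>Original wording</p>"))  # True
    print(is_unchanged(snap, b"<p>Edited wording</p>"))    # False
```

Because the digest covers the raw bytes, any change, including rotating ads or session tokens embedded in the markup, also changes it; a real deployment would likely need to normalise the HTML or extract the article text before hashing.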
Electronic Passport
Gipp, Bela; Beel, Joeran; Roessling, Ivo
ePassport: The World's New Electronic Passport Book
Createspace, Scotts Valley (USA), 2007, (ISBN 978-1434823182. Also available on http://www.epassport-book.com).
@book{Gipp2007,
title = {ePassport: The World's New Electronic Passport},
author = {Bela Gipp and Joeran Beel and Ivo Roessling},
url = {https://epassport-book.com/download.php},
year = {2007},
date = {2007-10-01},
publisher = {Createspace},
address = {Scotts Valley (USA)},
note = {ISBN 978-1434823182. Also available on http://www.epassport-book.com},
keywords = {},
pubstate = {published},
tppubtype = {book}
}
Beel, Joeran; Gipp, Bela
ePass - der neue biometrische Reisepass Book
Shaker Verlag, Aachen (Germany), 2005, (ISBN 978-3-8322-4693-8. Also available on http://www.epass-buch.de).
@book{Beel2005,
title = {ePass - der neue biometrische Reisepass},
author = {Joeran Beel and Bela Gipp},
url = {https://epass-buch.de/
https://epass-buch.de/epass-html-kostenlos/index.html},
year = {2005},
date = {2005-10-01},
publisher = {Shaker Verlag},
address = {Aachen (Germany)},
note = {ISBN 978-3-8322-4693-8. Also available on http://www.epass-buch.de},
keywords = {},
pubstate = {published},
tppubtype = {book}
}
Location Based Services
Alcala, Felix; Beel, Joeran; Frenkel, Arne; Gipp, Bela; Luelf, Johannes; Hoepfner, Hagen
UbiLoc: A System for Locating Mobile Devices using Mobile Devices Proceedings Article
In: Kyamakya, K. (Ed.): Proceedings of 1st Workshop on Positioning, Navigation and Communication 2004 (WPNC 04), pp. 43–48, University of Hanover, 2004, (Also available on http://beel.org).
@inproceedings{Alcala2004,
title = {UbiLoc: A System for Locating Mobile Devices using Mobile Devices},
author = {Felix Alcala and Joeran Beel and Arne Frenkel and Bela Gipp and Johannes Luelf and Hagen Hoepfner},
editor = {K. Kyamakya},
url = {https://amok.am/_joeran/existing paper, exchanged and invisible keywords, many additional reference/UbiLoc A System for Locating Mobile Devices using Mobile Devices - IMPROVED.pdf},
year = {2004},
date = {2004-01-01},
booktitle = {Proceedings of 1st Workshop on Positioning, Navigation and Communication 2004 (WPNC 04)},
pages = {43–48},
publisher = {University of Hanover},
note = {Also available on http://beel.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Alcala, Felix; Beel, Joeran; Frenkel, Arne; Gipp, Bela; Luelf, Johannes; Hoepfner, Hagen
Ortung von mobilen Geraeten fuer die Realisierung lokationsbasierter Dienste Proceedings Article
In: Tuerker, Can (Ed.): Mobilitaet und Informationssysteme - Workshop des GI-Arbeitskreises "Mobile Datenbanken und Informationssysteme", ETH Zuerich, Zuerich, 2003.
@inproceedings{Alcala2003,
title = {Ortung von mobilen Geraeten fuer die Realisierung lokationsbasierter Dienste},
author = {Felix Alcala and Joeran Beel and Arne Frenkel and Bela Gipp and Johannes Luelf and Hagen Hoepfner},
editor = {Can Tuerker},
url = {https://www.amok.am/_joeran/_real papers/Ortung von mobilen Geraeten fuer die Realisierung/Einzelbeitrag_mDBIS03_2.pdf},
year = {2003},
date = {2003-10-01},
booktitle = {Mobilitaet und Informationssysteme - Workshop des GI-Arbeitskreises "Mobile Datenbanken und Informationssysteme"},
publisher = {ETH Zuerich},
address = {Zuerich},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Gipp, Bela; Beel, Joeran
Der GSM-Schutzengel – Lebensretter für Autofahrer Journal Article
In: Jugend forscht, 2002.
@article{Gipp2002,
title = {Der GSM-Schutzengel – Lebensretter für Autofahrer},
author = {Bela Gipp and Joeran Beel},
url = {https://www.jugend-forscht.de/projektdatenbank/der-gsm-schutzengel-lebensretter-fuer-autofahrer.html},
year = {2002},
date = {2002-01-01},
journal = {Jugend forscht},
abstract = {Viele Menschenleben könnten gerettet werden, wenn Rettungskräfte schneller informiert werden würden. Im Zeitalter des Handys bauten Béla Gipp, Jöran Beel und Lars Petersen einen stoßempfindlichen Sensor in den Akku des Mobiltelefons ein. Er erkennt den Aufprall des Unfalls und löst automatisch einen Notruf in der Rettungszentrale aus. Hier kann - unterstützt durch ein selbstentwickeltes Software-Programm und den Mobilfunkstandard GSM - eine Positionsbestimmung des Unfallortes vorgenommen werden. Jetzt können die Rettungskräfte nicht nur starten, sondern auch vorher noch im Handy gespeicherte medizinische Daten abrufen oder Helfer in der Nähe ausfindig machen. Der Schutzengel im Akku macht's möglich.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Many lives could be saved if rescue services were alerted sooner. In the age of the mobile phone, Béla Gipp, Jöran Beel and Lars Petersen built a shock-sensitive sensor into a mobile phone's battery. The sensor detects the impact of an accident and automatically places an emergency call to the rescue control center. There, the accident site can be located with the help of self-developed software and the GSM mobile communications standard. Rescue crews can then not only set off, but also first retrieve medical data stored on the phone or locate helpers nearby. The guardian angel in the battery makes it possible.
Beel, Joeran; Petersen, Lars; Gipp, Bela
Der GSM-Schutzengel - Automatisches Notrufsystem zur Lokalisation von Unfallopfern mittels GSM-Technik Journal Article
In: Jugend forscht, 2001.
@article{Beel2001,
title = {Der GSM-Schutzengel - Automatisches Notrufsystem zur Lokalisation von Unfallopfern mittels GSM-Technik},
author = {Joeran Beel and Lars Petersen and Bela Gipp},
url = {https://www.jugend-forscht.de/projektdatenbank/der-gsm-schutzengel-automatisches-notrufsystem-zur-lokalisation-von-unfallopfern-mittels-gsm-technik.html},
year = {2001},
date = {2001-01-01},
journal = {Jugend forscht},
abstract = {Nach einem Verkehrsunfall entscheiden oft Minuten über Leben und Tod. Dass künftig die Rettungsdienste noch schneller zur Stelle sind, ermöglicht ein neues Notrufsystem, entwickelt von Jöran Beel, Lars Petersen sowie Béla Gipp und mittlerweile zum Patent angemeldet. Über ein normales Handy, das nach dem GSM-Standard arbeitet, teilt es die Position des Unfallfahrzeugs automatisch auf etwa 100 Meter genau mit. Dazu muss nur ein Beschleunigungssensor eingebaut und mit der Elektronik gekoppelt werden. Auch medizinische Daten wie die Blutgruppe lassen sich über die eingebaute SIM-Karte und die Zahl der Autoinsassen über die Gurtschlösser übermitteln. Sogar an einen Fehlalarm haben die Entwickler gedacht - der Notruf geht erst los, wenn das Handy nach einem Unfall nicht abgestellt wird. Damit das System auch funktioniert, müssen die Mobilfunkbetreiber allerdings noch Zentren einrichten, die die Position des Autos weitergeben.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
After a traffic accident, minutes often decide between life and death. A new emergency call system, developed by Jöran Beel, Lars Petersen and Béla Gipp and since filed for a patent, is meant to help rescue services arrive even faster. Using an ordinary mobile phone that operates on the GSM standard, it automatically reports the position of the crashed vehicle to within about 100 meters. All that is needed is an acceleration sensor coupled to the phone's electronics. Medical data such as the blood type can also be transmitted via the built-in SIM card, and the number of occupants via the seat-belt buckles. The developers even considered false alarms: the emergency call is only sent if the phone is not switched off after the accident. For the system to work, however, mobile network operators still need to set up centers that pass on the vehicle's position.