Update 2016-01-12: The salary in Tokyo would be around 1.600 US$ per month, not 1.400.
2015 has been a rather quiet year for Docear, but 2016 will be different. We have lots of ideas for new projects and, even better, we have funding to pay at least one Master or PhD student to help us implement them. There is also a good chance that we will get more funding, maybe also for Bachelor students and postdoctoral researchers. The positions will be located in Tokyo, Copenhagen or Konstanz (Germany).
Below is a list of potential projects. If you are interested, please apply, and if you have ideas of your own, do not hesitate to discuss them with us. What exactly you will be doing within each project depends on your preferences, your skills, how much time you have, and how many other students participate in the project. Either way, each project is well suited both for applying and enhancing your software development skills and for conducting research, e.g. as part of a Master or PhD thesis.
With respect to software development, you will be working with Java and state-of-the-art recommendation and search frameworks (Apache Mahout, Lucene, Solr, and/or LensKit), data formats (XML, JSON and/or BibTeX), SQL and NoSQL databases (PostgreSQL, MySQL, and/or Neo4j), PDF processing tools (jPod), and distributed file storage and data processing (Hadoop or Spark). You will also get a deep dive into RESTful web services (Java Jersey) and recommendation technologies (e.g. content-based filtering and collaborative filtering). Best of all, the results of your work will help tens of thousands of researchers around the globe to do better research!
With respect to research, there are options for all levels of researchers (Bachelor, Master, PhD, Postdoc), in various disciplines: Recommender systems, recommender system evaluation, digital libraries, web crawling, citation and network analysis, web services, databases, scalability, bibliometrics, security and privacy, interoperability, software quality, statistics, … So, if you want to do a Bachelor, Master, or PhD thesis, or just gain some research experience, there will be plenty of opportunities.
Mr. DLib, the Machine-Readable Digital Library
Mr. DLib is a digital library with around 2 million academic articles crawled from the Web, all of which are accessible through a RESTful API. Mr. DLib does not aim to be used by “end-users”. Instead, Mr. DLib offers services for operators of other academic services and software tools such as reference managers and digital libraries. Through Mr. DLib’s API, these academic services and tools can search for academic articles, request specific articles, or send PDF files to the API and receive metadata for the PDF files. Mr. DLib will be the foundation for many of our future projects, such as recommendations as a service (see below) and PDF metadata retrieval for Docear and other reference managers.
Software Development Goals
In the next year, we want to significantly extend the functionality of Mr. DLib and grow the document corpus. This includes enhancing the Web Crawler and Google Scholar Parser, PDF processing (jPod), the citation and PDF metadata parser (ParsCit), data storage (MySQL and Neo4j), search functionality (Apache Lucene & Solr), data delivery (Java Jersey), rights management, and performance (Hadoop or Spark?). We also plan to redesign large parts of the architecture (currently based on Apache Lucene; Java Jersey; Hibernate; MySQL) and data model (XML; JSON; MySQL). Hence, plenty of work for a capable software developer :-).
Mr. DLib offers various research opportunities. You could research how to extract titles and citations most effectively from PDF files; how to build a scalable Web Service and/or Web Crawler; how to measure code quality; how to request large amounts of data from Google Scholar without being detected as a robot; you could compare the performance of Hadoop and Spark; or do other research in the field of digital libraries and open access …
- Databases, preferably MySQL or PostgreSQL and/or Neo4j
- Server administration (Linux & Tomcat)
- Any of the fields, technologies and data formats listed in the project description (Web Crawling, Apache Lucene, Jersey, Hibernate, Hadoop, XML, RESTful Web Services, …)
Research-Paper Recommendations as a Service
There are countless academic services and software tools such as reference managers, academic search engines, (digital) libraries and (electronic) journals. However, only a few of them offer research-paper recommender systems, although recommender systems could provide a lot of additional value to their users. Imagine your reference manager regularly provided you with a list of newly published papers that are relevant to your work; or your professor recommends a book to you, the book is no longer available in your university’s library, but the library’s website recommends alternative books to you; …
One reason why so many academic services and tools do not have research-paper recommender systems is that developing such systems requires a lot of knowledge and effort, and the libraries do not have the knowledge or resources.
Software Development Goals
Our goal is to develop a research-paper recommender system “as a service” that can be used by any academic service or software tool, without requiring much knowledge about recommender systems or a lot of resources. This recommender system will be built on top of Mr. DLib, and it will allow third-party tools to easily get “recommendations as a service”. For instance, a reference manager could send a user’s personal library to Mr. DLib, and Mr. DLib would return a list of research-paper recommendations. Similarly, a digital library would send the metadata (title, ISBN, …) of a certain article or book to Mr. DLib and would receive a list of related articles and books. The entire communication between Mr. DLib and the client applications will be based on a RESTful Web Service and standard data formats such as XML and JSON. The first pilot partners to use Mr. DLib’s recommender system are Docear and JabRef (if you are the operator of an academic service and want to be a pilot partner, please contact us). The task of developing a “Recommender System as a Service” is truly challenging and multifaceted, since the users’ data from various sources (Docear, JabRef, …) needs to be transferred to Mr. DLib, stored and processed, and recommendations need to be calculated and returned. This process requires many aspects to be considered, such as scalability, security and privacy, interoperability, extensibility, and, of course, calculating world-class recommendations that researchers love.
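To make the idea of turning a user’s personal library into recommendations more concrete, here is a minimal content-based filtering sketch in Java. It is only an illustration under stated assumptions, not Mr. DLib’s implementation: the class and method names are hypothetical, titles are compared with plain term-frequency cosine similarity, and a real service would more likely rely on a framework such as Lucene or Mahout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rank candidate articles by textual similarity
// to the titles in a user's personal library.
class ContentBasedSketch {

    /** Builds a term-frequency vector from the lower-cased words of a text. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine similarity between two sparse term-frequency vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    /** Returns up to 'limit' candidate titles, most similar to the library first. */
    static List<String> recommend(List<String> libraryTitles, List<String> candidates, int limit) {
        // The user profile is simply the concatenation of all library titles.
        Map<String, Integer> profile = termFrequencies(String.join(" ", libraryTitles));
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble(
                (String c) -> cosine(profile, termFrequencies(c))).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        List<String> library = Arrays.asList(
                "Research paper recommender systems",
                "Evaluating recommender systems");
        List<String> candidates = Arrays.asList(
                "Cooking pasta at home",
                "A survey of recommender systems",
                "Graph databases");
        System.out.println(recommend(library, candidates, 2));
    }
}
```

In a service setting, the library titles would arrive in the client’s request and the ranked list would be serialized to XML or JSON; both steps are left out here to keep the sketch self-contained.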
The project offers research opportunities in the field of interoperability, data-format standards in the domain of digital libraries, data processing, and scalability. More importantly, this project lays the foundation for all the following (research) projects (see below).
- Databases such as MySQL
- Linux (Basic)
- Knowledge of recommendation concepts
- Any of the fields, technologies and data formats listed in the project description (Jersey, Mahout, LensKit, …)
Research-Paper Recommendations: A Novel Approach
Research & Software Development Goals
There are around 90 different approaches to giving research-paper recommendations. Your goal will be to develop a novel research-paper recommendation approach that is more effective than those currently available. To achieve this goal, you will integrate some of the existing approaches into Mr. DLib’s “Recommender System as a Service”, and either develop a completely new approach or enhance the existing ones. We have many ideas for how existing approaches could be enhanced. For instance, the ranking process could be improved by using scientometrics (e.g. citation counts of papers, h-index, …), and there are many more options that we are happy to share with you in a personal discussion.
The project is highly attractive in two ways. First, you will be working heavily with standard recommendation frameworks that are used in all domains (news, movies, …). Hence, you will gain valuable skills for the job market. Second, you will get a deep insight into research-paper recommender systems, which is an attractive field for further research, e.g. as part of a PhD.
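As one possible reading of the scientometrics idea, the following Java sketch re-ranks candidate recommendations by blending a text-relevance score with a log-scaled citation count. Everything in it is an assumption made for illustration: the `Candidate` class, the linear blend, the log scaling, and the weight are hypothetical choices, not an approach that has been implemented or evaluated in Mr. DLib.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: citation-aware re-ranking of recommendation candidates.
class CitationRerankSketch {

    /** A candidate with a relevance score in [0, 1] and its citation count. */
    static final class Candidate {
        final String title;
        final double relevance;
        final int citations;

        Candidate(String title, double relevance, int citations) {
            this.title = title;
            this.relevance = relevance;
            this.citations = citations;
        }
    }

    /**
     * Blends relevance with a citation signal normalized to [0, 1].
     * log1p dampens the effect of very highly cited papers.
     */
    static double score(Candidate c, double citationWeight, int maxCitations) {
        double citationSignal = (maxCitations == 0)
                ? 0.0
                : Math.log1p(c.citations) / Math.log1p(maxCitations);
        return (1.0 - citationWeight) * c.relevance + citationWeight * citationSignal;
    }

    /** Returns a new list sorted by blended score, highest first. */
    static List<Candidate> rerank(List<Candidate> candidates, double citationWeight) {
        int maxCitations = 0;
        for (Candidate c : candidates) {
            maxCitations = Math.max(maxCitations, c.citations);
        }
        final int max = maxCitations;
        List<Candidate> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble(
                (Candidate c) -> score(c, citationWeight, max)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("Slightly more relevant, uncited", 0.80, 0),
                new Candidate("Slightly less relevant, well cited", 0.75, 500));
        for (Candidate c : rerank(candidates, 0.3)) {
            System.out.println(c.title);
        }
    }
}
```

Whether such a blend actually improves recommendations, and which weight works best, is exactly the kind of question the project would need to answer empirically.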
- Knowledge about recommendation concepts and recommender-systems evaluation
- Data Analysis Tools (Excel, SPSS, R, …)
- Recommendation Frameworks (Mahout & LensKit)
- Web Services
Recommender-Systems Evaluation & Reproducibility
The reproducibility of experimental results is the “fundamental assumption” in science, and the “cornerstone” for drawing meaningful conclusions about the generalizability of ideas. Recently, we found that reproducibility is rarely given in the recommender-systems community, particularly in the research-paper recommender-system community. In a review of 89 research-paper recommender-systems evaluations, we identified several cases in which only slight variations in the initial setup of the evaluation or approaches led to surprisingly different results.
Software Development & Research Goals
We want to find out how recommender systems should ideally be implemented and evaluated to ensure reproducible results. To achieve this goal,
- you will implement a number of research-paper recommendation approaches (or simply use some existing ones),
- then these approaches will be used to give recommendations to the users of the applications that use Mr. DLib’s recommender system (Docear, JabRef, and maybe others), and
- you will analyze how the recommendation approaches perform in the different scenarios and try to identify the factors that affect recommendation effectiveness. This includes making controlled changes to the algorithms and to the applications that display the recommendations.
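The analysis step in the list above could start with something as simple as the following Java sketch, which compares one effectiveness measure per combination of application and approach. Using click-through rate as the measure is an assumption made for this example, and the `Delivery` class and the approach names are hypothetical; the actual evaluation design would be part of the project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: click-through rate per (application, approach) scenario.
class EffectivenessByScenarioSketch {

    /** One delivered recommendation and whether the user clicked it. */
    static final class Delivery {
        final String application;
        final String approach;
        final boolean clicked;

        Delivery(String application, String approach, boolean clicked) {
            this.application = application;
            this.approach = approach;
            this.clicked = clicked;
        }
    }

    /** Maps "application/approach" to clicks divided by deliveries. */
    static Map<String, Double> clickThroughRates(List<Delivery> log) {
        // Each value holds {clicks, deliveries} for one scenario.
        Map<String, int[]> counts = new TreeMap<>();
        for (Delivery d : log) {
            int[] c = counts.computeIfAbsent(
                    d.application + "/" + d.approach, k -> new int[2]);
            if (d.clicked) {
                c[0]++;
            }
            c[1]++;
        }
        Map<String, Double> rates = new TreeMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            rates.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return rates;
    }

    public static void main(String[] args) {
        List<Delivery> log = Arrays.asList(
                new Delivery("Docear", "content-based", true),
                new Delivery("Docear", "content-based", false),
                new Delivery("JabRef", "content-based", false),
                new Delivery("JabRef", "content-based", false));
        clickThroughRates(log).forEach((scenario, rate) ->
                System.out.println(scenario + " -> " + rate));
    }
}
```

If the same approach yields clearly different rates in different applications, that difference is a starting point for identifying the factors that affect reproducibility.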
If you are interested in this project, please have a look at our paper that will soon be published in the journal “User Modeling and User-Adapted Interaction (UMUAI)”. The paper gives you a detailed overview of the topic of reproducibility. Your task would be to continue what we started for the paper.
- Knowledge in statistics
- Data Analysis Tools (Excel, SPSS, or R, …)
- Knowledge about recommendation concepts and recommender-systems evaluation
- Basic programming and database knowledge
- Web Services
In Tokyo, we cooperate with Prof. Akiko Aizawa at the National Institute of Informatics (NII), which is one of Japan’s most respected research institutes in the field of information science. Our co-founder, Prof. Bela Gipp, spent his postdoctoral time at the NII, and I will also be at the NII from April 2016 onward.
Tokyo is vast: it’s best thought of not as a single city, but a constellation of cities that have grown together. Tokyo’s districts vary wildly by character, from the electronic blare of Akihabara to the Imperial gardens and shrines of Chiyoda, from the hyperactive youth culture Mecca of Shibuya to the pottery shops and temple markets of Asakusa. If you don’t like what you see, hop on the train and head to the next station, and you will find something entirely different.
The sheer size and frenetic pace of Tokyo can intimidate the first-time visitor. Much of the city is a jungle of concrete and wires, with a mass of neon and blaring loudspeakers. At rush hour, crowds jostle in packed trains and masses of humanity sweep through enormous and bewilderingly complex stations. Don’t get too hung up on ticking tourist sights off your list: for most visitors, the biggest part of the Tokyo experience is just wandering around at random and absorbing the vibe, poking your head into shops selling weird and wonderful things, sampling restaurants where you can’t recognize a single thing on the menu (or on your plate), and finding unexpected oases of calm in the tranquil grounds of a neighborhood Shinto shrine. It’s all perfectly safe, and the locals will go to sometimes extraordinary lengths to help you if you just ask.
In Copenhagen, we cooperate with Prof. Alesia Zuccala at the Royal School of Library and Information Science (RSLIS). The RSLIS has a long tradition of research in the field of (digital) libraries, bibliometrics, and information science, and hence represents an ideal partner for developing a machine-readable digital library, and research-paper recommender systems.
Copenhagen is the capital of Denmark and what a million Danes call home. This “friendly old girl of a town” is big enough to be a metropolis with shopping, culture and nightlife par excellence, yet still small enough to be intimate, safe and easy to navigate. Overlooking the Øresund strait with Sweden just minutes away, it is a cultural and geographic link between mainland Europe and Scandinavia. This is where old fairy tales blend with flashy new architecture and world-class design; where warm jazz mixes with cold electronica from Copenhagen’s basements. You’ll feel you’ve seen it all in a day, but could keep on discovering more for months.
The University of Konstanz is home to our team members Prof. Bela Gipp, Corinna Breitinger, and Norman Meuschke, and it is one of only eleven “Excellence” universities in Germany. The Information Science Group, chaired by Bela Gipp, does research in the fields of recommender systems, plagiarism detection, and document analysis, and provides an excellent environment for researchers.
Konstanz has traces of civilization dating from the Stone Age and was settled by the Romans in about 50 CE. Konstanz was an important trade centre and a spiritual centre. At the Council of Konstanz in 1414-1418, a papal election was held, ending the papal schism. Konstanz attempted to join the Swiss Confederacy in about 1460, but was voted down. Due to its proximity to Switzerland, Konstanz was not bombed during World War II and its historic old town remains intact. It is a historic city with a charming old town, and could be called the jewel of the region.
Master & PhD students (for Tokyo)
The NII offers research internships for Master and PhD students, and one of these internships can be given to a Master or PhD student for supporting our projects. The compensation will be around 1.600 US$ per month. The internship would start between April and August and last 2-12 months. To receive the scholarship, you apply and the Docear team preselects one or two candidates. Then, the candidates (supported by the Docear team) write a project proposal that is reviewed by the NII. If the NII approves the proposal, you can book your flight :).
German Master & PhD students (for Tokyo or Copenhagen)
For German Master and PhD students in the field of computer science, the DAAD offers the “FIT Weltweit” scholarships for 1-6 months. You could apply for a scholarship for a research stay at either the NII in Tokyo or the RSLIS in Copenhagen. The scholarship would be around 850 Euros (Master students) or 1700 Euros (PhD students) per month, plus travel expenses. If you are interested in applying for a scholarship, let us know, and we will support you in writing the application.
German Postdocs (for Tokyo)
The German Academic Exchange Service (DAAD) has a special scholarship program for postdoctoral researchers. The program DAAD FIT Weltweit for PostDocs provides excellent conditions for staying at the NII in Tokyo for 3-24 months. During your scholarship, you can pursue a research project that you agreed on with the NII, and receive compensation of around 3.400 Euros per month, plus some additional benefits. If you are interested in such a scholarship, contact us, and we will help you create a project proposal and apply for the scholarship. Of course, we cannot give any guarantees for success. Ultimately, the NII and DAAD reviewers have to accept the proposal.
German Bachelor & Master Students (for Konstanz)
If you are a student at the University of Konstanz, we might employ you as a student worker (Hiwi), or you might do any of the projects as a Bachelor, Master, or PhD project. Even if you are at another German university, we might be able to employ you as a student worker, though we cannot promise this yet. Contact us if you are interested in any of the projects.
There are usually many options for students to spend some time abroad and to do some research projects. Ask your professors or study advisers if they know of any funding opportunities. Look at the websites of your national research councils or similar organizations (in Germany that would be the DFG or DAAD). If you find a suitable program and need our help to apply for that program, let us know.
Even if you are not eligible for any of the funding opportunities, please send us your application. If new funding options come up in the future, we will contact you. In addition, feel free to join the Docear development team as a volunteer. You won’t get paid, but you would work on an amazing project that is ideal for learning new technologies and doing great research.
Apply & Contact Details
To apply, send your cover letter (including 1-2 pages of motivation) and CV to me, i.e. Joeran Beel firstname.lastname@example.org. Please explain in detail which project(s) you are interested in, when you could start your internship, how long you could stay, where you would want to do the internship, and which funding options apply to you. We will then get back to you as soon as possible with further information on how to proceed. If you have any questions, do not hesitate to send me an email email@example.com.