Recommender System For Dating Services

The aim of the thesis is to research the utility of collaborative filtering based recommender systems in the area of dating services. The practical part of the thesis describes the actual implementation of several standard collaborative filtering algorithms and a system, which recommends potential personal matches to users based on their preferences (e.g. ratings of other user profiles). The collaborative filtering is built upon the assumption, that users with similar rating patterns will also rate alike in the future.Second part of the work focuses on several benchmarks of the implemented system’s accuracy and performance on publicly available data sets (MovieLens and Jester) and also on data sets originating from real online dating services (ChceteMˇe and L´ıb´ımSeTi). All benchmark results proved that collaborative filtering technique could be successfully used in the area of online dating services.

1.    Introduction: makes the reader acquainted with the concept of collaborative filtering and introduces its utilization in the environment of dating services.Since long before the industrial revolution, information has been the most valuable element in almost all human activities. For the last five decades, with the great expansion of technology, and especially with the Internet boom over the last few years, the amount of readily available information has increased exponentially. Objectively, searching through the vast amount of available information has become a true challenge for everyone. This phenomenon, mostly referred to as the information overload, describes the state of having too much information to make relevant decisions or remain informed about a certain topic [2].The task of making well-informed decisions supported by finding relevant information is addressed in research of the domain of information retrieval, which is “. . . the art and science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand alone databases or ypertext networked databases such as the Internet or intranets, for text, sound, images or data” [3].The research on information retrieval has developed several technology-based solutions addressing information overload: intelligent agents, ranking algorithms, cluster analysis,web mining/data mining, personalization, recommender systems and collaborative filtering. This thesis focuses on recommender systems relying on collaborative filtering. Collaborative Filtering, Recommender Systems Typically, a search for a specific piece of information is based on some kind of a query supplied by a user. For example, a search for textual documents or web pages is mostly performed with a simple query containing words or phrases desired in the returned documents. In search for audio files a structured query might be used, combining for example song title, artist, album name, genre and release date.

Existing Applications: One of the earliest implementations of collaborative filtering approach was used in the Tapestry experimental mail system developed at the Xerox Palo Alto Research Center in the year 1992 [17]. This system was used as a mail filtering information system utilizing the collaborative sense of a small group of participating users. The Tapestry system did not use a definite concept of automated collaborative filtering1 as the actual filtering technique was strictly bound to the concrete user-defined queries. Every Tapestry user maintained personal list of filtering queries based on SQL-like language – Tapestry Query Language (TQL). The collaborative part was included as an extension to TQL, allowing users to link their queries to the queries of other specific users. For example, a query returning all messages selected by Terry’s “Baseball” query that contain the word “Dodgers” would be: m IN Terry.Baseball AND m.words = (‘Dodgers’).The concept of real automated collaborative filtering first appeared in the GroupLens Research Project2 system using neighborhood-based algorithm for providing personalized predictions for Usenet news articles [31]. Since then, the research on collaborative filtering has drawn significant attention and a number of collaborative filtering systems in broad variety of application domains have appeared.Probably the most publicly aware system using collaborative filtering in the last few years has been the item-to-item recommender at It is used to personalize the online store for each customer. The store radically changes based on customer interests,showing programming titles to a software engineer and baby toys to a new mother [24].More examples of commercial systems using collaborative filtering are:
• – the world’s largest online DVD movie rental service
• GenieLab::Music – a music recommender system (compatible with iTunes)
• Musicmatch Jukebox – a music player, organizer and recommendation system
• – Alexandria Digital Literature, eBookstore under the Seattle Book
• – personalized news website
And the examples of non-commercial systems using collaborative filtering are:
• – music recommendation site (formerly RACOFI Music)
• – a public site where people can donate their bookmarks
• – the Global Network of Dreams, a recommender system for music, movies and authors of books
• – a movie recommender system
Dating Services Wikipedia definition of dating service (or dating system) states: “A dating system is any systemic means of improving matchmaking via rules or technology. It is a specialized meeting system where the objective of the meeting, be it live or phone or chat based, is to go on a live date with someone, with usually romantic implications” [1]. The matchmaking is any expert-run process of introducing people for the purposes of dating and mating,usually in the context of marriage [4].Typically, with most dating or matchmaking services, user has to waste considerable time filling out lengthy personality questionnaires and character trait analyzation charts as the matching phase of the service is strictly content-based. Most of the time the user.

2.    Data Sets Analysis: This chapter describes in detail several data sets used to develop and evaluate ColFi collaborative filtering algorithms. Collaborative filtering is not so interesting without any reasonably large data set. Four independent experimental data sets were chosen. These data sets were used to evaluate the emergent collaborative filtering algorithms during the design and development phase. At the end, these data sets were also used to benchmark the final implementations – see chapter 6 on the page 41.The data sets are based on the explicit data of various applications. Most of these applications do store other information along with the plain ratings matrix, but these additional data were discarded as they are out of the scope of this thesis. Some of the original data had to be modified in certain way to satisfy the requirements of the dating service application data set. These modifications were specific for each data source and they are described  individually in the following paragraphs.

2.1     MovieLens Data Set: The MovieLens data set is the most popular publicly available experimental data set. The MovieLens is a web-based recommender system for movies1, based on GroupLens technology (part of the GroupLens Research Project). The MovieLens data set originated from the EachMovie movie recommender data provided by HP/Compaq Research (formerly DEC Research).In this thesis, the so called 1 Million MovieLens data set was used in its full extent.Both MovieLens users and MovieLens movies were declared as ColFi users to conform the square ColFi ratings matrix. The original ratings scale h1..5i was linearly scaled into the ColFi’s internal scale h1..127i. This way transformed data set is further referred to as the MovieLens data set.

2.2     Jester Data Set: The Jester data set is another well known and publicly available experimental data set. The Jester Joke Recommender System is a web-based recommender system for jokes2,created by Prof. Ken Goldberg and the Jester team at the Alpha Lab, UC Berkeley. This data set is very specific due to the fact that it originally contained only 100 jokes which all the Jester users may have rated. This resulted in “almost empty” square ratings matrix with very dense 100 columns.

3.    Algorithms: This chapter focuses on the algorithms presented in this thesis and their main ideas. The fundamental task in collaborative filtering is to predict a set of missing ratings of a particular user (the active user) based on the data contained in the ratings matrix (sometimes referred to as the user database). There exists the following two general approaches to collaborative filtering:
i) Memory-based algorithms – these algorithms operate in the online manner over the entire ratings matrix to make predictions
ii) Model-based algorithms – these algorithms, in contrast, use the ratings matrix in the offline manner to estimate a model, which is then used for predictions
The memory-based algorithms usually yield better absolute results and are more adaptable to dynamic environments where the ratings matrix gets modified very often. On the other hand, the memory-based approach is very computationally expensive, so applications using large data sets may prefer the approximation offered by  model-based algorithms. This thesis examines two most popular collaborative filtering algorithms based on the (k)-Nearest Neighbours Algorithm: User-User Nearest Neighbours Algorithm and Item-Item Nearest Neighbours Algorithm. These are probably the most important algorithms in the memory-based category. Two more trivial algorithms (the Random Algorithm and the Mean Algorithm) are introduced to have a simple common base for later comparison. Random algorithm  predictions stay random, but the algorithm’s behaviour is invariable even when involved in the independent runs.This algorithm with its theoretical possible accuracy results are mentioned in the
3.1 Mean Algorithm: The Mean Algorithm is a very simple but surprisingly effective memory-based collaborative filtering algorithm. It is sometimes referred to as the Item Average Algorithm or the POP algorithm [9] as it falls within the so called popularity prediction techniques [37].The prediction pa,j is calculated as the mean value of all the non-empty ratings targeted to the user j (the mean score for the user j). If the Sj is a set of users which have rated.
3.2 User-User Nearest Neighbours Algorithm: The User-User variant of the (k)-Nearest Neighbours Algorithm is the most essential algorithm in the whole concept of collaborative filtering. The main idea of this approach is quite simple, yet ingenious. When predicting ratings of the active user a, the user database is first searched for users with similar ratings vectors to the user a (users with similar opinions or “taste” - the so called neighbours). The ratings similarity, in most of the literature denoted as w(i, a), can reflect for example the distance or correlation between each user i and the active user a. The opinions of the k most similar neighbours are then used to form the predictions of the active user.The predicted rating pa,j of the active user a for another user j is assumed to be a weighted sum of the ratings for user j of those k most similar neighbours.

4     Conclusion:    This concluding chapter summarizes the contributions of the thesis and proposes possible directions for future work.This paper was concerned with application of collaborative filtering approach in the domain of online dating services. The main goals and structure of the work were:
• General collaborative filtering research
• Application of collaborative filtering concept to the online dating services environment
• Open-source recommender system implementation
• Benchmarks of the presented solution on the real dating service data sets
All these goals were successfully accomplished. The application of collaborative filtering concept to the online dating services environment is presented as a detailed dating service data sets analysis. Chapters 4 and 5 deal with the implementation  of the open-source recommender system1 (the ColFi System). And finally, the benchmarks of the presented solution on the real dating service data sets were realized in previous chapter.Interesting results gathered in this thesis proved that collaborative filtering is a viable promising approach that could simplify and improve the matchmaking process in online dating services with a relatively little effort. And even though collaborative filtering algorithms are quite computationally expensive, for their application in dating services environment the matchmaking process does not necessarily have to be real-time, as offline recommendation computations are sufficient.

5    Future Work: The application of collaborative filtering is only the first important step in improving the online dating services. There are several more issues that need to be worked on in the future:
• Scalability – presented basic collaborative filtering algorithms are too general and therefore unoptimized and computationally expensive. More advanced algorithms should be tested or developed to answer the problem of growing number of users and huge data sets.
• Distributed solution – another answer to the scalability issue could be a distributed recommender system as most of the computations could utilize massive parallelism.
• Hybrid algorithms – there is an extensive area of so called hybrid collaborative filtering algorithms. These algorithms are combining the collaborative filtering approach together with a simple content-based searc tralia,

6.    References:
1.    Wikipedia on dating service, 2006.
2.    Wikipedia on information overload, 2006.
3.    Wikipedia on information retrieval, 2006.
4.    Wikipedia on matchmaking, 2006.
5.    Andrew Fiore And. Online personals: An overview.
6.    M. Anderson, M. Ball, H. Boley, S. Greene, N. Howse, D. Lemire, and S. McGrath.Racofi (2003), A rule-applying collaborative filtering system.
7.    P. Baudisch. Joining collaborative and content-based filtering.
8.    Shai Ben-David. A framework for statistical clustering with a constant time approximation algorithms for k-median clustering.
9.    John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. pages 43–52.
10.    Christopher N. Carlson, (2003), Information overload, retrieval strategies and internet user empowerment.
11.    Sonny Han Seng Chee, Jiawei Han, and Ke Wang. Rectree: An efficient collaborative filtering method. In DaWaK (2001): Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery, pp 141–151, London, UK, Springer-Verlag.
12.    Sonny Han Seng Chee, Jiawei Han, and Ke Wang. RecTree (2001), An efficient collaborative filtering method. Lecture Notes in Computer Science.
13.    Mark Connor and Jon Herlocker (1998). Clustering items for collaborative filtering. algorithm for large databases pp 73–84.
14.    Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl (1999). An algorithmic framework for performing collaborative filtering. In SIGIR ’99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp 230–237, New York, NY, USA, ACM Press.