UNIVERSITY OF CALIFORNIA
RIVERSIDE






Information Retrieval in Peer-to-Peer Systems





A Thesis submitted in partial satisfaction
of the requirements for the degree of




Master of Science
in
Computer Science



by
Demetrios Zeinalipour-Yazti
June 2003




Thesis Committee:
Dr. Dimitrios Gunopulos, Chairperson
Dr. Vana Kalogeraki
Dr. Chinya V. Ravishankar
ACKNOWLEDGMENTS


The following faculties, student fellows, friends, and family members have contributed to this dissertation and played a significant role throughout this work. Without each of them, this work would be very difficult to accomplish. First, I would like to thank Prof. Dimitrios Gunopulos for his support during this work, his continuous guidance and all the fruitful discussions that laid the foundation of the research presented in this thesis. I also thank Prof. Vana Kalogeraki for her encouragement, her important feedback throughout our collaboration. Finally I would like to thank Prof. Chinya Ravishankar for taking the time out of his busy schedule to be on my thesis committee. I thank the colleagues and friends of the database lab and the department, for sharing the burden of the research through the years of my graduate study here at UCR. Theodoros Folias has made all the projects we 've been working on more interesting and colorful. Thanks to Anil, Dimitris, Foula, Jessica, Marios, Michalis, Mirella, Sharmila and all the other people here at UCR for the good times spent together. I would like to thank my parents Djalal and Androula Zeinalipour, as well as my brother Constantino and his fianceé Erika. They always supported me in pursuing my objectives and I own them my beliefs in education, and my willingness to learn and go through the requirements of the Masters Degree. Last I would like to thank my fianceé Christina, which stood on my side providing me with courage and support during the two year period we were apart. I dedicate this work to her.
An earlier version of this work appeared in the proceedings of the ACM CIKM Internatio-
nal Conference on Information and Knowledge Management, McLean, VA, pp 300-307, Nov. 2002


DEDICATION
.
To my fianceé Christina
ABSTRACT OF THE THESIS


Information Retrieval in Peer-to-Peer Systems by Demetrios Zeinalipour-Yazti



Master of Science, Graduate Program in Computer Science

University of California, Riverside, June 2003

Prof. Dimitrios Gunopulos, Chairperson




Peer-to-Peer systems are application layer networks which enable networked hosts to share resources in a distributed manner. An important problem in such networks is to be able to efficiently search the contents of the other peers. Existing search techniques are inefficient because they are either based on the idea of flooding the network with queries or because they require some form of global knowledge. We propose the Intelligent Search Mechanism (ISM) which is an efficient, scalable but yet simple mechanism for improving the information retrieval problem in Peer-to-Peer systems. Our mechanism is efficient since it is bounded by the number of neighbors and scalable because no global knowledge is required to be maintained. ISM consists of four components: A Profile Mechanism which logs query-hits coming from neighbors, a Cosine Similarity function which calculates the closeness of some past query to a new query, RelevanceRank which is an online ranking mechanism that ranks the neighbors of some node according to their potentiality of answering the new query and a Search Mechanism which forwards a query to the selected neighbors. We deploy and compare ISM with a number of other distributed search techniques. Our experiments are performed with real data over PeerWare, our middleware simulation infrastructure which is deployed on 50 workstations. Our results indicate that ISM is an attractive technique for keyword searching in Peer-to-Peer systems since it achieves in some cases 100% recall rate by using only 50% of the messages used in the flooding algorithm. Further its performance is also superior with respect to the total time for satisfying a query. Finally our algorithm improves over time as some node develops more knowledge.

Contents