CS 242: Information Retrieval & Web Search

Fall 2016

Announcements

General Info

Instructor: Vagelis Hristidis

Lecture time: M/W 2:10-3:30 pm

Location: MSE 011

Office hours: Wed 1-2 pm

TA: Nhat Le

TA office hours: Tue 1-2 pm

Grading

15% participation and quizzes (worst quiz will be discarded)

40% midterm

10% assignment

35% project

Course Description

Information Retrieval (IR) principles including indexing and searching document collections, Web search and advanced topics like search in social networks.

The topics tentatively planned for the course are listed in the lecture schedule below.

Assignment

assignment 

Project

project

Late submissions that arrive before the assignment or project has been graded will receive a 20% score reduction.

 

Presentations schedule:

11/28: Groups 1-8

11/30: Groups 9-16

Group 1: Tanaya Vadgave, Shravani Madhavaram
Group 2: Abdulrahman Aloraini, Akarsha Byadarahalli-Mahadeva
Group 3: Jingnan Cao, Jiaheng Lin
Group 4: Sharmistha Bardhan, Gisel Bastidas Guacho
Group 5: Amr Elsisy, Lin Jiang
Group 6: Raghavendra Dinesh Pasupuleti, Abhignana Kandepu
Group 7: Zhida Liu, Bojian Du
Group 8: Kevin Lam, Ping He
Group 9: Umar Farooq, Zacharias Chasparis
Group 10: Tianxiong Yang, Tianyi Xia
Group 11: Yawei Li, Zhencong Li
Group 12: Vishnu Chandrasekar, Shweti Mahajan
Group 13: Kaicheng Shou, Brittany Cook
Group 14: Haopeng Liu, Guangda Zhang
Group 15: Sharan Kumar, Shiva Ramachandran
Group 16: Jiahuan Liu, Chunhao Shan

Tentative Lecture Schedule

Date

Topic

Book Chapters

Supplemental material for further reading

9/26

Class Overview, Overview of Information Retrieval and Search Engines

Ch. 1, 2

slides Ch. 1, slides Ch. 2 (a slightly more detailed version of the Ch. 1 slides)

9/28,10/3

Ranking: Vector space model, Probabilistic Model, Language model

Ch. 7.1, 7.2, 7.3 (except 7.3.2), slides Ch. 7

Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon. Adapting Ranking SVM to Document Retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference (SIGIR’06), pages 186-193, 2006. (pdf)

10/5,10

Crawling, Storing

Ch. 3, slides Ch. 3

(p1) Heydon, A. and Najork, M. 1999. Mercator: A scalable, extensible Web crawler. World Wide Web 2, 4 (Apr. 1999), 219-229. (slides)

10/12,17

Indexing, MapReduce, Query Processing

Ch. 5 (except 5.4.2-5.4.7, 5.7.4-5.7.5), slides Ch. 5

(p2) R. Fagin, Amnon Lotem and Moni Naor. Optimal aggregation algorithms for middleware. J. Computer and System Sciences 66 (2003), pp. 614-656. Extended abstract appeared in Proc. 2001 ACM Symposium on Principles of Database Systems (PODS '01), pp. 102-113.

(p6) Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI 2004.

10/19

Use of the class Hadoop cluster and Lucene (presented by the TA)

slides

10/24,26

Link Analysis, Evaluation

Ch. 4.5

slides: link-based search

(p4) L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1999

(p5) J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (1999).

10/31

Evaluation (cont'd)

Ch. 8, slides Ch. 8

(p3) R. Fagin, Ravi Kumar and D. Sivakumar: Comparing top-k lists. SIAM J. Discrete Mathematics 17, 1 (2003)

11/2

Text Processing

Ch. 4.1, 4.2, 4.3, slides Ch. 4

11/7,9

Query Refinement, Results Presentation (snippets), word2vec

Ch. 6.1, 6.2, 6.3, slides Ch. 6, word2vec

G Salton, C Buckley. Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 1990

Zamir, O. and Etzioni, O. 1998. Web document clustering: a feasibility demonstration. ACM SIGIR '98

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 3111–3119, 2013.

11/14

Review session

slides

 
11/16

MIDTERM

11/21

Social Search, Question Answering Systems

Ch. 10, slides Ch. 10

(p11) Eric Brill, Susan Dumais, Michele Banko. An Analysis of the AskMSR Question-Answering System (EMNLP 2002)

(p9) Paul Heymann, Georgia Koutrika, and Hector Garcia-Molina. 2008. Can social bookmarking improve web search? In Proceedings of the international conference on Web search and web data mining (WSDM '08)

(p10) David Carmel, Naama Zwerdling, Ido Guy, Shila Ofek-Koifman, Nadav Har'el, Inbal Ronen, Erel Uziel, Sivan Yogev, and Sergey Chernov. 2009. Personalized social search based on the user's social network. In Proceeding of the 18th ACM conference on Information and knowledge management (CIKM '09)

11/23

No class (instructor at an out-of-town meeting)

11/28,30

Project Presentations

 

 

Interesting topics that we will not have time to present in class

Desktop Search

(p12) S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin and D. C. Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 200

QA slides

 

Relational DB and XML Search

1. IR and DB

(p13) Sara Cohen, Jonathan Mamou, Yaron Kanza, Yehoshua Sagiv: XSEarch: A Semantic Search Engine for XML. 45-56, VLDB 2004

(p14) L. Guo, F. Shao, C. Botev, J. Shanmugasundaram: XRANK: Ranked Keyword Search over XML Documents. SIGMOD 2003

Web Search: Spam, topic-specific pagerank

1. Text classification

2. Alexandros Ntoulas, Marc Najork, Mark Manasse, and Dennis Fetterly. 2006. Detecting spam web pages through content analysis. In Proceedings of the 15th international conference on World Wide Web (WWW '06)

3. Taher H. Haveliwala, "Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784-796, Jul/Aug 2003.

 

Other Resources

writing tips

presentation tips

 

Textbook

Search Engines: Information Retrieval in Practice

Bruce Croft, Donald Metzler, Trevor Strohman

Addison Wesley; 1st edition (February 16, 2009)

ISBN-10: 0136072240

ISBN-13: 978-0136072249

http://www.search-engines-book.com/

 

Also recommended for reference:

 

Policies

Academic Integrity: http://conduct.ucr.edu/learnPolicies/Pages/AcademicIntegrity.aspx

Standards of Conduct: http://conduct.ucr.edu/learnPolicies/Pages/StandardsofConduct.aspx