CS 172: Introduction to Information Retrieval

Spring 2022

Announcements

General Info

Instructor: Vagelis Hristidis

Lecture time: Mon/Wed 11:00 am-12:20 pm

Location: Humanities and Social Sciences, Room 1501

Office hours: Tuesdays 3-4 pm on Zoom. Email me before 3 pm on Tuesday to make an appointment, and I will send you the Zoom link.

TA: Muhammad Shihab Rashid, mrash013@ucr.edu

Office hours: TBD

Reader (grades assignments, midterms, and quizzes): Sri Ram Vemparala, svemp002@ucr.edu

Grading

15% quizzes (the two lowest quiz scores will be dropped)

25% midterm 1

25% midterm 2

10% assignment

25% project

Course Description

Information Retrieval (IR) principles, including indexing and searching document collections, Web search, and advanced topics such as search in social networks.

The topics that will tentatively be presented are listed in the schedule below.

Assignment

assignment 1

assignment 2

Project

project

Late submissions received before the assignment or project has been graded will receive a 20% score reduction.

Tentative Lecture Schedule

Date

Topic

Book Chapters

Supplemental material for further reading

3/28, 3/30

Class Overview, Overview of Information Retrieval and Search Engines

Ch. 1, 2

slides Ch. 1, slides Ch. 2

4/4, 4/6

Ranking: Vector Space Model, Probabilistic Model, Language Model

Ch. 7.1, 7.2, 7.3 (except 7.3.2)

slides Ch. 7

4/11, 4/13

Crawling, Storing

Ch. 3, slides Ch. 3

(p1) Heydon, A. and Najork, M. 1999. Mercator: A scalable, extensible Web crawler. World Wide Web 2, 4 (Apr. 1999), 219-229. (slides)

4/18 Review session 1 (slides)

4/20 MIDTERM 1

4/25, 4/27

Indexing and Query Processing

Ch. 5 (except 5.4.2-5.4.7, 5.7.4-5.7.5), slides Ch. 5

(p2) R. Fagin, Amnon Lotem and Moni Naor. Optimal aggregation algorithms for middleware. J. Computer and System Sciences 66 (2003), pp. 614-656. Extended abstract appeared in Proc. 2001 ACM Symposium on Principles of Database Systems (PODS '01), pp. 102-113.

(p6) Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI 2004.

5/2, 5/4

Link Analysis

Ch. 4.5

slides: link-based search

(p4) L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1999.

(p5) J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (1999).

5/9, 5/11

Evaluation

Ch. 8, slides Ch. 8

(p3) R. Fagin, Ravi Kumar and D. Sivakumar. Comparing top-k lists. SIAM J. Discrete Mathematics 17, 1 (2003).

5/16 Review session 2 (slides)

5/18 MIDTERM 2

5/23, 5/25

Deep Learning and IR

Deep Learning in IR

Lin, Jimmy, Rodrigo Nogueira, and Andrew Yates. "Pretrained transformers for text ranking: BERT and beyond." Synthesis Lectures on Human Language Technologies 14, no. 4 (2021): 1-325.

6/1

Social Search

Ch. 10, slides Ch. 10

(p9) Paul Heymann, Georgia Koutrika, and Hector Garcia-Molina. 2008. Can social bookmarking improve web search? In Proceedings of the International Conference on Web Search and Web Data Mining (WSDM '08).

(p10) David Carmel, Naama Zwerdling, Ido Guy, Shila Ofek-Koifman, Nadav Har'el, Inbal Ronen, Erel Uziel, Sivan Yogev, and Sergey Chernov. 2009. Personalized social search based on the user's social network. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09).

Interesting topics we will not have time to present:

Text Processing, Ch. 4.1, 4.2, 4.3, slides Ch. 4  

Q&A systems, Desktop Search

1. (p11) Eric Brill, Susan Dumais, Michele Banko. An Analysis of the AskMSR Question-Answering System. EMNLP 2002.

2. (p12) S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin and D. C. Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003.

3. QA slides

Relational DB and XML Search

1. IR and DB

(p13) Sara Cohen, Jonathan Mamou, Yaron Kanza, Yehoshua Sagiv. XSEarch: A Semantic Search Engine for XML. VLDB 2004, pp. 45-56.

(p14) L. Guo, F. Shao, C. Botev, J. Shanmugasundaram. XRANK: Ranked Keyword Search over XML Documents. SIGMOD 2003.

Web Search: Spam, Topic-Specific PageRank

1. Text classification

2. Alexandros Ntoulas, Marc Najork, Mark Manasse, and Dennis Fetterly. 2006. Detecting spam web pages through content analysis. In Proceedings of the 15th International Conference on World Wide Web (WWW '06).

3. Taher H. Haveliwala. "Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search." IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784-796, Jul/Aug 2003.

Other Resources

Writing tips

Textbook

Free download at https://ciir.cs.umass.edu/irbook/

Search Engines: Information Retrieval in Practice

Bruce Croft, Donald Metzler, Trevor Strohman

Addison-Wesley; 1st edition (February 16, 2009)

ISBN-10: 0136072240

ISBN-13: 978-0136072249

http://www.search-engines-book.com/

Also recommended for reference:

Policies

Academic Integrity: https://conduct.ucr.edu/