CS 172: Introduction to Information Retrieval
Spring 2023
General Info
Instructor: Vagelis Hristidis
Lecture time: Mon/Wed 11:00 am - 12:20 pm
Location: Humanities and Social Sciences, Room 1501
Office hours: Monday 3-4 pm, WCH 317
TAs:
Shihab Rashid, office hour: Thursday 2:00 - 3:00 pm at WCH 363
Meem, office hour: Friday 10:00 - 11:00 am at WCH 363
Reader (grading of assignments, midterms, and quizzes): Pooja Patil
Grading
15% quizzes (the 2 lowest quiz scores will be discarded)
25% midterm 1
25% midterm 2
10% assignment
25% project
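As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale and that the quiz component is the plain average of the quizzes that remain after the two lowest are discarded; neither assumption is stated in the grading list, and the function names are made up for this example.

```python
from typing import Sequence

# Weights copied from the grading breakdown above.
WEIGHTS = {
    "quizzes": 0.15,
    "midterm1": 0.25,
    "midterm2": 0.25,
    "assignment": 0.10,
    "project": 0.25,
}


def quiz_average(quiz_scores: Sequence[float], dropped: int = 2) -> float:
    """Mean of the quiz scores after discarding the `dropped` lowest ones.

    Assumption: all quizzes count equally (not stated in the syllabus).
    """
    kept = sorted(quiz_scores)[dropped:]
    if not kept:
        raise ValueError("need more quizzes than the number discarded")
    return sum(kept) / len(kept)


def course_score(
    quiz_scores: Sequence[float],
    midterm1: float,
    midterm2: float,
    assignment: float,
    project: float,
) -> float:
    """Weighted total on a 0-100 scale (hypothetical helper)."""
    components = {
        "quizzes": quiz_average(quiz_scores),
        "midterm1": midterm1,
        "midterm2": midterm2,
        "assignment": assignment,
        "project": project,
    }
    return sum(WEIGHTS[name] * score for name, score in components.items())


# Made-up scores: the quizzes 60 and 70 are discarded, leaving an average of 90.
total = course_score([60, 70, 80, 90, 100], 80, 90, 100, 85)
print(round(total, 2))  # prints 87.25
```

The sketch leaves out the late-submission reduction, since the syllabus does not say how it is applied to a score.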
Course Description
Information Retrieval (IR) principles, including indexing and searching document collections, Web search, and advanced topics such as search in social networks.
The topics that will tentatively be presented are listed in the lecture schedule below.
Assignment
Project
Late submissions that arrive before the assignment or project has been graded will receive a 20% score reduction.
Tentative Lecture Schedule
Date | Topic | Book Chapters | Supplemental material for further reading
4/3 | Class Overview, Overview of Information Retrieval and Search Engines | Ch. 1, 2 |
4/5 | Ranking: Vector space model, Probabilistic model, Language model | Ch. 7.1, 7.2, 7.3 (except 7.3.2), slides Ch. 7 |
4/10 | No class, instructor at conference | |
4/12 | Ranking (cont'd) | |
4/17, 4/19 | Crawling, Storing | Ch. 3, slides Ch. 3 | (p1) Heydon, A. and Najork, M. 1999. Mercator: A scalable, extensible Web crawler. World Wide Web 2, 4 (Apr. 1999), 219-229. (slides)
4/24 | Review session 1 | slides |
4/26 | MIDTERM 1 | |
5/1, 5/3 | Indexing and Query Processing | Ch. 5 (except 5.4.2-5.4.7, 5.7.4-5.7.5), slides Ch. 5 | (p2) R. Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. J. Computer and System Sciences 66 (2003), pp. 614-656. Extended abstract appeared in Proc. 2001 ACM Symposium on Principles of Database Systems (PODS '01), pp. 102-113. (p6) Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. OSDI 2004.
5/8, 5/10 | Link Analysis | Ch. 4.5, slides: link-based search | (p4) L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1999. (p5) J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (1999).
5/15, 5/17 | Evaluation | | (p3) R. Fagin, Ravi Kumar, and D. Sivakumar. Comparing top-k lists. SIAM J. Discrete Mathematics 17, 1 (2003).
5/22 | Review session 2 | slides |
5/24 | MIDTERM 2 | |
5/31, 6/5 | Deep learning and IR | | Lin, Jimmy, Rodrigo Nogueira, and Andrew Yates. "Pretrained transformers for text ranking: BERT and beyond." Synthesis Lectures on Human Language Technologies 14, no. 4 (2021): 1-325.
6/7 | Social search | Ch. 10, slides Ch. 10 | (p10) David Carmel, Naama Zwerdling, Ido Guy, Shila Ofek-Koifman, Nadav Har'el, Inbal Ronen, Erel Uziel, Sivan Yogev, and Sergey Chernov. 2009. Personalized social search based on the user's social network. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09).
Interesting topics that will not be presented due to lack of time:
Text Processing | Ch. 4.1, 4.2, 4.3, slides Ch. 4 |
Q&A systems, Desktop Search | 1. (p11) Eric Brill, Susan Dumais, Michele Banko. An Analysis of the AskMSR Question-Answering System. EMNLP 2002. 2. (p12) S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin, and D. C. Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003. 3. QA slides |
Relational DB and XML Search | 1. IR and DB | (p13) Sara Cohen, Jonathan Mamou, Yaron Kanza, Yehoshua Sagiv: XSEarch: A Semantic Search Engine for XML. 45-56, VLDB 2004. (p14) L. Guo, F. Shao, C. Botev, J. Shanmugasundaram: XRANK: Ranked Keyword Search over XML Documents. SIGMOD 2003.
Web Search: Spam, topic-specific PageRank | 2. Alexandros Ntoulas, Marc Najork, Mark Manasse, and Dennis Fetterly. 2006. Detecting spam web pages through content analysis. In Proceedings of the 15th International Conference on World Wide Web (WWW '06). 3. Taher H. Haveliwala, "Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784-796, Jul/Aug 2003. |
Other Resources
Textbook
Free download at https://ciir.cs.umass.edu/irbook/
Search Engines: Information Retrieval in Practice
Bruce Croft, Donald Metzler, Trevor Strohman
Addison-Wesley; 1st edition (February 16, 2009)
ISBN-10: 0136072240
ISBN-13: 978-0136072249
http://www.search-engines-book.com/
Also recommended for reference:
Policies
Academic Integrity: https://conduct.ucr.edu/