CS 242: Information Retrieval & Web Search
Winter 2021
Announcements
General Info
Instructor: Vagelis Hristidis
Lecture time: T/Th 2-3:20 pm
Location: Zoom
Office hour: Wednesday 4-5 pm
TA: Luxun Xu, office hour: Monday 1-2 pm
Reader: Huayue Gu
Grading
25% participation and quizzes (the worst 2 quizzes will be discarded; MSOL students will have until the weekend to take each quiz, while others will take it during lecture time)
25% midterm
15% assignment
35% project
Course Description
Information Retrieval (IR) principles, including indexing and searching document collections, Web search, and advanced topics like search in social networks.
Some of the topics which will be tentatively presented are:
Assignment
Project
Late submissions that arrive before the assignment or project has been graded will receive a 20% score reduction.
Tentative Lecture Schedule
Date | Topic | Book Chapters | Supplemental material for further reading |
Jan 7 | Class Overview, Overview of Information Retrieval and Search Engines | Ch. 1, 2, slides Ch. 1, slides Ch. 2 (slightly more detailed version of the slides of Ch. 1) | |
Jan 12, 14 | | Ch. 7.1, 7.2, 7.3 (except 7.3.2) | |
Jan 19, 21 | Crawling, Storing | | (p1) Heydon, A. and Najork, M. 1999. Mercator: A scalable, extensible Web crawler. World Wide Web 2, 4 (Apr. 1999), 219-229. (slides) |
Jan 26, 28 | Indexing, MapReduce, Query Processing | Ch. 5 (except 5.4.2-5.4.7, 5.7.4-5.7.5), slides Ch. 5 | (p2) R. Fagin, Amnon Lotem and Moni Naor. Optimal aggregation algorithms for middleware. J. Computer and System Sciences 66 (2003), pp. 614-656. Extended abstract appeared in Proc. 2001 ACM Symposium on Principles of Database Systems (PODS '01), pp. 102-113.
(p6) Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. OSDI 2004 |
Feb 2 | Set up and use Hadoop and Lucene (by TA) | ||
Feb 4, 9 | Link Analysis | C slides: link-based search | (p4) L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1999
(p5) J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (1999). |
Feb 11 | Evaluation Part 1 | Ch. 8, slides Ch. 8 | (p3) R. Fagin, Ravi Kumar and D. Sivakumar. Comparing top-k lists. SIAM J. Discrete Mathematics 17, 1 (2003) |
Feb 16 | | ||
Feb 18 | MIDTERM on iLearn | ||
Feb 23 | Evaluation Part 2 | Ch. 8 (cont'd), slides Ch. 8 | |
Feb 25 | Text Processing | Ch. 4.1, 4.2, 4.3, slides Ch. 4 | |
Mar 2, 4 | Query Refinement, Results Presentation | Ch. 6.1, 6.2, 6.3, slides Ch. 6 | G. Salton, C. Buckley. Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 1990
Zamir, O. and Etzioni, O. 1998. Web document clustering: a feasibility demonstration. ACM SIGIR '98 |
Mar 9 | | | |
Mar 11 | Project Presentations | | |
Interesting topics that there is no time to present in class |
Relational DB and XML Search | 1. IR and DB | (p13) Sara Cohen, Jonathan Mamou, Yaron Kanza, Yehoshua Sagiv: XSEarch: A Semantic Search Engine for XML. VLDB 2004, pp. 45-56
(p14) L. Guo, F. Shao, C. Botev, J. Shanmugasundaram: XRANK: Ranked Keyword Search over XML Documents. SIGMOD 2003 |
Web Search: Spam, topic-specific PageRank | 2. Alexandros Ntoulas, Marc Najork, Mark Manasse, and Dennis Fetterly. 2006. Detecting spam web pages through content analysis. In Proceedings of the 15th International Conference on World Wide Web (WWW '06)
3. Taher H. Haveliwala, "Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784-796, Jul/Aug 2003. |
Other Resources
Textbook
Free download at https://ciir.cs.umass.edu/downloads/SEIRiP.pdf
Search Engines: Information Retrieval in Practice
Bruce Croft, Donald Metzler, Trevor Strohman
Addison Wesley; 1st edition
ISBN-10: 0136072240
ISBN-13: 978-0136072249
http://www.search-engines-book.com/
Also recommended for reference:
Policies
Academic Integrity: https://conduct.ucr.edu/