Assignment 2

UCR - CS 172 - Spring 2018

Instructions: Submit on iLearn by 5/30. This is an individual assignment.

 

Exercise A

1. Compute the first 3 iterations of PageRank scores (d=0.7) of each node in the graph below. Show your work.

2. Write a program (e.g., in Java) to compute the final scores of the nodes and the number of iterations needed to converge, if we use convergence constant epsilon=0.001.

3. If the damping factor is increased, does the score of the well-connected pages increase or decrease? What about the score of the isolated pages?

Graph for Exercise A: http://upload.wikimedia.org/wikipedia/commons/0/0c/Small_directed_graph.JPG
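
As a rough starting point for part 2, the iteration could be sketched as below (in Python rather than Java). This is only a sketch under several assumptions that the exercise does not fix: the normalized update PR(v) = (1-d)/N + d * sum over in-links u of PR(u)/outdegree(u), uniform initial scores of 1/N, rank from nodes without out-links spread evenly over all nodes, and convergence declared when the largest per-node change falls below epsilon. The graph `toy` is a hypothetical 3-node example, not the graph from the exercise.

```python
def pagerank(graph, d=0.7, eps=0.001, max_iter=1000):
    """Iterate PageRank on `graph`, a dict mapping node -> list of out-links.

    Returns (scores, iterations). Every link target must also be a key.
    """
    nodes = sorted(graph)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}  # assumed uniform start
    for iteration in range(1, max_iter + 1):
        # Assumption: rank held by nodes with no out-links is spread uniformly.
        dangling = sum(scores[v] for v in nodes if not graph[v])
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for v in nodes:
            for target in graph[v]:
                new[target] += d * scores[v] / len(graph[v])
        # Assumed convergence test: largest absolute change below eps.
        delta = max(abs(new[v] - scores[v]) for v in nodes)
        scores = new
        if delta < eps:
            return scores, iteration
    return scores, max_iter


# Hypothetical toy graph (NOT the graph from the exercise).
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores, iterations = pagerank(toy)
```

If the course uses the unnormalized update (1-d) + d * sum(...), or a different stopping rule such as the summed change, the constant term and the `delta` line would need to change accordingly.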

 

Exercise B

Show how MapReduce can be used to efficiently solve the following problem:

Given a collection C of input documents, output a new collection C' of documents, where each document D in C is concatenated with the anchor text of the hyperlinks pointing to D from other documents. For example, if D="hello world", D1 has a hyperlink to D with text "link 1" and D2 has a hyperlink to D with text "link 2", then the output document for D should be D'="hello world link 1 link 2".

Write pseudocode for map and reduce functions.

Full points will be given to efficient solutions.
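
To make the expected input and output concrete, here is a small in-memory simulation of one possible map/reduce pair, run on the example above. It is a sketch, not a reference solution: the document representation (an id, a text, and a list of `(target_id, anchor_text)` links), the helper `run`, and the texts of D1 and D2 are all hypothetical, and a real MapReduce framework would not guarantee the order in which values reach the reducer.

```python
from collections import defaultdict


def map_doc(doc_id, text, links):
    # Emit the document's own text, keyed by its own id.
    yield doc_id, ("body", text)
    # Emit each anchor text, keyed by the id of the document it points to.
    for target_id, anchor in links:
        yield target_id, ("anchor", anchor)


def reduce_doc(doc_id, values):
    # Put the body first, then append every anchor text received.
    body = [t for kind, t in values if kind == "body"]
    anchors = [t for kind, t in values if kind == "anchor"]
    return doc_id, " ".join(body + anchors)


def run(collection):
    # Hypothetical stand-in for the framework's shuffle step: group by key.
    groups = defaultdict(list)
    for doc_id, (text, links) in collection.items():
        for key, value in map_doc(doc_id, text, links):
            groups[key].append(value)
    return dict(reduce_doc(k, v) for k, v in groups.items())


collection = {
    "D": ("hello world", []),
    "D1": ("first page", [("D", "link 1")]),   # hypothetical text
    "D2": ("second page", [("D", "link 2")]),  # hypothetical text
}
output = run(collection)
```

Tagging each value as "body" or "anchor" lets a single pass over the collection suffice, since each document is read once and only short anchor strings are shuffled to other keys.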

Exercise C

Consider a query Q that has a total of 6 relevant results in the collection, and a search engine that returns the following ranked results:

x r r x x r x x r, where x is a relevant result and r is a non-relevant result.

1. Compute Precision-at-5, Recall-at-5, F1-at-5, Average Precision, and DCG-at-5 (assuming relevant results have score 1 and non-relevant 0).

2. Mention an application where higher precision is more important than higher recall, and one where the opposite holds.
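
The measures in part 1 could be computed along the following lines. This is a sketch under assumptions the exercise leaves open: DCG uses the discount 1/log2(rank + 1) (some texts use rel_1 + sum of rel_i/log2(i) instead), and Average Precision divides by the total number of relevant results in the collection, so relevant results that are never returned contribute zero. The ranking `ranking` and the count `total_relevant` are a hypothetical example, not the one from the exercise.

```python
import math


def precision_at_k(rels, k):
    # Fraction of the top-k results that are relevant.
    return sum(rels[:k]) / k


def recall_at_k(rels, k, total_relevant):
    # Fraction of all relevant results that appear in the top k.
    return sum(rels[:k]) / total_relevant


def f1_at_k(rels, k, total_relevant):
    p = precision_at_k(rels, k)
    r = recall_at_k(rels, k, total_relevant)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def average_precision(rels, total_relevant):
    # Sum precision at each rank holding a relevant result, then
    # divide by the total number of relevant results (assumption).
    hits = [precision_at_k(rels, i + 1) for i, rel in enumerate(rels) if rel]
    return sum(hits) / total_relevant


def dcg_at_k(rels, k):
    # Assumed discount: 1 / log2(rank + 1), with ranks starting at 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))


# Hypothetical ranking: 1 = relevant, 0 = non-relevant.
ranking = [1, 0, 1, 1, 0]
total_relevant = 4
p5 = precision_at_k(ranking, 5)
r5 = recall_at_k(ranking, 5, total_relevant)
f5 = f1_at_k(ranking, 5, total_relevant)
ap = average_precision(ranking, total_relevant)
dcg5 = dcg_at_k(ranking, 5)
```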