Manipulating the Example Network

Stephen Collins, Kathleen McLane, Andrew Bender, Julian Easley, Hannah Andrada

The first task was to verify the given dominant eigenvector of \(G\) using the example adjacency matrix. A simple function (rank.m) was written that constructed \(G\) from \(A\) and then used MATLAB's built-in eig function to compute \(p\).
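The exact contents of rank.m are not reproduced here; a minimal sketch of the construction, assuming the convention \(A_{i,j} = 1\) when page \(i\) links to page \(j\) (the convention implied by the later change to \(A_{2,7}\) and \(A_{12,7}\)), might look like:

```matlab
% rank.m (sketch) -- build the Google matrix G from adjacency matrix A with
% jump probability q, then return its normalized dominant eigenvector.
% (Note: naming the function rank shadows MATLAB's built-in rank.)
function p = rank(A, q)
    n = size(A, 1);
    outdeg = sum(A, 2);                      % (weighted) out-degree of each page
    S = diag(1 ./ outdeg) * A;               % row-stochastic link-following matrix
    G = q / n * ones(n) + (1 - q) * S';      % column-stochastic Google matrix
    [V, D] = eig(G);                         % dominant eigenvalue of G is 1
    [~, k] = max(abs(diag(D)));
    p = abs(V(:, k)) / sum(abs(V(:, k)));    % scale so the page ranks sum to 1
end
```

The entries of \(10^{-2} \times 1\) in the \(G\) shown below correspond to \(q/n = 0.15/15\), which identifies the jump probability used in the example as the customary \(q = 0.15\).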

Constructing G and p

\[\begin{align} G &\approx 10^{-2} \times \begin{bmatrix} 1 & 1 & 1 & 1 & 44 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 44 & 1 & 29 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 29 & 1 & 44 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 44 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 29 & 1 & 1 & 1 & 1 & 1 & 1 & 29 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 29 & 1 & 1 & 1 & 1 & 1 & 29 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 29 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 29 & 1 & 1 & 1 \\ 1 & 1 & 29 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 29 & 1 & 1 & 1 \\ 44 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 44 & 1 & 1 \\ 1 & 1 & 1 & 1 & 44 & 44 & 44 & 1 & 29 & 1 & 1 & 1 & 1 & 22 & 1 \\ 1 & 1 & 1 & 1 & 1 & 44 & 44 & 44 & 1 & 1 & 1 & 29 & 1 & 22 & 1 \\ 1 & 1 & 1 & 44 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 44 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 86 & 1 & 1 & 1 & 22 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 44 & 1 & 44 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 86 & 1 & 1 & 22 & 1 \end{bmatrix} & p &= \begin{bmatrix} 0.0268 \\ 0.0299 \\ 0.0299 \\ 0.0268 \\ 0.0396 \\ 0.0396 \\ 0.0396 \\ 0.0396 \\ 0.0746 \\ 0.1063 \\ 0.1063 \\ 0.0746 \\ 0.1251 \\ 0.1163 \\ 0.1251 \end{bmatrix} \end{align}\]

Having verified the given page rank vector, the jump probability \(q\) was changed to 0 and then to 0.5 to observe the effect on the page ranks.
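With the rank.m sketch above, the two runs would simply be:

```matlab
p_q00 = rank(A, 0.0);   % pure link-following: no random jumps
p_q05 = rank(A, 0.5);   % half of every step is a random jump
```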

Page ranks for q = 0.0 and q = 0.5

\[\begin{align} p_{q=0.0} &= \begin{bmatrix} 0.0154 \\ 0.0116 \\ 0.0116 \\ 0.0154 \\ 0.0309 \\ 0.0309 \\ 0.0309 \\ 0.0309 \\ 0.0811 \\ 0.1100 \\ 0.1100 \\ 0.0811 \\ 0.1467 \\ 0.1467 \\ 0.1467 \end{bmatrix} & p_{q=0.5} &= \begin{bmatrix} 0.0467 \\ 0.0540 \\ 0.0540 \\ 0.0467 \\ 0.0536 \\ 0.0536 \\ 0.0536 \\ 0.0536 \\ 0.0676 \\ 0.0946 \\ 0.0946 \\ 0.0676 \\ 0.0905 \\ 0.0786 \\ 0.0905 \end{bmatrix} \end{align}\]

With the jump probability set to 0.0, the probability of being on any given page is determined solely by the links feeding into it, both directly and through upstream nodes. Because a hypothetical web surfer must, in this case, follow links to reach a page, the difference in probabilities between high-traffic and low-traffic pages becomes more pronounced. When \(q\) = 0.5, however, the random jump begins to overwhelm the contribution from links and flattens the probability distribution. This is the jump probability's purpose: to dampen the contribution from links. It is described as a damping factor in The Anatomy of a Large-Scale Hypertextual Web Search Engine by Sergey Brin and Lawrence Page:

We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page.
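In the notation used here, where \(q\) plays the role of \(1 - d\) and \(G = \frac{q}{n}\mathbf{1}\mathbf{1}^T + (1-q)\,S^T\) as in the sketch above, the two competing contributions are visible directly in the eigenvector equation (using \(\mathbf{1}^T p = 1\)):

\[\begin{align} p = Gp = \frac{q}{n}\,\mathbf{1} + (1-q)\,S^T p \end{align}\]

At \(q = 0\) the ranks are determined entirely by the link structure, while as \(q \to 1\) the uniform term dominates and \(p\) approaches \(\tfrac{1}{n}\mathbf{1}\).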

To model an attempt to improve a page's rank by displaying links to it more prominently, the values of \(A_{2,7}\) and \(A_{12,7}\) were increased to 2, and \(p\) was recomputed to see whether node 7's rank improved. The original vector, labeled \(p_{original}\), is included for comparison.
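In terms of the sketch above, the change amounts to the following (the value \(q = 0.15\) is assumed from the example):

```matlab
A_boost = A;
A_boost([2 12], 7) = 2;          % pages 2 and 12 double the weight of their links to page 7
p_boost = rank(A_boost, 0.15);   % re-rank with the emphasized links
```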

Page 7 rank after increasing link values

\[\begin{align} p_{original} &= \begin{bmatrix} 0.0268 \\ 0.0299 \\ 0.0299 \\ 0.0268 \\ 0.0396 \\ 0.0396 \\ \textbf{0.0396} \\ 0.0396 \\ 0.0746 \\ 0.1063 \\ 0.1063 \\ 0.0746 \\ 0.1251 \\ 0.1163 \\ 0.1251 \end{bmatrix} & p &= \begin{bmatrix} 0.0260 \\ 0.0285 \\ 0.0262 \\ 0.0239 \\ 0.0376 \\ 0.0390 \\ \textbf{0.0528} \\ 0.0328 \\ 0.0762 \\ 0.1115 \\ 0.1033 \\ 0.0723 \\ 0.1297 \\ 0.1173 \\ 0.1227 \end{bmatrix} \end{align}\]

The strategy worked; node 7's rank improved. The ranking for node 10, which node 7 links to, also improved. However, nodes 2 and 12 sacrificed some of their own ranking by emphasizing their link to node 7, transferring more importance downstream. As a final step in analyzing the example network, node 10 was removed and the effect on page rankings recorded. Again, the initial vector is included, with a blank space left in the new vector where node 10 was, for easy comparison.
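Deleting a page corresponds to removing its row and column from \(A\) before rebuilding \(G\); a hypothetical version of that step, again assuming the rank.m sketch and \(q = 0.15\):

```matlab
A_cut = A;
A_cut(10, :) = [];           % drop page 10's outgoing links
A_cut(:, 10) = [];           % drop links pointing to page 10
p_cut = rank(A_cut, 0.15);   % re-rank the remaining 14-page network
```

(No remaining page links only to page 10, so no out-degree becomes zero and the normalization in the sketch remains valid.)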

Page ranks after eliminating Page 10

\[\begin{align} p_{original} &= \begin{bmatrix} 0.0268 \\ 0.0299 \\ 0.0299 \\ 0.0268 \\ 0.0396 \\ 0.0396 \\ 0.0396 \\ 0.0396 \\ 0.0746 \\ \textbf{0.1063} \\ 0.1063 \\ 0.0746 \\ 0.1251 \\ 0.1163 \\ 0.1251 \end{bmatrix} & p &= \begin{bmatrix} 0.0471 \\ 0.0409 \\ 0.0359 \\ 0.0321 \\ 0.0428 \\ 0.0414 \\ 0.0517 \\ 0.0502 \\ 0.0482 \\ \\ 0.1710 \\ 0.1036 \\ 0.0412 \\ 0.1075 \\ 0.1865 \end{bmatrix} \end{align}\]

As would be expected, the ranking for node 13, which benefited significantly from traffic through node 10, dropped precipitously, and this was indicative of the overall trend. Nodes 10 and 11 are each positioned as traffic funnels within the network. Removing node 10 increased the proportional share of traffic funneled through node 11. Because of this, nodes immediately downstream of 11 saw a significant increase in page rank, while nodes formerly downstream of 10 lost page rank. Other nodes saw a marginal increase in page rank, if for no other reason than the per-page jump probability \(q/n\) increasing as the number of pages fell from 15 to 14.

\[\begin{align} \text{Increase} &: \{p_1, p_2, p_3, p_4, p_5, p_6, p_7, p_8, p_{11}, p_{12}, p_{15}\} \\ \text{Decrease} &: \{p_9, p_{13}, p_{14}\} \end{align}\]

The remainder of the project involves repeating these steps with our own networks.