PageRank is a link analysis algorithm which assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is also called the PageRank of E and denoted by PR(E).
PageRank was developed at Stanford University by Larry Page (hence the name Page-Rank[1]) and Sergey Brin as part of a research project about a new kind of search engine. The project started in 1995 and led to a functional prototype, named Google, in 1998. Shortly after, Page and Brin founded Google Inc., the company behind the Google search engine. While just one of many factors which determine the ranking of Google search results, PageRank continues to provide the basis for all of Google's web search tools.[2]
The name PageRank is a trademark of Google. The PageRank process has been patented (U.S. Patent 6,285,999). The patent is not assigned to Google but to Stanford University.
PageRank (II): Mathematics
Linear system of equations
Calculating the PageRank is nothing other than solving the following linear system of equations:
M * PR = ( 1 - d )

where 0 < d < 1 is the damping factor and M = 1 - d T. Here T stands for the transition matrix and 1 for the identity matrix; on the right-hand side, ( 1 - d ) denotes the vector whose components all equal 1 - d. The components of T are given by the number of outgoing links:

Tij = 1 / Cj   (if page j is linking to page i)
Tij = 0        (otherwise)

Cj is the number of links on page j. The solution of the linear system of equations is
PR = M^{-1} * ( 1 - d )

The calculation of the inverse matrix M^{-1} can (in principle) be done analytically.
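As a minimal sketch (assuming Python with NumPy and a small hypothetical four-page link structure; neither appears in the original text), setting up T and M and solving the system directly looks like this:

import numpy as np

d = 0.85                          # damping factor, 0 < d < 1
# hypothetical link structure: links[j] lists the pages that page j links to
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
N = len(links)

# transition matrix: Tij = 1/Cj if page j links to page i, 0 otherwise
T = np.zeros((N, N))
for j, targets in links.items():
    for i in targets:
        T[i, j] = 1.0 / len(targets)

M = np.eye(N) - d * T             # M = 1 - d T

# solve M * PR = (1 - d) directly rather than forming M^{-1}
PR = np.linalg.solve(M, (1 - d) * np.ones(N))
print(PR)

Using np.linalg.solve avoids forming the inverse explicitly, which is both cheaper and numerically safer than computing M^{-1} and multiplying.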
For 0 < d < 1 one can also use the Jacobi iteration (see http://pagerank.suchmaschinen-doktor.de/matrix-inversion.html):

PR{k+1} = ( 1 - d ) + d T * PR{k}

PR{k} denotes the value of the PageRank vector after the k-th iteration. PR{0} = 1 can be taken as the initial PageRank vector. Of course, the solution is independent of the initial guess. Convergence is guaranteed, since the largest eigenvalue of d T is smaller than 1. The damping factor specifies the amount of PageRank that is transferred: in the case d = 1 the whole PageRank is passed on, while for d < 1 only part of it is transferred.
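A sketch of this Jacobi iteration, under the same NumPy assumptions and with the matrix T from the example above:

import numpy as np

def jacobi_pagerank(T, d=0.85, tol=1e-10, max_iter=1000):
    # iterate PR{k+1} = (1 - d) + d T * PR{k}, starting from PR{0} = 1
    N = T.shape[0]
    PR = np.ones(N)
    for _ in range(max_iter):
        PR_next = (1 - d) + d * (T @ PR)
        if np.max(np.abs(PR_next - PR)) < tol:
            return PR_next
        PR = PR_next
    return PR

Since the largest eigenvalue of d T is below 1, the error shrinks by roughly a factor of d per step, so the number of iterations needed depends on d and the tolerance rather than on the size of the web graph.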
Rewriting the last equation for the components and omitting the iteration index yields

PRi = ( 1 - d ) + d ( PRX1 / CX1 + ... + PRXn / CXn )

where X1, ..., Xn are the pages linking to page i. Often this equation is referred to as the PageRank algorithm. Strictly speaking, however, it is an iteration scheme for solving the matrix inversion of the equation at the top. In any case, the Jacobi iteration is not the fastest way of computation. In general the following iteration scheme (minimal residue)

PR{k+1} = PR{k} + R{k} * ( ∑ Ri Mij Rj ) / ( ∑ Min Rn Mij Rj )

with the residual updated as

R{k+1} = R{k} - M * R{k} * ( ∑ Ri Mij Rj ) / ( ∑ Min Rn Mij Rj )

converges faster. Here R{k} = ( 1 - d ) - M * PR{k} is the residual vector. There are several other iteration schemes (see for example A. S. Householder, "The Theory of Matrices in Numerical Analysis", New York, 1964) such as Gauss-Seidel, overrelaxation methods, conjugate gradient, preconditioning methods, multigrid and blocking techniques, or Chebyshev. However, some iteration schemes are restricted to Hermitian matrices.
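Under the same NumPy assumptions, a sketch of this minimal residue scheme (note that M = 1 - d T is not Hermitian, so this is offered as an illustration of the update rule rather than a scheme with guaranteed convergence for every linking structure):

import numpy as np

def minres_pagerank(M, b, tol=1e-10, max_iter=1000):
    # minimal residue iteration for M * PR = b
    PR = np.ones(M.shape[0])
    R = b - M @ PR                      # residual R{0}
    for _ in range(max_iter):
        if np.linalg.norm(R) < tol:
            break
        MR = M @ R
        alpha = (R @ MR) / (MR @ MR)    # ( sum Ri Mij Rj ) / ( sum Min Rn Mij Rj )
        PR = PR + alpha * R             # PR{k+1} = PR{k} + alpha * R{k}
        R = R - alpha * MR              # R{k+1} = R{k} - alpha * M * R{k}
    return PR

PR = minres_pagerank(M, (1 - d) * np.ones(N))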
In some cases the linear system of equations

M * PR = ( 1 - d ) / N

is considered instead of the first one given at the top, where N is the total number of pages. Obviously, the solution vector is the same apart from a normalisation factor of 1 / N.
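Continuing the first sketch (M, N, d and PR as defined there), the scaling can be checked in two lines:

PR_scaled = np.linalg.solve(M, (1 - d) * np.ones(N) / N)
print(np.allclose(PR_scaled, PR / N))   # True: same solution up to the factor 1/N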
The case d = 1 (no damping) has to be treated separately. It corresponds to determining the eigenvector of T with eigenvalue 1. The problem is degenerate eigenvalues. They appear for linking structures where not every page can be reached from every other page, as is the case for dead ends (pages without outgoing links) or closed structures (groups of pages that link only among themselves). In these cases the numerical solution given by the iteration scheme depends on the initial vector: it is a combination of eigenvectors for the eigenvalue 1. For d < 1, however, the solution is unique.
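A small illustration of this dependence on the initial vector, using a hypothetical graph of two separate two-page loops (so the eigenvalue 1 of T is twofold degenerate):

import numpy as np

# two closed loops, 0 <-> 1 and 2 <-> 3
T = np.zeros((4, 4))
T[1, 0] = T[0, 1] = 1.0
T[3, 2] = T[2, 3] = 1.0

def power_iteration(T, PR0, steps=1000):
    PR = PR0 / PR0.sum()
    for _ in range(steps):
        PR = T @ PR
    return PR

print(power_iteration(T, np.array([1.0, 1.0, 0.0, 0.0])))  # weight stays in the first loop
print(power_iteration(T, np.array([0.0, 0.0, 1.0, 1.0])))  # weight stays in the second loop

Both runs return valid eigenvectors for the eigenvalue 1, yet they differ, which is exactly the ambiguity described above.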
Random surfer model
For d < 1 the linear system can be interpreted by a random surfer model: a surfer follows one of the outgoing links of the current page with probability d and jumps to a randomly chosen page with probability 1 - d. The probability of finding the surfer on a given page then satisfies

M * PR = ( 1 - d ) / N

In the case of dead ends the probability vector is proportional to the PageRank vector.
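A Monte Carlo sketch of such a surfer (assuming the hypothetical links dictionary from the first example; the observed visit frequencies should approach PR / N):

import random

def random_surfer(links, d=0.85, steps=500_000):
    N = len(links)
    visits = [0] * N
    page = random.randrange(N)
    for _ in range(steps):
        visits[page] += 1
        out = links[page]
        if out and random.random() < d:
            page = random.choice(out)   # follow an outgoing link with probability d
        else:
            page = random.randrange(N)  # otherwise jump to a random page
    return [v / steps for v in visits]

print(random_surfer({0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}))

Dead ends are handled here by always jumping to a random page, which is one common convention; the original text does not specify the surfer's behaviour at a dead end.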