Random Walks
Imagine a person, Paul, standing on the corner of 43rd Street and Madison Avenue in Midtown Manhattan during a blackout (Figures 2.19 and 2.20). Paul is a few blocks away from his home on the corner of 47th Street and Madison Avenue. However, in the dark, he cannot see anything and wanders aimlessly from one street corner to another, going up and down Madison Avenue. At each step, he has a 50 percent chance of going up the avenue and a 50 percent chance of going down. His goal is to get home safely rather than fall in the open manhole located at the corner of 42nd street and Madison Avenue. We assume that the random walk will stop when one of two things happens: Paul either ends up in the manhole or he arrives home safely. It can be proven mathematically that for a given probability p, he eventually will end in one of those two places with a probability no larger than p.
Figure 2.19. Map of Midtown Manhattan.
Figure 2.20. Random walk with 6 steps (ϕ to N = 5). For simplicity, node labels starting at zero are used.
If he is unlucky, Paul’s very first step will take him to the manhole. So, the probability of this event (let us call it M) is at least 0.5. If he is lucky, Paul will move away from the manhole in the first step; however, if after that he makes two more steps toward the manhole, he will fall into it. The probability of this happening is 0.5 ∗ 0.5 ∗ 0.5 = 0.53 = 0.125. Obviously, there are many other sequences of steps that lead to the event M. As can be seen, the probability of the event H = 1 – M (i.e., arriving home safely) seems rather small. How can we compute it?
Clearly, the value of pH is a function of the starting point. In one extreme case, Paul would start at 47th Street. In that case, he is already safely at home, so the probability pH(47) = 1. In the other extreme case, he starts at 42nd Street and falls immediately into the hole, resulting in pH(42) = 0. How can we find the value ofPH(i) for an arbitrary street number i? This is an example of a random-walk model in which a walker takes random steps on the graph G, with the walk being modeled as a Markov process – that is, the decision on what edge to follow is solely based on the vertex where the walker is currently located. Under certain conditions, this model converges to a stationary distribution of probabilities, associated with vertices in the graph. Chapter 5 provides more details on random-walk algorithms, and specifically on the PageRank algorithm used in information retrieval.