Key Features of the Web Graph
- The web graph is a directed graph, meaning the edges have a direction.
- If page A links to page B, there is an edge from A to B, but not necessarily from B to A.
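The asymmetry of links can be made concrete with a small sketch. It assumes the graph is stored as a dictionary mapping each page to the set of pages it links to; the page names and the `has_edge` helper are illustrative, not taken from any particular library.

```python
# Hypothetical two-page web: A links to B, B links to nothing.
links = {
    "A": {"B"},
    "B": set(),
}

def has_edge(graph, src, dst):
    """Return True if page `src` contains a hyperlink to page `dst`."""
    return dst in graph.get(src, set())

print(has_edge(links, "A", "B"))  # True
print(has_edge(links, "B", "A"))  # False: the edge has a direction
```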
Bowtie Structure
The bowtie structure is a model that describes the overall shape of the web graph. It divides the web into several distinct regions:
- Strongly Connected Core (SCC): A large central component where every page can be reached from any other page via a series of hyperlinks.
- IN Component: Pages that can reach the SCC but cannot be reached from it.
- OUT Component: Pages that can be reached from the SCC but cannot reach it.
- Tendrils: Pages that hang off the IN or OUT components (reachable from IN, or able to reach OUT) but have no path to or from the SCC.
- Disconnected Components: Pages that are completely isolated from the main structure.
Consider a university website:
- The homepage is part of the SCC because it can reach, and be reached from, many other pages by following links.
- An old event page might be in the OUT component if it is linked from the homepage but has no chain of links leading back to the core.
- A personal blog that links to the university but cannot be reached from it would be in the IN component.
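The regions above can be computed with two breadth-first searches from any page known to be in the core: one following links forward, one following them backward. The following is a minimal sketch under that assumption. The page names and the functions `reachable`, `reverse`, and `bowtie` are hypothetical, and for simplicity any page that is weakly connected to the core but lies outside SCC, IN, and OUT is labeled a tendril.

```python
from collections import deque

def reachable(graph, start):
    """Return every page reachable from `start` by following edges (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Return a copy of `graph` with every edge flipped; every page gets a key."""
    rev = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, set()).add(src)
    return rev

def bowtie(graph, core_page):
    """Classify pages relative to the strongly connected core containing `core_page`."""
    rev = reverse(graph)
    # Ignore edge direction to find everything attached to the main structure.
    both = {n: set(graph.get(n, ())) | rev[n] for n in rev}
    forward = reachable(graph, core_page)   # SCC plus OUT
    backward = reachable(rev, core_page)    # SCC plus IN
    weak = reachable(both, core_page)
    scc = forward & backward
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "TENDRILS": weak - forward - backward,
        "DISCONNECTED": set(rev) - weak,
    }

# Hypothetical university site modeled on the example above.
web = {
    "home": {"departments", "old_event"},
    "departments": {"home"},
    "old_event": set(),
    "personal_blog": {"home", "blog_photos"},
    "blog_photos": set(),
    "isolated": set(),
}

regions = bowtie(web, "home")
for name, pages in regions.items():
    print(name, sorted(pages))
# SCC ['departments', 'home']
# IN ['personal_blog']
# OUT ['old_event']
# TENDRILS ['blog_photos']
# DISCONNECTED ['isolated']
```

Here `blog_photos` is a tendril: it is reachable from the IN page `personal_blog`, but no path connects it to the core in either direction.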
Strongly Connected Core (SCC)
The SCC is the heart of the web graph. It has several important properties:
- Mutual Reachability: Any page in the SCC can be reached from any other page within the SCC.
- High Connectivity: The SCC contains a large portion of the web's most important and frequently visited pages.
- Stability: The SCC is relatively stable over time, even as the web grows and changes.
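- Navigation: Because every page in the SCC can reach every other, users and crawlers that follow links can move freely within it.
- Search Relevance: A crawler that starts inside the SCC can reach all SCC and OUT pages by following links, and search engines may favor well-linked core pages, since link-based ranking tends to reward them.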
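Mutual reachability is what defines a strongly connected component, so the core can be found by computing all such components and taking the largest. The sketch below uses Kosaraju's algorithm, one standard way to do this (the text does not prescribe a method). The function name and the toy graph are illustrative assumptions.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm: return a list of components, each a set of pages."""
    nodes = set(graph) | {d for targets in graph.values() for d in targets}

    # Pass 1: iterative DFS on the original graph, recording finish order.
    order, visited = [], set()
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Build the reversed graph.
    rev = {n: set() for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            rev[dst].add(src)

    # Pass 2: sweep the reversed graph in decreasing finish order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = {root}, [root]
        while stack:
            node = stack.pop()
            for nxt in rev[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    component.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components

# Hypothetical graph: a, b, c form a cycle; d and e form a smaller one.
toy = {
    "a": {"b"},
    "b": {"c"},
    "c": {"a", "d"},
    "d": {"e"},
    "e": {"d"},
    "f": {"a"},
}

components = strongly_connected_components(toy)
core = max(components, key=len)
print(sorted(core))  # ['a', 'b', 'c']
```

The iterative depth-first search avoids Python's recursion limit, which matters for graphs much larger than this toy example.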
Diameter
- The diameter of the web graph is the longest shortest path between any two pages that are connected by a path.
- In simpler terms, it measures how many clicks it takes to get from one page to another in the worst case, always taking the shortest route.
- Small-World Phenomenon: Despite the vast size of the web, typical path lengths are surprisingly small. When a path between two pages exists, it usually takes only a modest number of clicks.
- Efficiency: Short typical paths make the web highly navigable, allowing users to find information quickly.
Published estimates of the average shortest path between connected pages fall at roughly 16-20 clicks. This figure is an average rather than a worst case: the true diameter is larger, and for many ordered pairs of pages no directed path exists at all (for example, no path leads from an OUT page back to an IN page).
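The difference between the diameter (a maximum) and the average shortest path can be seen on a toy graph. The sketch below runs a breadth-first search from every page and considers only ordered pairs that are joined by a directed path; the function names and the example graph are assumptions made for illustration, and this all-pairs approach would be far too slow for a real web crawl.

```python
from collections import deque

def distances_from(graph, start):
    """Return the shortest click count from `start` to every reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def path_statistics(graph):
    """Return (diameter, average shortest path) over pairs joined by a directed path."""
    nodes = set(graph) | {d for targets in graph.values() for d in targets}
    lengths = []
    for src in nodes:
        for dst, d in distances_from(graph, src).items():
            if dst != src:
                lengths.append(d)
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)

# Hypothetical graph: a four-page cycle plus one page "e" that links into it.
ring = {
    "a": {"b"},
    "b": {"c"},
    "c": {"d"},
    "d": {"a"},
    "e": {"a"},
}

diameter, average = path_statistics(ring)
print(diameter, average)  # 4 2.125
```

Note that no page can reach `e`, so pairs ending at `e` are excluded from both numbers, just as unreachable pairs are excluded from path-length estimates for the real web.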
Emergence of the Web Graph Structure
The structure of the web graph is not designed by a central authority. Instead, it emerges from the behavior of web users and creators:
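- Link Creation: Each author decides independently which pages to link to, and the bowtie structure is the aggregate result of these local choices.
- Popularity Bias: Popular pages tend to attract more links (a "rich get richer" effect), which concentrates links on already well-connected pages and reinforces the SCC and OUT components.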
- Content Evolution: As new pages are added and old ones are removed, the web graph evolves, but the overall structure remains consistent.
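Popularity bias can be illustrated with a toy growth model known as preferential attachment, in which each new page links to one existing page chosen with probability proportional to that page's current in-degree plus one. This model is an assumption introduced here for illustration, not a claim about how the real web grew, and because every link points to an older page it shows only the concentration of links, not the bowtie itself. The function name `grow_web` is hypothetical.

```python
import random

def grow_web(num_pages, seed=0):
    """Simulate link growth; return a list where entry p is page p's in-degree."""
    rng = random.Random(seed)
    in_degree = [0]  # page 0 starts with no inbound links
    # Page p appears in `tickets` (in_degree[p] + 1) times, so a uniform draw
    # picks a page with probability proportional to its in-degree plus one.
    tickets = [0]
    for new_page in range(1, num_pages):
        target = rng.choice(tickets)
        in_degree[target] += 1
        in_degree.append(0)
        tickets.extend([target, new_page])
    return in_degree

degrees = grow_web(2000)
mean_degree = sum(degrees) / len(degrees)
print("mean in-degree:", mean_degree)
print("largest in-degree:", max(degrees))
print("pages with no inbound links:", degrees.count(0))
```

In runs of this sketch a few early pages typically collect many links while a large share of pages receive none, which is the "rich get richer" pattern described above.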
Understanding the web graph helps in designing better search engines, improving web navigation, and analyzing the spread of information online.