Odds not hopeless for new Web sites

By Eric Smalley and Kimberly Patch, Technology Research News

There is at least a little room at the top, according to a team of NEC researchers who found that the structure of groups of related sites on the World Wide Web is different from that of the Web as a whole.

Recent research has shown that the overall distribution of links on the Web follows a power-law structure, meaning that a small number of large Web sites gain most new links, making them larger still. "This means that an extremely small number of Web pages have the vast majority of inlinks," said David Pennock, a research scientist at NEC Research Institute.

The NEC researchers found that the distribution of inbound links within specific Web communities isn't quite as highly concentrated, however. They used the finding to build a model that accounts for the differing structures of various segments of the Web versus the Web as a whole.

The model can be used as a tool to measure the degree of competitiveness in a network, said Pennock. "This may be important, for example, to e-commerce companies looking to enter a new market niche," he said. The model also applies to natural and social networks like metabolic groups of cells and actor collaborations, he said.

In a rich-get-richer network, nodes that already have many inbound links tend to receive more over time. "Mathematically, the probability that a node receives another inlink is proportional to the number of inlinks it already has," said Pennock. If a network grows via this preferential attachment process alone, "the rich nodes keep getting richer and the poor nodes can never catch up," he said.
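The rule Pennock describes can be sketched in a few lines of Python. This is only an illustration of pure preferential attachment, not the researchers' code; the function names `pick_preferential` and `grow` and the starting inlink counts are hypothetical.

```python
import random

def pick_preferential(inlinks, rng):
    """Pick a node index with probability proportional to its inlink count."""
    nodes = range(len(inlinks))
    return rng.choices(nodes, weights=inlinks, k=1)[0]

def grow(inlinks, new_links, seed=0):
    """Add new_links inbound links one at a time, each chosen preferentially."""
    rng = random.Random(seed)
    counts = list(inlinks)  # copy so the caller's list is not modified
    for _ in range(new_links):
        counts[pick_preferential(counts, rng)] += 1
    return counts

# Hypothetical starting point: one rich node, one node with no inlinks, one poor node.
print(grow([5, 0, 1], 1000))
```

Under this rule alone, the node that starts with no inbound links has zero probability of being chosen and never gains one, which is the "can never catch up" case in its most extreme form.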

Although it is well established that the rich-get-richer phenomenon applies to the Web as a whole, the research shows that the distribution of inbound links within Web communities is sometimes different from that of the entire Web. Pages within a Web community that have different numbers of inbound links can have the same probability of receiving a given new link. The researchers refer to this as uniform attachment, in contrast to the preferential attachment of the Web as a whole.

In a uniform-attachment community "more Web pages fare better than would be the case under a pure power-law distribution," said Pennock.

Web communities have a mix of uniform and preferential attachments. For some communities, the percentage of preferential attachment is high and link-poor nodes will have difficulty ever catching up, said Pennock. If the percentage of uniform attachment is high, however, "poor nodes can often -- with some luck -- get rich, too," he said.
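A rough Python sketch of such a mix follows. It assumes each new link is placed preferentially with some probability and uniformly otherwise; the parameter name `alpha`, the function names and the page and link counts are all hypothetical, and the sketch is not the formulation published by the researchers.

```python
import random

def pick_mixed(inlinks, alpha, rng):
    """With probability alpha pick preferentially, otherwise pick uniformly."""
    # Fall back to a uniform pick while no page has any inlinks yet.
    if sum(inlinks) > 0 and rng.random() < alpha:
        return rng.choices(range(len(inlinks)), weights=inlinks, k=1)[0]
    return rng.randrange(len(inlinks))

def simulate(num_pages, num_links, alpha, seed=0):
    """Distribute num_links inbound links over num_pages pages."""
    rng = random.Random(seed)
    inlinks = [0] * num_pages
    for _ in range(num_links):
        inlinks[pick_mixed(inlinks, alpha, rng)] += 1
    return inlinks

# Compare the best-linked page under mostly uniform vs. mostly preferential growth.
for alpha in (0.1, 0.9):
    counts = simulate(num_pages=50, num_links=2000, alpha=alpha)
    print(alpha, max(counts))
```

In this toy version, setting `alpha` to 1.0 sends every link after the first to a single page, while 0.0 gives every page the same chance at each new link. Values in between correspond to the mixed communities Pennock describes.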

Using the model, the researchers found that the community of e-commerce Web sites selling publications, a category dominated by Amazon.com, is highly competitive and is structured similarly to the Web as a whole. In contrast, the community of e-commerce Web sites selling professional photographers' services is much less competitive, meaning smaller sites have a better chance of gaining links in order to grow.

The research also provides insights into the vulnerability of networks to both accidental failures and malicious attacks, Pennock said. "With more accurate models of different network types, we can begin to understand which are more robust and which are more delicate and prone to disruption," he said.

The researchers' next steps are to measure competition within different Web communities and to apply their model to better understand network fault-tolerance and robustness, said Pennock. They also plan to incorporate some sense of Web page topics because Web pages about the same topic tend to link to one another, Pennock said.

Anyone who has access to a search engine and can program the researchers' model can now measure the degree of competitiveness within Web communities, said Pennock. Applications of the model in other areas, including network fault tolerance and mobile phone networks, "are probably about two years off," he said.

Pennock's research colleagues were Gary W. Flake, Steve Lawrence and Eric J. Glover of NEC Research Institute and C. Lee Giles of NEC Research Institute and Pennsylvania State University. They published the research in the April 16, 2002 issue of the Proceedings of the National Academy of Sciences. The research was funded by NEC Corporation.

Timeline:   Now, 2 years
Funding:   Corporate
TRN Categories:   Internet
Story Type:   News
Related Elements:  Technical paper, "Winners Don't Take All: Characterizing the Competition for Links on the Web," Proceedings of the National Academy of Sciences, April 16, 2002


April 17/24, 2002


© Copyright Technology Research News, LLC 2000-2006. All rights reserved.