Search scheme treads lightly

By Kimberly Patch, Technology Research News

In social circles, people usually know who would know who would know. In other words, we are good at knowing where to start asking around for information.

Researchers from Stanford University and Hewlett-Packard's Sand Hill labs are applying the concept of knowing whom to ask first in order to design more efficient search strategies for the Internet.

The work is based on recent research that shows that both social circles and the Internet are scale free, or power-law networks, which harbor a few nodes or people with many, many connections, and lots of nodes that have only one or two links.

The Stanford and Sand Hill researchers wrote a search algorithm that takes advantage of this structure in order to get answers to queries much more efficiently. "In a way, the strategy is already used. People naturally ask people who they consider well-connected to put them in touch with someone who knows about a particular topic," said Lada Adamic, one of the Stanford researchers and a consultant at Xerox Palo Alto Research Center (PARC).

The algorithm works in one of two ways:

In a network where each node keeps track of how well connected its neighbors and its neighbors' neighbors are, a node sends a query to the most well-connected node it knows of. If this node cannot answer the query, it sends it on to the most well-connected node it knows of. The process repeats until a node that can answer the query is found.

In a network where the nodes are unaware of how well connected their neighbors are, the query is passed on randomly. Though less efficient, the method scales reasonably well simply because large nodes have a lot of paths leading to them and so tend to attract queries.

The algorithm is useful in finding information in peer-to-peer networks distributed over the Internet, like Freenet and Gnutella. Nodes in peer-to-peer networks communicate with each other without using a central server to direct the traffic. Napster, for example, uses a central server.

The decentralized nature of peer-to-peer networks makes them less vulnerable to attack, because in order to shut down the network all of its member nodes must be shut down. The lack of a central server, however, makes them considerably less efficient in fielding requests for information.

This is because, lacking central direction, each node passes a request to its neighbors, which then pass the request to their neighbors. Eventually the request makes its way to the server harboring the information.

The trouble is, lots of nodes are involved in every request for information. "In the current Gnutella search algorithm, every node passes the query on to every one of its neighbors, which means that the number of nodes possessing the query grows exponentially with the number of steps," said Adamic. It takes an average of 4.3 steps to find 50 percent of the nodes in an 800 node peer-to-peer network, but in order to do this, 350 of the nodes will have handled the query, she said. "You're basically broadcasting your query to the whole network and this is a problem because as the network grows, the query traffic grows in proportion."

The researchers' algorithm, in contrast, takes almost twice as long, averaging eight steps to find 50 percent of the nodes in the same network, but because only one node handles the query at each step it is 40 times more efficient in terms of using network bandwidth, said Adamic.

The bandwidth efficiency of the researchers' algorithm addresses the key problem with these types of peer-to-peer networks: growing beyond 1,000 nodes.

This is currently a limit because the exponential bandwidth growth needed for broadcast querying causes the network to break up when the bandwidth requirements out run the network's slowest connections. "The network gets fragmented because the slow, or low bandwidth nodes such as the 56K modems can only hold something like 20 queries a second," said Adamic.

The researchers' algorithm scales well in power-law networks, requiring 12 steps to find 50 percent of the nodes in a 10,000-node network where nodes were aware of their neighbors' connections, said Adamic. When the algorithm did not target large hubs, but randomly chose a neighbor, the algorithm took an average of 40 steps for 1,000 nodes and 100 steps for 10,000 nodes, she said.

In contrast, in a non-power-law network whose nodes have roughly the same number of connections, the number of steps the algorithm takes scales exponentially, increasing from 100 to 1,000 as the number of nodes increases from 1,000 to 10,000.

The research is interesting, and potentially useful in the wide context of search engines on the Internet, said Alessandro Vespignani, a research scientist at the Abdus Salam International Centre for Theoretical Physics in Italy. "The paper shows for the first time that searching algorithms should be changed in order to take into account the complex connectivity nature of [power-law] networks, he said.

The researchers are currently working on using the algorithm in the field. "I think the first application of the search engine will be to help Gnuttella overcome its bandwidth barrier. Peer-to-peer networks are rapidly gaining in popularity, and the search methods are still being developed," Adamic said.

Adamic's research colleagues were Amit R. Puniyani of Stanford University, and Rajan M. Lukose and Bernardo A. Huberman of Hewlett-Packard's Sand Hill Labs. The research was funded by Stanford University and Xerox PARC.

Timeline:   Now
Funding:   Corporate, University
TRN Categories:   Internet; Information Retrieval
Story Type:   News
Related Elements:  Technical paper, "Search in Power-Law Networks," posted on CORR: http://xxx.lanl.gov/abs/cs.NI/0103016.




Advertisements:



June 6, 2001

Page One

Search scheme treads lightly

Bug-eye lenses set up desktop chipmaking

DNA parts make versatile nanotubes

Watermarks hide in plain text

Material bends sound waves


News:

Research News Roundup
Research Watch blog

Features:
View from the High Ground Q&A
How It Works

RSS Feeds:
News  | Blog  | Books 



Ad links:
Buy an ad link

Advertisements:







Ad links: Clear History

Buy an ad link

 
Home     Archive     Resources    Feeds     Offline Publications     Glossary
TRN Finder     Research Dir.    Events Dir.      Researchers     Bookshelf
   Contribute      Under Development     T-shirts etc.     Classifieds
Forum    Comments    Feedback     About TRN


© Copyright Technology Research News, LLC 2000-2006. All rights reserved.