Search
scheme treads lightly
By
Kimberly Patch,
Technology Research News
In social circles, people usually know
who would know who would know. In other words, we are good at knowing
where to start asking around for information.
Researchers from Stanford University and Hewlett-Packard's Sand Hill labs
are applying the concept of knowing whom to ask first in order to design
more efficient search strategies for the Internet.
The work is based on recent research that shows that both social circles
and the Internet are scale free, or power-law networks,
which harbor a few nodes or people with many, many connections, and lots
of nodes that have only one or two links.
The Stanford and Sand Hill researchers wrote a search algorithm that takes
advantage of this structure in order to get answers to queries much more
efficiently. "In a way, the strategy is already used. People naturally
ask people who they consider well-connected to put them in touch with
someone who knows about a particular topic," said Lada Adamic, one of
the Stanford researchers and a consultant at Xerox Palo Alto Research
Center (PARC).
The algorithm works
in one of two ways:
In a network where each node keeps track of how well connected its neighbors
and its neighbors' neighbors are, a node sends a query to the most well-connected
node it knows of. If this node cannot answer the query, it sends it on
to the most well-connected node it knows of. The process repeats until
a node that can answer the query is found.
In a network where the nodes are unaware of how well connected their neighbors
are, the query is passed on randomly. Though less efficient, the method
scales reasonably well simply because large nodes have a lot of paths
leading to them and so tend to attract queries.
The algorithm is useful in finding information in peer-to-peer networks
distributed over the Internet, like Freenet and Gnutella. Nodes in peer-to-peer
networks communicate with each other without using a central server to
direct the traffic. Napster, for example, uses a central server.
The decentralized nature of peer-to-peer networks makes them less vulnerable
to attack, because in order to shut down the network all of its member
nodes must be shut down. The lack of a central server, however, makes
them considerably less efficient in fielding requests for information.
This is because, lacking central direction, each node passes a request
to its neighbors, which then pass the request to their neighbors. Eventually
the request makes its way to the server harboring the information.
The trouble is, lots of nodes are involved in every request for information.
"In the current Gnutella search algorithm, every node passes the query
on to every one of its neighbors, which means that the number of nodes
possessing the query grows exponentially with the number of steps," said
Adamic. It takes an average of 4.3 steps to find 50 percent of the nodes
in an 800 node peer-to-peer network, but in order to do this, 350 of the
nodes will have handled the query, she said. "You're basically broadcasting
your query to the whole network and this is a problem because as the network
grows, the query traffic grows in proportion."
The researchers' algorithm, in contrast, takes almost twice as long, averaging
eight steps to find 50 percent of the nodes in the same network, but because
only one node handles the query at each step it is 40 times more efficient
in terms of using network bandwidth, said Adamic.
The bandwidth efficiency of the researchers' algorithm addresses the key
problem with these types of peer-to-peer networks: growing beyond 1,000
nodes.
This is currently a limit because the exponential bandwidth growth needed
for broadcast querying causes the network to break up when the bandwidth
requirements out run the network's slowest connections. "The network gets
fragmented because the slow, or low bandwidth nodes such as the 56K modems
can only hold something like 20 queries a second," said Adamic.
The researchers' algorithm scales well in power-law networks, requiring
12 steps to find 50 percent of the nodes in a 10,000-node network where
nodes were aware of their neighbors' connections, said Adamic. When the
algorithm did not target large hubs, but randomly chose a neighbor, the
algorithm took an average of 40 steps for 1,000 nodes and 100 steps for
10,000 nodes, she said.
In contrast, in a non-power-law network whose nodes have roughly the same
number of connections, the number of steps the algorithm takes scales
exponentially, increasing from 100 to 1,000 as the number of nodes increases
from 1,000 to 10,000.
The research is interesting, and potentially useful in the wide context
of search engines on the Internet, said Alessandro Vespignani, a research
scientist at the Abdus Salam International Centre for Theoretical Physics
in Italy. "The paper shows for the first time that searching algorithms
should be changed in order to take into account the complex connectivity
nature of [power-law] networks,” he said.
The researchers are currently working on using the algorithm in the field.
"I think the first application of the search engine will be to help Gnuttella
overcome its bandwidth barrier. Peer-to-peer networks are rapidly gaining
in popularity, and the search methods are still being developed," Adamic
said.
Adamic's research colleagues were Amit R. Puniyani of Stanford University,
and Rajan M. Lukose and Bernardo A. Huberman of Hewlett-Packard's Sand
Hill Labs. The research was funded by Stanford University and Xerox PARC.
Timeline: Now
Funding: Corporate, University
TRN Categories: Internet; Information Retrieval
Story Type: News
Related Elements: Technical paper, "Search in Power-Law
Networks," posted on CORR: http://xxx.lanl.gov/abs/cs.NI/0103016.
Advertisements:
|
June
6, 2001
Page
One
Search scheme treads
lightly
Bug-eye lenses
set up desktop chipmaking
DNA parts make
versatile nanotubes
Watermarks hide in plain
text
Material bends sound waves
News:
Research News Roundup
Research Watch blog
Features:
View from the High Ground Q&A
How It Works
RSS Feeds:
News | Blog
| Books
Ad links:
Buy an ad link
Advertisements:
|
|
|
|