Queries guide Web crawlers
By Kimberly Patch, Technology Research News
Only a small percentage of the Internet's
vast collection of information is indexed by search engines, which makes
it important to improve the way search engines find what they do index.
Researchers from Contraco Consulting and Software Ltd., T-Online
International and Siegen University in Germany have written an algorithm
that improves Internet search results by factoring in what people are
looking for. The researchers took their cue from the audience analysis
that drives format and programming changes in television.
The algorithm, dubbed Vox Populi, picks up trends by analyzing
patterns in people's Web searching behavior, then directs search engine
crawlers to more thoroughly index relevant sites, according to Andreas
Schaale, a partner at Contraco Consulting and Software. For instance,
"if we see that the amount of queries about soccer is growing before entering
the World Cup, this algorithm would give more resources for... soccer
sites," he said.
The algorithm analyzes the queries people use to ask for information
to find those that represent what the average user is searching for, then
sends these to the Web crawler component of an existing search system with
instructions to give the relevant domains more Web crawler resources. The algorithm
determines how much more attention each domain should gain. Web crawlers
travel around the Web making the raw indexes of Web pages that search
engines use.
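The article does not give the formula Vox Populi uses to weight domains, so the following is only a minimal sketch of the general idea, assuming a simple proportional rule: each topic's share of the crawl budget matches its share of recent queries. The function name, the topic labels, the example domains and the assumption that every query has already been mapped to a topic are all hypothetical.

```python
from collections import Counter


def allocate_crawl_budget(query_topics, topic_domains, total_budget):
    """Split a crawl budget across domains in proportion to query demand.

    query_topics: one topic label per observed user query (assumed to be
        produced by some upstream query classifier, not shown here).
    topic_domains: dict mapping a topic label to its relevant domains.
    total_budget: number of page fetches available in this crawl cycle.
    Returns a dict mapping each domain to its share of the budget.
    """
    # Count only queries whose topic has known domains.
    counts = Counter(t for t in query_topics if topic_domains.get(t))
    total = sum(counts.values())
    budget = {}
    if total == 0:
        return budget
    for topic, count in counts.items():
        domains = topic_domains[topic]
        # The topic's share is proportional to how often it was queried;
        # that share is split evenly among the topic's domains.
        per_domain = total_budget * count / total / len(domains)
        for domain in domains:
            budget[domain] = budget.get(domain, 0.0) + per_domain
    return budget


# Hypothetical example: soccer queries rise ahead of the World Cup.
example = allocate_crawl_budget(
    ["soccer", "soccer", "soccer", "weather"],
    {"soccer": ["soccer.example", "worldcup.example"],
     "weather": ["weather.example"]},
    total_budget=1000,
)
```

Under this rule, a topic that draws three quarters of the queries receives three quarters of the fetches. A production system would presumably also smooth counts over time and filter spam before allocating.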
Internet searching has gone through several changes in the past
decade. The first search engines, like AltaVista, ranked purely on relevancy.
Today the major search engines use static rank algorithms, which also
consider domain popularity. Google introduced this method in 1997.
Web crawlers have evolved as well. Focused crawlers index pages
related to specific topics, and adaptive crawlers reorder their lists
of uncrawled pages based on the relevancy of the pages they have crawled.
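As a rough illustration of how an adaptive crawler might reorder its list of uncrawled pages, the sketch below keeps the frontier in a priority queue keyed on the relevance score of the page that linked to each URL. The class name, the scoring scale and the choice of parent-page relevance as the priority are assumptions made for illustration and are not taken from any particular crawler.

```python
import heapq


class AdaptiveFrontier:
    """Frontier of uncrawled URLs ordered by the relevance of their source page."""

    def __init__(self):
        self._heap = []      # entries are (-relevance, insertion order, url)
        self._counter = 0    # tie-breaker so equal scores stay first-in, first-out
        self._seen = set()   # URLs already queued, to avoid duplicates

    def add_links(self, links, parent_relevance):
        """Queue links found on a crawled page that scored parent_relevance."""
        for url in links:
            if url in self._seen:
                continue
            self._seen.add(url)
            # heapq is a min-heap, so negate the score to pop the highest first.
            heapq.heappush(self._heap, (-parent_relevance, self._counter, url))
            self._counter += 1

    def next_url(self):
        """Return the most promising uncrawled URL, or None if the frontier is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Links discovered on highly relevant pages are crawled before links from weakly relevant ones, regardless of the order in which they were found.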
Vox Populi also takes into account the subjects the average user
is searching for. The algorithm "answers the question 'What are most of
the people searching for?'" said Schaale. Vox Populi does not replace
the existing ranking algorithms, which retrieve their results from an
index, he said.
The need for directing crawlers based on feedback from queries
is driven by economics; data storage and handling are a growing cost, said
Schaale. "A shop owner orders his products [depending on] what his customers
ask for," said Schaale. "Vox Populi does basically the same," he said.
This type of ranking is only necessary because search engines are not
nearly powerful enough to crawl all Internet content in real-time, he
said. The Google crawler, for instance, does its main crawl to update
its index of the Web about once a month.
The researchers' scheme also includes methods to suppress spam,
or unwanted content. Spam suppression is especially important in this
method because "in 'most wanted' topic areas like free downloads, adult
content, and shopping, the amount of spam is clearly above-average," said
Schaale.
The main challenge to making the method work is not related to
the algorithm, but the filtering, Schaale added. "The spammers and the
search engine optimizers... adapt fast to new methods of filtering. This
is a challenge for each search engine," he said.
The basic idea of improving searching by incorporating user context,
including queries, has a lot of potential and is an active research area,
said Filippo Menczer, an associate professor of informatics and computer
science at Indiana University. The researchers' idea of improving a search
engine by modifying its crawling and ranking algorithms to capture the
preferences inferred from user queries is interesting, but its mathematical
framework is incomplete, he said.
The researchers' algorithm can be used in combination with the
ranking methods used by search engines, according to Schaale. It could
be used in vertical information systems that search by subject and personalized
searches that take into account a user's topics of interest, he said.
The method could be ready within a year, said Schaale.
Schaale's research colleagues were Carsten Wulf-Mathies from T-Online
International AG in Germany and Sönke Lieberam-Schmidt from Siegen University
in Germany. The research was funded by Contraco Consulting and Software.
Timeline: > 1 year
Funding: Corporate
TRN Categories: Internet; Databases and Information Retrieval
Story Type: News
Related Elements: Technical paper, "A New Approach to Relevancy
in Internet Searching - the 'Vox Populi Algorithm'," posted in the Computing
Research Repository (CoRR) at arxiv.org/abs/cs.DS/0308039
October 22/29, 2003