Text software spots intruders
By Kimberly Patch, Technology Research News
The anti-virus programs in common use today rely on signature detection
schemes that can protect a machine only from viruses that have already
been identified and entered into the programs' virus databases.
Anomaly detection systems, however, sense when normal patterns of communications
change in order to stop new viruses -- or any other system intruders like
worms or unauthorized users -- in their tracks.
The trouble is, existing anomaly detection schemes all generate high error
rates -- they cry wolf so often that they are impractical. In order to
identify the real intrusions, system managers must spend time checking
out every possibility.
Researchers from the University of California at Davis have taken an unusual
tack in anomaly detection by adapting text classification techniques to
intrusion detection. Their initial results suggest that the technique
could produce an anomaly detection system with a reasonable error rate.
The idea to apply text classification to intrusion detection began with
a conversation about categorizing Web pages into clusters that share a
given property, said V. Rao Vemuri, a professor of applied science and
computer science at the University of California at Davis, and a scientist
at Lawrence Livermore National Laboratory.
Instead of categorizing Web pages, however, the researchers used the classification
system to categorize computer users into just two groups -- authorized
users and intruders. "The problem is to decide what 'text' to use for
the problem," Vemuri said. "We wanted some objective way of characterizing
a user that the user... cannot consciously influence" in order to prevent
an intruder from fooling the system, he said.
They turned to system calls to characterize a user. System calls are the
internal requests various pieces of software make to each other in the
course of carrying out a user's instructions. "The system calls are generated
by the computer, and the user cannot really influence the sequence in
which they are generated," said Vemuri. The scheme treats each system
call as a word and each sequence of system calls as a document, and classifies
each document as one generated during normal activity or intrusive activity,
he said.
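A minimal sketch of that representation, in Python, might simply count how
often each call appears in a trace; the call names and vector layout here
are illustrative, not the researchers' exact formulation:

    from collections import Counter

    def trace_to_vector(trace, vocabulary):
        """Treat a process's system-call trace as a 'document': count how
        often each known call (a 'word') appears and return the counts as
        a fixed-length vector."""
        counts = Counter(trace)
        return [counts.get(call, 0) for call in vocabulary]

    # Illustrative vocabulary and trace; real traces come from audit logs.
    vocabulary = ["open", "read", "write", "close", "execve", "fork"]
    trace = ["open", "read", "read", "write", "close"]
    print(trace_to_vector(trace, vocabulary))  # [1, 2, 1, 1, 0, 0]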
The nearest-neighbor text categorization technique the researchers adapted
was originally used to categorize Web pages based on how they are linked;
pages that are nearest neighbors in terms of links also tend to be close
in terms of content.
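In that spirit, a nearest-neighbor classifier compares a new trace vector
with labeled training vectors and adopts the label of the most similar
ones. The sketch below, which assumes cosine similarity and a simple
majority among the k closest neighbors, is one plausible reading rather
than the paper's exact algorithm:

    import math

    def cosine(a, b):
        """Cosine similarity between two equal-length count vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def classify(new_vec, labeled_vecs, k=3):
        """labeled_vecs: list of (vector, label) pairs, label being
        'normal' or 'intrusive'. Return the majority label among the
        k training traces most similar to new_vec."""
        ranked = sorted(labeled_vecs, key=lambda pair: cosine(new_vec, pair[0]),
                        reverse=True)
        top = [label for _, label in ranked[:k]]
        return max(set(top), key=top.count)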
The researchers' detection scheme characterizes an authorized user by
building a profile of activities. "For example, in the course of my normal
life, I use email, browse Web pages, use Word, PowerPoint [and] printers,"
said Vemuri. "Let's suppose that I rarely, if ever, use Java or C++. I
rarely use root privileges. If someone logging onto my machine uses these,
that departure from normal usage should signal... abnormal, [possibly]
intrusive activity," he said.
The problem turned out to be easier than categorizing Web pages, said
Vemuri. "Usually we use many categories. In our example, we have only
two categories -- authorized or intruder, and in the worst-case three"
if the system has to resort to classifying activity as unknown.
In addition, Web pages can be very long and the size of the English vocabulary
is around 50,000 words, which makes categorizing Web pages a computer-intensive
task. "In our case, the vocabulary -- distinct system calls -- rarely
exceeds 100, and the size of the 'pages', [or groups of calls], is also
very small," he said.
Short sequences of system calls have been used before to characterize
a person's normal behavior, but this requires building a database of normal
sequences of system calls for each program a person uses. The text categorization
technique, however, calculates the similarities between program activities,
which involves fewer calculations.
This allows the system to detect an intruder as the intruder is affecting
the system, said Vemuri. "The computational burden in our case is much
smaller, to the extent we started to dream about the possibility of detecting
an intruder in real-time," like the way contestants called out titles
as songs played on the TV show "Name That Tune", he said.
The researchers' current implementation is almost real-time, said Vemuri.
"We have to wait until [a] process, terminates, or halts" before completing
the classification, he said. Intrusive attacks, however, are usually conducted
within one or more sessions, and every session contains several processes,
said Vemuri. Because the classifier method monitors the execution of each
process, it's likely that an attack can be detected while it is happening,
he said. The researchers are also working on allowing the system to make
a classification before a process terminates, he added.
The researchers tested their scheme against 24 attacks carried out over a
two-week period. The method detected 22 of the 24 attacks and had a relatively low
false-positive rate of 31 false alarms out of 5,285 events, or 0.59 percent,
according to Vemuri.
The method shows promise, said Bennet Yee, an assistant professor of computer
science and engineering at the University of California at San Diego.
"The novelty is noticing that text classification techniques can be adapted
to intrusion detection, and doing the experiments that validate it," he
said.
If it proves practical and is widely deployed, the technique could help
prevent malicious software like the Internet worms Code Red and Klez,
Yee said. "It should be able to recognize new attacks as anomalous behavior
and raise alarms earlier [than] signature detection schemes where a database
of bad behavior must be compiled first," he said.
There is still work to do to determine if the method can be improved to
a low enough false positive rate, however, said Yee. A practical anomaly
detection system must have a very low false positive rate in order to
be commercially useful because if system administrators spend too much
time chasing down false alarms, they "will not want to use the system
and will turn the intrusion detector off," he said.
Even a false positive rate of 0.44 percent could mean 23 false alarms
per day if there are 5,285 events per day, Yee said. "Most people will
not want to handle a false alarm per hour per machine," he said. The researchers'
method is an improvement over earlier anomaly detector designs, but "further
improvements are still necessary for broader use," he added.
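Those figures follow from straightforward arithmetic over the reported
event counts; the snippet below simply reproduces it (small differences
are rounding):

    # Reported test result: 31 false alarms among 5,285 events.
    print(round(31 / 5285 * 100, 2))   # 0.59 (percent)

    # Yee's projection: a 0.44 percent rate over the same daily volume.
    print(round(0.0044 * 5285))        # 23 false alarms per day, about one per hour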
It is theoretically possible to use the method today, said Vemuri. The
researchers are working on proving that the method can be used without
raising too many false alarms, he said.
To cut down on false alarms, the researchers are looking to make a redundant
system "where we use different methods on different data sets, combine
the results of both those methods, or use a best of three voting system,"
he said. One method could use system call data, for instance, while another
could analyze instructions used, he said.
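A best-of-three vote of the kind Vemuri describes could look like the
following sketch, where each detector independently reports whether it
considers the activity intrusive; the detectors named in the comments are
stand-ins for whatever methods the group settles on:

    def majority_vote(verdicts):
        """verdicts: booleans, True meaning a detector flagged intrusion.
        Raise an alarm only when most detectors agree, which suppresses
        false positives that any single method produces on its own."""
        return sum(verdicts) > len(verdicts) / 2

    # Hypothetical verdicts: one detector watching system calls, one
    # watching instruction usage, one watching another audit stream.
    print(majority_vote([True, False, True]))   # True: raise the alarm
    print(majority_vote([True, False, False]))  # False: stay quiet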
The researchers hope to have their anomaly detection system worked out
and supported with performance data within a few years, said Vemuri.
Vemuri's research colleague was Yihua Liao. They published the research
in the Proceedings of the 11th Usenix Security Symposium, which was held
in San Francisco August 5 through 9, 2002. The research was funded by
the Air Force Office of Scientific Research.
Timeline: 2-3 years
Funding: Government
TRN Categories: Cryptography and Security; Computer Science;
Internet; Networking
Story Type: News
Related Elements: Technical paper, "Using Text Categorization
Techniques for Intrusion Detection," Proceedings of the 11th Usenix Security
Symposium, San Francisco August 5-9, 2002.