Cornell's Jon Kleinberg

December 5, 2005
Technology Research News Editor Eric Smalley carried out an email conversation with Jon Kleinberg, a Professor of Computer Science at Cornell University and a member of the Visiting Faculty Program at the IBM Almaden Research Center.

Kleinberg's research is about the interface of networks and information, and spans computer network analysis and routing, data mining, comparative genomics, and protein structure. His work on network analysis helped form the foundation for the current generation of Internet search engines, and his work on "small world" networks has advanced sociology as well as computer network design.

He has expanded on the concept of six degrees of separation with a method for assessing how readily particular networks allow people to make distant connections using only local information. He has proved that while certain architectures can be computationally efficient, no algorithm can find the shortest path in networks with short, random connections. And he has developed an algorithm for analyzing Web site interactions that distinguishes sites that contain definitive information from sites with many links. The algorithm provides a way of identifying communities of interest on the Web, which promises to improve Web searching.

Kleinberg is the recipient of the National Science Foundation Career Award, the Office of Naval Research Young Investigator Award, research fellowships from the MacArthur, Packard, and Sloan Foundations, teaching awards from the Cornell Engineering College and Computer Science Department, and the 2001 National Academy of Sciences Award for Initiatives in Research, given annually to one U.S. scientist under the age of 35.

Kleinberg received a bachelor's degree from Cornell University in 1993 and a doctorate from the Massachusetts Institute of Technology in 1996.


TRN: What got you interested in science and technology?

Kleinberg: The earliest and most fundamental influence was certainly my parents -- both of them scientists -- who, by example, conveyed the excitement of intellectual pursuits to my brother and me.

Like many people my age, my first chance to play with computer technology came with the Apple II in the early 1980s. But it was only in college, at Cornell, that I discovered computer science as a subject, and the way in which it combines scientific inquiry with the fun of playing with new technology.

And through high school, college, and graduate school, I've had a series of wonderful mentors. The extent to which your interests over the long term can be influenced by a single person, or a sequence of key people, is an amazing thing.

Of course, today people's first contact with computers comes much earlier. A colleague of mine told me that his son knew how to change the font size in Microsoft Word before he knew how to read. (He enjoyed typing really big letters.) And computer science has moved into earlier and earlier positions in the curriculum.

I hope that over time, it will really assume a full-fledged place in people's minds alongside all the more "classical" sciences that we learn from an early age -- it certainly touches on questions that are every bit as deep and fundamental.

TRN: What are the important or significant trends you see in science and technology research in general?

Kleinberg: We tend to think in terms of the "traditional" branches of science and engineering, but many of the most exciting areas in science and technology incorporate, in a really essential way, two or more of these fields.

To take some simple examples, consider the interface of biology with materials science (the latter being a hybrid field already), or computer science with psychology and the social sciences. This is an issue that many large organizations -- universities, companies, government funding agencies -- are wrestling with.

To the extent it's possible, it's important to avoid worrying too much about how to classify current research activities into standard disciplinary categories, and rather to draw on whatever expertise is needed in the pursuit of important research problems.

TRN: Tell me about the trends in network theory and analysis.

Kleinberg: Network theory is a subject that blends ideas from many areas. It's grown rapidly as a field over the past decade, driven by the realization that many phenomena in technology, in the physical and natural sciences, and in the social sciences can be expressed in the common framework of networks.

Computer scientists have used network analysis to help people search and navigate the Web; social scientists have had to deal with large-scale data on the networks of interactions that exist within companies, financial markets, and professional communities; and biologists have found that the networks of interactions within a cell's metabolism provide insight into basic biological processes.

The fundamental theme here is to approach networks phenomenologically -- rather than treating networks as things to be designed and built, we view them as organic entities that arise naturally in the physical world, in the virtual world, and in society.

Some of the initial excitement here was based on the fact that networks in very different domains seemed to exhibit similar patterns. In the past few years, people have been focusing on how to make more fine-grained distinctions among different types of networks, how to meaningfully talk about the "function" of a network, and how to determine whether a particular mathematical model is really effective at explaining observed network data.

There's also current interest in making the "time axis" more explicit in network studies, understanding the basic properties by which networks grow and evolve.

TRN: Explaining observed network data and understanding the basic properties by which networks grow and evolve -- why are these important?

Kleinberg: There are a number of fundamental problems that rich network datasets, captured across time, can help us to understand.

For example, if we look at companies that succeeded in the marketplace, what are the crucial features that distinguish their trajectories from those of comparable companies that failed? We could ask the same question about new ideas, innovations, or technologies -- by watching the full network, we can see how some flourish and spread through a large population, while others fail to take hold.

In doing so, we can try to understand the fundamental mechanisms at work. Across different spheres of human activity, is the rise of a person or idea to fame or prominence the result of a steady, gradual increase in visibility, or is it more the result of a few, discrete jumps -- a few events that bring them to the attention of a large audience? Does understanding this distinction teach us something about the kinds of ideas that attract widespread attention?

There are many other kinds of questions we can study by looking at information networks over time. We could look, for example, at public reaction to a government or large company, and ask how it changes over time -- again, does it tend to change as a gradual process, or abruptly, in response to specific events? An information network like the Web contains information about many aspects of this process -- the opinions of millions of people, together with ways in which a large organization engineers a "public persona" in these on-line environments.

These types of effects are clearly at work in everyday life, but having the ability to study them at a detailed level is a novel and exciting prospect.

TRN: Your Web site says that your "research is concerned with algorithms that exploit the combinatorial structure of networks and information." Please explain in lay terms what combinatorial means, how it applies to networks and information, and the types of algorithms you are working on.

Kleinberg: A fascinating theme in computer science, and really in the sciences more generally, is the dichotomy between discrete phenomena and continuous phenomena; and I use the word "combinatorial" essentially as a synonym for "discrete."

It's tricky to specify the precise boundary between discrete and continuous, but roughly I think of something as discrete if it consists of a number of indivisible units, and as continuous if it can be divided more or less as finely as you want.

This is best described by example. So for instance, we buy eggs in discrete quantities but gasoline in continuous quantities -- you can buy 6.72 gallons of gas (and even then it's a rounded-off figure), but it's hard to find a store that will sell you 6.72 eggs. Information in a computer is a fundamental example of this tension between discrete and continuous: it's transmitted in the form of continuous voltages and currents, but conceptually represented in discrete units such as bits -- sequences of 0s and 1s.

Of course, the reason this is such a deep issue is that the distinction between discrete and continuous is often more conceptual than real. For example, the 6.72 gallons of gasoline you bought consists, in reality, of a fixed number of discrete units, namely molecules. But that number is astronomically large, and for reasoning about how it behaves physically, it's often much more useful to think about it in continuous terms.

Finally, we get to networks. Networks are fundamentally discrete objects, since they consist of a collection of objects -- the nodes -- connected together by links. This discrete representation of them has been a major focus of study in mathematics, through the field of graph theory.

Highly efficient algorithms to analyze networks have also been developed, and in my own work I've made use of algorithms that find short paths, that identify densely connected regions in a network, and that divide a network into coherent clusters. And as networks like the Web become enormous in scale, we see the tension between discrete and continuous coming into focus here as well.

Will we reach a point where it's conceptually easier to think of a giant network as a kind of continuous object, the way we choose to view the vast number of molecules in a container of gasoline as a continuous fluid? Approaches adopted by physicists to the study of networks have followed something akin to this strategy, and it remains to be seen what the most effective models will be as we move forward in this area.

TRN: "algorithms that find short paths, that identify densely connected regions in a network, and that divide a network into coherent clusters." Why are these useful?

Kleinberg: Algorithms to identify densely connected regions in a network have been used, for example, to identify thematically coherent "communities" of Web pages -- the set of all pages on a particular focused topic tends to be more densely interconnected than an arbitrary collection of pages would be, and so one can often discover topics simply by looking for these dense patterns of linkage.

The idea that structure alone can provide clues about topics is striking, though it also accords with common sense: if you walk into a large room where a party is going on, and you see a small cluster of people talking animatedly, you know there's a potentially interesting topic being discussed even if you don't know what it's about.

Algorithms to divide a network into coherent clusters are closely related to this; in addition to identifying topics, such algorithms have the potential to identify dichotomies or divisions within these topics. An early finding of Web link analysis was that, to some extent, one could identify opposing viewpoints -- like sets of pro-life and pro-choice pages related to abortion policies -- simply from the fact that pages on the same side of an issue linked to one another more extensively than they linked to pages on the other side of the issue.
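
The linking pattern described above can be measured on a toy example. The following is a minimal sketch, assuming the two sides are already labeled; the page names, links, and the `link_counts` helper are all hypothetical, invented for illustration. Discovering the division from the links alone is the harder problem and is not attempted here.

```python
# Hypothetical toy data: page names and links are invented for illustration.
links = [
    ("a1", "a2"), ("a2", "a3"), ("a3", "a1"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b3", "b1"),
    ("a2", "b1"),
]
# Assumed labeling of which side of the issue each page is on.
side = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}


def link_counts(links, side):
    """Count links that stay within one side versus links that cross sides."""
    within = sum(1 for src, dst in links if side[src] == side[dst])
    cross = len(links) - within
    return within, cross


within, cross = link_counts(links, side)
print(f"within-side links: {within}, cross-side links: {cross}")
```

In this invented data most links stay on one side, which is the kind of imbalance a clustering algorithm would look for.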

Algorithms to find short paths are related to the small-world problem -- even if we know we are connected by "six degrees of separation", how, with limited information, can we find a short path connecting us? In addition to its role as a basic question in the structure of social networks, the problem of short paths arises naturally in the design of decentralized search methods, such as one sees in distributed peer-to-peer systems, where information must be found without a central index.
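
The idea of finding a short path with limited information can be illustrated with a small sketch. It assumes a simplified setting that is not taken from the interview: nodes sit on a ring, each node knows only its two neighbors plus at most one long-range shortcut, and it forwards a message to whichever contact is closest to the target. The function names and the shortcut table are hypothetical.

```python
def ring_distance(a, b, n):
    """Distance between positions a and b on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def greedy_route(source, target, n, shortcuts):
    """Forward a message using only local information.

    Each node knows its two ring neighbors plus its own shortcut (if any)
    and hands the message to whichever contact is closest to the target.
    """
    path = [source]
    current = source
    while current != target:
        contacts = [(current - 1) % n, (current + 1) % n]
        if current in shortcuts:
            contacts.append(shortcuts[current])
        # One ring neighbor is always strictly closer, so the loop terminates.
        current = min(contacts, key=lambda c: ring_distance(c, target, n))
        path.append(current)
    return path


# Hypothetical shortcuts on a 20-node ring: node 0 knows node 9, node 3 knows node 15.
with_shortcuts = greedy_route(0, 10, 20, {0: 9, 3: 15})
without_shortcuts = greedy_route(0, 10, 20, {})
print(len(with_shortcuts) - 1, "hops with shortcuts")
print(len(without_shortcuts) - 1, "hops without shortcuts")
```

No node in this sketch consults a central index or a map of the whole network; each decision uses only that node's own contacts, which is the constraint faced by decentralized search.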

TRN: You have studied the structure of the Web. What are the technological, social and economic implications of what you've learned?

Kleinberg: The Web is an enormous repository of information that exhibits rich structure through its hyperlinks.

One of the research issues that interests me the most is this type of interplay between structure and information: To really understand a piece of information on the Web, we need to understand how it is embedded in the overall network of relationships defined by hyperlinks -- how other people refer to it, and how it connects to other information. The current generation of Web search engines draws heavily on this principle, via link analysis: to assess the quality and relevance of information, one needs to think about its position in the network.
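
One way to make "position in the network" concrete is a hubs-and-authorities style iteration, in which pages pointed to by good link collections score as authorities and pages pointing to good authorities score as hubs. The following is a minimal sketch on invented data, not the implementation used by any particular search engine; the page names and the `hubs_and_authorities` function are hypothetical.

```python
def hubs_and_authorities(links, iterations=50):
    """Score pages as hubs and authorities from (source, target) links."""
    pages = sorted({page for link in links for page in link})
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # A page's authority is the total hub score of the pages linking to it.
        auth = {p: sum(hub[s] for s, t in links if t == p) for p in pages}
        # A page's hub score is the total authority of the pages it links to.
        hub = {p: sum(auth[t] for s, t in links if s == p) for p in pages}
        # Rescale so the scores stay bounded from one round to the next.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical toy Web: three link collections pointing at three reference pages.
links = [
    ("guide", "ref1"), ("guide", "ref2"), ("guide", "ref3"),
    ("list", "ref1"), ("list", "ref2"),
    ("blog", "ref1"),
]
hub, auth = hubs_and_authorities(links)
print("top authority:", max(auth, key=auth.get))
print("top hub:", max(hub, key=hub.get))
```

The scores depend only on who links to whom; the text of the pages is never examined.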

This is a kind of reasoning we regularly use in non-Web contexts as well: for example, to understand the roles of people or organizations, we think about how they are connected to other parts of society.

Such analogies between social and technological networks run quite deeply; indeed, such networks have become increasingly intertwined with one another over the past few years. It becomes very difficult to think about e-mail, instant messaging, blogging, or almost any on-line medium without thinking both about the technological systems that support them and the social networks -- people talking to people -- that they weave together.

Earlier, I mentioned that understanding time evolution is an important current theme in network research, and in fact it ties in closely with the kinds of social and economic issues here. If we look at long-term social trends as they are reflected in the Web, we see a complicated mixture of continuous and discrete effects: gradual changes over time, punctuated by sudden, discrete, transformative events. Consider, for example, the role of the U.S. in the world generally, as it evolved throughout the 1990s, and then in the days following September 11.

These kinds of effects are something where on-line information can serve as a kind of historical record: by looking at many snapshots of the Web over a period of years, we may be able to develop better models of how governmental policy, corporate decisions, public opinion, and other large-scale forms of behavior evolve both in continuous patterns and in reaction to these rare, discrete events.

Web information is particularly appealing for this type of activity because of its enormous breadth -- where one might typically study, for example, the decisions made by successful companies (because they're the ones still around to be studied), past snapshots of the Web contain the records of companies that succeeded as well as those that failed, providing the opportunity to understand how these different trajectories can be distinguished.

TRN: You have just been named a MacArthur Fellow, the so-called genius award. What does this mean for you and your work?

Kleinberg: The MacArthur Foundation Fellowship came as a big surprise to me, and when I consider the group of people who have received this in the past, I'm very honored to have been chosen.

The award itself provides a truly rare level of freedom and flexibility in pursuing different research directions, and I think this can be particularly valuable for the kind of work we're talking about here, which spans a number of different areas.

TRN: What are the important social questions related to today's cutting-edge technologies?

Kleinberg: Much of the importance of on-line information systems today is independent of the specific technologies that underpin them; rather, it derives from the ways in which they connect people to information, and the ways in which they connect people to other people.

Many of the related social questions stem from this point.

For example, the Internet is a powerful medium for spreading ideas of all kinds, but we know that some ideas spread much more broadly and effectively than others. What are the qualitative causes behind the success or failure of a new technological innovation, political message, or educational initiative? Are there general features that we can learn from when we set out to disseminate some new message?

Over the past 5-10 years, I've regularly been asked whether the Internet revolution that we witnessed through the 1990s was truly as fundamental and transformative as the corresponding technological revolution around the turn of the previous century -- a development that brought us the automobile, the airplane, the telephone, and urban electrification. And while, in the end, it's probably nonsensical to line up these two social transformations and decide which was more profound, there are some things we can still learn from the comparison.

In particular, one thing I think we can do is to reject the comparison as not quite apt: the rise of the Internet is much more usefully viewed as an extension of an earlier, equally profound development -- the intellectual revolution that brought about the rise of universities, of libraries, of writing in the vernacular.

Like that earlier revolution, the Internet and the Web force us to think about the social consequences of a world in which information is more plentiful and travels more widely than ever before, and in which anyone has the potential, through new kinds of media and at very little cost, to become an author with a global audience.

But there are a number of fundamental challenges here. We know that on-line discourse can be highly polarized; is it the case that the on-line tools we've created are contributing to a rising level of polarization in civic dialogue more generally? How might we accurately assess this phenomenon, and how might we think about designing new tools that make on-line discourse more productive?

There is also an educational opportunity here. Many of these kinds of questions are difficult to address because they require an understanding of the underlying technology, of the ways in which this technology is used by people, and of the social, economic, and legal systems in which the technology is embedded.

And one could argue that the difficulty in making progress on the problems here arises in large part from a lack of people who are knowledgeable about all sides of the issues -- the technological ones as well as the social and economic ones. At Cornell, we've worked over the past few years on creating a new undergraduate major that focuses on all these areas simultaneously, emphasizing both the technology and the contexts in which it is embedded.

TRN: What insights have come out of this?

Kleinberg: It's been striking to see how pervasive the ideas underlying computing really are -- as we design a curriculum in a way that can involve computer scientists, statisticians, cognitive psychologists, economists, linguists, and people from many other areas, we see how certain themes recur.

To take just a few examples: formalisms that can represent complicated systems by building them up in a sequence of layers; algorithms as step-by-step procedures for solving problems; the dichotomy between centralized and decentralized systems; and the emerging role of massive datasets together with techniques to extract structure from them.

TRN: In terms of technology and anything affected by technology, what will be different about our world in five years? In 10? In 50? What will have surprised us in 10 years, in 50?

Kleinberg: I'm not sure my track record in making predictions should inspire much confidence here, but let me mention one theme that will almost surely grow in significance: the use of information devices as kinds of intellectual prosthetics.

I don't intend any sci-fi connotations here -- I'm referring simply to the complex personal information streams we all manage, in which we're exposed to several megabytes a day of personal e-mail, instant messages, Web pages, search engine query results, on-line news feeds, mailing list chatter, and many other types of information.

The information we process every day will become increasingly varied, complicated, and voluminous; and since it's already at the limit of our cognitive abilities, something has to give: we will either develop tools that can manage this information for us more effectively, or we will develop new styles of dealing with it.

As a consequence of these developments, we are accumulating incredibly detailed datasets of human activity on-line -- the way people author content on the Web, the way they browse and read information, and the way they communicate with one another. This is inevitable; personal information streams create enormous archives, and significant aspects of people's lives are encoded in these archives.

We face two challenges here: how to make sense of data at this resolution, and how to ensure people's privacy in a world where this kind of data exists. But if we can overcome these challenges, we'll have much better insight into how to make on-line tools better conform to the ways in which people read, write, search, and communicate -- and more generally how to design on-line tools for a world in which people are increasingly dependent on them.

TRN: What's the most important piece of advice you can give to a child who shows interest in science and technology?

Kleinberg: A lot of the fun in doing science is in making up your own questions, and it's easy to make up questions that no one knows the answer to. There's raw material for posing scientific questions all around, and the important thing is to follow what really interests you.

TRN: What's the most important piece of advice you can give to a college student who shows interest in science and technology?

Kleinberg: Much has been written about the era of "Big Science" -- the way in which efforts like the Human Genome Project can involve thousands of researchers and require many millions of dollars of funding.

It's true that there are an increasing number of scientific projects on this scale. But you shouldn't get the idea that this is all that's left of science in the 21st century -- that pursuing a scientific career will inevitably mean taking part in a thousand-person operation.

In most areas of science and technology, the origins of new breakthroughs can still be found in the work of a small number of people -- or even a single person -- working at their own pace on their own questions, pursuing things that interest them. I certainly see this confirmed all the time in computer science, where it is so easy to get to the forefront of current research; but it is equally true of many disciplines.

And I know from my own experience that a lot of the most interesting work I've done came about when I decided to go back and think about a question from first principles, just to play around with it and see where it led.

TRN: What are your thoughts on the state of the general public's conventional wisdom on science and technology?

Kleinberg: When people become interested in a particular topic in science or technology, they often engage with it quite deeply. I think much of the challenge is to convey new developments in the sciences in such a way that people see a connection to their own lives.

The extraordinary adoption of Internet technology over the past ten years really drives home the more general point that people are remarkably good at adapting to the tools you give them. Search engines like Google, for example, are very powerful pieces of technology, but like all powerful tools, they become more useful as you understand better how to use them.

And it's striking how well the public has adapted to them; even if you think about people who otherwise have no experience with computers, they often have very accurate mental models of the kinds of queries on which you can expect Google to do well, and the kinds of queries on which it is likely to do poorly.

One can say the same thing about many other kinds of on-line tools as well; the power of these tools has been amplified by the ways in which even novice users have adapted to them.

TRN: How does this behavior bear on tool design?

Kleinberg: There's a fundamental circularity in the way on-line tools evolve: as people embrace certain features of a particular tool, the designers of that tool's future incarnations tend to emphasize and develop them. The way in which this meandering process interacts with good design principles is an interesting problem.

TRN: What could be done to improve the pursuit of science and technology research in terms of business trends, politics, and/or social trends?

Kleinberg: At the risk of repeating a topic that has been in the news a lot: It is critical to energize public support for the funding of scientific research. Investment in scientific research has had enormous payoffs over the long term, and science and technology are confronting challenges as large as any in history.

Part of the difficulty in promoting the idea of research funding is the nature of scientific research itself: even to those of us involved in doing it, it can seem so haphazard and inefficient. Most days when I'm working on a difficult problem, I wonder whether I'm getting anywhere at all.

Given this, I think it's crucial to help the public learn more about the process of doing science, and to emphasize the process and payoffs of research over a long time span. As one of my colleagues put it very nicely, research progress across a scientific community is completely unpredictable in the short term, but statistically a sure thing in the long term.

And it's worth remembering that repeatedly, over the years, fundamental breakthroughs have been made, billion-dollar industries have been created, and daily life has been changed by unplanned and unexpected outcomes of basic scientific investigations.

TRN: What are some of your favorite examples?

Kleinberg: Many of my favorite examples are among the most well-known: the Internet itself; electronic mail as an application that came to play such a central role in modern communication; the graphical Web browser; so many developments in graphics that moved from the domain of research into the film and computer game industries; Gerry Salton's vector space model for information retrieval that came to form the basis for modern search engines.

There is an interesting discussion of this general issue, emphasizing the path leading from fundamental research in computing to commercial developments, in the book Evolving the High Performance Computing and Communications Initiative to Support the Nation's Information Infrastructure (1995), the result of a National Research Council study chaired by Frederick Brooks and Ivan Sutherland.

TRN: What books that have some connection to science or technology have impressed you in some way, and why? What other readings do you recommend that would bring about more interest and/or a better understanding of science and technology?

Kleinberg: Close to my own area, we've seen a number of nice books in the past few years aimed at the general public. Six Degrees, by Duncan Watts, and The Search, by John Battelle, are two good examples; the first covers network theory and social networks, while the second covers the growth of the search industry and what it means for everyday life.

Both are engagingly written, and both succeed marvelously at conveying some of what I was discussing earlier -- the mix of excitement and frustration that comes from doing research in science and technology.

Duncan Watts's book is reminiscent of a very thought-provoking earlier book that I've been recommending to students for a number of years: Thomas Schelling's Micromotives and Macrobehavior. Through a sequence of compelling examples, he illustrates how clever insights and quantitative models can expose the ways in which large-scale social processes are often influenced by very localized mechanisms.

At a more general level, I'd also mention some "classic" collections of biographical essays about famous scientists of the past; for example, Paul de Kruif's Microbe Hunters and E.T. Bell's Men of Mathematics have remained inspirational reading long after they were first written.

TRN: Is there a particular image (or images) related to science or technology that you find particularly compelling or instructive? Why do you like it; why do you find it compelling or instructive?

Kleinberg: With the recent interest in complex networks has come a profusion of fascinating network images -- depictions of the networks that arise in various domains, laid out to show the nodes and their connections via links.

Some of these can be really compelling, like the first time you see a familiar city from above, in an airplane. At some level, you had sort of known that this is what it would look like; but the detail, the way it all fits together visually, can be striking.

Mark Newman has a nice collection of these images on his Web site at www-personal.umich.edu/~mejn/networks/. These range from illustrations of social networks, computer networks, and food webs, to the compelling and controversial "network art" of Mark Lombardi, which illustrates the interconnections among people and organizations in famous political scandals of the 1980s and 1990s.

TRN: What are your interests outside of work, and how do they inform how you understand and think about science and technology?

Kleinberg: I've always been fascinated by writing, and have enjoyed trying to do it since I was young.

As for how it connects to my interests in science and technology -- for starters, there's the basic point that anyone who works in these areas ultimately ends up doing a lot of writing, though not always the kind of writing I imagined when I was younger. But more fundamentally, I think everyone who engages in research as a profession thinks of it, implicitly, in terms of some underlying personal metaphor.

There's no "best" metaphor for thinking about the act of doing research; it really varies from one person to another. Some people clearly imagine their research as a process of building intricate devices and gadgets; others think of it as a series of travels to distant places; but I've always imagined doing research as a kind of story-telling activity.

The more compelling the research, the more interesting a story I have to tell -- and getting to make up the story and then tell it the way I want has always been much of the appeal.

TRN: Is there anything else you would like to say?

Kleinberg: Thanks very much for giving me the opportunity to do this. Helping the public understand the process and benefits of scientific research is a truly important undertaking, and many of us really appreciate all the efforts your publication makes in this direction.

© Copyright Technology Research News, LLC 2000-2006. All rights reserved.