Store globally, access locally

By Ted Smalley Bowen, Technology Research News

Keeping track of information is likely to become more difficult in the not-too-distant future, when innumerable computing devices of all sizes will churn out and navigate unfathomable volumes of data.

Researchers at the University of California at Berkeley are charting a data storage scheme that will cater to a world where computers are part of almost everything -- from smart shoes to smart buildings -- and are nearly always connected to a network.

Instead of keeping data only in a central place, such as a single hard drive, the OceanStore scheme will archive data on computers worldwide. This will make the data more readily available to people who use several computing devices to access the same information, and to groups of users who share information.

The scheme constantly updates data and saves it in numerous places, so that it can still be accessed if some of the computers holding it are lost, and it keeps the data safe from unauthorized access.

"The original motivating factor was the idea that Moore's Law growth in storage is really almost a liability, because it encourages huge pools of data to be [stored] on little tiny devices," said John Kubiatowicz, an assistant professor of computer science at Berkeley. "If you have a terabyte of storage in a little pen, or something -- which is not all that far off -- and you run over it with your SUV, you've just destroyed a terabyte of storage."

The key elements of OceanStore will be software that routes, organizes, and encrypts information on the Internet, data searching and recovery tools, and programming interfaces that will allow other programs to access the scheme.

To work on a global scale, the system will have to support more than 100 trillion files, according to Kubiatowicz.

OceanStore's routing software, or service protocol, will augment the Internet Protocol (IP), which controls the flow of traffic on the Internet. The service protocol will route information to data repository computers, or servers, around the Internet.

The data will be encrypted, tagged with global unique identifiers (GUIDs), split in pieces and dispersed among servers in different geographical areas.

The servers in this dispersed network will be treated as 'untrusted,' meaning they will be able to read the global unique identifier tags but not the underlying data. Some servers will also be able to interpret the protocols for maintaining data consistency, although they still will not be able to read the underlying data.
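The storage steps described above (encrypt, tag with a GUID, split, disperse to untrusted servers) can be sketched in Python. This is a minimal sketch under assumptions not taken from the article: each piece's GUID is the SHA-256 hash of that ciphertext piece, pieces are assigned round-robin to servers labeled by region, and the cipher is a toy SHA-256 keystream used only so the example runs on the standard library (a real system would use a vetted cipher such as AES). The names `archive`, `retrieve`, and `UntrustedServer` are hypothetical.

```python
import hashlib
import secrets

NONCE_BYTES = 16

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream, for illustration only (not vetted crypto)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_BYTES)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_BYTES], blob[NONCE_BYTES:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

class UntrustedServer:
    """Stores opaque pieces by GUID; it never holds the key, so it cannot read them."""
    def __init__(self, region: str):
        self.region = region
        self.store = {}

    def put(self, guid: str, piece: bytes) -> None:
        self.store[guid] = piece

    def get(self, guid: str) -> bytes:
        return self.store[guid]

def archive(key, data, servers, pieces):
    """Encrypt, split, tag each piece with a GUID, and spread the pieces across servers."""
    ciphertext = encrypt(key, data)
    size = -(-len(ciphertext) // pieces)  # ceiling division
    placements = []
    for i in range(pieces):
        piece = ciphertext[i * size:(i + 1) * size]
        guid = hashlib.sha256(piece).hexdigest()  # assumption: GUID = hash of the piece
        server = servers[i % len(servers)]        # assumption: round-robin over regions
        server.put(guid, piece)
        placements.append((guid, server))
    return placements

def retrieve(key, placements):
    """Only the key holder can reassemble and decrypt the pieces."""
    return decrypt(key, b"".join(server.get(guid) for guid, server in placements))
```

Because each server indexes its pieces only by GUID, it can answer requests for them without being able to read what it stores.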

This scheme assumes that users might access data from anywhere, and so will allow frequently accessed data to be cached, or temporarily stored in an easily accessible place, in order to speed up access. "You can decide to move data close to you that's important, and you can decide to move data you don't care about far away from you," said Kubiatowicz.
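A cache of the kind Kubiatowicz describes can be sketched as a small local store that sits in front of slower wide-area fetches. In this hypothetical `LocalCache`, least-recently-used eviction is an assumption (the article does not say how OceanStore decides what to keep nearby), `pin` stands in for a user's choice to keep important data close, and `fetch_remote` is any function that retrieves an object from a distant server.

```python
from collections import OrderedDict

class LocalCache:
    """Keeps recently used objects close to the user (hypothetical sketch)."""
    def __init__(self, capacity: int, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote  # callable: GUID -> bytes, from a distant server
        self.entries = OrderedDict()      # GUID -> data, least recently used first
        self.pinned = set()               # objects the user chose to keep nearby
        self.hits = self.misses = 0

    def get(self, guid: str) -> bytes:
        if guid in self.entries:
            self.hits += 1
            self.entries.move_to_end(guid)  # now the most recently used
            return self.entries[guid]
        self.misses += 1
        data = self.fetch_remote(guid)      # slow path over the wide-area network
        self.entries[guid] = data
        if len(self.entries) > self.capacity:
            # Assumption: evict the least recently used object that is not pinned.
            victim = next((g for g in self.entries if g not in self.pinned), None)
            if victim is not None:
                del self.entries[victim]
        return data

    def pin(self, guid: str) -> None:
        """Keep an important object nearby regardless of how recently it was used."""
        self.pinned.add(guid)
        self.get(guid)
```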

When users search for data, OceanStore will first use a probabilistic algorithm, which looks in the most likely place first. The algorithm will pass the request from one system to neighboring systems in a given vicinity. If the probabilistic search fails, a wider ranging hierarchical search will begin. Using the two types of search algorithms in this way will allow for faster access to cached data.
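The two-stage search can be sketched as follows. Everything here is an illustrative assumption rather than OceanStore's actual algorithm: the probabilistic stage gives each node a Bloom filter (a compact set summary that can return false positives but never false negatives) that is checked before the full lookup, the "vicinity" is limited to `max_hops` network hops, and the hierarchical stage walks down a tree of hypothetical `Directory` objects, each of which indexes the GUIDs stored beneath it.

```python
import hashlib
from collections import deque

class BloomFilter:
    """Compact set summary: may report false positives, never false negatives."""
    def __init__(self, bits: int = 256, hashes: int = 3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.array |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.array >> pos) & 1 for pos in self._positions(item))

class Node:
    """A storage server with some objects and links to nearby servers."""
    def __init__(self, name: str, objects=()):
        self.name = name
        self.objects = set(objects)
        self.neighbors = []
        self.summary = BloomFilter()
        for guid in self.objects:
            self.summary.add(guid)

class Directory:
    """Interior node of a hypothetical location tree; indexes every GUID stored below it."""
    def __init__(self, children):
        self.children = children
        self.index = set()
        for child in children:
            self.index |= child.objects if isinstance(child, Node) else child.index

def probabilistic_search(start: Node, guid: str, max_hops: int = 2):
    """Pass the request outward through neighbors within max_hops of the start."""
    seen, frontier = {start.name}, deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        # Check the cheap summary first; do the full lookup only on a "maybe".
        if node.summary.might_contain(guid) and guid in node.objects:
            return node
        if hops < max_hops:
            for neighbor in node.neighbors:
                if neighbor.name not in seen:
                    seen.add(neighbor.name)
                    frontier.append((neighbor, hops + 1))
    return None

def hierarchical_search(root: Directory, guid: str):
    """Wider fallback: descend toward the subtree whose index contains the GUID."""
    if guid not in root.index:
        return None
    current = root
    while isinstance(current, Directory):
        current = next(c for c in current.children
                       if guid in (c.objects if isinstance(c, Node) else c.index))
    return current

def locate(start: Node, root: Directory, guid: str, max_hops: int = 2):
    return probabilistic_search(start, guid, max_hops) or hierarchical_search(root, guid)
```

A nearby copy is found within a few hops; only when the local search comes up empty does the wider, slower hierarchical search run.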

Instead of relying on centralized control, individual systems will keep track of their available storage and computing resources and report that information to the system at large.

The local systems will also track the flow of data traffic and the interactions within it. They will tune these interactions automatically, adjusting the number and location of data objects in order, for example, to cluster related files.

"The infrastructure itself observes patterns of access that you make and may decide that whenever you access file A you typically access files B and C," said Kubiatowicz. "If you go to Europe... and you access file A, it can know to immediately start getting files B and C somewhere close to you. That's called clustering, and that's just one of many [possible] optimizations."

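The clustering behavior in the quote can be approximated by counting which files are opened shortly after one another. In this hypothetical `AccessClusterer`, two accesses count as related when they fall within `window` consecutive requests, and a file becomes a prefetch candidate once it has followed another at least `min_count` times; both thresholds are assumptions for illustration, not details from the article.

```python
from collections import Counter, defaultdict, deque

class AccessClusterer:
    """Learns which files are usually opened together (hypothetical helper)."""
    def __init__(self, window: int = 2, min_count: int = 2):
        self.min_count = min_count            # co-accesses needed before prefetching
        self.recent = deque(maxlen=window)    # the last `window` files opened
        self.follows = defaultdict(Counter)   # file -> Counter of files opened soon after it

    def record(self, name: str) -> None:
        for earlier in self.recent:
            if earlier != name:
                self.follows[earlier][name] += 1
        self.recent.append(name)

    def prefetch_for(self, name: str) -> list[str]:
        """Files worth copying to a nearby server as soon as `name` is opened."""
        related = self.follows.get(name, Counter())
        return sorted(f for f, count in related.items() if count >= self.min_count)
```

After a user opens A, B, and C in sequence three times, `prefetch_for("A")` returns `["B", "C"]`, which the infrastructure could then start copying to a server near wherever the user is working.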
Individual systems will communicate tracking information to a parent node, which will coordinate a hierarchical local arrangement of servers. If, for example, a single system cannot handle the volume of requests for a given set of data, it will communicate that to a parent node, which will trigger the creation of a replica of that data on another system.
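The parent-node behavior, together with the resource reporting each system performs, can be sketched as a small rebalancing loop. In this hypothetical model, each `StorageNode` advertises its capacity and per-object request load, and a `ParentNode` responds to an overloaded child by placing a replica of that child's busiest object on the sibling with the most spare capacity. The load units, the rule that a new replica absorbs half of the object's traffic, and all class and method names are assumptions, not details from the article.

```python
from dataclasses import dataclass, field

@dataclass
class StorageNode:
    """A server that tracks its own resources and reports them upward (hypothetical model)."""
    name: str
    capacity: float                                       # requests/second it can serve (assumed unit)
    load: dict[str, float] = field(default_factory=dict)  # object GUID -> requests/second

    def total_load(self) -> float:
        return sum(self.load.values())

    def spare(self) -> float:
        # The figure each node advertises to its parent.
        return self.capacity - self.total_load()

@dataclass
class ParentNode:
    """Coordinates a local group of servers and decides when to create replicas."""
    children: list[StorageNode]

    def rebalance(self) -> list[tuple[str, str, str]]:
        created = []
        for child in self.children:
            while child.total_load() > child.capacity:
                guid = max(child.load, key=child.load.get)  # busiest object on this child
                candidates = [c for c in self.children
                              if c is not child and guid not in c.load]
                if not candidates:
                    break  # every sibling already holds a replica of this object
                target = max(candidates, key=lambda c: c.spare())
                moved = child.load[guid] / 2  # assumption: the replica absorbs half the traffic
                child.load[guid] -= moved
                target.load[guid] = moved
                created.append((guid, child.name, target.name))
        return created
```

Keeping the decision at the parent means each server only needs to report a few numbers upward, rather than every server knowing the state of the whole network.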

If a number of servers fail, taking with them a percentage of a given set of data, OceanStore will use error correction codes similar to those used in computer memory to recover the original information.

Data that has been fragmented and distributed among multiple servers in this scheme can be recovered from as few as one quarter of the fragments, said Kubiatowicz.
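The recovery property Kubiatowicz describes is characteristic of erasure codes, in which data is encoded into `n` fragments so that any `k` of them suffice to rebuild it. The sketch below uses a Reed-Solomon-style construction over the prime field of size 257, with `k = 4` and `n = 16` so that one quarter of the fragments is enough. The article does not say which code OceanStore uses, so this choice and the function names `encode` and `decode` are illustrative assumptions.

```python
P = 257  # prime field size; every byte value 0..255 is a valid field element

def _lagrange_eval(points, x):
    """Evaluate, mod P, the unique polynomial of degree < len(points) through `points` at x."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # pow(..., P-2, P) = modular inverse
    return total

def encode(data: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Split data into n fragments (n < P), any k of which can rebuild it."""
    padded = data + bytes(-len(data) % k)  # pad with zeros to a multiple of k
    fragments = {x: [] for x in range(1, n + 1)}
    for off in range(0, len(padded), k):
        # Treat each stripe of k bytes as polynomial values at x = 1..k ...
        stripe = [(x, padded[off + x - 1]) for x in range(1, k + 1)]
        # ... and store the polynomial's value at every x = 1..n, one per fragment.
        for x in range(1, n + 1):
            fragments[x].append(_lagrange_eval(stripe, x))
    return fragments

def decode(fragments: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k surviving fragments."""
    if len(fragments) < k:
        raise ValueError("need at least k fragments")
    chosen = list(fragments.items())[:k]
    out = bytearray()
    for s in range(len(chosen[0][1])):
        points = [(x, frag[s]) for x, frag in chosen]
        out.extend(_lagrange_eval(points, x) for x in range(1, k + 1))
    return bytes(out[:length])
```

With these parameters every object occupies four times its original space, which is the price of surviving the loss of any 12 of the 16 servers holding its fragments.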

Through application programming interfaces (APIs), software developers will be able to write new programs that use the scheme and to modify older programs so they gain limited access to it.

The researchers plan to demonstrate the scheme by the end of the year, according to Kubiatowicz. This proof-of-concept version will be written in the Java programming language, and will include UNIX system interfaces and a read-only proxy for the World Wide Web.

There are "many public companies and start-ups that are focusing on developing solutions for providing storage over the Internet," said Jehoshua Bruck, professor of computation and neural systems and electrical engineering at the California Institute of Technology.

OceanStore is an interesting scheme, he said. "It is an integration of a number of existing concepts into a system. [But] I think the [researchers] underestimate the complexity of building the distributed computing/network side."

Given the scope of OceanStore, running it would require a business arrangement similar to a cooperative utility or a cell phone network partnership, according to Kubiatowicz. A limited version of the scheme could be adopted within three years, but a full implementation would take five to ten years, he said.

Kubiatowicz's research colleagues were David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao.

The scheme is described in "OceanStore: An Architecture for Global-Scale Persistent Storage," published in Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000) in Cambridge, Massachusetts, November 2000.

The research was funded by the Defense Advanced Research Projects Agency (DARPA), the National Science Foundation (NSF), EMC, IBM, and Nortel.

Timeline:   3 years, 5-10 years
Funding:  Government, Corporate
TRN Categories:  Internet; Data Storage Technology
Story Type:   News
Related Elements:  Technical paper, "OceanStore: An Architecture for Global-Scale Persistent Storage," Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), Cambridge, Massachusetts, November 2000


January 31, 2001

© Copyright Technology Research News, LLC 2000-2006. All rights reserved.