The OODT Data Grid Framework is an interoperable, extensible,
and standards-based set of software components that provide
data grid features for the automatic discovery and retrieval
of resources. The framework provides the following features:
- It makes heterogeneous data appear uniform to clients.
- It provides transparent access to distributed resources.
- It provides data discovery and query optimization.
The OODT Data Grid Framework is currently deployed
operationally in planetary science and biomedical domains.
Because it provides an architectural model for both data and
infrastructure, it is easy to set up for a variety of
information representation and resource access needs.
The following components currently comprise the OODT Data
Grid Framework:
- Product Service
The Product Service provides access to data products. Products
can be scientific datasets, images, documents, or anything with an
electronic representation. The Product Service accepts standard
query expressions (see the Query Expression component) and returns
zero or more matching products. In addition, the Product Service
can transform products from proprietary formats into Internet-standard
formats, or apply other transformations, all without
impacting local stores or operations.
- Profile Service
The Profile Service describes and locates resources using metadata
descriptions. These descriptions, called profiles, record a
resource's inception, composition, and location using a mix of
Dublin Core and ISO-11179 metadata, with URIs for locations.
The Profile Service catalogs these descriptions and supports
creating, updating, and querying them.
- Query Service
The Query Service is the client gateway into the grid of profile
(metadata) and product (data) nodes. The Query Service accepts
metadata (resource location) and data (resource retrieval) queries
and routes them to the appropriate nodes for resolution. For profile
queries, it also crawls a directed graph of profile nodes, in which
profiles can describe other profile servers, collecting and collating
the results.
- Web Grid
The OODT grid services (product and profile services) use CORBA or
RMI as their underlying network transport. However, limitations
of CORBA and RMI cause serious problems in deployment. First,
both are procedural mechanisms, providing a remote interface that
resembles a method call. This makes streaming data from a
service impossible, because there are limits on the size of the
data structures that can be passed in a remote method call.
Instead, repeated calls must be made to retrieve each block of a
product, making transfers far slower than HTTP or FTP.
(Block-based retrieval of profiles was never implemented, so large
profile results cause out-of-memory conditions, which is a separate
problem.) Second, both CORBA and RMI rely on a central name
registry. The registry makes an object independent of its network
location, enabling a client to call it by name (looking up its last
known location in the registry). However, this requires that server
objects be able to make outbound network calls to the registry
(through any outbound firewall), and that the registry accept those
registrations (through any inbound firewall). This required
administrative action at institutions hosting server objects and at
the institution hosting the registry. Often, these firewall
exceptions would change without notice as system administrators
changed at each location (apparently firewall exceptions are poorly
documented everywhere).
Further, in the two major deployments of OODT (PDS and EDRN),
server objects have almost never moved, nullifying any benefit of
the registry. This project, OODT Web Grid Services, avoids the
problems of CORBA and RMI by using HTTP as the transport mechanism
for products and profiles. It also provides a
password-protected mechanism for adding new sets of product and
profile query handlers, enabling seamless activation of additional
capabilities. In short, it doesn't suck.
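The Product Service's query-then-transform flow can be illustrated with a minimal Java sketch. It assumes a hypothetical `ProductHandler` interface, a `Product` record, and a made-up `application/x-vendor-table` format; none of these are the actual OODT API, and the "query expression" here is only a toy ID-prefix match rather than a real OODT query expression. Each match is converted by a transformer chosen by MIME type, and the stored products are left untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: these types are illustrative, not the OODT API.
class ProductServiceSketch {
    // A product: an identifier, a MIME type, and its raw bytes.
    record Product(String id, String mimeType, byte[] data) {}

    // Hypothetical handler that resolves a query against a local store.
    interface ProductHandler {
        List<Product> query(String queryExpression);
    }

    // Runs the query, then converts each match with the transformer registered
    // for its MIME type; products without a transformer are returned as-is.
    static List<Product> serve(ProductHandler handler,
                               Map<String, Function<Product, Product>> transformers,
                               String queryExpression) {
        List<Product> results = new ArrayList<>();
        for (Product p : handler.query(queryExpression)) {
            results.add(transformers.getOrDefault(p.mimeType(), Function.identity()).apply(p));
        }
        return results;
    }

    // Demo with an in-memory store; the query is a toy ID-prefix match.
    static List<Product> demo(String query) {
        List<Product> store = List.of(
            new Product("img-1", "image/png", new byte[] {1, 2, 3}),
            new Product("tbl-1", "application/x-vendor-table",
                        "a;b;c".getBytes(StandardCharsets.UTF_8)));
        ProductHandler handler =
            q -> store.stream().filter(p -> p.id().startsWith(q)).toList();
        // Made-up proprietary table format -> CSV by swapping the delimiter.
        // The transformer builds a new Product, so the store is never modified.
        Map<String, Function<Product, Product>> transformers = Map.of(
            "application/x-vendor-table",
            p -> new Product(p.id(), "text/csv",
                new String(p.data(), StandardCharsets.UTF_8)
                    .replace(';', ',')
                    .getBytes(StandardCharsets.UTF_8)));
        return serve(handler, transformers, query);
    }

    public static void main(String[] args) {
        for (Product p : demo("tbl")) {
            System.out.println(p.id() + " " + p.mimeType() + " "
                + new String(p.data(), StandardCharsets.UTF_8));
        }
    }
}
```

Running `main` prints `tbl-1 text/csv a,b,c`. Keeping transformations in a lookup keyed by MIME type means a new output format can be added without touching the handler that reads the local store.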
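The Profile Service's role as a metadata catalog with create, update, and query operations can be sketched in Java as follows. The `Profile` record, its field names, the `target` data element, and the `example.org` URIs are all hypothetical illustrations of "Dublin Core plus ISO-11179 metadata plus URIs"; they are not the OODT profile schema or API.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: field names are illustrative, not the OODT profile schema.
class ProfileCatalogSketch {
    // A profile: Dublin Core-style resource attributes (e.g. "title", "creator"),
    // ISO-11179-style data element names mapped to their values, and resource URIs.
    record Profile(String id,
                   Map<String, String> dublinCore,
                   Map<String, List<String>> elements,
                   List<URI> locations) {}

    // Profiles keyed by ID, kept in insertion order.
    private final Map<String, Profile> profiles = new LinkedHashMap<>();

    // Create: adds the profile only if its ID is new; returns false otherwise.
    boolean create(Profile p) {
        return profiles.putIfAbsent(p.id(), p) == null;
    }

    // Update: replaces an existing profile; returns false if the ID is unknown.
    boolean update(Profile p) {
        return profiles.replace(p.id(), p) != null;
    }

    // Query: returns profiles whose named data element has the given value.
    List<Profile> findByElement(String element, String value) {
        List<Profile> hits = new ArrayList<>();
        for (Profile p : profiles.values()) {
            if (p.elements().getOrDefault(element, List.of()).contains(value)) {
                hits.add(p);
            }
        }
        return hits;
    }

    // Demo catalog with two made-up profiles.
    static ProfileCatalogSketch demo() {
        ProfileCatalogSketch catalog = new ProfileCatalogSketch();
        catalog.create(new Profile("prof-1",
            Map.of("title", "Mars images", "creator", "Example Lab"),
            Map.of("target", List.of("MARS")),
            List.of(URI.create("http://example.org/products/mars"))));
        catalog.create(new Profile("prof-2",
            Map.of("title", "Jupiter spectra", "creator", "Example Lab"),
            Map.of("target", List.of("JUPITER")),
            List.of(URI.create("http://example.org/products/jupiter"))));
        return catalog;
    }

    public static void main(String[] args) {
        for (Profile p : demo().findByElement("target", "MARS")) {
            System.out.println(p.id() + " -> " + p.locations());
        }
    }
}
```

Separating `create` from `update` lets a catalog reject accidental overwrites of an existing profile ID while still allowing deliberate revisions.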
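The Query Service's crawl over profile servers, where profiles can describe other profile servers, is essentially a graph traversal. The minimal Java sketch below assumes in-memory `Server` records, a plain subject string standing in for a real profile query, and breadth-first order; these names and the traversal order are illustrative assumptions, not the OODT Query Service implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only: servers are in-memory objects, not remote OODT nodes.
class ProfileCrawlSketch {
    // A profile server: its profiles (profile ID -> subject) and the names of
    // other profile servers that its profiles describe.
    record Server(Map<String, String> profiles, List<String> links) {}

    // Breadth-first crawl of the profile-server digraph from a starting server,
    // collecting IDs of profiles whose subject matches. The visited set stops
    // cycles (A -> B -> A) from looping forever; the sorted set collates
    // duplicate hits from different servers into one ordered result.
    static Set<String> crawl(Map<String, Server> grid, String start, String subject) {
        Set<String> visited = new HashSet<>();
        Set<String> results = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String name = queue.poll();
            if (!visited.add(name)) continue;   // already crawled
            Server server = grid.get(name);
            if (server == null) continue;       // unknown or unreachable server
            for (Map.Entry<String, String> e : server.profiles().entrySet()) {
                if (e.getValue().equals(subject)) results.add(e.getKey());
            }
            queue.addAll(server.links());
        }
        return results;
    }

    // Made-up grid: A and B link to each other, C links to a missing server D,
    // and E is not reachable from A.
    static Map<String, Server> demoGrid() {
        return Map.of(
            "A", new Server(Map.of("p1", "MARS", "p2", "MARS", "p5", "VENUS"), List.of("B", "C")),
            "B", new Server(Map.of("p2", "MARS", "p3", "MARS"), List.of("A")),
            "C", new Server(Map.of("p4", "MARS"), List.of("D")),
            "E", new Server(Map.of("p9", "MARS"), List.of()));
    }

    public static void main(String[] args) {
        System.out.println(crawl(demoGrid(), "A", "MARS"));  // prints [p1, p2, p3, p4]
    }
}
```

The visited set is what makes crawling a digraph (rather than a tree) safe: without it, the link from B back to A would cause the same servers to be queried again indefinitely.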
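The difference the Web Grid section draws between block-by-block remote calls and a single HTTP stream can be seen in a small in-memory Java sketch. It assumes a hypothetical `retrieveChunk(product, offset, length)` stand-in for a remote method and treats a short block as end-of-product; it counts round trips only, does not model network latency, and is not the OODT Web Grid code. With the block-based approach, a 10,000-byte product fetched in 1,024-byte blocks takes 10 calls, while the stream-based copy reads the same bytes from one response stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical sketch: in-memory stand-ins for the two transport styles,
// with no CORBA, RMI, or real network code.
class RetrievalSketch {
    // Result of block-based retrieval: the reassembled bytes and the call count.
    record BlockResult(byte[] data, int calls) {}

    // Stand-in for a remote method returning one block of a product.
    static byte[] retrieveChunk(byte[] product, int offset, int length) {
        int end = Math.min(product.length, offset + length);
        return Arrays.copyOfRange(product, offset, end);
    }

    // RPC style: one call per block; a short (or empty) block signals the end.
    static BlockResult fetchInBlocks(byte[] product, int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0;
        int calls = 0;
        while (true) {
            byte[] block = retrieveChunk(product, offset, blockSize);
            calls++;
            out.write(block, 0, block.length);
            offset += block.length;
            if (block.length < blockSize) break;
        }
        return new BlockResult(out.toByteArray(), calls);
    }

    // HTTP style: copy the product from one response stream to the destination
    // in a single pass; transferTo uses a fixed-size internal buffer, so this
    // copy step does not need the whole product in memory.
    static long stream(InputStream response, OutputStream destination) throws IOException {
        return response.transferTo(destination);
    }

    public static void main(String[] args) throws IOException {
        byte[] product = new byte[10_000];
        Arrays.fill(product, (byte) 7);
        BlockResult r = fetchInBlocks(product, 1024);
        System.out.println("block calls: " + r.calls());  // 10
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println("streamed bytes: " + stream(new ByteArrayInputStream(product), sink));  // 10000
    }
}
```

In a CORBA or RMI deployment, each iteration of the block loop would be a full remote round trip, which is why per-block retrieval falls further behind a single HTTP response as products grow.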