A profile serves as a generic template for describing the
characteristics of a resource. The question posed to a profile
generally takes the form of, "Can you answer
X
?" or
"Do you know the location of
X
?" where
X
is some resource being sought. The more fully a profile describes
a particular resource, the better the profile can be used to
determine if the resource has the information in
X
.
Profiles capture three kinds of information:
-
Resource Attributes
Resource attributes are
metadata about the resource's
inception
. These
attributes include the creator of the resource, in what
language it exists, when it was created, and so forth.
These attributes are based on the work of the
Dublin Core Metadata Initiative
.
-
Profile Elements
Profile elements are metadata
about the resource's
composition
. These tell you
about the morphology of the resource, such as data types
captured within it, minimum and maximum values, synonymous
elements, and so forth. These attributes are based on
ISO/IEC 11179
standards
.
-
Profile Attributes
Profile attributes are
metadata about the
profile itself
, such as who
made it, whether it's classified, revision notes, and so
forth. Each profile also has a unique identifying
Object
Identifier (OID)
.
The following class diagram shows the relationship between
the different parts of a profile:
While this diagram shows the Java field names and Java
classes, the relationship applies to profiles whether they
exist as Java objects, as RDF documents, or as XML documents
in the profile vocabulary.
Profiles, whether expressed in RDF or in their own XML
vocabulary, have a section for capturing information about the
resource's inception. This includes information about when the
resource was created, who created it, in what language it
exists, and so forth. Profiles use the element set recommended
by the Dublin Core Metadata Initiative (DCMI), with some
extensions, to describe the inception of a resource.
Collectively, these metadata are called the
resource
attributes
or
resAttributes
of the profile.
Every profile has one and only one set of
resAttributes
. The metadata elements within the
resAttributes
are defined in this section.
As defined by the DCMI, the
Identifier
of a
resource is some unambiguous way to identify the resource.
In the profile implementation, one and only one
Identifier
is
required
.
It's highly recommended that
Identifier
s and
resLocation
s (see below) be URIs, but there's
no software enforcement of this
unless you convert a
Java profile to RDF
with the
toRDF
method.
Identifiers should be more like URNs, while resLocations
should be more like URLs.
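As a rough illustration of the URN-versus-URL convention, the following Python sketch classifies a string by its URI scheme. It is not part of the OODT software; the function name `uri_kind` and the list of schemes treated as URLs are assumptions made for this example.

```python
from urllib.parse import urlparse

# Schemes treated as "URL-like" here; this list is an assumption for the sketch.
URL_SCHEMES = ("http", "https", "ftp")

def uri_kind(value):
    """Classify a string as 'urn', 'url', or 'other' by its URI scheme only."""
    scheme = urlparse(value).scheme.lower()
    if scheme == "urn":
        return "urn"
    if scheme in URL_SCHEMES:
        return "url"
    return "other"
```

Under this convention an Identifier would normally classify as `urn` and a resLocation for a data resource as `url`.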
The
Title
names the resource, and is the name by
which the resource is formally known. The
Title
is
optional; if present, it may occur only once in a profile.
The
Format
indicates the manifestation of the
resource. You can specify any number of
Format
s in
a profile.
The
Description
element contains a free text
account of the content of the resource. It's optional in a
profile; if present, it may occur only once.
Zero or more
Creator
s may be specified in a
profile.
Creator
s contain the name of people or
organizations that created the resource.
You can list zero or more
Subject
s in a profile.
The purpose of the
Subject
elements is to contain
keywords that describe the resource, usually selected from a
controlled vocabulary.
Any number of
Publisher
elements may appear in a
profile. They contain the organization responsible for
making the resource available.
A
Contributor
is a person or organization
providing auxiliary work towards the resource's creation.
Any number of
Contributor
s may be listed in a
profile.
Date
elements indicate the times in history when
the resource was created. You can include any number of
Date
s in a profile, although typically you'll
specify just one if you specify any at all.
The
Type
element indicates the nature of the
content of the resource, such as "fiction" for a work of
fiction or "image" for a dataset rendered graphically. You
can include any number of
Type
s in a profile.
When a resource is derived from others, the
Source
element should indicate the
Identifier
s of the
referenced resources. You can specify any number of
Source
s in a profile.
For resources that contain natural language content, the
Language
element indicates the languages in use.
You can specify this element any number of times in a
profile.
When a resource is related to others, you can specify the
Identifier
s of the related resources using zero or
more
Relation
elements.
For resources that cover a space or time or jurisdiction,
use the
Coverage
element to indicate such coverage.
This element may be listed any number of times in a profile,
and its content should come from a controlled vocabulary.
For resources with specific coordinate systems, it's better
to use profile elements, described below.
Copyright, ownership, redistribution, use, and other legal
issues may exist for a resource. When that happens, use the
Rights
element to list the rights management
information. You can list zero or more
Rights
elements in a profile.
Note:
The official name of this element is the plural
Rights
; this is inconsistent with the other
metadata elements, but is consistent with the DCMI.
The
resContext
element identifies the application
environment or discipline within which the resource
originates and is derived from a taxonomy of scientific
disciplines. This element is required in a profile and may
occur multiple times.
As an example, a
resContext
of
NASA.PDS.Geoscience
tells that the resource is
associated with the Geoscience node of the Planetary Data
System.
The
resAggregation
element indicates the
aggregative structure of the resource. It tells you what
you'll get if you retrieve the resource: a granule, a
dataset, or a collection of datasets. The legal values of
this optional element are:
-
granule
, meaning the resource is a single
product
-
dataSet
, meaning the resource is a set of
products
-
dataSetCollection
, meaning the resource is
a collection of datasets
The
resAggregation
element is optional; however,
if specified, it may appear in a profile only once.
The
resClass
element identifies the kind of the
resource within a taxonomy of resource types. It's a
required
element that is used by the OODT Framework
to determine how to treat the profile as well as the
resource named by the profile.
For example, a
resClass
of
system.productServer
indicates that the resource is
an OODT product server. A query that matches this profile
means that if the same query were given to the identified
product server, it would yield a result. A
resClass
of
system.profileServer
means the
resource is a profile server. That means that while the
current profile server may or may not provide a matching
profile, another profile server might, forming an implicit
digraph of profile servers. Other valid
resClass
values include
data.granule
,
data.dataSet
,
and
application.interface
.
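The occurrence rules described so far can be collected into a small check. The sketch below is written in Python and does not use the OODT API; `check_res_attributes` and the `CARDINALITY` table are hypothetical names, and it assumes the resAttributes are available as an XML fragment in the profile vocabulary. Only elements for which the text states a restriction are listed.

```python
import xml.etree.ElementTree as ET

# (minimum, maximum) occurrences taken from the descriptions above.
# None means the text gives no upper bound.
CARDINALITY = {
    "Identifier": (1, 1),
    "Title": (0, 1),
    "Description": (0, 1),
    "resContext": (1, None),
    "resAggregation": (0, 1),
    "resClass": (1, None),
}
AGGREGATIONS = {"granule", "dataSet", "dataSetCollection"}

def check_res_attributes(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    res = ET.fromstring(xml_text)
    problems = []
    for name, (low, high) in CARDINALITY.items():
        count = len(res.findall(name))
        if count < low or (high is not None and count > high):
            problems.append(f"{name}: found {count}")
    for agg in res.findall("resAggregation"):
        if (agg.text or "").strip() not in AGGREGATIONS:
            problems.append(f"resAggregation: illegal value {agg.text!r}")
    return problems
```

A fragment with one Identifier, one resContext, and one resClass passes this check; a fragment with two Title elements or no Identifier is reported.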
Zero or more
resLocation
elements may appear in
a profile. They tell where the resource is located, easily
the most important part of the profile. Because this
element may appear several times, all locations should be
considered valid; the application may pick the one that's
most convenient. The
resLocation
may also appear zero
times. This means that the profile indicates solely that
the resource exists; where it is located is unknown.
The resLocation is interpreted as a URI. For
example, a
resClass
of
system.productServer
or
system.profileServer
means that the
resLocation
indicates a URN that names a software
object. Querying that object will yield either the desired
result (for product servers) or more matching profiles (for
profile servers). For a resClass of
data.granule
or
data.dataSet
, the
resLocation
is a URL
to the granule or dataset.
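The dependence of the resLocation on the resClass can be pictured as a simple dispatch. This Python sketch is illustrative only: `interpret_location` and the action labels it returns are hypothetical, and a real client would go on to query the named object or fetch the URL.

```python
def interpret_location(res_class, res_location):
    """Return a (action, location) pair describing how a client might use res_location."""
    if res_class in ("system.productServer", "system.profileServer"):
        # The location names a software object that can be queried.
        return ("query-object", res_location)
    if res_class in ("data.granule", "data.dataSet"):
        # The location is a URL from which the data can be retrieved.
        return ("fetch-url", res_location)
    # Other resClass values are not covered by this sketch.
    return ("unknown", res_location)
```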
The most interesting part of a profile is in the metadata
that describes the composition of the resource that the
profile profiles. The composition metadata is what enables a
profile server to tell if a particular resource can answer a
query.
The composition metadata is based on the data element
description standards in ISO/IEC standard 11179. They are the
profile elements
or
profElement
s of a profile.
Every profile may have zero or more
profElement
s, the
components of which are discussed in this section.
The
elemId
is an optional universally unique
identifier applied to the element.
The
elemName
is the
required
name of the
profile element. It serves as the title of one of the
components of the resource.
The
elemDesc
is the description of the profile
element. Although the title may often be enough to identify
the purpose of the profile element, the description should
be used to provide any further, free-text information that
may be of importance to analysts and profile administrators.
The description is optional.
The
elemType
indicates the type of data
represented in the profile element, synonymous with the
ISO/IEC 11179
Datatype
attribute. The permissible
values are:
-
boolean
-
character
-
date_time
-
enumerated
-
integer
-
ordinal
-
rational
-
scaled
-
real
-
complex
-
state
-
void
This element is optional within a profile element. When
it's not present, the profile element merely indicates that
the resource's content possesses the attribute, but more is
not known.
The
elemUnit
indicates the units associated with
the values of the data element. This element is synonymous
with the ISO/IEC 11179 attribute
unit.of.quantity
.
Values for this optional element should be selected from
standardized
tables of units.
The
elemEnumFlag
tells how possible values of the
profile element are specified. It works with the
elemValue
,
elemMinValue
, and
elemMaxValue
elements:
-
If the
elemEnumFlag
's value is
T
and
one or more
elemValue
s appear, then the values
listed are the valid values of the element.
-
If the value is
F
, then a closed range of
values bounded by the profile's
elemMinValue
and
elemMaxValue
elements indicates the valid values.
-
If the value is
T
but no
elemValue
s
appear, then it means that any value is a valid
value for the resource.
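The three rules above can be expressed as a single predicate. The following Python sketch is one reading of those rules, not OODT code; the function `accepts` and its parameter names are hypothetical.

```python
def accepts(value, enum_flag, values=(), min_value=None, max_value=None):
    """Return True if value is valid for a profile element under the rules above."""
    if enum_flag == "T":
        if not values:
            # Flag is T but no elemValue entries: any value is valid.
            return True
        # Flag is T with elemValue entries: only the listed values are valid.
        return value in values
    if enum_flag == "F":
        # Closed range: both elemMinValue and elemMaxValue are included.
        return min_value <= value <= max_value
    raise ValueError(f"unexpected elemEnumFlag: {enum_flag!r}")
```

For example, `accepts(23.3, "F", min_value=23.3, max_value=29.9)` is true because the range is closed at both ends.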
Often, a characteristic of a resource will go by several
names, especially between scientific disciplines. What one
person may call
latitude
, another may call
x
coordinate
, for example. By specifying synonyms for a
profile element, you can assist in automatic correlation of
results and cross-disciplinary discovery.
The
elemSynonym
provides a way to do just that.
Zero or more
elemSynonym
s may appear in a profile
element. The values of this element are names from data
dictionaries other than the discipline data dictionary
hosting the profile.
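A search that honors synonyms might compare the queried name against both the element name and its synonyms. The Python sketch below shows the idea under the assumption of exact string comparison; `element_matches` is a hypothetical helper, not an OODT function.

```python
def element_matches(query_name, elem_name, synonyms=()):
    """Return True if query_name equals the element's name or one of its synonyms."""
    return query_name == elem_name or query_name in synonyms
```

With `latitude` as the elemName and `x coordinate` as an elemSynonym, a query for either name finds the element.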
The
elemObligation
tells whether the data element
is required to always or sometimes be present. This element
is synonymous with the ISO/IEC 11179 attribute
Obligation
, and is optional within a profile
element.
The legal values for this element are
Required
and
Optional
, with the obvious meanings.
The
elemComment
field provides a remark concerning
the application of the data element. This element is
synonymous with the ISO/IEC 11179 attribute
Comment
,
and is optional within a profile element.
For a profile server to manage a set of profiles, it's
necessary to have metadata contained within the profile that
describes the profile itself. This metadata, collectively
called the profile attributes, or
profAttributes
,
serves that purpose.
Most of the elements within the
profAttributes
are
optional. This section describes each of them.
The
profId
serves to give a unique identifier to
the profile. It should be expressed as a URI, and often as
a URN.
The
profVersion
identifies the version number of
the profile.
The
profType
identifies the type of the profile.
The type that typically appears here is
profile
,
meaning the profile describes a resource.
Another type that can be here is
dataDict
, which
indicates that the profile doesn't describe a resource, but
instead is a data dictionary for other profiles. Such a
profile's composition elements name the expected profile
elements and ranges of valid values that will appear in
other profiles. The
profDataDictId
element
identifies the profile serving as its data dictionary.
The
profStatusId
identifies the state of the
profile. Profiles may be either
active
or
inactive
. An inactive profile is likely maintained
for historical or exemplary reasons but is otherwise not
currently used for searches or resource descriptions.
The
profSecurityType
identifies whether the
information contained in the profile may be of a sensitive
nature. Any string is valid here as the current OODT
software does not use this field.
The
profParentId
optionally identifies the URI of
the parent of this profile. Profiles may be arranged
hierarchically as singly rooted trees within a forest.
The
profChildId
identifies zero or more children (by duplicating the element) of
this profile.
The
profRegAuthority
names the registration authority responsible for authoring and
maintaining the profile.
The
profRevisionNote
appears zero or more times in
the profile to describe changes made to it over time. The
notes are free-form text, and the elements are ordered from
newest to oldest.
The
profDataDictId
identifies the profile providing a data dictionary to the
current profile.
Let's take a look at how profiles would describe resources by
looking at an example set of scientific data. Suppose you
archive high temperature data for your weather service; this
data comes in the form of tables of latitude/longitude
locations and the high temperature recorded at each point.
Since you're archiving daily high temperatures, there's one
table per day, so each day's table is a discrete resource.
Let's say you've got just three days of data so far, though,
and it looks like this (to keep things simple).
| Day Number | Lat   | Lon  | High Temp |
| 1          | 104.1 | 39.2 | 26.5      |
| 1          | 110.3 | 42.4 | 29.9      |
| 1          | 121.5 | 45.6 | 23.3      |
| 2          | 104.1 | 39.2 | 31.5      |
| 2          | 110.3 | 42.4 | 30.9      |
| 2          | 121.5 | 45.6 | 27.5      |
| 3          | 104.1 | 39.2 | 20.8      |
| 3          | 110.3 | 42.4 | 19.5      |
(On day #3, vandals destroyed the weather sensor station at
(121.5, 45.6), so there are only two measurements that day.)
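Each day's table can be summarized by the minimum and maximum of each column, which is the kind of range a profile element records. The Python sketch below computes the high temperature range per day; the rows are transcribed from the table, with the last two rows taken as day 3, and `high_temp_range` is a hypothetical helper.

```python
# (day, lat, lon, high_temp) rows transcribed from the table above.
ROWS = [
    (1, 104.1, 39.2, 26.5), (1, 110.3, 42.4, 29.9), (1, 121.5, 45.6, 23.3),
    (2, 104.1, 39.2, 31.5), (2, 110.3, 42.4, 30.9), (2, 121.5, 45.6, 27.5),
    (3, 104.1, 39.2, 20.8), (3, 110.3, 42.4, 19.5),
]

def high_temp_range(rows, day):
    """Return (min, max) of the high temperatures recorded on the given day."""
    temps = [temp for d, _lat, _lon, temp in rows if d == day]
    return (min(temps), max(temps))

print(high_temp_range(ROWS, 1))  # (23.3, 29.9)
```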
To make profiles for each day of data, let's gather some
information that will be common to all of them. First, say the
weather service's OID is 2.6.1.9, that the weather service has
reserved OID 2.6.1.9.2 for all collected data, and that it has
reserved 2.6.1.9.2.1 for high temperature measurements. The
service chooses to make a URI for each dataset,
urn:weather:data:highs:day-number
where
day-number
is the day number of the data.
The official creator for all this data will be "Weather
Service", under subject keywords "weather", "temperatures",
and "measurements". They'll also make the data tables
accessible as web documents in MIME format
text/tab-separated-values
at the address
http://weather.gov/data/highs/day-number.txt.
Here, then, is the profile for day 1:
<profile>
  <profAttributes>
    <profId>2.6.1.9.2.1.1</profId>
    <profType>profile</profType>
    <profStatusId>active</profStatusId>
  </profAttributes>
  <resAttributes>
    <Identifier>urn:weather:data:highs:1</Identifier>
    <Title>High Temperatures - Day 1</Title>
    <Format>text/tab-separated-values</Format>
    <Creator>Weather Service</Creator>
    <Subject>weather</Subject>
    <Subject>temperatures</Subject>
    <Subject>measurements</Subject>
    <resContext>NOAA.NWS.Data</resContext>
    <resClass>data.granule</resClass>
    <resLocation>http://weather.gov/data/highs/1.txt</resLocation>
  </resAttributes>
  <profElement>
    <elemName>latitude</elemName>
    <elemType>real</elemType>
    <elemUnit>degree</elemUnit>
    <elemEnumFlag>F</elemEnumFlag>
    <elemMinValue>104.1</elemMinValue>
    <elemMaxValue>121.5</elemMaxValue>
  </profElement>
  <profElement>
    <elemName>longitude</elemName>
    <elemType>real</elemType>
    <elemUnit>degree</elemUnit>
    <elemEnumFlag>F</elemEnumFlag>
    <elemMinValue>39.2</elemMinValue>
    <elemMaxValue>45.6</elemMaxValue>
  </profElement>
  <profElement>
    <elemName>temperature</elemName>
    <elemType>real</elemType>
    <elemUnit>celsius</elemUnit>
    <elemEnumFlag>F</elemEnumFlag>
    <elemMinValue>23.3</elemMinValue>
    <elemMaxValue>29.9</elemMaxValue>
  </profElement>
</profile>
Someone searching for a high temperature that exceeded 25
degrees, for example, would find this as a matching
resource, as the
elemMaxValue
for
temperature
is 29.9, and that is over 25.
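A range-based match of this kind amounts to an interval test: the profile matches if at least one value in its closed range could satisfy the comparison. For a "greater than" query it is the upper bound that decides. The Python sketch below shows one way to write the test; `range_may_satisfy` and its operator strings are hypothetical and are not the OODT query syntax.

```python
def range_may_satisfy(op, threshold, min_value, max_value):
    """Return True if some value in [min_value, max_value] satisfies `value op threshold`."""
    if op == ">":
        return max_value > threshold
    if op == ">=":
        return max_value >= threshold
    if op == "<":
        return min_value < threshold
    if op == "<=":
        return min_value <= threshold
    if op == "=":
        return min_value <= threshold <= max_value
    raise ValueError(f"unsupported operator: {op!r}")

# Day 1 temperatures span 23.3 to 29.9, so "temperature > 25" can be satisfied.
print(range_may_satisfy(">", 25, 23.3, 29.9))  # True
```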
Here are all three profiles in one document:
<profiles>
  <profile>
    <profAttributes>
      <profId>2.6.1.9.2.1.1</profId>
      <profType>profile</profType>
      <profStatusId>active</profStatusId>
    </profAttributes>
    <resAttributes>
      <Identifier>urn:weather:data:highs:1</Identifier>
      <Title>High Temperatures - Day 1</Title>
      <Format>text/tab-separated-values</Format>
      <Creator>Weather Service</Creator>
      <Subject>weather</Subject>
      <Subject>temperatures</Subject>
      <Subject>measurements</Subject>
      <resContext>NOAA.NWS.Data</resContext>
      <resClass>data.granule</resClass>
      <resLocation>http://weather.gov/data/highs/1.txt</resLocation>
    </resAttributes>
    <profElement>
      <elemName>latitude</elemName>
      <elemType>real</elemType>
      <elemUnit>degree</elemUnit>
      <elemEnumFlag>F</elemEnumFlag>
      <elemMinValue>104.1</elemMinValue>
      <elemMaxValue>121.5</elemMaxValue>
    </profElement>
    <profElement>
      <elemName>longitude</elemName>
      <elemType>real</elemType>
      <elemUnit>degree</elemUnit>
      <elemEnumFlag>F</elemEnumFlag>
      <elemMinValue>39.2</elemMinValue>
      <elemMaxValue>45.6</elemMaxValue>
    </profElement>
    <profElement>
      <elemName>temperature</elemName>
      <elemType>real</elemType>
      <elemUnit>celsius</elemUnit>
      <elemEnumFlag>F</elemEnumFlag>
      <elemMinValue>23.3</elemMinValue>
      <elemMaxValue>29.9</elemMaxValue>
    </profElement>
  </profile>
  <profile>
    <profAttributes>
      <profId>2.6.1.9.2.1.2</profId>
      <profType>profile</profType>
      <profStatusId>active</profStatusId>
    </profAttributes>
    <resAttributes>
      <Identifier>urn:weather:data:highs:2</Identifier>
      <Title>High Temperatures - Day 2</Title>
      <Format>text/tab-separated-values</Format>
      <Creator>Weather Service</Creator>
      <Subject>weather</Subject>
      <Subject>temperatures</Subject>
      <Subject>measurements</Subject>
      <resContext>NOAA.NWS.Data</resContext>
      <resClass>data.granule</resClass>
      <resLocation>http://weather.gov/data/highs/2.txt</resLocation>
    </resAttributes>
    <profElement>
      <elemName>latitude</elemName>
      <elemType>real</elemType>
      <elemUnit>degree</elemUnit>
      <elemEnumFlag>F</elemEnumFlag>
      <elemMinValue>104.1</elemMinValue>
      <elemMaxValue>121.5</elemMaxValue>
    </profElement>
    <profElement>
      <elemName>longitude</elemName>
      <elemType>real</elemType>
      <elemUnit>degree</elemUnit>
      <elemEnumFlag>F</elemEnumFlag>
      <elemMinValue>39.2</elemMinValue>
      <elemMaxValue>45.6</elemMaxValue>
    </profElement>
    <profElement>
      <elemName>temperature</elemName>
      <elemType>real</elemType>
      <elemUnit>celsius</elemUnit>
      <elemEnumFlag>F</elemEnumFlag>
      <elemMinValue>27.5</elemMinValue>
      <elemMaxValue>31.5</elemMaxValue>
    </profElement>
  </profile>
  <profile>
    <profAttributes>
      <profId>2.6.1.9.2.1.3</profId>
      <profType>profile</profType>
      <profStatusId>active</profStatusId>
    </profAttributes>
    <resAttributes>
      <Identifier>urn:weather:data:highs:3</Identifier>
      <Title>High Temperatures - Day 3</Title>
      <Format>text/tab-separated-values</Format>
      <Creator>Weather Service</Creator>
      <Subject>weather</Subject>
      <Subject>temperatures</Subject>
      <Subject>measurements</Subject>
      <resContext>NOAA.NWS.Data</resContext>
      <resClass>data.granule</resClass>
      <resLocation>http://weather.gov/data/highs/3.txt</resLocation>
    </resAttributes>
    <profElement>
      <elemName>latitude</elemName>
      <elemType>real</elemType>
      <elemUnit>degree</elemUnit>
      <elemEnumFlag>F</elemEnumFlag>
      <elemMinValue>104.1</elemMinValue>
      <elemMaxValue>110.3</elemMaxValue>
    </profElement>
    <profElement>
      <elemName>longitude</elemName>
      <elemType>real</elemType>
      <elemUnit>degree</elemUnit>
      <elemEnumFlag>F</elemEnumFlag>
      <elemMinValue>39.2</elemMinValue>
      <elemMaxValue>42.4</elemMaxValue>
    </profElement>
    <profElement>
      <elemName>temperature</elemName>
      <elemType>real</elemType>
      <elemUnit>celsius</elemUnit>
      <elemEnumFlag>F</elemEnumFlag>
      <elemMinValue>19.5</elemMinValue>
      <elemMaxValue>20.8</elemMaxValue>
    </profElement>
  </profile>
</profiles>
Given this set of profiles, a profile search for resources
with
latitude
>
120.0
would match profiles
for day 1 and 2, but not day 3. Actual profile searches are
possible by taking the above document and loading it into
the
LightweightProfileHandler
, but that becomes
impractical for large numbers of profiles, as it holds all of the
profile objects in memory and "searches" them in place.
More likely, data such as these would be stored in a
relational database, and the matching profiles would be
generated on demand.
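To make the in-memory search concrete, here is a minimal Python sketch that parses a profiles document and filters it in place. It is not the LightweightProfileHandler and does not use the OODT query language; `search_greater_than` is a hypothetical function, and `DOC` is an abbreviated copy of the three profiles that keeps only the Identifier and the latitude element.

```python
import xml.etree.ElementTree as ET

# Abbreviated profiles: only the Identifier and the latitude range are kept.
DOC = """<profiles>
  <profile>
    <resAttributes><Identifier>urn:weather:data:highs:1</Identifier></resAttributes>
    <profElement><elemName>latitude</elemName>
      <elemMinValue>104.1</elemMinValue><elemMaxValue>121.5</elemMaxValue></profElement>
  </profile>
  <profile>
    <resAttributes><Identifier>urn:weather:data:highs:2</Identifier></resAttributes>
    <profElement><elemName>latitude</elemName>
      <elemMinValue>104.1</elemMinValue><elemMaxValue>121.5</elemMaxValue></profElement>
  </profile>
  <profile>
    <resAttributes><Identifier>urn:weather:data:highs:3</Identifier></resAttributes>
    <profElement><elemName>latitude</elemName>
      <elemMinValue>104.1</elemMinValue><elemMaxValue>110.3</elemMaxValue></profElement>
  </profile>
</profiles>"""

def search_greater_than(doc, elem_name, threshold):
    """Return Identifiers of profiles whose range for elem_name reaches above threshold."""
    matches = []
    for profile in ET.fromstring(doc).findall("profile"):
        for elem in profile.findall("profElement"):
            if elem.findtext("elemName") != elem_name:
                continue
            if float(elem.findtext("elemMaxValue")) > threshold:
                matches.append(profile.findtext("resAttributes/Identifier"))
    return matches

print(search_greater_than(DOC, "latitude", 120.0))
# ['urn:weather:data:highs:1', 'urn:weather:data:highs:2']
```

Every query walks every profile, which is why this approach stops being practical as the number of profiles grows.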
Let's make one more profile, a profile that describes
the entire collection
:
<profile>
  <profAttributes>
    <profId>2.6.1.9.2.1</profId>
    <profType>profile</profType>
    <profStatusId>active</profStatusId>
  </profAttributes>
  <resAttributes>
    <Identifier>urn:weather:data:highs:index</Identifier>
    <Title>High Temperatures</Title>
    <Format>text/tab-separated-values</Format>
    <Creator>Weather Service</Creator>
    <Subject>weather</Subject>
    <Subject>temperatures</Subject>
    <Subject>measurements</Subject>
    <resContext>NOAA.NWS.Data</resContext>
    <resClass>system.profileServer</resClass>
    <resLocation>urn:weather:data:highs:ProfileServer</resLocation>
  </resAttributes>
  <profElement>
    <elemName>latitude</elemName>
    <elemType>real</elemType>
    <elemUnit>degree</elemUnit>
    <elemEnumFlag>F</elemEnumFlag>
    <elemMinValue>104.1</elemMinValue>
    <elemMaxValue>121.5</elemMaxValue>
  </profElement>
  <profElement>
    <elemName>longitude</elemName>
    <elemType>real</elemType>
    <elemUnit>degree</elemUnit>
    <elemEnumFlag>F</elemEnumFlag>
    <elemMinValue>39.2</elemMinValue>
    <elemMaxValue>45.6</elemMaxValue>
  </profElement>
  <profElement>
    <elemName>temperature</elemName>
    <elemType>real</elemType>
    <elemUnit>celsius</elemUnit>
    <elemEnumFlag>F</elemEnumFlag>
    <elemMinValue>19.5</elemMinValue>
    <elemMaxValue>31.5</elemMaxValue>
  </profElement>
</profile>
Note that in addition to several changes in the resource
attributes, we've also changed the profile elements to
cover the entire range of latitude, longitude, and
temperature in the entire data set. So, for temperature,
the lowest high temperature for all three days was 19.5,
and the highest was 31.5. Now, a profile search for
temperatures greater than 30 will match the profile for
the whole collection, as well as the profile for day #2.
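The collection-level ranges follow from the per-day ranges: take the smallest minimum and the largest maximum. A short Python sketch, with `covering_range` as a hypothetical helper and the per-day temperature ranges copied from the three daily profiles:

```python
def covering_range(ranges):
    """Return the smallest (min, max) range that contains every (min, max) in ranges."""
    lows, highs = zip(*ranges)
    return (min(lows), max(highs))

# Temperature ranges from the profiles for days 1, 2, and 3.
daily_temperature = [(23.3, 29.9), (27.5, 31.5), (19.5, 20.8)]
print(covering_range(daily_temperature))  # (19.5, 31.5)
```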
In fact, the OODT framework supports automatic drill-down of
this kind. The
Query Service
, upon
encountering a matching profile, checks to see if the
resClass
is
system.profileServer
,
and if so, will pass the query to the profile server at the
resLocation
in the matched profile. It will
gather up all matching profiles and return them to the user.
In this way, it can follow a directed graph of linked profile
servers (automatically avoiding cycles), gathering more
and more results.
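The drill-down can be pictured as a graph traversal that remembers which servers it has already queried. The Python sketch below illustrates the idea and is not the Query Service implementation: `drill_down`, the `servers` dictionary standing in for remote profile servers, and the `matches_query` callback are all hypothetical.

```python
def drill_down(servers, start, matches_query):
    """Collect ids of matching profiles, following system.profileServer links.

    servers: dict mapping a server location to the list of profiles it holds;
    each profile is a dict with 'id', 'resClass', and 'resLocation' keys.
    """
    results, visited, pending = [], set(), [start]
    while pending:
        location = pending.pop()
        if location in visited:
            continue  # this server was already queried, so skip it to avoid cycles
        visited.add(location)
        for profile in servers.get(location, []):
            if not matches_query(profile):
                continue
            results.append(profile["id"])
            if profile["resClass"] == "system.profileServer":
                # Pass the query on to the profile server named in the match.
                pending.append(profile["resLocation"])
    return results
```

The `visited` set is what keeps two profile servers that point at each other from being queried forever.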