|
|
OODT's
profile servers
,
product servers
, and other
components all use the same format for a query. It's
encapsulated by the class
jpl.eda.xmlquery.XMLQuery
. In this tutorial,
we'll look at this class and see how it represents queries.
You'll need this knowledge both to make queries to OODT
servers and to understand queries coming into OODT
servers.
Capturing various aspects of a query is difficult to do in
general, and OODT's implementation is not stellar or complete.
But it has proved successful in a variety of applications, so
let's see what concepts it encapsulates.
First, forget the fact that the XMLQuery has "XML" in its
name. It doesn't mean you can query only XML resources.
It's called XMLQuery probably because the person who came up
with it thought XML was pretty cool, or that you can
represent an OODT query in XML format.
While you
can
represent an XMLQuery in XML, you
usually only use the Java representation, that is, you
create and manipulate Java objects of the class
jpl.eda.xmlquery.XMLQuery
.
In theory, the XMLQuery can represent
any
query
for information. It captures generic aspects of a query,
such as the domain of the question being posed, the range in
which the desired response should be formulated, and
constraints on what selects the response. In XMLQuery
parlance, we call these the "from element set" (domain), the
"select element set" (range), and the "where element set"
(constraints).
In practice, none of the current OODT implementations use
any but the "where element set." And indeed, for most
problems presented to OODT, that is sufficient. However,
the framework is there to support more aspects of a query,
and you're welcome to use them in your own deployments.
The XMLQuery concept captures metadata about a query as
well, such as the title for the query, whether the query
itself is secret or classified, how many results to return
at most, how to propagate the query through a network, and
so forth. In practice, though, none of these additional
attributes are used in current deployments of OODT.
Moreover, none of the current OODT components obey
settings such as maximum number of results or propagation
types.
As a result, you should ignore these aspects of the
XMLQuery and merely use its default values. We'll see these
shortly.
The following diagram shows the XMLQuery and related classes:
A single
XMLQuery
object has three separate
lists of
QueryElement
objects, representing the
"from", "select", and "where" element sets. In practice,
the "from" and "select" sets are empty, though, as mentioned.
There's also a single
QueryHeader
object
capturing query metadata. Within the
XMLQuery
itself is additional query metadata. Finally, there's
exactly one
QueryResult
object which captures
the results of the query so far.
The XMLQuery class uses lists of
QueryElement
objects to represent its "from", "select", and "where" element
sets. The lists form a postfix boolean stack, with the
zeroth element of the list being the top of the stack.
Although you can populate these stacks by manipulating their
corresponding
java.util.List
s, the XMLQuery class
provides a boolean expression language that lets you directly
populate them.
The XMLQuery class also respects that some queries just
cannot be formulated as a boolean expression. In these cases,
you can pass in a string that the XMLQuery will otherwise
carry unparsed. Note that your profile and product servers
will then have the responsibility of handling that string in
some appropriate way.
The query language that XMLQuery uses to generate postfix
boolean stacks is a series of infix, not postfix,
element-and-value expressions linked by boolean operators.
Here's an example:
temperature > 36 AND latitude < 45
As you can see, these are
triples
linked in a
logical expression. Each triple has the form
(
element
,
relation
,
literal
).
For example, the first triple has
element
=
temperature
,
relation
= GT
(greater-than), and
literal
= 36. That triple is
linked to the next one with the boolean
AND
operator.
The full set of relational operators includes:
=
(EQ),
!=
(NE),
<
(LT),
<=
(LE),
>
(GT),
>=
(GE),
LIKE
, and
NOTLIKE
. The logical operators include
AND
,
&
,
OR
,
|
,
NOT
, and
!
. You can
use parentheses to group things too.
Here are a few more examples:
specimen = Blood
bac > 0.05 AND priors = 3
surname LIKE 'Simpson%' OR numChildren <= 3 AND RETURN = numEpisodes
The "where" element set is actually a
java.util.List
of
jpl.eda.xmlquery.QueryElement
objects, arranged
in a boolean stack with the top of the stack as the zeroth
element in the list.
QueryElement
objects
themselves have two attributes, a role and a value.
The role tells what role the
QueryElement
is
playing. It can be
elemName
for the
element
part of a triple,
RELOP
for
the
relation
part of a triple,
LITERAL
for the
literal
part of a triple, or
LOGOP
for a logical operator linking triples
together. The value tells what the element is, what the
relational operator is, what literal value is being related,
or what the logical operator is.
The
XMLQuery
parses a query expression and
generates a corresponding stack of
QueryElement
s.
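To make the postfix idea concrete, here's a minimal sketch of how a query handler might evaluate a "where" stack against one record. It doesn't use the real OODT classes: `Elem` is a hypothetical stand-in for `QueryElement`, the record is just a map of numeric fields, and the `LOGOP` values are assumed to be spelled `AND`, `OR`, and `NOT`. The element order within a triple (element name, literal, relational operator) follows the command-line output shown later in this tutorial.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a QueryElement: just a role and a value. */
final class Elem {
    final String role;
    final String value;
    Elem(String role, String value) { this.role = role; this.value = value; }
}

final class WhereStackSketch {
    /** Evaluates a postfix "where" list against one record of numeric fields. */
    static boolean matches(List<Elem> where, Map<String, Double> record) {
        Deque<Object> stack = new ArrayDeque<>();
        for (Elem e : where) {
            switch (e.role) {
                case "elemName":
                    // A missing field becomes NaN, so ordered comparisons on it are false.
                    stack.push(record.getOrDefault(e.value, Double.NaN));
                    break;
                case "LITERAL":
                    stack.push(Double.valueOf(e.value));
                    break;
                case "RELOP": {
                    double literal = (Double) stack.pop();
                    double field = (Double) stack.pop();
                    stack.push(compare(field, e.value, literal));
                    break;
                }
                case "LOGOP": {
                    if (e.value.equals("NOT")) {
                        boolean operand = (Boolean) stack.pop();
                        stack.push(!operand);
                    } else {
                        boolean right = (Boolean) stack.pop();
                        boolean left = (Boolean) stack.pop();
                        stack.push(e.value.equals("AND") ? (left && right) : (left || right));
                    }
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown role: " + e.role);
            }
        }
        return (Boolean) stack.pop();
    }

    /** Applies one relational operator; LIKE and NOTLIKE are left out of this sketch. */
    static boolean compare(double field, String relop, double literal) {
        switch (relop) {
            case "EQ": return field == literal;
            case "NE": return field != literal;
            case "LT": return field < literal;
            case "LE": return field <= literal;
            case "GT": return field > literal;
            case "GE": return field >= literal;
            default: throw new IllegalArgumentException("Unsupported relop: " + relop);
        }
    }

    public static void main(String[] args) {
        // Postfix form of: temperature > 36 AND latitude < 45
        List<Elem> where = Arrays.asList(
            new Elem("elemName", "temperature"), new Elem("LITERAL", "36"), new Elem("RELOP", "GT"),
            new Elem("elemName", "latitude"), new Elem("LITERAL", "45"), new Elem("RELOP", "LT"),
            new Elem("LOGOP", "AND"));
        Map<String, Double> record = new HashMap<>();
        record.put("temperature", 40.0);
        record.put("latitude", 30.0);
        System.out.println(matches(where, record)); // prints "true"
    }
}
```

A real handler would more likely translate the stack into its own back-end query (SQL, for instance) rather than test records one at a time, but the walk over the list is the same.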
Let's look at a couple examples. The expression
generates the "where" stack
While the expression
artist = Bach AND NOT album = Poem OR track != Aria
generates the "where" stack
A special element is reserved by XMLQuery:
RETURN
. It's used to indicate what to select,
and so any value specified with
RETURN
goes
into the "select" set, not the "where" set.
Moreover, the
RETURN
element doesn't pay
attention to how it's linked with boolean expressions in the
rest of query, or what relational operator is used with the
literal value being returned. For example, that means
all
of the following expressions would generate
identical
XMLQueries:
specimen = Blood AND RETURN = volume
specimen = Blood OR RETURN = volume
specimen = Blood AND RETURN != volume
specimen = Blood AND RETURN < volume
specimen = Blood AND RETURN LIKE volume
All
QueryElements
from RETURN triples would go
into the "select" instead of the "where" set.
To construct a query, you'll use a Java constructor of the
following form:
XMLQuery(String keywordQuery, String id, String title,
String desc, String ddId, String resultModeId, String propType,
String propLevels, int maxResults, java.util.List mimeAccept,
boolean parseQuery)
The parameters are summarized below:
| Parameter | Purpose | Sample values |
| keywordQuery | A string representing your query expression, in the query language described above, or in some other application-specific language. | numDonuts = 3, select volume_remaining from specimens where specimen_type = 4 |
| id | An identifier for your query | query-1, 1.3.6.1.1316.4.1, myQuery, urn:ibm:sys:0x39ad930a |
| title | A title for your query | My First Query, Query for Blood Specimens, Simpson's Query |
| desc | Description of the query | H.J. Simpson is looking for donut shops |
| ddId | Data dictionary ID. This identifies the data dictionary that provides definitions for the elements used in the query like "specimen" or "numDonuts". It's not used by any current OODT deployment or the OODT framework. | null |
| resultModeId | Identifies what to return from the query. Defaults to ATTRIBUTE. Not used by any current OODT deployment or the OODT framework. | null |
| propType | How to propagate the query, defaults to BROADCAST. It's not used by any current OODT deployment or the OODT framework. | null |
| propLevels | How far to propagate the query, defaults to N/A. Not used by any current OODT deployment or the OODT framework. | null |
| maxResults | At most how many results to return; not enforced by OODT framework. | 1, 100, Integer.MAX_VALUE, -6 |
| mimeAccept | List of acceptable MIME types for returned products, defaults to */* | List types = new ArrayList(); types.add("text/xml"); types.add("text/html"); types.add("text/*"); |
| parseQuery | Should the class parse the query as a boolean expression? True says to generate the boolean expression stacks. False says to just save the expression string. | true, false |
All of the values above can be set to
null
to
use a default or non-specific value (except for
maxResults
and
parseQuery
, which are
int
and
boolean
types and can't be
assigned
null
). For most applications, using
null
is perfectly acceptable. Since the OODT
framework doesn't use
maxResults
, you can use any
value. However, specific profile servers' and product
servers' query handlers may pay attention to this value if so
programmed.
The last parameter,
parseQuery
, tells if you
want the
XMLQuery
class to parse your query and
generate boolean expression stacks (discussed above) or not.
Set to
true
, the class will parse the string as
if in the XMLQuery language described above, and will generate
the "from", "select", and "where" element boolean stacks. Set
it to
false
and the class won't parse the string
or generate the stacks. It will instead store the string for
later use by a profile server's or product server's query
handler.
For example, if you pass in the XMLQuery language
expression
donutsEaten > 5 AND RETURN = episodeNumber
then set the
parseQuery
flag to
true
. As another example, suppose the
query expression is
select episodeNumber from episodes where donutsEaten > 5
This is an SQL expression, probably targeted to a product
server that can handle SQL expressions. In this case, set
parseQuery
to false.
The current OODT deployments for the Planetary Data System
and the Early Detection Research Network both use
parsed
queries.
Internet standards for mail, web, and other applications
use
MIME
types (described in
RFC-2046
amongst other documents) to describe the content and media
type of data. So does OODT. When you construct an
XMLQuery
, you can also pass in a list of MIME
types that are acceptable to you for the format of any
returned products, much in the same way your web browser
tells a web server what media types it can display.
The list of acceptable MIME types is only used for product
queries since products can come in any shape and flavor.
Profile queries ignore the list; profiles are always
returned as a list of Java
jpl.eda.profile.Profile
objects.
You've probably seen MIME types before, but here are some
examples in case you haven't:
-
text/plain
- a plain old text file
-
text/html
- a hypertext document
-
image/jpeg
- a picture in the JPEG/JFIF format
-
image/gif
- a picture in the GIF format
-
audio/mpeg
- an audio file, probably in the MP3 format
-
video/mpeg
- a video file, probably in the MPEG-1 or MPEG-2 format
-
application/msword
- a Micro$oft Word document
-
application/octet-stream
- binary data
In the
XMLQuery
constructor, you can pass in a
list of MIME types that shows your
preference
for
returned products. Product servers' query handlers examine
the query to see if they can provide a matching product,
and
they examine the list of MIME types to see if
they can provide matching products in the format you desire.
As an example, suppose you create a MIME type list as follows:
List acceptableTypes = new ArrayList();
acceptableTypes.add("image/tiff");
acceptableTypes.add("image/png");
acceptableTypes.add("image/jpeg");
and you pass
acceptableTypes
as the
mimeAccept
parameter of the
XMLQuery
constructor. This tells query
handlers receiving your query that you'd really prefer a
TIFF format image. However, failing that, you'll accept a
PNG format image. And, as a last resort, a JPEG will do.
You can also use wildcards in your MIME types. Suppose we
did the following:
List acceptableTypes = new ArrayList();
acceptableTypes.add("image/tiff");
acceptableTypes.add("image/png");
acceptableTypes.add("image/*");
Now we tell query handlers in product servers that we
really prefer TIFF format images. If a query handler can't
do that, then a PNG format will be OK. And if a query
handler can't do PNG, then
any
image
format will be
fine, even loathsome GIF.
If you pass a
null
or an empty list in the
mimeAccept
parameter, the OODT framework will
convert it into a single-item list:
*/*
, meaning
any format is acceptable.
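The preference order described above can be sketched in a few lines. This is not the OODT framework's own code, just one way a product server's query handler could pick a format: walk the acceptable types in order and return the first one the handler can actually produce. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class MimePreferenceSketch {
    /** True if a concrete type such as "image/gif" satisfies a pattern such as "image/*". */
    static boolean accepts(String pattern, String type) {
        if (pattern.equals("*/*") || pattern.equalsIgnoreCase(type)) {
            return true;
        }
        if (pattern.endsWith("/*")) {
            // Keep the major type and the slash, for example "image/".
            String prefix = pattern.substring(0, pattern.length() - 1);
            return type.regionMatches(true, 0, prefix, 0, prefix.length());
        }
        return false;
    }

    /**
     * Returns the available type that matches the earliest acceptable pattern,
     * or null if nothing matches. A null or empty list is treated as "any format".
     */
    static String choose(List<String> acceptable, List<String> available) {
        List<String> prefs = (acceptable == null || acceptable.isEmpty())
                ? Arrays.asList("*/*") : acceptable;
        for (String pattern : prefs) {
            for (String type : available) {
                if (accepts(pattern, type)) {
                    return type;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> acceptableTypes = Arrays.asList("image/tiff", "image/png", "image/*");
        // This handler can only produce GIF and PNG; PNG wins because it's listed before image/*.
        System.out.println(choose(acceptableTypes, Arrays.asList("image/gif", "image/png")));
    }
}
```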
The
XMLQuery
class is also an executable class.
By running it from the command-line, you can see how it
generates its XML representation. It also lets you pass in a
file containing an XML representation of an XMLQuery and
parses it for validity.
Let's try just seeing that XML representation. (In these
examples, we'll be using a Unix
csh
-like
command environment. Users of other shells and non-Unix systems will
have to adjust.)
First up, we'll need two components:
-
EDM Common Components
. This
is needed by all OODT software; it contains general
utilities for starting servers, parsing XML, logging, and
more.
-
EDM Query Expression
. This
contains the
XMLQuery
and related classes.
Download the binary distribution of each of these packages
and extract their contents. Then, create a single directory
and collect the jar files together in one place:
% mkdir query
% cd query
% cp ~/edm-commons-2.2.5/*.jar .
% cp ~/edm-query-2.0.2/*.jar .
% ls -l
total 192
-rw-r--r-- 1 kelly kelly 149503 25 Feb 09:53 edm-commons-2.2.5.jar
-rw-r--r-- 1 kelly kelly 43879 25 Feb 09:53 edm-query-2.0.2.jar
To generate the query, pass the command-line argument
-expr
. That tells the XMLQuery that the rest
of the command line is the query expression. It will expect
it to be
in the XMLQuery query language (meaning that it
will create an
XMLQuery
object with
parseQuery
set to
true
).
Here's an example:
% java -Djava.ext.dirs=. \
jpl.eda.xmlquery.XMLQuery \
-expr donutsEaten \> 5 AND RETURN = episodeNumber
kwdQueryString: donutsEaten > 5 AND RETURN = episodeNumber
fromElementSet: []
results: jpl.eda.xmlquery.QueryResult[list=[]]
whereElementSet:
[jpl.eda.xmlquery.QueryElement[role=elemName,value=donutsEaten],
jpl.eda.xmlquery.QueryElement[role=LITERAL,value=5],
jpl.eda.xmlquery.QueryElement[role=RELOP,value=GT]]
selectElementSet:
[jpl.eda.xmlquery.QueryElement[role=elemName,value=episodeNumber]]
======doc string=======
<?xml version="1.0" encoding="UTF-8"?>
<query> . . .
The program prints out some fields of the XMLQuery such as
the "from" element set, the current results (which should
always be empty since we haven't passed this query to any
product servers), the "where" element set, and the "select"
element set. It then prints out the XML representation.
If you examine the XML representation closely, you'll see
things like the list of acceptable MIME types:
<queryMimeAccept>*/*</queryMimeAccept>
This says that any type is acceptable. You'll also see the
passed in query string:
<queryKWQString>donutsEaten > 5 AND
RETURN = episodeNumber</queryKWQString>
Regardless of whether you passed
true
or
false
in the
parseQuery
parameter,
the
XMLQuery
always saves the original query
string. For unparsed queries, this is how the string is
packaged on its way to a product server. For parsed
queries, product servers will use the boolean stacks.
(Since this was a parsed query, you'll also see the boolean
stacks in XML format if you look closely. They're there.)
Alert readers will have noticed that the results of a query
have a place in
XMLQuery
objects. This actually
applies to product queries only. After sending an
XMLQuery
to a product server, the query object
comes back adorned with zero or more matching results. You
then access the
XMLQuery
object methods to
retrieve those results.
The following class diagram demonstrates the relationship:
As you can see, a single query has a single
jpl.eda.xmlquery.QueryResult
, which contains a
java.util.List
of
jpl.eda.xmlquery.Result
objects.
Result
objects may have zero or more
Header
s, and
Result
objects may
actually be
LargeResult
objects.
To retrieve the list of
Result
objects, call the
XMLQuery
's
getResults
method, which
returns the
java.util.List
directly.
Each result also includes:
-
An identifier. If there's more than one matching
result, this identifier (a string) should be unique amongst
results.
-
A MIME type. This tells you what format the matching product is in.
-
A profile ID. This is currently unused.
-
A resource ID. This is also unused.
-
A validity period. This is the number of milliseconds for
which the product is considered valid. You can use this
information to decide how long to cache the product within
your own program before having to retrieve it again.
-
A flag indicating whether the product is classified.
Classified or secret products shouldn't be cached or should
otherwise be handled carefully by your application program.
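As an illustration of how the validity period and the classified flag might drive caching in your own program, here's a small sketch. It's entirely hypothetical: the `ResultCacheSketch` class is not part of OODT, and the caller is assumed to have already pulled the identifier, validity period, and classified flag out of a result.

```java
import java.util.HashMap;
import java.util.Map;

final class ResultCacheSketch {
    /** One cached product and the clock time (in milliseconds) at which it expires. */
    private static final class Entry {
        final byte[] data;
        final long expiresAt;
        Entry(byte[] data, long expiresAt) { this.data = data; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Caches a product unless it's classified or has no positive validity period. */
    void put(String id, byte[] data, long validityMillis, boolean classified, long now) {
        if (classified || validityMillis <= 0) {
            return;
        }
        // Guard against overflow when the validity period is very large.
        long expiresAt = validityMillis > Long.MAX_VALUE - now
                ? Long.MAX_VALUE : now + validityMillis;
        entries.put(id, new Entry(data, expiresAt));
    }

    /** Returns the cached data, or null if it was never cached or has expired. */
    byte[] get(String id, long now) {
        Entry e = entries.get(id);
        if (e == null) {
            return null;
        }
        if (now >= e.expiresAt) {
            entries.remove(id);
            return null;
        }
        return e.data;
    }

    public static void main(String[] args) {
        ResultCacheSketch cache = new ResultCacheSketch();
        long now = System.currentTimeMillis();
        cache.put("result-1", new byte[] {1, 2, 3}, 60000L, false, now);
        System.out.println(cache.get("result-1", now + 1000L) != null);   // still valid
        System.out.println(cache.get("result-1", now + 60000L) != null);  // expired
    }
}
```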
The headers of a result are optional. They're used for
tabular style results to indicate column headings. Each
Header
object captures three strings: a name, a
data type, and units.
For example, suppose you retrieved a product that was a
table of temperatures at various locations on the Earth.
There might be three headers in the headers list:
| List Index | Name | Data Type | Units |
| 0 | latitude | float | degrees |
| 1 | longitude | float | degrees |
| 2 | temperature | float | kelvins |
Suppose the product you get back is a picture of a tissue
specimen. In this case, there would be
no
headers.
To retrieve the actual data comprising your product, call
the
Result
object's
getInputStream
method. This returns a standard
java.io.InputStream
that lets you access the
data. How you interpret that data, though, depends on the
MIME type of the product, which you can get by calling the
Result
's
getMIMEType
method.
For example, if the MIME type was
text/plain
,
then the byte stream would be a sequence of Unicode
characters. If it were
image/jpeg
, then the
bytes would be image data in JPEG/JFIF format.
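Here's a rough sketch of that last step: reading a stream to the end and interpreting the bytes according to the MIME type. To keep it self-contained it works on any `java.io.InputStream` rather than a real `Result`, the class name is hypothetical, and it assumes text products are encoded as UTF-8, which your product server may or may not guarantee.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ResultDataSketch {
    /** Reads the whole stream into memory; fine for small products only. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /**
     * Returns a String for text/* products (UTF-8 is an assumption of this sketch)
     * and the raw bytes for everything else, such as image/jpeg.
     */
    static Object interpret(String mimeType, InputStream in) {
        try {
            byte[] bytes = readAll(in);
            if (mimeType.toLowerCase().startsWith("text/")) {
                return new String(bytes, StandardCharsets.UTF_8);
            }
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the stream you'd get from a result's getInputStream method.
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(interpret("text/plain", in)); // prints "hello"
    }
}
```

In a real client you'd pass the value from `getMIMEType` and the stream from `getInputStream`, and you'd probably stream large products to disk instead of buffering them in memory.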
In this tutorial, we learned about the structure of the
standard query component in OODT, the
XMLQuery
.
We saw the query language that XMLQuery supports and how it
generates postfix boolean expression stacks. You can also
encode any query expression by using a special constructor
argument that tells XMLQuery to not parse the query string.
We also executed the
XMLQuery
class directly.
Finally, we saw how product data is embedded in the XMLQuery
and how to deal with such results.
As a client of the OODT framework, you can now create
XMLQuery
objects to query product servers from
within your Java applications. As a server in the framework,
you know how to deal with incoming query objects.
|