Title: CQL: a Common Query Language
- What CQL is
- Motivation
- Examples and explanation
- Applications
- Implementation
CQL: a Common Query Language
Mike Taylor <mike_at_indexdata.com>
Chapter 1: What CQL is
- CQL is a query language
  - For humans to type
  - For query forms to generate
  - For translating other languages into
- The only query language of SRW/SRU
- Also applicable in other contexts
  - Z39.50 (instead of the Type-1 Query)
  - Vendor-neutral format for Metasearch
Specifications and implementations
- CQL is a specification for expressing queries abstractly.
  - You don't need to know the database schema.
- It has to be parsed by a CQL parser.
  - The parser produces a form that is easy to program with.
- It has to be executed by some specific database engine.
  - Implementations will vary in what they support.
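As a rough illustration of the parse-then-execute split, here is a minimal sketch of the second half: turning an already-parsed query into something one particular engine can run. The tree shape (nested `(operator, left, right)` tuples), the `SQL_OPS` table, the `to_sql` function and the single `text` column are all assumptions made for this example; CQL itself does not define any of them.

```python
# Hypothetical mapping from CQL booleans to SQL; "not" is and-not in CQL.
SQL_OPS = {"and": "AND", "or": "OR", "not": "AND NOT"}


def to_sql(node):
    """Translate a parsed query tree into a SQL WHERE fragment plus parameters.

    A node is either a plain string (a search term) or a tuple
    (operator, left, right). The column name "text" is an assumption.
    """
    if isinstance(node, str):
        # Use a bound parameter rather than pasting the term into the SQL.
        return "text LIKE ?", ["%" + node + "%"]
    op, left, right = node
    left_sql, left_params = to_sql(left)
    right_sql, right_params = to_sql(right)
    sql = "(" + left_sql + " " + SQL_OPS[op] + " " + right_sql + ")"
    return sql, left_params + right_params


if __name__ == "__main__":
    print(to_sql(("not", "dinosaur", "reptile")))
```

A different back end (Z39.50, a full-text engine) would walk the same tree and emit its own query form, which is why what is supported varies between implementations.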
Chapter 2: Motivation
- Most query languages fall into one of two camps:
  - Complex and powerful, but cryptic and hard to learn
    - SQL, Prefix Query Format (PQF), XML Query
  - Easy to learn and use, but lacking in power
    - Google, AltaVista, CCL
- CQL aims to make simple queries easy, and complex queries possible (to paraphrase Larry Wall, of Perl)
Learning curves for query languages
[Figure: curves for SQL, CQL and Google, relating "Effort in learning query language" to "Power of query that can be expressed".]
Chapter 3: Examples and explanation
CQL features: simple terms
- Here are some perfectly good CQL queries:
  - fish
  - Churchill
  - dinosaur
  - comp.sources.misc
CQL features: quoting
- Double-quote marks remove the special meanings of special characters like space (which otherwise separates tokens) and of keywords such as and and or.
  - "dinosaur"
  - "the complete dinosaur"
  - "ext->u.generic"
  - "and"
- (Backslash removes the special meaning of a following double-quote character.)
  - "the \"nuxi\" problem"
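The quoting rules can be sketched as a small tokenizer. This is only an illustration, not the lexer of any real CQL parser: the function name `tokenize` and the `(text, quoted)` result shape are invented here, and parentheses and relation symbols are deliberately ignored.

```python
from typing import List, Tuple


def tokenize(query: str) -> List[Tuple[str, bool]]:
    """Split a query into (text, was_quoted) pairs.

    Spaces separate tokens unless they appear inside double quotes, and a
    backslash before a double quote keeps that quote as part of the token.
    """
    tokens = []
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            i += 1  # skip the opening quote
            buf = []
            while i < n and query[i] != '"':
                if query[i] == "\\" and i + 1 < n and query[i + 1] == '"':
                    i += 1  # drop the backslash, keep the quote that follows
                buf.append(query[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted string")
            i += 1  # skip the closing quote
            tokens.append(("".join(buf), True))
        else:
            start = i
            while i < n and not query[i].isspace() and query[i] != '"':
                i += 1
            tokens.append((query[start:i], False))
    return tokens


if __name__ == "__main__":
    print(tokenize('"the complete dinosaur"'))
    print(tokenize(r'"the \"nuxi\" problem"'))
```

Recording whether a token was quoted is what lets a later parsing stage treat "and" as a search term rather than as the boolean keyword.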
CQL features: booleans
- The keywords and and or are boolean operators.
- The keyword not is an and-not binary operator. There is no unary negation operator.
- Case is not significant, so AND and aNd also work.
  - dinosaur or bird
  - dinosaur not reptile
  - dinosaur and bird and reptile
  - dinosaur and bird or dinobird
  - dinosaur not theropod not ornithischian
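One way to picture the three booleans is as set operations over the record IDs that each term matches. The sketch below assumes a hypothetical in-memory index, `postings`, mapping each term to a set of record IDs; the function name `evaluate` is likewise invented for illustration, and only unquoted single-word terms are handled.

```python
def evaluate(query, postings):
    """Evaluate 'term op term op term ...' against a term -> set-of-IDs map."""
    tokens = query.split()
    result = set(postings.get(tokens[0], set()))
    # Walk the (operator, term) pairs that follow the first term.
    for op, term in zip(tokens[1::2], tokens[2::2]):
        hits = postings.get(term, set())
        op = op.lower()  # keywords are case-insensitive
        if op == "and":
            result = result & hits
        elif op == "or":
            result = result | hits
        elif op == "not":
            result = result - hits  # and-not: keep records lacking the term
        else:
            raise ValueError("unknown boolean: " + op)
    return result


if __name__ == "__main__":
    postings = {"dinosaur": {1, 2, 3}, "bird": {2, 3, 4}, "reptile": {3, 5}}
    print(evaluate("dinosaur not reptile", postings))
```

Because not is a set difference, it always needs a left-hand operand, which is why there is no unary negation.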
CQL features: boolean precedence
- The and, or and not booleans all have equal precedence and are evaluated left-to-right.
  - dinosaur and bird or dinobird
    MEANS
    (dinosaur and bird) or dinobird
  - dinosaur or bird and dinobird
    MEANS
    (dinosaur or bird) and dinobird
    NOT
    dinosaur or (bird and dinobird)
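The equal-precedence, left-to-right rule amounts to a simple left fold over the operators. The helper below (the name `group_left_to_right` is made up for this sketch) makes the implied grouping visible by adding explicit parentheses; it assumes an unparenthesised query of single-word terms.

```python
def group_left_to_right(query: str) -> str:
    """Show the grouping implied by equal precedence and left-to-right evaluation."""
    tokens = query.split()
    grouped = tokens[0]
    # Each operator binds everything seen so far to the next term.
    for op, term in zip(tokens[1::2], tokens[2::2]):
        grouped = "(" + grouped + " " + op + " " + term + ")"
    return grouped


if __name__ == "__main__":
    print(group_left_to_right("dinosaur and bird or dinobird"))
    print(group_left_to_right("dinosaur or bird and dinobird"))
```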
CQL features: parentheses
- Parentheses may be used to override the default left-to-right parsing of boolean operators.
  - dinosaur and (bird or dinobird)
  - dinosaur or (bird and dinobird)
  - (bird or dinosaur) and (feathers or scales)
  - "feathered dinosaur" and (yixian or jehol)
  - (((a and b) or (c not d) not (e or f and g)) and h not i) or j
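A small recursive-descent parser is enough to show how parentheses override the default grouping. This is a sketch for the boolean subset only, assuming the tree is represented as nested `(operator, left, right)` tuples; the name `parse` and that tree shape are choices made for this example, not the interface of any of the real CQL parsers, and backslash escapes inside quotes are not handled.

```python
import re

BOOLEANS = {"and", "or", "not"}


def parse(query: str):
    """Parse terms, booleans and parentheses into nested (op, left, right) tuples."""
    # Tokens: "(" or ")" or a double-quoted string or a run of other characters.
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', query)
    pos = 0

    def parse_expr():
        nonlocal pos
        left = parse_operand()
        # Equal precedence: fold each operator onto everything parsed so far.
        while pos < len(tokens) and tokens[pos].lower() in BOOLEANS:
            op = tokens[pos].lower()
            pos += 1
            right = parse_operand()
            left = (op, left, right)
        return left

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return inner
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok.strip('"')

    tree = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return tree


if __name__ == "__main__":
    print(parse("dinosaur and (bird or dinobird)"))
    print(parse("dinosaur and bird or dinobird"))
```

A parenthesised group is parsed as a complete sub-expression before the surrounding operator is applied, so it simply becomes one operand of the left-to-right fold.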
CQL features: pattern matching
- There are two pattern-matching characters:
  - * matches any number of characters
  - ? matches any single character
  - dinosaur* matches dinosaurs, dinosauria
  - *sauria matches dinosauria, carnosauria
  - man?raptor matches maniraptor, manuraptor
  - man?raptor* matches the plurals of these
  - "comp* *saur" matches complete dinosaur
- A preceding backslash removes their special meaning.
  - char\* matches literal char*
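To make the two pattern-matching characters concrete, here is a minimal sketch that converts a pattern using * (any number of characters) and ? (any single character) into a Python regular expression, with backslash escaping either character. The function name `cql_pattern_to_regex` is invented for this example, and real servers are free to implement masking quite differently.

```python
import re


def cql_pattern_to_regex(pattern: str):
    """Compile a pattern using * and ? wildcards into a regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")  # any number of characters, including none
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


if __name__ == "__main__":
    regex = cql_pattern_to_regex("man?raptor*")
    print(bool(regex.fullmatch("maniraptors")))
```

Using `fullmatch` means the pattern must account for the whole value, so man?raptor on its own does not match the plural forms.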
CQL features: indexes
- A term of the form name=value is a query for the specified value occurring within the named index.
  - title=Churchill finds biographies of Churchill
  - author=Churchill finds books written by him
  - title=dinosaur and author=farlow
  - title=(dinosaur and bird)
  - subject=(dinosaur or pterosaur)
- Index names are case-insensitive, so title is the same index as TITLE, Title or tiTLe.
CQL features: prefixes
- The meaning of an index can be specified more fully by a prefix indicating what context set it is from. The meaning of title is different in cross-domain searching (Dublin Core), bibliographic searching (Bath Profile) and heraldry.
  - dc.title="the complete dinosaur"
  - property.title=freehold
  - heraldry.title=(viscount or duke)
  - cql.serverChoice=fruit
  - cql.resultSet=YXJjaGJpc2hvcAp
- Prefixes are case-insensitive.
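A sketch of how software might normalise index names: split off the optional context-set prefix and fold both parts to lower case, so that differently-cased spellings compare equal. The function name `split_index` is hypothetical, and lower-casing is just one convenient way to get case-insensitive comparison.

```python
from typing import Optional, Tuple


def split_index(name: str) -> Tuple[Optional[str], str]:
    """Return (prefix, index) in lower case; prefix is None when absent."""
    name = name.lower()  # prefixes and index names are case-insensitive
    if "." in name:
        prefix, index = name.split(".", 1)
        return prefix, index
    return None, name


if __name__ == "__main__":
    print(split_index("DC.Title"))
    print(split_index("tiTLe"))
```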
CQL features: context sets
A context set is a set of indexes that are
related to a particular area (plus some other
more esoteric stuff that you can ignore). For
example, the Dublin Core context set
contains indexes for searching against the
fifteen DC elements: title, creator, subject,
description, publisher, contributor, date, type,
format, identifier, source, language, relation,
coverage, rights. The prose of the context set
must define their semantics.
CQL features: some context sets
- A few core sets created by the SRW editorial board:
  - CQL: for core indexes such as resultSetId
  - DC: for metadata searching with Dublin Core
  - Rec: metadata about the record, not the resource
  - Net: network concepts such as host-name and port
- Also, many application-specific sets:
  - Bath, Zthes, CCG, Music
  - Rel: deep voodoo for relevance matching
  - GILS and GEO are in development
A digression on the CQL context set
- The CQL context set is special. It contains some magic indexes:
  - cql.anywhere searches in all the indexes available
  - cql.serverChoice allows the server to choose whatever index or indexes are suitable
  - cql.resultSetId finds the records obtained in a previous search, e.g. for refinement by combining with other query terms.
CQL features: relations
- Usually = connects an index with its term, but all the other obvious numeric relations are supported:
  - Height = 13
  - numberOfWheels < 3
  - numberOfPlates = 18
  - lengthOfFemur > 2.4
  - BioMass > 100
  - NumberOfToes <> 3 (inequality)
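The numeric relations map naturally onto ordinary comparison functions. The sketch below assumes a record is a plain dictionary of numeric fields; the names `RELATIONS` and `matches` are made up for this illustration and do not come from any CQL implementation.

```python
import operator

# Relation symbols and the comparison each one stands for.
RELATIONS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "<>": operator.ne,  # inequality
}


def matches(record: dict, index: str, relation: str, value: float) -> bool:
    """Test one 'index relation value' clause against a record's fields."""
    # Index names are case-insensitive, so compare them in lower case.
    fields = {key.lower(): val for key, val in record.items()}
    if index.lower() not in fields:
        return False
    return RELATIONS[relation](fields[index.lower()], value)


if __name__ == "__main__":
    record = {"height": 13, "numberOfToes": 3, "lengthOfFemur": 2.9}
    print(matches(record, "Height", "=", 13))
    print(matches(record, "NumberOfToes", "<>", 3))
```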
CQL features: special relations
- The keywords any and all can be used as relations, indicating that any one of, or all of, the words specified in the term must be found in the index.
  - author all "kernighan ritchie"
    shorthand for
    author=kernighan and author=ritchie
  - author any "kernighan ritchie thompson"
    shorthand for
    author=kernighan or author=ritchie or author=thompson
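The shorthand can be spelled out mechanically: split the term into words and join one index=word clause per word with and (for all) or or (for any). The function name `expand` is invented for this sketch, which handles only these two relations.

```python
def expand(index: str, relation: str, term: str) -> str:
    """Rewrite an 'any' or 'all' clause as an explicit boolean query."""
    joiners = {"all": " and ", "any": " or "}
    joiner = joiners[relation.lower()]  # keywords are case-insensitive
    return joiner.join(index + "=" + word for word in term.split())


if __name__ == "__main__":
    print(expand("author", "all", "kernighan ritchie"))
    print(expand("author", "any", "kernighan ritchie thompson"))
```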
CQL features: whole-field searching
- The keyword exact can be used as a relation, indicating a search for the value of a whole field rather than words within it.
  - title=jaws
    finds Jaws and The Jaws of Fate.
  - title exact jaws
    finds Jaws but NOT The Jaws of Fate.
  - title exact "The Jaws of Fate"
    finds The Jaws of Fate but NOT Jaws.
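The difference between a word search and a whole-field search can be sketched with two small predicates. Both function names are hypothetical, and the case folding is an assumption made so that jaws finds Jaws as in the examples; a real server decides for itself how fields are normalised.

```python
def word_match(field: str, term: str) -> bool:
    """True when the term occurs as one of the words within the field."""
    return term.lower() in field.lower().split()


def exact_match(field: str, term: str) -> bool:
    """True only when the term equals the value of the whole field."""
    return field.lower() == term.lower()


if __name__ == "__main__":
    titles = ["Jaws", "The Jaws of Fate"]
    print([t for t in titles if word_match(t, "jaws")])
    print([t for t in titles if exact_match(t, "jaws")])
```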
Chapter 4: Applications
- CQL has been deployed in many kinds of application:
  - Google-like structureless searching
  - Simple metadata searching with the Dublin Core
  - Bath Profile for bibliographic data
  - Zthes profile for hierarchical thesaurus navigation
  - CCG for collectable card games
  - Music: musicalKey, arranger, duration, etc.
  - GILS (Global Information Locator Service)
  - ... your application goes here!
Chapter 5: Implementations
- There are good-quality free CQL implementations in several important languages:
  - Java (Mike Taylor's CQL-Java package)
  - C/C++ (Adam Dickmeiss in Index Data's YAZ)
  - Python (Rob Sanderson in Cheshire)
  - Perl (Ed Summers' CQL::Parser module)
  - Visual Basic is in development (Thomas Habing)
  - ... your language goes here!
Conclusion: What to take home
- CQL makes easy queries easy and hard ones possible
- You can use it well without learning the hard bits
- It is used in SRW/SRU but also applicable elsewhere
- It is extensible through context sets
- Existing context sets support lots of applications
- There are free implementations in several languages
- Tutorial on-line at http://zing.z3950.org/cql/intro.html