Oracle Database 11g New Search Features and Roadmap - PowerPoint PPT Presentation

About This Presentation

Description:

This presentation contains information proprietary to Oracle Corporation. Last modified by: kiss. Created: 9/8/2004 11:34:22 PM.

Slides: 28

Provided by: peopleIn6

Transcript and Presenter's Notes

1
(No Transcript)
2
Oracle Database 11g New Search Features and Roadmap

Roger Ford
Senior Principal Product Manager

3
Contents

Oracle's Search Products
Oracle Text 11g New Features
Oracle Text 11.2.0.2 New Features
Entity Extraction
Name Search
Result Set Interface
Search Product Roadmap
Oracle Text
Secure Enterprise Search

4
Oracle's Search Products

Oracle Text
A SQL- and PL/SQL-based toolkit for creating full-text search applications
Free with all database versions
Previously known as ConText Option and interMedia Text
Secure Enterprise Search
A complete search product built on Oracle Text capabilities
Crawlers for data sources such as the web, email, document repositories, and databases
End-user query application, plus APIs for embedding search in other applications

5
Oracle Text 11g New Features

Composite Domain Indexes and SDATA sections
Allow storage of structured information (e.g. numbers, dates) within the text index
Make mixed text/structured queries much faster
Auto Lexer
Automatic language recognition
Segmentation and stemming for 32 languages
Context-sensitive stemming for 23 of these languages
Offline and time-limited index creation
Enables indexes to be rebuilt offline during quiet periods, for true 24x7 operation

6
Demo: Auto Lexer
7
11.2.0.2 New Features - Summary

Entity Extraction
Find entities such as people, countries, cities, states, zip codes, phone numbers, etc. in the text
Use the default dictionary and rules, or define your own dictionary and rules based on regular expressions
Name Search (NDATA sections)
Inexact searches that cope with misspellings, segmentation errors, contractions, and word reversal
Useful for many searches, but particularly good for names
Result Set Interface
Query request in XML, results returned as XML
Avoids the SQL layer and the requirement to work within SELECT semantics

8
Entity Extraction

Identify names, places, dates, times, etc.
Tag each occurrence with a type and subtype
Entities are defined by a DICTIONARY and RULES
Implemented by the CTX_ENTITY package
create_extract_policy: create a policy to which you can add extraction rules
Choose whether or not to use the built-in rules and dictionary
add_extract_rule: create an XML-based rule to define an entity
add_stop_entity: prevent defined entities from being extracted
compile: build the policy with its rules
extract: get an XML-based list of entities for a document
The ctxload utility can also be used to load a user dictionary

9
Demo: Entity Extraction
10
Entities: built-in types

building
city
company
country
currency
date
day
email_address
geo_political
holiday
location_other
month
non_profit
organization_other

percent
person_jobtitle
person_name
person_other
phone_number
postal_address
product
region
ssn
state
time_duration
tod
url
zip_code
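The slides give two naming rules: the built-in types listed above, and user-defined types, which must start with an "x" (as in xContinent and xIndex). A hypothetical client-side pre-flight check that applies exactly those two rules might look like this; it assumes type names are case-sensitive, which the slides do not state.

```python
# Built-in entity types, copied from the slide's list.
BUILT_IN_TYPES = frozenset("""
    building city company country currency date day email_address
    geo_political holiday location_other month non_profit
    organization_other percent person_jobtitle person_name person_other
    phone_number postal_address product region ssn state time_duration
    tod url zip_code
""".split())


def classify_entity_type(name: str) -> str:
    """Return 'built-in', 'user-defined' or 'invalid' for an entity type name."""
    if name in BUILT_IN_TYPES:
        return "built-in"
    # Assumption: the "x" prefix rule is applied case-sensitively.
    if name.startswith("x") and len(name) > 1:
        return "user-defined"
    return "invalid"


print(len(BUILT_IN_TYPES))                 # 28
print(classify_entity_type("city"))        # built-in
print(classify_entity_type("xContinent"))  # user-defined
print(classify_entity_type("Continent"))   # invalid
```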

11
Entity Extraction Example 1: Defaults

ctx_entity.create_extract_policy('mypolicy');
ctx_entity.compile('mypolicy');
ctx_entity.extract('mypolicy', mydoc, mylang, myresults);
Output in myresults:
<entities>
  <entity id="0" offset="75" length="8" source="SuppliedDictionary">
    <text>New York</text>
    <type>city</type>
  </entity>
  <entity id="1" offset="55" length="16" source="SuppliedRule">
    <text>Hupplewhite Inc.</text>
    <type>company</type>
  </entity>
</entities>
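On the client, the CLOB returned in myresults is plain XML. This minimal Python sketch, using only the standard library, parses it into records. The sample string is the output from this slide. Sorting by offset is a choice made here, because the slide lists id 0 (offset 75) before id 1 (offset 55).

```python
import xml.etree.ElementTree as ET

# Sample output from the slide (the contents of "myresults").
MYRESULTS = """<entities>
  <entity id="0" offset="75" length="8" source="SuppliedDictionary">
    <text>New York</text>
    <type>city</type>
  </entity>
  <entity id="1" offset="55" length="16" source="SuppliedRule">
    <text>Hupplewhite Inc.</text>
    <type>company</type>
  </entity>
</entities>"""


def parse_entities(xml_text: str) -> list[dict]:
    """Turn the entity XML into a list of dicts, sorted by document offset."""
    root = ET.fromstring(xml_text)
    entities = []
    for e in root.findall("entity"):
        entities.append({
            "text": e.findtext("text"),
            "type": e.findtext("type"),
            "source": e.get("source"),
            "offset": int(e.get("offset")),
            "length": int(e.get("length")),
        })
    return sorted(entities, key=lambda d: d["offset"])


for ent in parse_entities(MYRESULTS):
    print(ent["offset"], ent["type"], ent["text"])
# 55 company Hupplewhite Inc.
# 75 city New York
```

The offset and length attributes let the application locate each entity in the original document, for example to highlight it.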

12
Entity Extraction Example 2: User rule

ctx_entity.create_extract_policy('mypolicy');
ctx_entity.add_extract_rule('mypolicy', 5,
  '<rule>
     <expression>((North|South)? America)</expression>
     <type refid="1">xContinent</type>
   </rule>');
ctx_entity.compile('mypolicy');
ctx_entity.extract('mypolicy', mydoc, mylang, myresults);
Note the parentheses around the whole expression: refid="1" means the entity text is taken from the first parenthesized subexpression, so the match is "North America" (or "South America") or just "America".
User-defined types must be prefixed with an "x", hence "xContinent".
<entities>
  <entity id="0" offset="75" length="13" source="UserRule">
    <text>North America</text>
    <type>xContinent</type>
  </entity>
</entities>
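The rule's expression is an ordinary regular expression, so its behaviour can be previewed outside the database. This sketch assumes Python's `re` module handles this simple pattern the same way Oracle's rule engine does, which has not been checked against Oracle's regex dialect. It mimics refid="1" by reporting capture group 1 together with its offset and length.

```python
import re

# The expression from the user rule, compiled with Python's re (assumption:
# same semantics as Oracle for this simple pattern).
RULE = re.compile(r"((North|South)? America)")


def extract_x_continent(doc: str) -> list[tuple[str, int, int]]:
    """Return (text, offset, length) of capture group 1 for every match."""
    return [(m.group(1), m.start(1), len(m.group(1))) for m in RULE.finditer(doc)]


doc = "North America and South America border the Atlantic; America is also a name."
for hit in extract_x_continent(doc):
    print(hit)
# ('North America', 0, 13)
# ('South America', 18, 13)
# (' America', 52, 8)
```

Under Python semantics the bare case captures a leading space, because the space is part of the literal " America". Wrapping the rule as `(((North|South) )?America)` would capture "America" alone. The slide does not say whether Oracle trims such whitespace.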

13
Entity Extraction: Adding a user dictionary

Create the file ud.xml:
<dictionary>
  <entities>
    <entity>
      <value>Dow Jones Industrial Average</value>
      <type>xIndex</type>
    </entity>
    <entity>
      <value>S&amp;P 500</value>
      <type>xIndex</type>
    </entity>
  </entities>
</dictionary>
Create the policy with ctxload (rules can be added later):
ctxload -user scott/tiger -extract -name pol1 -file ud.xml
Compile the policy:
ctx_entity.compile('pol1');
Results:
<entity id="69" offset="1010" length="7" source="UserDictionary">
  <text>S&amp;P 500</text>
  <type>xIndex</type>
</entity>
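When ud.xml is edited by hand, it is easy to forget that a bare `&` is not legal in XML text, so the S&P 500 entry has to be written as `S&amp;P 500` in the file. This generator sketch uses Python's standard `xml.etree.ElementTree`, which escapes the ampersand automatically. The element names follow the slide's ud.xml. Whether ctxload also expects an XML declaration or any other attributes is not covered here.

```python
import xml.etree.ElementTree as ET

# Entries from the slide: (value, user-defined type).
ENTRIES = [
    ("Dow Jones Industrial Average", "xIndex"),
    ("S&P 500", "xIndex"),
]


def build_dictionary(entries) -> str:
    """Build the ud.xml content; ElementTree escapes '&' as '&amp;'."""
    root = ET.Element("dictionary")
    ents = ET.SubElement(root, "entities")
    for value, etype in entries:
        ent = ET.SubElement(ents, "entity")
        ET.SubElement(ent, "value").text = value
        ET.SubElement(ent, "type").text = etype
    return ET.tostring(root, encoding="unicode")


xml_text = build_dictionary(ENTRIES)
print("S&amp;P 500" in xml_text)  # True
```

The resulting string can be written to ud.xml and then loaded with the ctxload command shown on the slide.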

14
Entity Extraction: other options

Extracting only certain entity types:
ctx_entity.extract('p1', mydoc, null, myresults, 'city,company,xContinent');
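Passing the type list to extract keeps unwanted entities out of the result altogether. If entities have already been extracted without a filter, the same selection can be applied on the client. This hypothetical helper accepts the same comma-separated list format. It works on records shaped like the entity XML from the earlier slides, and the sample records are illustrative only.

```python
def filter_by_types(entities: list[dict], type_list: str) -> list[dict]:
    """Keep entities whose type appears in a list like 'city,company,xContinent'."""
    wanted = {t.strip() for t in type_list.split(",") if t.strip()}
    return [e for e in entities if e["type"] in wanted]


# Illustrative records, using values from the earlier example slides.
entities = [
    {"text": "New York", "type": "city"},
    {"text": "Hupplewhite Inc.", "type": "company"},
    {"text": "North America", "type": "xContinent"},
    {"text": "S&P 500", "type": "xIndex"},
]
print([e["text"] for e in filter_by_types(entities, "city,company,xContinent")])
# ['New York', 'Hupplewhite Inc.', 'North America']
```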

15
Name Search

Searching names has many difficulties:
Spelling (Steven / Stephen)
Alternate names (Fred / Alfred, Chuck / Charles)
Transcription (converting from spoken to written form)
Transliteration (converting from one writing system to another)
Segmentation (Mary Jane / Maryjane)
First, middle, and last name classification
Name search does intelligent matching across all these issues
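To make these difficulties concrete, here is a toy matcher that handles several of them. It is only an illustration and is not Oracle's NDATA algorithm, which the slides do not describe. It strips accents to base letters through Unicode decomposition and maps nicknames through a small hypothetical alias table. Word reversal is handled by comparing sorted tokens, segmentation by comparing with the spaces removed, and minor misspellings by Levenshtein edit distance.

```python
import unicodedata

# Hypothetical nickname table, standing in for a real alternate-name thesaurus.
ALIASES = {"fred": "alfred", "chuck": "charles", "bill": "william"}


def normalize(name: str) -> list[str]:
    """Lowercase, strip diacritics (keep base letters), map nicknames."""
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return [ALIASES.get(t, t) for t in base.lower().split()]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only the previous row of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def names_match(a: str, b: str, max_edits: int = 2) -> bool:
    ta, tb = normalize(a), normalize(b)
    if sorted(ta) == sorted(tb):       # same tokens in any order (word reversal)
        return True
    ja, jb = "".join(ta), "".join(tb)  # ignore spacing (segmentation)
    return edit_distance(ja, jb) <= max_edits  # tolerate small misspellings


for pair in [("Steven", "Stephen"), ("Chuck", "Charles"), ("Mary Jane", "Maryjane"),
             ("Smith John", "John Smith"), ("José", "Jose"), ("Mary", "Robert")]:
    print(pair, names_match(*pair))
```

A fixed edit budget is crude. It over-matches very short names, so a real matcher would scale the tolerance with name length and score candidates instead of returning a yes/no answer.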

16
Demo: Name Search
17
NDATA section type

Basic implementation for name search
Limitations:
511 characters
255 whitespace-delimited terms
No offset information, therefore no highlighting / markup
No NEAR or phrase search with NDATA
Uses WORDLIST preference attributes:
NDATA_ALTERNATE_SPELLING
NDATA_BASE_LETTER
NDATA_THESAURUS (for alternate names; a default thesaurus is provided)
NDATA_JOIN_PARTICLES (a list such as 'de:du:mc:mac')
Query syntax:
NDATA(fieldname, search terms [, order] [, proximity])
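NDATA_JOIN_PARTICLES lists particles that are joined to the name part that follows them, so that "Mc Cartney" and "McCartney" can compare equal. The following is a toy model of that joining step. It assumes a colon-separated particle list, and the exact normalization Oracle applies is not shown on the slide.

```python
def join_particles(name: str, particles: str) -> list[str]:
    """Join each listed particle to the token that follows it (toy model)."""
    plist = {p.lower() for p in particles.split(":") if p}
    tokens = name.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # A particle followed by another token is merged with it.
        if tokens[i] in plist and i + 1 < len(tokens):
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


print(join_particles("Paul Mc Cartney", "de:du:mc:mac"))    # ['paul', 'mccartney']
print(join_particles("Daphne du Maurier", "de:du:mc:mac"))  # ['daphne', 'dumaurier']
```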

18
Result Set Interface

Some queries are difficult to express in SQL,
e.g. "Give me the top 5 hits in each category"
The result set interface takes a simple text query and an XML result set descriptor
The hitlist is returned as XML, according to the result set descriptor
Uses SDATA sections for:
Grouping
Counting
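Here is the "top 5 hits in each category" requirement written out in plain Python, using hypothetical (score, category, doc_id) tuples. It shows only the semantics of per-group ranking plus per-group counts. The point of the result set interface is that this grouping and counting is requested declaratively in the descriptor rather than written as SQL.

```python
import heapq
from collections import defaultdict


def top_n_per_group(hits, n):
    """hits: iterable of (score, group, doc_id).

    Returns ({group: [(score, doc_id), ...]}, {group: count}), keeping at most
    n hits per group in descending score order (on equal scores the larger
    doc_id comes first, because whole tuples are compared).
    """
    groups = defaultdict(list)
    for score, group, doc_id in hits:
        groups[group].append((score, doc_id))
    top = {g: heapq.nlargest(n, items) for g, items in groups.items()}
    counts = {g: len(items) for g, items in groups.items()}
    return top, counts


# Hypothetical hits: (score, category, doc_id)
hits = [(90, "sports", 1), (75, "sports", 2), (60, "sports", 3),
        (88, "news", 4), (40, "news", 5)]
top, counts = top_n_per_group(hits, 2)
print(top)     # {'sports': [(90, 1), (75, 2)], 'news': [(88, 4), (40, 5)]}
print(counts)  # {'sports': 3, 'news': 2}
```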

19
Result Set Example: Query

ctx_query.result_set('docidx', 'oracle',
  '<ctx_result_set_descriptor>
     <count/>
     <hitlist start_hit_num="1" end_hit_num="2" order="pubDate desc, score desc">
       <score/> <rowid/>
       <sdata name="author"/>
       <sdata name="pubDate"/>
     </hitlist>
     <group sdata="pubDate">
       <count/>
     </group>
     <group sdata="author">
       <count/>
     </group>
   </ctx_result_set_descriptor>', rs);

20
Result Set Output

<ctx_result_set>
  <hitlist>
    <hit>
      <score>3</score><rowid>AAAPoEAABAAAMWsAAC</rowid>
      <sdata name="AUTHOR">John</sdata>
      <sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
    </hit>
    <hit>
      <score>3</score><rowid>AAAPoEAABAAAMWsAAG</rowid>
      <sdata name="AUTHOR">John</sdata>
      <sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
    </hit>
  </hitlist>
  <count>100</count>

21
Result Set Output - Continued

  <groups sdata="PUBDATE">
    <group value="2001-01-01 00:00:00"><count>25</count></group>
    <group value="2001-01-02 00:00:00"><count>50</count></group>
    <group value="2001-01-03 00:00:00"><count>25</count></group>
  </groups>
  <groups sdata="AUTHOR">
    <group value="John"><count>50</count></group>
    <group value="Mike"><count>25</count></group>
    <group value="Steve"><count>25</count></group>
  </groups>
</ctx_result_set>
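An application that consumes the result set needs both the hitlist and the facet counts. This standard-library sketch reads the XML from the two output slides (values copied from them) and returns three things: a list of hit records, the total count, and a dictionary of group counts for each SDATA section.

```python
import xml.etree.ElementTree as ET

RESULT_SET = """<ctx_result_set>
  <hitlist>
    <hit><score>3</score><rowid>AAAPoEAABAAAMWsAAC</rowid>
      <sdata name="AUTHOR">John</sdata>
      <sdata name="PUBDATE">2001-01-03 00:00:00</sdata></hit>
    <hit><score>3</score><rowid>AAAPoEAABAAAMWsAAG</rowid>
      <sdata name="AUTHOR">John</sdata>
      <sdata name="PUBDATE">2001-01-03 00:00:00</sdata></hit>
  </hitlist>
  <count>100</count>
  <groups sdata="PUBDATE">
    <group value="2001-01-01 00:00:00"><count>25</count></group>
    <group value="2001-01-02 00:00:00"><count>50</count></group>
    <group value="2001-01-03 00:00:00"><count>25</count></group>
  </groups>
  <groups sdata="AUTHOR">
    <group value="John"><count>50</count></group>
    <group value="Mike"><count>25</count></group>
    <group value="Steve"><count>25</count></group>
  </groups>
</ctx_result_set>"""


def parse_result_set(xml_text: str):
    """Return (hits, total_count, {sdata_section: {group_value: count}})."""
    root = ET.fromstring(xml_text)
    hits = [
        {
            "score": int(h.findtext("score")),
            "rowid": h.findtext("rowid"),
            **{s.get("name"): s.text for s in h.findall("sdata")},
        }
        for h in root.findall("hitlist/hit")
    ]
    total = int(root.findtext("count"))  # direct child <count>, not the group counts
    facets = {
        g.get("sdata"): {grp.get("value"): int(grp.findtext("count"))
                         for grp in g.findall("group")}
        for g in root.findall("groups")
    }
    return hits, total, facets


hits, total, facets = parse_result_set(RESULT_SET)
print(total, facets["AUTHOR"])  # 100 {'John': 50, 'Mike': 25, 'Steve': 25}
```

The rowid values can then be used to fetch the matching rows from the base table.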

22
Preview
23
Roadmap: merging Text and SES

Oracle Text: Full Control
Fine-grained index options
Data storage options
Lexer options
Stoplists
Use existing database
RAC, Exadata

Secure Enterprise Search: Full Featured
Built-in database and mid-tier
Crawlers for many sources
Simple query interface
End-user GUI / API
Embedded security

24
Coming Search Features

Natural language processing enhancements
Ontology-based classification
Question answering
Automatic partitioning
Query load balancing
Full support for faceted navigation (MVDATA sections)
Functional completeness for the Result Set Interface
Result iterator streaming support
Parallel query
Replication support
GoldenGate / Logical Standby / Streams
Operator improvements
NEAR2: best query in one operator
MNOT: mild not, e.g. YORK MNOT NEW YORK
Nested NEAR
Substring index and query performance improvements
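The slide describes the planned MNOT operator only briefly. Assuming "mild not" means that a document matches when YORK occurs at least once outside the phrase NEW YORK, here is a toy token-level model of that behaviour. It is not Oracle's implementation.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def mild_not(text: str, term: str, phrase: str) -> bool:
    """True if `term` occurs at least once outside every occurrence of `phrase`."""
    words, p = tokens(text), tokens(phrase)
    covered = set()  # token positions that belong to an occurrence of the phrase
    for i in range(len(words) - len(p) + 1):
        if words[i:i + len(p)] == p:
            covered.update(range(i, i + len(p)))
    target = term.lower()
    return any(w == target and i not in covered for i, w in enumerate(words))


print(mild_not("I love New York", "york", "new york"))        # False
print(mild_not("From New York to York", "york", "new york"))  # True
```

A plain NOT (YORK NOT NEW YORK) would reject the second document outright. The mild form keeps it because York also appears on its own.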

25
Coming Search Features - Continued

Multiple enhancements to query performance
BIGIO: leverages SecureFiles CLOBs
Automatic optimization of indexes with a stage index
Two-level index: keeps common search terms in memory
Partition maintenance without reindexing
Off-load filtering from the database server
Section-specific index options
Choose different options, e.g. language, stopwords, PRINTJOINS, for each section
Regular-expression-based stopwords
Forward index
Hugely improved performance for highlighting and snippets
PDF native highlighting
Unlimited SDATA, MDATA and field sections

26
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
27
(No Transcript)
