Title: HyKSS: Hybrid Keyword and Semantic Search
Slide 1: HyKSS (Hybrid Keyword and Semantic Search)
Andrew Zitzelberger
Slide 2: Keyword Search
Slide 3: Form-Based Search
Slide 4: What About?
- over 8,000 meters in elevation
- less than 100K miles
- faster than 100 mph
Slide 5
Slide 6: HyKSS
- Hybrid Keyword and Semantic Search
- Semantics: extracted annotations
- Multiple ontologies
- Keywords: text
Slide 7: Thesis Statement
- HyKSS (hybrid search):
  - Outperforms keyword and semantic search
  - Dynamic query weighting outperforms various other hybrid search approaches
  - Allows queries over multiple ontologies
  - Allows pay-as-you-go improvement
Slide 8: Extraction Ontologies
Slide 9: Data Frames
Slide 10: Indexing Architecture
- Document Collection → Keyword Indexer → Keyword Index
- Document Collection → Semantic Indexer → Semantic Index
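The two-path indexing architecture can be illustrated with a toy Python sketch. It is not the HyKSS code: `build_keyword_index` and `build_semantic_index` are hypothetical names, the keyword side is a plain inverted index rather than Lucene, and the semantic side stores (document, concept, value) triples produced by made-up regex recognizers standing in for OntoES extraction ontologies.

```python
import re
from collections import defaultdict

# Toy stand-ins for the two indexers; Lucene, OntoES, and Sesame are not used.
TOKEN = re.compile(r"[a-z0-9]+")


def build_keyword_index(docs):
    """Inverted index: lowercase term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in TOKEN.findall(text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def build_semantic_index(docs, recognizers):
    """Annotation store: a set of (doc_id, concept, value) triples.

    `recognizers` maps a concept name to a regex whose first group captures
    the value; this is a hypothetical substitute for an extraction ontology.
    """
    triples = set()
    for doc_id, text in docs.items():
        for concept, pattern in recognizers.items():
            for match in pattern.finditer(text):
                triples.add((doc_id, concept, match.group(1)))
    return triples


# Example data, made up for illustration.
docs = {
    "ad1": "2004 Honda Civic, excellent condition, $9,500, Orem",
    "ad2": "Toyota Camry for sale in Provo, $13,000",
}
recognizers = {
    "Make": re.compile(r"\b(Honda|Toyota)\b"),
    "Price": re.compile(r"\$(\d+(?:,\d{3})*)"),
}
keyword_index = build_keyword_index(docs)
semantic_index = build_semantic_index(docs, recognizers)
```

Both indexes are built from the same document collection in one pass each, mirroring the parallel paths on the slide.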
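Slide 11: Indexing Architecture Implementation
- Keyword indexer and index: Lucene
- Semantic indexer: OntoES, using extraction ontologies from the Ontology Library
- Semantic index: Sesame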
Slide 12: Query Processing
- A free-form query is processed along two parallel paths:
  - Keyword processing: pre-process query → execute query → post-process query
  - Semantic processing: pre-process query → execute query → post-process query
- The results of both paths are combined
Slide 13: Keyword Query Pre-Processing
- Remove Lucene special characters (except quotes)
- Remove (inequality) comparison constraints
- Remove non-phrase stopwords
- Example: hondas in "excellent condition" in orem for under 12 grand → hondas excellent condition orem
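A minimal sketch of the three pre-processing rules, assuming a small hypothetical stopword list and a hypothetical regex for comparison constraints (HyKSS presumably recognizes such constraints with its extraction ontologies, which this sketch does not model). Because the first rule keeps quotes, the sketch returns `hondas "excellent condition" orem` for the slide's example, i.e. the slide's output with the phrase quotes retained.

```python
import re

# Hypothetical stopword list; the slides do not give the actual one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "or", "the", "to", "with"}

# Lucene query-syntax special characters, excluding the double quote.
LUCENE_SPECIAL = re.compile(r'[+\-&|!(){}\[\]^~*?:\\/]')

# Hypothetical pattern for inequality constraints such as "under 12 grand"
# or "less than 100K miles": a comparison word, a number, optional units.
COMPARISON = re.compile(
    r"\b(?:under|over|below|above|less than|more than|at least|at most)\s+"
    r"\$?\d[\d,.]*(?:\s*(?:k|grand|dollars|miles|mph|meters)\b)*",
    re.IGNORECASE,
)

# A token is either a quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def preprocess_keyword_query(query: str) -> str:
    query = LUCENE_SPECIAL.sub(" ", query)  # 1. strip special characters
    query = COMPARISON.sub(" ", query)      # 2. drop comparison constraints
    kept = [
        tok
        for tok in TOKEN.findall(query)
        # 3. drop stopwords, but leave quoted phrases untouched
        if tok.startswith('"') or tok.lower() not in STOPWORDS
    ]
    return " ".join(kept)


print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'))
```

Tokenizing phrases as single units is what lets the stopword rule skip anything inside quotes.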
Slide 14: Keyword Query Execution and Post-Processing
- Executed by Lucene
- Post-processing step is empty (no-op)
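Lucene's scoring is not reproduced here. As a rough stand-in, the sketch ranks documents by summed TF-IDF and divides by the best score so keyword scores fall in [0, 1]; that normalization is an assumption made so the scores can later be interpolated with semantic scores. Quoted phrases are treated as bags of words, unlike a real Lucene phrase query, and `keyword_scores` is a hypothetical name.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def keyword_scores(query, docs):
    """Score docs by summed TF-IDF of the query terms, scaled to [0, 1].

    Quoted phrases are treated as plain terms (no adjacency check).
    """
    counts = {d: Counter(TOKEN.findall(text.lower())) for d, text in docs.items()}
    terms = TOKEN.findall(query.lower())
    n = len(docs)
    # Document frequency of each query term across the collection.
    df = {t: sum(1 for c in counts.values() if t in c) for t in set(terms)}
    raw = {}
    for doc_id, c in counts.items():
        score = sum(c[t] * math.log(1 + n / df[t]) for t in terms if c[t])
        if score > 0:
            raw[doc_id] = score
    top = max(raw.values(), default=0.0)
    # Divide by the best score so the top document scores exactly 1.0.
    return {d: s / top for d, s in raw.items()}


docs = {
    "ad1": "Honda Civic in excellent condition, Orem",
    "ad2": "Honda Accord, Provo",
    "ad3": "Mountain bike for sale",
}
scores = keyword_scores('honda "excellent condition" orem', docs)
```

Documents that match no query term are left out of the result, matching the "empty post-processing" step: nothing is re-ranked after execution.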
Slide 15: Semantic Query Pre-Processing: Individual Ontology Scoring
- Example query: hondas in "excellent condition" in orem for under 12 grand
Slide 16: Semantic Query Pre-Processing: Ontology Set Creation
- For each ontology, sorted by score:
  - For each remaining ontology:
    - Add a point for each new or subsuming match
    - If the added points are > 0, add the ontology to the set
- Completely subsumed ontologies are removed during query generation
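One possible reading of the ontology-set creation steps, sketched as a greedy pass. Assumptions not stated on the slide: a match is a character span in the query, an ontology's score is simply its number of matches, a match is "new" if it overlaps nothing already covered, and it is "subsuming" if its span strictly contains a covered span. `Match`, `select_ontologies`, `drop_subsumed`, and `at` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    concept: str
    start: int  # character offsets of the matched text in the query
    end: int


def overlaps(a: Match, b: Match) -> bool:
    return a.start < b.end and b.start < a.end


def strictly_covers(a: Match, b: Match) -> bool:
    """True if a's span contains b's span and is larger than it."""
    return a.start <= b.start and b.end <= a.end and (a.start, a.end) != (b.start, b.end)


def select_ontologies(matches: dict[str, list[Match]]) -> list[str]:
    """Greedy set creation; an ontology's score is assumed to be its match count."""
    ranked = sorted(matches, key=lambda o: len(matches[o]), reverse=True)
    chosen: list[str] = []
    covered: list[Match] = []
    for onto in ranked:
        # One point per match that is new (overlaps nothing covered so far)
        # or that subsumes an already covered match.
        points = sum(
            1
            for m in matches[onto]
            if not any(overlaps(m, c) for c in covered)
            or any(strictly_covers(m, c) for c in covered)
        )
        if points > 0:
            chosen.append(onto)
            covered.extend(matches[onto])
    return chosen


def drop_subsumed(chosen: list[str], matches: dict[str, list[Match]]) -> list[str]:
    """Remove ontologies whose every match lies strictly inside another chosen one's match."""
    kept = []
    for onto in chosen:
        others = [m for o in chosen if o != onto for m in matches[o]]
        if not all(any(strictly_covers(big, m) for big in others) for m in matches[onto]):
            kept.append(onto)
    return kept


def at(query: str, text: str, concept: str) -> Match:
    """Build a Match for the first occurrence of `text` in `query`."""
    start = query.index(text)
    return Match(concept, start, start + len(text))


q = 'hondas in "excellent condition" in orem for under 12 grand'
matches = {
    "ContractualServices": [at(q, "under 12 grand", "Price")],
    "Location": [at(q, "orem", "US_City")],
    "Vehicle": [at(q, "hondas", "Make"), at(q, "under 12 grand", "Price")],
}
chosen = drop_subsumed(select_ontologies(matches), matches)
```

Under these assumptions the Contractual Services ontology earns no points, because its only match duplicates the Vehicle ontology's Price match, while Location contributes the new US_City match.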
Slide 17: Semantic Query Pre-Processing: Ontology Set Creation
[Figure: ontology-set creation example. Ontologies shown with their matches: Location (US_City = orem), Vehicle (Price < 12000), and Contractual Services (Price < 12000), with scores Vehicle_Score and ContractualServices_Score.]
Slide 18: Semantic Query Pre-Processing: Structured Query Generation
- Open world assumption
- SPARQL query
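A sketch of how an open-world SPARQL query might be assembled, assuming one record class per ontology and one RDF property per concept under a made-up `ex:` namespace (`http://example.org/hykss#`); `build_sparql` is a hypothetical helper. Every property is wrapped in `OPTIONAL`, and each selection becomes `FILTER (!BOUND(?x) || ...)`, so a record is excluded only when a value is known to violate the constraint. The numeric comparison also assumes prices are stored as typed numeric literals.

```python
def build_sparql(record_class, projections, selections,
                 prefix="http://example.org/hykss#"):
    """Build a SPARQL SELECT query under an open world assumption.

    projections: property names to return, e.g. ["Make", "Price"].
    selections: property -> (operator, value), e.g. {"Price": ("<", 12000)}.
    """
    variables = " ".join(f"?{p}" for p in projections)
    lines = [
        f"PREFIX ex: <{prefix}>",
        f"SELECT ?record {variables}",
        "WHERE {",
        f"  ?record a ex:{record_class} .",
    ]
    # Every property is OPTIONAL, so a record is never dropped just because
    # a value was not extracted for it.
    for p in dict.fromkeys([*projections, *selections]):
        lines.append(f"  OPTIONAL {{ ?record ex:{p} ?{p} . }}")
    # A selection only rejects records whose known value violates it.
    for p, (op, value) in selections.items():
        literal = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"  FILTER (!BOUND(?{p}) || ?{p} {op} {literal})")
    lines.append("}")
    return "\n".join(lines)


query = build_sparql(
    "Vehicle",
    projections=["Make", "Price", "US_City"],
    selections={"Price": ("<", 12000), "US_City": ("=", "orem")},
)
print(query)
```

Placing the filter outside the `OPTIONAL` group, guarded by `!BOUND`, is what keeps records with a missing price in the results instead of treating the missing value as false.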
Slide 19: Semantic Query Execution and Post-Processing
- Query executed by Sesame
- Semantic ranking:
  - 1 point for each requested projection satisfied
  - Normalized by the number of projections requested
- Example: hondas in "excellent condition" in orem for under 12 grand → projections on Make, Price, and US_City
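The semantic ranking rule is a simple fraction. A sketch, assuming each SPARQL result row arrives as a dict with `None` for unbound variables; the row format and the names `semantic_score` and `rank_semantic_results` are hypothetical.

```python
def semantic_score(row, projections):
    """1 point per requested projection bound in `row`, divided by the number requested."""
    if not projections:
        return 0.0
    satisfied = sum(1 for p in projections if row.get(p) is not None)
    return satisfied / len(projections)


def rank_semantic_results(rows, projections):
    """Sort result rows (variable -> value, None if unbound), best first."""
    return sorted(rows, key=lambda r: semantic_score(r, projections), reverse=True)


projections = ["Make", "Price", "US_City"]
rows = [
    {"record": "ad2", "Make": "Honda", "Price": None, "US_City": None},
    {"record": "ad1", "Make": "Honda", "Price": 9500, "US_City": "orem"},
]
ranked = rank_semantic_results(rows, projections)
```

For the slide's example with three projections, a record that binds all of Make, Price, and US_City scores 1.0, and one that binds only Make scores 1/3.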
Slide 20: Hybrid Query Processing
- Linear interpolation:
  - score = (kw_weight × kw_score) + (sm_weight × sm_score)
- Dynamic solution:
  - # keywords remaining (kw)
  - concept match score (cms)
    - cms = ½ × (# selections + # projections)
  - kw_weight = kw / (kw + cms)
  - sm_weight = cms / (kw + cms)
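The dynamic weighting follows directly from the formulas above. This sketch assumes the caller already knows the number of remaining keywords, selections, and projections, that both score maps are normalized to [0, 1], and that a document missing from one result list scores 0 there; the equal-weight fallback for an empty query is also an assumption, and the function names are hypothetical.

```python
def dynamic_weights(kw, num_selections, num_projections):
    """Return (kw_weight, sm_weight) from the dynamic weighting formulas."""
    cms = 0.5 * (num_selections + num_projections)  # concept match score
    total = kw + cms
    if total == 0:
        return 0.5, 0.5  # hypothetical fallback: nothing to weight
    return kw / total, cms / total


def hybrid_scores(kw_scores, sm_scores, kw_weight, sm_weight):
    """Linear interpolation; a document absent from one list scores 0 there."""
    return {
        doc: kw_weight * kw_scores.get(doc, 0.0) + sm_weight * sm_scores.get(doc, 0.0)
        for doc in set(kw_scores) | set(sm_scores)
    }


kw_weight, sm_weight = dynamic_weights(kw=1, num_selections=2, num_projections=4)
combined = hybrid_scores({"a": 1.0, "b": 0.5}, {"a": 0.0, "c": 1.0}, kw_weight, sm_weight)
ranking = sorted(combined, key=combined.get, reverse=True)
```

With one remaining keyword and a concept match score of 3, the semantic side gets three quarters of the weight, so a document found only by the semantic query ("c") outranks one found only by keywords.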
Slide 21: Basic Search
Slide 22: Results Display
Slide 23: Form-Based Search
Slide 24: Results Display
Slide 25: Experimental Setup: Ontology Libraries
- 5 Ontology Levels
- Number
- Generic Units
- Vehicle Units
- Vehicle
- Vehicle
Slide 26: Experimental Setup: Query Sets
- 113 syntactically unique queries from database students
- 60 syntactically unique queries from linguistics students
Slide 27: Experimental Setup: Document Collection
- 250 vehicle advertisements (Craigslist)
- 100 training, 50 validation, 100 test
- 318 mountain pages (Wikipedia)
- 66 roller coaster pages (Wikipedia)
- 88 video game advertisements (Craigslist)
Slide 28: Experiments
- Training queries over test vehicle documents
- Test queries over test vehicle documents
- Training queries over test vehicle documents with additional noise
- Test queries over test vehicle documents with additional noise
- 5 queries over noisy data (Generic Units only)
Slide 29: Experiments: Metric
Slide 30: Experimental Results
Slide 31: Experimental Results
Slide 32: Experimental Results
Slide 33: Conclusions
- Hybrid search outperforms keyword and semantic search
- HyKSS's dynamic query weighting approach outperforms various other weighting techniques
- Using multiple ontologies does not outperform selecting and using a single ontology
Slide 34: External Image Citations
- Slide 2 Google search screenshot: http://www.google.com (07/30/11)
- Slide 3 partial car search form screenshots: http://autotrader.com/fyc (07/30/11)
- Slide 4 mountain image: http://en.wikipedia.org/wiki/Lhotse (04/26/11)
- Slide 4 car image: http://en.wikipedia.org/wiki/Honda (04/26/11)
- Slide 4 roller coaster image: http://en.wikipedia.org/wiki/Kingda_Ka (04/26/11)
- Slide 4 Wikipedia logo: http://en.wikipedia.org/wiki/Main_Page (04/26/11)
- Slide 4 craigslist logo: http://provo.craigslist.org/ (04/26/11)