1. Your Grandmother Doesn't Like Surprises: A Case Study of ANM's Travel Site
- Jeffrey Catlin, Lexalytics, Inc.
- Bob Pierce, Fast Search & Transfer
- May 3, 2005
2. Overview
- Project Overview
  - Project Goals
  - Technology Elements
- Site Features
  - Improved Search
  - Automated Processing of Hotel Reviews
- Knowledge Management in Action
  - Sentiment / Tone capability is unique and fully automated
  - Improvement over 1 to 5 star ratings
- Customer Reaction and Futures
  - Go Live for this site
  - Other sites utilizing this technology
- Contacts
  - Jeff Catlin: jeff_at_lexalytics.com
  - Bob Pierce: bob.pierce_at_fastsearch.com
3. Project Overview
- ANM (Associated News Media) is a publisher in the UK that is leveraging its content to reach into Internet applications like travel
- Project Goals
  - Improve stickiness of the site, which is key to generating more ad dollars
  - Improve and simplify the search features of the site, including sorting by a variety of field types and making search available throughout the site
  - Expose and automate user reviews. Providing accurate and ready access to user reviews improves stickiness and acceptance of the site
  - Reduce the cost of utilizing user reviews
  - Dramatically increase the breadth of coverage of user reviews
4. Project Overview
- Technology Elements
  - ANM
    - Custom application interface
    - Utilizing FAST ESP for search features
  - FAST Marketrac
    - FAST ESP provides application search features
    - FAST Content Processing Pipeline and web spider for reviews
  - Lexalytics
    - Salience Server for scoring hotel and travel reviews
    - Sentiment Toolkit: build out a travel-focused sentiment/tone database
5. Site Features
6. Site Features: 4-star NYC (the best)
7. Site Features: 4-star NYC (the worst)
8. Knowledge Management in Action
- Trustworthy user reviews are a key to the stickiness of the site
- Reviews are obtained through feeds and spidering
  - Feeds: IgoUgo, Fodor's
  - Spidering: tripadvisor.com, virtualtourist.com
- Reviews are monitored and updated continuously and processed through the FAST Content Processing Pipeline
- Automated reviews are more consistent, trusted and up to date than star ratings
- Unique feature
  - Totally automated and more consistent than human ratings
9. Knowledge Management in Action
- How does it all work?
  - Lexalytics provides out-of-the-box sentiment/tone analysis
  - Toolkit to build scoring databases for verticals like travel, finance, security
  - System builds up a dictionary of scored phrases that indicate good or bad depending on the vertical it's used for
  - Phrase scores are determined using a training set and MSN Search
  - Scores measure the nearness of phrases to good and/or bad terms
  - Results in a phrase dictionary with phrases like
    - Sunny Day: 1.2706
    - Unsafe food: -0.7634
- The Lexalytics Salience Server is embedded within FAST's Marketrac product, so integration of sentiment/tone is very straightforward
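The slide says only that phrase scores measure how near a phrase sits to good and bad terms, using a training set and a search engine; it does not give the formula. As a rough sketch, one common way to turn such nearness counts into a signed score is a smoothed log-ratio. The function name, the smoothing constant, and every count below are hypothetical and are not taken from the Lexalytics toolkit.

```python
import math


def phrase_score(near_good, near_bad, good_total, bad_total, smoothing=0.01):
    """Signed log-ratio: positive when a phrase co-occurs more with good terms.

    near_good / near_bad: hypothetical counts of the phrase appearing near
    good or bad seed terms (for example, search hit counts).
    good_total / bad_total: hypothetical overall counts of the seed terms,
    used to correct for one polarity simply being more common.
    """
    numerator = (near_good + smoothing) * (bad_total + smoothing)
    denominator = (near_bad + smoothing) * (good_total + smoothing)
    return math.log2(numerator / denominator)


# Hypothetical co-occurrence counts: phrase -> (near good terms, near bad terms).
counts = {
    "sunny day": (120, 15),
    "unsafe food": (8, 90),
}

# Build a small phrase dictionary, assuming both seed sets occur 1000 times.
dictionary = {
    phrase: round(phrase_score(good, bad, 1000, 1000), 4)
    for phrase, (good, bad) in counts.items()
}

for phrase, score in dictionary.items():
    print(f"{phrase}: {score}")
```

With counts like these, a phrase seen mostly near good terms gets a positive score and one seen mostly near bad terms gets a negative score, which matches the sign pattern of the dictionary entries on the slide. The magnitudes here will not match the slide's values, since the real counts and formula are not given.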
10. Knowledge Management in Action
Let's drill in to see how reviews are scored
11. Knowledge Management in Action
Let's score this review
12. Knowledge Management in Action
- Looking at the scoring of an individual review
  - Review for Marriott Marquis
  - Great stay, no elevator problems
- Reviews are scored, averaged and displayed on a 1 to 10 scale
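The scoring, averaging and 1 to 10 display described on this slide can be sketched as follows. This is a minimal illustration, not the Salience Server's method: the substring matching, the assumed raw score range of -2 to 2, and the linear mapping are all assumptions. Only the "sunny day" and "unsafe food" scores come from the deck; the other dictionary entries are made up for the example.

```python
# Phrase dictionary: the first two scores appear in the deck, the rest are hypothetical.
PHRASE_SCORES = {
    "sunny day": 1.2706,
    "unsafe food": -0.7634,
    "great stay": 1.1,            # hypothetical
    "no elevator problems": 0.6,  # hypothetical
}


def score_review(text, phrase_scores):
    """Average the scores of all dictionary phrases found in one review."""
    lowered = text.lower()
    hits = [score for phrase, score in phrase_scores.items() if phrase in lowered]
    return sum(hits) / len(hits) if hits else 0.0


def to_ten_point(raw, lo=-2.0, hi=2.0):
    """Map a raw score linearly from an assumed [lo, hi] range onto 1 to 10."""
    clamped = max(lo, min(hi, raw))
    return 1.0 + 9.0 * (clamped - lo) / (hi - lo)


def hotel_rating(reviews, phrase_scores):
    """Score each review, average across reviews, and rescale for display."""
    raw_scores = [score_review(review, phrase_scores) for review in reviews]
    mean = sum(raw_scores) / len(raw_scores) if raw_scores else 0.0
    return round(to_ten_point(mean), 1)


reviews = [
    "Great stay, no elevator problems",
    "Sunny day at check-in, but unsafe food at the buffet",
]
print(hotel_rating(reviews, PHRASE_SCORES))
```

Note that naive substring matching would misread "no elevator problems" if the dictionary held only "elevator problems" as a negative phrase, which is one reason to score whole phrases rather than isolated words.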
13. Customer Feedback
- Customer is pleased with the site
  - Goes live today (5/3/05)
- Tuning of the hotel scoring has allowed the customer to put their own touch on the system, giving them a unique offering
- Combination of information discovery features and integrated booking should allow ANM to compete with any of the well-known travel sites.
14. Information Intelligence Examples
- Financial news and market analysis
  - Market intelligence portal and alerts for brokers
- Pharmaceutical competitive analysis
  - Tracking molecules, drugs and companies in the rear-view mirror
- Intellectual property protection
  - Content similarity analysis and alerting
- Illegal e-commerce
  - Contraband trafficking and the whack-a-mole problem
- Cracking pornography rings
  - Automated image analysis
  - Chat room monitoring and alerting
- Threat detection and analysis
15. Market Intelligence in Financial Services
- Leading European financial services group
  - Capital markets, insurance, real estate, asset management, securities
- Goal: Trade more competitively, create better analyst reports
- Leveraged FAST ESP and FAST Marketrac
- Collect actionable information ahead of general market availability
  - Premium sources, blogs, local web sites, research reports, etc.
- Real-time, personalized analysis
  - Search domains selected by individual analysts
  - Correlate price movements with related news
  - Analyze news flow for market-moving potential
- Communicate and act
  - Minimal latency
  - Profile-based SMS/e-mail alerting
  - Automated morning reports
16. Because Timing is Money: First-mover Advantage in Markets
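17. Accelerate The Decision Cycle
BETTER Decisions, FASTER!
[Figure: two decision-cycle timelines on a shared time axis. Before: Identify, Search/Gather, Analyze, Decide, ACT. After: Gather, Discover, Analyze, Decide, ACT. Each timeline marks its decision point; the impact point is also marked.]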
18. Futures
- Text analysis software has matured to the point where powerful applications can be deployed at a reasonable expense and with a high degree of confidence
- Search and text analysis will play an increasingly important part in Business Intelligence, High Volume Storage and Consumer Electronics
- Entity extraction is relatively mature and fairly high-quality
- Classification (subject and tone) is being deployed in real-world apps
- Relationships between content elements are on the short-term horizon
19. Intellectual Property Protection: When Information, Time are the Assets
- Article extraction from websites
- Computation of similarity primitives
[Figure: real-time content analysis workflow. Labels: Seed URL DB, Target site profile, Crawler, WWW, Validate content and determine changes, Check similarity, Detailed similarity check, Similarity queries and results, API - Similarity, Real-time index, IP Database, Detected matches, Notification, Enforcement. Example document text: "...The Wimbledon and U.S. Open champion, seeded second, breezed past..." Example similarity vector: <wimbledon, 1; US open, 0.7; champion, 0.6>]
Sequential analysis compares longest common subsequence and maximum overlap.
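The sequential analysis named on this slide, comparing longest common subsequence and maximum overlap, can be sketched with two small dynamic programs. This is an illustration under assumptions rather than FAST's implementation: tokenizing on whitespace, reading "maximum overlap" as the longest contiguous shared run of tokens, and normalizing by the shorter document are all choices made for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_run(a, b):
    """Length of the longest contiguous token run shared by both lists."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(doc_a, doc_b):
    """Return (subsequence ratio, contiguous-run ratio) relative to the shorter text."""
    a, b = doc_a.lower().split(), doc_b.lower().split()
    if not a or not b:
        return 0.0, 0.0
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter, longest_common_run(a, b) / shorter


original = "The Wimbledon and U.S. Open champion, seeded second, breezed past"
suspect = "The Wimbledon champion, seeded second, breezed past"
subsequence_ratio, run_ratio = similarity(original, suspect)
print(subsequence_ratio, run_ratio)
```

A high subsequence ratio with a lower run ratio suggests text that was copied and then lightly edited, while both ratios near 1 suggest a verbatim copy. Any thresholds for raising a detected match would need tuning against real content.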