Title: E-Commerce
Outline
- Introduction
- Customer Data on the Web
- Automated Recommender Systems
- Networks and Recommendations
- Web Path Analysis for Purchase Prediction
Introduction
- Some Motivating Questions
  - Can we design algorithms to help recommend new products to visitors based on their browsing behavior?
  - Can we better understand the factors influencing how customers make purchases on a website?
  - Can we predict in real time who will make purchases based on their observed navigation patterns?
Customer Data on the Web
- Data collection on the client side, the server side, and anywhere in between
- Goal: determine who is purchasing what products
- Tracking customer data
  - Web logs, e-commerce logs, cookies, explicit login
- Data is then used to provide personalized content to site users to
  - Assist customers in locating their target selections
  - Encourage customers to make certain selections
Automated Recommender Systems
- Problem framed in two ways
  - Users vote for pages/items (binary)
  - Users rank pages/items (multivalued)
- Results are captured in a generally sparse users × items matrix (a minimal storage sketch follows below)
- Complication: no-votes can occur because users do not vote on items they do not like (Breese et al. 1998)
  - Ignored by most recommender systems
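As a concrete illustration, here is a minimal Python sketch (assuming SciPy; the triples are made up) of how such a sparse vote matrix is typically stored:

```python
from scipy.sparse import csr_matrix

# Hypothetical (user, item, vote) triples; the votes could equally be
# binary indicators or multivalued rankings.
triples = [(0, 0, 5), (0, 3, 1), (1, 0, 4), (2, 2, 2), (2, 3, 5)]
users, items, votes = zip(*triples)

# Rows = users, columns = items. Entries never voted on stay implicit,
# capturing both the sparsity and the ambiguity noted above: a missing
# entry means "no vote", not "dislike".
V = csr_matrix((votes, (users, items)), shape=(3, 4))
print(V.toarray())
```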
Evaluating Recommender Systems
- Cautions in data interpretation
  - Users may purchase items regardless of recommendations
  - Users may also avoid purchases they might have made, based on recommendations
- Approaches to recommender algorithms
  - Nearest-neighbor
  - Model-based collaborative filtering
  - Others?
Nearest-Neighbor Collaborative Filtering
- Basic principle: use users' vote histories to predict future votes/recommendations
- Find the users most similar to the target user in the training matrix and fill in the target user's missing vote values based on these nearest neighbors
- A typical normalized prediction scheme (sketched in code below)
  - Goal: predict the vote of target user a on item j from the votes of other users, weighted towards those whose past votes are similar to a's:

    v̂(a,j) = v̄(a) + κ Σᵢ w(a,i) (v(i,j) − v̄(i))

    where v̄(u) is user u's mean vote, w(a,i) is the similarity weight between users a and i, and κ normalizes the weights
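A minimal sketch of this scheme in Python. The toy data and the choice of Pearson-correlation weights are my assumptions; Breese et al. compare several weightings:

```python
import numpy as np

def predict_vote(V, mask, a, j):
    """Memory-based CF prediction of user a's vote on item j.

    V is a (n_users, n_items) vote matrix; mask marks which votes
    were actually cast.
    """
    means = np.array([V[u, mask[u]].mean() for u in range(V.shape[0])])
    num, denom = 0.0, 0.0
    for i in range(V.shape[0]):
        if i == a or not mask[i, j]:
            continue                          # user i never voted on j
        common = mask[a] & mask[i]            # items both users voted on
        if common.sum() < 2:
            continue                          # too little overlap
        w = np.corrcoef(V[a, common], V[i, common])[0, 1]
        if np.isnan(w):
            continue                          # constant votes, no signal
        num += w * (V[i, j] - means[i])
        denom += abs(w)                       # kappa = 1 / sum(|w|)
    return means[a] + num / denom if denom else means[a]

# Toy data: 0 means "no vote" here, purely for compactness.
V = np.array([[4, 0, 5, 5],
              [4, 2, 1, 0],
              [3, 0, 2, 4],
              [4, 2, 0, 0]], dtype=float)
print(predict_vote(V, V > 0, a=3, j=2))       # predicted vote, ~1.67
```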
Nearest-Neighbor Collaborative Filtering
- Another challenge: defining the weights
  - What is the optimal weight calculation to use?
  - Requires fine-tuning of the weighting algorithm for the particular data set
- What do we do when the target user has not voted enough to provide a reliable set of nearest neighbors?
  - One approach: use default votes (popular items) to populate the matrix for items neither the target user nor the nearest neighbor has voted on
  - A different approach: model-based prediction using Dirichlet priors to smooth the votes (see chapter 7)
- Other factors include relative vote counts for all items between users, thresholding, and clustering (see Sarwar et al. 2000)
Nearest-Neighbor Collaborative Filtering
- Structure-based recommendations
  - Recommendations based on similarities between items with positive votes (as opposed to the votes of other users)
  - Structure of item dependencies modeled through dimensionality reduction via singular value decomposition (SVD), aka latent semantic indexing (see chapter 4)
  - Approximate the set of row-vector votes as a linear combination of basis column-vectors
    - i.e., find the set of columns that least-squares minimizes the difference between the row estimations and their true values
  - Perform nearest-neighbor calculations in the reduced space to produce vote predictions for all items (see the sketch below)
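A minimal NumPy sketch of the SVD step. The vote matrix and rank k are illustrative, and treating non-votes as zeros is a simplification (in practice missing entries are often imputed first):

```python
import numpy as np

# Hypothetical dense vote matrix (users x items); 0 = "no vote" here.
V = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)

# Truncated SVD: keep k basis column-vectors. By Eckart-Young, the
# rank-k product is the least-squares best approximation of the rows.
k = 2
U, s, Vt = np.linalg.svd(V, full_matrices=False)
V_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# V_hat's rows serve as smoothed vote predictions, and nearest-neighbor
# search can run in the k-dimensional space U[:, :k] * s[:k] instead of
# the full item space.
print(np.round(V_hat, 2))
```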
Model-Based Collaborative Filtering
- Recommendations based on a model of relationships between items, derived from the historical voting patterns in the training set
- Better performance than nearest-neighbor analysis
- Joint distribution modeling
  - Uses one model as the basis for predictions
- Conditional distribution modeling
  - A model for each item, predicting its future vote based on the votes for each of the other items
Model-Based Collaborative Filtering
- Joint distribution modeling: a practical approach
  - Model the joint distribution as a finite mixture of simpler distributions (a minimal EM sketch follows below)
  - Additional simplification is achieved by assuming that votes are independent of one another within a component
  - Limitation: assumes that each user can be described by a single one of the K mixture components
  - Hofmann and Puzicha (1999) propose a workaround, asserting that each row of votes can draw on up to K mixture components, rather than a single component
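A compact sketch of this idea for binary votes, fitting a Bernoulli mixture with EM. The data and hyperparameters are illustrative, and this is the basic single-component-per-user variant, not Hofmann and Puzicha's extension:

```python
import numpy as np

def em_bernoulli_mixture(X, K, n_iter=50, seed=0):
    """EM for a K-component Bernoulli mixture over binary vote vectors.

    X is (n_users, n_items) with 0/1 votes; within a component, votes
    are assumed independent, and each user belongs to one component.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                  # mixing weights P(component)
    theta = rng.uniform(0.25, 0.75, (K, d))   # P(vote=1 | component)

    for _ in range(n_iter):
        # E-step: responsibilities r[u, k] = P(component k | user u's votes)
        log_p = (X @ np.log(theta).T
                 + (1 - X) @ np.log(1 - theta).T
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights and per-component vote probabilities
        nk = r.sum(axis=0)
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, r

rng = np.random.default_rng(1)
X = (rng.random((100, 12)) < 0.3).astype(float)   # fake binary votes
pi, theta, r = em_bernoulli_mixture(X, K=3)
```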
Model-Based Collaborative Filtering
- Another limitation: all predictions are based on the (static) training set
- Conditional distribution modeling
  - Better results by creating a model for each item conditioned on the others, rather than using a single joint density model
  - Decision trees: Heckerman et al. (2000)
    - Greedy approach to approximate the tree structure
    - Predictions are made for each item not purchased or visited (a simplified sketch follows below)
  - Performance
    - Accuracy nearly equal to Bayesian networks
    - Offline memory usage significantly less than Bayesian networks
    - Offline computation time complexity better than Bayesian networks
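An illustrative sketch of per-item conditional models in Python, assuming scikit-learn. Note that Heckerman et al. grow their trees greedily under a Bayesian score; stock CART is used here purely as a stand-in:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_item_trees(X, max_depth=4):
    """Fit one classifier per item, conditioned on all other items.

    X is a binary (n_users, n_items) purchase/visit matrix; tree j
    predicts item j's vote from the remaining columns.
    """
    trees = []
    for j in range(X.shape[1]):
        rest = np.delete(X, j, axis=1)       # all items except j
        trees.append(DecisionTreeClassifier(max_depth=max_depth)
                     .fit(rest, X[:, j]))
    return trees

def recommend(trees, x):
    """Score every item the user (binary row x) has not yet taken."""
    scores = {}
    for j, tree in enumerate(trees):
        if x[j] == 0:                        # only unvisited/unpurchased
            rest = np.delete(x, j).reshape(1, -1)
            if len(tree.classes_) == 2:      # both classes seen in training
                scores[j] = tree.predict_proba(rest)[0, 1]
    return scores

rng = np.random.default_rng(0)
X = (rng.random((200, 10)) < 0.2).astype(int)
print(recommend(fit_item_trees(X), X[0]))
```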
Model-Based Combining of Votes and Content
- Combine content-specific information with other information (e.g. structure, votes)
- Useful for determining item similarity (Mooney and Roy 2000) and for creating user models
- Useful when there is no vote history
- Implementation (Popescul et al. 2000)
  - Extension of Hofmann and Puzicha (1999)
  - The joint density is determined by assuming a hidden latent variable z that makes users u, documents d, and words w conditionally independent, i.e.

    P(u, d, w) = Σ_z P(z) P(u|z) P(d|z) P(w|z)
Model-Based Combining of Votes and Content
- The hidden variable z represents the multiple (hidden) topics of a document
- The conditional probabilities involving the hidden variable are calculated using EM (a compact sketch follows below)
- Sparsity still remains a problem for content-based modeling
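A compact sketch of EM for the three-way aspect model above. The dimensions, triples, and smoothing floor are all illustrative:

```python
import numpy as np

def em_aspect_model(triples, n_u, n_d, n_w, K, n_iter=50, seed=0):
    """EM for P(u,d,w) = sum_z P(z) P(u|z) P(d|z) P(w|z).

    `triples` is a list of observed (user, document, word)
    co-occurrences, repeats allowed.
    """
    rng = np.random.default_rng(seed)
    Pz = np.full(K, 1.0 / K)
    Pu = rng.dirichlet(np.ones(n_u), K)   # row k holds P(u | z=k)
    Pd = rng.dirichlet(np.ones(n_d), K)
    Pw = rng.dirichlet(np.ones(n_w), K)

    U, D, W = (np.array(c) for c in zip(*triples))
    for _ in range(n_iter):
        # E-step: posterior over the latent topic for every triple
        q = Pz[:, None] * Pu[:, U] * Pd[:, D] * Pw[:, W]   # (K, T)
        q /= q.sum(axis=0, keepdims=True)

        # M-step: renormalized soft counts for each factor
        Pz = q.sum(axis=1) / q.sum()
        for M, idx in ((Pu, U), (Pd, D), (Pw, W)):
            M[:] = 1e-12                   # small floor avoids zeros
            np.add.at(M.T, idx, q.T)       # scatter-add soft counts
            M /= M.sum(axis=1, keepdims=True)
    return Pz, Pu, Pd, Pw

# Tiny illustrative corpus of (user, doc, word) index triples
triples = [(0, 0, 1), (0, 0, 2), (1, 1, 2), (1, 1, 0), (2, 0, 1)]
Pz, Pu, Pd, Pw = em_aspect_model(triples, n_u=3, n_d=2, n_w=3, K=2)
```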
Challenges
- Noisy data
  - The same user may use multiple IP addresses/logins
  - Different users may use the same IP address/login
- Privacy
  - No cookies!
- Changing user habits
  - Previous history may not accurately predict present purchase selections
  - Requires continuous updating of user activity data
Networks and Recommendations
- Word-of-Mouth
  - Needs little explicit advertising
  - Products are recommended to friends, family, co-workers, etc.
  - This is the primary form of advertising behind the growth of Google
Email Product Recommendation
- Hotmail
  - Very little direct advertising in the beginning
  - Launched in July 1996
  - 20,000 subscribers after a month
  - 100,000 subscribers after 3 months
  - 1,000,000 subscribers after 6 months
  - 12,000,000 subscribers after 18 months
  - By April 2002, Hotmail had 110 million subscribers
Email Product Recommendation
- What was Hotmail's primary form of advertising?
  - A small link to the sign-up page at the bottom of every email sent by a subscriber
- Spreading activation
  - Implicit recommendation
Spreading Activation
- Network effects
  - Even if only a small percentage of the people who receive the message subscribe (0.1%), the service will spread rapidly
- This can be contrasted with the current practice of SPAM
  - SPAM is not sent by friends, family, or co-workers
    - No implicit recommendation
  - SPAM is often viewed as not providing a good service
Modeling Spreading Activation
- Diffusion Model
  - Montgomery (2002)
  - Applied diffusion models from the marketing literature, e.g. Bass (1969), to the Hotmail phenomenon
  - Similar word-of-mouth networks are used in selling consumer electronics and appliances such as refrigerators and televisions
  - We want to predict how many individuals k(t) will have adopted the product by time t, out of a population of N possible adopters
Modeling Spreading Activation
- Diffusion Model
  - Two ways individuals will subscribe
  - Direct advertising
    - At time t, N − k(t) individuals have not yet subscribed
    - A percentage α > 0 of these individuals will subscribe due to direct advertising
  - Word-of-mouth
    - At time t, there are k(t)(N − k(t)) possible connections between subscribers and non-subscribers
    - A percentage β > 0 of these connections will cause a non-subscriber to subscribe
Modeling Spreading Activation
- Combining these, we get the differential equation

    dk/dt = α(N − k(t)) + β k(t)(N − k(t))

- Solving this, we get the logistic-style adoption curve

    k(t) = N (1 − e^(−(α+βN)t)) / (1 + (βN/α) e^(−(α+βN)t))
Modeling Spreading Activation
- Diffusion Model
  - This does not completely model what actually occurred
  - However, it is simple and provides a lot of interesting (useful) information
- Other work
  - Domingos and Richardson (2001): Markov random field model
  - Daley and Gani (1999): various deterministic and stochastic models
Purchase Prediction
- We want to predict whether or not a shopper will make a purchase
  - We know demographics
  - We know page-view patterns
- Can we accurately predict whether the user will make a purchase or not?
Purchase Prediction
- Li et al. (2002)
  - Studied 1160 shoppers at www.barnesandnoble.com between April 1 and April 30, 2002
  - The data was collected client-side, so they knew exactly what pages were displayed to the user
  - They also knew the demographics (predominantly well-educated and affluent)
Purchase Prediction
- Li et al. (2002)
  - There were 14,512 page views, which they divided into 1659 sessions
  - Page views per session:
    - Mean: 8.75
    - Median: 5
    - Standard deviation: 16.4
    - Min: 1
    - Max: 570
  - 7% of sessions contained a purchase
Purchase Prediction
- Li et al. (2002)
  - Divided the pages into 8 classes
    - Home (H): main page
    - Account (A): account information pages
    - List (L): pages with lists of items
    - Product (P): page with a single item
    - Information (I): informational pages (shipping, etc.)
    - Shopping cart (S)
    - Order (O): indicates a completed order
    - Entry or Exit (E): entering or leaving the site
Purchase Prediction
- Li et al. (2002)
  - Each session was represented by a string of the form I H H I I L I I E
  - A session containing an O is considered to have made a purchase
  - The average length of a session with a purchase was 34.5 page views; without a purchase, only 6.8
Purchase Prediction
- Markov transition matrix
  - Estimated for sessions with no purchase (a sketch of how such a matrix is computed follows below)
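A sketch of how such a transition matrix can be estimated from session strings. The example sessions are made up, not Li et al.'s data:

```python
import numpy as np

STATES = "HALPISOE"                        # the 8 page classes above
IDX = {s: i for i, s in enumerate(STATES)}

def transition_matrix(sessions):
    """First-order Markov transition matrix estimated from sessions.

    `sessions` holds strings like "IHHIILIIE"; entry [i, j] estimates
    the probability that page class i is followed by page class j.
    """
    counts = np.zeros((len(STATES), len(STATES)))
    for s in sessions:
        for a, b in zip(s, s[1:]):         # consecutive page-view pairs
            counts[IDX[a], IDX[b]] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

# Two made-up non-purchase sessions (no "O" state appears)
P = transition_matrix(["IHHIILIIE", "ELPSLE"])
print(np.round(P, 2))
```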
Purchase Prediction
- Li et al. (2002)
  - They fit several models based on this data
  - Tested on predicting the next page and predicting a purchase
  - The best models were 64% accurate at predicting the next page
  - After 2 page views, the best models predicted 12% true positives and 5.3% false positives
  - After 6 page views: 13.1% true positives and 2.9% false positives