MARK2039 - PowerPoint PPT Presentation

1 / 43
About This Presentation
Title:

MARK2039

Description:

Lecture 7 MARK2039 Summer 2006 George Brown College Wednesday 9-12 – PowerPoint PPT presentation

Number of Views:36
Avg rating:3.0/5.0
Slides: 44
Provided by: LukeF5
Category:
Tags: mark2039 | upsell

less

Transcript and Presenter's Notes

Title: MARK2039


1
Lecture 7
  • MARK2039
  • Summer 2006
  • George Brown College
  • Wednesday 9-12

2
Exam
a)Acct ID, Date of Promotion, credit score,
postal code b)Index account ID and make the DB
relational
3
Exam
1)Col A mean8086, median120 COL. B
mean15,median15 Col. C mean 16840
median20000 2)Normal dist. Is B because mean
and median are same. 3)Median as it is not skewed
by otliers
4
Exam
Str. A std.dev .00189 CI .0361lt.0380lt.03989
Str.B std.dev.003 -CI .014lt.02lt.026
Do not use either strategy and continue with
existing strategy
5
Exam
6
Exam
a)Cubeb)dimensionsproduct type,1st digit of
postal code,payment type Measure acct Id c)Give
me count of all customers who bought prod. A with
cash
Determine number of customers in postal code,
determine number of persons in postalcode from
Stats Can data. Create penetration index Number
of customers/ number of persons at postal code.
Rank postal codes by penetration index and use
ranked postal codes to targetprospects.
7
Exam
Stats Can Census is richer as it has more
records(50000 vs. 28000 for taxfiler
Advantage of using Taxfiler data is that data is
more recent
  1. Implementation
  2. Reducing costs
  3. Must be one to one in analytical file
  4. Standard deviation or variation

8
Exam
Sample A , although std. dev. is larger, if we
look at std. dev. on a relative basis when
comparing to the range or magnitude of values in
the sample, we will observe that we are getting
a much tighter bound around A rather than B
Legacy billing or call detail files,external
data such as Stats CanAdvantage to building data
mart is the following -data aggregated and
summarized-easier to use for analysis -Quicker
processing -Easier intrpretation as data deals
solely with functional area
9
Exam
i)No,ii)yes,iii)No,iv)No,v)yes
Prom.Date-interval,not useful,only one
value Prom.codes-nominal-not useful-too
granular Income-interval-not useful too many
missing values Number of children
interval-useful-few missing values Credit decile
rank ordinal-useful-0 missing values
10
Creating the Analytical File-Reviewing Data Dumps
  • Initial dump of 1st few records

11
Creating the Analytical File-Reviewing Data Dumps
Initial dump of 1st few records
12
Creating the Analytical File-Reviewing Data Dumps
View of the Transaction File
13
Creating the Analytical File-Reviewing Data Dumps
View of the Promo History File
14
Creating the Analytical File-Reviewing Data Dumps
  • Using your marketing knowledge, give me examples
    of variables that we might create from the last
    three slides
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 11 Age, region of country, tenure
  • Slide 12 Total Amount, Total amount for a given
    product, and recency of purchase.
  • Slide 13 Total promotions, Total Promotions by
    Type and recency of last promotion

15
Creating the Analytical File-Data Hygiene and
Cleansing
  • Once the data has been dumped in order to view
    records, typically data hygiene and cleansing
    have to take place
  • Two key deliverables
  • Clean name and address information
  • Standard rules for coding of data values

16
Creating the Analytical File-Data Hygiene and
Cleansing
  • Clean Name and Address Information
  • Market to right Individual
  • Create Match keys

17

Creating the Analytical File Name and Address
Standardization
  • Clean Name and Address Information
  • Market to right Individual
  • Create Match keys
  • Name and Address Standardization

18
Creating the Analytical File-Name and Address
Standardization
  • DATA CLEANING
  • Address correction
  • Name parsing
  • Genderizing
  • Casing

19
Creating the Analytical File-Merge Purge of Names
  • What are the reasons for creating unique match
    customer keys
  • Generating a marketing list
  • Conducting analysis

Should the match keys be the same for both above
scenarios? No, tighter matchkeys in generating
lists and looser matchkeyswhen conducting
analysis
What are the situations when match keys that are
numeric? When dealing with existing customer data
where you are matching Files involving only
existing customer data.
20
Creating the Analytical File-Merge Purge of Names
  • Common fields to use in creating Match keys
  • First Name
  • Surname
  • Unique Individual ID
  • Postal Code
  • Credit Card Number
  • Duns Number for Businesses
  • Phone NumberUnique I.Ds or number type I.Ds
    are the preferred choice when creating match
    keys
  • Lets take a closer look at creating match keys
    using name and address

21
Creating the Analytical File-Merge Purge of Names
  • Lets take a look at 6 records and see what this
    means.

22
Creating the Analytical File-Merge Purge of Names
  • Example You have one record here
  • Richard Boire-4628 Mayfair Ave. H4B2E5
  • How would you use the above information for a
    backend analysis if I were a responder to an
    acquisition campaign?BOIREH4B2E5
  • What about if you were conducting analysis on me
    as an existing customer who responded to a
    cross-sell campaign.
  • Need only customer id
  • How about if you wanted to send me a direct mail
    piece
  • BOIRERICHARDH4B2E54628MAYFAIR

23
Creating the Analytical File- Data standardization
  • Refers to a process where values from a common
    variable from different files are mapped to the
    same value. Some common examples
  • SIC Code Industry Classification Table
  • Industry categories have common set of codes
  • Postal Code Variable
  • Postal code has to have 6 digits comprised of
    alpha,numeric,alpha,numeric,alpha,numeric which
    exclude the following alphas D,F,O,Q,U, and Z.
  • Give me examples of bad postal codes vs. good
    postal codes.
  • D4B2E5, H442E6,etc. are bad postal codes.
  • M5J1A1, A1A1A3,etc. are good postal codes

24
Creating the Analytical File- Data Standardization
  • Here is an example of how disposition codes for
    telemarketing outcomes might be handled

25
Creating the Analytical File- Data Standardization
  • Postal Code Standardization
  • Six digit code comprising Alpha,numeric,alpha,nume
    ric,alpha,numeric
  • 1st letters A,B,C,E,G,H,J,K,L,M,N,P,R,S,T,V,X,Y
  • SIC(Standard Industry Code Classification
  • 4 digit code used to classify all companies into
    standard set of industries

26
Creating the Analytical File- Data standardization
  • Example
  • You have been asked to build retention model You
    have two years worth of transaction data.Changes
    in the product category codes occurred six months
    ago. Key information that you would look at would
    be as follows
  • Income category
  • Product Category
  • Transaction Codes
  • Transaction Amount
  • Postal Code
  • Transaction Date
  • Gender
  • What would you need to do
  • Need to map the old product category code
    definitions from prior to six months ago to the
    new product category code definitions

27
Creating the Analytical File- Geo-Codingn
  • Geocoding is the process that assigns a
    latitude-longitude coordinate to an address. Once
    a latitude-longitude coordinate is assigned, the
    address can be displayed on a map or used in a
    spatial search.
  • Data miners often use these coordinates to
    calculate such things as distance to the nearest
    store

28
Demographic Analysis
Geo Profile
Age Distribution
Average Age
Population Count
Store Location
29
Creating the Analytical File-What is Geocoding?
  • Lets look at a sample of what some data might
    look like?

How do we use this data to create meaningful
variables? -using the pythagorean theorem where
distance2lat2 longitude2. This is
extremely useful in calculating distance
typevariables between a customer and a given
location
30
Creating the Analytical File-What is Geocoding
  • Example
  • A retailer has the following information
  • Name and address of its customers
  • Address of its stores
  • Stats Can Information
  • As a marketer, how would you intelligently use
    this information
  • Find the distance between the nearest store and a
    given customer.
  • Create a trading area around a given store. Find
    out which stores have the best penetration. At
    the same time, analyze these best penetration
    stores and determine some key stats can
    attributes around these best penetration stores

31
Frequency Distribution
  • The report below uses first digit of postal code
    to assign customers to region.
  • For example, postal codes beginning with G,
    H, or J represent the Quebec region.

Region of Customers of Total Prairie
Provinces 25 M 2.5 Quebec 100 M 10 Ontario 350
M 35 West 25 M 2.5 Missing Values 500
M 50 Total 1 MM 100
Customer Profiling
32
Frequency Distribution
33
Frequency Distribution
34
Creating Variables
  • Source/ Raw File Variables
  • in Household
  • Income
  • Credit score
  • Total lifetime spend
  • Total number of promotions
  • Derived Variables
  • Region of country
  • Total spend within certain period
  • Age
  • Tenure
  • Number of promotions in last year by campaign
    category
  • Example of source variables
  • Example of derived variables

35
More Creations
  • Other variables
  • Total spend in certain time periods
  • Total spend by product category in certain time
    periods
  • Decline in spend-total by product type
  • Trend variables related to spending and product
    category
  • Median
  • Mean
  • Variation
  • Index Variables
  • Grouping of variable into meaningful categories
    where category values are index values
  • Binary Variables-yes/no type variables such as
    gender

36
Creating the Analytical File-Reviewing Data Dumps
View of the Transaction File
  • What kind of variables can be derived.

37
Creating Binary Groups
38
Creating Indices
of Months
of
Response
Response
Months
Since Last
Customers
Rate
Index
Since Last
Promotion
Promotion
1
16
2.50
0.71
0.57
0.62
2
16
1.50
0.43
3
16
3.75
1.07
1.00
1.00
4
16
3.25
0.93
5
16
6.00
1.71
1.43
1.43
6
16
4.00
1.14
Average
100
3.50
1.00
39
More Variable Creation
  • What would you do here
  • Is there any trend? Given that there seems to be
    no trend or impact between spend and response,
    it is highlyunlikely that further information
    would be derived from this field.

40
More Variable Creation
  • What would you do here?
  • Here, this variable in all likelihood would
    beuseful given its trend with response rate.

41
Stage 3 of Data Mining
  • What stage are we at
  • Application of data mining tools
  • Give me some examples of what data miners would
    be doing in stage 3
  • Data discovery
  • Data Audit/Frequency Distribution Analysis, Value
    Segmentation
  • Models,profiles,etc.
  • Post Campaign Analysis
  • Reporting i.e such as standard KBM-Key Business
    Measure Reports
  • AdHoc Reports
  • Modelling and profiling represent some examples
    of what we might be doing in this stage.

42
Types of Predictive Models
  • ExamplesDiscrete Models
  • Response Models
  • Cross Sell
  • Upsell
  • Acquisition
  • Attrition Models
  • Product Affinity Models
  • Risk Models

43
Types of Predictive Models
  • Examples-Continuous Models
  • Profitability/Value Models
  • Spending Models
  • What is the concept of the objective function or
    dependant variable?
  • This the variable that we trying to predict
  • Response,bad credit,defection,spend,etc.
  • What are we trying to optimize essentially
    becomes our objective function.
  • This is the variable we are trying to predict
Write a Comment
User Comments (0)
About PowerShow.com