Methodology of Allocating Generic Field to its Details - PowerPoint PPT Presentation

Description:

Title: General to Detail Allocation Author: Jessica Andrews Last modified by: Carl Girard Created Date: 9/20/2006 7:39:26 PM Document presentation format – PowerPoint PPT presentation

1
Methodology of Allocating Generic Field to its Details
  • Jessica Andrews
  • Nathalie Hamel
  • François Brisebois
  • ICESIII - June 19, 2007

2
Outline
  • Background Information on Tax Data
  • Objective
  • Current Methodology
  • Other Methodologies Considered
  • Comparison of the Methodologies
  • Future Work and Conclusions

3
Tax Data
  • Statistics Canada receives annual data from
    Canada Revenue Agency (CRA) on incorporated (T2)
    businesses
  • Tax data
      • Balance Sheet
      • Income Statement
      • 88 different Schedules

4
Tax Data
  • About 700 different fields to report
  • Most companies provide only 30-40 fields
  • Only 8 fields are actually required by CRA
    (section totals)
      • Non-farm revenue
      • Non-farm expenses
      • Farm revenue
      • Farm expenses
      • Assets
      • Liabilities
      • Shareholder Equity
      • Net Income/Loss

5
Objective
  • To impute the missing detail variables
  • Why?
      • Tax data users need detailed data (tax
        replacement project (TRP))
      • Different concepts and definitions between tax
        and survey data
      • A subset of details linked to the same generic
        can be mapped to different survey variables
        (Chart of Accounts)

6
Challenges to meet
  • Methodology must
      • Work well for a large number of details
      • Be capable of dealing with details which are
        rarely reported and those which are frequently
        reported
      • Give good micro results for tax replacement, but
        also give good macro results when examined at the
        NAICS or full database level

7
First attempt to complete Tax Data
  • Edit rules
  • Outlier detection within a record
  • Deterministic edits (to ensure the record
    balances within section)
  • Review and manual corrections
  • Overlap between fiscal periods
  • Negative values
  • Consistency edits between tax variables
  • Outlier detection between records
    (Hidiroglou-Berthelot)
  • CORTAX balancing edits
  • Deterministic imputation of key variables
      • Inventories
      • Depreciation
      • Salaries and wages
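
The slide names the Hidiroglou-Berthelot method without describing it. As a rough illustration only, the sketch below follows the usual textbook form of that method for a pair of positive values per record (for example, a previous and a current amount). The function name, the input layout and the default values of U, A and C are assumptions, not the settings used on the tax data, and the quartile rule is whatever `statistics.quantiles` uses by default.

```python
import statistics

def hb_outliers(prev, curr, U=0.4, A=0.05, C=4.0):
    """Flag records whose ratio curr/prev is unusual (Hidiroglou-Berthelot).

    prev, curr: equal-length lists of strictly positive amounts.
    U, A, C: tuning parameters; the defaults here are illustrative only.
    """
    ratios = [c / p for p, c in zip(prev, curr)]
    r_med = statistics.median(ratios)

    effects = []
    for p, c, r in zip(prev, curr, ratios):
        # Symmetric transformation of the ratio around the median ratio.
        s = 1 - r_med / r if r < r_med else r / r_med - 1
        # Size effect: larger records are given more importance.
        effects.append(s * max(p, c) ** U)

    q1, e_med, q3 = statistics.quantiles(effects, n=4)
    # The A term keeps the interval from collapsing when effects are tightly packed.
    d_q1 = max(e_med - q1, abs(A * e_med))
    d_q3 = max(q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    return [not (lower <= e <= upper) for e in effects]
```

A record is flagged when its transformed effect falls outside the interval built from the quartiles of all effects, so a single extreme ratio does not widen the interval the way it would widen a mean and standard deviation.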

8
GDA Concepts
  • A corporation can use either generic or detail
    fields to report its results

Type     Field Description                                    Case 1  Case 2  Case 3
Generic  8810  Office expenses amount                            100              30
Detail   8811  Office stationery and supply expense amount                20
Detail   8812  Office utilities expense amount                            30      10
Detail   8813  Data processing expense amount                             50      60
Total                                                            100     100     100

9
GDA Concepts
  • Block is defined by a generic and its details
  • Generic field is not a total
  • Goal is to impute the most significant detail
    variables when a generic amount has been reported
  • GDA: Generic to Detail Allocation

10
Current method
  • Uses imputation classes based on industry codes
    and size of company
      • First 2 digits of NAICS (about 25 industries)
      • Three revenue size classes (boundaries at 5 and
        25 million)
  • Calculates ratios within imputation classes for
    each block
      • Uses all non-zero and non-missing details
      • Uses only details reported at least 10% of the
        time (5% for block General Farm Expense)
  • Assigns ratios to businesses with a generic
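
The steps on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production method: the slides do not say how a class ratio is formed, so the sketch assumes it is the class total of a detail divided by the class total of all retained details. The record layout and the names `size_class`, `class_ratios` and `allocate` are hypothetical.

```python
from collections import defaultdict

def size_class(revenue, low=5e6, high=25e6):
    """Three revenue size classes with boundaries at 5 and 25 million."""
    if revenue < low:
        return "small"
    if revenue < high:
        return "medium"
    return "large"

def class_ratios(donors, min_report_rate=0.10):
    """Distribution ratios per imputation class for one block.

    donors: list of dicts with keys 'naics' (string), 'revenue' and
    'details' ({field: amount or None}).
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    n_donors = defaultdict(int)
    for d in donors:
        key = (d["naics"][:2], size_class(d["revenue"]))
        n_donors[key] += 1
        for field, amount in d["details"].items():
            if amount:  # keep only non-zero, non-missing details
                totals[key][field] += amount
                counts[key][field] += 1

    ratios = {}
    for key, fields in totals.items():
        # Drop details reported by too few businesses in the class.
        kept = {f: t for f, t in fields.items()
                if counts[key][f] / n_donors[key] >= min_report_rate}
        denom = sum(kept.values())
        if denom:
            ratios[key] = {f: t / denom for f, t in kept.items()}
    return ratios

def allocate(generic_amount, ratios_for_class):
    """Spread a generic amount over the details of its block."""
    return {f: generic_amount * r for f, r in ratios_for_class.items()}
```

Because the ratios in a class sum to 1, the allocated details add back to the generic amount, which keeps the section total unchanged.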

11
Current method
  • Originally proposed as a solution with good macro
    (aggregate) results
  • Now need good micro (business) level results for
    TRP
  • Problems
      • Imputation classes are frequently not homogeneous
        in terms of distribution
      • A large number of small imputation classes

12
Other methods considered
  • Historic imputation method
  • Scores method
  • Cluster method

13
Historic imputation method
  • Assumes distributions of details are the same
    from one year to the next
  • Problems
      • A change in business strategies/properties will
        not be taken into account
      • Most businesses which report details in the
        previous year will also report them in the
        current year, leaving few businesses which could
        be imputed with this method (5% on all blocks
        tested)
      • Requires use of another method for remaining
        businesses

14
Scores method
  • Uses response/non-response models for each detail
  • Groups businesses into imputation classes on the
    basis of percentiles of response probability
  • Calculates ratios within imputation classes
  • Assigns ratios to businesses with a generic

15
Scores method
  • Problems
      • Need to create a model for each detail
      • Difficult to resolve what to do in the case of
        blocks with many details (5 or more) which are
        frequently reported
  • This method was excluded due to its difficulty
    in coping with blocks with a moderate to large
    number of details

16
Cluster method
  • Divides businesses into imputation classes on the
    basis of response patterns to details
  • Uses clustering or dominant detail method
  • Uses discriminatory models (parametric or not) to
    assign businesses with generic to imputation
    classes
  • Calculates ratios within imputation classes
  • Assigns ratios to businesses with a generic

17
Cluster method
  • Problems
      • For certain blocks it can be difficult to find
        good variables on which to discriminate
      • Issue of how often clustering method and models
        should be reviewed

18
Comparing the methods
  • Estimate distributions of known data for year n
    from ratios calculated for year n-1
  • Create a benchmark file
      • Reported details in years n-1 and n
      • Put all details into generic fields in year n
  • Calculate ratios from businesses in year n-1 for
    all methods
  • Assign ratios to businesses in year n
  • Compare the results to the reported fields
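
The benchmark steps above can be sketched as follows. This is an illustration under assumptions: each year-n record is a dictionary of reported detail amounts, one set of ratios (computed from year n-1) is applied to every business, and `total_abs_share_gap` is a hypothetical stand-in for a discrepancy measure, not one of the statistics used in the presentation.

```python
def collapse_to_generic(details):
    """Put all reported detail amounts of a record into the generic field."""
    return sum(v for v in details.values() if v)

def shares(amounts):
    """Turn amounts into a distribution over fields."""
    total = sum(amounts.values())
    return {f: (v / total if total else 0.0) for f, v in amounts.items()}

def benchmark(year_n_records, ratios):
    """For each business, pair the reported details with the imputed ones.

    year_n_records: {business_id: {field: amount}} for businesses that
    reported details in both years.
    ratios: {field: ratio} calculated from year n-1 donors.
    """
    out = {}
    for biz_id, details in year_n_records.items():
        generic = collapse_to_generic(details)
        imputed = {f: generic * r for f, r in ratios.items()}
        out[biz_id] = (details, imputed)
    return out

def total_abs_share_gap(reported, imputed):
    """Sum over fields of |true share - estimated share| (hypothetical measure)."""
    p, q = shares(reported), shares(imputed)
    fields = set(p) | set(q)
    return sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in fields)
```

Since the true details are known for every benchmark business, any method that produces ratios can be run through the same comparison.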

19
Comparing the methods
  • Compare the results at the micro (businesses) and
    the macro (aggregate) levels
  • Compare true and estimated distributions

20
Comparing the methods
  • Macro statistics
  • for the jth detail in the block

21
Comparing the methods
  • Micro Statistics
  • Median Pseudo CV
  • for the jth detail and ith business in the block

22
Comparing the methods
  • Micro Statistics
  • Median Pearson Contingency Coefficient
  • for the jth detail and ith business in the block
  • f values represent the marginal distributions
  • d² represents the degree of dependency (depends
    on n, r and c)
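
The formula on this slide did not survive in the transcript. For orientation, the sketch below computes the textbook Pearson contingency coefficient, C = sqrt(chi2 / (chi2 + n)), from an r x c table. Treating the slide's dependency term as the usual chi-square statistic is an assumption, as is the idea of putting true and estimated amounts in the two rows of the table. The function name is hypothetical.

```python
import math

def pearson_contingency(table):
    """Textbook Pearson contingency coefficient of an r x c table.

    table: list of rows of non-negative numbers with a positive grand total.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            # Expected cell value under independence, from the marginals.
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (f_ij - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))
```

A value of 0 means the rows have identical distributions. The upper limit is below 1 and depends on the table dimensions.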

23
Comparing the methods
  • We show results for Block 8230, Other Revenue
  • This block has 20 details covering revenue
    distribution
  • Important for clients, as it is used in many surveys
  • The scores method is not shown as it is difficult
    to implement with this many details

24
Comparing the methods
OTHER REVENUE FLDS 8230 TO 8250
8230 Other revenue
8231 Foreign exchange gains/losses
8232 Income/loss of subsidiaries/affiliates
8233 Income/loss of other divisions
8234 Income/loss of joint ventures
...
8248 Insurance recoveries
8249 Expense recoveries
8250 Bad debt recoveries

25
Results
Block 8230          Micro statistics                                               Macro statistics
Method              Median Pseudo CV   IQR    Median Pearson Cont. Coeff.   IQR    SSE      SSEP
Current Method      1.08               0.43   0.66                          0.14   2.2e20   120
Cluster Method      0.34               1.39   0.36                          0.63   2.8e20   12
Historic Cluster    0.51               0.99   0.10                          0.7    9.9e19   4.5

26
Cluster methodology
  • Most blocks use dominant-detail (attractor x)
    clusters to define the imputation classes
  • A business i belongs to cluster j of attractor x,
    where x > 50%, if
      y_{ij} / \sum_{k} y_{ik} > x
  • where y_{ij} is the total value reported by
    business i in detail j. If this statement is not
    true for any detail, then the business is assigned
    to cluster J+1, where J is the number of details
    in the block.
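
The dominant-detail rule can be illustrated with a short function. This is a sketch under assumptions: the share of a detail is taken relative to the sum of all details the business reported in the block, the comparison is strict, amounts are non-negative, and the function and argument names are hypothetical.

```python
def dominant_detail_cluster(details, field_order, x=0.5):
    """Return the cluster number of a business for one block.

    details: {field: amount} reported by the business.
    field_order: list of the block's detail fields; cluster j is field j.
    x: attractor threshold as a fraction (the slides use more than 50%).
    """
    amounts = [details.get(f) or 0 for f in field_order]
    total = sum(amounts)
    if total > 0:
        for j, amount in enumerate(amounts, start=1):
            if amount / total > x:
                return j  # at most one detail can exceed a threshold >= 50%
    # No dominant detail: the business goes to the extra cluster.
    return len(field_order) + 1
```

With a threshold above 50% the clusters are mutually exclusive, which is why a single pass over the details is enough.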

27
Cluster methodology
  • Distribution ratios to details are calculated for
    each cluster
  • Discriminatory models (nonparametric for most
    blocks) are then created to assign businesses
    with a generic to clusters
      • Use variables on industry (NAICS), location
        (province), size (revenue, log revenue), details
        and totals of details in other blocks

28
Cluster methodology
  • Generic amounts are assigned to details in one of
    the following 3 ways
      • If a generic amount and no details are reported,
        then ratios are assigned as calculated
      • If a generic amount and all details with a ratio
        greater than 0 are reported, then ratios are
        assigned as calculated
      • If a generic amount and some but not all details
        are reported, then ratios are pro-rated and the
        generic is assigned only to details which were
        not reported
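
The three cases above can be sketched as one function. This is an illustration under assumptions: a detail counts as reported when its amount is non-zero and non-missing, allocated amounts are added to whatever was already reported, and the name `allocate_generic` is hypothetical.

```python
def allocate_generic(generic, reported, ratios):
    """Allocate a generic amount to the details of its block.

    generic: amount reported in the generic field.
    reported: {field: amount} for details the business reported.
    ratios: {field: ratio} for the business's imputation class.
    """
    positive = {f: r for f, r in ratios.items() if r > 0}
    missing = {f: r for f, r in positive.items() if not reported.get(f)}

    if missing and len(missing) < len(positive):
        # Some but not all details reported: pro-rate the ratios so that
        # the generic goes only to the details which were not reported.
        scale = sum(missing.values())
        use = {f: r / scale for f, r in missing.items()}
    else:
        # No details reported, or all of them reported: ratios as calculated.
        use = positive

    result = dict(reported)
    for f, r in use.items():
        result[f] = (result.get(f) or 0) + generic * r
    return result
```

Applied to Case 3 of the office expenses example (generic 30, fields 8812 and 8813 reported) with any ratios that are positive for all three details, the whole generic amount goes to field 8811 and the record total stays at 100.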

29
Cluster methodology
  • Gives better micro results
  • Improved data for tax replacement
  • Macro results remain similar to current
    methodology
  • Micro results are consistent year to year

30
Future work and conclusions
  • The cluster methodology will be implemented for
    reference year 2006 for the Income Statement
  • Model fitting and implementation for Balance
    Sheet will follow
  • Review of models and clustering methods as deemed
    appropriate

31
Contact Information / Coordonnées
  • Jessica.andrews@statcan.ca
  • Francois.brisebois@statcan.ca
  • Nathalie.hamel@statcan.ca