Computational Application of Benford's First Digit Law to Financial Fraud Detection
(PowerPoint presentation transcript; 18 slides; source: https://www.aiecon.org)

1
Computational Application of Benford's First
Digit Law to Financial Fraud Detection
  • Sukanto Bhattacharya, School of
    Business/Information Technology, Bond University
  • Kuldeep Kumar, School of Information Technology,
    Bond University

2
Newcomb's discovery
  • In 1881, the American mathematician Simon Newcomb
    noticed to his surprise that the first few pages
    of logarithmic tables, corresponding to the lower
    significant digits (typically those below 5),
    were comparatively dirtier than the later pages
    corresponding to the higher significant digits
    (typically those above 5)
  • Newcomb attributed this to greater usage of the
    first pages than the later ones, which in turn
    led him to reason that the probability of a user
    consulting any given page was skewed in favour of
    the earlier pages, i.e. towards the lower
    significant digits! This directly contradicts the
    naive expectation that the probability of the
    first digit being any number from one to nine
    should be equal, at 1/9 or roughly 11.11%

3
Benford steals the thunder
  • In 1938, almost half a century after Newcomb's
    discovery, another American, the physicist Frank
    Benford, was examining a large collection of
    numerical data from disparate sources when he
    stumbled upon a similar finding
  • Besides further exploring its mathematical
    intricacies, Benford also compiled a huge volume
    of data to support the finding empirically and
    published his results in a number of papers. Thus
    the principle came to be known as Benford's Law

4
The mathematical structure of Benford's law
  • It is specifically a logarithmic probability
    distribution on the first significant digit of
    real numbers, given as follows
  • P(D1 = d) = ∫ from d to d+1 of log10(e)·t⁻¹ dt
      = log10(1 + d⁻¹), for d = 1, 2, ..., 9
  • In the above form, it is also known as the first
    digit law. However, the law can be generalized to
    any number of leading digits; in its general form
    it is stated as follows
  • P(D1 = d1, D2 = d2, ..., Dn = dn)
      = log10[1 + (Σi di·10^(n−i))⁻¹], for all n ∈ ℕ

5
Mathematics (contd.)
  • An alternative form of the general law may be
    stated as follows
  • P(mantissa ≤ t/10) = log10 t, for all t ∈ [1, 10)
  • The mantissa (base 10) of a positive real number
    x is the real number r in [1/10, 1) with
    x = r·10^n for some integer exponent n
  • Formally, the logarithmic probability measure P
    is defined on the measurable space (ℝ⁺, M), where
    ℝ⁺ is the set of all positive real numbers and M
    is the mantissa (base 10) sigma-algebra, which in
    turn is the sub-sigma-algebra of the Borel sets
    generated by the mapping x → mantissa(x)
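The mantissa definition above is easy to make concrete. The sketch below is added for illustration (it is not part of the original slides); the helper name `mantissa` is ours:

```python
import math

def mantissa(x: float) -> float:
    """Base-10 mantissa: the r in [1/10, 1) with x = r * 10**n, n an integer."""
    if x <= 0:
        raise ValueError("the mantissa is defined for positive reals only")
    n = math.floor(math.log10(x)) + 1  # exponent placing x / 10**n in [1/10, 1)
    return x / 10 ** n

# The first significant digit of x is the first decimal digit of mantissa(x):
# mantissa(3141.6) ≈ 0.31416  ->  first digit 3
# mantissa(0.025)  ≈ 0.25     ->  first digit 2
```

Note that the exponent n must range over all integers, not just the naturals, so that numbers below 1/10 (such as 0.025) also get a mantissa in [1/10, 1).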

6
Invariance properties of Benford's distribution
  • Benford's distribution is characterized by the
    important statistical properties of scale
    invariance and base invariance
  • Scale invariance: a probability measure P on the
    mantissa space (ℝ⁺, M) is said to be scale
    invariant if P(sS) = P(S) for every S ∈ M and
    every s > 0. This property ensures that Benford's
    distribution is particularly robust, even for
    chaotic data subject to Feigenbaum scaling
  • Base invariance: a probability measure P on the
    mantissa space (ℝ⁺, M) is said to be base
    invariant if P(S^(1/n)) = P(S) for every S ∈ M
    and every n ∈ ℕ. Benford's distribution is the
    unique logarithmic probability measure on
    (ℝ⁺, M) that displays base invariance
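Scale invariance is easy to check empirically. The sketch below (added here for illustration) simulates Benford-distributed significands as 10^U with U uniform on [0, 1), rescales them by an arbitrary factor, and confirms that the first-digit frequencies are essentially unchanged:

```python
import math
import random
from collections import Counter

random.seed(42)

def first_digit(x: float) -> int:
    return int(f"{abs(x):e}"[0])  # leading digit via scientific notation

# Benford-distributed significands: S = 10**U, U ~ Uniform[0, 1)
sample = [10 ** random.random() for _ in range(100_000)]

def digit_freqs(xs):
    counts = Counter(first_digit(x) for x in xs)
    return {d: counts[d] / len(xs) for d in range(1, 10)}

base = digit_freqs(sample)
scaled = digit_freqs([2.7 * x for x in sample])  # arbitrary scale factor s > 0

# Both frequency vectors stay close to log10(1 + 1/d) for every digit d
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    assert abs(base[d] - expected) < 0.01
    assert abs(scaled[d] - expected) < 0.01
```

The scale factor 2.7 is arbitrary; any positive s leaves the first-digit frequencies (approximately) fixed, which is exactly the scale-invariance property the slide states.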

7
Benford's law as a signature of Nature
  • It has been mathematically proved that, in a form
    analogous to the central limit theorem, the
    Benford distribution is the unique limit of the
    significant-digit frequencies of a sequence of
    conformably generated random variables
  • According to Benford himself, while we count
    arithmetically as 1, 2, 3, 4, ..., Nature counts
    geometrically as e^0, e^x, e^2x, etc. Thus
    Benford's distribution is observable in most
    naturally occurring numbers, but not in
    artificially manipulated or concocted data
  • Accounting data is one type of data that is
    expected to follow the Benford distribution
    closely. Therefore, in theory, the more an
    observed set of accounting data deviates from the
    pattern predicted by Benford, the greater the
    chance that the data is not authentic

8
Getting the numbers right
  • The steady-state Benford first-digit frequencies

    D1          1      2      3      4      5      6      7      8      9
    P(D1 = d)   0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
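These frequencies follow directly from the first digit law, P(D1 = d) = log10(1 + 1/d). A quick check (a sketch added here, not part of the original slides):

```python
import math

# Benford first-digit probabilities: P(D1 = d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"P(D1 = {d}) = {p:.3f}")
# Reproduces the table row: 0.301, 0.176, 0.125, ..., 0.046
```

The nine probabilities sum to exactly 1, since the product of the arguments (2/1)(3/2)...(10/9) telescopes to 10 and log10 10 = 1.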
9
Dow Illustrates Benford's Law
  • To illustrate Benford's Law, Dr. Mark J. Nigrini
    offered this example: "If we think of the Dow
    Jones stock average as 1,000, our first digit
    would be 1.
  • "To get to a Dow Jones average with a first digit
    of 2, the average must increase to 2,000, and
    getting from 1,000 to 2,000 is a 100 percent
    increase.
  • "Let's say that the Dow goes up at a rate of
    about 20 percent a year. That means that it would
    take five years to get from 1 to 2 as a first
    digit.
  • "But suppose we start with a first digit of 5. It
    only requires a 20 percent increase to get from
    5,000 to 6,000, and that is achieved in one year.
  • "When the Dow reaches 9,000, it takes only an 11
    percent increase and just seven months to reach
    the 10,000 mark, which starts with the number 1.
    At that point you start over with the first digit
    a 1, once again. You must then double the number,
    from 10,000 to 20,000, before reaching 2 as the
    first digit.
  • "As you can see, the number 1 predominates at
    every step of the progression, as it does in
    logarithmic sequences."
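Nigrini's timings can be computed directly. The sketch below (added for illustration) works out how long a quantity growing at a steady rate g spends on each leading digit; note that exact 20 percent compounding gives about 3.8 years for the 1-to-2 transition, so the quote's "five years" should be read as a round figure:

```python
import math

# At a steady growth rate g per year, the time spent with leading digit d
# (going from d*10**k to (d+1)*10**k) is log(1 + 1/d) / log(1 + g) years.
g = 0.20
for d in range(1, 10):
    years = math.log(1 + 1 / d) / math.log(1 + g)
    print(f"first digit {d}: {years:.2f} years")
```

Dividing each digit's time by the full decade time, log(10)/log(1 + g), gives exactly log10(1 + 1/d): steady exponential growth spends Benford-proportioned time on each leading digit, whatever the rate g.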

10
A suggested Monte Carlo approach
  • We have voiced slight reservations about direct
    comparison of observed first-digit frequencies
    with the expected Benford frequencies, since the
    Benford frequencies are necessarily steady-state
    frequencies and may not be faithfully reflected
    in a sample. As samples are always of finite
    size, it is not appropriate to draw conclusions
    from such a direct comparison, because the sample
    frequencies won't be steady-state frequencies
  • We have shown (Kumar and Bhattacharya, 2002) that
    if we draw digits randomly, using the inverse
    transformation technique, from random number
    ranges derived from the cumulative probability
    distribution function of the Benford frequencies,
    then the problem reduces to running a
    goodness-of-fit test for any significant
    difference between observed and simulated
    first-digit frequencies. This test may be
    conducted using a known sampling distribution,
    for example Pearson's χ² distribution
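The inverse transformation step described above can be sketched as follows (an illustrative implementation under our own naming, not the authors' code): a uniform random number is mapped to a first digit through the cumulative Benford distribution.

```python
import math
import random
from bisect import bisect_left

random.seed(0)

# Cumulative Benford probabilities F(d) = log10(d + 1) for d = 1..9:
# [0.301, 0.477, 0.602, ..., 1.0]
cdf = [math.log10(d + 1) for d in range(1, 10)]

def draw_benford_digit() -> int:
    """Inverse transformation: map u ~ Uniform[0, 1) to a first digit by
    finding the first cumulative probability that exceeds u."""
    return bisect_left(cdf, random.random()) + 1

simulated = [draw_benford_digit() for _ in range(10_000)]
print(simulated.count(1) / len(simulated))  # close to 0.301
```

The simulated digits then play the role of the finite-sample reference against which the observed digits are compared, instead of the exact steady-state frequencies.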

11
The final test
  • Test for a significant difference between the
    first digits observed in the sample and those
    generated by the Monte Carlo simulation, using a
    goodness-of-fit test based on Pearson's χ²
    distribution. The null and alternative hypotheses
    are as follows
  • H0: The observed first-digit frequencies
    approximate a Benford distribution
  • H1: The observed first-digit frequencies
    do not approximate a Benford distribution
  • This statistical test will not reveal whether or
    not a fraud has actually been committed. All it
    does is establish, at a desired level of
    confidence, whether the accounting data has been
    manipulated (if H0 is rejected)
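A minimal sketch of the χ² goodness-of-fit step (our own illustrative code, testing directly against the theoretical Benford frequencies; the authors compare against Monte Carlo-simulated digits, but the statistic is computed the same way):

```python
import math
from collections import Counter

def benford_chi_square(first_digits):
    """Pearson chi-square statistic of observed first digits against the
    expected Benford counts."""
    n = len(first_digits)
    observed = Counter(first_digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat

# Reject H0 at the 5% level if the statistic exceeds the chi-square critical
# value with 9 - 1 = 8 degrees of freedom, approximately 15.507
CRITICAL_5PCT_DF8 = 15.507

# A uniformly fabricated digit set (100 of each digit) is flagged immediately
fabricated = [d for d in range(1, 10) for _ in range(100)]
print(benford_chi_square(fabricated) > CRITICAL_5PCT_DF8)  # True
```

As the slide stresses, exceeding the critical value only signals manipulation of the digit pattern; it says nothing by itself about whether the manipulation was fraudulent.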

12
A Neutrosophic Extension
  • However, if H0 is rejected in favour of H1, any
    of the following events may be the case:
  • I. There is no manipulation - a Type I error has
    occurred, i.e. H0 was rejected when true.
  • II. There is manipulation, and the manipulation
    is definitely fraudulent.
  • III. There is manipulation, and the manipulation
    may or may not be fraudulent.
  • IV. There is manipulation, and the manipulation
    is definitely not fraudulent.

13
A Neutrosophic Extension (continued)
  • Neutrosophic probabilities are a generalization
    of classical and fuzzy probabilities and cover
    events that involve some degree of indeterminacy
  • Neutrosophy provides a better approach to
    quantifying uncertainty than classical or even
    fuzzy probability theory. Neutrosophic
    probability theory uses a subset approximation
    for the truth-value as well as for the
    indeterminacy and falsity values
  • This approach also distinguishes between
    "relative true" and "absolute true" events: the
    former are true in only some probability
    sub-spaces, while the latter are true in all
    probability sub-spaces. Similarly, events that
    are false in only some probability sub-spaces are
    classified as "relative false" events, while
    events false in all probability sub-spaces are
    "absolute false" events. Again, events that are
    hard to classify as either true or false in some
    probability sub-spaces are classified as
    "relative indeterminate" events, while events
    that bear this characteristic over all
    probability sub-spaces are "absolute
    indeterminate" events.

14
A Neutrosophic Extension (continued)
  • While in classical probability n_sup ≤ 1, in
    neutrosophic probability one has n_sup ≤ 3, where
    n_sup is the upper bound of the probability
    space. In cases where the truth and falsity
    components are complementary, i.e. there is no
    indeterminacy, the components sum to unity and
    neutrosophic probability reduces to classical
    probability, as in the tossing of a fair coin or
    the drawing of a card from a well-shuffled deck
  • Coming back to our original problem of financial
    fraud detection, let E be the event that a Type I
    error has occurred and F the event that a fraud
    is actually detected. Then the conditional
    neutrosophic probability NP(F | E^c) is defined
    over a probability space consisting of a triple
    of sets (T, I, U). Here T, I and U are
    probability sub-spaces in which event F is,
    respectively, t% true, i% indeterminate and u%
    untrue, given that no Type I error occurred

15
Statistical sampling issues
  • A statistical sampling method particularly useful
    for the investigative accountant is monetary unit
    sampling, which takes into account the
    materiality of the various items by giving
    proportionately greater weight to items with
    higher monetary values
  • The monetary unit sampling technique treats each
    monetary unit in the account balances under
    examination as a separate element of the
    population. Items with larger monetary values
    therefore have a greater probability of
    selection, as they are automatically weighted in
    proportion to the number of monetary units they
    contain
  • The monetary unit sampling method is particularly
    suitable for forensic accounting where the
    investigator suspects selective, material
    overstatement of accounts in an otherwise robust
    accounting system
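The selection mechanism described above can be sketched as systematic sampling with probability proportional to size. The code below is an illustrative sketch (the account names and figures are invented, and real monetary unit sampling adds refinements such as top-stratum cut-offs):

```python
import random

def monetary_unit_sample(balances, sample_size, seed=7):
    """Systematic PPS sampling: every monetary unit (e.g. every dollar) is
    one population element, so larger balances are proportionally more
    likely to be selected."""
    total = sum(balances.values())
    interval = total / sample_size
    start = random.Random(seed).uniform(0, interval)
    picks = [start + i * interval for i in range(sample_size)]

    selected, cumulative = [], 0.0
    items = iter(sorted(balances.items()))
    name, value = next(items)
    for p in picks:
        while cumulative + value < p:  # walk forward to the balance holding unit p
            cumulative += value
            name, value = next(items)
        selected.append(name)  # a large item can be selected more than once
    return selected

# Hypothetical ledger: the two large balances dominate the selection
ledger = {"acct_A": 50_000, "acct_B": 1_200, "acct_C": 800, "acct_D": 48_000}
print(monetary_unit_sample(ledger, 4))
```

With the sampling interval at total/sample_size, any balance larger than the interval is certain to be selected at least once, which is precisely how materiality is built into the method.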

16
Direction of future research
  • We are still trying to come to terms with the
    deep statistical and topological properties of
    this strange law of "anomalous numbers"
  • We have already attempted to add a neutrosophic
    dimension to the problem of determining the
    conditional probability that a financial fraud
    has actually been committed, given that no Type I
    error occurred in rejecting the null hypothesis
    (Bhattacharya, 2002)
  • The possibility of a neuro-fuzzy multinomial
    fraud classification system is presently being
    explored. This is intended as the first step
    towards a comprehensive fraud classification and
    detection tool-kit incorporating the statistical
    features of Benford's law along with
    sophisticated audit-sampling methodologies

17
High Five
  • An open workgroup has recently been formed for
    further collaborative research on the application
    of Benford's law to fraud detection. The group
    presently involves the following researchers
  • 1. Florentin Smarandache, Department of
    Mathematics, University of New Mexico, U.S.A.
  • 2. Jean Dezert, ONERA (National Aerospace
    Research Establishment), France
  • 3. Kuldeep Kumar, School of IT, Bond University,
    Australia
  • 4. Sukanto Bhattacharya, School of Business/IT,
    Bond University, Australia
  • 5. Mohammad Khoshnevisan, School of Accounting
    and Finance, Griffith University, Australia