Lattice Representation of Data

1
Lattice Representation of Data
  • Dr. Alex Pogel
  • Physical Science Laboratory
  • New Mexico State University

2
Basic Idea
  • Replace tabular representation by lattice
    representation in order to reveal hierarchical
    structure
  • Basic definitions
  • Information in the lattice
  • Carving up epidemiological data

Ganter & Wille: Formal Concept Analysis (FCA)
Barwise & Seligman: Information Flow
3
Input data
  • Base data structure is a 0,1-table
  • A set G of objects (represented by rows) and
  • A set M of attributes (represented by columns)
  • an entry of 1 indicates object g has attribute m

[Figure: data table with columns labeled M (attributes) and rows labeled G (objects)]
4
Input data, mathematically
  • Mathematically speaking
  • a binary relation I from G to M, i.e. a subset of
    G × M
  • interpreted as an indication of which objects g
    have which attributes m
  • via (g, m) ∈ I
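
As a minimal sketch of the two views above (the 0,1-table and the relation I), the following builds I from a small table. The objects, attributes and table entries are hypothetical toy data, not the animal context used later in the slides.

```python
# Hypothetical toy context, for illustration only.
objects = ["sparrow", "trout", "dog"]                 # G, one row each
attributes = ["flies", "lives in water", "has fur"]   # M, one column each
table = [
    [1, 0, 0],  # sparrow
    [0, 1, 0],  # trout
    [0, 0, 1],  # dog
]

# The binary relation I as a subset of G x M: (g, m) is in I exactly
# when the table holds a 1 in row g, column m.
I = {
    (g, m)
    for g, row in zip(objects, table)
    for m, bit in zip(attributes, row)
    if bit == 1
}

print(("sparrow", "flies") in I)    # True
print(("trout", "has fur") in I)    # False
```

Storing I as a set of pairs keeps the membership test (g, m) ∈ I literal; the table form is more convenient for display.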

5
Key Definitions
  • The notion of formal concept is based on
    natural mappings that arise from the binary
    relation I
  • interpret G and M as before
  • to each subset H of G, we associate the set a(H)
    of all attributes the objects in H satisfy in
    common
  • a: P(G) → P(M)
  • to each subset N of M, we associate the set o(N)
    of all objects satisfying every attribute in N
  • o: P(M) → P(G)
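
The two mappings can be sketched directly from their definitions. The context below is a hypothetical toy example, and storing it as a dictionary from object to attribute set is an assumption made for brevity.

```python
# Hypothetical toy context: each object mapped to the attributes it has.
context = {
    "sparrow": {"warm-blooded", "airbreather", "flies"},
    "dog": {"warm-blooded", "airbreather", "fourlegged"},
    "trout": {"lives in water"},
}
G = set(context)
M = set().union(*context.values())

def a(H):
    """Attributes shared by every object in H, i.e. a: P(G) -> P(M)."""
    common = set(M)
    for g in H:
        common &= context[g]
    return common

def o(N):
    """Objects having every attribute in N, i.e. o: P(M) -> P(G)."""
    return {g for g in G if set(N) <= context[g]}

print(sorted(a({"sparrow", "dog"})))   # ['airbreather', 'warm-blooded']
print(sorted(o({"warm-blooded"})))     # ['dog', 'sparrow']
```

Note the edge cases: a of the empty object set is all of M, and o of the empty attribute set is all of G.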

6
Key Definitions
  • The attribute subsets N of M such that a(o(N)) = N
  • are called formal concepts in FCA
  • and are called closed sets in mathematics, as
    a(o(·)) is a closure operator on M
  • A formal concept can be identified geometrically
    within a data table by reshuffling rows and
    columns such that
  • object-attribute relations are maintained and
  • a maximal rectangle of 1s appears.
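
A brute-force sketch of the definition: apply a(o(·)) to every attribute subset and keep the fixed points. The context is hypothetical, and enumerating all subsets is only feasible for very small M; it is meant to illustrate the definition, not to suggest how FCA tools compute concepts.

```python
from itertools import combinations

# Hypothetical toy context: each object mapped to the attributes it has.
context = {
    "sparrow": {"warm-blooded", "airbreather", "flies"},
    "dog": {"warm-blooded", "airbreather", "fourlegged"},
    "frog": {"airbreather", "fourlegged"},
}
M = sorted(set().union(*context.values()))

def o(N):
    """Objects having every attribute in N."""
    return {g for g, attrs in context.items() if N <= attrs}

def a(H):
    """Attributes shared by every object in H."""
    common = set(M)
    for g in H:
        common &= context[g]
    return common

def closure(N):
    """The closure operator a(o(N)) on attribute sets."""
    return a(o(N))

# Every closed set arises as the closure of some subset of M.
closed_sets = set()
for k in range(len(M) + 1):
    for combo in combinations(M, k):
        closed_sets.add(frozenset(closure(set(combo))))

# Each closed set N, paired with o(N), is a maximal rectangle of 1s.
for N in sorted(closed_sets, key=lambda s: (len(s), sorted(s))):
    print(sorted(N), "<->", sorted(o(N)))
```

For each printed pair, every listed object has every listed attribute, and neither side can be enlarged without breaking that property.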

7
Animal Context
8
Shuffling Reveals a Concept
9
BIRD is the (formal) concept
10
Closure System Arises
  • Taking all closed sets together we obtain a
    closure system
  • aka a topped intersection structure, in
    Davey-Priestley
  • which is always a complete lattice: an ordered
    set in which every subset has both a supremum
    and an infimum in the set
  • Examples
  • R with ≤,
  • P(S) with inclusion,
  • any topology with inclusion,
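
To illustrate why a closure system is a lattice, the sketch below checks the two defining properties on a small hypothetical family of sets and computes meets and joins. The family and the function names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical family of closed sets over M = {a, b, c}.
M = frozenset("abc")
family = {frozenset(), frozenset("a"), frozenset("b"), M}

def is_closure_system(family, M):
    """Topped (contains M) and closed under pairwise intersection.

    For a finite family this is equivalent to closure under
    arbitrary intersections.
    """
    return M in family and all(
        x & y in family for x, y in combinations(family, 2)
    )

def meet(x, y):
    """Infimum of two closed sets: their intersection."""
    return x & y

def join(x, y):
    """Supremum: the smallest member of the family containing x and y."""
    result = M
    for z in family:
        if x | y <= z:
            result &= z
    return result

print(is_closure_system(family, M))                  # True
print(sorted(join(frozenset("a"), frozenset("b"))))  # ['a', 'b', 'c']
```

The join is generally larger than the plain union: here {a} ∪ {b} is not in the family, so the supremum is all of M.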

11
Focus on attribute logic
12
Full list difficult, redundant
  • all implications that hold for the data, with up
    to three attributes in their premise: 125 with
    positive support
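
A list of this kind can be produced by brute force, as in the sketch below: for each premise N of at most three attributes, the implied attributes are a(o(N)) minus N, and the support is the number of objects in o(N). The context is a hypothetical toy example, and restricting to non-empty premises is an assumption.

```python
from itertools import combinations

# Hypothetical toy context: each object mapped to the attributes it has.
context = {
    "sparrow": {"warm-blooded", "airbreather", "flies"},
    "dog": {"warm-blooded", "airbreather", "fourlegged"},
    "frog": {"airbreather", "fourlegged"},
}
M = sorted(set().union(*context.values()))

def o(N):
    """Objects having every attribute in N."""
    return {g for g, attrs in context.items() if N <= attrs}

def a(H):
    """Attributes shared by every object in H."""
    common = set(M)
    for g in H:
        common &= context[g]
    return common

implications = []
for k in range(1, 4):                      # premises of 1 to 3 attributes
    for premise in combinations(M, k):
        N = set(premise)
        support = len(o(N))                # objects satisfying the premise
        conclusion = a(o(N)) - N           # attributes forced by the premise
        if support > 0 and conclusion:
            implications.append((premise, sorted(conclusion), support))

for premise, conclusion, support in implications:
    print(premise, "implies", conclusion, "support", support)
```

Even on three objects the list contains redundant entries, which is the difficulty the slide title refers to.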

13
Duquenne-Guigues Basis
  • 20 implications generate the full list and serve
    as a basis (analogy with linear algebra), ordered
    by support value
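
What "generate" means can be sketched with forward chaining: an implication follows from a set of implications if closing its premise under that set reaches its conclusion. The four implications below are ones quoted elsewhere in these slides; whether they belong to the actual 20-element basis is not stated, so treat the list as an illustrative sample. The function names are hypothetical.

```python
# Sample implications quoted in the slides (not necessarily basis members).
sample = [
    ({"warm-blooded"}, {"airbreather"}),
    ({"fourlegged"}, {"airbreather"}),
    ({"pet"}, {"warm-blooded"}),
    ({"fur"}, {"fourlegged", "warm-blooded"}),
]

def close_under(attrs, implications):
    """Smallest superset of attrs that respects every implication."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed

def follows(premise, conclusion, implications):
    """True if premise -> conclusion is derivable from implications."""
    return set(conclusion) <= close_under(premise, implications)

print(follows({"pet"}, {"airbreather"}, sample))   # True
print(follows({"airbreather"}, {"pet"}, sample))   # False
```

Here "pet implies airbreather" need not be listed separately, since it is derivable by chaining two of the listed implications.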

14
Full list, basis, and original data
15
Implication Reads Upwards
  • at top right: warm-blooded implies airbreather
  • 1st in basis; high support indicated in lime
    green

16
A Subinterval of the lattice
  • fourlegged implies airbreather
  • pet implies warm-blooded (iguana?)
  • and fur implies fourlegged and warm-blooded
    (platypus?)

17
Original data preserved
  • animals 26 and 27 share the attributes
  • lives in water, is warm-blooded and is an
    airbreather

19
Color-coded support
  • the similarity in color between livestock and
    the concept node below it yields the association
    rule
  • livestock implies fur
  • with 79% confidence
  • and support 11 (bottom)
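
Support and confidence of an association rule can be computed as in the sketch below, using the usual definitions: support counts the objects having all attributes of the rule, and confidence divides that count by the number of objects satisfying the premise. The animals listed are hypothetical and do not reproduce the figures on this slide.

```python
# Hypothetical toy context: each animal mapped to the attributes it has.
context = {
    "cow": {"livestock", "fur"},
    "goat": {"livestock", "fur"},
    "sheep": {"livestock", "fur"},
    "chicken": {"livestock"},
    "cat": {"fur"},
    "trout": set(),
}

def support(attrs):
    """Number of objects having every attribute in attrs."""
    return sum(1 for has in context.values() if attrs <= has)

def confidence(premise, conclusion):
    """Fraction of objects satisfying premise that also satisfy conclusion.

    Raises ZeroDivisionError if no object satisfies the premise.
    """
    return support(premise | conclusion) / support(premise)

print(support({"livestock", "fur"}))         # 3
print(confidence({"livestock"}, {"fur"}))    # 0.75
```

An implication in the strict sense is the special case where the confidence equals 1.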

20
Visual Vocabulary
  • Small subdiagrams (specifically,
    meet-subsemilattices) can be recognized as
    complex sentences

21
3 unordered attribute concepts
[Diagram: nodes labeled c, b, a]
Note: the top element is really irrelevant, but
adding it makes everything we'll look at a
lattice instead of just a meet-semilattice
(definition: an ordered structure closed under
finite meets (glb))
22
Here's the best-known outcome
No non-trivial implications
[Diagram: nodes labeled c, b, a]
23
W over V: a ∧ c → b
[Diagram: nodes labeled c, b, a]
24
Diamond in diamond
Under condition c, a and b are equivalent
[Diagram: nodes labeled b, c, a]
25
Convergence
Any two imply the third
[Diagram: nodes labeled b, c, a]
26
Two Complex Sentences
  • So, we can read that
  • for nocturnal animals and pets, the attributes
    fourlegged and warm-blooded are equivalent,
  • and
  • the only implication between the attributes
    nocturnal, fur and pet is
  • pet and nocturnal implies fur.

27
The Hague, Netherlands
28
Before Freese improvement
29
After Freese improvement
30
Apparent Splits
31
Eliminating Light Smokers
32
Why no object names?
33
Lung Cancer and Smoking
  • nearly half of these 30-year smokers have lung
    cancer

34
Bird-keeping and Smoking
  • Association rules involving bird-keeping and
    smoking

35
Limitations as KDD Process
  • Needs attention given to data preparation
  • Needs more built-in verification of discovered
    rules
  • No domain-specific constructions (advantage?)
  • Does not scale without clustering (universal?)

36
Epidemiological functions
  • Plan to add odds ratio calculation, via click

OR = 3.9
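
For reference, a sketch of the standard odds ratio for a 2×2 exposure/outcome table. The counts are hypothetical and are not the data behind the value shown on the slide; in a lattice setting such counts would presumably be read off as sizes of concept extents, which is an assumption here.

```python
def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls):
    """Odds of the outcome among the exposed, divided by the odds
    among the unexposed: (a * d) / (b * c) for the 2x2 table
    [[a, b], [c, d]]. Raises ZeroDivisionError if b or c is zero.
    """
    return (exposed_cases * unexposed_controls) / (
        exposed_controls * unexposed_cases
    )

# Hypothetical counts, for illustration only.
print(odds_ratio(20, 10, 10, 20))   # 4.0
```

An odds ratio above 1 indicates that the outcome is more common among the exposed group than among the unexposed group.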
37
Clustering for too large lattices
38
Support for improvement
  • Traditional diagram improvement algorithms are
    based solely upon the order structure
  • We are now moving towards the inclusion of
    support values in these algorithms
  • I will talk about this topic in detail in July,
    here at DIMACS, as part of the Applications of
    Lattice Theory workshop

END