REGNET - PowerPoint PPT Presentation

About This Presentation
Title:

REGNET

Slides: 43
Provided by: eilSta
Learn more at: http://eil.stanford.edu

Transcript and Presenter's Notes

Title: REGNET


1
REGNET
An E-Government Infrastructure for Regulation
Parsing and Relatedness Analysis
Gloria Lau, Dr. Shawn Kerrigan, Prof. Kincho Law,
Prof. Gio Wiederhold
http://eig.stanford.edu/regnet
Contact: glau@stanford.edu
http://eig.stanford.edu/glau
3
Motivation
  • Multiple sources of regulations
  • Multiple jurisdictions: federal, state, local,
    etc.
  • Different formats, terminologies, contexts
  • Amending rules, conflicting ideas
  • → Need for a repository
  • Locate relevant information
  • E.g., small business penalty fees for violations
  • → Need for analysis tool
  • Complexity of regulations
  • Multiple jurisdictions
  • Understanding of regulations and their
    relationships

4
Example 1: Related Provisions
  • ADAAG Appendix 4.6.3
  • Such a curb ramp opening must be located
    within the access aisle boundaries, not within
    the parking space boundaries.
  • CBC 1129B.4.3
  • Ramps shall not encroach into any parking
    space.
  • Exception 1. Ramps located at the front of
    accessible parking spaces may encroach into the
    length of such spaces
  • CBC allows curb ramps encroaching into accessible
    parking stall access aisles, while ADA disallows
    encroachment into any portion of the stall.

5
Example 2: Related but Conflicting Provisions
  • ADAAG 4.7.2
  • Slope. Transitions from ramps to walks,
    gutters, or streets shall be flush and free of
    abrupt changes
  • CBC 1127B.5.5
  • Beveled lip. The lower end of each curb ramp
    shall have a ½ inch (13mm) lip beveled at 45
    degrees as a detectable way-finding edge for
    persons with visual impairments.
  • ADAAG focuses on wheelchair traversal; CBC
    focuses on the visually impaired when using a
    cane.

6
Scope
  • 1. Overview
  • Examples of system capabilities
  • 2. Repository development
  • 3. Relatedness analysis

7
Overview of System Capabilities: Parsing
40CFR natural structure
Original 40CFR
8
Overview of System Capabilities: Parsing
IBC in 2-columned PDF
XML hierarchy
  <regulation id="ibc" name="international building code" type="private">
    <regElement id="ibc.1107" name="special occupancies">
      <regElement id="ibc.1107.2" name="assembly area seating">
        <reference id="ibc.1107.2.4.1" times="1" />
        <concept name="assembl area" times="1" />
        <regText>Assembly areas with fixed seating shall comply with Sections</regText>
        <regElement id="ibc.1107.2.1" name="services"> ... </regElement>
      </regElement>
    </regElement>
  </regulation>
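
A minimal sketch of how the XML hierarchy above could be traversed, assuming the markup is ordinary XML with the element and attribute names shown on the slide. The helper `walk` and the abbreviated sample document are illustrative and not part of REGNET.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the slide's schema; the nested "services" element is left empty here.
SAMPLE = """
<regulation id="ibc" name="international building code" type="private">
  <regElement id="ibc.1107" name="special occupancies">
    <regElement id="ibc.1107.2" name="assembly area seating">
      <reference id="ibc.1107.2.4.1" times="1" />
      <concept name="assembl area" times="1" />
      <regText>Assembly areas with fixed seating shall comply with Sections</regText>
      <regElement id="ibc.1107.2.1" name="services" />
    </regElement>
  </regElement>
</regulation>
"""

def walk(element, depth=0):
    """Yield (depth, id, name) for every regElement nested below `element`."""
    for child in element.findall("regElement"):  # direct children only
        yield depth, child.get("id"), child.get("name")
        yield from walk(child, depth + 1)

root = ET.fromstring(SAMPLE)
provisions = list(walk(root))
for depth, reg_id, name in provisions:
    print("  " * depth + f"{reg_id}: {name}")
```

Because each provision is its own `regElement`, the section/provision unit of extraction maps directly onto one tree node.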

9
Overview of System Capabilities: Feature Parsing
Usages of features
Extracted features
10
Overview of System Capabilities: Comparisons
Regulation comparison: 40CFR vs. 22CCR
11
Overview of System Capabilities: E-rulemaking
Drafted regulations compared with public comments
12
Scope
  • 1. Overview
  • Examples of system capabilities
  • 2. Repository development
  • 3. Relatedness analysis

13
Repository development
14
Shallow parser
  • Data source
  • Americans with Disabilities Act Accessibility
    Guidelines (ADAAG), Uniform Federal Accessibility
    Standards (UFAS), Code of Federal Regulations
    Title 40 (40CFR), UK and Scottish Disability
    Discrimination Act, etc.
  • Current standard: HTML, PDF, hardcopy...
  • Our system standard: XML
  • Unit of extraction: section/provision
      <regElement id="ufas.4.32.1" name="minimum number" asterisk="0">
        <regText> Fixed or built-in seating, ... </regText>
        <ref name="ufas.4.5" num="1" />
        <ref name="ufas.4.32" num="1" />
      </regElement>

15
Shallow parser: PDF → Basic XML format
16
Shallow parser: HTML → Basic XML format
  <regulation id="40.cfr" name="code of federal regulations" type="federal">
    ...
    <regElement id="40.cfr.279.12.c" name="Burning in particular units.">
      ...
      <regElement id="40.cfr.279.12.c.3" name="">
        <reference id="40.cfr.264.O" times="1" />
        ...
        <concept name="waste incinerator" times="1" />
        <regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>
      </regElement>
    </regElement>
  </regulation>
17
Shallow parser: extracting references
  <regulation id="40.cfr" name="code of federal regulations" type="federal">
    ...
    <regElement id="40.cfr.279.12.c" name="Burning in particular units">
      ...
      <regElement id="40.cfr.279.12.c.3" name="">
        <reference id="40.cfr.264.O" times="1" />
        ...
        <concept name="waste incinerator" times="1" />
        <regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>
      </regElement>
    </regElement>
  </regulation>
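
A rough sketch of reference extraction for the one phrasing shown above ("subpart O of parts 264 or 265"). The regular expression, the function name `extract_references` and the `40.cfr.<part>.<subpart>` id scheme are assumptions inferred from the slide's example. The actual parser's rules are not given in the slides.

```python
import re

# Hypothetical pattern covering only "subpart X of part(s) N [, or, and N ...]".
SUBPART_RE = re.compile(
    r"subpart\s+([A-Z]+)\s+of\s+parts?\s+(\d+(?:\s*(?:,|or|and)\s*\d+)*)"
)

def extract_references(text, prefix="40.cfr"):
    """Return a {reference id: occurrence count} map for subpart references."""
    counts = {}
    for subpart, parts in SUBPART_RE.findall(text):
        for part in re.findall(r"\d+", parts):
            ref_id = f"{prefix}.{part}.{subpart}"
            counts[ref_id] = counts.get(ref_id, 0) + 1
    return counts

text = ("Hazardous waste incinerators subject to regulation under "
        "subpart O of parts 264 or 265 of this chapter.")
refs = extract_references(text)
for ref_id, times in refs.items():
    # Emit the reference in the slide's tag form.
    print(f'<reference id="{ref_id}" times="{times}" />')
```

The count per id corresponds to the `times` attribute in the slide's XML.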

18
Shallow parser: feature extraction
  • Non-structural characteristics specific to a
    corpus
  • To aid user retrieval of relevant materials
  • For analysis purpose

19
Shallow parser: feature extraction
  • Generic features
      • Concepts - noun phrases
      • Exceptions - negated provisions
      • Definitions - terminologies defined in
        regulations
  • Domain-specific features
      • Glossary terms - definitions from reference
        guides
      • Author-prescribed indices - concepts from field
        handbooks
      • Measurements - e.g., 2 inches max, 4 ppm
      • Chemicals - list of drinking water contaminants
        from EPA
      • Effective dates - provision updates
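
As an illustration of one domain-specific feature, a minimal sketch of measurement extraction follows. The pattern, the unit table and the function name `extract_measurements` are assumptions for illustration. Only the unit / magnitude / quantifier triple mirrors the measurement tag used in this presentation.

```python
import re

# Hypothetical unit table and pattern; a real corpus would need many more forms.
UNITS = {"in": "inch", "inch": "inch", "inches": "inch",
         "mm": "mm", "ft": "ft", "ppm": "ppm"}
QUANTIFIERS = {"at least": "min", "at most": "max", "min": "min", "max": "max"}

MEASURE_RE = re.compile(
    r"(?:(at least|at most)\s+)?"      # optional leading quantifier
    r"(\d+(?:\.\d+)?)\s*"              # magnitude
    r"(inches|inch|in|mm|ft|ppm)\b"    # unit
    r"(?:\s+(min|max)\b)?",            # optional trailing quantifier
    re.IGNORECASE,
)

def extract_measurements(text):
    """Return a list of {unit, magnitude, quantifier} dicts found in `text`."""
    found = []
    for lead, magnitude, unit, trail in MEASURE_RE.findall(text):
        # Unmatched groups come back as empty strings, which map to None.
        quantifier = QUANTIFIERS.get((lead or trail).lower())
        found.append({"unit": UNITS[unit.lower()],
                      "magnitude": float(magnitude),
                      "quantifier": quantifier})
    return found

print(extract_measurements("2 inches max"))
print(extract_measurements("4 ppm"))
```

A pattern this loose will also match phrases such as "2 in the", so a production parser would need context checks that are out of scope here.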

20
Example of definition/glossary tags
  • Original section 3.5 from the ADAAG
      3.5 DEFINITIONS.
      Accessible. Describes a site, building,
      facility, or portion thereof...
      Clear. Unobstructed.
  • Refined section 3.5 in XML format
      <regElement name="adaag.3.5" title="definitions" asterisk="0">
        <indexTerm name="facility" num="1" />
        <definition>
          <term> accessible </term>
          <definedAs> Describes a site, building, facility, or portion thereof... </definedAs>
        </definition>
        <definition>
          <term> clear </term>
          <definedAs> Unobstructed. </definedAs>
        </definition>
      </regElement>

21
Example of indexTerm, concept, measurement and
exception tags
  • Original section 4.6.3 from the UFAS
      4.6.3 PARKING SPACES. Parking spaces for
      disabled people shall be at least 96 in (2440 mm)
      wide and shall have an adjacent access aisle 60
      in (1525 mm) wide minimum (see Fig. 9). Parking
      access aisles shall be part of ...
      EXCEPTION: ... an adjacent access aisle at least
      96 in (2440 mm) wide complying with 4.5...
  • Refined section 4.6.3 in XML format
      <regElement name="ufas.4.6.3" title="parking spaces" asterisk="1">
        <concept name="access aisle" num="3" />
        <indexTerm name="accessible circulation route" num="1" />
        <measurement unit="inch" magnitude="96" quantifier="min" />
        <ref name="ufas.4.5" num="1" />
        <regText> Parking spaces for disabled people shall ... </regText>
        <exception> If accessible parking spaces for ... </exception>
      </regElement>

22
Usages of extracted features revisited
Usages of features
Extracted features
23
Scope
  • 1. Overview
  • Examples of system capabilities
  • 2. Repository development
  • 3. Relatedness analysis

24
Relatedness analysis
25
Relatedness analysis
  • To utilize the structure, referencing of
    regulations and domain knowledge to obtain a
    better comparison
  • Measure
  • Similarity score $f(A, U) \in (0, 1)$
  • Nodes A and U are provisions from two different
    regulation trees

26
Base score $f_0$ computation
  • Linear combination of feature matching
  • $F(A, U, i)$: similarity score between Sections
    (A, U) based on feature $i$
  • $N$: total number of features
  • Feature matching
  • Based on the Vector model using cosine similarity
    as the distance between feature vectors
  • Similarity between two documents M and N
  • $d_M$ and $d_N$ are document vectors
  • Cosine is normalized ⇒ always between 0 and 1
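
A minimal sketch of the base score as described above: a cosine per feature, then a linear combination over the features. The equal weights of 1/N, the sparse dictionary representation and the names `cosine` and `base_score` are assumptions. The slides do not state the weights actually used.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {axis: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: an empty vector matches nothing
    return dot / (norm_u * norm_v)

def base_score(features_a, features_u, weights=None):
    """Linear combination of the per-feature cosines F(A, U, i).

    Each argument maps a feature name (e.g. "concept") to that provision's
    sparse vector. Equal weights 1/N are an assumption, not the slides' choice.
    """
    names = sorted(set(features_a) | set(features_u))
    if not names:
        return 0.0
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    return sum(
        weights[name] * cosine(features_a.get(name, {}), features_u.get(name, {}))
        for name in names
    )

# Two hypothetical provisions with concept and measurement feature vectors.
a = {"concept": {"access aisle": 3, "parking space": 2},
     "measurement": {"96 inch min": 1}}
u = {"concept": {"access aisle": 1, "curb ramp": 1},
     "measurement": {"96 inch min": 1}}
score = base_score(a, u)
print(round(score, 4))
```

With non-negative weights that sum to 1 and non-negative vector entries, the combined score stays between 0 and 1, matching the normalization noted above.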

27
Example of feature vectors
  • Traditional term match
  • Each index term $i$ is assigned a positive and
    non-binary weight $w_{i,M}$ in each document vector
    $d_M$
  • Weight selection
  • Frequency of term, or
  • tf × idf model
  • tf: term frequency (term density)
  • idf: inverse document frequency, $\log(n/n_i)$
    (term rarity)
  • Excluding stopwords
  • Feature: concept
  • Concept vectors are formed per provision based on
    concept frequency in each provision
  • F(provision 1, provision 2, feature = concept)
  • = cosine between two concept vectors
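
A small sketch of the tf × idf weighting described above, with idf = log(n / n_i). Whitespace tokenization, raw counts as tf, the tiny stopword list and the function name `tf_idf_vectors` are simplifying assumptions.

```python
import math
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "shall", "be", "and", "or", "to", "in",
             "not", "any", "into"}

def tf_idf_vectors(documents):
    """Return one {term: tf * idf} dict per document, with idf = log(n / n_i)."""
    tokenized = [[t for t in doc.lower().split() if t not in STOPWORDS]
                 for doc in documents]
    n = len(tokenized)
    # n_i: number of documents that contain term i at least once
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: count * math.log(n / doc_freq[term])
                        for term, count in tf.items()})
    return vectors

docs = ["ramps shall not encroach into any parking space",
        "curb ramp opening located within access aisle boundaries",
        "parking space access aisle"]
vecs = tf_idf_vectors(docs)
print(vecs[0])
```

A term that occurs in every document gets weight log(1) = 0, which is how the idf factor discounts common terms.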

28
Axis dependency: non-Boolean matching
  • Vector model assumes mutual independence between
    axes
  • Domain experts do not necessarily agree
  • A measurement of 2 inches max can be a 70%
    match to 2 inches
  • Synonyms exist, e.g., ontology defined for
    chemicals
  • Limitation observed
  • Need flexibility to model domain knowledge, such
    as a 0%, 50%, 75% and 100% measurement match

29
Proposed non-Boolean matching model
  • Define a feature matching matrix E
  • $E_{ij}$: match between features $i$ and $j$
  • E.g., a 3-dimensional vector space using 2 ppm,
    2 ppm max and 2 ft as the first, second and
    third measurement axes
  • $E$ = [matrix for this example]
  • Vector space transformation
  • Map feature vectors onto an alternate space via
    matrix D
  • Cosines are computed on the consolidated
    frequency vectors
  • E.g., similarity based on measurements

30
Vector space transformation
  • Define $D$ such that $E = D^T D$ is fulfilled
  • Cosine between the consolidated frequency
    vectors
  • Reduces to a Boolean cosine when $E = I$
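
A minimal sketch of the transformed cosine. Since (Dx)·(Dy) = xᵀDᵀDy = xᵀEy, the cosine between the mapped vectors can be computed from E alone, without forming D. The entries of E below (a 0.7 match between "2 ppm" and "2 ppm max", no match with "2 ft") are hypothetical values chosen for illustration, and the function name is an assumption.

```python
import math

def matched_cosine(x, y, E):
    """Cosine between D x and D y for any D with E = D^T D, computed from E."""
    def bilinear(p, q):
        return sum(p[i] * E[i][j] * q[j]
                   for i in range(len(p)) for j in range(len(q)))
    denom = math.sqrt(bilinear(x, x) * bilinear(y, y))
    return bilinear(x, y) / denom if denom else 0.0

# Axes: "2 ppm", "2 ppm max", "2 ft". Match values are hypothetical.
E = [[1.0, 0.7, 0.0],
     [0.7, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

a = [1, 0, 0]  # provision mentioning "2 ppm" once
u = [0, 1, 0]  # provision mentioning "2 ppm max" once

print(matched_cosine(a, u, E))         # partial match through E
print(matched_cosine(a, u, IDENTITY))  # Boolean case: no shared axis
```

E must be symmetric positive semidefinite for a real D to exist. The example matrix satisfies this.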

31
Score refinements based on regulation structure
  • Neighbor inclusion
  • Diffusion of similarity between clusters of nodes
    in the tree
  • Self vs. parent-sibling-child (psc), $f_{\text{s-psc}}$
  • psc vs. psc, $f_{\text{psc-psc}}$

32
Neighbor inclusion: psc vs. psc
  • Take a linear combination of neighboring pair
    scores
  • Formulate a neighbor structure matrix $N$
  • Define score matrix $\Pi$
  • We have $\Pi_{\text{psc-psc}} = N_A \Pi_0 N_U^T$
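
A sketch of the psc vs. psc product on a toy example, writing the base score matrix as `pi_0` (rows are nodes of tree A, columns are nodes of tree U). The form of the neighbor structure matrix is an assumption: each row spreads equal weight over that node's parent, siblings and children. The slides do not give its exact entries.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def neighbor_matrix(parents):
    """Row-normalized psc matrix (assumed form).

    `parents[i]` is the parent index of node i, or None for the root.
    """
    n = len(parents)
    N = [[0.0] * n for _ in range(n)]
    for i in range(n):
        psc = [j for j in range(n) if j != i and (
            parents[i] == j                                         # parent
            or parents[j] == i                                      # child
            or (parents[i] is not None and parents[i] == parents[j])  # sibling
        )]
        for j in psc:
            N[i][j] = 1.0 / len(psc)
    return N

N_A = neighbor_matrix([None, 0, 0])  # tree A: root with two children
N_U = neighbor_matrix([None, 0])     # tree U: root with one child
pi_0 = [[0.2, 0.1],                  # hypothetical base scores
        [0.9, 0.3],
        [0.4, 0.8]]

pi_psc_psc = matmul(matmul(N_A, pi_0), transpose(N_U))
print(pi_psc_psc)
```

Entry (i, j) of the result averages the base scores between the neighbors of A-node i and the neighbors of U-node j.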

33
Neighbor inclusion: self vs. psc
  • Take a linear combination of neighbor vs. self
    scores
  • Formulate a neighbor structure matrix $N$
  • Define score matrix $\Pi$
  • We have $\Pi_{\text{s-psc}} = \tfrac{1}{2} (\Pi_0 N_U^T + N_A \Pi_0)$

34
Score refinements based on regulation structure
  • Reference distribution
  • Diffusion of similarity between referencing nodes
    and referenced nodes in the tree
  • E.g., f(A5.3, U6.4(a)) updates f(A2.1, U3.3)

35
Reference distribution: s-ref and ref-ref
  • Take a linear combination of reference vs. self
    and reference vs. reference scores
  • Formulate a reference structure matrix $R$
  • Define score matrix $\Pi$
  • We have $\Pi_{\text{ref-ref}} = R_A \Pi_0 R_U^T$ and
    $\Pi_{\text{s-ref}} = \tfrac{1}{2} (\Pi_0 R_U^T + R_A \Pi_0)$
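
A toy sketch of the self vs. reference score, read here as the average of the two products (base scores times the transposed U reference matrix, and the A reference matrix times the base scores). The form of the reference structure matrix is an assumption: row i spreads equal weight over the nodes that node i references. The outgoing direction, the helper names and the scores are all illustrative.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def reference_matrix(refs, n):
    """Assumed form: R[i][j] = 1 / (number of references made by node i)."""
    R = [[0.0] * n for _ in range(n)]
    for i, targets in refs.items():
        for j in targets:
            R[i][j] = 1.0 / len(targets)
    return R

R_A = reference_matrix({0: [2]}, 3)  # A-node 0 references A-node 2
R_U = reference_matrix({1: [0]}, 2)  # U-node 1 references U-node 0
pi_0 = [[0.2, 0.1],                  # hypothetical base scores (A rows, U columns)
        [0.9, 0.3],
        [0.4, 0.8]]

left = matmul(pi_0, transpose(R_U))  # A-node vs. what each U-node references
right = matmul(R_A, pi_0)            # what each A-node references vs. U-node
pi_s_ref = [[0.5 * (l + r) for l, r in zip(l_row, r_row)]
            for l_row, r_row in zip(left, right)]
print(pi_s_ref)
```

Here A-node 0 picks up half of the score between its referenced node (A-node 2) and each U-node, which is the kind of diffusion the reference distribution step describes.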

36
Example of results: UFAS vs BS8300
  • Phrasing difference between American and British
    regulations
  • ufas.4.13.9 Door Hardware. Handles, pulls,
    latches, locks, and other operating devices on
    accessible doors shall have a shape that is easy
  • bs8300.12.5.4.2 Door Furniture. Door handles on
    hinged and sliding doors in accessible bedrooms
    should be easy to grip
  • Neighbor similarities imply similarity between
    the nodes of interest

37
Example of results: almost identical provisions
Regulation comparison: 40CFR vs. CCR
38
Example of results: e-rulemaking
  • Application domain: e-rulemaking
  • Comparison between draft of rules and the
    associated public comments
  • ADAAG Chapter 11, rights-of-way draft
  • Less than 15 pages
  • Over 1400 public comments received within 4
    months
  • Comments: 10MB in size; most are several pages
    long
  • → New regulation draft can easily generate a huge
    amount of data that needs to be reviewed and
    analyzed

39
Example of results: e-rulemaking
Regulations compared with public comments
40
Example of results: e-rulemaking
  • Related draft section and public comment
  • Adaag.1105.4.1
  • Where signal timing is inadequate for full
    crossing of all traffic lanes or where the
    crossing is not signalized, cut-through medians
  • Deborah Wood, October 29, 2002
  • This often means walk lights that are so short
    in duration that by the time a person who is
    blind realizes
  • No identified related section
  • Donna Ring, September 6, 2002
  • If you become blind, no amount of electronics
    will make you safe. You have to learn modern
    blindness skills from a good teacher. You have
    to practice your new skills
  • → Concern not addressed in the draft

41
Conclusions
  • An infrastructure for
  • Repository for regulations
  • Shallow parser
  • Feature extractions
  • Similarity comparison
  • Base score
  • Score refinements
  • Results
  • Comparisons between Federal codes, European codes
  • Application to e-rulemaking
  • Future Directions
  • Extension of application to other domains of
    semi-structured documents
  • Conflict analysis?

42
Thank You!