Title: REGNET
1REGNET
An E-Government Infrastructure for Regulation Parsing and Relatedness Analysis
Gloria Lau, Dr. Shawn Kerrigan, Prof. Kincho Law, Prof. Gio Wiederhold
http://eig.stanford.edu/regnet
Contact: glau_at_stanford.edu, http://eig.stanford.edu/glau
2Motivation
- Multiple sources of regulations
- Multiple jurisdictions: federal, state, local, etc.
- Different formats, terminologies, contexts
- Amending rules, conflicting ideas
3Motivation
- → Need for a repository
- Locate relevant information
- E.g., small business penalty fees for violations
- → Need for an analysis tool
- Complexity of regulations
- Multiple jurisdictions
- Understanding of regulations and their relationships
4Example 1: Related Provisions
- ADAAG Appendix 4.6.3
- Such a curb ramp opening must be located within the access aisle boundaries, not within the parking space boundaries.
- CBC 1129B.4.3
- Ramps shall not encroach into any parking space.
- Exception 1. Ramps located at the front of accessible parking spaces may encroach into the length of such spaces
- CBC allows curb ramps encroaching into accessible parking stall access aisles, while ADA disallows encroachment into any portion of the stall.
5Example 2: Related but Conflicting Provisions
- ADAAG 4.7.2
- Slope. Transitions from ramps to walks, gutters, or streets shall be flush and free of abrupt changes
- CBC 1127B.5.5
- Beveled lip. The lower end of each curb ramp shall have a ½ inch (13mm) lip beveled at 45 degrees as a detectable way-finding edge for persons with visual impairments.
- ADAAG focuses on wheelchair traversal; CBC focuses on the visually impaired when using a cane.
6Scope
- 1. Overview
- Examples of system capabilities
- 2. Repository development
- 3. Relatedness analysis
7Overview of System Capabilities: Parsing
[Figure: original 40CFR and its natural structure]
8Overview of System Capabilities: Parsing
IBC in 2-columned PDF
XML hierarchy
<regulation id="ibc" name="international building code" type="private">
  <regElement id="ibc.1107" name="special occupancies">
    <regElement id="ibc.1107.2" name="assembly area seating">
      <reference id="ibc.1107.2.4.1" times="1" />
      <concept name="assembl area" times="1" />
      <regText>Assembly areas with fixed seating shall comply with Sections </regText>
      <regElement id="ibc.1107.2.1" name="services"> ... </regElement>
    </regElement>
  </regElement>
</regulation>
9Overview of System Capabilities: Feature Parsing
[Figure: extracted features and usages of features]
10Overview of System Capabilities: Comparisons
Regulation comparison: 40CFR vs. 22CCR
11Overview of System Capabilities: E-rulemaking
Drafted regulations compared with public comments
13Repository development
14Shallow parser
- Data Source
- Americans with Disabilities Act Accessibility Guidelines (ADAAG), Uniform Federal Accessibility Standards (UFAS), Code of Federal Regulations Title 40 (40CFR), UK and Scottish Disability Discrimination Act, etc.
- Current standard: HTML, PDF, hardcopy...
- Our system standard: XML
- Unit of extraction: section/provision
<regElement id="ufas.4.32.1" name="minimum number" asterisk="0">
  <regText> Fixed or built-in seating, ... </regText>
  <ref name="ufas.4.5" num="1" />
  <ref name="ufas.4.32" num="1" />
</regElement>
15Shallow parser: PDF → Basic XML format
16Shallow parser: HTML → Basic XML format
<regulation id="40.cfr" name="code of federal regulations" type="federal">
  ...
  <regElement id="40.cfr.279.12.c" name="Burning in particular units.">
    ...
    <regElement id="40.cfr.279.12.c.3" name="">
      <reference id="40.cfr.264.O" times="1" />
      ...
      <concept name="waste incinerator" times="1" />
      <regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>
    </regElement>
  </regElement>
</regulation>
17Shallow parser: extracting references
<regulation id="40.cfr" name="code of federal regulations" type="federal">
  ...
  <regElement id="40.cfr.279.12.c" name="Burning in particular units">
    ...
    <regElement id="40.cfr.279.12.c.3" name="">
      <reference id="40.cfr.264.O" times="1" />
      ...
      <concept name="waste incinerator" times="1" />
      <regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>
    </regElement>
  </regElement>
</regulation>
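To illustrate how the `reference` and `concept` tags in the XML repository might be consumed downstream, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `collect_features` is hypothetical, and the sample omits the parts elided with "..." on the slide; it is not the REGNET implementation.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the 40 CFR example; elided "..." parts are left out.
SAMPLE = """
<regulation id="40.cfr" name="code of federal regulations" type="federal">
  <regElement id="40.cfr.279.12.c" name="Burning in particular units.">
    <regElement id="40.cfr.279.12.c.3" name="">
      <reference id="40.cfr.264.O" times="1" />
      <concept name="waste incinerator" times="1" />
      <regText>Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter.</regText>
    </regElement>
  </regElement>
</regulation>
"""


def collect_features(xml_text):
    """Map each regElement id to the references and concepts tagged directly under it."""
    root = ET.fromstring(xml_text)
    features = {}
    for element in root.iter("regElement"):
        features[element.get("id")] = {
            # findall() only looks at direct children, so nested provisions
            # keep their own references and concepts.
            "references": {r.get("id"): int(r.get("times")) for r in element.findall("reference")},
            "concepts": {c.get("name"): int(c.get("times")) for c in element.findall("concept")},
        }
    return features


features = collect_features(SAMPLE)
for reg_id, found in features.items():
    print(reg_id, found)
```

Keeping the features per provision, rather than per regulation, matches the unit of extraction (section/provision) stated earlier.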
18Shallow parser: feature extraction
- Non-structural characteristics specific to a corpus
- To aid user retrieval of relevant materials
- For analysis purpose
19Shallow parser: feature extraction
- Generic features
- Concepts - noun phrases
- Exceptions - negated provisions
- Definitions - terminologies defined in regulations
- Domain-specific features
- Glossary terms - definitions from reference guides
- Author-prescribed indices - concepts from field handbooks
- Measurements - e.g., 2 inches max, 4 ppm
- Chemicals - list of drinking water contaminants from EPA
- Effective dates - provision updates
20Example of definition/glossary tags
- Original section 3.5 from the ADAAG
- 3.5 DEFINITIONS.
- Accessible. Describes a site, building, facility, or portion thereof
- Clear. Unobstructed.
- Refined section 3.5 in XML format
<regElement name="adaag.3.5" title="definitions" asterisk="0">
  <indexTerm name="facility" num="1" />
  <definition>
    <term> accessible </term>
    <definedAs> Describes a site, building, facility, or portion thereof... </definedAs>
  </definition>
  <definition>
    <term> clear </term>
    <definedAs> Unobstructed. </definedAs>
  </definition>
</regElement>
21Example of indexTerm, concept, measurement and exception tags
- Original section 4.6.3 from the UFAS
- 4.6.3 PARKING SPACES. Parking spaces for disabled people shall be at least 96 in (2440 mm) wide and shall have an adjacent access aisle 60 in (1525 mm) wide minimum (see Fig. 9). Parking access aisles shall be part of ...
- EXCEPTION: an adjacent access aisle at least 96 in (2440 mm) wide complying with 4.5...
- Refined section 4.6.3 in XML format
<regElement name="ufas.4.6.3" title="parking spaces" asterisk="1">
  <concept name="access aisle" num="3" />
  <indexTerm name="accessible circulation route" num="1" />
  <measurement unit="inch" magnitude="96" quantifier="min" />
  <ref name="ufas.4.5" num="1" />
  <regText> Parking spaces for disabled people shall ... </regText>
  <exception> If accessible parking spaces for ... </exception>
</regElement>
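As a rough illustration of how a `measurement` tag with `unit`, `magnitude` and `quantifier` attributes could be derived from provision text, the sketch below applies a regular expression to the UFAS 4.6.3 sentence. The pattern, the function `extract_measurements` and the quantifier rules are assumptions made for this example (inch measurements only); the actual parser rules are not given on the slides.

```python
import re

# Hypothetical pattern: a number followed by an inch unit, optionally preceded by
# "at least"/"at most" or followed by "minimum"/"maximum" (or "min"/"max").
MEASUREMENT = re.compile(
    r"(?P<pre>at least|at most)?\s*"
    r"(?P<magnitude>\d+(?:\.\d+)?)\s*(?:inches|inch|in)\b"
    r"(?:\s*\([^)]*\))?"   # optional metric equivalent, e.g. "(2440 mm)"
    r"(?:\s+wide)?"
    r"(?:\s*(?P<post>minimum|maximum|min|max)\b)?",
    re.IGNORECASE,
)


def quantifier(pre, post):
    """Normalize the surrounding wording to 'min', 'max' or None."""
    hint = (pre or post or "").lower()
    if hint in ("at least", "minimum", "min"):
        return "min"
    if hint in ("at most", "maximum", "max"):
        return "max"
    return None


def extract_measurements(text):
    """Return one dict per inch measurement found in the text."""
    return [
        {
            "unit": "inch",
            "magnitude": m.group("magnitude"),
            "quantifier": quantifier(m.group("pre"), m.group("post")),
        }
        for m in MEASUREMENT.finditer(text)
    ]


TEXT = (
    "Parking spaces for disabled people shall be at least 96 in (2440 mm) wide "
    "and shall have an adjacent access aisle 60 in (1525 mm) wide minimum (see Fig. 9)."
)
measurements = extract_measurements(TEXT)
for item in measurements:
    print(item)
```

A production parser would need more units and phrasings; the point here is only that the three attributes can be read off the sentence.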
22Usages of extracted features revisited
[Figure: extracted features and usages of features]
24Relatedness analysis
25Relatedness analysis
- To utilize the structure, referencing of regulations and domain knowledge to obtain a better comparison
- Measure
- Similarity score $f(A, U) \in (0, 1)$
- Nodes A and U are provisions from two different regulation trees
26Base score f0 computation
- Linear combination of feature matching
- F(A,U,i) = similarity score between Sections (A,U) based on feature i
- N = total number of features
- Feature matching
- Based on the vector model, using cosine similarity as the distance between feature vectors
- Similarity between two documents M and N
- $d_M$ and $d_N$ are document vectors
- Cosine is normalized => always between 0 and 1
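The base score described above can be sketched in a few lines. The code assumes the linear combination has the form f0(A, U) = sum over i of w_i * F(A, U, i), where the weights w_i are an assumption (the slide text does not preserve them), and F is the standard cosine between two frequency vectors. The feature names, frequencies and weights in the example are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_u * norm_v)


def base_score(features_a, features_u, weights):
    """Weighted sum of per-feature cosine scores F(A, U, i)."""
    return sum(
        weight * cosine(features_a.get(name, {}), features_u.get(name, {}))
        for name, weight in weights.items()
    )


# Hypothetical provisions A and U with two features each.
provision_a = {
    "concept": {"access aisle": 3, "parking space": 1},
    "measurement": {"96 inch min": 1},
}
provision_u = {
    "concept": {"access aisle": 1, "curb ramp": 2},
    "measurement": {"96 inch min": 1},
}
weights = {"concept": 0.5, "measurement": 0.5}  # assumed equal weights

f0 = base_score(provision_a, provision_u, weights)
print(round(f0, 4))
```

With non-negative frequencies and weights that sum to 1, the resulting score stays between 0 and 1, in line with the normalization noted above.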
27Example of feature vectors
- Traditional term match
- Each index term i is assigned a positive and non-binary weight $w_{i,M}$ in each document vector $d_M$
- Weight selection
- Frequency of term, or
- tf × idf model
- tf = term frequency (term density)
- idf = inverse document frequency = log(n/n_i) (term rarity)
- Excluding stopwords
- Feature: concept
- Concept vectors are formed per provision based on concept frequency in each provision
- F(provision 1, provision 2, feature = concept) = cosine between two concept vectors
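The tf × idf weighting and the concept-based score F can be sketched as follows, assuming n is the number of provisions and n_i the number of provisions containing concept i, with tf taken as the raw concept count. The provision ids and concept lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(provisions):
    """provisions: id -> list of concepts. Returns id -> {concept: tf * log(n / n_i)}."""
    n = len(provisions)
    doc_freq = Counter()
    for concepts in provisions.values():
        doc_freq.update(set(concepts))  # count each concept once per provision
    vectors = {}
    for pid, concepts in provisions.items():
        tf = Counter(concepts)
        vectors[pid] = {c: count * math.log(n / doc_freq[c]) for c, count in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical provisions with their extracted concepts.
provisions = {
    "p1": ["access aisle", "access aisle", "parking space"],
    "p2": ["access aisle", "curb ramp"],
    "p3": ["door hardware"],
}
vectors = tfidf_vectors(provisions)
score_12 = cosine(vectors["p1"], vectors["p2"])  # share "access aisle"
score_13 = cosine(vectors["p1"], vectors["p3"])  # no shared concept
print(score_12, score_13)
```

Note that a concept appearing in every provision gets idf = log(1) = 0 and therefore drops out of the comparison, which is the "term rarity" effect.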
28Axis dependency: non-Boolean matching
- Vector model assumes mutual independence between axes
- Domain experts do not necessarily agree
- A measurement of "2 inches max" can be a 70% match to "2 inches"
- Synonyms exist, e.g., ontology defined for chemicals
- Limitation observed
- Need flexibility to model domain knowledge, such as a 0%, 50%, 75% and 100% measurement match
29Proposed non-Boolean matching model
- Define a feature matching matrix $E$
- $E_{ij}$ = match between features i and j
- E.g., a 3-dimensional vector space using "2 ppm", "2 ppm max" and "2 ft" as the first, second and third measurement axes
- $E$ = [3 × 3 matching matrix over these three axes]
- Vector space transformation
- Map feature vectors onto an alternate space via matrix $D$
- Cosines are computed on the consolidated frequency vectors
- E.g., similarity based on measurements
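30Vector space transformation
- Define $D$ such that $E = D^T D$ is fulfilled
- Cosine between the consolidated frequency vectors $D d_M$ and $D d_N$:
$$\cos(D d_M, D d_N) = \frac{(D d_M)^T (D d_N)}{\|D d_M\| \, \|D d_N\|} = \frac{d_M^T E \, d_N}{\sqrt{d_M^T E \, d_M} \, \sqrt{d_N^T E \, d_N}}$$
- Reduces to a Boolean cosine when $E = I$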
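A small numerical sketch of the non-Boolean cosine follows. Because E = D^T D, the cosine of the transformed vectors can be computed from E alone, so D never has to be formed. The entries of E are assumed for illustration: the 0.7 between "2 ppm" and "2 ppm max" borrows the 70% figure quoted for the inches example, and "2 ft" is taken to match neither.

```python
import math

AXES = ["2 ppm", "2 ppm max", "2 ft"]

# Assumed feature matching matrix E (symmetric, ones on the diagonal).
E = [
    [1.0, 0.7, 0.0],
    [0.7, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def bilinear(x, matrix, y):
    """Compute x^T * matrix * y for plain Python lists."""
    return sum(x[i] * matrix[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def matched_cosine(d_m, d_n, matrix):
    """Cosine of D*d_m and D*d_n, written in terms of E = D^T D."""
    denominator = math.sqrt(bilinear(d_m, matrix, d_m) * bilinear(d_n, matrix, d_n))
    return bilinear(d_m, matrix, d_n) / denominator if denominator else 0.0


d_m = [1, 0, 0]  # provision mentioning "2 ppm" once
d_n = [0, 1, 0]  # provision mentioning "2 ppm max" once

partial = matched_cosine(d_m, d_n, E)         # partial credit via E
boolean = matched_cosine(d_m, d_n, IDENTITY)  # plain cosine, axes independent
print(partial, boolean)
```

With the identity matrix the two provisions do not match at all, whereas the assumed E gives them the partial match a domain expert might expect.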
31Score refinements based on regulation structure
- Neighbor inclusion
- Diffusion of similarity between clusters of nodes in the tree
- Self vs. parent-sibling-child (psc), $f_{s-psc}$
- psc vs. psc, $f_{psc-psc}$
32Neighbor inclusion: psc vs. psc
- Take a linear combination of neighboring pair scores
- Formulate a neighbor structure matrix $N$
- Define score matrix $\Pi$
- We have $\Pi_{psc-psc} = N_A \Pi_0 N_U^T$
33Neighbor inclusion: self vs. psc
- Take a linear combination of neighbor vs. self scores
- Formulate a neighbor structure matrix $N$
- Define score matrix $\Pi$
- We have $\Pi_{s-psc} = \frac{1}{2} (\Pi_0 N_U^T + N_A \Pi_0)$
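The two neighbor inclusion products can be tried on a toy example. This sketch assumes that each row of the neighbor structure matrix N spreads equal weight over a node's parent, siblings and children; the slides do not specify the weights, and the node names and base scores below are invented.

```python
def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def neighbor_matrix(nodes, parent):
    """Row i gives equal weight to the parent, siblings and children of node i (assumed)."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for node in nodes:
        psc = set()
        p = parent.get(node)
        if p is not None:
            psc.add(p)
            psc.update(s for s in nodes if parent.get(s) == p and s != node)
        psc.update(c for c in nodes if parent.get(c) == node)
        for other in psc:
            matrix[index[node]][index[other]] = 1.0 / len(psc)
    return matrix


# Hypothetical regulation trees A and U.
nodes_a, parent_a = ["A1", "A1.1", "A1.2"], {"A1.1": "A1", "A1.2": "A1"}
nodes_u, parent_u = ["U1", "U1.1"], {"U1.1": "U1"}
n_a = neighbor_matrix(nodes_a, parent_a)
n_u = neighbor_matrix(nodes_u, parent_u)

# Assumed base scores: rows are nodes of A, columns are nodes of U.
pi_0 = [[0.2, 0.1],
        [0.1, 0.9],
        [0.0, 0.3]]

pi_psc_psc = matmul(matmul(n_a, pi_0), transpose(n_u))
left = matmul(pi_0, transpose(n_u))
right = matmul(n_a, pi_0)
pi_s_psc = [[0.5 * (l + r) for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]

print(pi_psc_psc[0][0], pi_s_psc[0][1])
```

In this toy case A1 and U1 have a low base score (0.2), but their psc-psc score is higher because their children A1.1 and U1.1 are strongly related (0.9), which is the diffusion effect the slide describes.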
34Score refinements based on regulation structure
- Reference distribution
- Diffusion of similarity between referencing nodes and referenced nodes in the tree
- E.g., f(A5.3, U6.4(a)) updates f(A2.1, U3.3)
35Reference distribution: s-ref and ref-ref
- Take a linear combination of reference vs. self and reference vs. reference scores
- Formulate a reference structure matrix $R$
- Define score matrix $\Pi$
- We have $\Pi_{ref-ref} = R_A \Pi_0 R_U^T$ and $\Pi_{s-ref} = \frac{1}{2} (\Pi_0 R_U^T + R_A \Pi_0)$
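To make the reference distribution concrete, the sketch below computes single entries of the ref-ref and s-ref products with sparse dictionaries, using the earlier example in which f(A5.3, U6.4(a)) updates f(A2.1, U3.3). It assumes that A2.1 references A5.3 and U3.3 references U6.4(a), that each row of R is weighted by reference counts normalized to sum to 1, and that the base scores are as listed; all of these are illustrative assumptions.

```python
def reference_weights(references):
    """references: node -> {referenced node: times}. Rows are normalized (assumed)."""
    weights = {}
    for node, refs in references.items():
        total = sum(refs.values())
        weights[node] = {target: times / total for target, times in refs.items()} if total else {}
    return weights


def ref_ref_score(a, u, base, r_a, r_u):
    """Entry (a, u) of R_A * Pi_0 * R_U^T."""
    return sum(
        w_a * base.get((ref_a, ref_u), 0.0) * w_u
        for ref_a, w_a in r_a.get(a, {}).items()
        for ref_u, w_u in r_u.get(u, {}).items()
    )


def s_ref_score(a, u, base, r_a, r_u):
    """Entry (a, u) of 1/2 * (Pi_0 * R_U^T + R_A * Pi_0)."""
    self_vs_ref = sum(w_u * base.get((a, ref_u), 0.0) for ref_u, w_u in r_u.get(u, {}).items())
    ref_vs_self = sum(w_a * base.get((ref_a, u), 0.0) for ref_a, w_a in r_a.get(a, {}).items())
    return 0.5 * (self_vs_ref + ref_vs_self)


# Assumed reference structure and base scores.
r_a = reference_weights({"A2.1": {"A5.3": 1}})
r_u = reference_weights({"U3.3": {"U6.4(a)": 1}})
base = {
    ("A5.3", "U6.4(a)"): 0.8,
    ("A2.1", "U3.3"): 0.1,
    ("A2.1", "U6.4(a)"): 0.2,
    ("A5.3", "U3.3"): 0.4,
}

ref_ref = ref_ref_score("A2.1", "U3.3", base, r_a, r_u)
s_ref = s_ref_score("A2.1", "U3.3", base, r_a, r_u)
print(ref_ref, s_ref)
```

Here the pair (A2.1, U3.3) inherits the high score of the provisions it references, even though its own base score is low.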
36Example of results: UFAS vs BS8300
- Phrasing difference between American and British regulations
- ufas.4.13.9 Door Hardware. Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy
- bs8300.12.5.4.2 Door Furniture. Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip
- Neighbor similarities imply similarity between the nodes of interest
37Example of results: almost identical provisions
Regulation comparison: 40CFR vs. CCR
38Example of results: e-rulemaking
- Application domain: e-rulemaking
- Comparison between draft of rules and the associated public comments
- ADAAG Chapter 11, rights-of-way draft
- Less than 15 pages
- Over 1400 public comments received within 4 months
- Comments: 10MB in size; most are several pages long
- → New regulation draft can easily generate a huge amount of data that needs to be reviewed and analyzed
39Example of results: e-rulemaking
Regulations compared with public comments
40Example of results: e-rulemaking
- Related draft section and public comment
- Adaag.1105.4.1
- Where signal timing is inadequate for full crossing of all traffic lanes or where the crossing is not signalized, cut-through medians
- Deborah Wood, October 29, 2002
- This often means walk lights that are so short in duration that by the time a person who is blind realizes
- No identified related section
- Donna Ring, September 6, 2002
- If you become blind, no amount of electronics will make you safe. You have to learn modern blindness skills from a good teacher. You have to practice your new skills
- → Concern not addressed in the draft
41Conclusions
- An infrastructure for
- Repository for regulations
- Shallow parser
- Feature extractions
- Similarity comparison
- Base score
- Score refinements
- Results
- Comparisons between Federal codes, European codes
- Application to e-rulemaking
- Future Directions
- Extension of application to other domains of semi-structured documents
- Conflict analysis?
42Thank You!