AdLaw04 - PowerPoint PPT Presentation

About This Presentation
Title:

AdLaw04

Description:

Presented October 21, 2004 at the ... Invitation for other researchers to join the fray. Real-world data with immediate consequences ... – PowerPoint PPT presentation

Slides: 29
Provided by: stuartsu8

Transcript and Presenter's Notes

Title: AdLaw04


1
The Internet Still Might (but Probably Won't)
Change Everything
Jamie Callan, Carnegie Mellon University
Eduard Hovy, USC/Information Sciences Institute
Stuart Shulman, University of Pittsburgh
Stephen Zavestoski, University of San Francisco
Presented October 21, 2004 at the American Bar
Association's Administrative Law and Regulatory
Practice Conference and October 22, 2004 before
the Regulatory Affairs Committee of the United
States Chamber of Commerce
2
Acknowledgements
  • This work has been supported by grants from the
    National Science Foundation:
    • EIA-0089892
      SGER: Citizen Agenda-Setting in the Regulatory
      Process: Electronic Collection and Synthesis of
      Public Commentary
    • EIA-0327979, 0328175, 0328914, and 0328618
      SGER Collaborative: A Testbed for eRulemaking
      Data
    • IIS-0429293
      Collaborative Research: Language Processing
      Technology for Electronic Rulemaking
    • SES-0322662
      Democracy and E-Rulemaking: Comparing
      Traditional vs. Electronic Comment from a
      Discursive Democratic Framework
  • Any opinions, findings and conclusions or
    recommendations expressed in this material are
    those of the authors and do not necessarily
    reflect those of the National Science Foundation

3
An eRulemaking Testbed
  • A repository of public comments
  • For example:
    • USDA's National Organic standard
      (we are missing lots of paper comments)
    • EPA's Definition of US Waters (Post-SWANCC ANPR)
      (we are missing lots of electronic mail)
    • DOT's latest CAFE standard
      (electronic versus paper presorted, but some
      sticky PDFs)
  • A testbed for new tools to analyze the text
  • Goal: public and agency personnel experiment with
    the new tools and provide continuous feedback
  • Invitation for other researchers to join the fray
  • Real-world data with immediate consequences

http://hartford.lti.cs.cmu.edu/eRulemaking/Data.html
5
What if it were all paper?
  • For our research, mercury is the best dataset yet
  • 530,000 emails, all plain text, with many
    duplicative (similar or identical) comments
  • Average length about one typed page
  • If all 1.8 gigabytes were on paper (which they are):
    • it would weigh 5,350 pounds (about 2.7 tons)
    • it would make a stack 214 feet high
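
The paper figures above can be sanity-checked with a little arithmetic. The sketch below takes the email count as 530,000 and assumes one sheet of 20 lb bond letter paper per email. Neither the paper stock nor the one-sheet-per-email simplification comes from the slides, so treat the result as a rough plausibility check only.

```python
# Back-of-the-envelope check of the "what if it were all paper" figures.
# Assumptions (not from the slides): one sheet per email, 20 lb bond paper.

EMAILS = 530_000            # email count quoted on the slide
TOTAL_BYTES = 1.8e9         # "1.8 gigabytes", taken as 1.8 * 10**9 bytes
STATED_WEIGHT_LB = 5_350    # weight quoted on the slide
STATED_STACK_FT = 214       # stack height quoted on the slide

# "20 lb bond" means 500 sheets of 17 x 22 inch paper weigh 20 pounds.
# Each such sheet cuts into four letter-size sheets.
LB_PER_LETTER_SHEET = 20 / (500 * 4)


def paper_weight_lb(sheets, lb_per_sheet=LB_PER_LETTER_SHEET):
    """Estimated weight in pounds of a pile of letter-size sheets."""
    return sheets * lb_per_sheet


def implied_sheet_thickness_in(stack_ft, sheets):
    """Per-sheet thickness (inches) implied by a stated stack height."""
    return stack_ft * 12 / sheets


estimated_weight = paper_weight_lb(EMAILS)
thickness = implied_sheet_thickness_in(STATED_STACK_FT, EMAILS)
short_tons = STATED_WEIGHT_LB / 2000
bytes_per_email = TOTAL_BYTES / EMAILS

print(f"estimated weight: {estimated_weight:,.0f} lb (slide: {STATED_WEIGHT_LB:,} lb)")
print(f"slide weight in short tons: {short_tons:.3f}")
print(f"implied sheet thickness: {thickness:.4f} in")
print(f"average email size: {bytes_per_email:,.0f} bytes")
```

Under these assumptions the estimate lands within about one percent of the quoted weight, and the average email size of a few kilobytes is consistent with "about one typed page".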

6
Scientific Research Objectives
  • Applied Research Objectives
    • Help agencies handle the information
    • Measure efficiency and quality improvements
  • Basic Research Objectives
    • Advance Natural Language Processing
    • Analyze and categorize text according to several
      novel dimensions: stakeholders, opinions,
      arguments
    • Advance social science methods for measuring the
      impact of IT on democracy
    • Develop metrics for assessing the quality and
      content of the public comment
    • Develop a public comment database and coding
      scheme that facilitates greater empirical social
      science research

7
Problem: Duplicate Detection
  • Many public comments are form letters or edited
    form letters
  • Real grassroots, or "astroturf" created by
    interest groups and lobbies?

12
Duplicate Detection Solutions
  • Duplicate detection algorithms
    • Generate summary counts
    • Identify the reference copy
    • Summarize differences from reference copy
  • Near-duplicate detection techniques
    • Use cosine correlation to identify similar
      documents
    • Identify near-duplicates using document
      fingerprints (sequences of words that match in
      each document)
  • Output
    • A reliable and easy count of duplicates
    • Unique passages isolated and displayed/clustered
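
The two near-duplicate techniques named above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation. It assumes raw term-frequency vectors for the cosine measure, four-word shingles as "fingerprints", a greedy grouping rule, and a 0.7 threshold. The function names and the sample comments are invented for the example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between the term-frequency vectors of a and b."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shingles(text, n=4):
    """Set of n-word sequences (a simple stand-in for document fingerprints)."""
    words = tokens(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def fingerprint_overlap(a, b, n=4):
    """Jaccard overlap of the two documents' word sequences."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def group_near_duplicates(comments, threshold=0.7):
    """Greedy grouping; the first member of each group is its reference copy."""
    groups = []
    for i, text in enumerate(comments):
        for group in groups:
            if cosine_similarity(comments[group[0]], text) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no group matched: start a new one
    return groups


# Invented sample comments, for illustration only.
form = ("Please set strict limits on mercury emissions from power plants "
        "to protect public health.")
edited = form + " My children eat fish from our local lake."
unrelated = ("The proposed rule would impose unreasonable compliance costs "
             "on small utilities.")

groups = group_near_duplicates([form, edited, unrelated, form])
for group in groups:
    print(f"reference copy #{group[0]}: {len(group)} comment(s) in group")
```

The group sizes give the summary counts, and the first index in each group plays the role of the reference copy. A real system would need a more careful choice of threshold and something faster than pairwise comparison for hundreds of thousands of emails.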

15
Near Duplicate Detection Examples
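
The example slides are not included in this transcript. As a stand-in, here is a small sketch of how the text a commenter added to a form letter might be isolated from the reference copy. It uses Python's standard `difflib` module as an assumed substitute for whatever comparison the project used, and the two letters are invented.

```python
import difflib


def unique_passages(reference, variant):
    """Return the word runs in `variant` that are not in `reference`."""
    ref_words = reference.split()
    var_words = variant.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=var_words, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both mark text present only in the variant.
        if tag in ("insert", "replace"):
            added.append(" ".join(var_words[j1:j2]))
    return added


# Invented form letter and edited copy, for illustration only.
reference = "I urge the EPA to adopt the strongest possible mercury rule."
variant = ("As a nurse, I urge the EPA to adopt the strongest possible "
           "mercury rule. Our patients cannot wait.")

passages = unique_passages(reference, variant)
for passage in passages:
    print("added:", passage)
```

Comparing word by word rather than character by character keeps the reported passages readable, at the cost of treating a changed punctuation mark as a changed word.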
21
Capturing the Public's Comments
  • Help the public to formulate comments
  • Link to existing commentary, regulation draft,
    other material

Par 2.2(a): I am for this / I am against this, because:
pollution _____________ (see similar views)
expense ______________ (see similar views)
unfit for elderly ____________ (see similar views)
unfit for children ___________ (see similar views)
add a new reason ___________ (explore all views)

This is what should be done (please be specific):
relax requirement c _____________ (similar ideas)
phase in alternative _____________ (similar ideas)
add a new suggestion _______________________ (explore all suggestions)
22
Clustering By Opinion
  • For each (sub)topic:
    • Group together all Yes, No, and In-betweens
    • Create these categories manually and have the
      system learn to duplicate that
  • Extract reasons/motivations/authorities using
    characteristic phrases, for example:
    • "against it because I think X"
    • "Y will have such a beneficial effect"
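
Phrase-based extraction of the kind described above can be approximated with regular expressions. The sketch below handles only the two characteristic phrases quoted on the slide. The pattern wording, the "anti"/"pro" labels attached to each pattern, and the sample comment are assumptions made for illustration.

```python
import re

# Pattern for "against it because I think X"; the reason is X.
AGAINST = re.compile(
    r"\bagainst (?:it|this)\b[^.!?]*?\bbecause\s+(?:I think\s+)?(?P<reason>[^.!?]+)",
    re.IGNORECASE,
)

# Pattern for "Y will have such a beneficial effect"; the reason is Y.
BENEFIT = re.compile(
    r"(?P<reason>[^.!?]+?)\s+will have such a beneficial effect",
    re.IGNORECASE,
)


def extract_reasons(comment):
    """Return (stance, reason) pairs found by the two phrase patterns."""
    found = []
    for stance, pattern in (("anti", AGAINST), ("pro", BENEFIT)):
        for match in pattern.finditer(comment):
            found.append((stance, match.group("reason").strip()))
    return found


# Invented comment, for illustration only.
comment = ("I am against it because I think the cost is too high. "
           "Cleaner air will have such a beneficial effect on children.")

reasons = extract_reasons(comment)
for stance, reason in reasons:
    print(f"{stance}: {reason}")
```

Hand-written patterns like these only catch the phrasings someone thought to write down, which is presumably why the slide pairs them with manually created categories that the system then learns to reproduce.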

23
An Analyst's Workbench
Main Opinions
  • pro (19,566)
    • pollution (11,003)
    • energy-efficient (9,812)
    • safe or safety (534)
  • guarded pro (4,661)
    • energy (3,652)
    • safe or safety (1,202)
  • anti (8,002)
    • cost, expense, expensive (7,314)
  • guarded anti (758)
    • cost (500)
    • difficult (105)

24
Display Ideas
  • Grouped by opinion and writer type
  • Grouped by topic and cross-correlated
  • Par 2.2(a1)
    • Con
      • 150, 818: impossible to maintain
      • 272: too expensive for elderly
    • Pro
      • 169, 213, 391, 392, 394: already being done in
        Alaska
      • 18: extend to children
25
Putting It All Together
  • Assemble tools in a Reg-Writer Workbench
  • With interface/display, reg-writer is able to:
    • switch modules on or off
    • move smoothly from one display to another
      (e.g., from comments to regulation to cluster)
    • drag information easily into the response-writer
      window
    • produce integrated regulatory text, response,
      comments, etc.

26
Available online at http://erulemaking.ucsur.pitt.edu
28
Thanks!
  • Dr. Stuart W. Shulman, University of Pittsburgh
  • Shulman_at_pitt.edu (email)
  • http://shulman.ucsur.pitt.edu (home page)
  • 412.918.1651 (voice)