Daniel G. Bobrow - PowerPoint PPT Presentation


1
Enhancing Legal Discovery with Linguistic
Processing
  • Daniel G. Bobrow
  • Research Fellow
  • Palo Alto Research Center Inc.
  • with Tracy King and Lawrence Lee
  • June 4, 2007

2
The problems in Legal Discovery
  • Recall: nothing relevant left behind
  • Precision: very little irrelevant to ignore
  • Scalability: need to handle more and more
  • Privacy: what they see is only what they should get

3
Today: negotiated keyword search protocol
  • All documents discussing or referencing
    scientific research on the effects of secondhand
    smoking published prior to 1985.
  • Defendants' Initial Proposal: secondhand
    smok! and (finding or science or or research)
    and (1985 or 1984 or 1983 or 1982 or 1981 or 1980
    or 197! or 196! or 195!)
  • Plaintiffs' Rejoinder: ((find! or result! or
    effect!) w/page (secondhand or second hand)) or
    (other! w/5 smok!)
  • All documents relating to destruction of records
    under defendants' records retention policies and
    practices.
  • Defendants' Initial Proposal: records and
    destruction
  • Plaintiffs' Counterproposal: destr! or elim!
    or dispos! or purg! or recycl! or retain! or
    reten!
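
The truncation ("!") and proximity ("w/5") operators in these proposals can be illustrated with a small sketch. It assumes that a trailing "!" matches any word beginning with the given stem and that "w/5" means "within five word positions"; the exact semantics of the search tool used in such negotiations are not given in the slides, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def word_matches(term, word):
    """A trailing '!' is treated as a truncation wildcard (assumed semantics)."""
    term, word = term.lower(), word.lower()
    if term.endswith("!"):
        return word.startswith(term[:-1])
    return word == term

def any_term(terms, text):
    """An 'or' of terms: true if any word in the text matches any term."""
    words = tokenize(text)
    return any(word_matches(t, w) for t in terms for w in words)

def within(left, right, n, text):
    """Rough 'left w/n right': matches lie at most n word positions apart."""
    words = tokenize(text)
    left_pos = [i for i, w in enumerate(words) if word_matches(left, w)]
    right_pos = [i for i, w in enumerate(words) if word_matches(right, w)]
    return any(abs(i - j) <= n for i in left_pos for j in right_pos)

# The plaintiffs' counterproposal from the slide, as a list of 'or' terms.
counterproposal = ["destr!", "elim!", "dispos!", "purg!",
                   "recycl!", "retain!", "reten!"]

hit = any_term(counterproposal,
               "The files were shredded under the retention policy.")
miss = any_term(counterproposal, "The memo was shredded last week.")
near = within("other!", "smok!", 5,
              "Exposure to smoke from others was studied.")
```

Here `hit` is true because "retention" begins with "reten", while `miss` is false: "shredded" shares no stem with any listed term, which is the kind of recall gap that negotiated keyword lists leave open.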

4
Linguistic enhancement of keyword queries
  • Inflexional morphology: forms of verbs
  • destroy → destroys, destroyed, destroying, ...
  • comply → complies, complied, complying
  • Derivational morphology: verbs → nouns
  • destroy → destruction, destroyer, ...
  • comply → compliance, ...
  • retain → retention, ...
  • Word taxonomy (e.g. WordNet)
  • result → consequence, effect, outcome, result,
    event, issue, upshot
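
The three kinds of expansion on this slide can be sketched as table lookups. The tables below contain only the slide's own examples; a real system would presumably draw on a morphological analyzer and a resource such as WordNet rather than hand-written dictionaries, and the names `INFLECTIONS`, `DERIVATIONS`, `SYNONYMS`, and `expand` are hypothetical.

```python
# Hand-written tables holding only the examples shown on the slide.
INFLECTIONS = {
    "destroy": ["destroys", "destroyed", "destroying"],
    "comply": ["complies", "complied", "complying"],
}
DERIVATIONS = {
    "destroy": ["destruction", "destroyer"],
    "comply": ["compliance"],
    "retain": ["retention"],
}
SYNONYMS = {
    "result": ["consequence", "effect", "outcome", "event", "issue", "upshot"],
}

def expand(term):
    """Return the term followed by its listed variants, without duplicates."""
    expanded = [term]
    for table in (INFLECTIONS, DERIVATIONS, SYNONYMS):
        for variant in table.get(term, []):
            if variant not in expanded:
                expanded.append(variant)
    return expanded

def expand_query(terms):
    """Expand every term of a keyword query."""
    return {term: expand(term) for term in terms}

expanded_query = expand_query(["destroy", "retain", "result"])
```

A term with no table entry, such as a proper name, is returned unchanged, so the expansion only ever widens a query.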

5
Processing the collection rather than the queries
ASKER: A Semantically-indexed Knowledge Repository
[Architecture diagram; labels: Intelligence source documents, Text passages, Passage AKR index terms, Query, Query AKR, Expand, Simplify, Query index terms, Retrieved passages AKR, Filtered answers]

6
Normalize to Semantic Representation
  • Syntactic normalization
  • morphological
  • bought → buy + past
  • structural
  • the file was lost by Mary → Mary lost the file
  • derivational
  • the destruction of the memo by the CEO → the
    CEO destroyed the memo
  • Semantic normalization
  • word to list of WordNet synsets
  • buy → buy, purchase, ...
  • Connect predicate and arguments
  • Pred: destroy, Agent: CEO, Theme: memo
  • Fill in implicit arguments
  • Ed was easy to please → Ed was pleased
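
The idea that different surface forms normalize to one predicate-argument structure can be shown with a toy sketch. It is not the parser behind the slides: three regular expressions and two lookup tables stand in for full syntactic analysis, they cover only the slide's examples, and every name (`Frame`, `normalize`, the tables) is hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Frame(NamedTuple):
    pred: str
    agent: str
    theme: str

# Toy lookup tables covering only the slide's examples.
VERB_FORM_TO_LEMMA = {"lost": "lose", "destroyed": "destroy"}
NOUN_TO_VERB = {"destruction": "destroy"}

PATTERNS = [
    # Passive: "the file was lost by Mary"
    (re.compile(r"^(?P<theme>.+?) was (?P<verb>\w+) by (?P<agent>.+)$"),
     VERB_FORM_TO_LEMMA),
    # Nominalization: "the destruction of the memo by the CEO"
    (re.compile(r"^the (?P<verb>\w+) of (?P<theme>.+?) by (?P<agent>.+)$"),
     NOUN_TO_VERB),
    # Active: "Mary lost the file"
    (re.compile(r"^(?P<agent>.+?) (?P<verb>\w+) (?P<theme>the .+)$"),
     VERB_FORM_TO_LEMMA),
]

def clean(argument):
    """Drop a leading determiner so 'the memo' and 'memo' compare equal."""
    return re.sub(r"^the ", "", argument.strip())

def normalize(text) -> Optional[Frame]:
    """Map a phrase to a Frame, or None if no toy pattern applies."""
    s = text.strip().rstrip(".").lower()
    for pattern, table in PATTERNS:
        m = pattern.match(s)
        if m and m.group("verb") in table:
            return Frame(table[m.group("verb")],
                         clean(m.group("agent")),
                         clean(m.group("theme")))
    return None

passive = normalize("the file was lost by Mary")
active = normalize("Mary lost the file")
nominal = normalize("the destruction of the memo by the CEO")
```

Because `passive` and `active` come out as the same `Frame`, an index built over frames rather than words would treat the two sentences as one fact.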

7
Improved Recall (Google and Asker on Wikipedia)
  • Query: How many terrorists have died?
  • Google
  • In addition to the 19 hijackers, 2973 people died
    in the terrorist attack ...
  • Although there were security alerts at many
    locations, no other terrorist incidents occurred
    outside central London.
  • This is a list of sportspeople who have died
  • Asker
  • The encounter resulted in the deaths of two
    terrorists of the Al Omar Tanzeem
  • In blazing gunfire, five of the insurgents
    perished
  • see to it that those terrorists die and are
    broken

8
Improved Precision (Using argument roles for
relevance test)
  • Query: What terrorists have been killed?
  • Google
  • ... not include most people killed in big
    terrorist bombings
  • act of terrorism in which 93 innocent people
    have been killed or are missing in the ruins
  • Asker
  • During a two-hour gun battle in Mdantsane, police
    kill a terrorist or freedom fighter
  • All the three terrorists killed in this incident
    have been identified as Pakistani Nationals.
  • the former Socialist government carried out a
    covert campaign in which 27 suspected Basque
    terrorists were killed.
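
The difference between keyword matching and a relevance test on argument roles can be sketched with three of the passages quoted above. The role annotations are written by hand here as an assumption; in the system described they would come from linguistic processing. `Fact`, `keyword_hits`, and `role_hits` are hypothetical names.

```python
from typing import NamedTuple, Optional

class Fact(NamedTuple):
    pred: str
    agent: Optional[str]
    theme: Optional[str]
    passage: str

# Passages quoted on the slide, with hand-assigned roles (an assumption).
FACTS = [
    Fact("kill", "police", "terrorist",
         "During a two-hour gun battle in Mdantsane, police kill a "
         "terrorist or freedom fighter"),
    Fact("kill", None, "terrorist",
         "All the three terrorists killed in this incident have been "
         "identified as Pakistani Nationals."),
    Fact("kill", None, "people",
         "act of terrorism in which 93 innocent people have been killed "
         "or are missing in the ruins"),
]

def keyword_hits(facts, stems):
    """Passages that contain every stem as a substring."""
    return [f.passage for f in facts
            if all(stem in f.passage.lower() for stem in stems)]

def role_hits(facts, pred, theme):
    """Passages whose predicate and theme both match the query."""
    return [f.passage for f in facts if f.pred == pred and f.theme == theme]

by_keyword = keyword_hits(FACTS, ["terror", "kill"])
by_role = role_hits(FACTS, pred="kill", theme="terrorist")
```

All three passages contain both stems, so `by_keyword` returns them all; `by_role` drops the third, where the people killed are victims rather than terrorists.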

9
Scalability (Cost of doing linguistic processing
at scale)
  • Linguistic processing time: < 1 CPU sec/sentence
  • parsing, semantic normalization, indexing
  • Assumptions
  • Average collection size: 100 million documents
  • Document size: 25 sentences
  • 8-core processor: $6K, or $250/month
    (depreciated and housed for 3 years)
  • 2.5 million seconds/month → 100,000
    documents/core/month
  • Cost for handling 100 million documents/month
  • 1000 cores = 125 processors × $250 ≈ $32,000
  • Use human review query costs are in the noise
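
The arithmetic on this slide can be checked with a short calculation. It takes the slide's figures at face value, uses the one-second upper bound as the per-sentence cost, and assumes the monetary amounts are US dollars, since the transcript carries no currency symbol. The constant and function names are hypothetical.

```python
import math

SECONDS_PER_SENTENCE = 1.0        # slide gives "less than 1"; use the bound
SENTENCES_PER_DOCUMENT = 25
SECONDS_PER_MONTH = 2_500_000     # slide's rounded figure
CORES_PER_PROCESSOR = 8
COST_PER_PROCESSOR_MONTH = 250    # assumed to be dollars
DOCUMENTS_PER_MONTH = 100_000_000

def documents_per_core_month():
    """Documents one core can process in a month."""
    seconds_per_document = SECONDS_PER_SENTENCE * SENTENCES_PER_DOCUMENT
    return SECONDS_PER_MONTH / seconds_per_document

def cores_needed():
    """Cores required to process the monthly document volume."""
    return math.ceil(DOCUMENTS_PER_MONTH / documents_per_core_month())

def processors_needed():
    """Eight-core processors required to supply those cores."""
    return math.ceil(cores_needed() / CORES_PER_PROCESSOR)

def monthly_cost():
    """Monthly hardware cost for the required processors."""
    return processors_needed() * COST_PER_PROCESSOR_MONTH

print(documents_per_core_month(), cores_needed(),
      processors_needed(), monthly_cost())
```

The product of 125 processors and 250 per month is 31,250, which the slide rounds up to 32,000.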

10
Privacy
  • Identify sensitive content by entity type and
    relationship (linguistic processing)
  • e.g. Phone numbers of people
  • Encrypt content to make content unreadable (PARC
    security technology)
  • Provide content-specific keys for those people
    with a need to know specific information
  • Additional PARC security technologies can
    identify additional content to be redacted to
    mitigate inference channels
  • can redacted information be discovered based on
    what is available?
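
The first step, identifying sensitive content by entity type, can be sketched for the phone-number example. A regular expression for US-style numbers stands in here for the linguistic processing the slide refers to; the sketch does not cover relationships between entities, encryption, content-specific keys, or inference channels. The pattern, the placeholder text, and the function name are assumptions.

```python
import re

# Matches forms such as 650-555-0100, 650.555.0100 and (650) 555-0100.
PHONE_PATTERN = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")
PLACEHOLDER = "[REDACTED PHONE]"

def redact_phone_numbers(text):
    """Return the redacted text and the list of numbers that were found."""
    found = PHONE_PATTERN.findall(text)
    return PHONE_PATTERN.sub(PLACEHOLDER, text), found

redacted, found = redact_phone_numbers(
    "Call Ed at 650-555-0100 or (650) 555-0199.")
```

Returning the matched numbers alongside the redacted text leaves room for a later step that encrypts them rather than discarding them.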

11
Linguistic processing can be useful in legal
discovery
With good Recall, Precision, Scalability, Privacy
  • Thank you