Statistical Generation - PowerPoint PPT Presentation

Provided by: VasileiosH9

1
Statistical Generation
  • Vasileios Hatzivassiloglou
  • University of Texas at Dallas

2
Data: King James Bible
  • Order 1: Then said unto all thine arrows of
    Joseph of Saul, that enter into stubble. Darts
    are abomination to his servants. And it shall
    stink and went to a present himself without
    fear.
  • Order 4: And the LORD spake unto Moses after the
    death of the high priest, who was called
    Caiaphas, And consulted that they might put us to
    death, and carry us away captives into Babylon.
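
The "order" in the samples above is the number of preceding tokens used to pick the next one. A minimal sketch of such a generator follows, assuming a word-level Markov chain (the slides do not say whether words or letters were used). The names `build_model` and `generate` and the tiny stand-in corpus are illustrative, not taken from the presentation.

```python
import random
from collections import defaultdict

def build_model(words, order):
    """Map each run of `order` consecutive words to the words that followed it."""
    model = defaultdict(list)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        model[context].append(words[i + order])
    return model

def generate(model, order, length, rng):
    """Sample successors of the last `order` words until `length` words are emitted."""
    context = rng.choice(list(model))  # start from a random context seen in the corpus
    output = list(context)
    while len(output) < length:
        successors = model.get(tuple(output[-order:]))
        if not successors:  # dead end: context only occurs at the end of the corpus
            break
        # Duplicates in the list make frequent successors proportionally more likely.
        output.append(rng.choice(successors))
    return " ".join(output)

# Tiny stand-in corpus (hypothetical); the slides used the full King James Bible.
corpus = "and the king said unto the people and the people said unto the king".split()
model = build_model(corpus, order=1)
print(generate(model, order=1, length=12, rng=random.Random(0)))
```

Raising `order` makes the output more locally coherent, as in the Order 4 sample, but with a small corpus it increasingly reproduces the source verbatim.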

3
Mixing input sources
  • Alice was beginning to write out a history of all
    flesh, as God hath judged me, and I will tell you
    my adventures--beginning from this my oath, when
    thou fleddest from the engine, and everybody
    jumped up in alarm, For the Baker had met with
    again!
  • Here the King called out 'Ten minutes allowed for
    refreshments!' Haigha and Hatta set to work at
    once, and Alice lifted up his voice, and the
    flesh instead thereof.
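
One plausible way to get such mixtures, assumed here because the slides do not describe the procedure, is to pool both texts into a single transition table. The random walk can then cross from one source to the other wherever a context occurs in both. The sketch below finds those crossing points; the two sentences and the function names are made up for illustration.

```python
from collections import defaultdict

def successors_by_source(sources, order):
    """For each context, record which source texts supplied each successor word."""
    table = defaultdict(lambda: defaultdict(set))
    for name, text in sources.items():
        words = text.split()
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            table[context][words[i + order]].add(name)
    return table

def bridge_contexts(table):
    """Contexts whose successors come from more than one text, so a walk can switch sources."""
    bridges = []
    for context, nexts in table.items():
        names = set().union(*nexts.values())
        if len(names) > 1:
            bridges.append(context)
    return sorted(bridges)

# Hypothetical stand-in sentences, not quotations from either book.
sources = {
    "bible": "and the king lifted up his voice and wept",
    "alice": "and the king called out and alice lifted up her head",
}
print(bridge_contexts(successors_by_source(sources, order=1)))
print(bridge_contexts(successors_by_source(sources, order=2)))
```

Higher orders leave fewer shared contexts, so the output stays within one source for longer stretches before switching.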

4
Data: Programming textbook
  • (Order 4) We always select the first line, we
    select the second line with probability one half,
    the third line with probability one third, and so
    on. Unfortunately, the array x[0..n] used for
    heaps requires n+1 additional words of main
    memory. We turn now to the Heapsort, which
    improves this approach. It uses less code, it
    uses less space because it doesn't require the
    auxiliary array, and it uses less time.

5
Applications
  • Generating filler text for typography, spam
    emails, and bogus web pages
  • Studies of the writing process
  • Generating fake but authentic-looking text for
    validating a review process
  • Text generation and machine translation

6
The SCIgen controversy
  • Series of scientific conferences with some
    suspicious elements
      • very wide scope, credentials of organizers, nice
        locale, high acceptance rate, substantial fees
  • A group of graduate students at MIT built a
    random text generator to see if their paper would
    be accepted in WMSCI 2005 (World Multiconference
    on Systemics, Cybernetics, and Informatics)

7
SCIgen
  • Based on randomized context-free grammars
  • An MC-like (Markov-chain-like) approach, but with
    additional structure
      • at the text level, allows for longer-distance
        dependencies
      • at the structure level, allows for the placement
        of sections, figures, and references
  • More at http://pdos.csail.mit.edu/scigen/
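
A randomized context-free grammar can be sketched as a table of productions from which one alternative is chosen at random for each nonterminal. The toy grammar below is invented for illustration and is not SCIgen's actual grammar. It only shows how a single production can fix the shape of a whole sentence or section, which a fixed-order Markov chain cannot do.

```python
import random

# Toy grammar (hypothetical). Keys are nonterminals; each maps to a list of
# alternative productions, and any symbol that is not a key is a terminal.
GRAMMAR = {
    "PAPER": [["Title:", "NP", "|", "SECTION", "|", "SECTION"]],
    "SECTION": [["HEADING", ":", "SENTENCE"],
                ["HEADING", ":", "SENTENCE", "SENTENCE"]],
    "HEADING": [["Introduction"], ["Evaluation"], ["Related Work"]],
    "SENTENCE": [["we", "VERB", "NP", "."], ["NP", "VERB", "NP", "."]],
    "VERB": [["validate"], ["refine"], ["emulate"]],
    "NP": [["ADJ", "NOUN"], ["the", "NOUN"]],
    "ADJ": [["scalable"], ["stochastic"]],
    "NOUN": [["methodologies"], ["archetypes"], ["algorithms"]],
}

def expand(symbol, rng):
    """Recursively replace a nonterminal with one randomly chosen production."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    tokens = []
    for part in production:
        tokens.extend(expand(part, rng))
    return tokens

print(" ".join(expand("PAPER", random.Random(1))))
```

The `PAPER` rule plays the role of the structure level (where sections go), while `SENTENCE` and `NP` work at the text level. This toy grammar has no recursive rules, so expansion always terminates.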

9
Results
  • Two papers were submitted; one was accepted
  • No reviews or ratings
  • Human-made fake papers have been accepted at
    other conferences with similar structure
      • rendering images in a sealed interior room
      • papers that mention extreme shortcomings
      • a paper that interleaves two others
      • submitting back the call for papers

10
Russian journal test (2008)
  • State requires all PhDs to publish at least one
    paper in an approved journal from a list
  • Once the list was formulated, it was soon
    infiltrated by additional, new journals
  • Experiment tests one of them, Journal of
    Scientific Publications of Aspirants and
    Doctorants

11
Procedure
  • Original Rooter paper translated into Russian
    with automatic machine translation
  • Modifications
      • Post-editing by hand for grammaticality
      • Insertion of Russian references
      • Change of institution and student authors' names

12
Additional clues
  • Because human intervention may have made the text
    less random, additional clues were given to help
    the journal reviewers
  • One of the English author names was changed to
    Softporn
  • An acknowledgment was inserted: "Thanks to
    Professor Gelfand who introduced the author to
    the problem of publication of random texts"

13
Review
  • Actuality: high
  • The choice of the study subject: correct
  • Setting aims: logical
  • Novelty: excellent
  • Depth: sufficient
  • Structure: good
  • Value of methods: excellent
  • Style: non-satisfactory
  • Practical efficacy: excellent
  • Coverage of literature: excellent

14
Effects of this experiment
  • Widespread news coverage
  • Journal editorial board resigned
  • Journal removed from approved list
  • New quality control criteria added for the
    approved list
  • A new sense of the word korchevatel entered the
    Russian language