Naïve Bayes Classifier

Provided by: christina151
Learn more at: https://www.tjhsst.edu

1
Naïve Bayes Classifier
Christina Wallin, Period 3 Computer Systems
Research Lab 2008-2009
2
Goal
  • Create and test the effectiveness of a naïve
    Bayes classifier on the 20 Newsgroups dataset
  • Compare the effectiveness of a simple naïve
    Bayes classifier and an optimized one
  • One possible optimization is using a Porter
    stemmer so that the program recognizes words such
    as "runs" and "running" as the same word, since
    they have the same stem

3
What is it?
  • Classification method based on the assumption
    that the words in a text are independent of one
    another, given the class of the text
  • A machine learning method: it is trained with
    example texts whose classes are already known,
    and can then classify new texts
  • Classification is based on the probability that
    a word will appear in a specific class of text
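
As a minimal sketch of the idea above, the code below trains on a handful of labeled word lists and picks the class with the highest score, where the score is log P(class) plus the sum of log P(word | class). The function names, the add-one smoothing, and the tiny training set are all assumptions made up for illustration; they are not taken from the original project or from the 20 Newsgroups data.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count documents per class and words per class.

    labeled_docs is an iterable of (list_of_words, class_label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(words, model):
    """Return the class with the highest log-probability score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log P(class)
        total_words = sum(word_counts[label].values())
        for w in words:
            if w not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing (an assumption) avoids log(0).
            p = (word_counts[label][w] + 1) / (total_words + len(vocab))
            score += math.log(p)  # independence: log-probabilities add up
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy training data, for illustration only.
training = [
    (["orbit", "launch", "shuttle"], "space"),
    (["moon", "orbit", "nasa"], "space"),
    (["pitcher", "inning", "bat"], "baseball"),
    (["home", "run", "inning"], "baseball"),
]
model = train(training)
guess = classify(["shuttle", "orbit"], model)
```

Working with sums of logarithms rather than products of probabilities keeps the score from underflowing on long texts.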
4
Previous Research
  • The algorithm has been around for a while (it
    was first used in 1966)
  • At first, it was thought to be less effective
    because of its simplicity and false independence
    assumption, but a more recent review of the uses
    of the algorithm found that it is actually
    rather effective ("Idiot's Bayes--Not So Stupid
    After All?" by David Hand and Keming Yu)

5
Procedures
  • So far, the program takes a text file as input
  • Then, it parses that file and removes all of the
    punctuation and capitalization, so that "The."
    is treated the same as "the"
  • It makes a dictionary of all of the words present
    and their frequencies
  • With PyLab, it graphs the 20 most frequent words
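
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original program: the function names are hypothetical, the sample sentence is made up, and the plot uses matplotlib.pyplot (the library behind PyLab) rather than the pylab interface itself.

```python
import string
from collections import Counter


def tokenize(text):
    """Lowercase the text and delete punctuation, so "The." becomes "the"."""
    no_punct = str.maketrans("", "", string.punctuation)
    return text.lower().translate(no_punct).split()


def word_frequencies(text):
    """Map each word in the text to the number of times it occurs."""
    return Counter(tokenize(text))


def plot_top_words(freqs, n=20):
    """Bar chart of the n most frequent words (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    words, counts = zip(*freqs.most_common(n))
    positions = range(len(words))
    plt.bar(positions, counts)
    plt.xticks(positions, words, rotation=90)
    plt.ylabel("Frequency")
    plt.tight_layout()
    plt.show()


# Made-up sample text, for illustration only.
sample = "The shuttle reached orbit. The crew saw the Moon!"
freqs = word_frequencies(sample)
```

For a real file, the text could be read with open(path).read(), where path is whatever file is being analyzed, and then passed to word_frequencies and plot_top_words. Note that deleting punctuation outright also turns a word like "don't" into "dont".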

6
Results
Graph: 20 most frequent words in sci.space from 20 Newsgroups
Graph: 20 most frequent words in rec.sport.baseball from 20 Newsgroups
7
Results
  • The texts are approximately the same length
  • The sci.space text is denser and less to the point
  • The most frequent word, "the", is the same in both