Lucene - PowerPoint PPT Presentation

Title:

Lucene

Provided by: jtkim
Tags: ferret | lucene


Transcript and Presenter's Notes

Title: Lucene


1
Lucene
  • What is Lucene?
  • From Lucene's homepage: "Apache Lucene is a
    high-performance, full-featured text search
    engine library written entirely in Java. It is a
    technology suitable for nearly any application
    that requires full-text search, especially
    cross-platform."
  • From wikipedia.org: "While suitable for any
    application which requires full text indexing and
    searching capability, Lucene has been widely
    recognized for its utility in the implementation
    of internet search engines and local, single-site
    searching."
  • Lucene has been ported to programming languages
    including Perl, C#, C++, Python, Ruby and PHP.

2
Lucene
  • How Does Lucene Work?
  • Lucene is a set of tools that lets the
    developer index and search a set of documents
    more easily (and more powerfully).
  • Lucene has many features, and it becomes more
    powerful as you learn to use more of them.
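
As a rough illustration of what "index and search" means here, the sketch below builds a tiny inverted index (a map from each term to the documents that contain it) in plain Java. It is not Lucene's API or implementation: the class name `TinyIndex` and its methods are made up for this example, and it skips real text analysis, scoring and on-disk storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index for illustration only; not part of Lucene.
public class TinyIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> document name
    private final List<String> names = new ArrayList<>();

    // Adds a document, indexes its lower-cased words, and returns its id.
    public int add(String name, String text) {
        int id = names.size();
        names.add(name);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Returns the names of documents containing the term, in the order added.
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        Set<Integer> ids = postings.getOrDefault(
                term.toLowerCase(), Collections.<Integer>emptySet());
        for (int id : ids) {
            hits.add(names.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("a.txt", "Influenza outbreak reported");
        index.add("b.txt", "Nothing to see here");
        index.add("c.txt", "Seasonal influenza vaccine");
        System.out.println(index.search("influenza")); // prints [a.txt, c.txt]
    }
}
```

The lookup is a single map access per term, which is why searching an index is much cheaper than rescanning every document.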

3
Lucene
  • Show Me Some Code!
  • public void indexFile(IndexWriter writer, File f) throws IOException {
  •   if (f.isHidden() || !f.exists() || !f.canRead()) {
  •     return; // In case there is some kind of problem as detailed in the 'if' statement
  •   }
  •   Document doc = new Document();
  •   doc.add(new Field("contents", new FileReader(f))); // Indexes the file content
  •   doc.add(new Field("filename", f.getCanonicalPath(), Field.Store.YES, Field.Index.UN_TOKENIZED)); // Indexes the filename as a keyword
  •   writer.addDocument(doc); // This hands the document over to Lucene to index
  • }
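
The two `Field` calls above treat their values differently: the file contents are broken into individual searchable terms, while `UN_TOKENIZED` keeps the whole path as a single term that only matches exactly. A rough plain-Java sketch of that distinction follows. `FieldTerms` and its methods are hypothetical names, and the simple split on non-word characters merely stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
public class FieldTerms {
    // Tokenized: split the value into lower-cased word terms.
    public static List<String> tokenized(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Untokenized: the whole value is kept as one term.
    public static List<String> untokenized(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String path = "/tmp/Flu Report.txt";
        System.out.println(tokenized(path));   // prints [tmp, flu, report, txt]
        System.out.println(untokenized(path)); // prints [/tmp/Flu Report.txt]
    }
}
```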

4
Lucene
  • Show Me More!
  • public void search(File indexDir, String q) throws Exception {
  •   Directory fsDir = FSDirectory.getDirectory(indexDir, false); // false opens the existing index instead of creating a new one
  •   IndexSearcher is = new IndexSearcher(fsDir);
  •   QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
  •   Query query = qp.parse(q);
  •   Hits hits = is.search(query); // This search method returns a list of Hits, which are matching documents
  •   System.out.println("Found " + hits.length() + " document(s) that matched query '" + q + "'\n");
  •   for (int i = 0; i < hits.length(); i++) {
  •     Document doc = hits.doc(i); // Grabs each doc from hits
  •     System.out.println(doc.get("filename")); // and displays filename
  •   }
  • }

5
Lucene
  • Where and How can I learn more about Lucene?
  • Lucene's homepage: http://lucene.apache.org/java/docs/
  • Lucene API: http://lucene.apache.org/java/docs/api/
  • A Lucene Forum: http://www.nabble.com/Lucene-f44.html
  • Lucene In Action by Erik Hatcher and Otis
    Gospodnetic. ISBN 1-932394-28-1
  • Ask me, I can try to help.

6
Lucene
  • About My Job
  • What is my job?
  • Why do you care?
  • How do I use Lucene at work?
  • Well, then what do you use?

7
Lucene Ferret
  • I use Ferret!
  • Ferret is the Ruby port of Lucene. I use it
    because I am developing in Ruby on Rails.

8
Ferret
  • Show Me Some Code!
  • Find.find("/home/kimbell/ein/postings") do |doc| # For every document in that directory...
  •   if (File.directory?(doc)) # Is file a directory?
  •     totalcount = totalcount + 1
  •     next
  •   elsif (File.extname(doc) != ".txt") # Make sure it is a text file
  •     next
  •   else
  •     file_content = "" # Initialize variable to hold file content
  •     File.open(doc) do |infile|
  •       while (line = infile.gets)
  •         file_content << line # Store the content
  •       end
  •     end
  •     index << {:title => File.basename(doc, ".txt"), :content => file_content} # Index the filename and content
  •     totalcount = totalcount + 1
  •     doccount = doccount + 1
  •   end
  • end

9
Ferret
  • More Code!
  • searcher = Search::Searcher.new("/home/kimbell/test/posting_index")
  • query_parser = QueryParser.new(:default_field => "title|content")
  • query = query_parser.parse("influenza")
  • puts "Searching for 'influenza'..."
  • resultHash = {} # This is a hash which will hold the search results
  • searcher.search_each(query, :limit => :all) do |id, score|
  •   title = searcher[id][:title]
  •   if (!resultHash.has_key?(title)) # If the document has no hits so far...
  •     resultHash[title] = RetrievedDocument.new(title, nil, searcher[id][:content], score, 1)
  •   else # The document is already in the hash
  •     resultHash[title].hits = resultHash[title].hits + 1 # Increment the number of hits for this document
  •   end
  • end
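
The has-key-or-increment pattern in the loop above does not depend on Ferret. For comparison with the Java slides earlier, here is a minimal plain-Java sketch of the same counting step. `HitCounter` and `countByTitle` are hypothetical names, and the list of titles stands in for whatever the search returns.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene or Ferret.
public class HitCounter {
    // Counts how many times each title appears, keeping first-seen order.
    public static Map<String, Integer> countByTitle(List<String> titles) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String title : titles) {
            // Inserts 1 for a new title, otherwise adds 1 to the stored count.
            counts.merge(title, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByTitle(Arrays.asList("flu", "cold", "flu"))); // prints {flu=2, cold=1}
    }
}
```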

10
Ferret
  • Where and How can I learn more about Ferret?
  • Ferret homepage: http://ferret.davebalmain.com/trac/
  • Ferret API: http://ferret.davebalmain.com/api/
  • Ferret forum: http://www.ruby-forum.com/forum/5
  • Ask me, I can try to help.

11
Ferret
  • But Wait, There's More!
  • I'm actually not using regular, plain old,
    vanilla ferret.
  • I'm using Acts As Ferret, a plug-in for Ruby on
    Rails
  • Why? It makes life easier.

12
Ferret
  • Where can I learn more about Acts As Ferret?
  • Acts As Ferret homepage: http://projects.jkraemer.net/acts_as_ferret/wiki
  • Acts As Ferret API: http://projects.jkraemer.net/acts_as_ferret/rdoc/
  • Acts As Ferret forum: http://www.ruby-forum.com/forum/5
  • Ask me, I'll try to help.

13
Ferret
  • Extra Stuff
  • http://cs.uiowa.edu/kimbell/196/ <-- For all
    the code used in this presentation.
  • http://129.255.72.52:3000/search/ <-- A quick
    demo of the Acts As Ferret site I am currently
    working on.