Lucene

Lucene is a Perl API to the C port of the Lucene search engine.

Lucene Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: Thomas Busch
  • Publisher web site: http://search.cpan.org/~tbusch/Lucene-0.13/lib/Lucene.pm

Lucene Description

Lucene is a Perl API to the C port of the Lucene search engine.

SYNOPSIS

Initialize/Empty Lucene index

    my $analyzer = new Lucene::Analysis::Standard::StandardAnalyzer();
    my $store = Lucene::Store::FSDirectory->getDirectory("/home/lucene", 1);
    my $tmp_writer = new Lucene::Index::IndexWriter($store, $analyzer, 1);
    $tmp_writer->close;
    undef $tmp_writer;

Choose your Analyzer (string tokenizer)

    # lowercases text and splits it at non-letter characters
    my $analyzer = new Lucene::Analysis::SimpleAnalyzer();

    # same as before and removes stop words
    my $analyzer = new Lucene::Analysis::StopAnalyzer();

    # same as before but you provide your own stop words
    my $analyzer = new Lucene::Analysis::StopAnalyzer();

    # splits text at whitespace characters
    my $analyzer = new Lucene::Analysis::WhitespaceAnalyzer();

    # lowercases text, tokenizes it based on a grammar that
    # leaves named authorities intact (e-mails, company names,
    # web hostnames, IP addresses, etc.) and removes stop words
    my $analyzer = new Lucene::Analysis::Standard::StandardAnalyzer();

    # same as before but you provide your own stop words
    my $analyzer = new Lucene::Analysis::Standard::StandardAnalyzer();

    # takes string as it is (only when using clucene-0.9.17 or above)
    my $analyzer = new Lucene::Analysis::KeywordAnalyzer();

Create a custom Analyzer

    package MyAnalyzer;
    use base 'Lucene::Analysis::Analyzer';

    # You MUST call SUPER::new if you implement new()
    sub new {
        my $class = shift;
        my $self = $class->SUPER::new();
        # ...
        return $self;
    }

    sub tokenStream {
        my ($self, $field, $reader) = @_;
        my $ret = new Lucene::Analysis::StandardTokenizer($reader);
        # return the raw token stream for the keyword field, unfiltered
        if ($field eq "MyKeywordField") {
            return $ret;
        }
        $ret = new Lucene::Analysis::LowerCaseFilter($ret);
        $ret = new Lucene::Analysis::StopFilter($ret, );
        return $ret;
    }

    package main;
    my $analyzer = new MyAnalyzer;

Choose your Store (storage engine)

    # in-memory storage
    my $store = new Lucene::Store::RAMDirectory();

    # disk-based storage
    my $store = Lucene::Store::FSDirectory->getDirectory("/home/lucene", 0);

Open and configure an IndexWriter

    my $writer = new Lucene::Index::IndexWriter($store, $analyzer, 0);

    # optional settings for power users
    $writer->setMergeFactor(100);
    $writer->setUseCompoundFile(0);
    $writer->setMaxFieldLength(255);
    $writer->setMinMergeDocs(10);
    $writer->setMaxMergeDocs(100);

Create Documents and add Fields

    my $doc = new Lucene::Document;

    # field gets analyzed, indexed and stored
    $doc->add(Lucene::Document::Field->Text("content", $content));

    # field gets indexed and stored
    $doc->add(Lucene::Document::Field->Keyword("isbn", $isbn));

    # field gets just stored
    $doc->add(Lucene::Document::Field->UnIndexed("sales_rank", $sales_rank));

    # field gets analyzed and indexed
    $doc->add(Lucene::Document::Field->UnStored("categories", $categories));

Requirements:

  • Perl
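
The analyzer comments in the synopsis describe how each analyzer splits a string into tokens. The differences may be easier to see in a small pure-Perl sketch that mimics those descriptions. This is an illustration only, not the module's or CLucene's implementation: the function names, the ASCII-only letter rule, and the stop-word list are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stop-word list, chosen only for this illustration.
my %STOP = map { $_ => 1 } qw(a an and the of to is);

# Mimics the SimpleAnalyzer description: lowercase the text and
# split it at non-letter characters (ASCII letters only here).
sub simple_tokens {
    my ($text) = @_;
    return grep { length } split /[^a-z]+/, lc $text;
}

# Mimics the StopAnalyzer description: same as above, then drop stop words.
sub stop_tokens {
    my ($text) = @_;
    return grep { !$STOP{$_} } simple_tokens($text);
}

# Mimics the WhitespaceAnalyzer description: split at whitespace only,
# keeping case and punctuation untouched.
sub whitespace_tokens {
    my ($text) = @_;
    return split ' ', $text;
}

# Mimics the KeywordAnalyzer description: the whole string is one token.
sub keyword_tokens {
    my ($text) = @_;
    return ($text);
}

my $text = "The Quick-Brown fox";
print join("|", simple_tokens($text)),     "\n";  # the|quick|brown|fox
print join("|", stop_tokens($text)),       "\n";  # quick|brown|fox
print join("|", whitespace_tokens($text)), "\n";  # The|Quick-Brown|fox
print join("|", keyword_tokens($text)),    "\n";  # The Quick-Brown fox
```

Under these assumptions, a field such as an ISBN would suit the keyword-style treatment, while free text such as "content" would usually go through one of the lowercasing analyzers.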
