Lucene 3.0 Basic Example

zhuliang1984723


The API changed in several places between Lucene 2.0 and Lucene 3.0. The examples below are implemented with Lucene 3.0.

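Part 1: Building an index with Lucene
Building an index with Lucene takes two main steps:
Step 1: Create the index writer
Step 2: Add documents to the index
To prepare, create a folder named testlucene on drive E, then create two subfolders inside it, test and index.
In the test folder, create the following four txt files:
a.txt, content: 中华人民共和国
b.txt, content: 人民共和国
c.txt, content: 人民
d.txt, content: 共和国

These four files are the ones we will index; the index folder receives the index output.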
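
The folder layout and sample files described above can also be created programmatically. The following is a minimal sketch using only the JDK; the class name PrepareTestData and the helper prepare are hypothetical and not part of the original post, and writing the files as UTF-8 is an assumption. Whatever charset is used here has to match the charset the indexing code reads the files with.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original post.
class PrepareTestData {

    /** Creates root/test with the four sample files and an empty root/index; returns root/test. */
    static Path prepare(Path root) throws IOException {
        Path testDir = Files.createDirectories(root.resolve("test"));
        Files.createDirectories(root.resolve("index"));

        // File name -> content, exactly as listed in the text above.
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("a.txt", "中华人民共和国");
        samples.put("b.txt", "人民共和国");
        samples.put("c.txt", "人民");
        samples.put("d.txt", "共和国");

        for (Map.Entry<String, String> e : samples.entrySet()) {
            // UTF-8 is an assumption; use the charset your indexer reads with.
            Files.write(testDir.resolve(e.getKey()),
                    e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return testDir;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the path used in this article; pass another root as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "E:/testlucene");
        System.out.println("Created sample files in " + prepare(root));
    }
}
```

Passing a different root directory as the first argument keeps the sketch usable on machines without an E: drive.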

With the preparation done, we can start building the index.
Step 1: Create the index writer, as follows:
writer = new IndexWriter(FSDirectory.open(new File(Constants.INDEX_STORE_PATH)), new StandardAnalyzer(
Version.LUCENE_30), true, IndexWriter.MaxFieldLength.LIMITED);

Step 2: Add a document to the index:
writer.addDocument(doc);

The complete code is as follows:

package testlucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LuceneIndex {
    // The index writer
    private final IndexWriter writer;

    // Create the index writer in the constructor
    public LuceneIndex() throws IOException {
        // Changed in Lucene 3.0. The "true" argument creates a new index, replacing any existing one.
        writer = new IndexWriter(FSDirectory.open(new File(Constants.INDEX_STORE_PATH)),
                new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.LIMITED);
    }

    public Document getDocument(File f) throws Exception {
        // Create the document object
        Document doc = new Document();
        // Open the file. The reader is consumed later by writer.addDocument(), so it must stay open here.
        // InputStreamReader uses the platform default charset; pass one explicitly if the files differ.
        FileInputStream input = new FileInputStream(f);
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(input));
        // Add the indexed content
        doc.add(new Field("content", bufferedReader)); // changed in Lucene 3.0
        doc.add(new Field("path", f.getAbsolutePath(), Field.Store.YES, Field.Index.ANALYZED)); // changed in Lucene 3.0
        return doc;
    }

    public void writeToIndex() throws Exception {
        File folder = new File(Constants.INDEX_FILE_PATH);
        File[] files = folder.listFiles(); // null when the path is not a directory
        if (files == null) {
            System.out.println("Not a directory: " + folder);
            return;
        }
        for (File file : files) {
            if (!file.isFile()) {
                continue; // skip subdirectories
            }
            Document doc = getDocument(file);
            System.out.println("Indexing: " + file);
            // Add the document to the index
            writer.addDocument(doc);
        }
    }

    public void close() throws Exception {
        writer.close();
    }

    public static void main(String[] args) throws Exception {
        LuceneIndex indexer = new LuceneIndex();
        try {
            // Build the index
            Date start = new Date();
            indexer.writeToIndex();
            Date end = new Date();
            System.out.println("Indexing took " + (end.getTime() - start.getTime()) + " ms");
        } finally {
            // Close the index writer even if indexing fails
            indexer.close();
        }
    }
}

package testlucene;

public class Constants {
    // Location of the files to be indexed
    public static final String INDEX_FILE_PATH = "E:/testlucene/test";
    // Location where the index is stored
    public static final String INDEX_STORE_PATH = "E:/testlucene/index";
}

Finally, run the program. The output is:
Indexing: E:/testlucene/test/a.txt
Indexing: E:/testlucene/test/b.txt
Indexing: E:/testlucene/test/c.txt
Indexing: E:/testlucene/test/d.txt
Indexing took 47 ms
The resulting index files appear under E:/testlucene/index:
_7.cfs segments.gen segments_9

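Part 2: Searching the index
Searching the index involves three steps and two main objects, IndexSearcher and Query.
Search steps:
Step 1: Create the searcher
searcher = new IndexSearcher(IndexReader.open(FSDirectory.open(new File(Constants.INDEX_STORE_PATH))));

Step 2: Wrap the keyword to search for in a Query object
query = queryParser.parse(keyword);

Step 3: Run the Query through the searcher and get the results as a TopDocs object (Lucene 3.0 no longer has the Hits class)
TopDocs hits = searcher.search(query, 10);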
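Finally, print the results held in the TopDocs object:
for (int i = 0; i < hits.scoreDocs.length; i++) {
    try {
        ScoreDoc scoreDoc = hits.scoreDocs[i]; // changed in Lucene 3.0
        Document doc = searcher.doc(scoreDoc.doc); // changed in Lucene 3.0
        System.out.print("Result " + (i + 1) + ", file path: ");
        System.out.println(doc.get("path"));
    } catch (Exception ex) {
        ex.printStackTrace(); // report the failure instead of silently ignoring it
    }
}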
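
The second argument of searcher.search(query, 10) caps the result at the 10 best-scoring hits, which is why the loop above iterates over hits.scoreDocs rather than over every matching document. The sketch below illustrates that "keep the best n by score" idea with a bounded min-heap in plain Java. It is only an illustration under stated assumptions, not Lucene's implementation: the names TopNSketch, Hit and topN are hypothetical, and the scores in main are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; this is not how Lucene itself is written.
class TopNSketch {

    /** Stand-in for a scored hit: a document id plus its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Returns at most n hits with the highest scores, best first. */
    static List<Hit> topN(List<Hit> all, int n) {
        List<Hit> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap by score: the weakest retained hit sits at the head.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(n, (x, y) -> Float.compare(x.score, y.score));
        for (Hit h : all) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (h.score > heap.peek().score) {
                heap.poll(); // drop the weakest hit to make room
                heap.add(h);
            }
        }
        result.addAll(heap);
        result.sort((x, y) -> Float.compare(y.score, x.score)); // best first
        return result;
    }

    public static void main(String[] args) {
        List<Hit> all = new ArrayList<>();
        all.add(new Hit(0, 0.5f)); // made-up scores for illustration
        all.add(new Hit(1, 0.9f));
        all.add(new Hit(2, 0.1f));
        all.add(new Hit(3, 0.7f));
        for (Hit h : topN(all, 2)) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

The heap never holds more than n entries, so memory stays bounded no matter how many documents match.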
The complete program is as follows:

package testlucene;

import java.io.File;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LuceneSearch {
    // The reader is kept so that it can be closed together with the searcher
    private final IndexReader reader;
    private final IndexSearcher searcher;

    public LuceneSearch() throws IOException {
        // Create the searcher
        reader = IndexReader.open(FSDirectory.open(new File(Constants.INDEX_STORE_PATH)));
        searcher = new IndexSearcher(reader);
    }

    public final TopDocs search(String keyword) throws Exception {
        System.out.println("Searching for keyword: " + keyword);
        QueryParser queryParser = new QueryParser(Version.LUCENE_30, "content",
                new StandardAnalyzer(Version.LUCENE_30));
        // Wrap the keyword in a Query object
        Query query = queryParser.parse(keyword);
        Date start = new Date();
        // Run the query and get the results as a TopDocs object
        TopDocs hits = searcher.search(query, 10); // changed in Lucene 3.0
        Date end = new Date();
        System.out.println("Search took " + (end.getTime() - start.getTime()) + " ms");
        return hits;
    }

    public void printResult(TopDocs hits) throws IOException {
        if (hits.totalHits == 0) {
            System.out.println("No results found");
        } else {
            for (int i = 0; i < hits.scoreDocs.length; i++) {
                ScoreDoc scoreDoc = hits.scoreDocs[i]; // changed in Lucene 3.0
                Document doc = searcher.doc(scoreDoc.doc); // changed in Lucene 3.0
                System.out.print("Result " + (i + 1) + ", file path: ");
                System.out.println(doc.get("path"));
            }
        }
        System.out.println("--------------------------------");
    }

    public void close() throws IOException {
        searcher.close();
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        LuceneSearch test = new LuceneSearch();
        try {
            test.printResult(test.search("中华"));
            test.printResult(test.search("人民"));
            test.printResult(test.search("共和国"));
        } finally {
            test.close();
        }
    }
}

After running the Part 1 program to build the index, run the search program LuceneSearch. The console output is as follows
(comparing it with the four files under E:/testlucene/test shows that the results are correct):
Searching for keyword: 中华
Search took 15 ms
Result 1, file path: E:/testlucene/test/a.txt
--------------------------------
Searching for keyword: 人民
Search took 0 ms
Result 1, file path: E:/testlucene/test/c.txt
Result 2, file path: E:/testlucene/test/b.txt
Result 3, file path: E:/testlucene/test/a.txt
--------------------------------
Searching for keyword: 共和国
Search took 0 ms
Result 1, file path: E:/testlucene/test/d.txt
Result 2, file path: E:/testlucene/test/b.txt
Result 3, file path: E:/testlucene/test/a.txt
--------------------------------
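
Which files match each keyword is consistent with two behaviours that, as far as I know, apply to this setup: StandardAnalyzer in Lucene 3.0 emits each Chinese character as its own token, and the query parser turns a multi-character keyword into a phrase query over those tokens. The sketch below mimics that matching rule in plain Java, without Lucene. The names PhraseMatchSketch, tokenize, containsPhrase and search are hypothetical, and the sketch does no scoring, so it lists matches in file order rather than in the ranked order shown above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of single-character tokens plus phrase matching.
class PhraseMatchSketch {

    /** One token per character (code point); a simplification of CJK tokenization. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** True if the phrase tokens occur consecutively, in order, in the document tokens. */
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(doc, phrase) >= 0;
    }

    /** Returns the names of all documents whose content contains the keyword as a phrase. */
    static List<String> search(Map<String, String> docs, String keyword) {
        List<String> phrase = tokenize(keyword);
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (containsPhrase(tokenize(e.getValue()), phrase)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    /** The four sample files from Part 1. */
    static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "中华人民共和国");
        docs.put("b.txt", "人民共和国");
        docs.put("c.txt", "人民");
        docs.put("d.txt", "共和国");
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> docs = sampleDocs();
        for (String keyword : new String[] {"中华", "人民", "共和国"}) {
            System.out.println(keyword + " -> " + search(docs, keyword));
        }
    }
}
```

The sketch finds the same set of files per keyword as the Lucene run: a.txt for 中华; a.txt, b.txt and c.txt for 人民; a.txt, b.txt and d.txt for 共和国.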

Summary
As the two parts above show, building an index with Lucene takes four main steps:
1. Extract the text
2. Build the Document
3. Analyze
4. Build the index
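
The four steps can be sketched end to end with a toy inverted index in plain Java. This is a deliberately simplified illustration, not Lucene's data structures: the class TinyIndexSketch and its methods are hypothetical, a "document" is just a map of field names to values, the analyzer emits one token per non-whitespace character, and reading files as UTF-8 is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy index that mirrors the four steps listed above.
class TinyIndexSketch {
    // term -> paths of the documents that contain it
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Step 1: extract the text (UTF-8 is an assumption).
    static String extractText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Step 2: build the document, here simply field name -> value.
    static Map<String, String> buildDocument(String path, String content) {
        Map<String, String> doc = new HashMap<>();
        doc.put("path", path);
        doc.put("content", content);
        return doc;
    }

    // Step 3: analyze, one token per non-whitespace character.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Step 4: build the index by recording which document each term occurs in.
    void addDocument(Map<String, String> doc) {
        for (String term : analyze(doc.get("content"))) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(doc.get("path"));
        }
    }

    /** Returns the paths of all documents containing the given single-character term. */
    Set<String> lookup(String term) {
        Set<String> hits = postings.get(term);
        return hits == null ? Collections.<String>emptySet() : hits;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tinyindex");
        Path c = Files.write(dir.resolve("c.txt"), "人民".getBytes(StandardCharsets.UTF_8));
        Path d = Files.write(dir.resolve("d.txt"), "共和国".getBytes(StandardCharsets.UTF_8));

        TinyIndexSketch index = new TinyIndexSketch();
        for (Path f : new Path[] {c, d}) {
            index.addDocument(buildDocument(f.getFileName().toString(), extractText(f)));
        }
        System.out.println("民 -> " + index.lookup("民"));
        System.out.println("国 -> " + index.lookup("国"));
    }
}
```

In the Lucene programs above, steps 3 and 4 both happen inside writer.addDocument(doc), using the analyzer passed to the IndexWriter constructor.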

LuceneSearcher.rar (15.2 KB)

2011-10-12 16:41

Comment 1, chasewade, 2012-02-17

Very good, and well suited to beginners.
