Grouping and Paginating Search Results with Lucene in Java

2019-11-26 14:31:18


Source: reposted

Contributed by: a site user

Grouping search results with GroupingSearch
Package org.apache.lucene.search.grouping Description

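This module groups Lucene search results by the value of a specified single-valued field. For example, if you group by the "author" field, all documents with the same "author" value end up in the same group.

Grouping requires a few inputs:

1. groupField: the field to group by. For example, if you group by the "author" field, every book in a given group has the same author. Documents that lack this field are collected into one separate group.

2. groupSort: how the groups themselves are sorted.

3. topNGroups: how many top groups to keep. For example, 10 means only the top 10 groups are kept.

4. groupOffset: which slice of the top groups to retrieve. For example, 3 means 7 groups are returned (assuming topNGroups is 10). This is useful for paging, for instance when each page shows only 5 groups.

5. withinGroupSort: how the documents inside each group are sorted. Note the difference from groupSort: this orders the documents within a group, while groupSort orders the groups.

6. withinGroupOffset: which slice of the top documents inside each group to retrieve.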
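To make the six parameters concrete, here is a minimal plain-Java sketch of what they mean. It is not Lucene's implementation and uses no Lucene classes: `GroupingSketch`, `Hit`, and `group` are hypothetical names invented for this illustration, both sorts are assumed to be by descending score, and `maxDocsPerGroup` (how many top documents to keep per group) is an extra parameter assumed here so that the within-group slice has an upper bound.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of the grouping parameters; not Lucene code. */
class GroupingSketch {

    /** Hypothetical hit: document id, value of the group field, relevance score. */
    static final class Hit {
        final int id;
        final String author;
        final double score;

        Hit(int id, String author, double score) {
            this.id = id;
            this.author = author;
            this.score = score;
        }
    }

    static List<List<Hit>> group(List<Hit> hits, int groupOffset, int topNGroups,
                                 int withinGroupOffset, int maxDocsPerGroup) {
        // groupField: bucket the hits by their "author" value.
        Map<String, List<Hit>> buckets = new LinkedHashMap<>();
        for (Hit h : hits) {
            buckets.computeIfAbsent(h.author, k -> new ArrayList<>()).add(h);
        }
        Comparator<Hit> byScoreDesc =
                Comparator.comparingDouble((Hit h) -> h.score).reversed();

        // withinGroupSort: order the documents inside each group.
        List<List<Hit>> groups = new ArrayList<>(buckets.values());
        for (List<Hit> g : groups) {
            g.sort(byScoreDesc);
        }
        // groupSort: order the groups, each represented by its best document.
        groups.sort((a, b) -> byScoreDesc.compare(a.get(0), b.get(0)));

        // topNGroups and groupOffset: keep the top N groups, then skip the offset.
        int groupEnd = Math.min(topNGroups, groups.size());
        int groupStart = Math.min(groupOffset, groupEnd);
        List<List<Hit>> result = new ArrayList<>();
        for (List<Hit> g : groups.subList(groupStart, groupEnd)) {
            // withinGroupOffset: the same kind of slice, applied inside the group.
            int docEnd = Math.min(maxDocsPerGroup, g.size());
            int docStart = Math.min(withinGroupOffset, docEnd);
            result.add(new ArrayList<>(g.subList(docStart, docEnd)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(1, "a", 0.5), new Hit(2, "b", 0.9), new Hit(3, "a", 0.7));
        for (List<Hit> g : group(hits, 0, 10, 0, 10)) {
            System.out.println(g.get(0).author + ": " + g.size() + " docs");
        }
    }
}
```

With topNGroups = 10 and groupOffset = 3, the group slice covers positions 3 through 9, which is where the "7 groups" in the description of groupOffset comes from.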

Grouping search results with GroupingSearch is fairly simple.

The GroupingSearch API documentation describes the class as follows:

Convenience class to perform grouping in a non distributed environment.

In other words, it handles grouping when the search is not distributed.

WARNING: This API is experimental and might change in incompatible ways in the next release.

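The examples here use version 4.3.1.

Some important methods:

GroupingSearch: setCaching(int maxDocsToCache, boolean cacheScores) enables caching, bounded by a maximum number of documents
GroupingSearch: setCachingInMB(double maxCacheRAMMB, boolean cacheScores) caches the results of the first search pass for reuse in the second pass, bounded by RAM in MB
GroupingSearch: setGroupDocsLimit(int groupDocsLimit) sets how many documents are returned per group; if not set, one document per group is returned
GroupingSearch: setGroupSort(Sort groupSort) sets how the groups are sorted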

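Sample code:

1. First, the indexing code

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class IndexHelper {
  private Document document;
  private Directory directory;
  private IndexWriter indexWriter;

  // Lazily creates a single in-memory directory shared by writer and searcher.
  public Directory getDirectory() {
    directory = (directory == null) ? new RAMDirectory() : directory;
    return directory;
  }

  private IndexWriterConfig getConfig() {
    return new IndexWriterConfig(Version.LUCENE_43, new IKAnalyzer(true));
  }

  // Returns null if the writer cannot be opened.
  private IndexWriter getIndexWriter() {
    try {
      return new IndexWriter(getDirectory(), getConfig());
    } catch (IOException e) {
      e.printStackTrace();
      return null;
    }
  }

  public IndexSearcher getIndexSearcher() throws IOException {
    return new IndexSearcher(DirectoryReader.open(getDirectory()));
  }

  /**
   * Create index for group test
   * @param id
   * @param author
   * @param content
   */
  public void createIndexForGroup(int id, String author, String content) {
    // A writer is opened and closed for every document: fine for a demo, slow for bulk indexing.
    indexWriter = getIndexWriter();
    document = new Document();
    document.add(new IntField("id", id, Field.Store.YES));
    // StringField is not tokenized, so the whole author name is one term to group on.
    document.add(new StringField("author", author, Field.Store.YES));
    document.add(new TextField("content", content, Field.Store.YES));
    try {
      indexWriter.addDocument(document);
      indexWriter.commit();
      indexWriter.close();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}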

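2. Grouping (the GroupTest class opened here is closed after the main method in step 3):

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class GroupTest {

  public void group(IndexSearcher indexSearcher, String groupField, String content) throws IOException, ParseException {
    GroupingSearch groupingSearch = new GroupingSearch(groupField);
    // Sort the groups by relevance score.
    groupingSearch.setGroupSort(new Sort(SortField.FIELD_SCORE));
    groupingSearch.setFillSortFields(true);
    // Cache up to 4 MB of first-pass results, including scores.
    groupingSearch.setCachingInMB(4.0, true);
    groupingSearch.setAllGroups(true);
    //groupingSearch.setAllGroupHeads(true);
    // Return at most 10 documents per group.
    groupingSearch.setGroupDocsLimit(10);

    QueryParser parser = new QueryParser(Version.LUCENE_43, "content", new IKAnalyzer(true));
    Query query = parser.parse(content);

    // groupOffset = 0, groupLimit = 1000
    TopGroups<BytesRef> result = groupingSearch.search(indexSearcher, query, 0, 1000);

    System.out.println("Total hits: " + result.totalHitCount);
    System.out.println("Number of groups: " + result.groups.length);

    Document document;
    for (GroupDocs<BytesRef> groupDocs : result.groups) {
      System.out.println("Group: " + groupDocs.groupValue.utf8ToString());
      System.out.println("Hits in group: " + groupDocs.totalHits);

      //System.out.println("groupDocs.scoreDocs.length:" + groupDocs.scoreDocs.length);
      for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
        System.out.println(indexSearcher.doc(scoreDoc.doc));
      }
    }
  }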

3. A simple test (this main method belongs to GroupTest; the final brace closes the class):

  public static void main(String[] args) throws IOException, ParseException {
    IndexHelper indexHelper = new IndexHelper();
    indexHelper.createIndexForGroup(1, "紅薯", "開源中國");
    indexHelper.createIndexForGroup(2, "紅薯", "開源社區");
    indexHelper.createIndexForGroup(3, "紅薯", "代碼設計");
    indexHelper.createIndexForGroup(4, "紅薯", "設計");
    indexHelper.createIndexForGroup(5, "覺先", "Lucene開發");
    indexHelper.createIndexForGroup(6, "覺先", "Lucene實戰");
    indexHelper.createIndexForGroup(7, "覺先", "開源Lucene");
    indexHelper.createIndexForGroup(8, "覺先", "開源solr");

    indexHelper.createIndexForGroup(9, "散仙", "散仙開源Lucene");
    indexHelper.createIndexForGroup(10, "散仙", "散仙開源solr");
    indexHelper.createIndexForGroup(11, "散仙", "開源");

    // Group by "author" every document whose content matches "開源".
    GroupTest groupTest = new GroupTest();
    groupTest.group(indexHelper.getIndexSearcher(), "author", "開源");
  }
}

4. Test results:

[Image: screenshot of the test output, 20163684827254.png]

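Two ways to paginate
Lucene offers two ways to paginate:

1. Paginate the search results directly. This works when the amount of data is fairly small. The core of the paging code looks like this:

ScoreDoc[] sd = XXX; // the hits returned by the search (elided in the original)
// index of the first record on the page
int begin = pageSize * (currentPage - 1);
// index just past the last record on the page
int end = Math.min(begin + pageSize, sd.length);
for (int i = begin; i < end; i++) {
  // process search result sd[i] here
}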
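The index arithmetic for this first approach can be isolated into a small helper. The following is a minimal sketch in plain Java; `PageRange` and `of` are hypothetical names introduced for illustration, and pages are assumed to be numbered from 1, as in the snippet above.

```java
/** Computes the slice of a result array that belongs to one page. */
class PageRange {

    /**
     * Returns {begin, end} for a 1-based page number, with end exclusive.
     * Both values are clamped to totalHits, so a page past the end is empty.
     */
    static int[] of(int currentPage, int pageSize, int totalHits) {
        if (currentPage < 1 || pageSize < 1 || totalHits < 0) {
            throw new IllegalArgumentException("page and size must be >= 1, hits >= 0");
        }
        // Use long arithmetic so a large page number cannot overflow int.
        long begin = (long) pageSize * (currentPage - 1);
        int clampedBegin = (int) Math.min(begin, totalHits);
        int clampedEnd = (int) Math.min(begin + pageSize, totalHits);
        return new int[] {clampedBegin, clampedEnd};
    }

    public static void main(String[] args) {
        int[] range = PageRange.of(3, 10, 25);
        System.out.println(range[0] + " to " + range[1]); // prints "20 to 25"
    }
}
```

Because the search still has to collect every hit up to the end of the requested page, this approach gets more expensive the deeper the page, which is why it suits small result sets.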

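2. Use searchAfter(...)

Lucene provides five overloads; pick whichever fits your needs.

[Image: screenshot of the searchAfter overloads, 20163684904821.png]

ScoreDoc after: the last ScoreDoc of the previous page, that is, the element at index (length - 1) of the previous result's scoreDocs array; null for the first page

Query query: the query to run

int n: the number of results returned per query, that is, the page size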

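A simple usage example:

// A Map can hold whatever the caller needs from the search
Map<String, Object> resultMap = new HashMap<String, Object>();
ScoreDoc after = null; // null for the first page; later pages pass the saved value
Query query = XX; // the query (elided in the original)
TopDocs td = search.searchAfter(after, query, size);
// total number of hits
resultMap.put("num", td.totalHits);

ScoreDoc[] sd = td.scoreDocs;
for (ScoreDoc scoreDoc : sd) {
  // the usual per-hit result processing
}
// the last ScoreDoc on this page (index length - 1) marks where the next page starts
if (sd.length > 0) {
  after = sd[sd.length - 1];
}
// save "after" for the next search, i.e. the start of the next page
resultMap.put("after", after);
return resultMap;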
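To show how the `after` cursor drives paging, here is a minimal plain-Java simulation. It deliberately avoids Lucene classes so it can run on its own: `SearchAfterSketch`, `Hit`, and this `searchAfter` method are hypothetical stand-ins, and the ranking is assumed to be descending score with ties broken by ascending document id. It sketches the cursor idea only and says nothing about how Lucene implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Simulates cursor-style paging: each page starts right after the previous page's last hit. */
class SearchAfterSketch {

    /** Hypothetical stand-in for a scored hit. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Assumed ranking: higher score first, then lower doc id first. */
    static final Comparator<Hit> RANK = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /** Returns up to n hits that rank strictly after the cursor; a null cursor means page one. */
    static List<Hit> searchAfter(Hit after, List<Hit> allHits, int n) {
        List<Hit> sorted = new ArrayList<>(allHits);
        sorted.sort(RANK);
        List<Hit> page = new ArrayList<>();
        for (Hit h : sorted) {
            if (page.size() >= n) {
                break;
            }
            if (after == null || RANK.compare(h, after) > 0) {
                page.add(h);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(0, 1.0f), new Hit(1, 3.0f), new Hit(2, 2.0f), new Hit(3, 2.0f));
        Hit after = null;
        List<Hit> page = searchAfter(after, hits, 2);
        while (!page.isEmpty()) {
            for (Hit h : page) {
                System.out.println("doc " + h.doc + " score " + h.score);
            }
            // The last hit of this page becomes the cursor for the next one.
            after = page.get(page.size() - 1);
            page = searchAfter(after, hits, 2);
        }
    }
}
```

The loop in `main` stops when a page comes back empty, which is also why the cursor should only be updated when the current page has at least one hit.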
