Hibernate provides full-text indexing, which works very well. This post gives a brief introduction to how to use it.
1. Add the dependencies to pom.xml
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>${hibernate-search.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>${lucene.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>${lucene.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-phonetic</artifactId>
    <version>${lucene.version}</version>
</dependency>
Configure the directory where Hibernate stores the search index (the hibernate.search.default.indexBase property)
<bean id="sessionFactory"
      class="org.springframework.orm.hibernate4.LocalSessionFactoryBean"
      destroy-method="destroy">
    <property name="dataSource" ref="poolingDataSource" />
    <property name="configLocation">
        <value>classpath:hibernate.cfg.xml</value>
    </property>
    <property name="hibernateProperties">
        <props>
            <prop key="hibernate.dialect">${hibernate.dialect}</prop>
            <!-- Booleans can be easily used in expressions by declaring HQL query substitutions in Hibernate configuration -->
            <prop key="hibernate.query.substitutions">true 'Y', false 'N'</prop>
            <!-- http://ehcache.org/documentation/integrations/hibernate -->
            <!-- http://www.tutorialspoint.com/hibernate/hibernate_caching.htm -->
            <prop key="hibernate.cache.use_second_level_cache">true</prop>
            <!-- org.hibernate.cache.ehcache.EhCacheRegionFactory -->
            <prop key="hibernate.cache.region.factory_class">org.hibernate.cache.ehcache.EhCacheRegionFactory</prop>
            <!-- Hibernate only caches single persistent objects obtained via load(). To also cache
                 result sets obtained via findAll(), list(), iterator(), createCriteria(),
                 createQuery() and similar methods, set hibernate.cache.use_query_cache to true -->
            <prop key="hibernate.cache.use_query_cache">true</prop>
            <prop key="net.sf.ehcache.configurationResourceName">ehcache-hibernate.xml</prop>
            <!-- Hibernate Search index directory -->
            <prop key="hibernate.search.default.indexBase">indexes/</prop>
        </props>
    </property>
</bean>
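Add the @Indexed annotation to each class that should be searchable, then add the @Field annotation to each field in the class that can be searched. Enum fields usually do not need an Analyzer for lexical analysis, while other fields do. If you do not need Projection (returning only some of the fields), there is no need to store the actual data in the index. @AnalyzerDef can be used to define different analyzers together with their corresponding special-word filters.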
@Indexed
@AnalyzerDef(
    name = "enTopicAnalyzer",
    charFilters = {
        @CharFilterDef(factory = HTMLStripCharFilterFactory.class)
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class),
        @TokenFilterDef(factory = PhoneticFilterFactory.class, params = {
            @Parameter(name = "encoder", value = "DoubleMetaphone")
        }),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
            @Parameter(name = "language", value = "English")
        })
    }
)
public class Topic {
    ......
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
    @Analyzer(definition = "enTopicAnalyzer")
    private String title;
    ......
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
    @Analyzer(definition = "enTopicAnalyzer")
    private String content;
    ......
    @Enumerated(EnumType.STRING)
    @Field(index = Index.YES, analyze = Analyze.NO, store = Store.NO,
           bridge = @FieldBridge(impl = EnumBridge.class))
    private TopicStatus status;
    ...
}
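To make the order of the steps in an analyzer definition more concrete, here is a rough plain-Java sketch of the idea: a char filter cleans the raw text, a tokenizer splits it, and token filters then drop or rewrite tokens. It is not Lucene or Hibernate Search code. The class AnalyzerChainSketch, its methods, and the tiny stop-word list are all hypothetical, and the phonetic and stemming filters from the definition above are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an analyzer chain; it only mimics the order of the steps.
class AnalyzerChainSketch {

    // Tiny illustrative stop-word list (assumption; real stop-word lists are longer).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of", "and"));

    // Char filter step: replace HTML tags with a space before tokenizing.
    static String stripHtml(String input) {
        return input.replaceAll("<[^>]*>", " ");
    }

    // Tokenizer step: split on anything that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Token filter step: drop stop words (case-sensitive comparison).
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Full chain: char filter -> tokenizer -> token filter.
    static List<String> analyze(String input) {
        return removeStopWords(tokenize(stripHtml(input)));
    }

    public static void main(String[] args) {
        // prints [design, Hibernate, Search]
        System.out.println(analyze("<p>the <b>design</b> of Hibernate Search</p>"));
    }
}
```

The same chain runs both when a document is indexed and when a keyword query is analyzed, which is why a field indexed with one analyzer should normally be queried with the same one.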
Build the index for existing data programmatically
ApplicationContext context = new ClassPathXmlApplicationContext("spring-resources.xml");
SessionFactory sessionFactory = (SessionFactory) context.getBean("sessionFactory");
Session sess = sessionFactory.openSession();
FullTextSession fullTextSession = Search.getFullTextSession(sess);
try {
    fullTextSession.createIndexer().startAndWait();
} catch (InterruptedException e) {
    LOG.error(e.getMessage(), e);
} finally {
    fullTextSession.close();
}
((AbstractApplicationContext) context).close();
Create a FullTextSession for querying and fetch the results that match the query conditions
FullTextSession fullTextSession = Search.getFullTextSession(getSession());
QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
        .buildQueryBuilder().forEntity(Show.class).get();
org.apache.lucene.search.Query luceneQuery = null;
luceneQuery = queryBuilder.keyword() // .wildcard()
        .onFields("title", "content")
        .matching(query.getKeyword())
        // .matching("*" + query.getKeyword() + "*")
        .createQuery();
FullTextQuery hibernateQuery = fullTextSession.createFullTextQuery(luceneQuery, Show.class);
return hibernateQuery.list();
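Notes:
1. During one round of testing I modified a value object and added a new index, but forgot to run rebuildIndex. The unit tests passed, yet the production environment failed.
2. Search is not very powerful yet. For example, a search for 測 may not return results that contain 測試.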
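One possible explanation for note 2 is that a keyword query only matches whole indexed terms. The plain-Java sketch below illustrates that idea with a toy inverted index. It assumes the analyzer emitted 測試 as a single term, which may not hold for every analyzer. TermLookupSketch and its methods are hypothetical names, not part of Hibernate Search or Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index used only to contrast exact-term and prefix lookups.
class TermLookupSketch {

    // term -> ids of the documents that contain the term
    private final TreeMap<String, List<Integer>> index = new TreeMap<>();

    void add(String term, int docId) {
        index.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
    }

    // Keyword-style lookup: the query must be equal to an indexed term.
    List<Integer> exact(String term) {
        return index.getOrDefault(term, new ArrayList<>());
    }

    // Prefix-style lookup: every indexed term that starts with the prefix matches.
    List<Integer> prefix(String prefix) {
        List<Integer> hits = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix itself.
        for (Map.Entry<String, List<Integer>> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        TermLookupSketch idx = new TermLookupSketch();
        idx.add("測試", 1); // assumption: indexed as one term
        idx.add("搜索", 2);
        System.out.println(idx.exact("測"));  // prints []
        System.out.println(idx.prefix("測")); // prints [1]
    }
}
```

The commented-out wildcard() variant in the query code above is the kind of query that behaves like the prefix lookup here, at the cost of scanning more terms.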
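Chinese lexical analysis
Hibernate Search uses Lucene underneath, so any Chinese tokenizer that works with Lucene can also be used by Hibernate Search for Chinese lexical analysis. Commonly used analyzers include paoding, IKAnalyzer, mmseg4j and others. For details, see the referenced posts on tokenizer analysis (分詞分析, 最近分析). The default analyzer in Hibernate Search is org.apache.lucene.analysis.standard.StandardAnalyzer, which tokenizes Chinese text character by character and clearly does not meet our needs.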
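The difference between character-by-character tokenization and word-based tokenization can be sketched in plain Java. The word-based variant below uses forward maximum matching against a tiny made-up dictionary. This is a simple stand-in technique chosen for illustration, not necessarily the algorithm used by any of the analyzers named above. ChineseTokenizeSketch and its methods are hypothetical, and the code assumes all characters are in the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Contrasts per-character splitting with a naive dictionary-based segmentation.
class ChineseTokenizeSketch {

    // Per-character split: every character becomes its own token.
    static List<String> byCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // Forward maximum matching: at each position take the longest dictionary word,
    // falling back to a single character when nothing in the dictionary matches.
    static List<String> byDictionary(String text, Set<String> dict, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical three-word dictionary, for illustration only.
        Set<String> dict = new HashSet<>(Arrays.asList("全文", "索引", "功能"));
        System.out.println(byCharacter("全文索引功能"));           // prints [全, 文, 索, 引, 功, 能]
        System.out.println(byDictionary("全文索引功能", dict, 2)); // prints [全文, 索引, 功能]
    }
}
```

With per-character tokens, a query for 索引 is broken into 索 and 引 and can match documents where the two characters merely both occur, which is the kind of imprecision a word-based analyzer is meant to reduce.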
The following describes how to configure Chinese tokenization in Hibernate, using the Chinese analyzer that ships with Lucene.