write down,forget

Diving Into ElasticSearch (3): Writing a Custom Analyzer Plugin


Today I'll show how to write your own analyzer plugin. Before we start, here is an overview of the ES project structure:

[Screenshot: ES project directory structure]

Here is what each directory is for:

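.idea: IntelliJ IDEA project configuration files

bin: executable scripts

config: configuration files

gradle: a trimmed-down Gradle distribution

lib: mainly holds sigar, which is used for resource monitoring

modules: the main ES modules live here

plugins: all plugins go here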

Inside plugins there are subdirectories organized by plugin type. Since we are building a custom analyzer plugin, ours goes in the analysis directory.

As our example, let's port the open-source IKAnalyzer to ES.

Step 1: create an ik directory under plugins.

The full path is: \plugins\analysis\ik

Inside it, create the src and build directories and a build.gradle file (copy the contents from the icu plugin's build.gradle and adjust them accordingly, as shown below).

[Screenshot: build.gradle for the ik plugin]

Step 2: create the directory src/main/java/org.

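Step 3: under java, create the configuration file es-plugin.properties. It contains a single line naming the plugin's entry class (the class that implements the ES plugin interface), as shown below.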

[Screenshot: es-plugin.properties naming the plugin entry class]
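es-plugin.properties is an ordinary Java properties file. As a rough illustration of what Step 3 sets up, the sketch below reads such a file with java.util.Properties and extracts the entry-class name, which is presumably what ES does at startup before instantiating that class. The key name plugin and the class name org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin are assumptions for illustration only; the actual line is the one in the screenshot. No ES code is involved here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PluginPropertiesDemo {
    // Assumed contents of es-plugin.properties: a single key naming the entry class.
    // Both the key "plugin" and the class name are illustrative assumptions.
    static final String ES_PLUGIN_PROPERTIES =
            "plugin=org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin\n";

    /** Returns the class name stored under the "plugin" key, or null if the key is absent. */
    static String entryClass(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does no real I/O, but Properties.load declares IOException.
            throw new UncheckedIOException(e);
        }
        return props.getProperty("plugin");
    }

    public static void main(String[] args) {
        System.out.println(entryClass(ES_PLUGIN_PROPERTIES));
    }
}
```

Keeping the file to one key/value pair means the loader only needs the class name; everything else about the plugin lives in that class.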


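Step 4: in the .idea directory, open modules.xml and add our custom module to the project, making sure the path is correct. Afterwards, the ik directory under plugins is shown in bold, which means it is now a module of the project.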

[Screenshot: ik module entry in .idea/modules.xml]

Step 5: in the .idea/modules directory, create a module file for the plugin, for example plugin-analysis-ik.iml, with the contents shown below.

[Screenshot: contents of plugin-analysis-ik.iml]

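Step 6: right-click the project and choose Open Module Settings.

You should see the ik module's configuration. Select it, then mark each directory's type (Excluded, Sources, or Tests). Once everything is set, it should look like the screenshot below.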

[Screenshot: ik module settings in Open Module Settings]

Step 7: in .idea/modules/, add a line to elasticsearch-root.iml that includes our custom module, as shown below.

[Screenshot: ik entry in elasticsearch-root.iml]

Final step: in the settings.gradle file in the elasticsearch root directory, add a line so that ik gets built and packaged, as shown below.

[Screenshot: ik entry in settings.gradle]

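That is essentially all of the configuration work. What remains is implementing your own AnalyzerProvider and AbstractPlugin. I won't list all of the code here; see this commit for the details: https://github.com/medcl/elasticsearch/commit/21abad12a0096173e8836dd042ca403751ab7ad1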
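The post leaves the AnalyzerProvider and AbstractPlugin code to the linked commit. To make their roles concrete without depending on the ES jars, here is a self-contained sketch built on hypothetical stand-in types: SimpleAnalyzer, SimplePlugin, IkPlugin and resolve are invented for illustration and are not the ES API. The idea it shows is that the plugin registers a factory under the name ik, and a request naming analyzer=ik resolves that name to an analyzer. A whitespace splitter stands in for IKAnalyzer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class AnalyzerRegistryDemo {
    // Hypothetical stand-in for an analyzer: text in, tokens out.
    interface SimpleAnalyzer {
        List<String> tokenize(String text);
    }

    // Hypothetical stand-in for a plugin: it has a name and description and
    // registers analyzer factories (the "provider" role) under a lookup name.
    interface SimplePlugin {
        String name();
        String description();
        void registerAnalyzers(Map<String, Supplier<SimpleAnalyzer>> registry);
    }

    // Hypothetical "ik" plugin; splitting on whitespace stands in for IKAnalyzer.
    static class IkPlugin implements SimplePlugin {
        public String name() { return "analysis-ik"; }
        public String description() { return "IK analysis support (illustrative)"; }
        public void registerAnalyzers(Map<String, Supplier<SimpleAnalyzer>> registry) {
            registry.put("ik", () -> text -> Arrays.asList(text.trim().split("\\s+")));
        }
    }

    // Looks up an analyzer by name, the way a parameter like analyzer=ik selects one.
    static SimpleAnalyzer resolve(Map<String, Supplier<SimpleAnalyzer>> registry, String name) {
        Supplier<SimpleAnalyzer> provider = registry.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("unknown analyzer: " + name);
        }
        return provider.get();
    }

    public static void main(String[] args) {
        Map<String, Supplier<SimpleAnalyzer>> registry = new HashMap<>();
        new IkPlugin().registerAnalyzers(registry);
        System.out.println(resolve(registry, "ik").tokenize("hello ik analyzer"));
    }
}
```

Registering a factory rather than a ready-made analyzer lets the host decide when (and how often) to build analyzer instances, which is roughly the job a provider class does.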

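Let's try it out. In bootstrap mode all plugins are loaded by default, so the ik-analysis plugin can be used right away. The text parameter below is the UTF-8, URL-encoded form of the sentence 脑残片让你脑残,神奇的小药丸啊: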

curl -XGET 'http://localhost:9200/index/_analyze?text=%e8%84%91%e6%ae%8b%e7%89%87%e8%ae%a9%e4%bd%a0%e8%84%91%e6%ae%8b%ef%bc%8c%e7%a5%9e%e5%a5%87%e7%9a%84%e5%b0%8f%e8%8d%af%e4%b8%b8%e5%95%8a&analyzer=ik'
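The Chinese text has to be percent-encoded as UTF-8 before it can go into the query string. A minimal sketch of producing that request URL in Java with java.net.URLEncoder is below; the host, port, index name and analyzer name are simply the ones used in the curl command, and analyzeUrl is a hypothetical helper name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class AnalyzeUrlDemo {
    /** Builds the _analyze URL from the example, percent-encoding both parameters as UTF-8. */
    static String analyzeUrl(String text, String analyzer) {
        try {
            return "http://localhost:9200/index/_analyze?text="
                    + URLEncoder.encode(text, "UTF-8")
                    + "&analyzer=" + URLEncoder.encode(analyzer, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \uFF0C is the full-width comma used in the sample sentence.
        System.out.println(analyzeUrl("脑残片让你脑残\uFF0C神奇的小药丸啊", "ik"));
    }
}
```

URLEncoder emits uppercase hex digits (%E8%84%91 and so on); percent-encoding is case-insensitive, so this is equivalent to the lowercase form in the curl command.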

The server log shows that the dictionary was loaded correctly.

[Screenshot: server log showing the IK dictionary being loaded]

The analysis result came back as well: ik correctly loaded my custom phrase "脑残片", and the text was segmented correctly.

[Screenshot: _analyze output for the sample text]
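To see why adding 脑残片 to the dictionary changes the output, here is a deliberately simplified dictionary-based segmenter using forward maximum matching. This is not IKAnalyzer's actual algorithm, and the small word list is a hypothetical stand-in for IK's dictionaries; the sketch only shows the general effect of a custom dictionary entry. It assumes all characters are in the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ForwardMaxMatchDemo {
    /**
     * Forward maximum matching: at each position, take the longest dictionary word
     * (up to maxWordLen chars) starting there, or a single character if none matches.
     * Characters that are neither letters nor digits (punctuation) are skipped.
     */
    static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++; // drop punctuation such as the full-width comma
                continue;
            }
            int len = Math.min(maxWordLen, text.length() - i);
            // Shrink the window until it is a dictionary word or a single character.
            while (len > 1 && !dict.contains(text.substring(i, i + len))) {
                len--;
            }
            tokens.add(text.substring(i, i + len));
            i += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "脑残片让你脑残\uFF0C神奇的小药丸啊";
        // Hypothetical base word list, without the custom phrase.
        Set<String> base = new HashSet<>(Arrays.asList("脑残", "神奇", "药丸", "小药丸"));
        Set<String> custom = new HashSet<>(base);
        custom.add("脑残片"); // the user-defined phrase
        System.out.println(segment(text, base, 3));
        System.out.println(segment(text, custom, 3));
    }
}
```

With the base word list the first three characters come out as 脑残 and 片; once 脑残片 is in the dictionary, they come out as a single token.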

Source: Diving Into ElasticSearch (3): Writing a Custom Analyzer Plugin


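  1. I built a version from your example, but I get an error when indexing documents. Any idea what the problem might be? The default analyzer is ik.

    Here is the error message: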
    [2011-10-26 17:26:05,443][DEBUG][action.index ] [煞神鬼] [test1][2], node[467ZgQrkQ-2dZ2KGXTdyWQ], [P], s[STARTED]: Failed to execute [index {[test1][t][1], source[{"mess" : "hi"}]}]
    java.lang.NullPointerException
    at org.elasticsearch.index.analysis.FieldNameAnalyzer.reusableTokenStream(FieldNameAnalyzer.java:58)
    at org.elasticsearch.common.lucene.all.AllTokenStream.allTokenStream(AllTokenStream.java:38)
    at org.elasticsearch.common.lucene.all.AllField.tokenStreamValue(AllField.java:61)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:102)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:249)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:760)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2055)
    at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:550)
    at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:465)
    at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:303)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:188)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:418)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.access$100(TransportShardReplicationOperationAction.java:233)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:331)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

    medcl Reply:

    @darkyoung, sorry, I only just saw your comment. Have you solved the problem? What does your configuration file look like?