[ES] Study Notes 4: Structured Search - 白红宇



Published: 2019-06-13


1. Structured search only returns yes or no: a document either matches or it does not. There is no notion of similarity.

The term query performs exact-value matching.

curl -XGET 'localhost:9200/logstash-cowrie/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": { "constant_score": { "filter": {
    "term": { "src_ip": "192.168.188.88" }
  } } }
}'

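Here we only want an exact yes/no match without relevance scoring. The term query is therefore wrapped in constant_score, whose filter runs the term query in non-scoring (filter) mode and gives every hit the same score. Note: a filter placed directly under query, without a wrapper such as constant_score or bool, is rejected with an error.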
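The following minimal Python sketch builds the same request body as the curl call above. It makes the nesting explicit: query, then constant_score, then filter, then term. The helper exact_match_query is hypothetical. It only produces the JSON and does not contact Elasticsearch.

```python
import json


def exact_match_query(field: str, value: str) -> dict:
    """Build a non-scoring exact-match query body (hypothetical helper).

    The term clause sits inside constant_score's filter, so it runs in
    filter context and every matching document gets the same score.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "term": {field: value}
                }
            }
        }
    }


body = exact_match_query("src_ip", "192.168.188.88")
# This string is what you would pass to curl -d
print(json.dumps(body))
```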

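To check whether a field supports exact-value matching, run a sample value through the analyze API and look at the tokens it produces. In the example below, the IP address comes out as a single token identical to the input, so it can be matched exactly.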

curl -XGET 'localhost:9200/test/_analyze?pretty' -H 'Content-Type: application/json' -d '{
  "field": "src_ip", "text": "192.168.188.88"
}'
# Result
{ "tokens" : [ { "token" : "192.168.188.88", "start_offset" : 0, "end_offset" : 14,
                 "type" : "<NUM>", "position" : 0 } ] }
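The check described above can be sketched in Python against a parsed _analyze response. A value can be matched exactly with term when the analyzer leaves it as one token identical to the input. The helper is hypothetical. The second response is a hand-written illustration of what an analyzer that splits and lowercases text might return; it is not real output.

```python
def is_single_exact_token(analyze_response: dict, text: str) -> bool:
    """True if the analyzer produced exactly one token equal to `text`."""
    tokens = analyze_response.get("tokens", [])
    return len(tokens) == 1 and tokens[0]["token"] == text


# Shaped like the response above (fields not needed for the check are omitted)
response = {
    "tokens": [
        {"token": "192.168.188.88", "start_offset": 0, "end_offset": 14, "position": 0}
    ]
}
print(is_single_exact_token(response, "192.168.188.88"))  # True

# Hand-written illustration: free text split into two lowercased tokens
split_response = {"tokens": [{"token": "hello"}, {"token": "world"}]}
print(is_single_exact_token(split_response, "Hello World"))  # False
```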

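To make a field hold exact values, declare it that way in the mapping. The mapping of an existing field cannot be changed; you have to delete the index and create it again. Example (ES 2.x syntax):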

DELETE /my_store

PUT /my_store
{
  "mappings": { "products": { "properties": {
    "productID": { "type": "string", "index": "not_analyzed" }
  } } }
}
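From Elasticsearch 5.0 on, the string type is replaced by text and keyword, and exact-value fields use keyword. From 7.0, mappings are typeless by default, so the products level goes away. The following minimal Python sketch builds the equivalent index body for PUT /my_store, assuming ES 7 or later. keyword_mapping is a hypothetical helper name.

```python
import json


def keyword_mapping(*fields: str) -> dict:
    """Build a typeless (ES 7+) index body mapping each field as keyword.

    keyword fields are indexed as one unanalyzed term, which is what
    "type": "string", "index": "not_analyzed" provided on ES 2.x.
    """
    return {
        "mappings": {
            "properties": {name: {"type": "keyword"} for name in fields}
        }
    }


print(json.dumps(keyword_mapping("productID"), indent=2))
```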

Combining filters:

{
  "bool" : {
    "must" :     [],   # AND
    "should" :   [],   # OR
    "must_not" : []    # NOT
  }
}
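The toy evaluator below makes the AND/OR/NOT reading concrete. It runs purely locally and is not how Elasticsearch executes queries. Clauses are modelled as plain predicate functions. It assumes the ES 5.x/6.x filter-context rule that at least one should clause must match whenever any are present. The term helper is a hypothetical stand-in for a term clause on a single-valued field.

```python
from typing import Callable, Dict, Sequence

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]


def bool_matches(doc: Doc,
                 must: Sequence[Clause] = (),
                 should: Sequence[Clause] = (),
                 must_not: Sequence[Clause] = ()) -> bool:
    """Toy bool filter: AND over must, OR over should, NOT over must_not."""
    if not all(clause(doc) for clause in must):               # must: every clause
        return False
    if should and not any(clause(doc) for clause in should):  # should: at least one
        return False
    return not any(clause(doc) for clause in must_not)        # must_not: none


def term(field: str, value: object) -> Clause:
    """Hypothetical predicate standing in for a term clause."""
    return lambda doc: doc.get(field) == value


doc = {"src_ip": "1.2.3.4", "country": "CN"}
print(bool_matches(doc,
                   must=[term("country", "CN")],
                   should=[term("src_ip", "1.2.3.4"), term("src_ip", "5.6.7.8")],
                   must_not=[term("src_ip", "9.9.9.9")]))  # True
```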

Example 1: src_ip is 192.168.188.88 or 1.2.3.4, and the timestamp falls in [2016-10-24, 2017-10-25):

curl -XGET 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "src_ip": "192.168.188.88" } },
            { "term": { "src_ip": "1.2.3.4" } }
          ],
          "must": {
            "range": { "timestamp": { "gte": "2016-10-24T00:00:00", "lt": "2017-10-25T00:00:00" } }
          }
        }
      }
    }
  }
}'

Note how single and multiple conditions are written:

A single condition: a plain object, {}

Multiple conditions: an array, [{},{}]
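The following sketch builds the Example 1 body in Python; the function name example1_body is hypothetical. It also sets minimum_should_match to 1 on the bool. On ES 5.x/6.x, a bool in filter context already requires at least one should clause to match. From ES 7.0, however, a bool that has a must clause treats should clauses as optional by default. Without this setting, the query would return every document in the time range regardless of src_ip.

```python
import json


def example1_body(ips: list, gte: str, lt: str) -> dict:
    """src_ip is any of `ips` AND gte <= timestamp < lt (hypothetical helper)."""
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        # OR: one term clause per IP (an array, since there are several)
                        "should": [{"term": {"src_ip": ip}} for ip in ips],
                        # Explicit, so ES 7+ still requires one should clause to match
                        "minimum_should_match": 1,
                        # AND: a single condition, so a plain object
                        "must": {"range": {"timestamp": {"gte": gte, "lt": lt}}},
                    }
                }
            }
        }
    }


body = example1_body(["192.168.188.88", "1.2.3.4"],
                     "2016-10-24T00:00:00", "2017-10-25T00:00:00")
print(json.dumps(body, indent=2))
```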

Note: the reference examples use the filtered query. It was deprecated in ES 2.0 and removed in 5.0; use bool or constant_score instead.
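The usual migration can be sketched in Python. A filtered query with query and filter parts becomes a bool, with the query under must and the filter under bool's filter clause, which also runs without scoring. The converter filtered_to_bool is hypothetical and handles only that top-level shape. The title and status fields in the sample are made up.

```python
def filtered_to_bool(old: dict) -> dict:
    """Rewrite {"filtered": {"query": Q, "filter": F}} as {"bool": {"must": Q, "filter": F}}."""
    inner = old["filtered"]
    new_bool = {}
    if "query" in inner:
        new_bool["must"] = inner["query"]     # scored part
    if "filter" in inner:
        new_bool["filter"] = inner["filter"]  # non-scoring part
    return {"bool": new_bool}


old = {"filtered": {"query": {"match": {"title": "search"}},
                    "filter": {"term": {"status": "active"}}}}
print(filtered_to_bool(old))
# {'bool': {'must': {'match': {'title': 'search'}}, 'filter': {'term': {'status': 'active'}}}}
```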

Nesting bool expressions

src_ip = 192.168.188.88 OR (src_ip = 1.2.3.4 AND 2016-10-24 <= timestamp < 2017-10-25)

curl -XGET 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "src_ip": "192.168.188.88" } },
            { "bool": {
                "must": [
                  { "term": { "src_ip": "1.2.3.4" } },
                  { "range": { "timestamp": { "gte": "2016-10-24T00:00:00", "lt": "2017-10-25T00:00:00" } } }
                ]
            } }
          ]
        }
      }
    }
  }
}'
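As a local cross-check of the nested logic, the expression above can be written as a plain Python predicate and tried on a few made-up documents. This only mirrors the boolean structure. It assumes timestamps use the same ISO 8601 form as the query, so they compare correctly as strings.

```python
def matches(doc: dict) -> bool:
    """src_ip == 192.168.188.88 OR (src_ip == 1.2.3.4 AND 2016-10-24 <= timestamp < 2017-10-25)."""
    ip, ts = doc["src_ip"], doc["timestamp"]
    in_window = "2016-10-24T00:00:00" <= ts < "2017-10-25T00:00:00"
    return ip == "192.168.188.88" or (ip == "1.2.3.4" and in_window)


docs = [
    {"src_ip": "192.168.188.88", "timestamp": "2015-01-01T00:00:00"},  # outer should, any time
    {"src_ip": "1.2.3.4", "timestamp": "2017-01-01T00:00:00"},         # inner must, in window
    {"src_ip": "1.2.3.4", "timestamp": "2018-01-01T00:00:00"},         # inner must, outside window
    {"src_ip": "5.6.7.8", "timestamp": "2017-01-01T00:00:00"},         # neither branch
]
print([matches(d) for d in docs])  # [True, True, False, False]
```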

Finding multiple exact values: terms

{
  "terms": {
    "src_ip": ["1.2.3.4", "5.6.7.8"]
  }
}

Note: term and terms mean "contains", not "equals".

{ "term" : { "tags" : "search" } } matches both of the following documents:

{ "tags" : ["search"] }
{ "tags" : ["search", "open_source"] }

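To require the whole value to be exactly equal, you must add a constraint on another field, for example a field that stores the number of tags.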
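Below is a toy Python model of the matching rules just described; it is not Elasticsearch's implementation. term matches when a single value, or any element of an array, equals the search value. terms is an OR of such checks, which is equivalent to a bool should with one term clause per value. tag_count is a hypothetical extra field you would populate yourself at index time. Requiring tag_count == 1 alongside the term is one way to demand an exact single-tag match.

```python
def term_matches(doc: dict, field: str, value) -> bool:
    """term: true if the field, or any element of an array field, equals `value`."""
    stored = doc.get(field)
    items = stored if isinstance(stored, list) else [stored]
    return value in items


def terms_matches(doc: dict, field: str, values: list) -> bool:
    """terms: true if any listed value matches, i.e. an OR of term checks."""
    return any(term_matches(doc, field, v) for v in values)


def terms_as_should(field: str, values: list) -> dict:
    """Rewrite {"terms": {field: values}} as a bool/should with one term per value."""
    return {"bool": {"should": [{"term": {field: v}} for v in values]}}


docs = [
    {"tags": ["search"], "tag_count": 1},
    {"tags": ["search", "open_source"], "tag_count": 2},
]

# Containment: both documents contain "search", so both match
print([term_matches(d, "tags", "search") for d in docs])  # [True, True]

# Extra constraint on the hypothetical tag_count field: only the exact ["search"] doc
print([term_matches(d, "tags", "search") and term_matches(d, "tag_count", 1)
       for d in docs])  # [True, False]

print(terms_matches({"src_ip": "5.6.7.8"}, "src_ip", ["1.2.3.4", "5.6.7.8"]))  # True
print(terms_as_should("src_ip", ["1.2.3.4", "5.6.7.8"]))
```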

Ranges: range

"range" : {
  "price" : { "gte" : 20, "lte" : 40 }
}

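Date ranges support date math. The || separator anchors a calculation to a given date, so the first example below matches the month after 2014-01-01 00:00:00. now is the current time, so the second matches the last hour. The date strings must match the field's configured date format.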

"range" : {
  "timestamp" : {
    "gt" : "2014-01-01 00:00:00",
    "lt" : "2014-01-01 00:00:00||+1M"
  }
}

"range" : {
  "timestamp" : { "gt" : "now-1h" }
}
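To see what those two date-math expressions resolve to, here is a toy Python resolver for exactly those forms, anchor||+nM and now-nh. It is a hypothetical helper, not Elasticsearch's parser, which supports many more units, rounding with /, and per-field date formats. A fixed "now" is passed in so the output is reproducible. Month arithmetic in this sketch clamps the day to the end of a shorter month.

```python
import calendar
import re
from datetime import datetime, timedelta


def add_months(dt: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day to the last day of the target month."""
    total = dt.month - 1 + months
    year, month = dt.year + total // 12, total % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def resolve(expr: str, now: datetime) -> datetime:
    """Resolve 'yyyy-MM-dd HH:mm:ss||+<n>M' or 'now-<n>h' (toy subset only)."""
    m = re.fullmatch(r"(.+)\|\|\+(\d+)M", expr)
    if m:
        anchor = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        return add_months(anchor, int(m.group(2)))
    m = re.fullmatch(r"now-(\d+)h", expr)
    if m:
        return now - timedelta(hours=int(m.group(1)))
    raise ValueError(f"unsupported expression: {expr}")


now = datetime(2019, 6, 13, 12, 0, 0)
print(resolve("2014-01-01 00:00:00||+1M", now))  # 2014-02-01 00:00:00
print(resolve("now-1h", now))                    # 2019-06-13 11:00:00
```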

String ranges (not recommended; they can be slow)

range queries also work on string fields. String ranges are evaluated in lexicographic (dictionary) order.

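For example, these terms are in lexicographic order: 5, 50, 6, B, C, a, ab, abb, abc, b. The following range matches terms from a up to, but not including, b: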

"range" : {
  "title" : { "gte" : "a", "lt" : "b" }
}
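The ordering above is plain character-by-character comparison, where a prefix sorts before any longer term. For ASCII terms like these, Python's built-in string comparison gives the same order, so it can be reproduced locally. The list comprehension below is a sketch that mirrors the gte a, lt b range.

```python
terms = ["b", "abc", "C", "50", "a", "6", "abb", "B", "5", "ab"]

# Lexicographic order: compare character codes left to right; a prefix sorts first
print(sorted(terms))
# ['5', '50', '6', 'B', 'C', 'a', 'ab', 'abb', 'abc', 'b']

# Local equivalent of "range": {"title": {"gte": "a", "lt": "b"}}
in_range = [t for t in sorted(terms) if "a" <= t < "b"]
print(in_range)  # ['a', 'ab', 'abb', 'abc']
```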

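Numeric and date fields are indexed in a way that makes range calculations efficient. Strings are not. To apply a range filter to a string field, Elasticsearch effectively runs a term filter for every term in the range, which is much slower than a date or numeric range.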

String ranges work fine on low-cardinality fields (fields with only a few unique terms), but the more unique terms a field has, the slower string ranges become.
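Here is a toy model of why this gets slow. The sorted term dictionary that an inverted index keeps is scanned for every term inside the range. The matching documents are the union of those terms' posting lists, so the work grows with the number of distinct terms in range. The tiny index is made up, and the model simplifies what Lucene actually does.

```python
from bisect import bisect_left

# Hypothetical inverted index for one string field: term -> set of doc ids
postings = {
    "apple": {1, 4},
    "apricot": {2},
    "avocado": {3, 4},
    "banana": {5},
}
term_dict = sorted(postings)  # the term dictionary is kept sorted


def string_range(gte: str, lt: str):
    """Return (terms visited, sorted matching doc ids) for gte <= term < lt."""
    visited, docs = [], set()
    i = bisect_left(term_dict, gte)  # jump to the first term >= gte
    while i < len(term_dict) and term_dict[i] < lt:
        visited.append(term_dict[i])    # one "term filter" per term in range
        docs |= postings[term_dict[i]]  # union of posting lists
        i += 1
    return visited, sorted(docs)


print(string_range("a", "b"))  # (['apple', 'apricot', 'avocado'], [1, 2, 3, 4])
```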

Exists query: exists

GET /my_index/posts/_search
{
  "query" : { "constant_score" : { "filter" : {
    "exists" : { "field" : "tags" }
  } } }
}
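Which values count as present? The toy Python predicate below follows the documented behaviour of exists for source values, assuming no null_value is configured in the mapping. null, an empty array, and an array holding only nulls have no indexed value. An empty string, or an array like ["search", null], does. The predicate ignores mapping-level causes such as index: false or ignore_above, and field_exists is a hypothetical name.

```python
def field_exists(doc: dict, field: str) -> bool:
    """Toy model: a field exists if it holds at least one non-null value."""
    value = doc.get(field)
    if isinstance(value, list):
        return any(v is not None for v in value)
    return value is not None


docs = [
    {"tags": ["search"]},
    {"tags": ["search", "open_source"]},
    {"other_field": "some data"},
    {"tags": None},
    {"tags": ["search", None]},
    {"tags": []},
    {"tags": [None]},
    {"tags": ""},
]
print([field_exists(d, "tags") for d in docs])
# [True, True, False, False, True, False, False, True]
```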

Missing query: missing (removed in ES 5.0)

GET /my_index/posts/_search
{
  "query" : { "constant_score" : { "filter" : {
    "missing" : { "field" : "tags" }
  } } }
}
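Because the missing query is gone from Elasticsearch 5.0 onward, the usual replacement is a bool whose must_not wraps an exists clause. The following Python sketch builds that body. missing_query is a hypothetical helper name.

```python
import json


def missing_query(field: str) -> dict:
    """Match documents where `field` has no indexed value (replacement for missing)."""
    return {
        "query": {
            "bool": {
                "must_not": {"exists": {"field": field}}
            }
        }
    }


print(json.dumps(missing_query("tags")))
```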

Note how exists behaves on object fields. Given this document:

{
  "name" : {
    "first" : "John",
    "last" :  "Smith"
  }
}

the query

{ "exists" : { "field" : "name" } }

is actually executed as

{
  "bool": {
    "should": [
      { "exists": { "field": "name.first" } },
      { "exists": { "field": "name.last" } }
    ]
  }
}

This means the name object is considered missing only when both first and last are empty.
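The expansion for object fields can be sketched in Python. The sketch walks the object's sub-fields and emits one exists clause per leaf inside a bool should. The mapping dict and the helper name expand_exists are hypothetical; they only mimic the nested properties layout of an Elasticsearch mapping.

```python
def expand_exists(mapping: dict, path: str) -> dict:
    """Rewrite exists on an object path into a should over its leaf fields.

    `mapping` mimics the nested "properties" layout of an index mapping.
    """
    node = {"properties": mapping}
    for part in path.split("."):
        node = node["properties"][part]
    if "properties" not in node:  # a leaf field: keep the plain exists
        return {"exists": {"field": path}}

    leaves = []

    def collect(prefix: str, props: dict) -> None:
        for name, sub in props.items():
            full = f"{prefix}.{name}"
            if "properties" in sub:
                collect(full, sub["properties"])  # nested object: recurse
            else:
                leaves.append({"exists": {"field": full}})

    collect(path, node["properties"])
    return {"bool": {"should": leaves}}


mapping = {"name": {"properties": {"first": {"type": "text"}, "last": {"type": "text"}}}}
print(expand_exists(mapping, "name"))
# {'bool': {'should': [{'exists': {'field': 'name.first'}}, {'exists': {'field': 'name.last'}}]}}
```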

Reposted from: https://www.cnblogs.com/dplearning/p/6952915.html
