ElasticSearch Basics

2021-04-07

ElasticSearch

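Commonly abbreviated as ES.

Purpose: search.

Types of search:

1. Web-wide search: searches content across the whole internet. Typical products: Baidu, Google.

2. Site search: searches the resources of a single site. Typical products: Taobao, Tmall, JD.com.

How are these searches implemented?

A first idea: store the data in a MySQL database and query it with SELECT statements.

In practice, search features like these are not built on MySQL, because MySQL has a few problems when it is used for search.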

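1 Problems with searching in MySQL

The problems:

1. Low efficiency

As a rough guide, with fewer than 1 million rows search performance is usually acceptable. Between 1 million and 10 million rows, techniques such as indexes and database/table sharding can still improve it. Search, however, often needs fuzzy matching: SELECT * FROM tb_goods WHERE good_name LIKE '%华为手机%'. When the pattern starts with %, the index on the column cannot be used, so the query falls back to a full table scan.

Beyond 10 million rows, adding indexes and sharding does very little for search performance.

2. Incomplete results (weak functionality)

Requirement: when the user enters the keyword 华为手机 ("Huawei phone"), the expected result is every product whose name contains either 华为 (Huawei) or 手机 (phone).

Can this be done with MySQL?

SQL: SELECT * FROM tb_goods WHERE good_name LIKE '%华为手机%'. This only matches names in which 华为手机 appears as one contiguous string.

How can this be solved? Use ES.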

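2 Inverted index

When data is stored in ES, it is first tokenized: the text is split into terms.

Forward index: the index table maps each document id to the terms that appear in that document.

Searching with a forward index means scanning every entry, so search gets slow as the data grows.

Inverted index: maps each term produced by tokenization to the ids of the documents that contain it.

At search time, the search keyword is tokenized first; each resulting term is looked up in the inverted index to get the matching document ids, and the documents are then fetched by id.

How ES addresses weak query functionality: by tokenizing the data.

How ES addresses low query efficiency: the terms are sorted and organized into a tree-like structure, which speeds up lookups.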
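
The idea of an inverted index can be sketched in a few lines of Java. This is only an illustration, not how Lucene or ES actually store data: the class name InvertedIndexSketch, the whitespace tokenizer, and the sample product names are all assumptions made for the example. A TreeMap is used so that the terms stay sorted, loosely mirroring the "sorted terms" idea described above.

```java
import java.util.*;

// Minimal in-memory inverted index; illustrative only, not Lucene's real data structure.
class InvertedIndexSketch {
    // term -> ids of the documents containing it; TreeMap keeps the terms sorted.
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Hypothetical tokenizer: lower-cases and splits on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().toLowerCase().split("\\s+"));
    }

    // Index a document: record its id under every term it contains.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Tokenize the keyword, then collect the ids of documents containing any of its terms.
    Set<Integer> search(String keyword) {
        Set<Integer> result = new TreeSet<>();
        for (String term : tokenize(keyword)) {
            result.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "huawei phone");
        index.add(2, "huawei laptop");
        index.add(3, "xiaomi phone");
        index.add(4, "usb cable");
        System.out.println(index.search("huawei phone")); // [1, 2, 3]
    }
}
```

Documents 2 and 3 are found even though neither contains the full phrase, which is exactly what the LIKE '%...%' query in section 1 cannot do.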

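3 Features of ES

Features:

1. ElasticSearch is a search server built on Lucene (Lucene is a search library).

2. Search servers built on Lucene include Solr and ElasticSearch (ES is generally regarded as the faster of the two for real-time search).

3. It is a distributed, highly scalable, near-real-time search and data analytics engine.

4. It exposes a RESTful web interface.

Differences between ES and MySQL:

1. MySQL supports transactions; ES does not.

2. MySQL has foreign keys; ES does not.

3. MySQL is mainly used to store data, while ES is mainly used to search it.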

4 Installing ES and Kibana

See the installation guide.

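5 Index structure in ES

The structure of an ES index closely resembles that of a MySQL database.

Before ES 7.x there was one more level, the type. From 7.x onward the concept of type has been de-emphasized.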

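6 RESTful style

REST is a style for defining interfaces in which request parameters become part of the request path.

The HTTP method tells the server which operation the user wants:

POST: create (add) a resource

GET: retrieve a resource

PUT: update a resource

DELETE: delete a resource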

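7 Index operations

Working with indexes from Postman:

Create an index

PUT http://ip:port/index_name

Query indexes

GET http://ip:port/index_name                  # one index
GET http://ip:port/index_name1,index_name2...  # several indexes
GET http://ip:port/_all                        # all indexes

Delete an index

DELETE http://ip:port/index_name

Close and open an index (for reference)

POST http://ip:port/index_name/_close
POST http://ip:port/index_name/_open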

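8 Mapping operations

8.1 Data types

A mapping defines the structure of an index, much like a table definition. That structure is made up of fields, and every field needs a data type.

Data types in ES:

1. Simple types

Strings
- text: analyzed (tokenized); does not support aggregations by default
- keyword: not analyzed; supports aggregations
Numeric (similar to the primitive numeric types in Java)
Boolean: boolean
Binary: binary
Range types
- integer_range, float_range, long_range, double_range, date_range

2. Complex types

Array: []
Object: {}

Complex types do not have to be declared in the mapping; they are created automatically when data is inserted.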

8.2 Working with mappings

Add a mapping

Add a mapping to an existing index

# Add a mapping
PUT /person/_mapping
{
    "properties":{
        "name":{
            "type":"text"
        },
        "age":{
            "type":"integer"
        }
    }
}

Specify the mapping when creating the index

# Create the index together with its mapping
PUT person
{
    "mappings": {
        "properties":{
            "name":{
                "type":"text"
            },
            "age":{
                "type":"integer"
            }
        }   
    }
}

View the mapping

GET person1/_mapping

Add a field

# Add a field
PUT /person1/_mapping
{
  "properties": {
      "address": {
          "type":"text"
      }
    }
}

9 Document operations

Add a document

Add a document with an explicit id

POST /person1/_doc/2
{
  "name":"张三",
  "age":18,
  "address":"北京"
}

Add a document without an id (ES generates one)

POST /person1/_doc/
{
  "name":"张三",
  "age":18,
  "address":"北京"
}

Get a document

# Get the document with id 1
GET /person1/_doc/1

# Query all documents
GET /person1/_search

Delete a document

# Delete the document with the given id
DELETE /person1/_doc/1

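10 Analyzers

10.1 Overview

Purpose: an analyzer tokenizes the data being stored and, depending on the kind of query, also tokenizes the search keyword.

ES ships with many analyzers, but they do not handle Chinese particularly well. When no analyzer is specified, the Standard Analyzer is used, and it splits Chinese text into single characters.

Handling Chinese properly requires a dedicated Chinese analyzer.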
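
To make single-character tokenization concrete, here is a minimal Java sketch. It is not the Standard Analyzer's real implementation (which also handles Latin words, punctuation and more); the class name SingleCharTokenizerSketch is made up for the example, and the code only mimics the effect on a run of Chinese characters.

```java
import java.util.ArrayList;
import java.util.List;

// Mimics the effect of single-character tokenization on Chinese text; illustrative only.
class SingleCharTokenizerSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Iterate by code point so that characters outside the BMP stay intact.
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("华为手机")); // [华, 为, 手, 机]
    }
}
```

With tokens like these, the words 华为 and 手机 never appear in the index as units, which is why a dictionary-based Chinese analyzer is preferred.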

10.2 IK analyzer

Installing the IK analyzer

1. Download the IK analyzer

URL: https://github.com/medcl/elasticsearch-analysis-ik/

Find the 7.4.0 release of the IK analyzer.

The download is a single file: elasticsearch-analysis-ik-7.4.0.zip

2. Unpack the archive

# Go to the ES plugins directory
cd /opt/elasticsearch-7.4.0/plugins/

# Create a folder named ik under the plugins directory
mkdir ik

# Copy elasticsearch-analysis-ik-7.4.0.zip into /opt/elasticsearch-7.4.0/plugins/ik
cp elasticsearch-analysis-ik-7.4.0.zip /opt/elasticsearch-7.4.0/plugins/ik

# Switch into the ik folder and unpack elasticsearch-analysis-ik-7.4.0.zip there
cd ik
unzip elasticsearch-analysis-ik-7.4.0.zip

# Delete elasticsearch-analysis-ik-7.4.0.zip
rm -rf elasticsearch-analysis-ik-7.4.0.zip

Note: ES must be restarted afterwards.

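Testing the IK analyzer

The IK analyzer provides two modes:

1. ik_max_word: fine-grained tokenization (produces more terms)

2. ik_smart: coarse-grained tokenization (produces fewer terms)

# Option 1: ik_max_word (the analyzer field selects the mode)
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "乒乓球明年总冠军"
}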

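Question 1: how do you choose between the two modes?

When storing (indexing) data, ik_max_word is the usual choice.

When searching, ik_smart is the usual choice.

Question 2: how is the mode specified?

Note: in day-to-day development the search-time analyzer is usually not specified explicitly, in which case the same analyzer used for storing the data is applied.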
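
One common way to specify the modes is through the analyzer and search_analyzer parameters of a text field in the mapping. The Java sketch below only builds the JSON body for such a mapping; the class name AnalyzerMappingSketch and the field name address are assumptions for illustration, and the result would still have to be sent to ES, for example as the body of a PUT /index_name/_mapping request for a new field.

```java
// Builds a mapping body that indexes with one analyzer and searches with another.
// Class and field names are placeholders for illustration.
class AnalyzerMappingSketch {
    static String mapping(String field, String indexAnalyzer, String searchAnalyzer) {
        return "{\n" +
               "  \"properties\": {\n" +
               "    \"" + field + "\": {\n" +
               "      \"type\": \"text\",\n" +
               "      \"analyzer\": \"" + indexAnalyzer + "\",\n" +
               "      \"search_analyzer\": \"" + searchAnalyzer + "\"\n" +
               "    }\n" +
               "  }\n" +
               "}";
    }

    public static void main(String[] args) {
        // ik_max_word when storing, ik_smart when searching, as suggested above.
        System.out.println(mapping("address", "ik_max_word", "ik_smart"));
    }
}
```

If search_analyzer is left out, the analyzer setting is used at search time as well, which matches the note above.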

Basic search

1. termQuery: term query. The search keyword is not tokenized.

GET teacher/_search
{
  "query": {
    "term": {
      "name": {
        "value": "张三"
      }
    }
  }
}

2. matchQuery: match query. The search keyword is tokenized. The analyzer used is the one given by search_analyzer when the index was created; if none was given, the one given by analyzer is used.

GET teacher/_search
{
  "query": {
    "match": {
      "address": "北京昌平"
    }
  }
}
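
The difference between the two queries can be illustrated with a toy model in Java. It is a sketch under strong assumptions, not the ES implementation: the class TermVsMatchSketch is hypothetical, the field is analyzed one character at a time (as the Standard Analyzer does for Chinese), and a match counts as a hit when any token of the keyword is found.

```java
import java.util.*;

// Toy model of the indexed terms of one text field; illustrative only.
class TermVsMatchSketch {
    // Stand-in analyzer: one token per character.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    private final Set<String> indexedTerms = new HashSet<>();

    // Indexing analyzes the field value and stores the resulting terms.
    void index(String fieldValue) {
        indexedTerms.addAll(analyze(fieldValue));
    }

    // term query: the keyword is looked up exactly as given.
    boolean term(String keyword) {
        return indexedTerms.contains(keyword);
    }

    // match query: the keyword is analyzed first; any matching token is a hit.
    boolean match(String keyword) {
        for (String token : analyze(keyword)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TermVsMatchSketch field = new TermVsMatchSketch();
        field.index("北京昌平");
        System.out.println(field.term("北京"));      // false: "北京" is not a stored term
        System.out.println(field.term("北"));        // true
        System.out.println(field.match("北京大兴")); // true: shares the tokens 北 and 京
    }
}
```

This is also why a term query against an analyzed text field can return nothing even though the document visibly contains the keyword.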

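11 Operating ES with the Java API

11.1 Integrating ES with Spring Boot

Spring provides a starter for this (spring-boot-starter-data-elasticsearch), but its API is not especially convenient for working with ES, so it is usually not the first choice. The API provided by Elastic itself is used instead.

Approach: import the dependencies for the official ES API and configure the client object through Java configuration.

pom.xml

<!-- ES dependencies -->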
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>7.4.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.4.0</version>
</dependency>

Configuration class: configures the RestHighLevelClient client object

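import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration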
@ConfigurationProperties(prefix = "elasticsearch")
public class ElasticSearchConfig {

    private String host;
    private int port;

    public String getHost() {
        return host;
    }

    public void setHost(String host) {
        this.host = host;
    }

    public int getPort() {
        return port;
    }

    public void setPort(int port) {
        this.port = port;
    }

    @Bean
    public RestHighLevelClient client(){
        return new RestHighLevelClient(RestClient.builder(
                new HttpHost(
                        host,
                        port,
                        "http"
                )
        ));
    }
}

11.2 Index operations

Create an index

@Test
public void addIndexAndMapping() throws IOException {

    // 1. Get the object for index operations from the client
    IndicesClient indicesClient = client.indices();  // key call

    // 2. Build the request
    CreateIndexRequest createRequest = new CreateIndexRequest("itcast");

    // 2.1 Set the mappings
    String mapping = "{\n" +
            "      \"properties\" : {\n" +
            "        \"address\" : {\n" +
            "          \"type\" : \"text\",\n" +
            "          \"analyzer\" : \"ik_max_word\"\n" +
            "        },\n" +
            "        \"age\" : {\n" +
            "          \"type\" : \"long\"\n" +
            "        },\n" +
            "        \"name\" : {\n" +
            "          \"type\" : \"keyword\"\n" +
            "        }\n" +
            "      }\n" +
            "    }";

    createRequest.mapping(mapping, XContentType.JSON);
    CreateIndexResponse response = indicesClient.create(createRequest, RequestOptions.DEFAULT);  // key call

    // 3. Check the result via the response
    System.out.println(response.isAcknowledged());

}

Query an index

@Test
public void queryIndex() throws IOException {

    IndicesClient indices = client.indices();
    GetIndexRequest getRequest = new GetIndexRequest("itcast");
    GetIndexResponse response = indices.get(getRequest, RequestOptions.DEFAULT);  // key call

    // Read the mappings from the response
    Map<String, MappingMetaData> mappings = response.getMappings();
    for (String key : mappings.keySet()) {
        System.out.println(key + ":" + mappings.get(key).getSourceAsMap());
    }

}

Delete an index

/**
 * Delete an index
 */
@Test
public void deleteIndex() throws IOException {
    IndicesClient indices = client.indices();
    DeleteIndexRequest deleteRequest = new DeleteIndexRequest("itheima");
    AcknowledgedResponse response = indices.delete(deleteRequest, RequestOptions.DEFAULT);  // key call
    System.out.println(response.isAcknowledged());

}

Check whether an index exists

@Test
public void existIndex() throws IOException {
    IndicesClient indices = client.indices();
    GetIndexRequest getRequest = new GetIndexRequest("itcast");
    boolean exists = indices.exists(getRequest, RequestOptions.DEFAULT);  // key call
    System.out.println(exists);
}

11.3 Document operations

Add a document

1. From a Map

@Test
public void addDoc() throws IOException {

    // Document data as a Map
    Map<String, Object> data = new HashMap<>();
    data.put("address","北京昌平");
    data.put("name","大胖");
    data.put("age",20);

    // 1. Build the index request
    IndexRequest request = new IndexRequest("itcast").id("1").source(data);

    // Add the document and get the response
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);

    // Print the id from the response
    System.out.println(response.getId());

}

2. From JSON

@Test
public void addDoc2() throws IOException {

    // Document data as a Java object
    Person p = new Person();
    p.setId("2");
    p.setName("小胖2222");
    p.setAge(30);
    p.setAddress("陕西西安");

    // Convert the object to JSON
    String data = JSON.toJSONString(p);

    // 1. Build the index request
    IndexRequest request = new IndexRequest("itcast").id(p.getId()).source(data, XContentType.JSON);

    // Add the document and get the response
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);

    // Print the id from the response
    System.out.println(response.getId());

}

Update a document

1. With the index method (if no document with the given id exists it is added; if one exists it is replaced)

2. With the update method

@Test
public void updateDoc() throws IOException {

    UpdateRequest updateRequest = new UpdateRequest("itcast", "1");
    Map<String, Object> data = new HashMap<>();
    data.put("address","北京大兴");
    data.put("name","itheima");
    data.put("age",22);
    updateRequest.doc(data) ;
    UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
    System.out.println(updateResponse);

}

Get a document

@Test
public void findDocById() throws IOException {

    GetRequest getRequest = new GetRequest("itcast","1");
    GetResponse response = client.get(getRequest, RequestOptions.DEFAULT);

    // Print the document source as JSON
    System.out.println(response.getSourceAsString());

}

Delete a document

@Test
public void delDoc() throws IOException {

    DeleteRequest deleteRequest = new DeleteRequest("itcast","1");
    DeleteResponse response = client.delete(deleteRequest, RequestOptions.DEFAULT);
    System.out.println(response.getId());

}