# Elasticsearch

## Useful links

[Intro to the internals](https://habr.com/ru/post/489924/) (in Russian: "С чего начинается Elasticsearch", "Where Elasticsearch begins")

Elasticsearch is a document-oriented store that uses [Lucene](https://lucene.apache.org/core/) as its indexing engine.

![](https://415484505-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LxtoAXZwwOc4XGto8vb%2F-ME48Fj3RP1mr3UgC5Cw%2F-ME49iqkprRh3Ei6r7q6%2FScreen%20Shot%202020-08-06%20at%2021.06.16.png?alt=media\&token=4c0c1625-015c-45f7-b7ed-5eaf886d349a)

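Each shard is a Lucene index. The number of primary shards is fixed when an index is created, so it is usually set through an index template. The template below matches every index (pattern `*`); on Elasticsearch 6.0 and later the `template` field is replaced by `index_patterns`, which takes a list of patterns: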

```
PUT _template/all
{
  "template": "*",
  "settings": {
    "number_of_shards": 1
  }
}
```

![](https://415484505-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LxtoAXZwwOc4XGto8vb%2F-ME4A3BiLDeVF7zxECQn%2F-ME4AEOnqoJ8qZzelxlb%2FScreen%20Shot%202020-08-06%20at%2021.08.33.png?alt=media\&token=66711ae9-d847-45e6-aafa-a94d6a539091)

* data node
  * hot (SSD is better)
  * warm (HDD is enough)
  * cold (HDD is enough)
* coordinating node
* master node
  * only one master can be active at a time
  * the master manages the cluster topology
    * creates new indices
    * allocates shards
    * moves shards and merges them if necessary
  * knows the full cluster state
  * a node is master-eligible when `node.master: true` is set

Each Elasticsearch instance is a node. For nodes to join the same cluster:

* All nodes must run the same version
* `cluster.name` must be identical on every node
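
The two conditions above can be written as a small check. This is only an illustrative sketch: `NodeInfo` and `can_form_cluster` are hypothetical names, not part of any Elasticsearch client, and the node names, version string, and cluster names are made up. Real discovery also depends on network settings that are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical description of a node; not an Elasticsearch API type."""
    name: str
    version: str
    cluster_name: str


def can_form_cluster(nodes):
    """Return True if all nodes share one version and one cluster.name."""
    versions = {n.version for n in nodes}
    cluster_names = {n.cluster_name for n in nodes}
    return len(versions) == 1 and len(cluster_names) == 1


# Example values are assumptions for illustration only.
same = [NodeInfo("es-1", "7.8.0", "logs"), NodeInfo("es-2", "7.8.0", "logs")]
mixed = [NodeInfo("es-1", "7.8.0", "logs"), NodeInfo("es-3", "7.8.0", "metrics")]
print(can_form_cluster(same), can_form_cluster(mixed))
```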

### How to control number of replicas

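`PUT /_settings` applies the setting to every index; put an index name before `/_settings` to target a single index. Replace `1` with the desired number of replicas.

```
PUT /_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
```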
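
When the request is issued from a script rather than a console, it can help to build the path and body in one place. The sketch below only constructs the request and does not send it; `replica_settings_request` and the index name `my-index` are hypothetical, chosen for illustration.

```python
import json


def replica_settings_request(replicas, index=None):
    """Build (method, path, body) for updating number_of_replicas.

    index=None targets all indices via /_settings.
    """
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    path = f"/{index}/_settings" if index else "/_settings"
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return "PUT", path, body


# "my-index" is a made-up index name.
method, path, body = replica_settings_request(2, index="my-index")
print(method, path, body)
```

The returned triple can be passed to whatever HTTP client is already in use.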

### Deletion of data from node

![](https://415484505-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LxtoAXZwwOc4XGto8vb%2F-MESsNTtwRJkeRwqCCWV%2F-MESuLv8_d0SYYscCnFf%2Fdata-deletion.png?alt=media\&token=bc85fac4-d90b-4267-bf57-fe41280f140d)

A deletion is first applied only on the primary shard. After the flush and commit on the primary shard, an internal request propagates the change to the replicas.

## Cluster health status

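* green - all primary and replica shards are allocated
* yellow - all primary shards are allocated, but some replica shards are not. The cluster is fully operational, with reduced redundancy
* red - at least one primary shard is not allocated. The cluster is broken or part of the data is not available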
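
A minimal sketch of how the three colours relate to shard state, assuming the usual rule that red means an unassigned primary and yellow means an unassigned replica. The function name and the boolean-list input are hypothetical simplifications; the real status comes from the cluster health API.

```python
def cluster_health(primaries_assigned, replicas_assigned):
    """Derive a status colour from per-shard-copy assignment flags.

    Each argument is a list of booleans, one per shard copy:
    True = assigned to a node, False = unassigned.
    """
    if not all(primaries_assigned):
        return "red"      # some primary is missing: part of the data is unavailable
    if not all(replicas_assigned):
        return "yellow"   # all data is available, but redundancy is reduced
    return "green"


print(cluster_health([True, True], [True, False]))
```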

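`num(data nodes) >= num(replicas) + 1`

A replica is a copy of a primary shard, and it is never allocated on the same node as its primary. With fewer data nodes than that, some replicas stay unassigned and the cluster remains yellow.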
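
The relation between data nodes and replicas can be checked with a few lines of arithmetic. This sketch assumes that at most one copy of a given shard (primary or replica) lives on any node; `unassigned_replicas` is a hypothetical helper, not an Elasticsearch function.

```python
def unassigned_replicas(data_nodes, replicas_per_shard):
    """Replica copies per primary shard that cannot be placed.

    Assumes one copy of a given shard per node, so one node is
    taken by the primary and the rest can hold replicas.
    """
    if data_nodes < 1:
        raise ValueError("at least one data node is required")
    placeable = min(replicas_per_shard, data_nodes - 1)
    return replicas_per_shard - placeable


for nodes, replicas in [(1, 1), (2, 2), (3, 2)]:
    print(nodes, replicas, unassigned_replicas(nodes, replicas))
```

Any non-zero result corresponds to a cluster that stays yellow until more data nodes join or the replica count is lowered.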

## Fault tolerance

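Split-brain problem: after a network partition, two parts of the cluster may each elect their own master. To prevent this, a master is elected only when a majority (quorum) of master-eligible nodes is visible:

`NUMBER_OF_CANDIDATES = TOTAL_NUMBER_OF_NODES / 2 + 1`

The division is integer division, and the total counts master-eligible nodes. Before Elasticsearch 7.0 this value is set with `discovery.zen.minimum_master_nodes`; from 7.0 on the cluster manages it automatically.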
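
The formula above can be tried on a few partition scenarios. This is a sketch under the assumption that the division is integer division over master-eligible nodes; `quorum` and `can_elect_master` are hypothetical helper names.

```python
def quorum(master_eligible_nodes):
    """Minimum number of candidates needed to elect a master."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes, master_eligible_nodes):
    """True if one side of a partition sees enough candidates."""
    return visible_nodes >= quorum(master_eligible_nodes)


# 3 master-eligible nodes split 2 / 1: only the larger side may elect.
print(can_elect_master(2, 3), can_elect_master(1, 3))
# 4 master-eligible nodes split 2 / 2: neither side may elect.
print(can_elect_master(2, 4), can_elect_master(2, 4))
```

Because two disjoint groups cannot both hold more than half of the nodes, at most one side of a partition can ever elect a master. The 2 / 2 case also shows why an odd number of master-eligible nodes is the common choice.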
