Concepts of Elasticsearch — Blog Series Part 3

Sahibdeep Singh Sodhi
Apr 2, 2023

Continuing the blog series on Elasticsearch: as covered in the earlier parts, Elasticsearch is a distributed, open-source search and analytics engine that lets users store, search, and analyze large volumes of data in near real time. In this post, we will explore how Elasticsearch works and the components that make up the platform.

Architecture of Elasticsearch

Elasticsearch is built on top of the Apache Lucene search engine library. It is designed as a distributed system, meaning that it can be scaled horizontally across multiple nodes. Each node in Elasticsearch is capable of storing data and performing search and analysis operations.

The nodes in Elasticsearch are organized into clusters. A cluster is a group of nodes that work together to store and manage data. Each cluster has a unique name, which is used to distinguish it from other clusters in the network.

Elasticsearch is the central component of the Elastic Stack, also known as the ELK Stack after its three original components: Elasticsearch, Logstash, and Kibana. Together, these tools cover data ingestion, storage, analysis, and visualization.

Figure: ELK Stack

Working of Elasticsearch

Elasticsearch receives data, often unstructured, from multiple sources. Before indexing, ingestion tools such as Logstash collect, process, and enrich that data.

Once indexed, the data lives in an Elasticsearch index, which is a collection of related or similar documents. The index uses an inverted index data structure, which maps each unique word to the documents that contain it and so enables fast full-text search. Data is sent to Elasticsearch as JSON documents, either directly through the API or through ingestion tools. Elasticsearch stores the original document and adds a searchable reference to it in the index, so the document can be found and retrieved through the Elasticsearch API.
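To make the idea of an inverted index concrete, here is a minimal pure-Python sketch. It is a toy model, not how Lucene stores data on disk: the documents, the "body" field, and the helper names build_inverted_index and search are all made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased word to the sorted IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["body"].lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}

# Hypothetical JSON-style documents, keyed by document ID.
docs = {
    1: {"body": "Elasticsearch stores JSON documents"},
    2: {"body": "Lucene powers Elasticsearch"},
    3: {"body": "Kibana visualizes documents"},
}

index = build_inverted_index(docs)

def search(word):
    """Look the word up in the index, then return the stored originals."""
    return [docs[doc_id] for doc_id in index.get(word.lower(), [])]

print(index["elasticsearch"])  # [1, 2]
print(search("documents"))     # the original documents with IDs 1 and 3
```

The lookup never scans document text. It goes from the word straight to the matching document IDs, which is what makes full-text search fast.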

Indexing and Searching

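Elasticsearch uses an inverted index to store and search text. When a text field is indexed, its content is tokenized, filtered, and normalized. This process is known as analysis. How each field is handled is governed by the index's mapping, which defines the structure of the data: field names, data types, and how each field is analyzed. A mapping can be defined explicitly, or Elasticsearch can infer one from the documents it receives (dynamic mapping).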
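The analysis steps can be sketched in a few lines of Python. This is a simplified stand-in, assuming a tiny hypothetical stop-word list. Elasticsearch's built-in analyzers are configurable and differ in detail (the standard analyzer, for example, does not remove stop words by default). The mapping dictionary follows the JSON shape Elasticsearch accepts when an index is created, but the field names are invented for this example.

```python
import re

# Hypothetical stop-word list; real analyzers are configurable.
STOP_WORDS = {"the", "a", "an", "and", "of", "is"}

def analyze(text):
    """Tokenize on non-alphanumeric characters, lowercase, drop stop words."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)      # tokenize
    lowered = [token.lower() for token in tokens]   # normalize
    return [t for t in lowered if t not in STOP_WORDS]  # filter

# Mapping body with hypothetical field names: one analyzed text field
# and two fields with non-text data types.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "standard"},
            "published": {"type": "date"},
            "views": {"type": "integer"},
        }
    }
}

print(analyze("The Quick Brown-Fox is FAST"))  # ['quick', 'brown', 'fox', 'fast']
```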

Once data is indexed, it can be searched using Elasticsearch’s JSON-based Query DSL. Elasticsearch supports a wide variety of queries, including full-text search, term queries, and range queries. It also supports aggregations, which let users calculate metrics such as counts, sums, averages, and percentiles.
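As an illustration, the following Python snippet builds one search request body that combines a full-text match query, a term query, a range query, and two metric aggregations. The query and aggregation types are standard Query DSL, while the field names (title, status, views) and the aggregation names are assumptions made for this sketch.

```python
import json

search_body = {
    "query": {
        "bool": {
            # Full-text search: the query text is analyzed before matching.
            "must": [{"match": {"title": "distributed search"}}],
            "filter": [
                # Term query: exact value match, no analysis.
                {"term": {"status": "published"}},
                # Range query: documents with at least 100 views.
                {"range": {"views": {"gte": 100}}},
            ],
        }
    },
    "aggs": {
        "average_views": {"avg": {"field": "views"}},
        "views_percentiles": {
            "percentiles": {"field": "views", "percents": [50, 95]}
        },
    },
}

# The body is plain JSON, ready to send to an index's _search endpoint.
print(json.dumps(search_body, indent=2))
```

A body like this would typically be posted to the _search endpoint of an index, for example through an HTTP client or one of the official language clients.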

Distributed Processing

Elasticsearch uses a distributed processing model to enable scalability and fault tolerance. By default, each node is capable of performing search and analysis operations. When a query is submitted, the node that receives it acts as the coordinating node. It forwards the query to nodes holding a copy (primary or replica) of each shard of the target index, rather than to every node in the cluster. Each shard copy executes the query independently and returns its results to the coordinating node.

The coordinating node is responsible for merging the partial results into a single response and returning it to the client. This allows Elasticsearch to scale horizontally across multiple nodes and handle large volumes of data.
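The merge step performed by the coordinating node can be sketched as follows. This is a toy model under simple assumptions: each shard has already returned its own hits as (score, document ID) pairs, and the shard names, scores, and the coordinate function are hypothetical.

```python
import heapq

# Hypothetical partial results: each shard returns its own best hits
# as (score, document ID) pairs.
shard_results = {
    "shard-0": [(2.4, "a"), (1.1, "b")],
    "shard-1": [(3.0, "c"), (0.7, "d")],
    "shard-2": [(1.9, "e")],
}

def coordinate(results_by_shard, size):
    """Merge per-shard hits and keep the `size` highest-scoring ones."""
    all_hits = [hit for hits in results_by_shard.values() for hit in hits]
    return heapq.nlargest(size, all_hits)

top_hits = coordinate(shard_results, size=3)
print(top_hits)  # [(3.0, 'c'), (2.4, 'a'), (1.9, 'e')]
```

Because every shard only has to rank its own documents, adding shards and nodes spreads the work, while the final merge stays cheap.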

Sharding and Replication

Elasticsearch uses sharding and replication to distribute data across the nodes in a cluster. Sharding is the process of splitting an index into smaller pieces, called shards, and distributing them across nodes. Each shard is a self-contained Lucene index that can be stored and searched independently.
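How a document ends up on a particular shard can be sketched as a hash of a routing value (by default the document ID) taken modulo the number of primary shards. The snippet below is a simplified illustration of that idea: zlib.crc32 is only a stand-in hash chosen for this sketch, not the hash function Elasticsearch uses internally, and the document IDs are invented.

```python
import zlib
from collections import Counter

def pick_shard(routing_value, num_primary_shards):
    """Map a routing value (by default the document ID) to a shard number."""
    # crc32 is a stand-in for this sketch, not Elasticsearch's own hash.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

NUM_PRIMARY_SHARDS = 3
doc_ids = [f"doc-{i}" for i in range(12)]  # hypothetical document IDs

placement = {doc_id: pick_shard(doc_id, NUM_PRIMARY_SHARDS) for doc_id in doc_ids}

# Show how many of the sample documents landed on each shard.
print(Counter(placement.values()))
```

Because the result depends only on the routing value and the shard count, the same document ID always maps to the same shard, which lets any node locate a document without a central lookup table.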

Replication is the process of creating copies of shards, called replicas, on other nodes. A replica is never placed on the same node as its primary shard, so the cluster always holds copies of the data on more than one node. If a node fails, the data can still be served from a replica on another node.
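The effect of replicas on fault tolerance can be shown with a small simulation. The layout below is a hypothetical three-node cluster with three primary shards (P0 to P2) and one replica of each (R0 to R2). The node names and the available_shards helper are made up for this sketch.

```python
# Hypothetical allocation: each replica sits on a different node
# than its primary (P0/R0, P1/R1, P2/R2).
allocation = {
    "node-1": {"P0", "R1"},
    "node-2": {"P1", "R2"},
    "node-3": {"P2", "R0"},
}

def available_shards(allocation, failed_nodes):
    """Return the shard numbers that still have at least one live copy."""
    shards = set()
    for node, copies in allocation.items():
        if node not in failed_nodes:
            # "P1" and "R1" are both copies of shard 1.
            shards.update(int(copy[1:]) for copy in copies)
    return sorted(shards)

print(available_shards(allocation, failed_nodes={"node-2"}))            # [0, 1, 2]
print(available_shards(allocation, failed_nodes={"node-1", "node-2"}))  # [0, 2]
```

With one replica per shard, losing any single node leaves every shard available. Losing two nodes at once can still make a shard unavailable, which is why the replica count is a trade-off between storage cost and resilience.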

Conclusion

Elasticsearch is widely used by companies to improve their search capabilities. It provides powerful search and analytics across many types of data.

Data ingestion and retrieval are fast, which makes large datasets practical to manage. Managed services and Elasticsearch support are available, as well as integration tools for unifying logs and metrics. The platform also includes machine learning capabilities that help companies extract new insights, predict trends, and identify anomalies. It can also be used for security and automated threat detection.

Elasticsearch is a flexible, high-performing platform that enterprises can use for a variety of applications. It continues to evolve and offers a range of features for managing data effectively.
