Aggregations The Elasticsearch GROUP BY John Sobanski – VP of AI/ML
About Me John Sobanski VP AI/ML at Pyramid Systems Inc. Elastic Certified Engineer (2020, 2022) https://soban.ski
Agenda Background - Relational Database Management Systems Aggregations – The Elasticsearch GROUP BY Why use the Elasticsearch Aggregation API 1 st Demo - Execute GROUP BY operations via Elasticsearch aggregations Elasticsearch Aggregations Drive Time Series Data Visualization BUCKETS in Elasticsearch 2 nd Demo - Time Series Data Visualization with Kibana
Relational Database Management System
Series Operations 2 Sum 57 5 Product 40000 20 Min 2 20 Max 20 10 Median 10 Mean 11.4
RDBMS Series Operations In the traditional Relational Database Management System (RDBMS) world, SQL databases use GROUP BY syntax to group rows with similar values into summary rows. The query, "find the number of web page hits per country," for example, represents a typical GROUP BY operation. This table records the number of hits to my site [ https://soban.ski] , broken down by time zone.
Further summarize the table to record "hits per country" via a GROUP BY operation Collapse the Time zones into parent countries. SELECT COUNTRY, SUM (HITS) FROM timezone_hits GROUP BY COUNTRY; SQL Syntax:
RDBMS vs. Elasticsearch Even though Elasticsearch does not use the row construct to identify a unit of data (Elastic calls their rows Documents), we can still perform GROUP BY queries in Elasticsearch Elasticsearch names their GROUP BY queries Aggregations . The Elasticsearch API provides an expressive REST API to execute Aggregations Kibana also provides a Graphical User Interface (GUI) to execute Aggregations ElasticSearch Relational Database Index Type Documents Fields Database Table Rows Columns
Aggregations – The Elasticsearch GROUP BY
Why use the Elasticsearch Aggregation API? - 1 Easy Approach: Query Elasticsearch, convert the returned JSON to a Pandas Dataframe , and then apply a Pandas GROUP BY to the Dataframe to retrieve summary stats. Modern laptops include 32GB of memory and you have had no issues with this method. If you use Elasticsearch for non time series data, e.g. static data for blogs, you may not need to worry about running out of memory.
Why use the Elasticsearch Aggregation API? - 2 In the future, you may deal with Big Data. If you collect time series data, such as access logs, or security logs, you might scale to Big Data. In that case, the Elasticsearch database size will exceed the memory of your laptop.
Why use the Elasticsearch Aggregation API? - 3 Aggs API allows you to command Elasticsearch to execute the GROUP BY analogue in- stu (a best practice), and then also apply the summary stats in place. Elasticsearch will then return the summary stats as JSON, and you will not run out of memory.
Time Series GROUP BY(RDMMS) and Aggregation(Elasticsearch) operations lend themselves well to Time Series data, since these operations allow you to GROUP BY or Aggregate results over a given time bucket (e.g. Hour, Day, Week, Month, etc.).
1 st Demo Execute GROUP BY operations via Elasticsearch aggregations Demonstrate how to generate tables via both Kibana and the Elasticsearch API.
Elasticsearch Aggregations Drive Time Series Data Visualization
Elasticsearch BUCKETS In Elasticsearch parlance, we put the rows into BUCKETS, with one BUCKET for each country. Consider a query for total hits. The Elasticsearch API reports that I received ~100k hits in the month of June. Add the hits to BUCKETS !
“Country” Buckets Use the AGGS API to count the hits by country.
“Day” Buckets Use the API to count the hits by day.
Time Buckets T Time BUCKETS Analyze, summarize and visualize time series data. BUCKETS return a smaller amount of data, for example, hits per hour We need BUCKETS because we cannot plot every datum from a big data document store
Why Use Time Buckets?
Why Use Time Buckets?
Demo – Time Series Data Viz with Kibana
Conclusion Summarize Series Sum, Median, etc. RDBMS Uses GROUP BY Elasticsearch Uses AGGS Separate data by tag via BUCKETS Country Buckets, Time Buckets, Month Buckets