ELK Stack Tutorial (Elasticsearch, Logstash, Kibana)
Overview
The ELK Stack is a powerful combination of three open-source tools: Elasticsearch, Logstash, and Kibana. Together, they provide a complete solution for collecting, processing, storing, and visualizing log data from various sources.
Architecture Overview
Data Sources → Logstash → Elasticsearch → Kibana
                 ↑
             Beats (optional)
- Elasticsearch: Distributed search and analytics engine
- Logstash: Data processing pipeline for ingesting data
- Kibana: Visualization and management interface
- Beats: Lightweight data shippers (optional)
Elasticsearch Setup
Installation and Configuration
Docker Compose Setup:
# docker-compose.yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

volumes:
  elasticsearch_data:
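Once the container is up, a quick request to port 9200 confirms the node is reachable (no credentials are needed here because security is disabled above):
# Verify the node responds
curl -X GET "localhost:9200/?pretty"
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"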
Manual Configuration:
# elasticsearch.yml
cluster.name: my-application
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node

# Lock the JVM memory to prevent swapping (heap size itself is set in jvm.options)
bootstrap.memory_lock: true

# Security settings (disabled here for a local tutorial only)
xpack.security.enabled: false
Basic Elasticsearch Operations
Index Management:
# Create an index
curl -X PUT "localhost:9200/logs" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "level": { "type": "keyword" },
      "message": { "type": "text" },
      "service": { "type": "keyword" },
      "host": { "type": "keyword" }
    }
  }
}
'
# Insert a document
curl -X POST "localhost:9200/logs/_doc" -H 'Content-Type: application/json' -d'
{
  "@timestamp": "2023-01-01T12:00:00Z",
  "level": "INFO",
  "message": "Application started successfully",
  "service": "web-app",
  "host": "server-01"
}
'

# Search documents
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "level": "ERROR"
    }
  }
}
'
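When loading more than a handful of test documents, the _bulk API is far more efficient than one POST per document. Each action line is followed by its document, and the NDJSON body must end with a newline:
curl -X POST "localhost:9200/logs/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{ "index": {} }
{ "@timestamp": "2023-01-01T12:00:01Z", "level": "ERROR", "message": "Database connection failed", "service": "web-app", "host": "server-01" }
{ "index": {} }
{ "@timestamp": "2023-01-01T12:00:02Z", "level": "INFO", "message": "Retrying connection", "service": "web-app", "host": "server-01" }
'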
Index Templates
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        },
        "log_level": {
          "type": "keyword"
        },
        "message": {
          "type": "text",
          "analyzer": "standard"
        }
      }
    }
  }
}
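To register the template, PUT it to the _index_template endpoint. The name logs-template is arbitrary, and the JSON above is assumed to be saved as template.json; note that the logs-policy lifecycle policy it references is defined in the Index Lifecycle Management section later in this tutorial:
curl -X PUT "localhost:9200/_index_template/logs-template" -H 'Content-Type: application/json' -d @template.json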
Logstash Configuration
Basic Pipeline Configuration
# logstash.conf
input {
  # File input; sincedb_path => "/dev/null" re-reads files from the start on
  # every restart, which is convenient for testing but not for production
  file {
    path => "/var/log/application/*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => "json"
  }

  # Beats input
  beats {
    port => 5044
  }

  # Syslog input (ports below 1024 require root; 5514 is a common unprivileged alternative)
  syslog {
    port => 514
  }
}

filter {
  # Parse timestamp
  date {
    match => [ "timestamp", "ISO8601" ]
  }

  # Grok filter for log parsing
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:level} %{DATA:logger} - %{GREEDYDATA:message}"
    }
    overwrite => [ "message" ]
  }

  # Add fields based on conditions
  if [level] == "ERROR" {
    mutate {
      add_field => { "alert" => "true" }
      add_tag => [ "error" ]
    }
  }

  # GeoIP enrichment
  if [client_ip] {
    geoip {
      source => "client_ip"
      target => "geoip"
    }
  }

  # Remove unwanted fields
  mutate {
    remove_field => [ "host", "agent" ]
  }
}

output {
  # Elasticsearch output
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    template_name => "logstash"
  }

  # Conditional output for errors (may require installing the logstash-output-email plugin)
  if [level] == "ERROR" {
    email {
      to => "admin@example.com"
      subject => "Error in application"
      body => "Error message: %{message}"
    }
  }

  # Debug output
  stdout {
    codec => rubydebug
  }
}
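Before starting the pipeline, Logstash can validate the configuration syntax without processing any events (paths are relative to the Logstash install directory):
# Check configuration syntax and exit
bin/logstash -f logstash.conf --config.test_and_exit
# During development, reload the config automatically on changes
bin/logstash -f logstash.conf --config.reload.automatic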
Advanced Filters
JSON Filter:
filter {
  json {
    source => "message"
    target => "parsed"
  }

  # Extract fields from parsed JSON
  mutate {
    add_field => {
      "user_id" => "%{[parsed][user_id]}"
      "action" => "%{[parsed][action]}"
    }
  }
}
Multiline Codec:
The standalone multiline filter plugin has been deprecated; line joining is now done with the multiline codec on the input (or at the source in Filebeat, as shown later):
input {
  file {
    path => "/var/log/application/*.log"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}
Ruby Filter:
filter {
  ruby {
    code => '
      if event.get("response_time")
        response_time = event.get("response_time").to_f
        if response_time > 1000
          event.set("performance", "slow")
        elsif response_time > 500
          event.set("performance", "medium")
        else
          event.set("performance", "fast")
        end
      end
    '
  }
}
Kibana Setup and Usage
Installation
# docker-compose.yml (add under the existing services: section)
  kibana:
    image: docker.elastic.co/kibana/kibana:8.8.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch
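Kibana typically takes a minute or so to come up; its status endpoint reports when it is ready:
curl -X GET "localhost:5601/api/status"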
Index Patterns
- Navigate to Management > Stack Management > Data Views (called Index Patterns before Kibana 8.0)
- Create a data view with the pattern logs-*
- Select @timestamp as the time field
Visualizations
Log Volume Over Time:
{
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "1h",
        "min_doc_count": 0
      }
    }
  }
}
Error Rate by Service:
{
  "aggs": {
    "services": {
      "terms": {
        "field": "service.keyword"
      },
      "aggs": {
        "error_rate": {
          "filters": {
            "filters": {
              "errors": {
                "term": {
                  "level": "ERROR"
                }
              }
            }
          }
        }
      }
    }
  }
}
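Either aggregation can be dry-run against Elasticsearch before wiring it into a visualization; size=0 suppresses the document hits so only the buckets come back. Assuming the aggregation body above is saved as agg.json:
curl -X GET "localhost:9200/logs-*/_search?size=0" -H 'Content-Type: application/json' -d @agg.json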
Dashboards
Create Dashboard JSON:
{
  "version": "8.8.0",
  "objects": [
    {
      "type": "dashboard",
      "id": "application-logs",
      "attributes": {
        "title": "Application Logs Dashboard",
        "type": "dashboard",
        "description": "Overview of application logs and metrics",
        "panelsJSON": "[{\"version\":\"8.8.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15},\"panelIndex\":\"1\",\"embeddableConfig\":{},\"panelRefName\":\"panel_1\"}]",
        "timeRestore": true,
        "timeTo": "now",
        "timeFrom": "now-1h"
      }
    }
  ]
}
Note that panelsJSON is itself a JSON-encoded string, so it must remain a single line with escaped quotes.
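In practice, dashboards are exported from Kibana as NDJSON and moved between environments with the Saved Objects API (the kbn-xsrf header is required on all Kibana API writes). Assuming an export saved as dashboard.ndjson:
curl -X POST "localhost:5601/api/saved_objects/_import?overwrite=true" \
  -H "kbn-xsrf: true" \
  --form file=@dashboard.ndjson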
Beats Integration
Filebeat Configuration
# filebeat.yml
# Note: the "log" input type is deprecated in Filebeat 8.x in favour of
# "filestream", but it is kept here for its simpler multiline syntax.
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/application/*.log
      - /var/log/nginx/*.log
    fields:
      service: web-app
      environment: production
    fields_under_root: true
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after

  - type: log
    enabled: true
    paths:
      - /var/log/database/*.log
    fields:
      service: database
      environment: production

output.logstash:
  hosts: ["logstash:5044"]

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644
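Filebeat can validate both its configuration and its connection to the configured output before being started as a service:
# Validate the configuration file
filebeat test config -c /etc/filebeat/filebeat.yml
# Check connectivity to the Logstash output defined above
filebeat test output -c /etc/filebeat/filebeat.yml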
Metricbeat for System Metrics
# metricbeat.yml
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true

metricbeat.modules:
  - module: system
    metricsets:
      - cpu
      - load
      - memory
      - network
      - process
      - process_summary
      - socket_summary
    enabled: true
    period: 10s
    processes: ['.*']

  - module: docker
    metricsets:
      - container
      - cpu
      - diskio
      - healthcheck
      - info
      - memory
      - network
    enabled: true
    period: 10s

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "metricbeat-%{+yyyy.MM.dd}"

setup.template.name: "metricbeat"
setup.template.pattern: "metricbeat-*"
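Since this configuration ships straight to Elasticsearch, Metricbeat's setup command can load its bundled index template and example dashboards first (the dashboards additionally require a reachable Kibana, configured via setup.kibana):
metricbeat setup -c /etc/metricbeat/metricbeat.yml
# Verify the Elasticsearch output connection
metricbeat test output -c /etc/metricbeat/metricbeat.yml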
Performance Optimization
Elasticsearch Tuning
# elasticsearch.yml optimizations
bootstrap.memory_lock: true
indices.memory.index_buffer_size: 20%
indices.memory.min_index_buffer_size: 96mb
# Thread pool settings
thread_pool.search.queue_size: 1000
thread_pool.write.queue_size: 1000
# Circuit breaker settings
indices.breaker.total.limit: 70%
indices.breaker.request.limit: 40%
indices.breaker.fielddata.limit: 40%
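Whether these limits are actually being approached can be checked at runtime with the nodes stats API:
# Per-node JVM heap usage
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"
# Circuit breaker counters and trip counts
curl -X GET "localhost:9200/_nodes/stats/breaker?pretty"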
Logstash Performance Tuning
# logstash.yml
pipeline.workers: 4
pipeline.batch.size: 1000
pipeline.batch.delay: 50
# JVM settings in jvm.options
-Xms2g
-Xmx2g
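The effect of worker and batch settings shows up in Logstash's monitoring API on port 9600:
# Per-pipeline throughput, batch, and queue statistics
curl -X GET "localhost:9600/_node/stats/pipelines?pretty"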
Index Lifecycle Management
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d",
            "max_docs": 100000000
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          },
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
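The policy is registered under the name the index template references (logs-policy), assuming the JSON above is saved as logs-policy.json:
curl -X PUT "localhost:9200/_ilm/policy/logs-policy" -H 'Content-Type: application/json' -d @logs-policy.json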
Security Configuration
X-Pack Security
# elasticsearch.yml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true
# (enabling SSL also requires certificate/keystore settings, omitted here)

# Set passwords for built-in users; elasticsearch-setup-passwords is deprecated in 8.x
bin/elasticsearch-reset-password -u elastic
Role-Based Access Control
{
  "roles": {
    "logs_reader": {
      "indices": [
        {
          "names": ["logs-*"],
          "privileges": ["read", "view_index_metadata"]
        }
      ]
    },
    "logs_writer": {
      "indices": [
        {
          "names": ["logs-*"],
          "privileges": ["create", "index", "write"]
        }
      ]
    }
  }
}
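With security enabled, each role is created individually through the role API; the request body is the role definition itself, without the "roles" wrapper shown above. A sketch for logs_reader, authenticating as the elastic superuser:
curl -u elastic -X POST "localhost:9200/_security/role/logs_reader" -H 'Content-Type: application/json' -d'
{
  "indices": [
    {
      "names": ["logs-*"],
      "privileges": ["read", "view_index_metadata"]
    }
  ]
}
'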
Monitoring and Alerting
Watcher Alerts
{
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": ["logs-*"],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "term": {
                    "level": "ERROR"
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-5m"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 10
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": ["admin@example.com"],
        "subject": "High Error Rate Alert",
        "body": "{{ctx.payload.hits.total}} errors detected in the last 5 minutes"
      }
    }
  }
}
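The watch is stored under an ID of your choosing (error_rate_watch here is arbitrary; note that Watcher requires a license tier that includes alerting). Assuming the JSON above is saved as watch.json:
curl -X PUT "localhost:9200/_watcher/watch/error_rate_watch" -H 'Content-Type: application/json' -d @watch.json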
Troubleshooting Common Issues
Performance Problems
- Slow Queries: Use the slow log to identify problematic queries (see the snippet after this list)
- Memory Issues: Monitor heap usage and adjust JVM settings
- Disk Space: Implement proper index lifecycle management
- Network Latency: Optimize network configuration between nodes
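Slow logs are enabled per index via dynamic settings; the thresholds below are illustrative and should be tuned to your own latency targets:
curl -X PUT "localhost:9200/logs/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}
'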
Data Loss Prevention
# Register a snapshot repository (the location must be listed under path.repo in elasticsearch.yml)
curl -X PUT "localhost:9200/_snapshot/backup_repository" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/elasticsearch"
  }
}
'
# Create snapshot
curl -X PUT "localhost:9200/_snapshot/backup_repository/snapshot_1?wait_for_completion=true"
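Restoring is the mirror operation; the target indices must not already exist (or must be closed), or be restored under a renamed pattern:
# Restore the logs indices from the snapshot
curl -X POST "localhost:9200/_snapshot/backup_repository/snapshot_1/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-*"
}
'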
Best Practices
Log Format Standardization
{
  "@timestamp": "2023-01-01T12:00:00.000Z",
  "level": "INFO",
  "service": "user-service",
  "trace_id": "abc123",
  "span_id": "def456",
  "message": "User logged in successfully",
  "user_id": "12345",
  "ip_address": "192.168.1.1",
  "duration_ms": 250
}
Capacity Planning
- Estimate daily log volume in GB (a worked example follows this list)
- Plan for peak traffic (3-5x normal volume)
- Account for retention requirements
- Monitor cluster health regularly
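As a rough worked example, assuming 2,000 events/second averaging 500 bytes each: raw ingest is 2,000 × 500 B × 86,400 s ≈ 86 GB/day. With one replica that roughly doubles on disk to ~172 GB/day, a 5x peak factor means sizing for bursts near 430 GB/day of raw logs, and a 90-day retention (matching the ILM policy above) puts steady-state storage around 86 GB × 2 × 90 ≈ 15.5 TB.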
The ELK Stack provides powerful log management capabilities. Start with a simple setup and gradually add complexity as your logging needs grow. Focus on proper data modeling, performance optimization, and security configuration for production deployments.