
ELK Stack Tutorial (Elasticsearch, Logstash, Kibana)

Overview

The ELK Stack is a powerful combination of three open-source tools: Elasticsearch, Logstash, and Kibana. Together, they provide a complete solution for collecting, processing, storing, and visualizing log data from various sources.

Architecture Overview

Data Sources → [Beats (optional)] → Logstash → Elasticsearch → Kibana

  • Elasticsearch: Distributed search and analytics engine
  • Logstash: Data processing pipeline for ingesting and transforming data
  • Kibana: Visualization and management interface
  • Beats: Lightweight data shippers (optional)

Elasticsearch Setup

Installation and Configuration

Docker Compose Setup:

# docker-compose.yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

volumes:
  elasticsearch_data:

Manual Configuration:

# elasticsearch.yml
cluster.name: my-application
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node

# Lock the JVM heap in RAM (the heap size itself is set in jvm.options)
bootstrap.memory_lock: true

# Security settings (disabled here for local development only)
xpack.security.enabled: false
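
Once the node is running, a quick check confirms it responds on the mapped port:

# Check that the node is up and returns basic cluster info
curl "localhost:9200"

# Check cluster health (green or yellow is expected for a single node)
curl "localhost:9200/_cluster/health?pretty"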

Basic Elasticsearch Operations

Index Management:

# Create an index
curl -X PUT "localhost:9200/logs" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "level": { "type": "keyword" },
      "message": { "type": "text" },
      "service": { "type": "keyword" },
      "host": { "type": "keyword" }
    }
  }
}
'

# Insert a document
curl -X POST "localhost:9200/logs/_doc" -H 'Content-Type: application/json' -d'
{
  "@timestamp": "2023-01-01T12:00:00Z",
  "level": "INFO",
  "message": "Application started successfully",
  "service": "web-app",
  "host": "server-01"
}
'

# Search documents
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "level": "ERROR"
    }
  }
}
'

Index Templates

{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        },
        "log_level": {
          "type": "keyword"
        },
        "message": {
          "type": "text",
          "analyzer": "standard"
        }
      }
    }
  }
}
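
The template is registered through the composable index template API; a minimal sketch, assuming the JSON above is saved as logs-template.json (the template name is an arbitrary example):

# Register the index template
curl -X PUT "localhost:9200/_index_template/logs-template" \
  -H 'Content-Type: application/json' \
  -d @logs-template.json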

Logstash Configuration

Basic Pipeline Configuration

# logstash.conf
input {
  # File input
  file {
    path => "/var/log/application/*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => "json"
  }

  # Beats input
  beats {
    port => 5044
  }

  # Syslog input (ports below 1024 require elevated privileges)
  syslog {
    port => 514
  }
}

filter {
  # Grok filter for log parsing
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:level} %{DATA:logger} - %{GREEDYDATA:message}"
    }
    overwrite => [ "message" ]
  }

  # Parse the timestamp field extracted by grok (must run after grok)
  date {
    match => [ "timestamp", "ISO8601" ]
  }

  # Add fields based on conditions
  if [level] == "ERROR" {
    mutate {
      add_field => { "alert" => "true" }
      add_tag => [ "error" ]
    }
  }

  # GeoIP enrichment
  if [client_ip] {
    geoip {
      source => "client_ip"
      target => "geoip"
    }
  }

  # Remove unwanted fields
  mutate {
    remove_field => [ "host", "agent" ]
  }
}

output {
  # Elasticsearch output
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    template_name => "logstash"
  }

  # Conditional output for errors
  if [level] == "ERROR" {
    email {
      to => "admin@example.com"
      subject => "Error in application"
      body => "Error message: %{message}"
    }
  }

  # Debug output
  stdout {
    codec => rubydebug
  }
}
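
Before starting the pipeline, the configuration can be validated without processing any events; Logstash's --config.test_and_exit flag checks syntax only:

# Validate the pipeline configuration, then exit
bin/logstash -f logstash.conf --config.test_and_exit

# Run the pipeline, reloading automatically when the config file changes
bin/logstash -f logstash.conf --config.reload.automatic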

Advanced Filters

JSON Filter:

filter {
  json {
    source => "message"
    target => "parsed"
  }

  # Extract fields from parsed JSON
  mutate {
    add_field => {
      "user_id" => "%{[parsed][user_id]}"
      "action" => "%{[parsed][action]}"
    }
  }
}

Multiline Codec:

The standalone multiline filter was removed from recent Logstash releases; multiline events are instead assembled with the multiline codec on the input (or in Filebeat, as shown later):

input {
  file {
    path => "/var/log/application/*.log"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}

Ruby Filter:

filter {
  ruby {
    code => '
      if event.get("response_time")
        response_time = event.get("response_time").to_f
        if response_time > 1000
          event.set("performance", "slow")
        elsif response_time > 500
          event.set("performance", "medium")
        else
          event.set("performance", "fast")
        end
      end
    '
  }
}

Kibana Setup and Usage

Installation

# docker-compose.yml (add to the existing services block)
  kibana:
    image: docker.elastic.co/kibana/kibana:8.8.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch

Data Views (Index Patterns)

  1. Navigate to Management > Stack Management > Data Views (called Index Patterns before Kibana 8.0)
  2. Create a data view with the index pattern: logs-*
  3. Select time field: @timestamp
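
Data views can also be created through Kibana's HTTP API; a minimal sketch, assuming Kibana 8.x on the default port with security disabled:

# Create a data view for logs-* via the Kibana API
curl -X POST "localhost:5601/api/data_views/data_view" \
  -H 'kbn-xsrf: true' \
  -H 'Content-Type: application/json' \
  -d '{"data_view": {"title": "logs-*", "timeFieldName": "@timestamp"}}'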

Visualizations

Log Volume Over Time:

{
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "1h",
        "min_doc_count": 0
      }
    }
  }
}

Error Rate by Service:

{
  "aggs": {
    "services": {
      "terms": {
        "field": "service.keyword"
      },
      "aggs": {
        "error_rate": {
          "filters": {
            "filters": {
              "errors": {
                "term": {
                  "level": "ERROR"
                }
              }
            }
          }
        }
      }
    }
  }
}
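
Either aggregation body can be tested directly against Elasticsearch before building the visualization; "size": 0 suppresses the document hits so only aggregation results are returned:

# Run the log-volume aggregation against the logs indices
curl -X GET "localhost:9200/logs-*/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "1h"
      }
    }
  }
}
'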

Dashboards

Create Dashboard JSON:

{
  "version": "8.8.0",
  "objects": [
    {
      "type": "dashboard",
      "id": "application-logs",
      "attributes": {
        "title": "Application Logs Dashboard",
        "type": "dashboard",
        "description": "Overview of application logs and metrics",
        "panelsJSON": "[{\"version\":\"8.8.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15},\"panelIndex\":\"1\",\"embeddableConfig\":{},\"panelRefName\":\"panel_1\"}]",
        "timeRestore": true,
        "timeTo": "now",
        "timeFrom": "now-1h"
      }
    }
  ]
}
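
Kibana 8.x exchanges saved objects (dashboards, visualizations) as NDJSON through the saved objects API; a sketch, assuming an export file named dashboard.ndjson:

# Import saved objects from an NDJSON export
curl -X POST "localhost:5601/api/saved_objects/_import" \
  -H 'kbn-xsrf: true' \
  --form file=@dashboard.ndjson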

Beats Integration

Filebeat Configuration

# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/application/*.log
      - /var/log/nginx/*.log
    fields:
      service: web-app
      environment: production
    fields_under_root: true
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after

  - type: log
    enabled: true
    paths:
      - /var/log/database/*.log
    fields:
      service: database
      environment: production

output.logstash:
  hosts: ["logstash:5044"]

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644
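
Filebeat ships with built-in checks that are worth running before starting the service:

# Verify the configuration file is syntactically valid
filebeat test config -c /etc/filebeat/filebeat.yml

# Verify Filebeat can reach the configured Logstash output
filebeat test output -c /etc/filebeat/filebeat.yml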

Metricbeat for System Metrics

# metricbeat.yml
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true

metricbeat.modules:
  - module: system
    metricsets:
      - cpu
      - load
      - memory
      - network
      - process
      - process_summary
      - socket_summary
    enabled: true
    period: 10s
    processes: ['.*']

  - module: docker
    metricsets:
      - container
      - cpu
      - diskio
      - healthcheck
      - info
      - memory
      - network
    enabled: true
    period: 10s

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "metricbeat-%{+yyyy.MM.dd}"

setup.template.name: "metricbeat"
setup.template.pattern: "metricbeat-*"
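
Metricbeat can load its index template and prebuilt Kibana dashboards in one step, assuming setup.kibana in metricbeat.yml points at your Kibana instance:

# Load the index template and prebuilt dashboards, then start shipping metrics
metricbeat setup --dashboards
metricbeat -e -c /etc/metricbeat/metricbeat.yml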

Performance Optimization

Elasticsearch Tuning

# elasticsearch.yml optimizations
bootstrap.memory_lock: true
indices.memory.index_buffer_size: 20%
indices.memory.min_index_buffer_size: 96mb

# Thread pool settings
thread_pool.search.queue_size: 1000
thread_pool.write.queue_size: 1000

# Circuit breaker settings
indices.breaker.total.limit: 70%
indices.breaker.request.limit: 40%
indices.breaker.fielddata.limit: 40%
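
The effect of these settings can be checked with the cat APIs; rejections in the thread pool output usually mean the queues are still too small or the cluster is overloaded:

# Per-node heap, RAM, and CPU usage at a glance
curl "localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu"

# Thread pool activity and rejections for the search and write pools
curl "localhost:9200/_cat/thread_pool/search,write?v&h=name,active,queue,rejected"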

Logstash Performance Tuning

# logstash.yml
pipeline.workers: 4
pipeline.batch.size: 1000
pipeline.batch.delay: 50

# JVM settings in jvm.options
-Xms2g
-Xmx2g
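
Logstash exposes a monitoring API on port 9600 that shows whether the worker and batch settings keep up with the input rate:

# Per-pipeline event throughput and worker statistics
curl "localhost:9600/_node/stats/pipelines?pretty"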

Index Lifecycle Management

{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d",
            "max_docs": 100000000
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          },
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
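
The policy is registered under the name referenced by the index template earlier (logs-policy); a minimal sketch, assuming the JSON above is saved as logs-policy.json:

# Register the lifecycle policy
curl -X PUT "localhost:9200/_ilm/policy/logs-policy" \
  -H 'Content-Type: application/json' \
  -d @logs-policy.json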

Security Configuration

X-Pack Security

# elasticsearch.yml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true

# Set passwords for built-in users (elasticsearch-setup-passwords is
# deprecated in 8.x; use elasticsearch-reset-password instead)
bin/elasticsearch-reset-password -u elastic

Role-Based Access Control

{
  "roles": {
    "logs_reader": {
      "indices": [
        {
          "names": ["logs-*"],
          "privileges": ["read", "view_index_metadata"]
        }
      ]
    },
    "logs_writer": {
      "indices": [
        {
          "names": ["logs-*"],
          "privileges": ["create", "index", "write"]
        }
      ]
    }
  }
}
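
Roles are created one at a time through the security API; for example, the logs_reader role above (authentication is required once security is enabled):

# Create the logs_reader role
curl -u elastic -X PUT "localhost:9200/_security/role/logs_reader" \
  -H 'Content-Type: application/json' -d'
{
  "indices": [
    {
      "names": ["logs-*"],
      "privileges": ["read", "view_index_metadata"]
    }
  ]
}
'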

Monitoring and Alerting

Watcher Alerts

{
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": ["logs-*"],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "term": {
                    "level": "ERROR"
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-5m"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 10
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": ["admin@example.com"],
        "subject": "High Error Rate Alert",
        "body": "{{ctx.payload.hits.total}} errors detected in the last 5 minutes"
      }
    }
  }
}
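
The watch is registered through the Watcher API (note that Watcher requires a license level that includes alerting); a sketch, assuming the JSON above is saved as high_error_rate.json:

# Register the watch under an example name
curl -X PUT "localhost:9200/_watcher/watch/high_error_rate" \
  -H 'Content-Type: application/json' \
  -d @high_error_rate.json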

Troubleshooting Common Issues

Performance Problems

  1. Slow Queries: Use the slow log to identify problematic queries (see the example below)
  2. Memory Issues: Monitor heap usage and adjust JVM settings
  3. Disk Space: Implement proper index lifecycle management
  4. Network Latency: Optimize network configuration between nodes
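
The search slow log is enabled per index with a dynamic setting; a minimal sketch for the logs index:

# Log any query phase slower than one second at WARN level
curl -X PUT "localhost:9200/logs/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "1s"
}
'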

Data Loss Prevention

# Register a snapshot repository (the location must be listed under
# path.repo in elasticsearch.yml)
curl -X PUT "localhost:9200/_snapshot/backup_repository" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/elasticsearch"
  }
}
'

# Create a snapshot
curl -X PUT "localhost:9200/_snapshot/backup_repository/snapshot_1?wait_for_completion=true"
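
Restoring uses the same repository; a minimal sketch that restores only the logs indices:

# Restore the logs indices from the snapshot
curl -X POST "localhost:9200/_snapshot/backup_repository/snapshot_1/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-*"
}
'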

Best Practices

Log Format Standardization

{
  "@timestamp": "2023-01-01T12:00:00.000Z",
  "level": "INFO",
  "service": "user-service",
  "trace_id": "abc123",
  "span_id": "def456",
  "message": "User logged in successfully",
  "user_id": "12345",
  "ip_address": "192.168.1.1",
  "duration_ms": 250
}

Capacity Planning

  1. Estimate daily log volume in GB (see the worked example below)
  2. Plan for peak traffic (3-5x normal volume)
  3. Account for retention requirements
  4. Monitor cluster health regularly
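
As a worked example with assumed numbers: 20 GB of raw logs per day, roughly 1.2x on-disk expansion after indexing, one replica, and 30 days of retention gives 20 × 1.2 × 2 × 30 = 1,440 GB, or about 1.5 TB of storage. The 3-5x peak factor applies primarily to ingest capacity (Logstash workers, indexing throughput) rather than to total storage.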

The ELK Stack provides powerful log management capabilities. Start with a simple setup and gradually add complexity as your logging needs grow. Focus on proper data modeling, performance optimization, and security configuration for production deployments.

Content Review

The content in this repository has been reviewed by chevp. Chevp is dedicated to ensuring that the information provided is accurate, relevant, and up-to-date, helping users to learn and implement programming skills effectively.

About the Reviewer

For more insights and contributions, visit chevp's GitHub profile: chevp's GitHub Profile.