Logging
The Anton cluster collects, stores, and queries logs from all cluster components and applications with the Loki stack: Promtail ships logs, Loki stores and indexes them, and Grafana provides the query interface.
Architecture Overview
Core Components
Log Collection
- Promtail: Primary log shipping agent, running as a DaemonSet so that one instance runs on every node
- Automatic Discovery: Kubernetes service discovery for pod logs
- Log Parsing: Structured log extraction and labeling
Log Processing
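- Loki Gateway: NGINX-based reverse proxy that routes and load-balances requests across the Loki components, with optional authentication
- Distributor: Validates incoming log streams and forwards them to the ingesters
- Ingester: Buffers incoming entries into compressed chunks in memory and flushes them to object storage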
Storage Backend
- S3 Storage: S3-compatible object storage for log chunks, provided by Ceph RGW
- Index: BoltDB shipper, which writes BoltDB index files locally and uploads them to object storage
- Retention Policy: Configurable log retention periods
Query Interface
- Grafana Integration: Visual log exploration and dashboards
- LogCLI: Command-line log querying tool
- API: REST API for programmatic access
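To make the write path concrete: an agent such as Promtail pushes batches of log lines to Loki's push endpoint, `/loki/api/v1/push`, which the gateway forwards to the distributors. The minimal sketch below builds the JSON form of that request body. The helper `build_push_payload` and the example labels are hypothetical; the payload shape (a label map per stream plus `[timestamp, line]` pairs, with timestamps as strings of Unix epoch nanoseconds) follows Loki's documented push format. Promtail itself sends the same structure as snappy-compressed protobuf.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a Loki push API body for a single stream (hypothetical helper).

    labels: dict of stream labels, e.g. {"namespace": "default", "app": "demo"}
    lines:  iterable of log line strings
    now_ns: optional fixed timestamp in Unix nanoseconds (defaults to now)
    """
    base_ns = time.time_ns() if now_ns is None else now_ns
    # Loki expects timestamps as strings of Unix epoch nanoseconds.
    # Offsetting each line by 1 ns keeps the entries strictly ordered.
    values = [[str(base_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


if __name__ == "__main__":
    payload = build_push_payload(
        {"namespace": "default", "app": "demo"},
        ["starting", "ready"],
        now_ns=1_700_000_000_000_000_000,
    )
    # This body would be POSTed with Content-Type: application/json to
    # <loki-gateway>/loki/api/v1/push; the sketch only prints it.
    print(json.dumps(payload))
```

Sending it is a plain HTTP POST with `Content-Type: application/json`; the sketch stops at building the body so it runs without a live Loki.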
Log Sources
Application Logs
```yaml
# Pod stdout/stderr is collected automatically; no extra configuration is needed
spec:
  containers:
    - name: app
      image: myapp:latest
```
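Since collection works purely from stdout/stderr, applications need no Loki client library. Writing one JSON object per line keeps each entry on a single log line and lets LogQL's `json` parser extract fields at query time. A minimal sketch with Python's standard `logging` module; the `JsonLineFormatter` class, the `make_logger` helper, and the field names are illustrative, not an established cluster convention. No timestamp field is included because the container runtime records one for every line.

```python
import json
import logging
import sys


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (illustrative fields)."""

    def format(self, record):
        entry = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        # json.dumps escapes newlines, so each record stays on one line.
        return json.dumps(entry)


def make_logger(name, stream=sys.stdout):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # avoid duplicate handlers on repeated calls
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("app")
    log.info("database connection established")
    # stdout: {"level": "info", "logger": "app", "msg": "database connection established"}
```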
Kubernetes Component Logs
- API Server: Audit logs and request logs
- kubelet: Node agent logs
- kube-proxy: Network proxy logs (if enabled)
- Controller Manager: Control loop logs
System Logs
- systemd: System service logs
- kernel: Kernel messages and events
- containerd: Container runtime logs
Configuration
Promtail Configuration
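```yaml
# DaemonSet configuration for log collection (abridged: the Promtail config
# file mount and the RBAC needed for Kubernetes discovery are not shown)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail
  namespace: monitoring
spec:
  selector:                      # required for apps/v1; must match the pod labels
    matchLabels:
      app.kubernetes.io/name: promtail
  template:
    metadata:
      labels:
        app.kubernetes.io/name: promtail
    spec:
      containers:
        - name: promtail
          image: grafana/promtail:latest
          volumeMounts:
            - name: logs
              mountPath: /var/log        # includes /var/log/pods (containerd/CRI pod logs)
              readOnly: true
            - name: pods
              mountPath: /var/lib/docker/containers   # relevant only on Docker-based nodes
              readOnly: true
      volumes:
        - name: logs
          hostPath:
            path: /var/log
        - name: pods
          hostPath:
            path: /var/lib/docker/containers
```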
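On each node the kubelet writes container output to files laid out as `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart-count>.log`, which is why a `/var/log` mount is enough on containerd-based nodes. Promtail's Kubernetes discovery attaches `namespace`, `pod`, and `container` labels from the Kubernetes API rather than by parsing paths; the hypothetical helper below simply reverses the path layout to show how each file maps to those labels.

```python
from pathlib import PurePosixPath


def labels_from_pod_log_path(path):
    """Derive labels from a kubelet pod log path (hypothetical helper).

    Expected layout: /var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart>.log
    Namespace and pod names cannot contain "_", so splitting on it is unambiguous.
    """
    parts = PurePosixPath(path).parts
    # parts: ('/', 'var', 'log', 'pods', '<ns>_<pod>_<uid>', '<container>', '<n>.log')
    if len(parts) != 7 or parts[1:4] != ("var", "log", "pods"):
        raise ValueError(f"not a pod log path: {path}")
    namespace, pod, uid = parts[4].split("_", 2)
    return {
        "namespace": namespace,
        "pod": pod,
        "container": parts[5],
        "pod_uid": uid,
        "restart": int(PurePosixPath(parts[6]).stem),
    }
```

For example, `/var/log/pods/monitoring_prometheus-server-0_<uid>/prometheus/0.log` maps to `namespace="monitoring"`, `pod="prometheus-server-0"`, `container="prometheus"`.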
Loki Configuration
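```yaml
# Loki server configuration (excerpt; schema_config, which selects the index
# type and object store per time period, is not shown)
auth_enabled: false              # single-tenant mode

server:
  http_listen_port: 3100

ingester:
  chunk_idle_period: 1h          # flush chunks that receive no new entries for 1h
  max_chunk_age: 1h              # flush chunks after 1h even if still active
  chunk_target_size: 1048576     # target compressed chunk size: 1 MiB

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
  aws:
    s3: s3://loki-chunks
    region: us-east-1
    # Ceph RGW typically also needs `endpoint` and `s3forcepathstyle: true` (not shown)
```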
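The three ingester settings decide when in-memory chunks leave the ingester. The sketch below is a simplified model, not Loki's implementation: a chunk that reaches `chunk_target_size` is treated as full, one that has received nothing for `chunk_idle_period` is flushed as idle, and one older than `max_chunk_age` (measured here from its first entry, a simplifying assumption) is flushed by age. The `Chunk` class and `flush_reason` function are hypothetical; Loki checks these conditions periodically and has further flush triggers, such as shutdown.

```python
from dataclasses import dataclass

# Values from the ingester block above.
CHUNK_IDLE_PERIOD_S = 3600        # chunk_idle_period: 1h
MAX_CHUNK_AGE_S = 3600            # max_chunk_age: 1h
CHUNK_TARGET_SIZE_B = 1_048_576   # chunk_target_size: 1 MiB (compressed)


@dataclass
class Chunk:
    first_entry_s: float    # time of the first entry in the chunk
    last_append_s: float    # time of the most recent append
    compressed_bytes: int


def flush_reason(chunk, now_s):
    """Return why a chunk would be flushed, or None to keep it in memory."""
    if chunk.compressed_bytes >= CHUNK_TARGET_SIZE_B:
        return "full"
    if now_s - chunk.last_append_s > CHUNK_IDLE_PERIOD_S:
        return "idle"
    if now_s - chunk.first_entry_s > MAX_CHUNK_AGE_S:
        return "max_age"
    return None
```

With both periods at 1h, a stream that goes quiet is flushed about an hour after its last entry, and a continuously active stream about an hour after its chunk started, or sooner if the chunk reaches 1 MiB.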
Log Querying
LogQL Syntax
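LogQL is Loki's query language. Every query starts with a stream selector of label matchers in braces; line filters then narrow the matching lines, and metric queries wrap a selector in a range function such as rate():
```logql
# Basic log stream selection
{namespace="monitoring"}

# Filter by multiple labels (=~ is a regular-expression match)
{namespace="monitoring", pod=~"prometheus-.*"}

# Line filter: keep lines containing a substring
{namespace="monitoring"} |= "error"

# Line filter with a regular expression
{namespace="monitoring"} |~ "error|ERROR|Error"

# Exclude lines containing a substring
{namespace="monitoring"} != "debug"

# Metric query: per-second rate of log lines over a 5-minute window
rate({namespace="monitoring"}[5m])

# Aggregation: log line rate per pod
sum by (pod) (rate({namespace="monitoring"}[5m]))
```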
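A log range aggregation such as `rate({...}[5m])` counts the log lines that fall in each 5-minute window and divides by the window length in seconds, so 600 lines in five minutes give 2 lines per second. The sketch below reproduces only that arithmetic on raw timestamps; `log_rate` and `rate_by` are hypothetical helpers, and the window boundary handling is simplified rather than copied from Loki's engine, which also evaluates the expression at every query step.

```python
from collections import defaultdict


def log_rate(timestamps_s, end_s, window_s=300):
    """Per-second line rate in the window ending at end_s, like rate({...}[5m])."""
    start_s = end_s - window_s
    # Window boundaries are simplified to (start_s, end_s].
    count = sum(1 for t in timestamps_s if start_s < t <= end_s)
    return count / window_s


def rate_by(entries, label, end_s, window_s=300):
    """Mimic sum by (<label>) (rate({...}[window])) for (labels, timestamp_s) pairs."""
    start_s = end_s - window_s
    counts = defaultdict(int)
    for labels, t in entries:
        if start_s < t <= end_s:
            counts[labels.get(label, "")] += 1
    return {value: n / window_s for value, n in counts.items()}
```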
Common Query Examples
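```logql
# Error logs (assumes a `level` label extracted by a Promtail pipeline stage);
# set the time range, e.g. the last hour, in Grafana or with logcli --since=1h
{level="error"}

# Error rate per service (assumes `level` and `service` labels)
sum by (service) (rate({level="error"}[5m]))

# Find logs containing specific text
{namespace="default"} |= "database connection"

# Top 10 pods by log volume (lines per second over the last hour)
topk(10, sum by (pod) (rate({namespace!=""}[1h])))

# Application-specific logs
{namespace="monitoring", app="prometheus"} |= "TSDB"
```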
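When queries are generated from scripts, label values and filter strings need quoting and escaping. A minimal sketch of a query builder, assuming that double-quoted LogQL string literals accept the same `\"` and `\\` escapes that `json.dumps` produces (LogQL strings follow Go's quoting rules). The helper names `logql_selector` and `with_line_filter` are hypothetical.

```python
import json

MATCHER_OPS = {"=", "!=", "=~", "!~"}
LINE_FILTER_OPS = {"|=", "!=", "|~", "!~"}


def quote(value):
    # Double-quoted string with \" and \\ escapes; non-ASCII is kept as-is.
    return json.dumps(value, ensure_ascii=False)


def logql_selector(matchers):
    """Build a stream selector from (label, op, value) tuples."""
    parts = []
    for label, op, value in matchers:
        if op not in MATCHER_OPS:
            raise ValueError(f"unsupported matcher operator: {op}")
        parts.append(f"{label}{op}{quote(value)}")
    return "{" + ", ".join(parts) + "}"


def with_line_filter(selector, text, op="|="):
    """Append a line filter (|=, !=, |~ or !~) to a selector."""
    if op not in LINE_FILTER_OPS:
        raise ValueError(f"unsupported line filter: {op}")
    return f"{selector} {op} {quote(text)}"
```

For example, `with_line_filter(logql_selector([("namespace", "=", "default")]), "database connection")` reproduces the text-search query above.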
Access and Management
LogCLI Commands
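```bash
# Install LogCLI (if not already available)
curl -fSL -o logcli.zip "https://github.com/grafana/loki/releases/download/v3.5.3/logcli-linux-amd64.zip"
unzip logcli.zip && sudo install logcli-linux-amd64 /usr/local/bin/logcli
# Point LogCLI at Loki (it defaults to http://localhost:3100, e.g. a port-forward to loki-gateway)
export LOKI_ADDR=http://localhost:3100

# Query logs (returns at most 30 entries unless --limit is set)
logcli query '{namespace="monitoring"}'

# Query with a time range
logcli query --since=1h '{namespace="monitoring"}'

# Live tail logs
logcli query --tail '{pod=~"prometheus-server.*"}'

# Export logs to a file (raw lines, up to 5000 entries from the last hour)
logcli query --since=1h --limit=5000 --output=raw '{namespace="monitoring"}' > logs.txt
```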
Grafana Integration
```bash
# Port forward to Grafana, then open http://localhost:3000
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

# Loki data source URL configured in Grafana (in-cluster service address):
# http://loki-gateway:80
```
Direct API Access
```bash
# Port forward to Loki
kubectl port-forward -n monitoring svc/loki-gateway 3100:80

# Query a time range via the API (start/end accept RFC3339 or Unix epoch nanoseconds)
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="monitoring"}' \
  --data-urlencode 'start=2024-01-01T00:00:00Z' \
  --data-urlencode 'end=2024-01-01T01:00:00Z'

# List label names, then the values of one label
curl -s "http://localhost:3100/loki/api/v1/labels"
curl -s "http://localhost:3100/loki/api/v1/label/namespace/values"
```
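For programmatic access, the same `query_range` call can be made with Python's standard library. A minimal sketch, assuming the port-forward above (`LOKI_URL`) and a log query, whose response has `resultType` `streams`, with each stream carrying its labels and `[timestamp_ns, line]` pairs. The function names are hypothetical; `direction` and `limit` are standard `query_range` parameters.

```python
import json
import urllib.parse
import urllib.request

LOKI_URL = "http://localhost:3100"  # assumes the port-forward to loki-gateway


def query_range_url(base, query, start, end, limit=100, direction="backward"):
    params = {"query": query, "start": start, "end": end,
              "limit": str(limit), "direction": direction}
    return f"{base}/loki/api/v1/query_range?{urllib.parse.urlencode(params)}"


def parse_streams(body):
    """Flatten a 'streams' response into (ts_ns, labels, line) tuples, oldest first."""
    data = body["data"]
    if data["resultType"] != "streams":
        raise ValueError(f"expected streams, got {data['resultType']}")
    rows = []
    for stream in data["result"]:
        for ts, line in stream["values"]:
            rows.append((int(ts), stream["stream"], line))
    return sorted(rows, key=lambda row: row[0])


def fetch(query, start, end, limit=100):
    url = query_range_url(LOKI_URL, query, start, end, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_streams(json.load(resp))
```

For example, `fetch('{namespace="monitoring"}', "2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z")` returns the newest 100 entries in that hour (because of `direction=backward`), re-sorted oldest first.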
Management Commands
Status Monitoring
```bash
# Check Loki components
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki

# View Loki configuration
kubectl get configmap -n monitoring loki-config -o yaml

# Check Promtail status
kubectl get daemonset -n monitoring promtail

# View Promtail logs
kubectl logs -n monitoring -l app.kubernetes.io/name=promtail
```
Storage Management
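```bash
# Check object storage usage of the Loki chunk bucket (Ceph RGW, not RBD)
kubectl exec -n storage deployment/rook-ceph-tools -- \
  radosgw-admin bucket stats --bucket=loki-chunks

# List objects in the bucket (first 20)
kubectl exec -n storage deployment/rook-ceph-tools -- \
  radosgw-admin bucket list --bucket=loki-chunks --max-entries=20

# Check retention settings (retention_period is enforced by the compactor
# when retention_enabled is true)
kubectl get configmap -n monitoring loki-config -o yaml | grep retention
```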
Troubleshooting
```bash
# Check Loki ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester

# Verify log ingestion
kubectl logs -n monitoring -l app.kubernetes.io/name=promtail | grep "entry pushed"

# Measure query latency (LOKI_ADDR must point at Loki)
time logcli query --limit=100 '{namespace="monitoring"}'

# Check gateway health (run from a pod in the monitoring namespace; the short
# service name only resolves there)
curl http://loki-gateway/ready
curl http://loki-gateway/metrics
```
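After a restart or configuration change, scripts often need to wait until Loki reports ready before querying. A minimal polling sketch; the example URL assumes the port-forward from Direct API Access and that `/ready` is reachable through it, and `wait_until_ready` is a hypothetical helper whose `opener` argument exists only so it can be tested without a live Loki.

```python
import time
import urllib.request


def wait_until_ready(url, attempts=10, delay_s=2.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers HTTP 200; return True if ready, False if attempts run out."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (e.g. 503 while not ready) and socket errors
            # are all OSError subclasses: treat them as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Called as `wait_until_ready("http://localhost:3100/ready")`, it returns `True` as soon as the endpoint answers HTTP 200, or `False` after ten failed attempts roughly two seconds apart.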
Performance Optimization
Resource Configuration
```yaml
# Loki resource requests and limits
resources:
  requests:
    memory: 1Gi
    cpu: 500m
  limits:
    memory: 2Gi
    cpu: 1000m
```
Query Performance
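```yaml
# Query frontend result caching and retries
query_range:
  cache_results: true     # also needs a results_cache backend configured
  max_retries: 5

# Per-tenant query limits
limits_config:
  query_timeout: 1m
  max_query_parallelism: 32
```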
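`max_query_parallelism` matters because Loki's query frontend splits long range queries into shorter time intervals (set by `split_queries_by_interval`) and schedules at most `max_query_parallelism` of the resulting subqueries at once. The sketch below models only that arithmetic, ignoring interval alignment and query sharding; the helper names are hypothetical and the 30-minute interval is an assumed value, not this cluster's setting.

```python
import math
from datetime import datetime, timedelta, timezone


def split_range(start, end, interval):
    """Split [start, end) into consecutive sub-ranges of at most `interval`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    parts = []
    cur = start
    while cur < end:
        nxt = min(cur + interval, end)
        parts.append((cur, nxt))
        cur = nxt
    return parts


def scheduling_rounds(num_subqueries, max_query_parallelism=32):
    """Rounds needed if at most `max_query_parallelism` subqueries run at once."""
    return math.ceil(num_subqueries / max_query_parallelism)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    parts = split_range(start, start + timedelta(hours=24), timedelta(minutes=30))
    print(len(parts), scheduling_rounds(len(parts)))  # 48 2
```

In this model, a 24-hour query split into 30-minute pieces yields 48 subqueries, which a parallelism of 32 processes in two rounds.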
Together, these components aggregate logs from applications, Kubernetes components, and nodes in one place, where they can be searched, filtered, and turned into metrics for troubleshooting and monitoring across the Anton cluster.