Logging

The Anton cluster implements a comprehensive logging solution using the Loki stack to collect, store, and analyze logs from all cluster components and applications.

Architecture Overview

Core Components

Log Collection

  • Promtail: Primary log shipping agent running as DaemonSet
  • Automatic Discovery: Kubernetes service discovery for pod logs
  • Log Parsing: Structured log extraction and labeling

Log Processing

  • Loki Gateway: NGINX-based load balancing and authentication
  • Distributor: Validates and forwards log streams
  • Ingester: Builds chunks and flushes to storage

Storage Backend

  • S3 Storage: Scalable object storage for log chunks (Ceph RGW)
  • Local Index: BoltDB for log stream indexes
  • Retention Policy: Configurable log retention periods
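
Retention itself is configured in Loki. A minimal sketch, assuming the compactor enforces deletion; the actual keys and values depend on the Loki version and the cluster's chart values:

limits_config:
  retention_period: 744h    # keep log data for 31 days

compactor:
  retention_enabled: true   # the compactor applies retention_period and removes expired chunks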

Query Interface

  • Grafana Integration: Visual log exploration and dashboards
  • LogCLI: Command-line log querying tool
  • API: REST API for programmatic access

Log Sources

Application Logs

# Automatic collection from pod stdout/stderr
spec:
  containers:
    - name: app
      image: myapp:latest
      # Logs automatically collected

Kubernetes Component Logs

  • API Server: Audit logs and request logs
  • kubelet: Node agent logs
  • kube-proxy: Network proxy logs (if enabled)
  • Controller Manager: Control loop logs

System Logs

  • systemd: System service logs
  • kernel: Kernel messages and events
  • containerd: Container runtime logs
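
Host-level logs such as the systemd journal are typically picked up by a dedicated Promtail scrape job. A minimal sketch, assuming the host journal is mounted into the Promtail pod; job and label names are illustrative:

scrape_configs:
  - job_name: journal
    journal:
      path: /var/log/journal            # host journal mounted read-only into Promtail
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: unit              # e.g. kubelet.service, containerd.service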

Configuration

Promtail Configuration

# DaemonSet configuration for log collection
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: promtail
  template:
    metadata:
      labels:
        app.kubernetes.io/name: promtail
    spec:
      containers:
        - name: promtail
          image: grafana/promtail:latest
          volumeMounts:
            - name: logs
              mountPath: /var/log
              readOnly: true
            - name: pods
              mountPath: /var/lib/docker/containers
              readOnly: true
      # hostPath volumes backing the mounts above
      volumes:
        - name: logs
          hostPath:
            path: /var/log
        - name: pods
          hostPath:
            path: /var/lib/docker/containers
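
The collection behavior itself lives in Promtail's configuration file, normally rendered by the Helm chart. A minimal sketch of the service discovery and parsing pieces described earlier; job and label names are illustrative:

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                       # discover all pods scheduled on the node
    pipeline_stages:
      - cri: {}                         # parse the CRI log format written by containerd
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod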

Loki Configuration

# Loki server configuration
auth_enabled: false
server:
  http_listen_port: 3100

ingester:
  chunk_idle_period: 1h
  max_chunk_age: 1h
  chunk_target_size: 1048576

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
  aws:
    s3: s3://loki-chunks
    region: us-east-1
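
Note that boltdb_shipper only takes effect together with a matching schema_config, which the snippet above omits, and that when the S3 backend is Ceph RGW the aws block usually also carries an endpoint and s3forcepathstyle: true. A minimal schema_config sketch; the from date and schema version are illustrative:

schema_config:
  configs:
    - from: 2024-01-01
      store: boltdb-shipper      # must match the index store configured above
      object_store: aws          # chunks use the aws/s3 client defined in storage_config
      schema: v12
      index:
        prefix: index_
        period: 24h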

Log Querying

LogQL Syntax

LogQL is Loki's query language for log analysis:

# Basic log stream selection
{namespace="monitoring"}

# Filter by multiple labels
{namespace="monitoring", pod=~"prometheus-.*"}

# Text filtering
{namespace="monitoring"} |= "error"

# Regular expressions
{namespace="monitoring"} |~ "error|ERROR|Error"

# Exclude patterns
{namespace="monitoring"} != "debug"

# Metric queries from logs
rate({namespace="monitoring"}[5m])

# Aggregations
sum by (pod) (rate({namespace="monitoring"}[5m]))

Common Query Examples

# Show error-level logs containing "error" (the time range is supplied by the client, e.g. --since=1h)
{level="error"} |= "error"

# Count error rate per service
sum by (service) (rate({level="error"}[5m]))

# Find logs containing specific text
{namespace="default"} |= "database connection"

# Top 10 logging pods
topk(10, sum by (pod) (rate({namespace!=""}[1h])))

# Application-specific logs
{namespace="monitoring", app="prometheus"} |= "TSDB"

Access and Management

LogCLI Commands

# Install LogCLI (if not available)
# curl -fSL -o logcli.zip "https://github.com/grafana/loki/releases/download/v3.5.3/logcli-linux-amd64.zip"
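
# LogCLI reads the server address from the LOKI_ADDR environment variable
# (defaults to http://localhost:3100, which matches the port-forward shown below)
# export LOKI_ADDR=http://localhost:3100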

# Query logs
logcli query '{namespace="monitoring"}'

# Query with time range
logcli query --since=1h '{namespace="monitoring"}'

# Live tail logs
logcli tail '{pod="prometheus-server"}'

# Export logs to file
logcli query '{namespace="monitoring"}' > logs.txt

Grafana Integration

# Port forward to Grafana
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

# In Grafana, point the Loki data source at the in-cluster gateway
# URL: http://loki-gateway:80
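
If the data source is not already provisioned by the kube-prometheus-stack chart, it can be added declaratively. A minimal Grafana provisioning sketch; the DNS name assumes the loki-gateway Service in the monitoring namespace:

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway.monitoring.svc.cluster.local:80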

Direct API Access

# Port forward to Loki
kubectl port-forward -n monitoring svc/loki-gateway 3100:80

# Query via API
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="monitoring"}' \
  --data-urlencode 'start=2024-01-01T00:00:00Z' \
  --data-urlencode 'end=2024-01-01T01:00:00Z'

# Get label values
curl "http://localhost:3100/loki/api/v1/labels"
curl "http://localhost:3100/loki/api/v1/label/namespace/values"

Management Commands

Status Monitoring

# Check Loki components
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki

# View Loki configuration
kubectl get configmap -n monitoring loki-config -o yaml

# Check Promtail status
kubectl get daemonset -n monitoring promtail

# View Promtail logs
kubectl logs -n monitoring -l app.kubernetes.io/name=promtail

Storage Management

# Check storage usage
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  rbd du pool/loki

# View S3 bucket contents
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  s3cmd ls s3://loki-chunks/

# Check retention policies
kubectl get configmap -n monitoring loki-config -o yaml | grep retention

Troubleshooting

# Check Loki ingester status
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester

# Verify log ingestion
kubectl logs -n monitoring -l app.kubernetes.io/name=promtail | grep "entry pushed"

# Test log query performance
time logcli query '{namespace="monitoring"}' --limit=100

# Check gateway health (from inside the cluster, or via the port-forward shown earlier)
curl http://loki-gateway/ready
curl http://loki-gateway/metrics

Performance Optimization

Resource Configuration

# Loki resource limits
resources:
  requests:
    memory: 1Gi
    cpu: 500m
  limits:
    memory: 2Gi
    cpu: 1000m

Query Performance

# Query frontend caching
query_range:
  cache_results: true
  max_retries: 5

limits_config:
  query_timeout: 1m
  max_query_parallelism: 32
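
Where cached results are stored can also be made explicit. A minimal sketch using Loki's embedded in-memory results cache; the size is illustrative, and larger deployments often use memcached instead:

query_range:
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100   # in-process cache shared by the query frontend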

The logging infrastructure provides comprehensive log aggregation and analysis capabilities, enabling effective troubleshooting, monitoring, and operational insights across the entire Anton cluster.