Loki S3 Authentication Failures

Status: Active Issue
Category: Authentication
Affected Component: Loki (Logging Stack)
Severity: High - Log ingestion failures

Symptoms

  • Loki pod shows 1/2 ready status
  • Continuous InvalidAccessKeyId errors in logs with HTTP 403 status
  • Log messages: "failed to flush chunks: store put chunk: InvalidAccessKeyId"
  • Readiness probe failures on /ready endpoint
  • High restart count on loki container

Root Cause

The S3 credentials stored in 1Password (accessed via External Secrets Operator) do not match the current credentials generated by Rook Ceph RadosGW. The mismatch typically develops in this sequence:

  1. Ceph RadosGW regenerates user credentials
  2. 1Password entry is not updated with new credentials
  3. External Secret continues to pull outdated credentials
  4. Loki cannot authenticate with S3 storage backend

Technical Details

Components Involved:

  • Rook Ceph: CephObjectStoreUser/loki in storage namespace
  • Secret: rook-ceph-object-user-storage-loki (contains current credentials)
  • 1Password: Entry loki-s3-config (may contain outdated credentials)
  • External Secret: loki-s3-credentials in monitoring namespace
  • Loki: Uses credentials from External Secret for S3 backend

Authentication Flow:

  1. Rook creates CephObjectStoreUser/loki
  2. Rook generates secret with current S3 credentials
  3. External Secret pulls credentials from 1Password, not from the Rook-generated secret
  4. Loki uses External Secret credentials to access S3

Diagnosis

Check if credentials match between sources:

# Get current Ceph credentials (correct ones)
kubectl get secret -n storage rook-ceph-object-user-storage-loki \
  -o jsonpath='{.data.AccessKey}' | base64 -d

# Get credentials from External Secret (may be outdated)
kubectl get secret -n monitoring loki-s3-credentials \
  -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d

If these don't match, the 1Password entry needs updating.
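
The comparison above can also be wrapped in a small helper so that neither key has to be printed to the terminal. This is a minimal sketch: the function name compare_access_keys is hypothetical, and the commented usage assumes the secret names and data keys shown in the commands above.

```shell
# Hypothetical helper: compare two already-decoded access keys.
# Prints "match", "mismatch", or "unknown" without echoing the keys themselves.
compare_access_keys() {
  ceph_key=$1
  synced_key=$2
  # An empty value usually means a secret or data key was not found.
  if [ -z "$ceph_key" ] || [ -z "$synced_key" ]; then
    echo "unknown"
    return 2
  fi
  if [ "$ceph_key" = "$synced_key" ]; then
    echo "match"
    return 0
  fi
  echo "mismatch"
  return 1
}

# Usage (requires cluster access):
#   ceph_key=$(kubectl get secret -n storage rook-ceph-object-user-storage-loki \
#     -o jsonpath='{.data.AccessKey}' | base64 -d)
#   synced_key=$(kubectl get secret -n monitoring loki-s3-credentials \
#     -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
#   compare_access_keys "$ceph_key" "$synced_key"
```

A result of "unknown" suggests checking that both secrets exist before concluding anything about 1Password.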

Resolution

Step 1: Get Current Credentials

Extract current credentials from Rook Ceph:

# Run the bucket creation script to display current credentials
./scripts/create-loki-s3-bucket.ts

This script will display the correct access key and secret key that should be in 1Password.
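
If the script is unavailable, the same values can likely be read straight from the Rook-generated secret. A minimal sketch, assuming the secret stores the secret key under SecretKey alongside the AccessKey field used in Diagnosis (the usual Rook layout, but worth confirming with kubectl describe secret); the helper name rook_loki_credential is hypothetical.

```shell
# Hypothetical helper: print one decoded field from the Rook-generated secret.
# Assumes the data keys are AccessKey / SecretKey; verify them on your cluster.
rook_loki_credential() {
  field=$1
  kubectl get secret -n storage rook-ceph-object-user-storage-loki \
    -o jsonpath="{.data.${field}}" | base64 -d
}

# Usage (prints secrets to the terminal, so avoid shared sessions):
#   rook_loki_credential AccessKey
#   rook_loki_credential SecretKey
```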

Step 2: Update 1Password

Update the 1Password entry with current credentials:

  • Item name: loki-s3-config
  • Field access_key: (Value from Step 1)
  • Field secret_key: (Value from Step 1)

Step 3: Force External Secret Refresh

# Force immediate refresh of External Secret
kubectl annotate externalsecret loki-s3-credentials \
  -n monitoring force-sync="$(date +%s)" --overwrite
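
Before restarting Loki, it can help to confirm that the External Secret actually synced. The sketch below assumes the ExternalSecret exposes a Ready condition in its status, as External Secrets Operator normally does; the function name externalsecret_ready is hypothetical.

```shell
# Hypothetical helper: succeed only if the ExternalSecret reports Ready=True.
externalsecret_ready() {
  status=$(kubectl get externalsecret loki-s3-credentials -n monitoring \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  [ "$status" = "True" ]
}

# Usage:
#   if externalsecret_ready; then echo "synced"; else echo "not ready yet"; fi
```

Ready=True only indicates that the most recent sync succeeded. Re-running the access key comparison from Diagnosis confirms that the new value has actually arrived in the monitoring namespace.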

Step 4: Restart Loki Pod

# Delete pod to pick up new credentials
kubectl delete pod loki-0 -n monitoring

Step 5: Verify Resolution

# Check pod status (should be 2/2 ready)
kubectl get pods -n monitoring loki-0

# Verify no more authentication errors in logs
kubectl logs -n monitoring loki-0 -c loki --tail=20
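
Scanning the log tail by eye is easy to get wrong, so the check can be reduced to a count. A minimal sketch, assuming the error string InvalidAccessKeyId from the Symptoms section is the one to look for; the function name count_auth_errors is hypothetical.

```shell
# Hypothetical helper: count log lines on stdin that mention InvalidAccessKeyId.
count_auth_errors() {
  # grep -c prints the number of matching lines; it exits non-zero when that
  # number is 0, so "|| true" keeps the helper usable under "set -e".
  grep -c 'InvalidAccessKeyId' || true
}

# Usage (only looks at logs from the last five minutes):
#   kubectl logs -n monitoring loki-0 -c loki --since=5m | count_auth_errors
```

A count of 0 a few minutes after the restart, together with a 2/2 ready pod, suggests the new credentials are in use.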

Prevention

  1. Monitoring: Set up alerts for Loki readiness failures
  2. Automation: Consider automating credential sync between Rook and 1Password
  3. Documentation: Keep this procedure updated for future occurrences
  4. Regular Checks: Include S3 authentication in health checks

History

  • Initial Discovery: Authentication failures detected via pod status monitoring
  • Root Cause: Credential mismatch between Rook Ceph and 1Password
  • Resolution: Manual 1Password update process documented