Loki S3 Authentication Failures
Status: Active Issue
Category: Authentication
Affected Component: Loki (Logging Stack)
Severity: High - Log ingestion failures
Symptoms
- Loki pod shows 1/2 ready status
- Continuous InvalidAccessKeyId errors in logs with HTTP 403 status
- Log messages: "failed to flush chunks: store put chunk: InvalidAccessKeyId"
- Readiness probe failures on the /ready endpoint
- High restart count on the loki container
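To confirm these symptoms in one pass, a minimal triage helper could look like the sketch below. It assumes the pod, namespace, and container names used elsewhere in this runbook (loki-0, monitoring, loki). The function name loki_triage is hypothetical, and the 500-line log window is an arbitrary choice.

```shell
# Hypothetical triage helper; defaults match the names used in this runbook.
loki_triage() {
  local ns="${1:-monitoring}" pod="${2:-loki-0}"
  local ready restarts errors

  # Per-container ready flags, e.g. "true false" for a pod showing 1/2
  ready=$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].ready}')

  # Restart count of the loki container only
  restarts=$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[?(@.name=="loki")].restartCount}')

  # S3 auth failures in the last 500 log lines (grep -c prints 0 when nothing matches)
  errors=$(kubectl logs "$pod" -n "$ns" -c loki --tail=500 | grep -c 'InvalidAccessKeyId')

  echo "ready=${ready} restarts=${restarts} invalid_access_key_errors=${errors}"
}
```

Output showing "ready=true false" together with a non-zero error count matches the failure pattern described above.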
Root Cause
The S3 credentials stored in 1Password (and pulled by the External Secrets Operator) no longer match the credentials that Rook Ceph RadosGW currently issues. The mismatch develops as follows:
- Ceph RadosGW regenerates the user's credentials
- The 1Password entry is not updated with the new credentials
- The External Secret keeps pulling the outdated credentials from 1Password
- Loki can no longer authenticate with the S3 storage backend and receives HTTP 403 InvalidAccessKeyId errors
Technical Details
Components Involved:
- Rook Ceph: CephObjectStoreUser/loki in the storage namespace
- Secret: rook-ceph-object-user-storage-loki in the storage namespace (contains the current credentials)
- 1Password: entry loki-s3-config (may contain outdated credentials)
- External Secret: loki-s3-credentials in the monitoring namespace
- Loki: uses the credentials from the External Secret for its S3 backend
Authentication Flow:
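- Rook creates CephObjectStoreUser/loki
- Rook generates the rook-ceph-object-user-storage-loki secret with the current S3 credentials
- External Secret pulls credentials from 1Password
- Loki uses the External Secret credentials to access S3
Nothing in this flow copies the Rook-generated secret into 1Password, so the two sources drift apart whenever RadosGW regenerates the credentials.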
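The chain above can be inspected one hop at a time. This sketch assumes the resource names listed under Components Involved. It reads standard status fields: .status.phase on the CephObjectStoreUser, and .spec.refreshInterval, .status.refreshTime and the Ready condition on the ExternalSecret. The function name show_credential_chain is illustrative.

```shell
# Hypothetical helper: print the state of each hop in the credential chain.
show_credential_chain() {
  local phase secret interval last ready

  # Hop 1: the Rook object store user should report phase Ready
  phase=$(kubectl get cephobjectstoreuser loki -n storage \
    -o jsonpath='{.status.phase}')
  echo "CephObjectStoreUser phase: ${phase}"

  # Hop 2: the Rook-generated secret holding the live RadosGW keys
  secret=$(kubectl get secret rook-ceph-object-user-storage-loki -n storage -o name)
  echo "Rook secret: ${secret:-MISSING}"

  # Hop 3: how often ESO re-reads 1Password, when it last did, and whether it succeeded
  interval=$(kubectl get externalsecret loki-s3-credentials -n monitoring \
    -o jsonpath='{.spec.refreshInterval}')
  last=$(kubectl get externalsecret loki-s3-credentials -n monitoring \
    -o jsonpath='{.status.refreshTime}')
  ready=$(kubectl get externalsecret loki-s3-credentials -n monitoring \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  echo "ExternalSecret refreshInterval: ${interval}"
  echo "ExternalSecret last refresh: ${last}"
  echo "ExternalSecret Ready condition: ${ready}"
}
```

A long refreshInterval means that even a corrected 1Password entry takes that long to reach Loki unless a sync is forced.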
Diagnosis
Check if credentials match between sources:
# Get current Ceph credentials (source of truth)
kubectl get secret -n storage rook-ceph-object-user-storage-loki \
  -o jsonpath='{.data.AccessKey}' | base64 -d; echo
# Get credentials from External Secret (may be outdated)
kubectl get secret -n monitoring loki-s3-credentials \
  -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d; echo
If these don't match, the 1Password entry needs updating.
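The commands above only compare access keys, but the secret key can drift too. The sketch below checks both without printing either value. It assumes the ExternalSecret stores the secret key under AWS_SECRET_ACCESS_KEY (not confirmed by this runbook) and that the Rook secret uses its usual SecretKey field. The function names secret_field and check_loki_s3_creds are hypothetical.

```shell
# Print one decoded field of a Kubernetes secret: $1=namespace $2=secret $3=data key
secret_field() {
  kubectl get secret -n "$1" "$2" -o jsonpath="{.data.$3}" | base64 -d
}

# Compare both halves of the key pair; exit status 1 on any mismatch.
# Assumption: loki-s3-credentials uses AWS_SECRET_ACCESS_KEY for the secret key.
check_loki_s3_creds() {
  local status=0
  if [ "$(secret_field storage rook-ceph-object-user-storage-loki AccessKey)" != \
       "$(secret_field monitoring loki-s3-credentials AWS_ACCESS_KEY_ID)" ]; then
    echo "MISMATCH: access key"; status=1
  fi
  if [ "$(secret_field storage rook-ceph-object-user-storage-loki SecretKey)" != \
       "$(secret_field monitoring loki-s3-credentials AWS_SECRET_ACCESS_KEY)" ]; then
    echo "MISMATCH: secret key"; status=1
  fi
  [ "$status" -eq 0 ] && echo "OK: credentials match"
  return "$status"
}
```

A non-zero exit status means the 1Password entry needs updating.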
Resolution
Step 1: Get Current Credentials
Extract current credentials from Rook Ceph:
# Run the bucket creation script to display current credentials
./scripts/create-loki-s3-bucket.ts
This script will display the correct access key and secret key that should be in 1Password.
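If the script is unavailable, the same values can likely be read directly from the Rook-generated secret, which the Diagnosis section treats as the source of truth. This sketch assumes the secret's standard AccessKey and SecretKey fields. The name rook_loki_creds is hypothetical.

```shell
# Hypothetical helper: print the current key pair from the Rook-generated secret.
# Warning: this writes the secret key to stdout; run it only in a trusted shell.
rook_loki_creds() {
  local ns=storage secret=rook-ceph-object-user-storage-loki
  echo "access_key: $(kubectl get secret -n "$ns" "$secret" \
    -o jsonpath='{.data.AccessKey}' | base64 -d)"
  echo "secret_key: $(kubectl get secret -n "$ns" "$secret" \
    -o jsonpath='{.data.SecretKey}' | base64 -d)"
}
```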
Step 2: Update 1Password
Update the 1Password entry with current credentials:
- Item name: loki-s3-config
- Field access_key: value from Step 1
- Field secret_key: value from Step 1
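The same update can be scripted with the 1Password CLI (op, version 2). This is a sketch under several assumptions:
- op is already signed in.
- The fields sit at the top level of the item; prefix them with a section name otherwise.
- OP_VAULT is a variable you set to the vault that your External Secrets 1Password store reads from.

The name update_loki_item is hypothetical.

```shell
# Hypothetical helper: copy Rook's current keys into the loki-s3-config item.
update_loki_item() {
  # OP_VAULT is an assumption: the vault backing the ExternalSecret store
  local vault="${OP_VAULT:?set OP_VAULT to the vault that holds loki-s3-config}"
  local ak sk
  ak=$(kubectl get secret -n storage rook-ceph-object-user-storage-loki \
    -o jsonpath='{.data.AccessKey}' | base64 -d)
  sk=$(kubectl get secret -n storage rook-ceph-object-user-storage-loki \
    -o jsonpath='{.data.SecretKey}' | base64 -d)
  # Refuse to overwrite 1Password with empty values if the lookup failed
  if [ -z "$ak" ] || [ -z "$sk" ]; then
    echo "could not read Rook credentials" >&2
    return 1
  fi
  # op assignment syntax is field=value
  op item edit loki-s3-config --vault "$vault" \
    "access_key=${ak}" "secret_key=${sk}" >/dev/null
}
```

Passing values as command-line arguments exposes them briefly in the local process list. On shared machines, prefer editing the item in the 1Password app.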
Step 3: Force External Secret Refresh
# Force immediate refresh of External Secret
kubectl annotate externalsecret loki-s3-credentials \
  -n monitoring force-sync="$(date +%s)" --overwrite
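Because the next step restarts Loki, it helps to confirm that the sync actually happened first. This minimal sketch applies the same force-sync annotation, then polls the ExternalSecret's .status.refreshTime until it changes. The function name and the 2-second poll interval are illustrative choices.

```shell
# Hypothetical helper: force an ESO sync and wait (default 60 s) for it to land.
force_sync_and_wait() {
  local ns=monitoring es=loki-s3-credentials timeout="${1:-60}" waited=0
  local before now
  # Remember the last refresh time before triggering a new sync
  before=$(kubectl get externalsecret "$es" -n "$ns" -o jsonpath='{.status.refreshTime}')
  kubectl annotate externalsecret "$es" -n "$ns" force-sync="$(date +%s)" --overwrite >/dev/null
  while [ "$waited" -lt "$timeout" ]; do
    now=$(kubectl get externalsecret "$es" -n "$ns" -o jsonpath='{.status.refreshTime}')
    # A newer refreshTime means ESO re-read 1Password after the annotation
    if [ -n "$now" ] && [ "$now" != "$before" ]; then
      echo "refreshed at ${now}"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "no refresh within ${timeout}s" >&2
  return 1
}
```

A changed refreshTime shows only that ESO re-read 1Password, not that the values are now correct. Rerun the Diagnosis comparison afterwards.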
Step 4: Restart Loki Pod
# Delete the pod so it restarts with the new credentials
kubectl delete pod loki-0 -n monitoring
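As an alternative to deleting the pod by hand, Loki can be restarted through its controller, waiting until the rollout completes. This sketch assumes, based only on the pod name, that loki-0 is owned by a StatefulSet named loki. Check the owner first with: kubectl get pod loki-0 -n monitoring -o jsonpath='{.metadata.ownerReferences[0].name}'. The name restart_loki is hypothetical.

```shell
# Hypothetical helper: rolling restart of the (assumed) loki StatefulSet,
# blocking until the new pod is ready or the 5-minute timeout expires.
restart_loki() {
  local ns=monitoring sts="${1:-loki}"
  kubectl rollout restart statefulset/"$sts" -n "$ns" &&
    kubectl rollout status statefulset/"$sts" -n "$ns" --timeout=5m
}
```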
Step 5: Verify Resolution
# Check pod status (should be 2/2 ready)
kubectl get pods -n monitoring loki-0
# Verify no more authentication errors in logs
kubectl logs -n monitoring loki-0 -c loki --tail=20
Prevention
- Monitoring: Set up alerts for Loki readiness failures
- Automation: Consider automating credential sync between Rook and 1Password
- Documentation: Keep this procedure updated for future occurrences
- Regular Checks: Include S3 authentication in health checks
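The Regular Checks and Automation items above could start from a scheduled drift check. It compares Rook's live keys with 1Password directly, which catches a RadosGW credential change before Loki starts failing. Assumptions:
- op can authenticate non-interactively, for example through a 1Password service account.
- OP_VAULT names the vault that holds loki-s3-config.
- The field names match Step 2.

The names rook_field and loki_creds_drift are hypothetical.

```shell
# Print one decoded field from the Rook-generated secret
rook_field() {
  kubectl get secret -n storage rook-ceph-object-user-storage-loki \
    -o jsonpath="{.data.$1}" | base64 -d
}

# Exit non-zero if 1Password disagrees with Rook; never prints the values
loki_creds_drift() {
  local vault="${OP_VAULT:?set OP_VAULT to the vault holding loki-s3-config}"
  local rc=0
  # op read resolves op://<vault>/<item>/<field> secret references
  if [ "$(rook_field AccessKey)" != "$(op read "op://${vault}/loki-s3-config/access_key")" ]; then
    echo "DRIFT: access_key differs from Rook" >&2; rc=1
  fi
  if [ "$(rook_field SecretKey)" != "$(op read "op://${vault}/loki-s3-config/secret_key")" ]; then
    echo "DRIFT: secret_key differs from Rook" >&2; rc=1
  fi
  return "$rc"
}
```

Because the check reports drift through its exit status, a cron job or CI check can alert on it directly. The fix itself remains the manual procedure above.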
Related Documentation
History
- Initial Discovery: Authentication failures detected via pod status monitoring
- Root Cause: Credential mismatch between Rook Ceph and 1Password
- Resolution: Manual 1Password update process documented