S3 Integration
The Anton cluster's Loki logging system stores its chunks and index in S3-compatible object storage provided by Rook-Ceph. This gives the logs durable, scalable storage and keeps long-term retention inexpensive.
Architecture
Ceph RGW Configuration
RadosGW Deployment
The Rook-Ceph operator deploys RadosGW (RADOS Gateway) to provide an S3-compatible API:
# CephObjectStore configuration
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: ceph-objectstore
  namespace: storage
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: host
    replicated:
      size: 3
  gateway:
    instances: 2
    port: 80
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
Service Endpoint
# Service exposing RGW
apiVersion: v1
kind: Service
metadata:
  name: s3-gateway-service
  namespace: storage
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: rook-ceph-rgw
    ceph_daemon_id: ceph-objectstore
S3 Bucket Configuration
Bucket Creation
# Create S3 buckets for Loki
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  s3cmd mb s3://loki-chunks
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  s3cmd mb s3://loki-ruler
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  s3cmd mb s3://loki-admin
Bucket Policies
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam:::user/loki"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::loki-chunks/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam:::user/loki"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::loki-chunks"
    }
  ]
}
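The policy above names only loki-chunks, while the bucket creation step also makes loki-ruler and loki-admin. As a minimal sketch, assuming the same `loki` user should get the same permissions on each bucket, the statement pair can be rendered per bucket. The function name `render_loki_policy` is illustrative and not part of the cluster's tooling.

```shell
# Hypothetical helper: print the bucket policy shown above for a given bucket name.
render_loki_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam:::user/loki" },
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    },
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam:::user/loki" },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}"
    }
  ]
}
EOF
}

# Example: print the policy for the ruler bucket.
render_loki_policy loki-ruler
```

The output can be redirected to a file and applied with `s3cmd setpolicy <file> s3://loki-ruler` from wherever s3cmd is configured for the gateway.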
Loki S3 Configuration
Storage Configuration
# Loki storage configuration for S3
storage_config:
  aws:
    # S3 endpoint and bucket
    endpoint: http://s3-gateway-service.storage.svc.cluster.local:80
    region: us-east-1
    bucketnames: loki-chunks
    s3forcepathstyle: true
    # Authentication (${...} references are expanded only when Loki runs with -config.expand-env=true)
    access_key_id: ${S3_ACCESS_KEY}
    secret_access_key: ${S3_SECRET_KEY}
    # Transport and encryption
    insecure: false
    sse_encryption: false
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
      insecure_skip_verify: false
  # BoltDB shipper for index
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    shared_store: aws
    cache_ttl: 24h
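The storage configuration reads its credentials from `${S3_ACCESS_KEY}` and `${S3_SECRET_KEY}`. Loki substitutes such references only when it is started with `-config.expand-env=true`, and an unset variable is easy to miss. The following sketch shows one way to guard against that; `check_s3_env` is a hypothetical helper, not something Loki ships.

```shell
# Hypothetical pre-flight check: fail if a credential variable referenced by
# the Loki configuration is unset or empty in the current environment.
check_s3_env() {
  missing=0
  for name in S3_ACCESS_KEY S3_SECRET_KEY; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A wrapper script could call `check_s3_env` before starting Loki, so that a missing credential shows up as a clear startup error rather than as an S3 authentication failure.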
Chunk Configuration
# Chunk storage settings optimized for S3
chunk_store_config:
  max_look_back_period: 0s
# Compactor configuration for S3 optimization
compactor:
  working_directory: /loki/compactor
  shared_store: aws
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
Authentication Setup
S3 User Creation
# Create Loki S3 user in Ceph (the JSON output already includes an initial S3 key pair)
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  radosgw-admin user create --uid=loki --display-name="Loki Log Storage"
# Generate an additional S3 key pair for the user
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  radosgw-admin key create --uid=loki --key-type=s3
Secret Management
# Kubernetes secret for S3 credentials
apiVersion: v1
kind: Secret
metadata:
  name: loki-s3-secret
  namespace: monitoring
type: Opaque
stringData:
  access-key-id: YOUR_S3_ACCESS_KEY
  secret-access-key: YOUR_S3_SECRET_KEY
External Secrets Integration
# External Secret for 1Password integration
apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  name: 1password-secret-store
  namespace: monitoring
spec:
  provider:
    onepassword:
      connectHost: http://onepassword-connect:8080
      vaults:
        secrets: 1
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: loki-s3-credentials
  namespace: monitoring
spec:
  secretStoreRef:
    name: 1password-secret-store
    kind: SecretStore
  target:
    name: loki-s3-secret
  data:
    - secretKey: access-key-id
      remoteRef:
        key: loki-s3-access-key-id
    - secretKey: secret-access-key
      remoteRef:
        key: loki-s3-secret-key
Performance Optimization
Chunk Optimization
# Optimize chunk size for S3 storage
ingester:
  chunk_target_size: 1572864 # 1.5 MiB (1.5 * 1024 * 1024 bytes), suited to S3
  chunk_encoding: snappy     # Fast compression
  chunk_idle_period: 30m     # Balance upload frequency
  max_chunk_age: 2h          # Maximum chunk age before flush
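`chunk_target_size` is given in bytes, so the 1.5 MB figure has to be converted by hand. A small sketch of that arithmetic follows; `mib_to_bytes` is an illustrative helper that takes the size as a fraction, because shell arithmetic is integer-only.

```shell
# Convert a size in MiB, given as the fraction NUM/DEN, to bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 / $2 ))
}

mib_to_bytes 3 2   # 3/2 MiB = 1.5 MiB, prints 1572864
```

The same helper gives the byte value for any other target size under consideration, for example `mib_to_bytes 2 1` for 2 MiB.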
Upload Configuration
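# S3 upload optimization
# Note: the keys accepted under aws and http_config differ between Loki releases;
# check each key below against the configuration reference for the deployed version.
storage_config:
  aws:
    # Multipart upload settings
    s3:
      multipart_size: 64MB       # Chunk size for multipart uploads
      max_concurrent_uploads: 10 # Concurrent upload streams
    # HTTP client tuning
    http_config:
      timeout: 30s
      keep_alive: 30s
      max_idle_conns: 100
      max_idle_conns_per_host: 10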
Caching Strategy
# Enable caching for S3 objects
chunk_store_config:
  cache_lookups_older_than: 0s
  chunk_cache_config:
    enable_fifocache: true
    fifocache:
      max_size_items: 2048
      validity: 24h
  write_dedupe_cache_config:
    enable_fifocache: true
    fifocache:
      max_size_items: 1024
      validity: 24h
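Because `max_size_items` counts entries rather than bytes, the memory the chunk cache may use depends on chunk size. The sketch below gives a rough upper bound, under the assumption that every cached item is one chunk at the full `chunk_target_size` from the Chunk Optimization settings. Real chunks are often smaller, so treat the result as a ceiling, not a measurement.

```shell
# Rough upper bound on chunk cache memory, assuming every cached item is one
# full-size chunk. Both inputs are copied from the configuration in this document.
max_size_items=2048
chunk_target_size=1572864   # bytes

cache_mib=$(( max_size_items * chunk_target_size / 1024 / 1024 ))
echo "chunk cache upper bound: ${cache_mib} MiB"
```

With these values the bound is 3072 MiB, which is worth comparing against the memory limit of the Loki pods that hold the cache.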
Monitoring and Operations
Storage Monitoring
# Check S3 bucket usage
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  s3cmd du s3://loki-chunks --human-readable
# List recent uploads ("fake" is the tenant ID Loki uses when multi-tenancy is disabled)
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  s3cmd ls s3://loki-chunks/fake/ | tail -20
# Check bucket policy
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  s3cmd info s3://loki-chunks
Performance Metrics
# Monitor S3 operation metrics
# (assumes Loki's HTTP port 3100 is reachable on localhost, for example through kubectl port-forward)
curl -s "http://localhost:3100/metrics" | grep -E "(s3|aws)"
# Check chunk upload rates
curl -s "http://localhost:3100/metrics" | grep loki_ingester_chunks_flushed_total
# Monitor S3 errors
kubectl logs -n monitoring -l app.kubernetes.io/name=loki | grep -i "s3\|aws"
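Grepping for `loki_ingester_chunks_flushed_total` prints one line per label set, which makes the total hard to read at a glance. One possible way to add the series up is sketched below. `sum_metric` is a hypothetical helper that assumes the plain Prometheus text format without timestamps, so the value is the last field on each line.

```shell
# Hypothetical helper: sum every series of one counter from Prometheus text
# exposition read on stdin (label sets are ignored, comment lines are skipped).
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }
    {
      metric = $1
      sub(/\{.*/, "", metric)           # strip the label set, if any
      if (metric == name) total += $NF  # the value is the last field
    }
    END { printf "%d\n", total }
  '
}
```

For example, `curl -s http://localhost:3100/metrics | sum_metric loki_ingester_chunks_flushed_total` prints a single total; running it twice a known interval apart gives an approximate flush rate.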
Health Checks
# Test S3 connectivity
kubectl exec -n monitoring loki-ingester-0 -- \
  curl -I http://s3-gateway-service.storage.svc.cluster.local
# Verify bucket access
kubectl exec -n storage -c toolbox deployment/rook-ceph-tools -- \
  s3cmd ls s3://loki-chunks/
# Check RadosGW status
kubectl get pods -n storage -l app=rook-ceph-rgw
Troubleshooting
Common Issues
# S3 authentication errors
kubectl logs -n monitoring -l app.kubernetes.io/name=loki | grep -i "AccessDenied\|InvalidAccessKey"
# Connection timeouts
kubectl logs -n monitoring -l app.kubernetes.io/name=loki | grep -i "timeout\|connection"
# Upload failures
kubectl logs -n monitoring -l app.kubernetes.io/name=loki | grep -i "upload\|put.*failed"
# Check RGW logs for server-side issues
kubectl logs -n storage -l app=rook-ceph-rgw
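When several of these failure modes show up together, a per-category count can indicate which one dominates. The sketch below reuses the three Loki log patterns from the commands above in a single pass. `summarize_s3_errors` is an illustrative name, and a line that matches more than one pattern is counted in each category.

```shell
# Hypothetical helper: read Loki log lines on stdin and print how many match
# each of the error patterns used in the commands above (case-insensitive).
summarize_s3_errors() {
  awk '
    { line = tolower($0) }
    line ~ /accessdenied|invalidaccesskey/ { auth++ }
    line ~ /timeout|connection/            { net++ }
    line ~ /upload|put.*failed/            { upload++ }
    END { printf "auth=%d network=%d upload=%d\n", auth, net, upload }
  '
}
```

Usage would look like `kubectl logs -n monitoring -l app.kubernetes.io/name=loki | summarize_s3_errors`.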
Recovery Procedures
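# Restart Loki components if S3 issues persist
kubectl rollout restart deployment/loki-gateway -n monitoring
kubectl rollout restart deployment/loki-distributor -n monitoring
# Clear local cache if needed (run through sh so the glob expands inside the container)
kubectl exec -n monitoring loki-ingester-0 -- \
  sh -c 'rm -rf /loki/boltdb-shipper-cache/*'
# Test bucket connectivity (the amazon/aws-cli image's entrypoint is already `aws`;
# credentials are taken from S3_ACCESS_KEY and S3_SECRET_KEY in the local shell)
kubectl run s3-test --image=amazon/aws-cli --rm -it --restart=Never \
  --env="AWS_ACCESS_KEY_ID=${S3_ACCESS_KEY}" \
  --env="AWS_SECRET_ACCESS_KEY=${S3_SECRET_KEY}" -- \
  s3 ls s3://loki-chunks --endpoint-url=http://s3-gateway-service.storage.svc.cluster.local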
Security Considerations
Access Control
- Minimal Permissions: Grant only required S3 permissions
- Secret Management: Use External Secrets for credential management
- Network Policies: Restrict S3 access to authorized pods
Data Protection
- Encryption: Enable S3 server-side encryption if required
- Retention Policies: Implement appropriate data retention
- Backup: Regular backup of critical configuration
Storing Loki's chunks and index in the existing Rook-Ceph object store avoids running a separate storage system. The three-way replicated pools provide durability, and the compactor's retention settings keep storage growth bounded.