# Storage

The Anton cluster uses Rook-Ceph to provide distributed, software-defined storage that delivers high availability, scalability, and performance for all stateful workloads.
## Storage Classes

The cluster provides three storage classes optimized for different use cases:
### Block Storage (ceph-block) - Default

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block
provisioner: storage.rbd.csi.ceph.com
parameters:
  clusterID: storage
  pool: ceph-blockpool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: storage
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
**Use Cases:**

- Database storage (PostgreSQL, MongoDB)
- Application data volumes
- Prometheus metrics storage
- High-performance workloads
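As a minimal sketch, a claim against this class looks like the following (the claim name and size are illustrative, not taken from the cluster):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data          # illustrative name
spec:
  accessModes:
    - ReadWriteOnce            # RBD volumes attach to a single node
  storageClassName: ceph-block
  resources:
    requests:
      storage: 10Gi
```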
### Filesystem Storage (ceph-filesystem)

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-filesystem
provisioner: storage.cephfs.csi.ceph.com
parameters:
  clusterID: storage
  fsName: ceph-filesystem
  pool: ceph-filesystem-data0
  # CSI secrets (Rook's default CephFS secret names)
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: storage
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
**Use Cases:**

- Shared storage across multiple pods
- Content management systems
- Development environments
- ReadWriteMany access patterns
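For shared access, a hypothetical ReadWriteMany claim against this class might look like:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content         # illustrative name
spec:
  accessModes:
    - ReadWriteMany            # CephFS supports concurrent writers across nodes
  storageClassName: ceph-filesystem
  resources:
    requests:
      storage: 20Gi
```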
### Object Storage (ceph-bucket)

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-bucket
provisioner: storage.ceph.rook.io/bucket   # <operator-namespace>.ceph.rook.io/bucket
parameters:
  objectStoreName: ceph-objectstore
  objectStoreNamespace: storage
reclaimPolicy: Delete
```

Note that bucket names are supplied by the ObjectBucketClaim, not the StorageClass.
**Use Cases:**

- Container registry backends
- Log storage (Loki chunks)
- Backup storage
- Static website content
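Buckets are requested through an ObjectBucketClaim rather than a PVC; a sketch (claim name illustrative):

```yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: loki-chunks            # illustrative name
spec:
  generateBucketName: loki-chunks   # Rook appends a random suffix
  storageClassName: ceph-bucket
```

Rook provisions the bucket and writes the S3 endpoint into a ConfigMap and the access credentials into a Secret, both named after the claim, ready to mount into the consuming pod.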
## Cluster Configuration

### Physical Layout

- **Total Capacity:** 6x 1TB NVMe SSDs (6TB raw)
- **Replication:** 3-way replication for all data
- **Usable Capacity:** ~2TB after replication and overhead
- **Failure Domain:** Host-level (the cluster can survive a single node failure)
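The replication and failure-domain settings above correspond to a pool spec along these lines (a sketch; the cluster's actual CephBlockPool manifest may differ):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ceph-blockpool
  namespace: storage
spec:
  failureDomain: host    # place each replica on a distinct node
  replicated:
    size: 3              # 3-way replication: 6TB raw -> ~2TB usable
```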
### Ceph Cluster Health

```bash
# Check overall cluster health
kubectl -n storage exec deploy/rook-ceph-tools -- ceph status

# View cluster capacity and usage
kubectl -n storage exec deploy/rook-ceph-tools -- ceph df

# Check OSD status
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd status

# View placement group health
kubectl -n storage exec deploy/rook-ceph-tools -- ceph pg stat
```
## Performance Characteristics

### Throughput Expectations

- **Sequential Read:** ~400-500 MB/s per OSD
- **Sequential Write:** ~200-300 MB/s per OSD
- **Random IOPS:** 10K-15K IOPS per OSD
- **Aggregate:** ~2.4-3.0 GB/s read, ~1.2-1.8 GB/s write (theoretical, across 6 OSDs)

### Latency Profile

- **Block Storage:** 1-5ms typical latency
- **Filesystem:** 2-8ms typical latency
- **Object Storage:** 10-50ms typical latency
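To validate these figures against a live volume, one common approach is to run fio inside a pod that mounts a ceph-block PVC. A sketch, assuming fio is available in the image and the volume is mounted at /data:

```bash
# 4K random mixed I/O for 60s; --direct=1 bypasses the page cache
fio --name=randrw --filename=/data/fio-test --size=1G \
    --rw=randrw --rwmixread=70 --bs=4k --iodepth=16 \
    --runtime=60 --time_based --direct=1 --group_reporting
```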
## Resource Allocation

### Current Storage Usage

```bash
# Check PVC usage across namespaces
kubectl get pvc -A

# View requested storage by application
kubectl get pvc -A -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,SIZE:.spec.resources.requests.storage,STORAGECLASS:.spec.storageClassName"

# Monitor actual RBD volume usage
kubectl -n storage exec deploy/rook-ceph-tools -- rbd du -p ceph-blockpool
```
### Capacity Planning

```bash
# Check available space
kubectl -n storage exec deploy/rook-ceph-tools -- ceph df detail

# View OSD utilization
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd df

# Snapshot per-pool usage (collect periodically to track growth trends)
kubectl -n storage exec deploy/rook-ceph-tools -- ceph df -f json-pretty
```
## Management Commands

### Cluster Operations

```bash
# Access the Ceph toolbox
kubectl -n storage exec -it deploy/rook-ceph-tools -- bash

# View cluster topology
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd tree

# Check cluster warnings
kubectl -n storage exec deploy/rook-ceph-tools -- ceph health detail

# Watch cluster status and events in real time
kubectl -n storage exec deploy/rook-ceph-tools -- ceph -w
```
### Storage Administration

```bash
# List all pools
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd pool ls detail

# Check pool statistics
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd pool stats

# View RBD images
kubectl -n storage exec deploy/rook-ceph-tools -- rbd ls -p ceph-blockpool

# Check image details
kubectl -n storage exec deploy/rook-ceph-tools -- rbd info -p ceph-blockpool pvc-<uuid>
```
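To find the RBD image backing a given PVC, the image name is recorded on the bound PV's CSI attributes; a sketch, where data-pvc and the namespace are placeholders:

```bash
# Resolve PVC -> PV -> backing RBD image name
kubectl get pv "$(kubectl get pvc data-pvc -n <namespace> -o jsonpath='{.spec.volumeName}')" \
  -o jsonpath='{.spec.csi.volumeAttributes.imageName}'
```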
### Performance Monitoring

```bash
# Check per-OSD commit/apply latency
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd perf

# Check recent slow operations on an OSD (example: osd.0)
kubectl -n storage exec deploy/rook-ceph-tools -- ceph tell osd.0 dump_historic_slow_ops

# View CPU/memory usage of the storage pods
kubectl top pods -n storage

# List the performance counters each OSD exposes
kubectl -n storage exec deploy/rook-ceph-tools -- ceph tell osd.* perf schema
```
## Troubleshooting

### Health Diagnostics

```bash
# Comprehensive health check
kubectl -n storage exec deploy/rook-ceph-tools -- ceph health detail

# Check for stuck PGs
kubectl -n storage exec deploy/rook-ceph-tools -- ceph pg dump_stuck

# View monitor logs
kubectl logs -n storage -l app=rook-ceph-mon

# Check OSD logs
kubectl logs -n storage -l app=rook-ceph-osd
```
### Recovery Operations

```bash
# Restart a stuck OSD (example: osd.0)
kubectl -n storage delete pod -l app=rook-ceph-osd,ceph_daemon_id=0

# Force a scrub of a specific placement group
kubectl -n storage exec deploy/rook-ceph-tools -- ceph pg scrub <pg-id>

# Check filesystem status
kubectl -n storage exec deploy/rook-ceph-tools -- ceph fs status

# Mark a damaged MDS rank as repaired (only after investigating the damage)
kubectl -n storage exec deploy/rook-ceph-tools -- ceph mds repaired ceph-filesystem:0
```
## Best Practices

### Volume Management

- **Size Planning:** Plan for 3x replication overhead
- **Performance:** Use block storage for databases
- **Backup:** Implement regular snapshot policies (see the example after this list)
- **Monitoring:** Set up alerts for capacity thresholds
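For the snapshot policy above, a single on-demand snapshot looks like the following sketch; the class name csi-rbdplugin-snapclass matches Rook's example manifests and may differ on this cluster:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snap                           # illustrative name
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass   # assumed class name
  source:
    persistentVolumeClaimName: postgres-data         # illustrative PVC
```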
### Maintenance

- **Regular Health Checks:** Daily cluster health monitoring
- **Capacity Monitoring:** Track usage growth trends (a sample alert rule follows)
- **Performance Baselines:** Establish performance benchmarks
- **Update Strategy:** Plan Ceph version upgrades carefully
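As a sketch of a capacity alert, assuming Prometheus scrapes the Ceph mgr's metrics endpoint (threshold and duration are illustrative):

```yaml
# Prometheus rule snippet using the mgr-exported cluster capacity metrics
- alert: CephCapacityHigh
  expr: ceph_cluster_total_used_bytes / ceph_cluster_total_bytes > 0.80
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Ceph cluster is over 80% full"
```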
Rook-Ceph gives the Anton cluster enterprise-grade distributed storage: durable data through 3-way replication, availability across single-node failures, and consistent performance for every stateful workload.