Storage

The Anton cluster uses Rook-Ceph to provide distributed, software-defined storage that delivers high availability, scalability, and performance for all stateful workloads.

Architecture Overview

Storage Classes

The cluster provides three storage classes optimized for different use cases:

Block Storage (ceph-block) - Default

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block
provisioner: storage.rbd.csi.ceph.com
parameters:
  clusterID: storage
  pool: ceph-blockpool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: storage
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate

Use Cases:

  • Database storage (PostgreSQL, MongoDB)
  • Application data volumes
  • Prometheus metrics storage
  • High-performance workloads
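
For example, a database PVC bound to the default block class can be declared as follows (the claim name, namespace, and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data        # illustrative name
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce          # RBD volumes are single-writer block devices
  storageClassName: ceph-block
  resources:
    requests:
      storage: 10Gi

Because ceph-block is the default class, storageClassName can be omitted; stating it keeps the intent explicit.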

Filesystem Storage (ceph-filesystem)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-filesystem
provisioner: storage.cephfs.csi.ceph.com
parameters:
  clusterID: storage
  fsName: ceph-filesystem
  pool: ceph-filesystem-data0
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate

Use Cases:

  • Shared storage across multiple pods
  • Content management systems
  • Development environments
  • ReadWriteMany access patterns
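
A shared claim that several pods mount concurrently is written the same way, only with ReadWriteMany access (name, namespace, and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content       # placeholder name
  namespace: default
spec:
  accessModes:
    - ReadWriteMany          # CephFS supports concurrent mounts from many pods
  storageClassName: ceph-filesystem
  resources:
    requests:
      storage: 20Gi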

Object Storage (ceph-bucket)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-bucket
provisioner: rook-ceph.rook.io/bucket
parameters:
  objectStoreName: ceph-objectstore
  bucketName: app-bucket
reclaimPolicy: Delete

Use Cases:

  • Container registry backends
  • Log storage (Loki chunks)
  • Backup storage
  • Static website content
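
Buckets are requested through an ObjectBucketClaim rather than a PVC. A minimal claim against this class might look like the following (the claim name and namespace are illustrative; the bucket name mirrors the bucketName parameter above):

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: app-bucket           # illustrative claim name
  namespace: default
spec:
  bucketName: app-bucket     # matches the bucketName parameter in the class
  storageClassName: ceph-bucket

Rook provisions the bucket and publishes its endpoint and S3 credentials in a ConfigMap and Secret named after the claim.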

Cluster Configuration

Physical Layout

  • Total Capacity: 6x 1TB NVMe SSDs (6TB raw)
  • Replication: 3-way replication for all data
  • Usable Capacity: ~2TB after replication and overhead
  • Failure Domain: Host-level (can survive single node failure)
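
As a rough check: 6 × 1TB raw ÷ 3 replicas ≈ 2TB, and keeping utilization below Ceph's default nearfull ratio of 0.85 leaves roughly 1.7TB that can be filled before the cluster starts raising capacity warnings.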

Ceph Cluster Health

# Check overall cluster health
kubectl -n storage exec deploy/rook-ceph-tools -- ceph status

# View cluster capacity and usage
kubectl -n storage exec deploy/rook-ceph-tools -- ceph df

# Check OSD status
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd status

# View placement group health
kubectl -n storage exec deploy/rook-ceph-tools -- ceph pg stat

Performance Characteristics

Throughput Expectations

  • Sequential Read: ~400-500 MB/s per OSD
  • Sequential Write: ~200-300 MB/s per OSD
  • Random I/O: 10K-15K IOPS per OSD
  • Aggregate: ~2.4 GB/s read, ~1.8 GB/s write (theoretical)

Latency Profile

  • Block Storage: 1-5ms typical latency
  • Filesystem: 2-8ms typical latency
  • Object Storage: 10-50ms typical latency

Resource Allocation

Current Storage Usage

# Check PVC usage across namespaces
kubectl get pvc -A

# Summarize requested storage by namespace, claim, size, and class
kubectl get pvc -A -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,SIZE:.spec.resources.requests.storage,STORAGECLASS:.spec.storageClassName"

# Monitor volume usage
kubectl exec -n storage deploy/rook-ceph-tools -- rbd du -p ceph-blockpool

Capacity Planning

# Check available space
kubectl -n storage exec deploy/rook-ceph-tools -- ceph df detail

# View OSD utilization
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd df

# Track per-pool usage growth over time
kubectl -n storage exec deploy/rook-ceph-tools -- rados df

Management Commands

Cluster Operations

# Access Ceph toolbox
kubectl -n storage exec -it deploy/rook-ceph-tools -- bash

# View cluster topology
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd tree

# Check cluster warnings
kubectl -n storage exec deploy/rook-ceph-tools -- ceph health detail

# Watch cluster events and status changes in real time
kubectl -n storage exec deploy/rook-ceph-tools -- ceph -w

Storage Administration

# List all pools
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd pool ls detail

# Check pool statistics
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd pool stats

# View RBD images
kubectl -n storage exec deploy/rook-ceph-tools -- rbd ls -p ceph-blockpool

# Check image details
kubectl -n storage exec deploy/rook-ceph-tools -- rbd info -p ceph-blockpool pvc-<uuid>

Performance Monitoring

# Show per-OSD commit/apply latency
kubectl -n storage exec deploy/rook-ceph-tools -- ceph osd perf

# Check slow operations
kubectl -n storage exec deploy/rook-ceph-tools -- ceph tell osd.0 dump_historic_slow_ops

# Check CPU and memory usage of the storage pods
kubectl top pods -n storage

# Dump raw performance counters from every OSD
kubectl -n storage exec deploy/rook-ceph-tools -- ceph tell osd.* perf dump

Troubleshooting

Health Diagnostics

# Comprehensive health check
kubectl -n storage exec deploy/rook-ceph-tools -- ceph health detail

# Check for stuck PGs
kubectl -n storage exec deploy/rook-ceph-tools -- ceph pg dump_stuck

# View cluster logs
kubectl logs -n storage -l app=rook-ceph-mon

# Check OSD logs
kubectl logs -n storage -l app=rook-ceph-osd

Recovery Operations

# Restart a stuck OSD by deleting its pod (osd.0 in this example)
kubectl -n storage delete pod -l app=rook-ceph-osd,ceph_daemon_id=0

# Force scrub operations
kubectl -n storage exec deploy/rook-ceph-tools -- ceph pg scrub <pg-id>

# Check filesystem and MDS status
kubectl -n storage exec deploy/rook-ceph-tools -- ceph fs status

# Mark a damaged MDS rank as repaired once the underlying damage has been addressed
kubectl -n storage exec deploy/rook-ceph-tools -- ceph mds repaired <fs-name>:<rank>

Best Practices

Volume Management

  • Size Planning: Plan for 3x replication overhead
  • Performance: Use block storage for databases
  • Backup: Implement regular snapshot policies (see the snapshot example after this list)
  • Monitoring: Set up alerts for capacity thresholds
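
For the snapshot policy mentioned above, a one-off snapshot of a block PVC is taken with a VolumeSnapshot. The sketch below assumes a VolumeSnapshotClass for the RBD driver named csi-rbdplugin-snapclass and an existing PVC called postgres-data, both of which are illustrative:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snap                            # illustrative snapshot name
  namespace: default
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass    # assumed snapshot class
  source:
    persistentVolumeClaimName: postgres-data          # assumed existing PVC

Recurring policies are typically layered on top of this resource with a scheduler rather than created by hand.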

Maintenance

  • Regular Health Checks: Daily cluster health monitoring
  • Capacity Monitoring: Track usage growth trends
  • Performance Baselines: Establish performance benchmarks
  • Update Strategy: Plan Ceph version upgrades carefully
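
On the upgrade point, the Ceph version is pinned in the CephCluster resource, so an upgrade is a deliberate image bump that the operator reconciles by rolling daemons one at a time. The abridged snippet below assumes the cluster resource is named storage in the storage namespace and shows an example image tag:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: storage                        # assumed CephCluster name
  namespace: storage
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.2   # example tag; bump one release at a time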

The Rook-Ceph storage system provides enterprise-grade distributed storage capabilities, ensuring data durability, high availability, and consistent performance for all cluster workloads.