Object Storage Disaster Recovery

The Ceph RGW Multi-Site feature is a cross-cluster asynchronous data replication mechanism designed to synchronize object storage data between geographically distributed Ceph clusters, providing High Availability (HA) and Disaster Recovery (DR) capabilities.


Terminology

  • Primary Cluster: The cluster currently providing storage services.
  • Secondary Cluster: The standby cluster that receives replicated data and takes over when the Primary cluster fails.
  • Realm: The highest-level logical grouping in Ceph object storage. It represents a complete object storage namespace, typically used for multi-site replication and synchronization. A Realm can span different geographical locations or data centers.
  • ZoneGroup: A logical grouping within a Realm, containing multiple Zones. ZoneGroups enable data synchronization and replication across Zones, usually within the same geographical region.
  • Zone: A logical grouping within a ZoneGroup that physically stores data. Each Zone manages and stores objects independently and can have its own data/metadata pool configurations.
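On a running cluster, this hierarchy can be inspected from the rook-ceph-tools pod (illustrative only; the output depends on your deployment):

```shell
# List each level of the multi-site hierarchy
radosgw-admin realm list       # Realms known to this cluster
radosgw-admin zonegroup list   # ZoneGroups in the current Realm
radosgw-admin zone list        # Zones in the current ZoneGroup
```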

Prerequisites

  • Prepare two clusters available for deploying Rook-Ceph (Primary and Secondary clusters) with network connectivity between them.
  • Both clusters must use the same platform version (v3.12 or later).
  • Ensure no Ceph object storage is deployed on either the Primary or Secondary cluster.
  • Refer to the Create Storage Service documentation to deploy the Operator and create the clusters. After cluster creation, do not create object storage pools via the wizard; instead, configure them with the CLI tools as described below.

Architecture

Procedures

This guide provides a synchronization solution between two Zones in the same ZoneGroup.

Create Object Storage in Primary Cluster

This step creates the Realm, ZoneGroup, Primary Zone, and Primary Zone's gateway resources.

Execute the following commands on the Control node of the Primary cluster:

  1. Set Parameters

    export REALM_NAME=<realm-name>
    export ZONE_GROUP_NAME=<zonegroup-name>
    export PRIMARY_ZONE_NAME=<primary-zone-name>
    export PRIMARY_OBJECT_STORE_NAME=<primary-object-store-name>

    Parameter descriptions:

    • <realm-name>: Realm name.
    • <zonegroup-name>: ZoneGroup name.
    • <primary-zone-name>: Primary Zone name.
    • <primary-object-store-name>: Gateway name.
  2. Create Object Storage

    cat << EOF | kubectl apply -f -
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephObjectRealm
    metadata:
      name: $REALM_NAME
      namespace: rook-ceph
    
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephObjectZoneGroup
    metadata:
      name: $ZONE_GROUP_NAME
      namespace: rook-ceph
    spec:
      realm: $REALM_NAME
    
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephObjectZone
    metadata:
      name: $PRIMARY_ZONE_NAME
      namespace: rook-ceph
    spec:
      zoneGroup: $ZONE_GROUP_NAME
      metadataPool:
        failureDomain: host
        replicated:
          size: 3
          requireSafeReplicaSize: true
      dataPool:
        failureDomain: host
        replicated:
          size: 3
          requireSafeReplicaSize: true
        parameters:
          compression_mode: none
      preservePoolsOnDelete: false
    
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephBlockPool
    metadata:
      labels:
        cpaas.io/builtin: "true"
      name: builtin-rgw-root
      namespace: rook-ceph
    spec:
      name: .rgw.root
      application: rgw
      enableCrushUpdates: true
      failureDomain: host
      replicated:
        size: 3
      parameters:
        pg_num: "8"
    
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephObjectStore
    metadata:
      name: $PRIMARY_OBJECT_STORE_NAME
      namespace: rook-ceph
    spec:
      gateway:
        port: 7480
        instances: 2
      zone:
        name: $PRIMARY_ZONE_NAME
    EOF
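Before moving on, it can help to wait until the object store reports Ready; a minimal check, assuming kubectl v1.23+ (for the jsonpath-based wait) and the default rook-ceph namespace used throughout this guide:

```shell
# Wait for the CephObjectStore to reach the Ready phase
kubectl -n rook-ceph wait cephobjectstore/$PRIMARY_OBJECT_STORE_NAME \
  --for=jsonpath='{.status.phase}'=Ready --timeout=300s

# The RGW pods carry the rook_object_store label and should be Running
kubectl -n rook-ceph get pods -l rook_object_store=$PRIMARY_OBJECT_STORE_NAME
```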

Configure External Access for Primary Zone

  1. Obtain the UID of the ObjectStore

    export PRIMARY_OBJECT_STORE_UID=$(kubectl -n rook-ceph get cephobjectstore $PRIMARY_OBJECT_STORE_NAME -o jsonpath='{.metadata.uid}')
  2. Create an external access Service

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: rook-ceph-rgw-$PRIMARY_OBJECT_STORE_NAME-external
      namespace: rook-ceph
      labels:
        app: rook-ceph-rgw
        rook_cluster: rook-ceph
        rook_object_store: $PRIMARY_OBJECT_STORE_NAME
      ownerReferences:
        - apiVersion: ceph.rook.io/v1
          kind: CephObjectStore
          name: $PRIMARY_OBJECT_STORE_NAME
          uid: $PRIMARY_OBJECT_STORE_UID
    spec:
      ports:
        - name: rgw
          port: 7480
          targetPort: 7480
          protocol: TCP
      selector:
        app: rook-ceph-rgw
        rook_cluster: rook-ceph
        rook_object_store: $PRIMARY_OBJECT_STORE_NAME
      sessionAffinity: None
      type: NodePort
    EOF
  3. Add external endpoints to the CephObjectZone.

    IP=$(kubectl get nodes -l 'node-role.kubernetes.io/control-plane' -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}' | cut -d ' ' -f1 | tr -d '\n')
    PORT=$(kubectl -n rook-ceph get svc rook-ceph-rgw-$PRIMARY_OBJECT_STORE_NAME-external -o jsonpath='{.spec.ports[0].nodePort}')
    ENDPOINT=http://$IP:$PORT
    kubectl -n rook-ceph patch cephobjectzone $PRIMARY_ZONE_NAME --type merge -p "{\"spec\":{\"customEndpoints\":[\"$ENDPOINT\"]}}"
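To sanity-check the published endpoint, an anonymous request to the gateway should return an XML ListAllMyBucketsResult response (the $ENDPOINT variable comes from the step above):

```shell
# RGW answers unauthenticated GETs with an (empty) bucket listing for the anonymous user
curl -s "$ENDPOINT"
```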

Obtain access-key and secret-key

Run the following on the Primary cluster. The returned values are base64-encoded (they are read directly from the Secret's data field) and are used as-is in the Secondary cluster's Secret.

kubectl -n rook-ceph get secrets $REALM_NAME-keys -o jsonpath='{.data.access-key}'
kubectl -n rook-ceph get secrets $REALM_NAME-keys -o jsonpath='{.data.secret-key}'

Create Secondary Zone and Configure Realm Sync

This section explains how to create the Secondary Zone and configure synchronization by pulling Realm information from the Primary cluster.

Execute the following commands on the Control node of the Secondary cluster:

  1. Set Parameters

    export REALM_NAME=<realm-name>
    export ZONE_GROUP_NAME=<zonegroup-name>
    
    export REALM_ENDPOINT=<realm-endpoint>
    export ACCESS_KEY=<access-key>
    export SECRET_KEY=<secret-key>
    
    export SECONDARY_ZONE_NAME=<secondary-zone-name>
    export SECONDARY_OBJECT_STORE_NAME=<secondary-object-store-name>

    Parameter descriptions:

    • <realm-name>: Realm name (must match the name used on the Primary cluster).
    • <zonegroup-name>: ZoneGroup name (must match the name used on the Primary cluster).
    • <realm-endpoint>: External endpoint of the Primary Zone, obtained in the previous section.
    • <access-key>: Base64-encoded access key obtained from the Primary cluster.
    • <secret-key>: Base64-encoded secret key obtained from the Primary cluster.
    • <secondary-zone-name>: Secondary Zone name.
    • <secondary-object-store-name>: Secondary Gateway name.
  2. Create Secondary Zone and Configure Realm Sync

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: Secret
    metadata:
      name: $REALM_NAME-keys
      namespace: rook-ceph
    data:
      access-key: $ACCESS_KEY
      secret-key: $SECRET_KEY
    
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephObjectRealm
    metadata:
      name: $REALM_NAME
      namespace: rook-ceph
    spec:
      pull:
        endpoint: $REALM_ENDPOINT
    
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephObjectZoneGroup
    metadata:
      name: $ZONE_GROUP_NAME
      namespace: rook-ceph
    spec:
      realm: $REALM_NAME
    
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephObjectZone
    metadata:
      name: $SECONDARY_ZONE_NAME
      namespace: rook-ceph
    spec:
      zoneGroup: $ZONE_GROUP_NAME
      metadataPool:
        failureDomain: host
        replicated:
          size: 3
          requireSafeReplicaSize: true
      dataPool:
        failureDomain: host
        replicated:
          size: 3
          requireSafeReplicaSize: true
      preservePoolsOnDelete: false
    
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephBlockPool
    metadata:
      labels:
        cpaas.io/builtin: "true"
      name: builtin-rgw-root
      namespace: rook-ceph
    spec:
      name: .rgw.root
      application: rgw
      enableCrushUpdates: true
      failureDomain: host
      replicated:
        size: 3
      parameters:
        pg_num: "8"
        
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephObjectStore
    metadata:
      name: $SECONDARY_OBJECT_STORE_NAME
      namespace: rook-ceph
    spec:
      gateway:
        port: 7480
        instances: 2
      zone:
        name: $SECONDARY_ZONE_NAME
    EOF
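To verify that the Realm was pulled from the Primary cluster, the realm and its current period should be visible from the Secondary cluster's rook-ceph-tools pod (a sketch, assuming the default toolbox deployment):

```shell
# Resolve the toolbox pod, then inspect the pulled realm and its period
TOOLS_POD=$(kubectl -n rook-ceph get po -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec "$TOOLS_POD" -- radosgw-admin realm list
kubectl -n rook-ceph exec "$TOOLS_POD" -- radosgw-admin period get --rgw-realm=$REALM_NAME
```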

Configure External Access for Secondary Zone

  1. Obtain UID of Secondary Gateway

    export SECONDARY_OBJECT_STORE_UID=$(kubectl -n rook-ceph get cephobjectstore $SECONDARY_OBJECT_STORE_NAME -o jsonpath='{.metadata.uid}')
  2. Create an external access Service

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: rook-ceph-rgw-$SECONDARY_OBJECT_STORE_NAME-external
      namespace: rook-ceph
      labels:
        app: rook-ceph-rgw
        rook_cluster: rook-ceph
        rook_object_store: $SECONDARY_OBJECT_STORE_NAME
      ownerReferences:
        - apiVersion: ceph.rook.io/v1
          kind: CephObjectStore
          name: $SECONDARY_OBJECT_STORE_NAME
          uid: $SECONDARY_OBJECT_STORE_UID
    spec:
      ports:
        - name: rgw
          port: 7480
          targetPort: 7480
          protocol: TCP
      selector:
        app: rook-ceph-rgw
        rook_cluster: rook-ceph
        rook_object_store: $SECONDARY_OBJECT_STORE_NAME
      sessionAffinity: None
      type: NodePort
    EOF
  3. Add external endpoints to the Secondary CephObjectZone

    IP=$(kubectl get nodes -l 'node-role.kubernetes.io/control-plane' -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}' | cut -d ' ' -f1 | tr -d '\n')
    PORT=$(kubectl -n rook-ceph get svc rook-ceph-rgw-$SECONDARY_OBJECT_STORE_NAME-external -o jsonpath='{.spec.ports[0].nodePort}')
    ENDPOINT=http://$IP:$PORT
    kubectl -n rook-ceph patch cephobjectzone $SECONDARY_ZONE_NAME --type merge -p "{\"spec\":{\"customEndpoints\":[\"$ENDPOINT\"]}}"
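You can confirm that the endpoint was recorded on the zone resource:

```shell
# Should print the external endpoint added above, e.g. ["http://<ip>:<node-port>"]
kubectl -n rook-ceph get cephobjectzone $SECONDARY_ZONE_NAME -o jsonpath='{.spec.customEndpoints}'
```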

Check Ceph Object Storage Synchronization Status

Execute the following commands in the rook-ceph-tools pod of the Secondary cluster:

# enter rook-ceph-tools pod
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get po -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash

radosgw-admin sync status

Output example

          realm 962d6b80-b218-4fc8-8198-e498fabc4e9d (realm-primary)
      zonegroup 9de3acd7-0b01-4a04-ac84-1421c6103a16 (zonegroup-primary)
           zone 7b3ce7f5-7090-46f6-afb1-d1bb156053da (zone-secondary)
   current time 2025-12-04T06:27:15Z
zonegroup features enabled: resharding
                   disabled: compress-encrypted
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 6319ca70-964e-47be-8b96-7c5bf5a128b1 (zone-primary)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

When the output shows "metadata is caught up with master" and "data is caught up with source", the synchronization status is healthy.
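For monitoring, the same check can be scripted by grepping the status output for the two "caught up" markers; a minimal sketch, to be run inside the toolbox pod:

```shell
# Succeed only when both metadata and data sync report "caught up"
sync_healthy() {
  echo "$1" | grep -q "metadata is caught up with master" &&
  echo "$1" | grep -q "data is caught up with source"
}

status=$(radosgw-admin sync status 2>/dev/null)
if sync_healthy "$status"; then
  echo "sync healthy"
else
  echo "sync lagging"
fi
```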

Failover

When the Primary cluster fails, it is necessary to promote the Secondary Zone to the Primary Zone. After the switch, the Secondary Zone's gateway can continue to provide object storage services.

Procedures

Execute the following commands in the rook-ceph-tools pod of the Secondary cluster:

# enter rook-ceph-tools pod
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get po -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash

# Promote the Secondary Zone to master (add --default to also make it the default zone)
radosgw-admin zone modify --rgw-realm=<realm-name> --rgw-zonegroup=<zone-group-name> --rgw-zone=<secondary-zone-name> --master

# Update the period so that the changes take effect
radosgw-admin period update --commit --rgw-realm=<realm-name> --rgw-zonegroup=<zone-group-name>

Parameters

  • <realm-name>: Realm name.
  • <zone-group-name>: ZoneGroup name.
  • <secondary-zone-name>: Secondary Zone name.
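After the period commit, the promotion can be verified by checking which zone the ZoneGroup now reports as master (same placeholders as above; run inside the toolbox pod):

```shell
# master_zone should now be the Secondary Zone's ID
radosgw-admin zonegroup get --rgw-zonegroup=<zone-group-name> | grep master_zone
# Compare against the zone's own ID
radosgw-admin zone get --rgw-zone=<secondary-zone-name> | grep '"id"'
```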

Failback

Once the failed Primary cluster has been recovered and you want to fail back from the Secondary cluster, follow the steps below.

Procedures

Execute the following commands in the rook-ceph-tools pod of the Primary cluster:

# enter rook-ceph-tools pod
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get po -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash

# check sync status, wait for sync from secondary site
radosgw-admin sync status

#           realm 962d6b80-b218-4fc8-8198-e498fabc4e9d (realm-primary)
#       zonegroup 9de3acd7-0b01-4a04-ac84-1421c6103a16 (zonegroup-primary)
#            zone 6319ca70-964e-47be-8b96-7c5bf5a128b1 (zone-primary)
#    current time 2025-12-04T07:18:26Z
# zonegroup features enabled: resharding
#                    disabled: compress-encrypted
#   metadata sync syncing
#                 full sync: 0/64 shards
#                 incremental sync: 64/64 shards
#                 metadata is caught up with master
#       data sync source: 7b3ce7f5-7090-46f6-afb1-d1bb156053da (zone-secondary)
#                         syncing
#                         full sync: 0/128 shards
#                         incremental sync: 128/128 shards
#                         data is caught up with source

# Promote the recovered Primary Zone back to master (add --default to also restore it as the default zone)
radosgw-admin zone modify --rgw-realm=<realm-name> --rgw-zonegroup=<zone-group-name> --rgw-zone=<primary-zone-name> --master

# Update the period so that the changes take effect
radosgw-admin period update --commit --rgw-realm=<realm-name> --rgw-zonegroup=<zone-group-name>

Parameters

  • <realm-name>: Realm name.
  • <zone-group-name>: ZoneGroup name.
  • <primary-zone-name>: Primary Zone name.