February 17, 2022

Insert Next Disk

Making Ceph failed disk replacement seamless #

with Sebastian Wagner (Red Hat), Paul Cuzner (Red Hat), and Ernesto Puerta Treceno (Red Hat)

Red Hat Ceph Storage 5 introduces cephadm, a new integrated control plane that is part of the storage system itself and therefore has a complete view of the cluster’s current state, something external tools could never quite achieve precisely because they are external. Among its many advantages, cephadm’s unified control of the storage cluster’s state significantly simplifies operations.

Replacing failed drives made easy #

For example, the older process for replacing drives with ceph-ansible required multiple steps and ran processes that enforced configuration on every node, when all that was needed was an update to a single node’s configuration. Working around drive encryption could at times add further complexity.

New ways: replacing a failed drive with cephadm #

When a drive eventually fails, the OSD of that drive needs to be removed from the cluster. This command removes the OSD from a cephadm-managed cluster:

ceph orch osd rm <svc_id(s)> --replace

This command evacuates the remaining placement groups from the OSD and marks the OSD as scheduled for replacement, while keeping it in the CRUSH hierarchy.
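
As a rough illustration, the removal step might be wrapped in a small shell function like the one below. This is a sketch under stated assumptions rather than a recommended procedure: the `run` helper and the `remove_for_replacement` function are hypothetical names introduced here, and the OSD ID `12` is a placeholder. By default the script only prints the commands it would run; set `DRY_RUN=0` to execute them against a real cluster.

```shell
#!/bin/sh
# Hypothetical helper: print each command instead of running it,
# unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the removal step described above.
# $1 is the ID of the failed OSD.
remove_for_replacement() {
    osd_id="$1"
    # Drain the OSD and mark it for replacement, keeping its ID
    # and its position in the CRUSH hierarchy.
    run ceph orch osd rm "$osd_id" --replace
    # Show the progress of pending OSD removals.
    run ceph orch osd rm status
    # Inspect the CRUSH tree to confirm the OSD is still listed.
    run ceph osd tree
}

# Assumption: 12 is a placeholder for the failed OSD's ID.
remove_for_replacement 12
```

Keeping the dry-run mode as the default makes it easy to review the exact commands before anything touches the cluster.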

On supported hardware enclosures, the system can also blink the drive’s LED to help the administrator locate the specific disk that failed:

ceph device light on|off <devid>

Where <devid> is the device ID, which can be obtained with the command

ceph device ls
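
Putting the two commands together, locating a failed drive could look something like the following sketch. The function names, the `run` helper, and the device ID `VENDOR_MODEL_SERIAL` are hypothetical placeholders introduced for illustration; the real device ID has to be copied from the command output. The script prints the commands by default and only executes them when `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical helper: print each command instead of running it,
# unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# List the devices backing a given OSD daemon, so that the device ID
# can be read from the output. $1 is the OSD ID.
list_osd_devices() {
    run ceph device ls-by-daemon "osd.$1"
}

# Switch the enclosure LED of a device on or off.
# $1 is "on" or "off", $2 is the device ID.
set_drive_light() {
    case "$1" in
        on|off) run ceph device light "$1" "$2" ;;
        *) echo "usage: set_drive_light on|off <devid>" >&2; return 1 ;;
    esac
}

# Assumptions: OSD 12 and VENDOR_MODEL_SERIAL are placeholders.
list_osd_devices 12
set_drive_light on VENDOR_MODEL_SERIAL
```

Once the drive has been swapped, calling `set_drive_light off` with the same device ID turns the LED off again.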

If the OSD was created by cephadm, recreating the OSD will be done automatically as soon as a new drive gets inserted. cephadm is aware of the at-rest disk encryption setup if one is present, and will transparently negotiate with the monitors to use the appropriate keys when encrypting a new drive. That’s it. The replacement process is complete.

If the OSD was created manually or by ceph-ansible, cephadm needs to be told how to recreate it by applying an OSD specification like the following:

service_type: osd
service_id: osd
placement:
  hosts: 
  - myhost
data_devices:
  paths:
  - /path/to/the/device
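
Assuming the specification above has been saved to a file (the name `osd_spec.yml` is a placeholder chosen here, not something the process requires), applying it might look like the sketch below. The `run` helper and `apply_osd_spec` function are hypothetical. The script previews the change with the orchestrator's `--dry-run` option before applying it, and by default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Hypothetical helper: print each command instead of running it,
# unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Preview and then apply an OSD specification file.
# $1 is the path to the YAML specification.
apply_osd_spec() {
    spec_file="$1"
    # Ask the orchestrator what it would do, without changing anything.
    run ceph orch apply -i "$spec_file" --dry-run
    # Apply the specification for real.
    run ceph orch apply -i "$spec_file"
}

# Assumption: osd_spec.yml is a placeholder file name.
apply_osd_spec osd_spec.yml
```

Reviewing the preview first is a cheap way to catch a wrong host name or device path before cephadm acts on it.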

But that is not the entire story. The same process can also be managed from the management UI in an interactive, step-by-step fashion.

Replacing Failed OSDs from the Dashboard #

Failed OSDs in a Ceph Storage cluster can also be replaced by a junior administrator with appropriate role-based access control (RBAC) permissions on the Dashboard. OSD IDs can be preserved while replacing the failed OSDs, which is both operationally easier to manage (each host keeps a fixed set of IDs) and optimizes memory usage (OSD ID gaps are undesirable).

The cluster administrator can thus use the Dashboard’s RBAC capabilities to delegate the replacement of failed drives to a trainee, without granting additional permissions that the junior administrator is not yet qualified to use, as detailed in the following short video.

Comments? Discuss on Hacker News.

Cross-posted to the Red Hat Blog.
