A Deep Dive into the S2D Storage Pool ~ 不自量力の Weithenn

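Introduction

Ever since Windows Server 2016 was released, S2D (Storage Spaces Direct), its software-defined storage technology, has drawn a great deal of interest.

This article takes a closer look at the improvements to the Storage Pool in an S2D software-defined storage environment:

Each Storage Pool supports up to 416 drives and 1 PB of raw capacity, and the recommended practice is one pool per cluster.
Running the Enable-ClusterS2D cmdlet to enable S2D automatically creates the Storage Pool and adds all eligible storage devices (for example, disks that have never been formatted) to it. Likewise, when a new S2D storage node is added (scale-out), its storage devices are added to the Storage Pool automatically, and when a storage device fails it is automatically retired and removed from the Storage Pool.
A PowerShell script written by S2D PM Cosmos Darwin makes it easy to see how the Storage Pool and the capacity of each storage device are being used.

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct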

Where the Confusion Starts (Resiliency, Slabs, Striping)

Let's start with a 3-node S2D deployment in which each S2D node has 2 x 800 GB NVMe drives and 4 x 2 TB SATA SSDs.

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct

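What is resiliency?

For example, creating a 1 TB two-way mirror volume means that two copies of the data must be maintained on different servers and different drives, so that if one copy is damaged the other is still available. As a result, a 1 TB volume actually consumes 2 TB of capacity in the pool; this is called its "footprint on the pool".

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct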
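
The footprint arithmetic can be written down as a small sketch. The function name `pool_footprint_tb` and its parameters are illustrative only and are not part of any S2D tooling; the model simply multiplies the volume size by the number of copies kept.

```python
def pool_footprint_tb(volume_size_tb: float, copies: int) -> float:
    """Return the pool capacity consumed by a mirrored volume.

    Simplified model: every copy occupies the full volume size.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return volume_size_tb * copies


# The article's example: a 1 TB two-way mirror volume keeps 2 copies.
footprint = pool_footprint_tb(1.0, copies=2)
print(f"Footprint on the pool: {footprint} TB")
```

The same function makes it easy to see how the footprint grows as the volume size or the number of copies goes up.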

What are slabs?

The 1 TB volume is divided into many "slabs" of 256 MB each (in the Windows Server 2016 Technical Preview builds these were called extents and were 1,024 MB each). A 1 TB volume therefore breaks down into roughly 4,000 slabs (4,096 when 1 TB is counted as 1,024 GB).

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct
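
As a quick check of the slab count, here is a minimal sketch. The helper `slab_count` is a hypothetical name, and the result depends on whether 1 TB is treated as binary (1,024 x 1,024 MB) or decimal (1,000,000 MB) units, which the article does not specify.

```python
SLAB_SIZE_MB = 256  # slab size stated in the article


def slab_count(volume_size_mb: int, slab_size_mb: int = SLAB_SIZE_MB) -> int:
    """Number of slabs needed to cover a volume (rounded up)."""
    if volume_size_mb < 0 or slab_size_mb <= 0:
        raise ValueError("sizes must be positive")
    return -(-volume_size_mb // slab_size_mb)  # ceiling division


binary_tb_in_mb = 1024 * 1024   # 1 TB counted in binary units
decimal_tb_in_mb = 1_000_000    # 1 TB counted in decimal units

print(slab_count(binary_tb_in_mb))   # 4096
print(slab_count(decimal_tb_in_mb))  # 3907
```

Either way the result lands close to the "roughly 4,000" figure quoted above.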

What is striping?

Once a two-way mirror volume has been divided into slabs, two copies of every slab are created, and the S2D allocation algorithm stores them on different servers and different drives.

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct

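As a result, the slabs end up distributed evenly across different servers and different drives. In addition, by default at least 5 drives in the cluster store the pool metadata, so that it can be synchronized and repaired later on. Moreover, even if stored copies of the pool metadata are lost, the S2D deployment continues to operate.

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct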
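
To make the placement rule concrete, here is a toy simulation. It is not the S2D allocation algorithm, which the article does not describe; it uses a simple "least-used first" heuristic as a stand-in. The function `place_slabs` is hypothetical, and modelling 4 capacity drives per server is an assumption, since the article does not say which drives hold capacity.

```python
def place_slabs(num_slabs, servers=3, drives_per_server=4, copies=2):
    """Place `copies` copies of each slab on distinct servers, least-used first."""
    if copies > servers:
        raise ValueError("need at least as many servers as copies")
    # usage[(server, drive)] = number of slab copies stored on that drive
    usage = {(s, d): 0 for s in range(servers) for d in range(drives_per_server)}
    placement = []
    for _ in range(num_slabs):
        # Total number of copies currently held by each server.
        load = [sum(usage[(s, d)] for d in range(drives_per_server))
                for s in range(servers)]
        # Pick the least-loaded servers; they are distinct by construction.
        chosen = sorted(range(servers), key=lambda s: (load[s], s))[:copies]
        slab = []
        for s in chosen:
            # Within the chosen server, pick the least-used drive.
            d = min(range(drives_per_server), key=lambda d: (usage[(s, d)], d))
            usage[(s, d)] += 1
            slab.append((s, d))
        placement.append(slab)
    return placement, usage


placement, usage = place_slabs(4096)
print("first slab stored on (server, drive):", placement[0])
print("copies per drive:", min(usage.values()), "to", max(usage.values()))
```

Even this naive heuristic keeps the two copies of every slab on different servers and leaves all drives almost equally full, which is the property the article describes.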

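Why is S2D designed this way?

Performance

Dividing a volume into many slabs and spreading them evenly across multiple drives means that reads and writes can use many drives at once, maximizing IOPS and I/O throughput. This is why two-way mirroring in S2D is not the same as traditional RAID-1.

Improved data safety

When a storage device fails, the copies it held are rebuilt elsewhere. This is called "repairing", and it involves the following two steps: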

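Repair step 1: read from the surviving copy (reads are measured in bytes):

Imagine that S2D did not divide volumes into slabs. If the drive holding a copy of a 1 TB volume failed, and the drives were HDDs, then step 1 alone, reading 1 TB of data from a single surviving drive, would make the whole rebuild very slow.
In S2D, by contrast, the failed HDD held many slabs, and the surviving copies of those slabs are spread across different HDDs: some on Drive03, some on Drive15, some on Drive17, and so on.
The key point is that the rebuild can read from many HDDs at once rather than from just one.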

Repair step 2: write a new copy elsewhere to replace the lost one:

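In principle, new copies are written wherever there is free capacity, subject to the placement rule described earlier: the new copy of a slab cannot land on the server that already holds its surviving copy.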
In short, breaking volumes into slabs not only improves data safety but also shortens the time spent waiting for a resync after a failure.
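
The benefit of reading from many drives at once can be estimated with a back-of-the-envelope model. The sustained throughput of 150 MB/s per HDD and the figure of 8 source drives are hypothetical numbers chosen for illustration, not measurements from the article, and the model ignores the write side of the repair and any network limits.

```python
def read_time_hours(data_gb: float, source_drives: int,
                    mb_per_sec_per_drive: float) -> float:
    """Estimate how long it takes to read `data_gb` spread over several drives.

    Simplified model: the data is split evenly and all drives read in parallel.
    """
    if source_drives < 1 or mb_per_sec_per_drive <= 0:
        raise ValueError("need at least one drive with positive throughput")
    total_mb = data_gb * 1000
    seconds = total_mb / (source_drives * mb_per_sec_per_drive)
    return seconds / 3600


HDD_MB_PER_SEC = 150  # assumed sustained read speed of one HDD

single = read_time_hours(1000, source_drives=1, mb_per_sec_per_drive=HDD_MB_PER_SEC)
spread = read_time_hours(1000, source_drives=8, mb_per_sec_per_drive=HDD_MB_PER_SEC)
print(f"one source drive:    {single:.2f} h")
print(f"eight source drives: {spread:.2f} h")
```

Under these assumptions the read phase shrinks in proportion to the number of drives holding surviving slab copies.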

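Reserve capacity

Put simply, you should leave some capacity in your Storage Pool unallocated rather than using it all up. Then, when an HDD fails and its copies need to be repaired, there is spare room for them to land in. The idea resembles an HDD hot spare, although it works somewhat differently in practice.

Reserving capacity is strongly recommended. S2D still works without it, but when a failure occurs, the lack of reserved capacity will slow down the repair.

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct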
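
A rough way to budget for reserve capacity is sketched below. The article does not say how much to reserve, so the default of one drive's worth of capacity per server is purely an illustrative assumption, as is counting only the 4 x 2 TB drives in each node as capacity. The function name `volume_budget_tb` is hypothetical.

```python
def volume_budget_tb(servers: int, capacity_drives_per_server: int,
                     drive_size_tb: float, reserve_drives_per_server: int = 1,
                     copies: int = 2) -> dict:
    """Split the pool into reserve and volume footprint budget (simplified model)."""
    if reserve_drives_per_server > capacity_drives_per_server:
        raise ValueError("cannot reserve more drives than a server has")
    pool_tb = servers * capacity_drives_per_server * drive_size_tb
    reserve_tb = servers * reserve_drives_per_server * drive_size_tb
    footprint_budget_tb = pool_tb - reserve_tb
    return {
        "pool_tb": pool_tb,
        "reserve_tb": reserve_tb,
        "footprint_budget_tb": footprint_budget_tb,
        # Each mirrored volume consumes `copies` times its size in the pool.
        "max_volume_tb": footprint_budget_tb / copies,
    }


# Example loosely based on the article's 3-node cluster with 4 x 2 TB per node.
budget = volume_budget_tb(servers=3, capacity_drives_per_server=4, drive_size_tb=2.0)
for name, value in budget.items():
    print(f"{name}: {value} TB")
```

Adjusting `reserve_drives_per_server` shows the trade-off directly: more reserve means faster repairs but less room for volumes.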

Automatic pool creation and rebalancing

When a new node joins the S2D cluster, slab copies are automatically redistributed onto it; this is called "optimizing" or "rebalancing". For example, the S2D cluster in this article started with 3 nodes; to add a fourth, simply run the PowerShell cmdlet Add-ClusterNode -Name <Name>.

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct

Right after the fourth node joins, its storage devices are still empty.

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct

By default, S2D automatically starts rebalancing 30 minutes later; how long it takes depends on the size of the deployment (for example, several hours). If you would rather not wait, run Optimize-StoragePool -FriendlyName "S2D*" in PowerShell to start rebalancing immediately, and use Get-StorageJob to monitor its progress.

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct

Once rebalancing completes, the used capacity on the first 3 nodes drops, because slabs have been spread evenly onto the fourth node.

Image source: Windows Server Storage Team Blog - Deep Dive: The Storage Pool in Storage Spaces Direct
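
The effect of rebalancing on per-node usage can be illustrated with a toy model. This is not how S2D schedules its optimization job; it simply moves one slab copy at a time from the fullest node to the emptiest node, and it ignores the rule that copies of the same slab must sit on different servers. The function `rebalance` and the starting counts are hypothetical.

```python
def rebalance(slabs_per_node):
    """Move slab copies from the fullest node to the emptiest until balanced.

    Returns the new per-node counts and the number of copies moved.
    """
    counts = list(slabs_per_node)
    moves = 0
    while max(counts) - min(counts) > 1:
        src = counts.index(max(counts))  # fullest node gives up one copy
        dst = counts.index(min(counts))  # emptiest node receives it
        counts[src] -= 1
        counts[dst] += 1
        moves += 1
    return counts, moves


# Three existing nodes holding 8,192 slab copies, plus one empty new node.
before = [2731, 2731, 2730, 0]
after, moved = rebalance(before)
print("before:", before)
print("after: ", after)
print("copies moved:", moved)
```

In this model the first three nodes each give up about a quarter of their copies, which matches the drop in used capacity described above.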

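Summary

Once the S2D Storage Pool has been created, there is no need to adjust it manually (for example, adding or removing HDDs) or to create additional pools.
Volumes are divided into many slabs, which are spread evenly across different servers and different drives.
It is best to reserve some capacity in the S2D Storage Pool so that repairs can complete faster when a failure occurs.
When a new node is added to an S2D cluster, the Storage Pool is rebalanced automatically, so that copies remain evenly spread across different servers and different drives.
You can inspect the S2D Storage Pool with Get-StoragePool S2D* and Get-StoragePool S2D* | Get-PhysicalDisk.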