Configure-DedupForMe
You’ve got Windows deduplication enabled on all your Configuration Manager Distribution Points and your Content Library, right? Awesome! You can stop reading here. Well, maybe don’t stop reading quite yet. I have a pretty cool Configuration Item / Baseline below.
I personally started considering deduplication to complement an implementation of BranchCache. They work well together! Deduplication enabled on your distribution point ensures that the file hashes are pre-calculated. BranchCache clients request these hashes to validate the content they receive from peers. Because deduplication makes the hashes for all content readily available, it reduces the server load from BranchCache file hash requests.
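Before rolling anything out, it is easy to sanity check a distribution point from PowerShell. This is just a quick sketch, assuming the Deduplication and BranchCache modules are available and E: is your content drive:

# Is dedup enabled on the content volume, and how much has it saved so far?
Get-DedupVolume -Volume 'E:' | Select-Object Volume, Enabled, SavedSpace, SavingsRate

# Is the BranchCache service up and serving content and hashes?
Get-BCStatus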
So? It is Single-Instance Storage, who cares about deduplication?
The content library in Configuration Manager is relatively robust. It has a few tricks up its sleeve, including single-instance storage. This feature is even mentioned in the first line of the docs.
“The content library is a single-instance store of content in Configuration Manager. The site uses it to reduce the overall size of the combined body of content that you distribute.”
We are only keeping one copy of the files! That sounds great! Who needs deduplication? We don’t have duplicates! Deduplication, however, has additional benefits that enhance the storage savings already built into the content library: it inspects the volume for duplicate blocks and bits, not just whole duplicate files. The Microsoft docs give a good overview of the technology, AND the FAQ section specifically addresses the improvements over Single Instance Store.
How does Data Deduplication differ from Single Instance Store?
Single Instance Store, or SIS, is a technology that preceded Data Deduplication and was first introduced in Windows Storage Server 2008 R2. To optimize a volume, Single Instance Store identified files that were completely identical and replaced them with logical links to a single copy of a file that’s stored in the SIS common store. Unlike Single Instance Store, Data Deduplication can get space savings from files that are not identical but share many common patterns and from files that themselves contain many repeated patterns. Single Instance Store was deprecated in Windows Server 2012 R2 and removed in Windows Server 2016 in favor of Data Deduplication.
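If you want to see what that buys you in practice, dedup reports its own numbers after the optimization job has run. A quick peek (assuming the Deduplication module is installed and E: holds the content library):

# How many files have been optimized on the volume, and how much space came back
Get-DedupStatus -Volume 'E:' | Select-Object Volume, OptimizedFilesCount, InPolicyFilesCount, SavedSpace, FreeSpace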
I’m Convinced, Let’s Implement
Ultimately, here is the link to the script on GitHub. It has a $Remediate variable which is set to either $true or $false. For the ‘Detection’ part of the configuration item you will set $Remediate = $false, and for the ‘Remediation’ part of the configuration item you will set $Remediate = $true.
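In case you haven’t built a script-based CI this way before, here is the rough shape of it. This is a sketch rather than the script itself, and $ConfigurationIsCorrect is a placeholder for whatever check the script performs:

# The CI detection script runs with $Remediate = $false; the remediation script runs with $true
$Remediate = $false

if ($ConfigurationIsCorrect) {
    return $true        # compliant - nothing to do
}
elseif ($Remediate) {
    # fix the setting here, then report compliant
    return $true
}
else {
    return $false       # non-compliant - detection only, makes no changes
}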
The goal of the script is to determine which drives on the system are being used for SCCM content, whether that is a PKG share or the content library. Once they are identified, a list of exclusion folders is generated. Only the SMSPKGE$ (E being the drive letter) and SCCMContentLib folders are supported for deduplication with Configuration Manager. The script also checks whether a ‘No_SMS_On_Drive.sms’ file exists on the drive, as those drives will be skipped.
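The excerpt below picks up after that check. The drive enumeration itself boils down to something like this (a simplified sketch using the same variable names, not the script verbatim):

# Walk every local volume that has a drive letter and test for the 'do not use' marker file
foreach ($Volume in (Get-Volume | Where-Object { $_.DriveLetter })) {
    $DrivePath = [string]::Format('{0}:\', $Volume.DriveLetter)
    $No_SMS_Exists = Test-Path -Path (Join-Path -Path $DrivePath -ChildPath 'No_SMS_On_Drive.sms')
    # $No_SMS_Exists feeds the switch shown below - $true means the drive is skipped entirely
}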
switch ($No_SMS_Exists) {
    $false {
        Write-CMLogEntry -Value "Found that the 'No_SMS_On_Drive.sms' does not exist on $DrivePath - will check for DP folders."
        # Look for the package share folder (SMSPKG<driveletter>$) on this drive
        $SMS_PackageShareFolder = [string]::Format('SMSPKG{0}$', $Volume.DriveLetter)
        $SMS_PackageShareFolderPath = Get-ChildItem -Path $DrivePath -Filter $SMS_PackageShareFolder
        if ($null -ne $SMS_PackageShareFolderPath) {
            Write-CMLogEntry -Value "Adding $($SMS_PackageShareFolderPath.FullName) to inclusion list for $DrivePath"
            $Include[$SMS_PackageShareFolderPath.FullName] = $true
        }
        # Look for the content library folder on this drive
        $SCCMContentLibFolderPath = Get-ChildItem -Path $DrivePath -Filter 'SCCMContentLib'
        if ($null -ne $SCCMContentLibFolderPath) {
            Write-CMLogEntry -Value "Adding $($SCCMContentLibFolderPath.FullName) to inclusion list for $DrivePath"
            $Include[$SCCMContentLibFolderPath.FullName] = $true
        }
        # Optionally pull extra include folders from a CSV at the root of the drive
        $IncludeCSV = Join-Path -Path $DrivePath -ChildPath $AdditionalIncludes
        if (Test-Path -Path $IncludeCSV) {
            Write-CMLogEntry -Value "CSV found for processing additional includes for deduplication - $IncludeCSV"
            $ExplicitIncludes = Import-Csv -Path $IncludeCSV
            foreach ($Inclusion in $ExplicitIncludes.Include) {
                $Inclusion = [string]::Format('{0}\{1}', $DrivePath, $Inclusion)
                Write-CMLogEntry -Value "$Inclusion added for processing"
                $Include[$Inclusion] = $true
            }
        }
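For reference, the additional-includes CSV that the last block reads is just a single ‘Include’ column of folder names that live at the root of the drive. A hypothetical example follows; the file name comes from the $AdditionalIncludes variable, and the folder names below are placeholders:

# Purely illustrative - each row becomes <drive>:\<folder> in the include list
@'
Include
ISOs
WIMStaging
'@ | Set-Content -Path 'E:\DedupIncludes.csv'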
With our ‘inclusion’ folders identified for the drive, we can generate what the deduplication configuration actually needs: the exclusion list.
if ($Include.Count -gt 0) {
    Write-CMLogEntry -Value "Marking folders [$($Include.Keys -join '; ')] for inclusion in deduplication - will process $DrivePath"
    # Every top-level folder that is not in the include list becomes an exclusion
    $AllFolders = Get-ChildItem -Path $DrivePath -Directory
    $Exclude = $AllFolders.FullName | Where-Object { $_ -notin $Include.Keys }
    # Strip the drive path from each exclusion before handing the list to Set-DedupVolume
    $Excludes = $Exclude -replace $DrivePath
Now that the exclusion list is generated, we can set the configuration and we are all set!
$DedupVolume | Set-DedupVolume -ExcludeFolder $Excludes
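If you want to confirm the result by hand afterwards, the exclusion list is visible right on the volume object (again assuming E: is the content drive):

# Verify which folders the volume will now skip during optimization
Get-DedupVolume -Volume 'E:' | Select-Object Volume, Enabled, ExcludeFolder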
The advantage of this being a baseline is that we can periodically check the configuration to automatically add or remove exclusions on the drive. And if new drives are added to the server, they will automatically be configured for deduplication as well! No need to manually identify your content library drives and set up deduplication.
If you enable logging, you will get a good summary of the detection and remediation that has happened.
There is a bit more to the script, such as enabling dedup on the drive if it is disabled, but the highlights above show the key components. You can also configure a few things, such as MinimumFileAgeDays and logging, in the variables region at the top of the script. If you know you have additional folders that support deduplication, the script has a variable to store a CSV name in; folder names found in that CSV are added for deduplication on the respective volume.
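If you would rather see those moving parts in isolation, enabling dedup on a volume and tuning the file age comes down to a couple of cmdlets. A sketch with example values, not the script’s exact defaults:

# Enable Data Deduplication on the content volume with the general purpose profile
Enable-DedupVolume -Volume 'E:' -UsageType Default

# Only optimize files older than 3 days so content that is still being distributed is left alone
Set-DedupVolume -Volume 'E:' -MinimumFileAgeDays 3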
The script is available here. I would recommend setting this up as a Configuration Item so that it can continually monitor and configure deduplication for you on your distribution points. The script returns a boolean $true / $false to allow for proper baseline compliance reporting. You will also need to make sure you have the Data Deduplication Windows feature enabled; that is an easy baseline to create and set up alongside this one.
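A detection/remediation pair for that feature CI can be as small as this (a sketch; it assumes a full server install with the ServerManager module):

# Detection: returns $true when the dedup role service is installed
(Get-WindowsFeature -Name FS-Data-Deduplication).Installed

# Remediation: install the feature (a reboot may still be needed before dedup kicks in)
Install-WindowsFeature -Name FS-Data-Deduplication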
The GitHub repo has a CAB file as well. This is an exported baseline that contains the script being used in a CI, as well as an additional CI for installing the FS-Data-Deduplication Windows feature. You should be able to import it and use it in your environment. Keep in mind that enabling deduplication on a server does require a reboot, and I am not forcing it with the CI.
If you have any questions, run into any issues, or need some help implementing, feel free to hit me up on Twitter!