The "system show performance" command displays information about the average compression rates of data under the heading “Compression”.
Note that gcomp is Global Compression (also known as de-duplication), and lcomp is Local Compression (which is traditional file-based compression, such as gzip).
Consistently low compression values indicate that data with poor compression rates is being written to the Data Domain system. Poorly compressible data can cause space shortages and performance issues.
This article describes how to determine the compression ratio and lists factors that can negatively impact compression. Poor compression can result in higher-than-expected disk usage and slow backup performance.
· not getting expected disk usage
· disk is full
· poor compression
· running out of disk space
· backups are slow
· All Data Domain Systems
· All Software Releases
Compression is achieved by removing redundant data. Files are split at variable-sized boundaries into segments; each segment is checked against previously stored segments and stored only if it is unique. As it is stored, it is also locally compressed. A typical average compression ratio is 20:1, but actual numbers depend on the data and how the backups are done.
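The segment-and-deduplicate pipeline described above can be sketched in Python. This is a toy illustration only: the rolling-hash chunker, segment size limits, and fingerprint function below are illustrative assumptions, not Data Domain's actual algorithm.

```python
import hashlib
import zlib

def segments(data: bytes, min_size=4096, avg_mask=0x1FFF, max_size=16384):
    """Split data at content-defined boundaries using a toy rolling hash."""
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) ^ byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (rolling & avg_mask) == 0) or length >= max_size:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

store = {}  # fingerprint -> locally compressed segment

def write(data: bytes) -> int:
    """Store only unique segments; return new physical bytes consumed."""
    new_bytes = 0
    for seg in segments(data):
        fp = hashlib.sha1(seg).digest()       # segment fingerprint
        if fp not in store:                   # global compression (dedup)
            store[fp] = zlib.compress(seg)    # local compression
            new_bytes += len(store[fp])
    return new_bytes
```

Writing the same data a second time consumes no new physical space, because every segment is already in the store; this is where high global-compression ratios on repeated backups come from.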
There are two types of compression:
· Global compression (aka dedupe), which is the process by which we remove redundant data.
· Local compression, which is the process of reducing the amount of space taken by data by using algorithms like gzip.
Identifying Compression ratios using the Command Line Interface (CLI)
There are several different tools for looking at compression, each with their advantages and disadvantages.
1. Display disk usage information using "filesys show space". Access the Data Domain system using the CLI and, at the command prompt, type:
filesys show space
Looking at the total space written to the Data Domain System divided by the total space used will give an absolute compression ratio. This is accurate at the time it is run, but it does not differentiate the Global and Local Compression rates. This number can be affected by the difference in time when a file(s) is deleted and when a cleaning is completed.
146.3 / 24.1 = effective compression ratio of 6X.
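The arithmetic is simply data written divided by space used; the figures below are the example values from this article (in GiB):

```python
# Values from the example "filesys show space" output above.
data_written = 146.3   # logical data written to the system
space_used = 24.1      # physical space used after compression

effective_ratio = data_written / space_used
print(f"{effective_ratio:.1f}x")   # roughly 6x
```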
2. Display compression information using "filesys show compression". At the command prompt, type:
filesys show compression
This utility will show the Global and Local Compression ratios at the time the files were created, but becomes outdated as other files are deleted. You can specify a specific subdirectory, and also a time frame to help identify how your compression ratios are changing.
Overall compression = 4.1X
Global Compression = 114,273,877,789 / 40,483,781,960 = 2.82X
Local Compression = 40,483,781,960 / 27,777,975,850 = 1.46X
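Recomputing the ratios from the raw byte counts above shows how the overall ratio factors into the two stages (the byte counts are the example values from this article):

```python
pre_comp   = 114_273_877_789   # bytes before any compression (logical)
post_gcomp = 40_483_781_960    # bytes after global compression (dedup)
post_lcomp = 27_777_975_850    # bytes after local compression (physical)

global_ratio  = pre_comp / post_gcomp
local_ratio   = post_gcomp / post_lcomp
overall_ratio = pre_comp / post_lcomp

# The overall ratio is exactly the product of the two stage ratios.
assert abs(overall_ratio - global_ratio * local_ratio) < 1e-9
```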
3. Display compression information using "system show performance". At the command prompt, type:
system show performance
This utility will show the Global and Local compression as data is written. This output can also be found at the bottom of the daily Autosupport email.
Identifying Compression ratios using the Enterprise Manager (EM), a web-based Graphical User Interface (GUI)
1. There are several different graphs within EM; the quickest compression view appears as soon as you log in to EM.
2. Select the system you’d like to view.
3. Click on the “Data Management” tab.
Causes of poor compression
· Exchange mailboxes. In contrast, the Exchange database or "information store" does compress well.
· Oracle archive logs. In general, Oracle logs get good local compression but no (1x) global compression because they contain all new data.
· Oracle/RMAN backups. These require some tuning of RMAN parameters. The need might become apparent when upgrading from Oracle 9i to 10g.
· Databases with high change rate, e.g., an active column that changes frequently.
· Informix/Onbar backups.
· Code workspaces
· Pre-compressed or pre-encrypted data
Some backup applications can compress and/or encrypt the data before writing it to the storage system. Enabling this feature hurts compression on the restorer in two ways. First, such data is not locally compressible. Second, and more importantly, such data does not compress well globally. This happens because pre-compression/encryption spreads small changes in the input data to widespread differences in the output data. This is especially true of encryption in the chained block mode. Regardless, even with pre-compression, the data ends up taking more physical space on the restorer than it would if the data was not pre-compressed.
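Both effects are easy to reproduce with any stream compressor. The following Python sketch uses zlib on synthetic data: random bytes stand in for encrypted data, and a single small edit shows how pre-compression destroys the similarity between two near-identical backups.

```python
import os
import zlib

# 1) Encrypted data looks random, so local compression gains nothing; the
#    output is even slightly larger due to compression framing overhead.
encrypted_like = os.urandom(1 << 20)   # random bytes stand in for ciphertext
assert len(zlib.compress(encrypted_like)) >= len(encrypted_like)

# 2) Pre-compressing two near-identical backups destroys their similarity.
doc_a = b"".join(b"customer row %06d\n" % i for i in range(50_000))
doc_b = doc_a.replace(b"row 000000", b"row XXXXXX", 1)   # one tiny edit

gz_a, gz_b = zlib.compress(doc_a), zlib.compress(doc_b)

# The compressed streams diverge almost immediately, so a segment-based
# deduplicator sees two mostly unique streams instead of near-duplicates.
common_prefix = 0
for x, y in zip(gz_a, gz_b):
    if x != y:
        break
    common_prefix += 1
```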
· Tape markers not set
Some backup applications insert periodic markers into the backup stream—even when they write to a disk-type device instead of tape. The relative positions and values of these markers can vary from backup to backup, which hurts global compression. The following backup applications are known to insert frequent tape markers. The tape marker algorithm to use on the restorer can be set for specific applications, but in most cases “auto” will work just fine. To see what tape marker is set on the restorer use the command “filesys option show marker-type”. Here is a list of the available application tape-markers:
o Commvault (cv1)
o TSM (tsm1 for little-endian, tsm2 for big-endian, e.g. SPARC)
o ETI Bakbox for Tandem (eti1)
o HP Data Protector (hpdp1)
o Legato Networker for VTL backups only (nw1)
o Syncsort (ssrt1)
· Multiplexing in backup software
When backing up to real tape, the backup application is usually configured to multiplex multiple input streams into a single output stream per tape drive. There are two reasons for this multiplexing. First, one cannot write multiple streams to a single tape drive at once. Second, multiple input streams may be needed to feed the tape drive with a data rate that is high enough to avoid frequent starts and stops, a.k.a., "shoe shining". When backing up to disk or virtual tape, multiplexing is not necessary because disk-based systems support multiple streams at variable speeds. In fact, multiplexing is undesirable. It makes restores slower. Also, it can degrade compression, because different streams might be multiplexed differently from backup to backup. The smaller the granularity of multiplexing, the more the potential degradation.
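A hypothetical illustration of why multiplexing hurts deduplication: the same two client streams, interleaved in a slightly different order on the second run, produce different byte streams even though the logical content is identical (the stream names and 4-byte block size are made up for the example).

```python
# Two clients' backup blocks (4 bytes each, for illustration).
stream_a = [b"A%03d" % i for i in range(6)]
stream_b = [b"B%03d" % i for i in range(6)]

# First run: client A's block arrives first in each round.
backup_1 = b"".join(x for pair in zip(stream_a, stream_b) for x in pair)
# Second run: scheduling jitter lets client B's block arrive first.
backup_2 = b"".join(x for pair in zip(stream_b, stream_a) for x in pair)

def blocks(buf, size=4):
    return [buf[i:i + size] for i in range(0, len(buf), size)]

# Same logical content, different byte streams -> poor deduplication,
# and the finer the interleaving granularity, the worse it gets.
same_content = sorted(blocks(backup_1)) == sorted(blocks(backup_2))
diverged = backup_1 != backup_2
```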
· Application data types known not to compress well.
Media data types such as PDF documents, audio, images, and video are already encoded compactly and do not get any further local compression on the restorer. They will benefit from global compression if the media files are backed up multiple times, if there are multiple copies of the same media file (perhaps with small changes), or if a media file is edited over time and is archived after each edit.
· Other factors that can affect compression.
o High change rate of data (more than 10%)
o Small files
o Non-sequential writes to the DDR (from some NFS clients)
· Likely causes of low compression
| Global Compression | Local Compression | Likely Cause |
|---|---|---|
| Low (2x – 4x) | Low (1x – 1.5x) | Pre-compressed or encrypted data |
| Low (1x – 2x) | High (>2x) | Unique but compressible data, e.g. database archive logs |
| Low (2x – 5x) | High (>1.5x) | Markers not detected, high data change rate, or stream multiplexing |
| High (>10x) | Low (<1.5x) | Backups of the same compressed and/or encrypted data (uncommon) |