– File based DeDuplication
– Fixed-Length Segment DeDuplication
– Variable-Length Segment DeDuplication
– Post-Process DeDuplication
– In-Line DeDuplication
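The difference between fixed-length and variable-length segmentation can be illustrated with a small sketch. The CRC32 window check used to pick boundaries is an illustrative stand-in, not the Data Domain algorithm, and the function names, window size, and divisor are assumptions made for this example.

```python
import random
import zlib


def fixed_segments(data: bytes, size: int = 64) -> list:
    """Cut data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_segments(data: bytes, window: int = 16, divisor: int = 64) -> list:
    """Cut data wherever a checksum of the last `window` bytes hits a chosen value.

    Boundaries depend only on content, so they move with the data when
    bytes are inserted. (Illustrative boundary rule, assumed for this sketch.)
    """
    segments, start = [], 0
    for i in range(len(data)):
        tail = data[max(0, i - window + 1):i + 1]
        if zlib.crc32(tail) % divisor == 0:
            segments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        segments.append(data[start:])
    return segments


def shared(old: list, new: list) -> int:
    """Count segments of `new` that already exist in `old` (would deduplicate)."""
    seen = set(old)
    return sum(1 for seg in new if seg in seen)


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
shifted = b"X" + original  # same data with one byte inserted at the front

print("fixed-length reuse:   ", shared(fixed_segments(original), fixed_segments(shifted)))
print("variable-length reuse:", shared(variable_segments(original), variable_segments(shifted)))
```

A single inserted byte shifts every fixed-length boundary, so almost nothing matches the earlier backup. The content-defined boundaries realign after the first cut point, so most segments are still recognised as duplicates.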
Data Domain System Introduction
• DeDuplicating hardware system
– Variable-length segments
– Processors and RAM
– Ethernet and Fibre Channel connections
– Low-cost SATA disk drives
– RAID 6 in software
– NVRAM used to protect unwritten data
Data Domain DeDuplication:
# Source Based DeDuplication
– Uses DD Boost with DSP (distributed segment processing)
# Target Based DeDuplication
– accessible through CIFS, NFS, and VTL protocols
Data Domain Global Compression:
# Global Compression – Equivalent to deduplication; it cannot be turned off.
# Local Compression – Compresses data segments before they are written to disk. Equivalent to file compression (uses the lz, gz, or gzfast algorithm); it can be turned off.
Stream-Informed Segment Layout (SISL) scaling architecture:
# SISL architecture provides fast and efficient deduplication:
• 99% of duplicate data segments are identified inline in RAM before they are stored to disk.
• System throughput increases directly as CPU performance increases.
• Minimizes the disk footprint by minimizing disk access.
The Data Domain system DeDuplication – How it works:
1. Segment – Data sliced into segments
2. Fingerprint – Segments given fingerprint ID (segment ID)
3. Filter – Fingerprint IDs compared to fingerprints in cache
   1. If fingerprint ID is new, continue
   2. If fingerprint ID is a duplicate, reference, then delete
4. Compress – Groups of new segments compressed using a common technique (lz, gz, gzfast)
5. Write – Segments (including fingerprints, metadata, & logs) written to containers, containers written to disk
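The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the DD OS implementation: it uses fixed-size segments and SHA-1 fingerprints for brevity, compresses each new segment individually with zlib rather than in groups, and keeps "containers" as in-memory lists. The class name `DedupStore` and all of its attributes are hypothetical.

```python
import hashlib
import zlib


class DedupStore:
    """Toy in-memory model of segment, fingerprint, filter, compress, write."""

    def __init__(self, segment_size: int = 4096):
        self.segment_size = segment_size
        self.index = {}       # fingerprint -> (container number, position)
        self.containers = []  # each container is a list of compressed segments

    def write(self, data: bytes) -> list:
        """Store data and return its recipe (ordered list of fingerprints)."""
        recipe, new_segments = [], []
        for off in range(0, len(data), self.segment_size):
            segment = data[off:off + self.segment_size]  # 1. segment
            fp = hashlib.sha1(segment).hexdigest()       # 2. fingerprint
            if fp not in self.index:                     # 3. filter
                self.index[fp] = (len(self.containers), len(new_segments))
                new_segments.append(segment)
            recipe.append(fp)  # duplicates are only referenced
        if new_segments:
            # 4. compress the new segments, 5. write them as one container
            self.containers.append([zlib.compress(s) for s in new_segments])
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe."""
        out = []
        for fp in recipe:
            container, position = self.index[fp]
            out.append(zlib.decompress(self.containers[container][position]))
        return b"".join(out)
```

Writing the same data twice adds entries to the recipe but no new segments to the containers, which is where the space saving comes from.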
Data Invulnerability Architecture:
– Data Invulnerability Architecture is an important EMC Data Domain technology that provides safe and reliable storage.
– The EMC Data Domain operating system (DD OS) is built for data protection. Its elements comprise an architectural design whose goal is data invulnerability. There are four technologies within the Data Invulnerability Architecture that fight data loss:
1. End-to-end verification –
# Verify Stripe Integrity
# Verify user data Integrity
# Verify file system metadata Integrity
2. Fault avoidance and containment
# New data never overwrites good data. (The system never puts existing data at risk.)
# There are fewer complex data structures
# The system includes non-volatile RAM (NVRAM) for fast, safe restarts
3. Continuous fault detection and healing
# Periodically rechecks the integrity of the RAID stripes and container logs
# Uses RAID system redundancy to heal faults
# During every read, data integrity is re-verified
# Any errors are healed as they are encountered
4. File system recoverability
# Reconstructs lost or corrupted file system metadata. It includes file system check tools.
Data Domain file systems:
# The administrative file system (called ddvar) – /ddvar
# The storage file system (called Mtree) – /backup (/data/col1)
Data Domain System Protocols:
# NFS – Network File System (NFS) clients can have access to Data Domain system directories and Mtrees.
# CIFS – Common Internet File System (CIFS) clients can have access to Data Domain system directories and Mtrees.
# VTL – The virtual tape library (VTL) protocol enables backup applications to connect to and manage Data Domain system storage as if it were a tape library. All of the functionality generally supported by a physical tape library is available with a Data Domain system configured as a VTL. The movement of data from a system configured as a VTL to a physical tape is managed by backup software (not by the Data Domain system). The VTL protocol is used with Fibre Channel networking.
# DD Boost – The DD Boost protocol enables backup servers to communicate with storage systems without the need for Data Domain systems to emulate tape. There are two components to DD Boost: one component that runs on the backup server and another component that runs on a Data Domain system.
# NDMP – If the VTL communication between a backup server and a Data Domain system is through NDMP, no Fibre Channel (FC) is required. When you use NDMP, initiator and port functionality does not apply.
Data Domain Data Paths:
# Data Domain data paths over Fibre Channel networks – VTL
# Data Domain data paths over Ethernet networks – NFS, CIFS, DD Boost and NDMP
Data Domain administration interfaces:
# The Enterprise Manager, which is the graphical user interface (GUI)
# The command line interface (CLI) – Access CLI via SSH, serial console, telnet, keyboard & monitor
Data Domain Initial Configuration using Enterprise Manager – Configuration Wizard: (Command Line – “config setup” Command)
Configuration Wizard consists of these sections:
3. File system,
Data Domain Manage System Access:
# User Privileges: 3 Type of Classes
# Administration access:
1. Active tier
2. Usable disks
3. Failed/Foreign/Absent Disks
Foreign Disks – The foreign state indicates that the disk contains valid Data Domain file system data and alerts the administrator to the presence of this data to make sure it is attended properly. This commonly happens during chassis swaps, or when new shelves are added to an active system.
–>Link Aggregation Definition:
Using multiple Ethernet network cables, ports, and interfaces (links) in parallel, link aggregation increases network throughput, across a LAN or LANs, until the maximum computer speed is reached.
–>Link Aggregation Bonding Types:
1. Round robin – Transmits packets in sequential order from first available link through the last in the aggregated group.
2. Balanced – Data sent over the interfaces as determined by the hash method you select.
3. LACP – Similar to balanced except for the control protocol that communicates with the other end and coordinates what links, within the bond, are available. It provides heartbeat fail-over.
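As a rough illustration of how the first two bonding types choose a link, consider the sketch below. The interface names are made up, and hashing the source and destination MAC addresses with CRC32 is only one possible hash method, assumed here for the example. LACP is not modelled because it depends on a control protocol with the switch.

```python
import itertools
import zlib


def round_robin(links):
    """Return an iterator that hands out links in order, wrapping around."""
    return itertools.cycle(links)


def balanced(links, src_mac: str, dst_mac: str) -> str:
    """Pick a link from a hash of the packet's addresses.

    The same address pair always maps to the same link, so one
    conversation stays on one interface. (Assumed hash method.)
    """
    key = f"{src_mac}-{dst_mac}".encode()
    return links[zlib.crc32(key) % len(links)]


links = ["eth0a", "eth0b", "eth1a"]  # hypothetical interface names
rr = round_robin(links)
print([next(rr) for _ in range(4)])
print(balanced(links, "00:11:22:33:44:55", "66:77:88:99:aa:bb"))
```

Round robin spreads packets evenly but can reorder a stream, while a hash keeps each conversation on one link at the cost of a less even spread.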
–>Link Failover Definition:
A virtual interface may include both physical and virtual interfaces as members (called interface group members).
# How It Works
Link failover is supported by a bonding driver on a Data Domain system. The bonding driver checks the carrier signal on the active interface every 0.9 seconds. If the carrier signal is lost, the active interface is changed to another standby interface. An Address Resolution Protocol (ARP) message is sent to indicate that the data must flow to the new interface.
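The failover behaviour described here can be modelled as a small state machine. This is a simplified sketch: the class and attribute names are hypothetical, the carrier state is passed in as a dictionary instead of being read from hardware, and the ARP message is represented by a list entry.

```python
class FailoverBond:
    """Toy model of an active/standby bond with carrier monitoring."""

    CHECK_INTERVAL = 0.9  # seconds between carrier checks, as stated in the text

    def __init__(self, members):
        self.members = list(members)
        self.active = self.members[0]
        self.announcements = []  # stands in for the ARP messages sent

    def check(self, carrier_up: dict) -> str:
        """Run one carrier check; `carrier_up` maps interface name -> bool."""
        if carrier_up.get(self.active, False):
            return self.active  # carrier present, nothing to do
        for candidate in self.members:
            if carrier_up.get(candidate, False):
                self.active = candidate
                self.announcements.append(candidate)  # tell the network where to send data
                break
        return self.active
```

A real driver would call `check` from a timer every `CHECK_INTERVAL` seconds; here it is called by hand so the behaviour is easy to follow.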
–>Manage VLAN and IP Alias:
VLAN and IP alias network interfaces are used:
# For network security
# To segregate network traffic
# To speed up network traffic
# To organize a network
How It Works:
If you’re not using VLANs, you can use IP aliases. IP aliases are easy to implement and are less expensive than VLAN, but they are not a true VLAN. For example, you must use one IP address for management and another IP address to back up or archive data. You can combine VLANs and IP aliases.
Snapshot location: /data/col1/backup/
ex: /data/col1/backup/austin/.snapshot ; /data/col1/backup/scla/.snapshot
where, .snapshot is a directory
# Replication does not replicate the snapshots of a volume; snapshot replication has to be configured manually.
A fast copy copies files and directory trees of a source directory to a target directory on a Data Domain system. You can use the fast copy operation to retrieve data stored in snapshots. Fastcopy takes space (it’s like a clone).
–>Retention Lock: – Licensed feature
– Retention lock is an optional, system-licensed software feature that enables organizations to protect their data in non-writeable and non-erasable formats for a specified length of time, up to 70 years.
Retention lock protects against:
• Accidents and user errors
• Malicious activity
– Data locked with the retention lock feature is non-writeable and non-erasable. Files cannot be modified even after the retention time for the file expires. The retention period of a retention-locked file can be extended but not reduced.
– In order for a file to become locked with the retention lock, the file’s access time (called “atime”) must be set to a future date that is beyond the minimum retention period configured on the Data Domain system.
– The act of setting the atime is the signal to the Data Domain system to lock the file. As soon as this value is set, the file is locked and cannot be deleted or modified before that date.
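From a client's point of view, locking a file therefore comes down to setting its atime. A minimal sketch using Python's standard `os.utime` is shown below. The function name and its parameters are assumptions for illustration; on an ordinary file system the call only changes the timestamp, and the locking itself would be done by a retention-lock-enabled Data Domain system.

```python
import os
import time


def lock_until(path: str, retain_seconds: int, min_retention_seconds: int) -> int:
    """Set the file's atime to a future date to request a retention lock.

    Returns the lock expiry as a Unix timestamp. The minimum retention
    period is passed in by the caller here; on a real system it is part
    of the system configuration.
    """
    if retain_seconds < min_retention_seconds:
        raise ValueError("retention time is below the configured minimum")
    lock_time = int(time.time()) + retain_seconds
    mtime = os.stat(path).st_mtime         # leave the modification time unchanged
    os.utime(path, (lock_time, mtime))     # (atime, mtime)
    return lock_time
```

For example, `lock_until("/mnt/dd/backup/file.bak", 365 * 24 * 3600, 12 * 3600)` would request a one-year lock on a hypothetical NFS-mounted path.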
– Data sanitization is sometimes referred to as electronic shredding
– With the data sanitization function, deleted files can be overwritten using a DoD/NIST compliant algorithm and procedures
– It removes any trace of deleted files with no residual remains preventing normally deleted data from being recovered.
– 5 phases of sanitization:
– Also called inline data encryption
– Protects data on a Data Domain system from unauthorized access or accidental exposure
– Requires software license
– When data is backed-up, data enters via NFS, CIFS, VTL, DD Boost and NDMP Tape Server protocols. It is then:
• Deduplicated (or globally compressed)
• Locally compressed
Important – encryption at a more granular level is not possible. Once enabled all the incoming data will be encrypted.
File System Cleaning:
– Cleaning reclaims physical storage occupied by expired data. For example, as retention periods on backup software expire data, old backups are removed from the backup catalog. Space from expired backups becomes available only after a system cleaning process reclaims the disk space.
– When application software expires backup or archive data, they are deleted in the sense that they are no longer accessible or available for recovery from the application. The data is not deleted immediately; it is removed during a cleaning operation. In the case of retention lock, expired files will not be deleted until the retention lock period ends.
– The default schedule for file system cleaning is every Tuesday at 6 am, and the default CPU throttle is 50%.
– navigate to Data Management > File System > Configuration > Clean Schedule
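Why space comes back only after cleaning can be seen in a toy model: deleting a backup removes its references, and a later pass reclaims the segments that no remaining file refers to. The data structures below are assumptions for illustration and do not reflect the on-disk container format.

```python
def clean(segments: dict, live_files: dict) -> int:
    """Remove segments that no live file references; return bytes reclaimed.

    segments:   fingerprint -> segment bytes (the stored data)
    live_files: file name -> list of fingerprints (each file's recipe)
    """
    live = set()
    for recipe in live_files.values():
        live.update(recipe)          # mark everything still referenced
    reclaimed = 0
    for fp in list(segments):
        if fp not in live:           # sweep unreferenced segments
            reclaimed += len(segments.pop(fp))
    return reclaimed


segments = {"a": b"1111", "b": b"22", "c": b"333"}
files = {"monday.bak": ["a", "b"], "tuesday.bak": ["a", "c"]}
del files["monday.bak"]              # the backup application expires Monday's backup
print(clean(segments, files), "bytes reclaimed")
```

Segment "a" survives because Tuesday's backup still references it; only "b" is reclaimed.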
Data Domain Replication:
Types of Data Domain Replication:
– Directory Replication: For partial site, single directory backup
– MTree Replication: For partial site, point-in-time backup
– Pool Replication: In a VTL setting, specified pools of virtual cartridges are treated as a directory (Destination does not require a VTL license)
– Collection Replication: For whole system mirroring (The fastest and lightest impact replication type)
# One fundamental difference between MTree replication and directory replication is the method used for determining what needs to be replicated between the source and destination. MTree replication creates periodic snapshots at the source and transmits the differences between two consecutive snapshots to the destination.
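The snapshot-difference idea behind MTree replication can be sketched as a comparison of two snapshots. Representing a snapshot as a mapping from path to content fingerprint is an assumption made for this example, as are the function and path names.

```python
def snapshot_diff(previous: dict, current: dict):
    """Compare two snapshots (path -> content fingerprint).

    Returns (changed, deleted): files that are new or modified in
    `current`, and paths that existed only in `previous`.
    """
    changed = {path: fp for path, fp in current.items() if previous.get(path) != fp}
    deleted = sorted(path for path in previous if path not in current)
    return changed, deleted


snap1 = {"/austin/db.bak": "f1", "/austin/logs.bak": "f2"}
snap2 = {"/austin/db.bak": "f9", "/austin/mail.bak": "f3"}
print(snapshot_diff(snap1, snap2))
```

Only the entries in `changed` and `deleted` would need to travel to the destination; unchanged files are skipped.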
– 1 to 1
– many to 1
– 1 to many
– cascaded 1 to many
If the source Data Domain system has a lot of data, the initial replication seeding can take some time over a slow link. To expedite the initial seeding, you can bring the destination system to the same location as the source system to use a high-speed, low-latency link. Once data is initially seeded using the high-speed network, move the system back to its intended location. Because the data has already been seeded, only new data is sent from that point onwards.
• An option that reduces WAN bandwidth utilization
• Useful if using a low-bandwidth network link.
• Provides additional compression
• Only for replication with <6 Mb/s available bandwidth
• Use bandwidth and network-delay settings together to calculate the proper TCP buffer size for replication
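A common way to size a TCP buffer from these two settings is the bandwidth-delay product. The sketch below assumes the delay is a round-trip time in milliseconds; the text does not say how the Data Domain system itself combines the two values, so treat this as a general networking rule of thumb and not as the system's formula.

```python
def tcp_buffer_bytes(bandwidth_bits_per_sec: int, round_trip_delay_ms: int) -> int:
    """Bandwidth-delay product: bytes that can be in flight on the link."""
    # bits/s * ms = bits * 1000, so divide by 8 (bits -> bytes) and by 1000 (ms -> s)
    return bandwidth_bits_per_sec * round_trip_delay_ms // 8000


# A 6 Mb/s WAN link with 100 ms round-trip delay
print(tcp_buffer_bytes(6_000_000, 100))  # 75000
```

A buffer smaller than this value leaves the link idle while the sender waits for acknowledgements.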
#Low Bandwidth Optimization Using Delta Compression:
– Delta compression is a global compression algorithm that is applied after identity filtering. The algorithm looks for previous similar segments using a sketch-like technique that sends only the difference between previous and new segments.
– Delta compression reduces the amount of data to be replicated over low-bandwidth WANs by eliminating the transfer of redundant data found with replicated deduplicated data. This feature is typically beneficial to remote sites with lower Data Domain models
Resynchronize Recovered Data:
Resynchronization is the process of recovering (or bringing back into sync) the data between a source and destination replication pair after a manual break in replication.
EMC DD Boost:
– EMC Data Domain Boost extends the backup optimization benefits of Data Domain deduplication storage solutions by distributing parts of the deduplication process to the backup server or application client. DD Boost dramatically increases throughput speeds, minimizes backup LAN load, and improves backup server utilization.
– In a typical backup environment using in-line deduplication, client data is sent to a Data Domain system where the data is identified in segments. These segments are identified to be unique data or duplicate segments. If they are unique, they are compressed and written to the storage subsystem on the Data Domain.
DD Boost Features:
– Centralized replication awareness and management – The backup application is aware of replication enabled on the Data Domain side, so data can easily be recovered from the copy residing on the failover node.
– Distributed segment processing (DSP)
– Advanced load balancing and failover via interface groups
DD Boost – Deduplication and Distributed Segment Processing:
1. Segment the data
2. Fingerprint the segmented data
3. Compare the fingerprints with those on the Data Domain system
4. Filter out the unique data
5. Send and write the unique data to the Data Domain system
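The division of work in distributed segment processing can be sketched as follows. The class and function names are hypothetical, fixed-size segments and SHA-1 are used only to keep the example short, and the "network" is a plain method call; the point is that only fingerprints and unique segments cross it.

```python
import hashlib


class DataDomainSide:
    """Stand-in for the Data Domain system: knows which segments it holds."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Return the fingerprints this system has not stored yet."""
        return [fp for fp in fingerprints if fp not in self.segments]

    def store(self, new_segments: dict):
        self.segments.update(new_segments)


def boost_backup(data: bytes, server: DataDomainSide, segment_size: int = 4096) -> int:
    """Back up `data`; return the number of segment bytes actually sent."""
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]  # 1
    fingerprints = [hashlib.sha1(s).hexdigest() for s in segments]                    # 2
    needed = set(server.missing(fingerprints))                                        # 3
    payload = {fp: s for fp, s in zip(fingerprints, segments) if fp in needed}        # 4
    server.store(payload)                                                             # 5
    return sum(len(s) for s in payload.values())
```

On a second backup of mostly unchanged data, the return value shrinks to the size of the new segments, which is why the backup LAN load drops.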
DD Boost Configuration – Symantec NetBackup
1. License as required
2. Create devices, pools through backup server management console
3. Configure backup policies and groups to use Data Domain configured devices
4. Configure duplicate to use Data Domain configured devices on desired Data Domain systems.
1. License DD Boost.
2. Enable DD Boost
3. Set a Data Domain local user as a DD Boost user.
4. Create DD Boost storage units
# NetBackup console: Configure Data Domain systems as disk storage servers
a. Install Data Domain OST plug-in
b. Configure disk storage servers type OST
c. Create storage lifecycle policy
# Configure Data Domain systems (A and B) for Boost
a. Enable DD Boost
c. Create storage unit and CIFS share
# NBU Console: Configure Backup Policy
a. Create a Backup Policy
b. Apply Storage Lifecycle Policy to Backup Policy
# NBU Console: Monitor Activity for Backup and Opt. Duplication
a. Start backup policy and monitor activity
b. Monitor file replication
# NBU Console: Restore files from system B
a. Restore from secondary copy
b. Verify Restored Files
# Verify Files on Data Domain systems A and B
a. Verify File Replication/Space Usage Stats
b. Validate backup files and file replication files
Data Domain System Performance Metrics:
#system show performance – Command
proc recv send idle
---- ---- ---- ----
proc – percent of time spent processing network requests
recv – percent of time spent receiving requests over the network
send – percent of time spent sending requests over the network
idle – percent of time waiting for network data transfers
# system show performance
CPU disk ……… ‘CDBVMSF’
92%/ 94% 57% ---V---
34%/ 36% 66% ---V---
• C – cleaning
• D – disk reconstruction
• B – currently unused
• V – verification (used in the deduplication process)
• M – fingerprint merge (used in the deduplication process)
• S – summary vector checkpoint (used in the deduplication process)
• F – currently unused
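When collecting this output in scripts, the state column can be decoded with a small lookup. The sketch assumes the column shows a flag letter when that activity is running and some placeholder character otherwise; the function name is hypothetical.

```python
STATE_FLAGS = {
    "C": "cleaning",
    "D": "disk reconstruction",
    "B": "currently unused",
    "V": "verification",
    "M": "fingerprint merge",
    "S": "summary vector checkpoint",
    "F": "currently unused",
}


def decode_state(state: str) -> list:
    """Return the activities named by the flag letters in a state string.

    Any character that is not a known flag letter is treated as a placeholder.
    """
    return [STATE_FLAGS[ch] for ch in state if ch in STATE_FLAGS]


print(decode_state("---V---"))
```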
# system show stats 2
• Reduce stream count
• Don’t clean during heavy input
• Don’t replicate during heavy input
• Consider using link aggregation
• Consider implementing DD Boost
Monitor a Data Domain System:
• Support bundle
• Autosupport logs and alert messages
Autosupport logs and alert messages:
– Report the system status and identify potential system problems
– Provide daily notification of the system’s condition
– Send email notifications to specific recipients for quicker, targeted responses
– Supply critical system data to aid support case triage and management
DD Operating System Upgrade:
• Release Types
– RA, IA, and GA – Restricted Availability, Initial Availability, and General Availability
• There is no down-grade path
– Read all release notes before upgrading
– When in doubt, contact Support before installing an upgrade
Preparing for DDOS Upgrade:
– Are you upgrading more than two release families at a time?
## 4.7 to 4.9 is considered two families
## 4.7 to 5.0 is more than two families and requires two upgrades
– Time required
## Single upgrades can take about 45 minutes or more
## During the upgrade, the Data Domain file system is unavailable
## Shutting down processes, rebooting after upgrade, and checking the upgrade all take time
## Do not disable replication on either system in the pair
## Upgrade the destination (replica) before upgrading the source (originator)
– Stop any CIFS client connections before beginning the upgrade
Working on VTL Configuration:
Setting Up a Virtual Tape Library:
• Enable VTL
• Create a Library
• Create Tapes
• Import Tapes
#I# Enable VTL:
1. In the More Tasks menu, select Service > Enable.
The Enable Service dialog box appears.
2. In the Enable VTL dialog box, click OK.
The Enable Service Status dialog box appears.
3. When the Enable Service Status dialog box displays Completed, click Close.
#II# Create a Library
1. In the More Tasks menu, click Library Create.
2. Enter the VTL library information:
Library Name – Name can be from 1 to 32 alphanumeric characters.
Number of Drives – From 1 to 256 tape drives. Systems with 4 GB of memory (DD4xx, DD510 and DD530) can have a maximum of 64 drives.
Systems with 8 GB to 24 GB (DD560 to DD690) can have a maximum of 128 drives. The DD880 with 48 GB of memory can have up to 256 tape drives.
Drive Model – • IBM-LTO-1
Number of Slots – Number of slots in the library:
• Up to 32,000 slots per library
• Up to 64,000 slots per system
• This should be equal or greater than the number of drives.
Number of CAPs – (Optional) Number of cartridge access ports (CAPs):
• Up to 100 CAPs per library
• Up to 2000 CAPs per system
Changer Model Name – Click the drop-down list and select the model:
Check the backup software application documentation on the Data Domain support site for the model name that you should use.
3. Click OK.
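The per-library limits listed in step 2 can be checked before a library is created, for example in a provisioning script. The function below is a hypothetical helper and not part of any Data Domain tool. It covers only the per-library limits; the per-system limits and the memory-dependent drive maximums would need information about the whole system.

```python
def validate_library(name: str, drives: int, slots: int, caps: int = 0) -> list:
    """Return a list of problems with the requested VTL library settings."""
    errors = []
    if not (1 <= len(name) <= 32 and name.isascii() and name.isalnum()):
        errors.append("name must be 1 to 32 alphanumeric characters")
    if not 1 <= drives <= 256:
        errors.append("number of drives must be between 1 and 256")
    if slots > 32000:
        errors.append("a library can have at most 32,000 slots")
    if slots < drives:
        errors.append("slots should be equal to or greater than the number of drives")
    if not 0 <= caps <= 100:
        errors.append("a library can have at most 100 CAPs")
    return errors


print(validate_library("lib01", drives=4, slots=20))
```

An empty list means the settings pass these checks.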
#III# Creating Tapes
The default capacities for each IBM LTO drive type are as follows:
• LTO-1 drive: 100 GB
• LTO-2 drive: 200 GB
• LTO-3 drive: 400 GB
#IV# Importing tapes
Importing moves existing tapes from the vault to a library slot, drive, or cartridge access port (CAP). The number of tapes that you can import at one time is limited by the number of empty slots in the library. (You cannot import more tapes than the number of currently empty slots.)
1. In the Tapes view, either:
a. Enter search information about the tapes to import and
2. From the Import Tapes: library view, verify the summary information and the tape list, and click Next.
3. Click Close on the status window.
Working with Access Groups:
A VTL access group (or VTL group) is created to hold a collection of initiator WWPNs or aliases and the drives and changers they are allowed to access. A default group named TapeServer also exists, where you can add devices that will support NDMP-based backup applications.
Access group configuration allows initiators (in general, backup applications) to read and write data to the devices that are also in the access group.
Access groups allow clients to access only selected LUNs (media changers or virtual tape drives) on a system. A client that is set up for an access group can access only devices that are in its access group.
Note: Avoid making access group changes on a Data Domain system during active backup or restore jobs. A change may cause an active job to fail. The impact of changes during active jobs depends on a combination of backup software and host configurations.
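The access rule for VTL groups (an initiator reaches a device only when both are in the same group) can be expressed in a few lines. The class name, WWPN, and device names below are invented for illustration.

```python
class AccessGroup:
    """A named set of initiators and the devices they may use."""

    def __init__(self, name: str):
        self.name = name
        self.initiators = set()  # WWPNs or aliases
        self.devices = set()     # media changers and virtual tape drives


def can_access(groups, initiator: str, device: str) -> bool:
    """True if some group contains both the initiator and the device."""
    return any(initiator in g.initiators and device in g.devices for g in groups)


backup_group = AccessGroup("backup1")                   # hypothetical group
backup_group.initiators.add("21:00:00:e0:8b:05:05:04")  # hypothetical WWPN
backup_group.devices.update({"lib01 changer", "lib01 drive 1"})

print(can_access([backup_group], "21:00:00:e0:8b:05:05:04", "lib01 drive 1"))
print(can_access([backup_group], "21:00:00:e0:8b:05:05:04", "lib02 drive 1"))
```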
View Access Group Information:
• LUNs Tab – LUN, Library, Device, In-Use Ports, Primary Ports, Secondary Ports
• Initiators Tab – Initiator, WWPN
Data Domain Encryption :-
Encryption of Data at Rest or “inline data encryption”
Protects data against loss or theft, accidental exposure through a lost drive, or intrusion
Requires a license
Enables data on system drives or external storage to be encrypted, while being saved and locked, before it’s moved to another location
All ingested data is encrypted
Data that exists on the Data Domain system before encryption is enabled is not automatically encrypted, but it can be encrypted later
Inline Encryption happens during the Data Domain SISL Process:
Segment>fingerprint>Deduplicate (globally compress)>Group>Locally compress>Encrypt
The following Protocols can be encrypted as data is ingested: NFS, CIFS, VTL, DDBoost and NDMP tape server
The available types of Encryption are:
128-bit or 256-bit AES (Advanced Encryption Standard)
Both CBC (Cipher Block Chaining) and GCM (Galois/Counter Mode) are available
*One important thing to remember is that all data entering DD system will be encrypted; there are NO other granular levels of encryption available
The feature can be enabled on the Encryption tab in File System, which also shows its status
Also, do not lose your encryption passphrase; it is required when locking or unlocking the file system or disabling encryption
Data Domain DD860 – Technical Specifications – Real Size – 64 TB
– Applied Backup read throughput we are getting 100 GB/Hour
Logical Capacity (Standard) 1.4 – 5.7 PB (*)(****)(*****)
Logical Capacity (Redundant) 7.1 – 28.5 PB(**)(****)(*****)
Max. Throughput (Other) 5.1 TB/hr (Maximum throughput achieved using Symantec OpenStorage and 10 Gb Ethernet)
Max. Throughput (DD Boost) 9.8 TB/hr (***)
Power Dissipation 608 W
Cooling Requirement 2 075 BTU/hr
Data Domain DD7200 – Technical Specifications – Real Size – 96 TB
Capacity (Raw) Max. Usable: 428 TB;
Max. Usable w/ DD Extended Retention: 1.7 PB
Logical Capacity (Standard) 4.2 – 21.4 PB (*)(**)
Logical Capacity (Redundant) w/ DD Extended Retention: 17.1 – 85.6 PB (*)(**)
Max. Throughput (Other) 11.9 TB/hr (Maximum throughput achieved using NFS and 10 Gb Ethernet) (**)
Max. Throughput (DD Boost) 26.0 TB/hr (Maximum throughput achieved using DD Boost and 10 Gb Ethernet)