Some Best Practices in a Data Domain Environment

EMC strongly recommends turning off multiplexing in NetWorker so that backup data from multiple sources is not interleaved on the virtual tapes, because interleaving significantly reduces deduplication ratios.

Keep in mind, however, that turning off multiplexing will adversely affect performance unless the environment is adjusted to compensate as described next.

1. Edit the properties of each virtual device in NetWorker to set both the target sessions and max sessions to 1. Note that in some NetWorker versions, setting them both to 1 at the same moment does not actually change the value of both settings. It is therefore necessary to set the target sessions to 1 first, then come back and change the max sessions to 1.

2. In the NSR resource, increase the nsrmmd polling interval to at least 15 and the restart interval to at least 10 (see the nsradmin sketch after this list).

3. Re-examine schedules so that the number of active sessions in NetWorker is always approximately 10 percent over the aggregate target sessions.

4. Add more virtual devices gradually to the existing environment. Do not assume that the storage node and server infrastructure can support significantly more devices. Monitor your environment and confirm that the storage nodes can handle the load.
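
The device and NSR server settings from steps 1 and 2 can also be applied from the command line with nsradmin. The following is a minimal sketch; the device name rd=storagenode:/vtl/drive1 is a placeholder, and the NetWorker Management Console can be used instead if you prefer a GUI.

# nsradmin
nsradmin> . type: NSR device; name: rd=storagenode:/vtl/drive1
nsradmin> update target sessions: 1
nsradmin> update max sessions: 1
nsradmin> . type: NSR
nsradmin> update nsrmmd polling interval: 15
nsradmin> update nsrmmd restart interval: 10
nsradmin> quit

Repeat the device update for each virtual drive, setting target sessions before max sessions as noted in step 1.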

VTL Best Practices Guide

These Best Practice guidelines help ensure optimal performance of the Data Domain Virtual Tape Library (VTL) in backup environments, and also make the product easier to support and maintain.

APPLIES TO

· All Data Domain systems

· All Software Releases supporting VTL

· VTL Protocol

· Third-party Backup Applications ("BA") such as NetWorker, TSM, etc.

SOLUTION

1. Basic Guidelines to avoid poor performance:

1. It is critical that you ensure that a VTL qualifier has been completed for your installation and verified for proper operation. Use of unsupported HBAs, drivers, and other components is a common source of problems.

2. Attempt to keep the Data Domain system less than 85% full. Filesystem Cleaning and other operations will be faster and more efficient when the system has enough available disk space to perform these important tasks (see the command sketch after this list).

3. Try to schedule Filesystem Clean (also known as Garbage Collection) to run during times when active backups are not running.

4. The default Filesystem Clean schedule is sufficient in the vast majority of environments. Consult the document "Scheduling Cleaning on a Data Domain System: Best Practices" (KB 180572) to better understand this process. If you still feel there is reason to alter the default schedule to have Filesystem Clean run more often, please contact Data Domain Support to discuss.

5. Don’t schedule Replication to overlap with your active VTL backup window (see KB 181888). Both processes require substantial resources and will complete faster if run separately rather than concurrently.

6. Never use encryption (KB 181558), multiplexing (KB 181912), pre-compression (KB 181558), or client-side deduplication from the client BA (e.g., NetWorker, TSM), as they will greatly reduce the compression factor obtained on the Data Domain system. Perform these activities on the Data Domain system only. Some backup applications turn on these features by default (e.g., HP Data Protector defaults to multiplexing), so please check that all of these are turned OFF for your application.

7. Although the Data Domain system may offer higher limits in configuration options (number of streams, throttling, replication, etc), use of more moderate configurations will often offer the best overall performance.

8. Be certain that you read and understand all alerts from the Data Domain system. If you don’t understand an alert, please call support for clarification.

9. Don’t just use the default pool for everything. Create at least one other pool and create all tapes in the pool(s) you created. If you currently use replication (or may in the future), it is important to create and use between 5 and 10 replication contexts (i.e., VTL pools) for improved performance.

10. Consult the appropriate compatibility matrix to ensure that your specific components are compatible with VTL.
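
A quick way to check guidelines 2 and 4 from the Data Domain CLI is sketched below using standard DDOS commands (output formats vary by release): the first command reports how full the system is, and the second shows when the next Filesystem Clean is scheduled.

# filesys show space
# filesys clean show schedule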

2. VTL Components

1. Initiators:

1. The FC initiator port must be dedicated to Data Domain VTL devices only.

2. Only initiators that need to communicate with a particular set of VTL target ports on a Data Domain system should be zoned with that Data Domain system.

3. Create a useful alias for every initiator you zone and connect to the Data Domain system, preferably including the hostname and port in the alias name.

4. Use only 1-to-1 zoning; create zones on your Fibre Channel switch composed of only one initiator and one target per zone (see the zoning sketch below).
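
As an illustration of the alias and 1-to-1 zoning recommendations above, the following is a sketch using Brocade FOS CLI; the alias names, WWPNs, and configuration name are placeholders, and other switch vendors provide equivalent commands.

alicreate "bkupsrv1_hba0", "10:00:00:00:c9:aa:bb:cc"
alicreate "dd_vtl_5a", "21:00:00:24:ff:11:22:33"
zonecreate "z_bkupsrv1_hba0_dd_vtl_5a", "bkupsrv1_hba0; dd_vtl_5a"
cfgadd "prod_cfg", "z_bkupsrv1_hba0_dd_vtl_5a"
cfgenable "prod_cfg"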

2. Slots – The number of slots/drives a library should have will be dictated by how many simultaneous backup and restore streams will be running. Drive counts will also be constrained by the configuration and overall performance limits of your particular Data Domain system. Slot counts will typically be based on how many tapes will be used over a retention policy cycle.

3. CAPs

· Refer to the Data Domain Integration documentation for your specific Backup Application to determine if CAPs (Cartridge Access Ports) need to be emulated for your particular environment. See "References" section for links.

4. Changer:

· There can be only one changer per VTL.

· In many cases, the changer model you should select depends on your specific configuration (see the library creation sketch at the end of this section):

§ Use the RESTORER-L180 library emulation when using Symantec Backup software

§ Use the TS3500 library emulation when using the IBM System i platform

§ You can also use the TS3500 library emulation when using TSM 6.2 on AIX 6.1 and AIX 5.3 platforms.

§ Most other installations should use the L180 library emulation (non-Symantec, non-IBM system i)

5. Tape Drive

1. auto-offline: If a tape is loaded, the drive is online. In this state, the changer is unable to move a tape from the drive without first unloading the tape. However, if Auto-Offline is enabled, there is an implicit drive unload and so the tape can be moved from the drive even if an Unload command has not been issued by the application. This setting can be useful for certain applications, and is global across the VTL service (one setting for all drives).

2. auto-eject: If a tape is moved from a drive or slot to a CAP it goes directly to the vault. This setting can be useful for those applications that check to see that tapes have been removed from the CAP. They will fail their library "eject" operation if tapes are still in the CAP after a time delay. Auto-eject makes these applications happy because tapes disappear immediately from the CAPs. This setting also is global across the VTL service (one setting for all drives)

3. It is best to utilize only one type of tape drive per library.

6. Target HBAs

1. Consider spreading the backup load across multiple FC ports on the Data Domain system in order to avoid bottlenecks on a single port.

2. Verify the speed of each FC port on the switch to confirm that the port is configured for the desired rate.

3. Set secondary ports to "none" unless explicitly necessary for your particular configuration.

4. Configure the host operating system driver for LUN persistent binding. Doing so avoids situations where, because of target changes, the backup software or the operating system needs to be reconfigured. (From "Integrating the Data Domain System VTL with a Storage Area Network", page 8.)
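
Pulling these component choices together, a library can be created from the Data Domain CLI roughly as sketched below. The library name, slot count, and CAP count are placeholders, and the exact vtl command arguments vary by DDOS release, so check the CLI help or the administration guide for your version.

# vtl add VTL1 model L180 slots 60 caps 4
# vtl show config VTL1

Size the slot and drive counts according to the guidance above rather than simply configuring the maximums the platform allows.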

3. VTL Operation

1. Slots: Create sufficient slots to contain the number of tapes you have created. Creating a few extra slots is generally not a problem, as long as it is not an excessive quantity.

2. CAPs

3. Tapes

1. Create only as many tapes as needed to satisfy your backup requirements. The starting tape count is generally less than twice the disk space available on the Restorer. Creating too many virtual tapes can create a scenario (see KB 181743) where the Data Domain system fills up prematurely and causes an unexpected system outage. As the global compression statistics become available, additional tapes can be added incrementally (see the sketch at the end of this Tapes list).

2. If the system does reach 100% full, you will need to delete any empty tapes that may exist on the system, then expire enough data to get the system below 80% capacity. To avoid this time-consuming task, it is important to prevent a system-full event (KB 180487) from occurring.

3. On a replication destination system, never read from a tape that is currently being replicated.

4. Always use unique tape barcodes, even across different pools.

5. Always use the same tape suffix (size) across all pools. If for some reason you must use a different suffix, at a minimum you should keep the same suffix within a pool.

6. Optimal size of tapes will depend on multiple factors, including the specific BA being used, and the characteristics of the data being backed-up. In general, it’s better to use a larger number of smaller tapes than a smaller number of big tapes, in order to control disk usage and prevent system full conditions.

7. For TSM, it is recommended to use smaller tapes (e.g., 30-50 GB) in order to help reclaim space more quickly.
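
For reference, creating a dedicated pool and an initial batch of tapes from the CLI looks roughly like the following sketch; the pool name, barcode, capacity, and count are placeholders, and the exact vtl tape syntax varies by DDOS release.

# vtl pool add TSMpool
# vtl tape add T00001L3 capacity 50 count 20 pool TSMpool

Keep barcodes unique across pools and use the same barcode suffix within a pool, per the guidance above.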

4. Back-up Applications

1. Ensure that you are using the largest optimal block size for your BA to get maximum performance with the Data Domain system. Note that the optimal number will depend on many factors, such as disk speed, OS caching, and your specific backup software. Please see your vendor’s recommendations, as well as the Integration Guides in the References section at the end of this article.

2. In general, a tape block size that is a multiple of 64K will offer better performance, but be sure to check the Best Practices or Integration guides for your specific software (see below for links). If accessing the Data Domain device with multiple backup servers, use the largest block size accessible by all servers in the environment (especially in a heterogeneous OS environment).

3. See Reference section below for links to Best Practices or Integration guides for your Backup Application.

4. Access Groups

1. Numbering of devices within each discrete VTL access group should begin with LUN 0.

2. It is best to avoid changing VTL access group configuration while the Data Domain system is under heavy load.

3. It is recommended that you have exactly one initiator per access group.

5. Statistics

1. When using the VTL filemark cache statistics, the reset of the statistics should be done prior to tapes being loaded into the drives. If the reset of the statistics is performed after the tapes are loaded and accessed in the tape drives, the vtl show detailed-stats command may be misleading. The report may display the "free" counts to be greater than the "alloc" counts, which is unexpected but harmless in this case. This is caused by resetting the statistics of the drives while they are in use; the reset of the statistics is not an atomic operation.

2. As a general best practice, the statistics should be reset prior to tapes being loaded into the drives.

REFERENCE

From <https://emc–c.na5.visual.force.com/apex/KB_HowTo?id=kA0700000004RuP>

Data Domain recommends running a clean operation after the first full backup to a Data Domain System. The initial local compression on a full backup is generally a factor of 1.5 to 2.5. An immediate clean operation gives additional compression by another factor of 1.15 to 1.2 and reclaims a corresponding amount of disk space.
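
To run that first clean manually and check its progress, the standard commands are shown below (a minimal sketch; the status output varies by DDOS release):

# filesys clean start
# filesys clean status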

A default schedule runs the clean operation every Tuesday at 6 a.m. (tue 0600) with a 50% throttle.

To increase file system availability, and if the Data Domain System is not short on disk space, consider changing the schedule to clean less often.
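
The schedule and throttle are adjusted with the filesys clean set commands. The sketch below simply restates the default values as an example; the arguments for less-frequent (for example biweekly or monthly) schedules vary by DDOS release, so check the command help before changing them.

# filesys clean set schedule Tue 0600
# filesys clean set throttle 50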

Issues that can affect the cleaning process:

· If the system is filling up, do not compensate by changing the defaults to more frequent or more aggressive cleaning cycles. Running cleaning every day will fragment the data; for example, read speeds can be severely impaired. The global compression algorithm depends on good locality during writes, so an overly frequent clean cycle will also reduce deduplication ratios.

· Cleaning is a filesystem operation that will impact overall filesystem performance while it is running. Raising the cleaning throttle above the default of 50 will impact performance further during an active cleaning cycle, because the cleaning process will consume more resources.

· Changing the local compression algorithm will cause the following cleaning cycle to run significantly longer, as all existing data needs to be read, uncompressed, and compressed again.

· Any operation that shuts down the Data Domain System filesystem or powers off the device (a system power-off, reboot, or filesys disable command) stops the clean operation. The clean does not automatically continue when the system and file system start again.

· Replication between Data Domain systems can affect filesys clean operations. If a source Data Domain system receives large amounts of new or changed data while disabled or disconnected, resuming replication may significantly slow down filesys clean operations.

· If directory replication is running behind, for example due to insufficient network bandwidth between the replication pairs (resulting in a replication lag), cleaning may not be able to run fully. This condition requires either breaking replication (and resyncing once cleaning has run) or letting the replication lag catch up (e.g., by increasing network bandwidth or writing less new data to the source directory).

A Data Domain system that is full may need multiple clean operations to clean 100% of the file system, especially when one or more external shelves are attached. Depending on the type of data stored, such as when using markers for specific backup software (filesys option set marker-type … ), the file system may never report 100% cleaned. The total space cleaned may always be a few percentage points less than 100.

With collection replication, the clean operation does not run on the destination. With directory replication, the clean operation needs to be run on both the source and destination Data Domain systems.

To display the current date and time for the clean operation, use the filesys clean show schedule operation.

# filesys clean show schedule

To display the throttle setting for cleaning operations, use the filesys clean show throttle operation. Changes to the throttle setting will take effect without restarting cleaning.

# filesys clean show throttle

From <https://emc–c.na5.visual.force.com/apex/KB_HowTo?id=kA0700000004Ru6>

NFS Best Practices for DataDomain

Purpose

This document provides a set of recommended best practices for deploying Data Domain storage systems for data protection and archive with the NFS (Network File System) protocol. It provides insights on how to best tune NFS network components in order to optimize the NFS services on a Data Domain system, and covers best practices for configuring the NFS server, application clients, network connectivity, and security.

Applies To

· All DDOS Models

· All DDOS Versions

· AIX

· Linux

· Solaris

Solution

Introduction

· The Network File System (NFS) protocol, originally developed by Sun Microsystems in 1983, is a client-server protocol that allows a user on a client computer to access files over a network from the server as easily as if they were stored on its local disks.

· NFS allows UNIX systems to share files, directories, and file systems. The protocol is designed to be stateless.

· Data Domain supports NFS version 3 (NFSv3), the most commonly used version. It uses UDP or TCP and is based on the stateless protocol design. It includes features such as 64-bit file sizes, asynchronous writes, and additional file attributes to reduce re-fetching.

Configuration:

DataDomain:

· Make sure the NFS service is running on the DD: #nfs status

· It is best practice to use the hostname of the client when creating the NFS export.

· Please make sure all the forward and reverse lookups of the hostnames in the NFS export list are correct in the DNS server: #net lookup <hostname>

· To add more than one client, separate the entries with commas, spaces, or both.

· A client can be a fully qualified domain hostname, a class-C IP address, an IP address with either a netmask or prefix length, or an asterisk (*) wildcard with a domain name, such as *.yourcompany.com. An asterisk (*) by itself means no restrictions.

· It is a best practice to keep the number of client entries in the NFS export list manageable. This can be achieved by using entries such as *.company.com or a network address.

· A client added to a sub-directory under /backup has access only to that sub-directory.

· The <nfs-options> are a list, separated by commas, spaces, or both, and bounded by parentheses. With no options specified, the default options are rw, root_squash, no_all_squash, and secure.

The allowed options include the standard export pairs (ro/rw, root_squash/no_root_squash, all_squash/no_all_squash, secure/insecure) as well as anonuid=<id> and anongid=<id>.

The following is an example of adding an NFS export client:

· #nfs add /backup 192.168.29.30/24 (rw,no_root_squash,no_all_squash,secure)

· DD doesn’t support NFS locking. Please look at the NFS locking KB for detailed information.

· It is always a best practice to use multiple NFS mounts on the client with multiple IPs on Data Domain for better performance.

· For example, TSM data is backed up to the /backup/TSM directory and SQL data to /backup/SQL.

· Create separate NFS exports for /backup/TSM and /backup/SQL:

#nfs add /backup/TSM hostname.company.com (rw,no_root_squash,no_all_squash,secure)

#nfs add /backup/SQL hostname.company.com (rw,no_root_squash,no_all_squash,secure)

· On the client, mount them as two different mount points using two different IP addresses of the DD:

#mount -F nfs -o hard,intr,llock,vers=3,proto=tcp,timeo=1200,rsize=1048600,wsize=1048600 <Ipaddress_1_of_DD>:/backup/TSM /ddr/TSM

#mount -F nfs -o hard,intr,llock,vers=3,proto=tcp,timeo=1200,rsize=1048600,wsize=1048600 <Ipaddress_2_of_DD>:/backup/SQL /ddr/SQL

Security

· Port 2049 (NFS) and port 2052 (mountd) must be open on the firewall, if one exists.

· For better security, it is a best practice to use the “root_squash” NFS export option when configuring the export.

· Along with the root_squash option, set the anongid and anonuid to a specific ID on the DD.

For example:

#nfs add /backup/TSM hostname.company.com (rw,root_squash,no_all_squash,secure,anongid=655,anonuid=655)

Configuration

· Please make sure nfsd daemon is running on the client.

· #/sbin/service nfs status

· It is a best practice to create a separate MTree for each media/database server. This helps reduce the number of metadata operations (see the sketch after this list).

· Check for stable writes. Run tcpdump while a backup is being performed and check for the stable flag in the WRITE packets. If clients are sending small stable writes of less than 256 KB, performance will degrade. This is due to the pipelined commit behavior in DDOS 5.1, where small stable writes incur long write latency. This issue is fixed in 5.1.1.3.
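
A minimal sketch of creating a dedicated MTree on the Data Domain system and exporting it to one media server is shown below; the MTree name and hostname are placeholders.

# mtree create /data/col1/tsm01
# nfs add /data/col1/tsm01 tsmserver.company.com (rw,no_root_squash,no_all_squash,secure)

The client then mounts /data/col1/tsm01 using the OS-specific options listed in the next section.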

Recommended mount options:

Linux OS

# mount -t nfs -o hard,intr,nolock,nfsvers=3,tcp,timeo=1200,rsize=1048600,wsize=1048600,bg HOSTNAME:/backup /ddr/backup

# mount -t nfs -o hard,intr,nolock,nfsvers=3,tcp,timeo=1200,rsize=1048600,wsize=1048600,bg HOSTNAME:/data/col1/<mtree> /ddr/<mount point>

Solaris OS

#mount -F nfs -o hard,intr,llock,vers=3,proto=tcp,timeo=1200,sec=sys,rsize=1048600,wsize=1048600 HOSTNAME:/backup /ddr/backup

# mount -F nfs -o hard,intr,llock,vers=3,proto=tcp,timeo=1200,sec=sys,rsize=1048600,wsize=1048600 HOSTNAME:/data/col1/<mtree> /ddr/<mountpoint>

AIX OS

# mount -V nfs -o intr,hard,llock,timeo=1200,rsize=65536,wsize=65536,vers=3,proto=tcp,combehind,retrans=2 -p HOSTNAME:/backup /ddr

# mount -V nfs -o intr,hard,llock,timeo=1200,rsize=65536,wsize=65536,vers=3,proto=tcp,combehind,retrans=2 -p HOSTNAME:/data/col1/<mtree> /ddr/<mountpoint>

Network

· If the client has multiple network interfaces, please make sure the routing is correct and that the NFS I/O operations are using the correct interface.

· Set the TCP buffers to the maximum value the client OS can support (see the sysctl sketch after this list). For more detailed information, please look at the tuning guide for each client OS below.

· Use ping to check the RTT between the client and the DD.

· If there is a firewall between the media server and the DD, make sure NFS port 2049 is open.

· Please make sure the same MTU size is used throughout the data path, and verify that the MTU size is consistent between the DDR and the client. An inconsistent MTU will cause fragmentation, which can lead to slow backups. The tracepath tool can be used to check the MTU size:

o #tracepath -n <destination IP address>
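
On Linux, the TCP buffer limits mentioned above are typically raised with sysctl. The following is a sketch with illustrative values only; consult the Linux tuning guide referenced below for the values recommended for your kernel and memory size, and add the settings to /etc/sysctl.conf to make them persistent.

# sysctl -w net.core.rmem_max=4194304
# sysctl -w net.core.wmem_max=4194304
# sysctl -w net.ipv4.tcp_rmem="4096 262144 4194304"
# sysctl -w net.ipv4.tcp_wmem="4096 262144 4194304"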

Client OS Tuning

Linux OS

· The Linux tuning guide describes how to optimize the backup and restore performance to DD. This tuning guide contains the tcp and memory tuning parameters and recommended mount options.

· Please use the following tuning guide for more details: Linux tuning guide

Solaris OS

· The Solaris tuning guide describes how to optimize the backup and restore performance to DD. This tuning guide contains the tcp and memory tuning parameters and recommended mount options.

· Please use the following tuning guide for more details: Solaris Tuning guide

AIX and TSM tuning Guide

· The AIX and TSM tuning guide describes how to improve backup performance within AIX/TSM environments through OS level tuning of the AIX server and TSM backup application.

· Please use the following tuning guide for more details: AIX and TSM tuning guide

Note: DD recommends that you properly test all planned configurations in a testing environment before applying them to a production environment. You should also back up all your pre-tuning configurations.

From <https://emc–c.na5.visual.force.com/apex/KB_HowTo?id=kA0700000004Rtm>
