About the speaker
Andrea Mauro has worked in IT since 1995 and holds several certifications. He works as a virtualization and storage architect, specializing in VMware (but also Microsoft, Citrix, and Linux) solutions.
Basic principles of backup policies
In this session you will learn:
- What is a backup policy
- How to design a backup plan
- What defines a backup policy
- Where do backups go
- How will backups be performed
Welcome to the Basic Principles of Backup Policies Lesson. My name is Andrea Mauro, and I will be your instructor for this part of Backup Academy.
Before we start this lesson, first let me tell you a little about myself. I have 17 years of experience in the IT world. I am a virtualization and storage architect specializing in VMware, but also in Microsoft, Citrix, and Linux solutions. And I work for Assyrus, an Italian IT company. I have several certifications, some from VMware, like the VCDX, and some from other vendors like Microsoft, as well as some accreditations, like vExpert. I'm the founder and a board member of the Italian VMUG. Also, I am a VMTN community moderator and blogger.
Now let's take a look at the agenda of this lesson. First of all, we will explain what a backup policy is. We will also give some information about the flow and the process to design a backup plan. Then we will detail some aspects of a backup policy, like the what, where, and how. The purpose of this lesson is just to give the basics of some concepts. For deep technical details on specific topics, there are other lessons in the Backup Academy.
So let's explain what a policy is. It's typically a principle, rule, procedure, or protocol to guide decisions and achieve results. The term is not normally used to denote what is actually done but how it can be done. Applied to backups, a policy becomes a set of rules to achieve the required backup goals. Usually, we can find several aspects covered by these rules. Some could be technical, but not only.
A backup policy usually does not specify the technical aspects that are related to specific backup solutions and products. A backup policy can work at a high level, matching the requirements and the constraints to some logical aspects of the backup task. Some common examples of requirements are the recovery point and recovery time objectives, which basically define how much history I have to save, for the RPO, and how fast I have to restore, for the RTO. For the constraints, clear examples are budget limits or backup windows. A backup window is the time slot in which a backup can be performed. In most cases, backup policies could be part of larger and more general plans, like the business continuity plan.
Basically, a backup policy defines some simple aspects, like what, where, and how, how much, how often, and so on. But there can be a lot of dependencies between these aspects, and also with the requirements and the constraints. In order to match all those dependencies, a deep analysis is usually necessary to define the possible solutions. For example, a small backup window can be a big constraint in a backup policy and could conflict with some requirements, like a small RPO. In this picture, you can see how RPO and RTO could define some choices in the backup policies and solutions. In some cases, backup-based solutions are not applicable, for example, with a really small RTO.
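To make this idea concrete, here is a minimal sketch of how RPO and RTO could drive the choice of a solution. The thresholds and category names are illustrative assumptions, not values from the slide.

```python
# Minimal sketch: how RPO/RTO requirements can narrow the solution space.
# All thresholds below are illustrative assumptions.

def suggest_solution(rpo_hours: float, rto_hours: float) -> str:
    """Return a candidate data protection approach for the given objectives."""
    if rto_hours < 0.25:
        # Near-zero RTO: a classic backup/restore cycle is usually too slow.
        return "high availability or replication with fast failover"
    if rpo_hours < 1:
        # Very small RPO: nightly jobs cannot provide enough restore points.
        return "continuous data protection or frequent replication"
    if rpo_hours < 24:
        return "intra-day backup jobs (e.g., incremental every few hours)"
    return "traditional daily backup jobs"

print(suggest_solution(rpo_hours=24, rto_hours=8))
print(suggest_solution(rpo_hours=0.5, rto_hours=4))
```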
As explained in the previous slides, a backup policy usually works at a high level, and it is the result of a typical IT flow that starts from an initial analysis and includes several iteration and tuning phases. Again, a backup policy usually does not specify the technical aspects that are related to specific backup solutions and products. But when the backup solution has been selected, then a backup policy can become more specific and represent a technical implementation in the particular product. The term policy could also be present in some backup products. In this case, it simply becomes a set of rules in the specific vendor solution that can be just an implementation of a general high-level backup policy.
The what part of a backup is not necessarily a property of a backup policy, because in some backup products the selection list is just used as an input to the policy in order to define the backup job. But when we define the source of a backup, we must consider different aspects that may limit our choices or may create a dependency with other backup policies. First, there is the required protection level, which is just a requirement but defines several dependencies. Also, it could be different for different sources and different types of sources. There is also the way to handle the backup tasks. This is usually a more technical aspect but must be considered in the overall backup architecture.
Then we have the type of sources. Backup and also restore procedures are different depending on the type of the sources, like files, application data, systems, and virtual machines. Then, there is also the type of transport at the source side. We can handle the backup stream in different ways, starting from a simple full copy or by using more complex models, like incremental transfers or deduplication at the source side. Of course, the source size could be critical, and we must define how much data there is and estimate how big the sources are. Now, let's give more details about the previous points.
The way to handle the backup task is just a technical aspect that, of course, depends on the backup products. But we can find some generic approaches and aspects common to most products. One is related to the agent or agentless design. The backup program may need a specific agent on the source side to handle the backup in the right way and with enough consistency. This is common in most of the application-level backups, where the agent is used to manage the right backup procedure inside the application. An agent could also be needed to manage the restore procedure.
Another aspect is about the way the backup is performed, hot or cold. In a hot backup, the services and the systems can run during the backup job. Note that this approach could have some impact on the environment. Cold backups are simpler to handle, and also the consistency is guaranteed in this kind of approach, but they may not meet the RPO requirements or the backup window constraints. Note that usually, to handle a hot backup in the right way, an agent on the source side could be needed, at least to give the right backup consistency.
Finally, we can talk about how the data flow is handled, using a push or a pull solution, starting from the source or from the backup machine. Usually, this technical aspect is not relevant except in some special cases, like geographic backup or backup across firewalls and the network.
The type of source is an important aspect, because the source side and the type of data on it can influence how the backup and the restore procedures can be defined. The consistency of the backup data depends also on the source type and must be guaranteed in the right or at least in a proper way. We also have to consider that a specific backup solution may not support all the possible source types or may require different and specific agents, options, or components. Finally, we can identify some types of source objects. The simple cases are files and folders. Although most backup solutions support files on Windows operating systems, not all can handle files on Linux or other Unix operating systems.
Then, we could have application data. In this case, we need deep knowledge of the type of application, how its data are stored, and how to guarantee the right consistency, both for the backup and also for the restore procedure. This usually requires backup agents for specific applications, and not all backup solutions can handle the same set of applications. More complex is handling the backup of entire systems.
In this case, we want to protect the system and the related data instead of only a subset of its data. Here we are talking about an image-level backup. By using specific features available in most of the backup products, from a backup at system level we can perform a restore of the entire system or a restore at file level and, in some cases, also a restore at the application level.
Finally, we can have virtual machine backup. In this case, it is really similar to the previous one, except that now the systems are just virtual machines. Depending on the type of hypervisor, we can handle backup more efficiently compared to physical systems, and for some hypervisors, also in an agentless way. The backup is usually handled at the VM image level, that is, the files that represent the VM, which are similar to a system image in an image-level backup. But depending on the backup solution, we can have different levels of restore, as described in the previous point.
Let's spend some words on data consistency and integrity, which are really important for a good backup and a good restore. We can have different levels, starting from a crash-consistent backup up to a fully consistent, recoverable set of data. Of course, it depends on the type of the source. At file level, we may need specific OS features or agent features to handle the consistency of open files. For example, on Windows, we can use VSS. At the application level, we can use an application-specific solution, like backup agents or a procedure to export the data, or, on Windows, use applications that are VSS-aware. With a system backup, we have to handle the entire system state in a proper way. Again, this usually requires an agent or a cold backup.
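As an illustration of how an agent typically reaches application consistency during a hot backup, here is a hedged sketch. The freeze and thaw commands are hypothetical placeholders for whatever quiesce mechanism the application offers (on Windows, a VSS-aware application would be used instead).

```python
# Sketch of a hot backup with application quiesce hooks. The "myapp-ctl"
# commands are hypothetical placeholders, not a real CLI.
import subprocess

def backup_with_quiesce(freeze_cmd, copy_cmd, thaw_cmd):
    subprocess.run(freeze_cmd, check=True)    # put the application in a consistent state
    try:
        subprocess.run(copy_cmd, check=True)  # copy the data while it is quiesced
    finally:
        subprocess.run(thaw_cmd, check=True)  # always resume normal operation

backup_with_quiesce(
    freeze_cmd=["myapp-ctl", "freeze"],       # hypothetical application command
    copy_cmd=["cp", "-a", "/var/lib/myapp", "/backup/myapp"],
    thaw_cmd=["myapp-ctl", "thaw"],           # hypothetical application command
)
```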
With virtual environments, we can use specific functions provided at the hypervisor level, like VM snapshots, to have at least a crash-consistent backup, or others like the VADP APIs on VMware vSphere. For more details about consistency, but also about the different types of sources, there are several lessons in the Backup Academy about the backup of physical and virtual systems and also about VSS.
When we've selected the source, then there are different ways to handle the flow from the source to the destination during a backup job. A simple solution is to use a full copy: on each backup, the entire backup set is copied. This usually means larger backup windows. Another solution could be using incremental transfers. Usually, it's possible to track on the source side which objects have changed since the previous backup. In this way, we can send only this difference. Note that this could be simple for files, because we can use the archive bit or the modified date to identify the new files, but it's more complicated for application and system level backups. Usually, an agent is needed to understand and handle the changes.
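For the file case, a minimal sketch of incremental selection based on the modified date (one of the two mechanisms just mentioned) could look like this:

```python
# Minimal sketch: select files changed since the previous backup run,
# using the modification time as the change indicator.
import os
import time

def files_changed_since(root, last_backup_time):
    """Yield paths of files modified after the previous backup run."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_time:
                yield path

last_run = time.time() - 24 * 3600  # assume the previous job ran 24 hours ago
for path in files_changed_since("/data", last_run):
    print("needs backup:", path)
```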
For virtual environments, tracking changes requires a specific function at the hypervisor level, for example, CBT (Changed Block Tracking), introduced in vSphere 4.0. The right transport can be used to make backups more efficient and reduce the backup window, especially when the backup size is huge, by reducing the amount of data that must be sent during the backup job. It can also be useful on the destination side to store this information in a similar way. Later, we will see the full, differential, and incremental models on the destination side. Of course, in both cases, compression and/or deduplication at file or block level at the source side may reduce the amount of data. In order to do this, an agent on the source and some resources to handle these activities could be needed.
In the case of a shared storage environment and/or a virtual environment, the transport type will also define how data are transferred. For example, using the shared network or a dedicated backup network is the solution adopted in most agent-based backup solutions. Or we can use the SAN, for example by reading the data from the storage itself instead of from the sources, like in this picture, or use other methods, like the other backup transport modes available in VMware virtual environments. By choosing a LAN-free solution, the backup can usually perform faster and/or with less impact on the production environment.
The size of the sources depends not only on the amount of data but also on the source and transport types. The type of the source can increase the amount of data. For example, an image-level backup may also include unused space on the full system or deleted space that has not yet been reclaimed. The right transport type can be used to reduce the size of the data that must be transferred from the source, for example, by using incremental transfers. But how data change over time may limit incremental transfers, for example, in a database. And also, keeping integrity and consistency may require more space. The amount of data and how they are stored on the destination will also define some constraints on the type of destination and its size. Size could also create some dependency on how to handle the consistency of the data. In most cases, some kind of snapshot technology is used. This means that additional space is needed to handle the snapshot delta data.
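A back-of-the-envelope sketch can show why the transport type matters for size. The 5% daily change rate here is an assumed input:

```python
# Rough estimate of the data each daily job must transfer,
# full copy versus incremental transfer. The change rate is an assumption.

def job_transfer_gb(source_gb, daily_change_rate, incremental):
    """Data to move for one daily job, in GB."""
    return source_gb * daily_change_rate if incremental else source_gb

source_gb = 500
print(job_transfer_gb(source_gb, 0.05, incremental=False))  # 500.0 GB per job
print(job_transfer_gb(source_gb, 0.05, incremental=True))   # 25.0 GB per job
```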
Now, let's talk about where the backup could be stored. Basically, we can have two main types of destinations. One is disk based, where backup data are stored on some kind of disks, logical disks, or network shares. The backup procedure is just a disk-to-disk flow. The second is tape based, where backup data are stored on tapes, or on logical or virtual tapes, and the backup procedure is just a disk-to-tape flow. Each solution has some advantages and also some disadvantages. A good policy must choose the one that best fits the requirements and constraints.
Also, we can have some hybrid approaches where both disks and tapes are used, for example, in a disk-to-disk-to-tape flow, where disks are used as a first level of backup, or just for staging in order to save data faster, and then the backup is moved to tape outside the backup window. This kind of approach can gain the best of both worlds. Finally, of course, there are also other classifications for the destination. For example, using a cloud model, we can have on-premises or off-premises destinations. Backup could just become a service in a backup-as-a-service model.
Backup to disk usually consists of some kind of structured files stored on proper storage. We can use a set of disks connected with direct attach or a storage area network, usually organized in one or more logical volumes using some kind of RAID configuration, in most cases RAID 5 or RAID 6, to maximize the available space and guarantee some kind of redundancy. Or we can use network shares. In this case, the destination is just a share using standard network sharing protocols, like NFS or CIFS/SMB. On the destination side, of course, there are some kinds of disks organized in a similar way as in the previous point. We can also use backup appliances, hardware or virtual appliances, that usually work as a NAS using network shares, but some may also work with iSCSI by exporting logical disks, or in other ways, for example as a VTL, as explained in a later slide. Note that backup appliances may not provide the backup software. In this case, you can use the one you prefer, as long as it is compatible with the protocols exported by the appliance.
A disk-based solution has several advantages over a tape-based solution. First, disks have greater capacity. Currently, tape capacity is limited to 1.5 TB on LTO-5, compared to disk capacity of 2 or 3 TB on common SATA disks. A tape drive can be fast, 140 MB per second with compression on LTO-5, compared to the average speed of a high-capacity disk, which is around 35 MB per second, but disks can be striped together to increase the speed of the resulting set. There is also the time needed to locate and load a tape, which can slow down the start of the backup, and the restore time and the RTO could become too big. A disk solution also has better scalability, especially in appliance-based solutions. In some cases, we can scale with more appliances from the same vendor and use them together as a single logical device.
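Using the throughput figures just quoted, a quick worked example shows how striping lets disks match or beat a tape drive on backup window length:

```python
# Worked example with the figures above: time to stream 2 TB of backup data.

def hours_to_copy(size_tb, throughput_mb_s):
    return size_tb * 1024 * 1024 / throughput_mb_s / 3600

print(f"{hours_to_copy(2, 140):.1f} h")     # one LTO-5 drive, compressed stream
print(f"{hours_to_copy(2, 35):.1f} h")      # one high-capacity SATA disk
print(f"{hours_to_copy(2, 4 * 35):.1f} h")  # four SATA disks striped together
```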
With disk-based solutions, replication and disaster recovery are simpler. We can use storage-based replication or also file-level replication to make remote copies of the backup data. Finally, a backup-to-disk solution is more flexible. For example, deduplication technology is possible on it, rather than the simple compression technology that is commonly supported on tapes. Also, most backup programs now support at least backup to disk. Not all also support backup to tape, especially backup programs for virtual environments.
In a backup-to-tape solution, the destination usually consists of some kind of tape device. We can start from a single tape unit that usually can only handle one tape cartridge at a time and could be directly attached to the server, with an internal or external connection using SAS or SCSI cables. Another choice could be an autoloader. This usually consists of an appliance with one or more tape units and several slots for tapes. Again, it could be connected using direct attach, like in the previous point, but in some cases, also with a storage area network. The next level is a tape library, similar to an autoloader but usually bigger, with more slots and features. The connection in most cases uses a storage area network. There is also a hybrid solution called a virtual tape library. It could be an appliance, hardware or software, that can be used as an autoloader or tape library, but on the backend, it usually works with disks instead of tapes. You will get more information in the next slide.
Currently, tape standards are defined by organizations like the Linear Tape-Open consortium, whose members include Hewlett-Packard, IBM, and Quantum, which has officially released the specifications for LTO generation 5, with around 1.5 TB of native capacity and a throughput of 140 MB per second with compression. It is working on future standards, so tapes are growing, but disks are growing too, and faster. Also, autoloaders and tape libraries are usually more expensive compared to similar disk-based solutions, though their cost is similar to some kinds of backup appliances. The cost per TB for tape compared to SATA disk is becoming not so different. But there is still one big advantage of tape-based solutions: the possibility to remove and archive a tape. This is not so simple with a disk, although there are some kinds of solutions using a removable cartridge that contains a disk.
As explained in the previous slide, the virtual tape library is a hybrid solution that can emulate an autoloader or tape library, but on the backend it usually works with disks instead of tapes. This kind of solution could be implemented as a gateway appliance to be used as a frontend for existing storage, but it could also be a feature of some disk-to-disk appliances in order to give more options in supported interfaces and protocols. Of course, it could also be implemented with software-based products. For example, there are some virtual appliances for this purpose.
The advantage of using a VTL is that it can be used to migrate from a backup-to-tape solution to backup-to-disk without changing anything in the backup programs, except reconfiguring them for the new tape library. It could simply be used to improve or scale a disk-to-tape solution. Note that a virtual tape library may be able to export its data to physical tapes, so it could also be used in a multitier scenario. Also, it can give an interesting advantage in backup architectures that must be LAN-free, because you can use the SAN protocols of the tape library and move the backup traffic to the SAN side.
Now, let's talk about the “how” part of a backup policy, which defines several aspects. One could be whether a multitier, multilevel, or hierarchical backup must be implemented. In this case, we have to define how data are distributed using different types of destinations, as described in the previous slides. One important aspect is the destination format: how data are saved and in which format, and with which kind of relation to previous data, full, incremental, differential, deduplicated, and so on. Then, we have to define the backup frequency, that is, how often the backup jobs are performed. Finally, there is the backup retention, which defines when old data are removed from the destination and how much data must be retained to be compliant.
In a multitier backup, we can combine different techniques, use multiple levels, or build a hierarchical organization. As previously described, we can combine disk-to-disk and disk-to-tape together to have two levels of backup in a disk-to-disk-to-tape model. But we can also have multiple levels of backup.
For example, the very first level could be the storage itself, using some storage features like snapshots or recovery or consistency points. Formally, this solution is not a real backup, but we can use it to provide a really fast recovery procedure or as a source for other backup levels. Offline media could be another level, usually used as the final level. Also, in an environment with replication, we can use the replicas as an additional backup level. Of course, limitations could exist, depending on the backup program, for a replica compared to a real backup.
The format of the backup data depends on the backup product, but also on whether we are using a disk or a tape approach. As previously described, by using a disk approach, we can have different options to store the backup data. We can use a set of files that are similar to archive files, but a complete deduplicated archive could also be implemented. This is common on most of the disk-to-disk appliances. On a tape destination, the choice of format is limited, usually consisting only of a sequence of compressed archives. Note that most tape units can handle the compression in hardware, so you can offload this process to the unit itself.
About how the data are related to previous data, we can have different formats. In the full mode, each backup is an entire full backup of the source data. This does not imply that data must always be transferred with a full backup from the source, except the first time, because the full set could also be rebuilt from the previous data and an incremental transfer. In the differential, or cumulative, mode, each differential backup includes all changes from the last full backup. Backup jobs are faster compared to the full backup, and to make a restore, you just need a full set and the last differential set.
In the incremental model, each incremental backup includes all changes from the last backup. For this reason, backup jobs are usually faster, but to perform a restore, you may need, in the worst case, the last full backup and all the incremental backups after it.
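To make the difference between the models concrete, here is a small sketch that computes which backup sets a restore of the most recent point needs. It assumes a single pure model (all differential or all incremental) after the last full backup:

```python
# Sketch: backup sets needed to restore the latest point, per format model.
# Assumes the jobs after the last full are all of one type.

def restore_chain(jobs):
    """jobs: list of 'full', 'differential', 'incremental', oldest first."""
    last_full = max(i for i, j in enumerate(jobs) if j == "full")
    if jobs[-1] == "differential":
        # Differential: the last full plus only the last differential.
        return [f"full@{last_full}", f"differential@{len(jobs) - 1}"]
    # Incremental: the last full plus every incremental after it.
    return [f"full@{last_full}"] + [
        f"incremental@{i}" for i in range(last_full + 1, len(jobs))
        if jobs[i] == "incremental"
    ]

print(restore_chain(["full", "incremental", "incremental", "incremental"]))
# ['full@0', 'incremental@1', 'incremental@2', 'incremental@3']
print(restore_chain(["full", "differential", "differential"]))
# ['full@0', 'differential@2']
```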
Differential or incremental backups could be obtained using a full or an incremental transfer. It is common to use the second way when it is available. Those types of formats are normally used in disk-to-tape backup, depending also on the retention of the data. Of course, there is some impact on the duration of the backup and also of the restore jobs. In a disk-to-disk solution, other formats could be implemented, like deduplication, as described in the previous slide, but also other variations of the previous formats, like reversed incremental, where the last backup is always rebuilt as a full backup and the previous points are kept as reverse increments, or synthetic backup, which is an incremental model starting from a specific date with reverse increments before this date. Note that the security of the backup data could be handled by some backup programs by using encryption. With disk-to-disk appliances, this may be performed by the appliance itself. Also, some tape units are able to offload this process.
The backup frequency defines how often the different types of backup are performed in the different job schedules. The RPO and other business requirements usually define those schedules. We can use and combine the different destination format types in a more complex schedule, for example, a full backup on the first Sunday of each month, a differential on the other Sundays, and then an incremental on all the other days.
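The example schedule just described could be expressed as a small sketch like this:

```python
# Sketch of the schedule above: full on the first Sunday of the month,
# differential on the other Sundays, incremental on all the other days.
import datetime

def backup_type(day):
    if day.weekday() == 6:  # Sunday
        return "full" if day.day <= 7 else "differential"
    return "incremental"

start = datetime.date(2012, 1, 1)  # a Sunday, and the first of the month
for offset in range(9):
    d = start + datetime.timedelta(days=offset)
    print(d, backup_type(d))
```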
Backup retention defines the minimum time for which data must be maintained and when old data can be deleted and/or purged. Of course, it depends on the previous properties. The source type and its size could be a requirement that defines a minimum destination size in order to guarantee the right retention. The frequency of the backup will increase the size of the backup data. More backups mean more data, and this will limit the retention. The format type can reduce the required space, for example, by using deduplication or incremental backups and/or compression. In this way, we can increase the retention. The destination type also defines the amount of data that can be stored. We will get more information in the next slide. And of course, the required recovery point objective defines the retention needs. In some cases, it is possible to derive the choice from the available destination space, which can only be estimated if we are using compression or deduplication. Then, we can define what could be a reasonable retention. But if the choice must be driven by the RPO, then the destination space must be designed to achieve the requirements.
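As a rough illustration of how these properties interact, the sketch below estimates the destination space needed for a given retention with a weekly full plus daily incrementals. The change rate and deduplication ratio are assumed inputs:

```python
# Rough estimate of destination space for N weeks of retention with a
# weekly full and six daily incrementals. All ratios are assumptions.

def destination_space_gb(full_gb, daily_change_rate, retention_weeks, dedup_ratio=1.0):
    weekly_gb = full_gb + 6 * full_gb * daily_change_rate  # one full + six incrementals
    return weekly_gb * retention_weeks / dedup_ratio

print(destination_space_gb(500, 0.05, retention_weeks=4))                 # 2600.0 GB raw
print(destination_space_gb(500, 0.05, retention_weeks=4, dedup_ratio=5))  # 520.0 GB with 5:1 dedup
```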
Now, let's explain how retention could be handled in different ways depending on whether we are performing backup to disk or backup to tape. In a backup-to-disk solution, we can usually use incremental backups, because they can be stored efficiently without big disadvantages in the restore procedure. When more space is needed, we can scale with more disks to reach the required space, or use, if possible, compression and/or deduplication to improve the space usage. Reclaiming the space of expired data is usually simple and fast, because backup data are stored in some kind of files.
In a backup-to-tape solution, we usually need a combination of the different full, differential, and incremental models to improve the speed of the restore and try to store more information on the available media. But to scale with more space, other policies are needed to change and manage the tape media and to work also with offline media. Also, reclaiming the backup space could be more complicated, and there are specific policies to handle the tape media rotation in order to recycle the old expired tapes and also try to maximize the tape lifetime.
Let's spend a couple of words on the media rotation policy, also called a backup rotation scheme, which defines how and when each medium is used for a backup job and how long it is retained. Different techniques have evolved over time to balance data retention and restoration needs with the cost of extra data storage media. In this slide, you can see the main types of rotation policies. As explained in the previous slide, this kind of policy is common with tapes, but it could also be used for other removable media.
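The slide itself is not reproduced here, but one classic rotation scheme is Grandfather-Father-Son, sketched below purely as an illustration of how a scheme maps each job to a media pool and a retention. The pool names and retention periods are assumptions:

```python
# Illustration only: a Grandfather-Father-Son rotation, mapping each job
# to a media pool and its retention.
import datetime

def gfs_pool(day):
    """Return (media pool, retention) for the job running on `day`."""
    if day.weekday() == 4:  # Friday: weekly or monthly tape
        if (day + datetime.timedelta(days=7)).month != day.month:
            return ("grandfather", "keep 12 months")  # last Friday of the month
        return ("father", "keep about a month")
    return ("son", "reuse next week")  # daily tapes, recycled weekly

print(gfs_pool(datetime.date(2012, 1, 27)))  # ('grandfather', 'keep 12 months')
print(gfs_pool(datetime.date(2012, 1, 24)))  # ('son', 'reuse next week')
```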
To conclude this lesson, let's make a short overview. First, we have explained what a backup policy is. Then, we have detailed what makes and defines a backup policy. But the most important message is why a backup policy is important. As you can understand, a backup policy can be used to design, but also to document, your backup.
It could be integrated into other plans, like a business continuity plan. For this reason, I recommend keeping the set of backup policies at a high level and as vendor-independent as possible, in order to improve and validate them. Only in the final phase should you adapt and implement them on your specific solution. This approach is common in several other fields, like, for example, security.
Finally, although backup policies are usually applied to backup tasks, similar concepts can also be important for other kinds of data protection solutions, like continuous data protection, replication, or application-level protection.
This concludes the lesson. Thanks for your attention.