Backup strategies for Hyper-V
In this session you will learn:
- Hyper-V VM architecture
- VSS process
- Changed Block Tracking
- Forward incremental backup
- Increment injection
- Synthetic backup
- Reversed incremental backup
- How replication works
Good morning, good afternoon or good evening, wherever you're joining us from in the world today. My name is Chris Henley, and I am a product strategy specialist with Veeam Software. Today I'd like to welcome you to this Backup Academy session, where we're going to talk about backup strategies for Hyper-V. For those of you who don't know me, I've spent the better part of the past 15 years working with Windows Server and the technologies that support and sustain it. Over the past couple of years, the technologies at the forefront of our minds have been those associated with virtualization, and in the Microsoft world that means Hyper-V.
Today we're going to focus on one specific area of Hyper-V: backup strategies. Before we talk about backup strategies, it's important to introduce the architecture of Hyper-V, because once you understand the architecture, it becomes much clearer how backup and recovery operations take place within it. So let's begin. Hardware is the base layer of the architectural diagram we'll work with. On top of the hardware sits the hypervisor itself; in the Microsoft world, the hypervisor is called Hyper-V. The hypervisor manages all of the interactions between the virtual machines and the hardware, and on top of it we find the virtual machines.
In the Hyper-V world, there are two different types of virtual machines. The virtual machine that houses the virtualization stack is called the parent VM, or parent virtual machine (Microsoft's documentation calls it the parent partition). The parent VM has the virtualization stack as well as its own set of proprietary drivers that allow it to interact with the hypervisor, which in turn interacts with the hardware on the virtual machine's behalf. There is only one parent virtual machine on any physical Hyper-V host.
As we create additional virtual machines, each is given the title of child virtual machine. Notice that each child virtual machine has its own set of proprietary drivers to interact with the hypervisor but does not have a virtualization stack. The child VMs rely on the parent virtual machine for the virtualization stack.
This architecture has a name: microkernelized. In a microkernelized architecture, each child VM can be effectively isolated from the parent, because it has its own set of proprietary drivers. We like this because it adds a degree of flexibility and additional security that I think will serve Microsoft and Hyper-V very well in the future.
We've talked about the architecture, but we haven't talked about the virtual machines themselves, so let's stop and discuss what's in a Hyper-V virtual machine. You're all familiar with the concept of a virtual hard disk, or VHD file. Microsoft is changing the game yet again with Hyper-V version 3.0 by moving to the VHDX file format, which extends the functionality of the format and raises the maximum disk size from 2 TB for VHD to 64 TB. But VHD or VHDX does not a virtual machine make. In fact, a virtual machine has several additional components.
When we look at a VM, there's also a settings file. We're going to call it config.xml; in the file system it's an XML file, typically named after the virtual machine's GUID. That file contains all of the settings associated with the virtual machine. We also have snapshots, stored as AVHD files. These are point-in-time snapshots that let us move the virtual machine forward or back in time: we apply a snapshot and get an operational virtual machine as it was at that point. And, last but not least, each virtual machine has a set of binary files associated with it, such as the .bin and .vsv files used for saved state.
As you can see, a virtual machine is much more than just the VHD file. I know you're probably thinking that the VHD file is the one that contains the operating system, the applications, the files, and the settings. That's absolutely true, but without the configuration file, any associated snapshots, and the binary files, the VHD file on its own is effectively of no use to us.
So when we talk about Hyper-V virtual machines, and especially about backing them up, you can see from both the architecture and the contents of a Hyper-V virtual machine that backing up a VM is very different from what we would do in a physical environment. The extra hypervisor layer, the distinction between parent and child virtual machines, and the additional components of each virtual machine mean that backing up virtual machines is a very different process from backing up physical ones.
Microsoft clearly understood that backing up virtual machines would be significantly different from backing up physical ones, and they looked for ways to standardize and simplify the process. What they came up with was a nice little stroke of genius. You may remember that way back in 2003, the better part of nine years ago, a neat little service came out with Windows Server 2003 called the Volume Shadow Copy Service, or VSS. VSS was initially designed to give you access to previous versions of files, making recovery accessible to the users in the environment. The VSS process has since evolved, and we now use that same process to manage backups of virtual machines.
The way the process works is quite simple. It involves the Volume Shadow Copy Service, something called a VSS writer, and the virtual machines themselves. The Volume Shadow Copy Service takes a request, and that request is sent from something called a VSS requester. Microsoft doesn't write the requester. Instead, they provide an SDK with some sample code and say, you write the requester. The VSS requester asks the Volume Shadow Copy Service, "I would like to take an image of a running virtual machine." The key is that it's a running virtual machine: we don't have to stop it to make this image.
Hyper-V includes a Hyper-V VSS writer, designed specifically for virtual machines running in Hyper-V. The writer prepares the virtual machine so that a consistent image can be taken, and that image is then made available to the VSS requester. Making an image of a running virtual machine through the VSS process dramatically improves our ability to back up and restore in Hyper-V. Now, before you think, "So all I have to do is write my own VSS requester," wait: that's not the whole story. There's much, much more.
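To make the division of labor concrete, here is a minimal Python sketch of the flow, assuming a toy model in which a disk is a list of blocks. The names `HyperVWriterModel` and `request_shadow_copy` are hypothetical. The real VSS interface is a COM API (for example `IVssBackupComponents`) used from C++, and the shadow copy itself is created by a VSS provider, which is collapsed into a plain list copy here.

```python
from dataclasses import dataclass, field


@dataclass
class HyperVWriterModel:
    """Hypothetical stand-in for the Hyper-V VSS writer (not the real API)."""

    events: list = field(default_factory=list)

    def freeze(self, vm_name: str) -> None:
        # The writer briefly holds writes so the image is consistent.
        self.events.append(f"freeze {vm_name}")

    def thaw(self, vm_name: str) -> None:
        # Writes resume; the VM keeps running throughout.
        self.events.append(f"thaw {vm_name}")


def request_shadow_copy(writer: HyperVWriterModel, vm_name: str, disk: list) -> list:
    """Requester-side view: ask for a point-in-time image of a running VM.

    The VSS service's coordination and the provider's snapshot are
    collapsed into this one function for brevity.
    """
    writer.freeze(vm_name)
    try:
        image = list(disk)  # point-in-time copy in this toy model
    finally:
        writer.thaw(vm_name)
    return image


writer = HyperVWriterModel()
disk = ["boot", "data"]
image = request_shadow_copy(writer, "vm01", disk)
disk[1] = "new data"  # the VM keeps writing after the image is taken
```

The image keeps the blocks as they were at request time, while the VM's later writes land only on its live disk.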
So I've got my image, made using the VSS process. Now what? Now we're going to introduce some software-based technologies that allow you to effectively back up and recover your virtual machine environments. I'm also going to add some topics that I think you'll want to know about, because they are innovative changes in the industry. We'll start with changed block tracking. Changed block tracking is incredibly important when it comes to reducing the time needed to make backups: by using it, I can fit backups well within the backup windows I have available.
We're going to talk about the most popular backup strategy on the planet, forward incremental. Then I'll introduce what I think is one of the coolest innovations in software-based backup, increment injection. We'll talk about synthetic backups, and finally we'll end this segment with a discussion of reversed incremental backup. Then we'll roll all of those pieces together and see what kind of amazing things we can do.
First, let's talk about changed block tracking. Do you recognize this slide? This is our architectural diagram of Hyper-V, with its microkernelized design. Earlier I told you that there is one parent virtual machine per physical Hyper-V host. Because of that, and because each child virtual machine shares the virtualization stack with the parent VM, changed block tracking can be implemented as a driver installed on the Hyper-V host that monitors the parent and all of the child virtual machines running on that host.
What does that mean? It means I now have a mechanism to monitor the virtual machines I have backed up, or am going to back up, and I can dramatically reduce the time it takes to make those backups, because I already know what has changed. So changed block tracking is a neat feature. Let's go into more detail, and I'll show you how it works.
Here we have a virtual machine; you can see its data represented by ones and zeroes. The changed block tracking mechanism simply monitors that virtual machine for changes. When a change comes along, here's some new data, the changed block tracking driver notes which blocks changed and saves that information to a CTP file. The CTP file holds that information until we make our next backup.
The cool thing is that without changed block tracking, I would have to parse the whole virtual machine from start to finish, find the blocks that have changed, and then write those changed blocks out to my incremental backup file. With changed block tracking, I already know which blocks have changed, because I've been monitoring them all along: they're listed in the CTP file. All I need to do is read those changed blocks and write them to my incremental backup, in this case a .vib file. No parsing all of the data and no comparisons; the tracking driver and the CTP file already took care of that. This dramatically reduces the window needed to complete an incremental backup.
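Here is a minimal Python sketch of the idea, assuming a virtual disk modeled as a list of blocks. The class name `TrackedDisk` is hypothetical, and its `changed` set stands in for the CTP file's list of changed blocks; this is not Veeam's actual driver or file format.

```python
class TrackedDisk:
    """A virtual disk as a list of blocks, plus a changed-block tracker."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.changed = set()  # stand-in for the CTP file: indexes written since last backup

    def write(self, index, data):
        # The tracking driver sees every write and records the block index.
        self.blocks[index] = data
        self.changed.add(index)

    def take_increment(self):
        # Read only the tracked blocks: no full scan, no comparison with the last backup.
        increment = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()  # start tracking afresh for the next backup
        return increment


disk = TrackedDisk(["a", "b", "c", "d"])
disk.write(1, "B")
disk.write(3, "D")
disk.write(1, "BB")  # a block written twice is copied only once
increment = disk.take_increment()  # the contents of the next .vib
```

The cost of the incremental backup now depends on how many blocks changed, not on the size of the disk.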
Since we're on the topic of incremental backup, let's go there. I've been doing backups for almost two decades. I've seen backups run in all kinds of environments with all kinds of methods, but one thing has not changed: the strategies we use for backups, specifically incremental backup, have been around for a long, long time. I think that's because they made good sense, and I still think they often make good sense today. Let's review. If we were going to back up a Hyper-V virtual machine with backup tools, we would start with an incremental strategy.
The first backup we make is a complete full backup, which comes out as full.vbk. That full backup contains all of the information not only from the VHD file but also from the snapshots, the config.xml, and the binary files. Remember, when we back up virtual machines, I need to be able to restore the virtual machine in its entirety, not just the contents of the VHD. Once the virtual machine has been backed up into full.vbk, I use the changed block tracking driver to monitor the changes that are made, and from this point on I start making incremental backups.
We make incremental backups on a schedule, and here I don't have a specific recommendation for you: use a schedule that works for you. Some people make incremental backups daily, some every other day, and some several times a day. Choose a schedule that fits your environment. We'll keep making increments; in this case we'll make four of them, and then we'll stop and talk about how easy an incremental backup strategy really is.
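A forward incremental chain can be sketched in Python as one full backup followed by one increment per scheduled run. This assumes the same toy block model: each run's changes arrive as a dict of block index to new data, standing in for what changed block tracking recorded. The function names and the increment file names are hypothetical.

```python
def run_forward_incremental(disk, daily_changes):
    """Return a backup chain: full.vbk followed by one .vib per scheduled run."""
    chain = [("full.vbk", dict(enumerate(disk)))]  # every block, once
    for day, changes in enumerate(daily_changes, start=1):
        # Each increment stores only the blocks that changed since the last run.
        chain.append((f"increment{day}.vib", dict(changes)))
    return chain


def blocks_stored(chain):
    """How many blocks each backup file holds: a rough proxy for its size."""
    return {name: len(blocks) for name, blocks in chain}


disk = ["a", "b", "c", "d", "e", "f", "g", "h"]
changes = [{0: "A"}, {2: "C", 3: "D"}, {0: "AA"}, {7: "H"}]  # four increments
chain = run_forward_incremental(disk, changes)
sizes = blocks_stored(chain)
```

Here the full backup stores all eight blocks while each increment stores one or two, which is the disk-space advantage of incremental backup.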
The beauty of an incremental backup strategy is that increments generally don't take long to make and don't take much disk space. If we threw out all your previous knowledge about backups and backup strategies and asked what you'd ideally like, you would probably say, "I'd like a complete full backup of my virtual machine at regular intervals, on any given day or set of days." But that takes a lot of disk space and a lot of time, so we settle for incremental backup, because it's the next best thing.
When I'm using incremental backup and it's time to restore, you're very familiar with the process, but let's review it. I start with my full.vbk and restore it. Then I have to restore each of the individual increments, in order, and apply them on top of the full backup. That process is often painful and lengthy. There really needs to be a better way, and the better way is fairly intuitive; most of you will probably scratch your head and say, "Why didn't we just do this before?" Instead of making traditional incremental backups, where we take full.vbk and then just add increment after increment, we can use the changed block tracking mechanism we've already introduced to make an increment, and then, instead of saving the increment as a separate component to be rolled in later, inject that increment into full.vbk.
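The traditional restore described above, full backup first and then every increment in order, looks like this in the same toy block model. This is a minimal sketch and `restore_forward_chain` is a hypothetical name. The order matters: a block changed on two days must end up with its later value.

```python
def restore_forward_chain(full, increments):
    """Rebuild the latest image from full.vbk plus every .vib, oldest first."""
    image = dict(full)  # restore the full backup
    for increment in increments:
        image.update(increment)  # roll in each increment on top of it
    return [image[i] for i in sorted(image)]


full = {0: "a", 1: "b", 2: "c"}
increments = [{1: "B"}, {2: "C"}, {1: "BB"}]  # Monday, Tuesday, Wednesday
latest = restore_forward_chain(full, increments)
```

Every increment in the chain has to be read at restore time, which is why long chains make restores slow.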
So here are my changes. I make my CTP file just like before. But this time, when I make my increment, I put it into the full.vbk file. I know which blocks changed, because I saved them to the CTP file, and I know where they came from, so why not put them straight into full.vbk? Here's where life gets really interesting. Because I've injected my increments into full.vbk, when I'm ready to recover I just recover full.vbk. I don't have to roll in additional increments, because the rolling in is already done. What an innovative idea. It's called injecting increments into a full backup.
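Increment injection can be sketched in the same toy block model: the changed blocks are written straight into the full backup instead of being kept as separate files. This is a simplification under the stated assumptions, since a real backup file format involves much more than a dict of blocks, and `inject_increment` is a hypothetical name.

```python
def inject_increment(full, increment):
    """Overwrite the full backup's copy of each changed block, in place."""
    for index, data in increment.items():
        full[index] = data


full = {0: "a", 1: "b", 2: "c"}  # full.vbk after Sunday's backup
for increment in [{1: "B"}, {2: "C"}, {1: "BB"}]:  # Monday to Wednesday
    inject_increment(full, increment)
# Recovery is now just "restore full.vbk": there are no increments left to roll in.
```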
With that said, I have just opened the equivalent of Pandora's box. If I can inject increments into full.vbk, then I can do what we said earlier we'd ideally want: have a full backup that is consistently updated on whatever regular schedule I choose, so I always have a full backup at the ready. This concept is called synthetic backup, or synthetic full backup, and it gives me options as I build my increments. I could inject each increment into my full backup. I could also store the increments outside the full backup, so I could still run a traditional forward incremental backup.
Let's look at a week. I've got a full backup on Sunday. Let's say I decide to use my regular incremental backup strategy, with a full on Sunday and increments on Monday and Tuesday, but I would like to have a full on Wednesday as well. Traditionally, if we wanted a full backup, we'd go back to the original virtual machine, parse it from start to finish, and make a new full.vbk. I don't need to do that anymore, because full.vbk already exists and I know all of the changes that have occurred since Sunday, in the form of the Monday and Tuesday increments. So why not take a copy of full.vbk and apply the increments from Monday and Tuesday to it? Wednesday's full backup now matches the virtual machine as of its most recent increment.
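A synthetic full can be sketched the same way, assuming the toy block model: copy the existing full backup and apply the Monday and Tuesday increments to the copy, without reading the source virtual machine at all. `build_synthetic_full` is a hypothetical name.

```python
def build_synthetic_full(full, increments):
    """Make a new full backup from existing backup files only."""
    synthetic = dict(full)  # a copy: Sunday's full.vbk stays untouched
    for increment in increments:  # oldest first
        synthetic.update(increment)
    return synthetic


sunday_full = {0: "a", 1: "b", 2: "c"}
monday, tuesday = {0: "A"}, {2: "C"}
wednesday_full = build_synthetic_full(sunday_full, [monday, tuesday])
```

All of the work happens on the backup storage, so the virtual machine carries none of the overhead of a new full backup.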
That concept is called a synthetic full backup. The beauty of it is that although I have a synthetic full backup, a complete copy of the virtual machine, I didn't have to burden the virtual machine with the overhead of making a full backup. Synthetic full backups can be used to build a backup strategy we simply haven't used before. When I think about full.vbk and the ability to inject increments, logic tells me I ought to approach backup with a new strategy: one that gives me fast, simple recovery of the most current data, but also the ability to roll back in time.
That concept is called reversed incremental backup. We call it reversed incremental because we combine the concepts behind synthetics with a concept you're already familiar with: restore points. Let's watch what happens. I've got my original virtual machine and my changed block tracking driver. Changes occur. On Monday, I take a .vib increment and inject it into the .vbk file, so full.vbk now represents Monday. The blocks that were overwritten in full.vbk are saved into a .vrb file, a restore point, and that restore point represents Sunday. Tuesday comes along: I create an increment, inject it into full.vbk, and save the blocks it replaced as a restore point representing Monday. Wednesday comes along, and it's the same again: an increment injected and a restore point saved.
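Here is a minimal sketch of one reversed incremental run in the toy block model: before each changed block is injected into full.vbk, the old block is saved, and those saved blocks form the .vrb restore point. It assumes the disk size never changes, so every changed block already exists in the full backup; `reversed_incremental_run` is a hypothetical name.

```python
def reversed_incremental_run(full, increment):
    """Inject an increment into full.vbk and return the rollback (.vrb) data."""
    # Save the blocks that are about to be overwritten: the previous restore point.
    rollback = {index: full[index] for index in increment}
    full.update(increment)  # full.vbk now reflects the newest state
    return rollback


full = {0: "a", 1: "b"}  # Sunday's full backup
restore_points = []  # oldest first
for increment in [{0: "A"}, {1: "B"}, {0: "AA"}]:  # Monday, Tuesday, Wednesday
    restore_points.append(reversed_incremental_run(full, increment))
```

After Wednesday's run, full.vbk holds Wednesday's data, and the three restore points hold just enough to step back to Tuesday, Monday, and Sunday.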
Now think about this in terms of a recovery strategy. The vast majority of the time, when we recover, we want the most current data, and there it is: full.vbk is also our most current file. If that's what we need, it is the easiest possible recovery. Now sometimes, and I put Wednesday here on purpose, things go wrong. You know that Microsoft releases updates on Tuesdays. Let's say we apply the new updates on Tuesday night, and on Wednesday we come in to find a problem. We need to be able to roll back.
With reversed incremental, we can pick the point in time we want to roll back to. We can say, "I'd like to roll back to Monday, the day before Tuesday's updates were applied." To do that, I take my full.vbk and apply the restore points, newest first, back to Monday. So I get great flexibility, and full.vbk always holds the most current data. Reversed incremental has tremendous potential when it comes to backing up virtual machines.
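Rolling back can be sketched by applying the rollback data to a copy of full.vbk, newest first. This assumes the toy model from the reversed incremental sketch, with restore points kept oldest first and each one holding the blocks that the following day's increment overwrote; `roll_back` is a hypothetical name.

```python
def roll_back(full, restore_points, steps):
    """Return the image as it was `steps` backups before the current full.vbk."""
    image = dict(full)  # never modify the current full.vbk
    if steps <= 0:
        return image
    # Walk backward in time: apply the newest restore points first.
    for rollback in reversed(restore_points[-steps:]):
        image.update(rollback)
    return image


# full.vbk holds Wednesday; each restore point undoes one day's increment.
full = {0: "AA", 1: "B"}
restore_points = [{0: "a"}, {1: "b"}, {0: "A"}]  # back to Sunday, Monday, Tuesday
monday = roll_back(full, restore_points, 2)
```

Recovering the most recent data needs no restore points at all; only older points in time pay the cost of reading rollback files.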
While we're here, many of you are probably thinking: if I've got this straight, you're making an increment (that's absolutely true), injecting the increment into full.vbk (also true), and creating a .vrb file. That sounds like it would take longer than a traditional forward incremental, and that is true. However, when we consider the time savings we get from changed block tracking, and the flexibility we get at recovery time, I'll take this over a forward incremental strategy every time. Am I using a little more disk space? Probably. And realistically, I'm probably taking a little more time than a forward incremental. However, what I get in return is absolutely amazing.
When we think about injecting increments into a full backup, a whole world of possibilities opens up. Let's talk about another one. When we talk about backing up virtual machines, at some point we start talking about RPO and RTO. The idea is simple: we don't want to lose data, and we don't want to lose operating time. RPO, the recovery point objective, describes how much data, measured in time, we can afford to lose; RTO, the recovery time objective, describes how long it can take to get a workload running again.
There is a school of thought in backup and recovery that says we should be able to back up continuously, so that a fully functional operating system or virtual machine is ready to go at a moment's notice. This concept is called continuous data protection. Unfortunately, continuous data protection systems cost a significant amount of money. Generally, we're talking about side-by-side hardware and operational capacity ready to go, side-by-side virtual machines, side-by-side disk space, as well as fairly complicated software to install, configure, manage, and operate.
However, it's possible to get what we'll call near-continuous data protection, where "near" means within about five minutes. Depending on your workload, five minutes is very, very good; RPOs and RTOs of five minutes are a challenge to achieve, to say the least. But with software built on the innovations we've discussed, you can do just that. Generally, near-continuous data protection is delivered through a process called replication, which produces replicas.
Please resist the temptation to think of replication as database replication. That's not what we're discussing here. In this case, replication means building an exact duplicate of a virtual machine, a replica, and then keeping the original and the replica synchronized. Now that you know the basic operations of increments and increment injection, it will be easy to understand how replication works and how it can be implemented.
Here are my two physical hosts; replication needs two physical servers. One of them is my source host, the one with the changed block tracking driver, and you can see it's hosting our virtual machine. Now we decide we want a replica of that VM. A replica, in this case, is an exact duplicate of the virtual machine and all its associated files. We copy that virtual machine across a LAN or WAN link and place it on its target hardware. At this point, all I've really done is copy the virtual machine to another physical host.
Some of you are saying, "That looks a lot like an export and then an import." Sort of, but we're going to take it further. We want more than a static copy of a virtual machine; we want it kept up to date. So we use the changed block tracking driver to create increments of the virtual machine on the source host, send those increments across the LAN or WAN, inject them into the target virtual machine, and create a restore point. By doing this, I have two virtual machines that are replicas of one another and are updated on a regular schedule. You tell the software how often that schedule runs, down to as often as every five minutes. With updates every five minutes, we can get very good recovery time and data loss values, that is, RTO and RPO.
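The replication cycle can be sketched in the same toy block model: an initial copy, then increments injected into the replica, with the overwritten blocks kept as restore points for failover. The `Replica` class and the `worst_case_rpo_minutes` helper are hypothetical. The RPO estimate is a rough rule of thumb that assumes a change made just after one sync is only protected once the next sync finishes transferring.

```python
class Replica:
    """Target-side copy of a VM that keeps restore points (toy model)."""

    def __init__(self, initial_copy):
        self.blocks = dict(initial_copy)  # first full copy sent over the LAN/WAN
        self.restore_points = []  # rollback data, oldest first

    def apply_increment(self, increment):
        # Keep what is about to be overwritten, then inject the increment.
        self.restore_points.append({i: self.blocks[i] for i in increment})
        self.blocks.update(increment)

    def fail_over(self, points_back=0):
        """Image to start on the target host, optionally from an older restore point."""
        image = dict(self.blocks)
        if points_back > 0:
            for rollback in reversed(self.restore_points[-points_back:]):
                image.update(rollback)
        return image


def worst_case_rpo_minutes(sync_interval, transfer_time):
    # A write just after one sync waits a full interval, then the transfer.
    return sync_interval + transfer_time


replica = Replica({0: "a", 1: "b"})
replica.apply_increment({1: "B"})  # first scheduled sync
replica.apply_increment({0: "A"})  # next scheduled sync
```

Under this rule of thumb, a five-minute schedule with a one-minute transfer gives a worst-case data loss of about six minutes.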
Here comes another increment. We move it over, inject it into the VM, and create a restore point. The neat thing about replication is this: if the source virtual machine fails for whatever reason, whether a hardware failure, a software failure, or an intentional outage such as shutting it down to apply an update, I have an up-to-date replica on the target host, ready to take over.
And if things go catastrophically wrong and we lose the source host entirely, we can go to the target virtual machine, where restore points are available. We can say, "Take me back to the last restore point where I know things were good." That process is called failover. Once we've failed over, we can also fail back to the source host, carrying the changes made on the replica back with us. Replication gives me near-continuous data protection without adding significant cost to my infrastructure.
Now, why do we do backups at all? We do backups because we need to be able to recover. Even in a virtual environment things can go wrong, and when they do, we need a plan in place so that we can restore.
I'm going to talk to you about two concepts in the world of backup with Hyper-V that are incredibly important. The first is standard restore operations. We've talked about full backups, incremental backups, and forward and reversed incremental backups. When it comes to restoring, there's a process that I'll simplify here on the screen. When we're ready to restore, we take our full.vbk, run a restore wizard, and restore the VBK to the traditional VHD and associated file formats: the config.xml, the snapshots, the binaries, and the VHD file. Once those are back in place, I make the VM available on my network to shoulder its workload.
Unfortunately, there's a time and data cost: we lose both time and data while the restore runs. Based on what I've already talked about today, many of you are probably thinking, "I can inject increments into a VBK file. I've got restore points. You're telling me we can't figure out a way to run the operating system from the VBK file?" Actually, we can. That's the second concept, instant VM recovery, and the name says it all. Instead of running a traditional restore, converting files from the VBK back to their original VHD, XML, binary, AVHD, and other formats before making the VM available, we mount the VBK, start the operating system, and make the VM available.
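The core idea of running a VM straight from a backup can be sketched as a copy-on-write overlay, assuming the mounted backup is treated as read-only and the running VM's writes are kept separately. `InstantRecoveryDisk` is a hypothetical name, and this is an illustration of the idea, not how any particular product implements instant VM recovery.

```python
class InstantRecoveryDisk:
    """Serve a VM's disk from a mounted backup without restoring it first."""

    def __init__(self, backup_blocks):
        self._backup = backup_blocks  # the mounted full.vbk; never written to
        self._overlay = {}  # writes made while the VM runs from the backup

    def read(self, index):
        # Prefer a block the running VM has written; otherwise read the backup.
        if index in self._overlay:
            return self._overlay[index]
        return self._backup[index]

    def write(self, index, data):
        self._overlay[index] = data

    def changes_since_start(self):
        # What must be carried over when the VM is moved to production storage.
        return dict(self._overlay)


backup = {0: "a", 1: "b"}
disk = InstantRecoveryDisk(backup)  # the VM can boot immediately
disk.write(1, "B")
```

Because the backup itself is never modified, it remains a valid restore point while the VM runs, and the overlay captures exactly what changed in the meantime.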
Now my recovery can be done in minutes, instead of however long the full restore takes. Please note: we recommend using instant VM recovery as a temporary solution that keeps your virtual machines available while you complete the standard restore operation. Remember, our goal is to reduce RPO and RTO; the lower we can get them, the better. We don't want data loss, and we don't want lost operating time. Here again, with instant VM recovery, we see an innovation that dramatically reduces data loss and downtime when recovering virtual machines.
To summarize, today we've talked about Hyper-V virtual machines and seen that they're much more than a single file. You're familiar with the architecture now and understand what the microkernelized architecture looks like. You understand VSS and how it works with the Hyper-V backup process. And we've introduced you to some amazing software innovations that extend backup and restore capabilities. This is Chris Henley saying thanks for spending some time with us at this Backup Academy session. We look forward to seeing you again soon.