
About the speaker

vExpert, CCIE, VCAP
Author, Blogger

David Davis is Virtualization Review's "How-To Guy" and VMware Evangelist. As a video training author for TrainSignal.com, he has produced more than 10 video training courses, including the popular VMware vSphere video training series.

Why virtual machine backups are different



In this session you will learn:

  • Hardware Independence
  • Image Level vs. File Level Backups
  • Change Block Tracking (CBT)
  • New Possibilities With VM Backups – Fast Restore, Replication, Virtual Lab, and more
  • Traditional Backup Doesn’t Understand Virtualization
  • What is the Right Tool for the Job?



Welcome to Why Virtual Machine Backups Are Different. My name is David Davis, and I'll be your instructor for this lesson. You can reach me on Twitter @DavidMDavis or via my blog, VMwarevideos.com.

Before we get started on this lesson, let me tell you a little about myself. I'm a former VMware customer. I've served as an IT manager and a server/network admin, with over 18 years of experience in the IT industry. VMware has also given me the vExpert award two years running for my evangelism of VMware virtualization. I'm a VCP4, I also hold the new VCAP-DCA certification, and in the past I did a lot of work with Cisco, where I achieved the CCIE. I've authored hundreds of articles on virtualization around the Web, including on websites like Virtualization Review as well as in the print edition of Virtualization Review. I've served as a VMworld speaker and judge, and I'm best known for my vSphere video training that you'll find over at TrainSignal.com.

Now let's look at what we'll be covering in this lesson on why virtual machine backups are different. I'll talk about the hardware independence that server virtualization creates. I'll cover the difference between image level and file level backups. We'll touch on change block tracking, and then I'll cover some of the new features and possibilities that virtualization backups open up, things like fast restore, replication, creating virtual labs and more. I'll cover why traditional backup software just doesn't understand virtualization, and finally we'll end with the right tool for the job of backing up your virtual infrastructure. So with that, let's get started.

Backing up virtual machines in your virtual infrastructure is so much different than backing up physical servers in your physical server infrastructure. I'll be making the case for that throughout this lesson.

First let's start off with the traditional architecture. Here we have a physical server. On the bottom, you see the CPU, memory, network, and storage resources of that server. So you've got the server architecture with its resources, and of course you load the operating system directly on that server, and then the applications run inside the operating system. It's pretty straightforward, and it's the traditional approach that everyone is used to.

Virtualization provides so many benefits, and it turns this design completely around, 180 degrees. Now let's take a look at a virtual infrastructure. Here's the same physical server. On the bottom, you have the same resources: CPU, memory, network, and disk. On top of that server hardware, instead of loading a server operating system like Windows Server 2008, we've loaded ESX Server. It could also be ESXi or another virtualization product. That virtualization layer is created by the hypervisor, and on top of the hypervisor you create your virtual machines.

If you notice, each one of those virtual machines has its own virtual devices. So it has its own virtual CPU, virtual memory, virtual network interface cards, and its own virtual disk. On top of that virtual hardware, you load the guest operating system and then your traditional server applications. The benefit of course to virtualization is you've got this virtualization layer that allows you to do so much more with the same physical hardware that you were running just one operating system and a handful of applications on before.

Very likely you were achieving a very low resource utilization of that hardware and not getting very much for your money. With virtualization you can get so much more for your money by running so many more virtual machines on the same hardware. It's going to require so much less infrastructure than before, save you time, and save your company money. It also creates this hardware independence. So those cubes on the top, those virtual machines represent complete servers.

Those servers are now independent of the hardware, and those virtual machines can be moved easily from one physical server to another, even from one data center to another. They can be replicated across a wide area network or across the Internet. There are just so many possibilities opened up by these now hardware-independent guest operating systems. I can put them on a flash drive. I can give them to a friend. I can export them, and I can back up those complete virtual machines with an image level backup, which I'll talk about in just a second.

So to sum up hardware independence: it's the virtualization layer that abstracts the operating system so it's no longer tied to specific hardware. Those virtual machines, and the guest operating systems inside them, have drivers for virtual hardware, not for the physical hardware of your servers. It's that hardware independence that lets you move those virtual machines from server to server within the same processor family, or even to another data center. Hardware independence makes the benefits of virtualization possible. Because of it, you can take an image level backup of a virtual machine on one physical server and restore it on another. You can vMotion a virtual machine from one ESX host to another. It's that hardware independence that completely changes how you should look at server backup.

Now let's move on from hardware independence to image level versus file level backups. With virtualization you can back up the entire image of a virtual machine. That image contains not only the virtual machine's disk file but also its configuration files. It's a completely encapsulated image of that virtual machine, containing everything inside the guest operating system, including the applications and data. It's usually one large file plus a handful of very small files. That image can be restored on the same host, on another ESX host, or on a host at a disaster recovery site. It's a completely portable version of that virtual machine.

Another benefit of image level backup with virtualization is that no agent has to be installed in each guest operating system. With traditional, physical server backup products, you had to install an agent on every server or desktop you wanted to back up. Those agents had to connect to the backup server over the network. You had to make sure the agents were kept updated, that they didn't lose connectivity, and that they could authenticate. There were all these potential problems, including licensing: every agent had to be licensed, and so forth. No agents are required with image level virtualization backup.

The other benefit is that, with no backup agent running inside the guest, the guest operating system carries essentially none of the backup load. So backups can occur without any downtime for the virtual machine, for its applications, or for the end users of those applications. That's a tremendous benefit of virtualization backup: end users aren't affected when backups occur, which means you can even run backups during the day. Backing up the image of the server is much faster and less disruptive than backing up the files in the guest OS through an agent, again because there's no agent in the guest doing the work.

Image level restores are also much faster than file level restores. I remember creating disaster recovery plans where we had a large run book, we called it, of all the steps we would have to perform. First we'd have to load an operating system on a physical server at the disaster recovery site. Then we'd have to load a backup agent. We'd have to load some specific drivers. Then we'd have to restore all the files, which would take hours or maybe even days depending on the size of the databases to get that physical server back up and running.

Image level restores of virtual machines don't take anything like that. You just restore that individual image to an ESX host, or any virtualization host that can run it. It's really not that difficult. That makes not only disaster recovery planning simpler, but also routine server restores whenever there's a problem with a server, or cases where you need to clone a production server into a test or development area so developers can work with a copy of the real thing.

Still, I don't want to leave out file level backup and restore, because even with image level backups there are easy ways to restore individual files. You don't want to restore an entire 40 gig image level backup just to get back one or two specific files that were changed. The very cool thing with most virtualization backup applications is that you can mount the image level backups you've created and pull out just the individual files you need. It's very easy to do, as you'll see when you try out image level virtualization backup software for yourself.
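As a rough sketch of that last step, assume the backup product has already mounted the image level backup as a read-only folder (the mount itself is vendor specific and not shown; `mount_point` and `restore_files` are hypothetical names). Pulling out a few files is then an ordinary copy, with no need to restore the whole image:

```python
import shutil
from pathlib import Path


def restore_files(mount_point: Path, wanted: list[str], destination: Path) -> list[Path]:
    """Copy selected guest files out of a mounted image level backup.

    `wanted` holds paths relative to the root of the guest file system
    as it appears under `mount_point` (an assumption about the mount layout).
    """
    restored: list[Path] = []
    for relative in wanted:
        source = mount_point / relative
        target = destination / relative
        target.parent.mkdir(parents=True, exist_ok=True)  # recreate the folder structure
        shutil.copy2(source, target)  # copy contents plus timestamps
        restored.append(target)
    return restored
```

Only the requested files are read from the mounted image, so the cost of the restore scales with the files you need, not with the size of the virtual disk.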

From image level backup, let's move on to change block tracking, or CBT (VMware's name for the feature is Changed Block Tracking). It's available with VMware ESX and ESXi version 4.0 or later, and only on virtual machines using virtual hardware version 7 or later. You need both: that version of the virtualization software and that version of the virtual hardware.
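A small sketch of that prerequisite check. The function name `cbt_supported` is hypothetical; it only encodes the two minimums stated above (ESX/ESXi 4.0 or later, virtual hardware version 7 or later), where the hardware version is the value a VM records as `virtualHW.version` in its .vmx file:

```python
def cbt_supported(host_version: str, hw_version: int) -> bool:
    """Return True if the host and the VM's virtual hardware meet the CBT minimums.

    host_version: ESX/ESXi version string such as "4.0" or "4.1.0".
    hw_version:   the VM's virtual hardware version (virtualHW.version).
    """
    host_major = int(host_version.split(".")[0])
    return host_major >= 4 and hw_version >= 7


print(cbt_supported("4.1", 7))  # True
print(cbt_supported("3.5", 7))  # False: host too old
print(cbt_supported("4.0", 4))  # False: virtual hardware too old
```

Meeting these minimums does not switch CBT on by itself; it is enabled per virtual machine (the `ctkEnabled` option in the .vmx file), which backup products commonly take care of through the vSphere API.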

With change block tracking, changes to a virtual machine's disk blocks are tracked outside the guest OS, so the guest operating system has no idea the tracking is even happening. It's the ESX or ESXi host that keeps track of which blocks have changed, and it makes that information accessible through an API. Virtualization backup software can query that data and back up only the blocks of the virtual machine disk file that have changed since the last backup. It's a bit like the old archive bit in DOS and Windows, which was set on a file whenever it changed to flag that it needed to be backed up.
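To make the idea concrete, here is a toy model of change block tracking. It is not VMware's implementation: in vSphere the host keeps the tracking data and backup software retrieves it through the vSphere API (the `QueryChangedDiskAreas` call). The `TrackedDisk` class and its tiny 4-byte blocks are hypothetical, chosen only to show that the disk layer, not the guest, remembers which blocks were written, and that an incremental backup copies just those blocks and then resets the tracking, much like clearing the archive bit:

```python
BLOCK_SIZE = 4  # bytes per block; deliberately tiny so the example is easy to follow


class TrackedDisk:
    """Toy virtual disk whose writes are tracked at the disk layer, outside the guest."""

    def __init__(self, num_blocks: int) -> None:
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        self.changed: set[int] = set()  # block indices written since the last backup

    def write(self, offset: int, payload: bytes) -> None:
        """Guest write: update the data and mark every block the write touches."""
        if not payload:
            return
        self.data[offset:offset + len(payload)] = payload
        first_block = offset // BLOCK_SIZE
        last_block = (offset + len(payload) - 1) // BLOCK_SIZE
        self.changed.update(range(first_block, last_block + 1))

    def incremental_backup(self) -> dict[int, bytes]:
        """Copy only the changed blocks, then reset tracking for the next run."""
        blocks = {
            index: bytes(self.data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE])
            for index in sorted(self.changed)
        }
        self.changed.clear()
        return blocks
```

For example, after `disk.write(5, b"abc")` and `disk.write(17, b"xy")` on an 8-block disk, `incremental_backup()` returns only blocks 1 and 4, and an immediate second call returns an empty dict because nothing changed in between.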

So change block tracking allows virtualization backup software to back up only the changed blocks, which makes virtualization backups much faster than ever before. If you're using virtualization, make sure the backup option you choose supports change block tracking. By the way, change block tracking will be covered in much more detail in a later lesson.

Thanks to virtualization there are so many new possibilities related to virtual machines and virtual machine backup. We talked about the virtual machine portability that virtualization's hardware independence creates. With that portability, virtual machines can be backed up on one server and restored on another. You can copy a virtual machine onto a flash drive. You can bring it home. Run it on a workstation. Take it to another site, or to a disaster recovery site. There are just so many things you can do, thanks to that virtual machine portability.

That hardware independence makes so many other features possible, like vMotion. With vMotion, a running virtual machine can move from one physical server to another without any downtime for end users. Of course, your backup software needs to take these kinds of features into account. With Storage vMotion, a running virtual machine's disk files can move from one storage array to another, again with no downtime for end users.

There's Distributed Resource Scheduler, or DRS, which effectively provides load balancing for the virtual infrastructure. It does this by using vMotion to move running virtual machines from one host to another so they get the resources they need.

There's VMware HA, or VMware High Availability, which restarts the virtual machines that were running on a failed ESX server on another ESX server, or across multiple ESX servers, so they can get back up and running as soon as possible.

There's another one I didn't mention on that list, which is Distributed Power Management, or DPM. With DPM, let's say during the night, when resource utilization is low across the virtual infrastructure, virtual machines are consolidated onto fewer physical servers, and the physical servers that aren't needed are powered off to save electricity. This is an amazing feature for saving your company money, but your backup software also needs to understand it and take it into account.

We talked about how change block tracking provides much faster backups by backing up only the blocks that have been modified on a particular virtual machine. There's also much faster restore thanks to image level restore, and you can even get immediate access to backed-up virtual machines, to get them back up and running as soon as possible, by mounting them directly from the backup repository. That way you don't have to wait for the full image of the virtual machine to be restored.

Disaster recovery is tremendously simplified because your applications and operating systems are no longer tied to a particular type of physical server. You've just got portable virtual machines that can run on whatever type of server you have, as long as it can run ESX or ESXi. You can also replicate the changed blocks using replication software. For example, you could take particular virtual machines that were backed up and replicate them across a wide area network, or across a secure VPN tunnel, to a disaster recovery site.

Now, thanks to this hardware independence and portability, you can replicate these virtual machines and get offsite backup. You don't have to take home a bunch of tapes. You don't have to pay a service to come in an armored truck and haul the tapes away to some big secure mountain up in the hills. You can replicate your virtual machines across a WAN yourself. Just like you create a backup job, you can create a replication job.
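Replication can reuse the same changed-block information. The sketch below, with a hypothetical function name and an assumed fixed block size, applies a set of changed blocks (keyed by block index) to a replica disk at the DR site after an initial full copy has seeded it, and reports how many bytes had to cross the WAN:

```python
def apply_changes(replica: bytearray, changes: dict[int, bytes], block_size: int) -> int:
    """Write each changed block into the replica; return the bytes sent over the WAN."""
    sent = 0
    for index, block in sorted(changes.items()):
        start = index * block_size
        replica[start:start + len(block)] = block  # overwrite just this block in place
        sent += len(block)
    return sent


# Hypothetical example: a 3-block disk already seeded at the DR site by a full copy.
source = bytearray(b"AAAABBBBCCCC")
replica = bytearray(source)
source[4:8] = b"bbbb"  # the production VM changes block 1
sent = apply_changes(replica, {1: bytes(source[4:8])}, block_size=4)
print(replica == source, sent)  # True 4
```

Only 4 of the 12 bytes crossed the wire. A production replication tool would also have to deal with things this sketch leaves out, such as consistency of the source snapshot and interrupted transfers.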

Then finally, you could use the virtual machines you backed up to create a virtual lab, for testing or for development. You can keep the lab up and running as long as you want. You could test application and software upgrades, give developers a chance to try new features they want to implement, or just bring the lab up to verify that your backups will really work should you need to restore the virtual machine images you've been backing up. So again, there are so many new possibilities created by virtualization. They all relate to virtual machine backups, and your backup software needs to understand them.

That brings me to my point that traditional backup software, in many cases, doesn't understand the virtual infrastructure. Now don't get me wrong. There are backup applications that have been around for years, and over the past year or two they've been upgraded to adapt to the virtual infrastructure. Now they back up physical servers as well as virtual servers. But you really need to compare the features those traditional backup applications have implemented for the virtual infrastructure, to make sure they understand everything that's going on and really provide everything you need in virtualization backup software.

In the graphic here, you can see a typical virtual machine directory. The two most important files are the VMDK, or virtual machine disk file, which is the huge file you can see on the screen (it's actually 10 gigs), and the very tiny VMX file, which is the virtual machine configuration file. Depending on how you have it configured, your virtual machine backup software will typically back up everything in this directory, but the two most important files are always backed up: the virtual machine disk file and the virtual machine configuration file.
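As a simple illustration of that directory, the sketch below sorts a VM folder listing into the files that are always captured (the .vmx configuration file and the .vmdk disk files) and everything else, whose handling depends on configuration. The file names are made up for the example; note that on VMFS a virtual disk typically appears as a small descriptor .vmdk plus a large -flat.vmdk data file, and real directories also hold other files such as .nvram and logs:

```python
from pathlib import PurePath

# The two file types the lesson calls the most important in a VM directory.
CRITICAL_SUFFIXES = {".vmx", ".vmdk"}


def split_vm_files(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Split a VM directory listing into always-backed-up files and optional ones."""
    critical: list[str] = []
    optional: list[str] = []
    for name in filenames:
        if PurePath(name).suffix.lower() in CRITICAL_SUFFIXES:
            critical.append(name)
        else:
            optional.append(name)
    return critical, optional


listing = ["web01.vmx", "web01.vmdk", "web01-flat.vmdk", "web01.nvram", "vmware.log"]
print(split_vm_files(listing))
# (['web01.vmx', 'web01.vmdk', 'web01-flat.vmdk'], ['web01.nvram', 'vmware.log'])
```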

Once the initial backup occurs, hopefully you're only backing up the changes to the virtual machine disk file. You don't want to back up that same 10 gig file every single night. The point here is that traditional, physical server backup software only backs up the operating system and the applications inside that VMDK file. Nothing in the VMX file, which is critical to the virtual machine functioning, would be backed up by traditional agent based software.
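A quick worked example shows why backing up only the changes matters. Assume, purely for illustration, the 10 gig disk from the graphic and that about 0.5 gig of blocks change each night (real change rates vary widely). Over a week of nightly backups:

```python
def data_moved_gb(disk_gb: float, nights: int, changed_gb_per_night: float) -> tuple[float, float]:
    """Return (GB copied with a full backup every night, GB copied with one full plus incrementals)."""
    full_every_night = disk_gb * nights
    # First night is a full copy; each later night copies only the changed blocks.
    full_then_incremental = disk_gb + changed_gb_per_night * (nights - 1)
    return full_every_night, full_then_incremental


print(data_moved_gb(10, 7, 0.5))  # (70, 13.0)
```

Under these assumptions, about 70 gig of copying per week shrinks to about 13 gig.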

That VMX file contains information about the virtual machine's hardware. As you can see in this graphic, it records how much memory the virtual machine has, how many vCPUs it has, and how many virtual hard drives it has, along with the names of the VMDK files that the VMX file points to. All these things are needed to make that virtual machine function, and your backup software needs to understand that.
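Because the VMX file is plain text made of `key = "value"` lines, it's easy to see what an agent-based backup would miss. Here is a minimal sketch that parses one and pulls out the hardware details mentioned above. `memsize` (in MB), `numvcpus`, and `scsi0:0.fileName`-style disk entries are standard .vmx keys, but the sample file is invented, the function names are hypothetical, and the parser is simplified (it skips any line that isn't in the quoted-value form):

```python
import re

# Matches lines of the form:  key = "value"
VMX_LINE = re.compile(r'^\s*([^=\s]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text: str) -> dict[str, str]:
    """Parse .vmx text into a dict of key -> value, skipping comments and other lines."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = VMX_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def summarize_hardware(settings: dict[str, str]) -> dict[str, object]:
    """Pull out the hardware details a backup must keep to rebuild the VM."""
    disks = [
        value
        for key, value in settings.items()
        if key.lower().endswith(".filename") and value.lower().endswith(".vmdk")
    ]
    return {
        "memory_mb": int(settings["memsize"]) if "memsize" in settings else None,
        "vcpus": int(settings["numvcpus"]) if "numvcpus" in settings else None,
        "disks": disks,
    }


sample = '''
displayName = "web01"
virtualHW.version = "7"
memsize = "2048"
numvcpus = "2"
scsi0:0.fileName = "web01.vmdk"
scsi0:1.fileName = "web01_1.vmdk"
ide1:0.fileName = "win2008.iso"
'''
print(summarize_hardware(parse_vmx(sample)))
# {'memory_mb': 2048, 'vcpus': 2, 'disks': ['web01.vmdk', 'web01_1.vmdk']}
```

The ISO attached to the virtual CD drive is filtered out because only .vmdk references count as virtual hard drives here.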

Also, I've seen a lot of backup products that say they support backing up a virtual infrastructure, but all they're really doing is going to each individual ESX server, maybe through an agent on that server, or maybe just downloading the virtual machines they find there. My point is that they need to go to vCenter and use the correct vSphere APIs to back up the virtual infrastructure and learn about it, so they really know what's going on. In the graphic here, you can see Distributed Resource Scheduler using vMotion to move a running virtual machine from one ESX host to another.

Is your backup software going to understand that that happened? Is it going to see that as a new virtual machine? Is it going to keep backing it up? What's going to happen? Unless your backup software goes directly to vCenter and uses the correct VMware APIs to learn about the virtual infrastructure, it's not going to back up that virtual infrastructure effectively.
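One way to picture what understanding the virtual infrastructure means: the backup catalog should key each VM by a stable identity, such as the UUID vCenter reports for it, rather than by the host it happened to be found on. The sketch below is hypothetical (the class names and the placeholder UUID are invented) and simply shows that, keyed this way, a VM that DRS moves to another host keeps extending its existing backup chain instead of looking like a new virtual machine:

```python
from dataclasses import dataclass, field


@dataclass
class VmInfo:
    """What the backup tool sees about one VM at backup time (hypothetical shape)."""
    uuid: str  # stable per-VM identifier, e.g. as reported by vCenter (assumed)
    name: str
    host: str  # ESX host the VM happens to be running on right now


@dataclass
class BackupCatalog:
    # uuid -> host the VM was on at each backup, in order
    chains: dict[str, list[str]] = field(default_factory=dict)

    def record_backup(self, vm: VmInfo) -> bool:
        """Record a backup; return True if it extends an existing chain for this VM."""
        existing = vm.uuid in self.chains
        self.chains.setdefault(vm.uuid, []).append(vm.host)
        return existing


catalog = BackupCatalog()
print(catalog.record_backup(VmInfo("vm-uuid-1", "web01", "esx01")))  # False: first backup
# Overnight, DRS vMotions web01 to esx02; same UUID, so the chain continues.
print(catalog.record_backup(VmInfo("vm-uuid-1", "web01", "esx02")))  # True
```

A catalog keyed by (host, name) instead would start a fresh chain, and a fresh full backup, every time DRS moved the VM.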

What's the right tool for the job? What's the right tool to backup your virtual infrastructure? I hope by the time that you're done watching this video training course on virtualization backup, you'll have much more insight into how to select the right virtualization backup software. I think there's a lot more to virtualization backup than you might think. It's a lot different than traditional, physical server backup.

In this lesson, we talked about how virtualization backup is so much different than physical server backup because of hardware independence. Also because of all these virtualization features, like vMotion, that move virtual machines from one host to another, you want your virtualization backup software to go to vCenter and learn about the virtual infrastructure using the correct VMware APIs. You also want your backup software to be able to back up virtual machines without interfering with the virtual machines and their applications.

You want no downtime for the applications when the virtualization backups are occurring. You want fast restore, even immediate access to those virtual machines to get them back up and running without ever having to do a long extended file restore. These are just some of the features that make virtualization backup different and some of the things you'll be learning about in this course.

Now let's review what we learned in this lesson. We started off by talking about hardware independence. Hardware independence is created when you begin using server virtualization. That virtualization layer, the hypervisor, separates the physical server hardware from the newly created virtual hardware. That virtual hardware is inside each virtual machine container, and on top of the virtual hardware are the guest operating system and your applications.

That entire virtual machine is now a portable, hardware independent unit that can be moved from one physical server to another without any driver problems. That's because the guest operating system now references the virtual hardware. It has drivers for the virtual hardware, and it really doesn't know anything about the physical server hardware. The drivers in the guest operating system aren't specific to the physical server hardware anymore. You can move that virtual machine. You can take it on a USB stick and run it in VMware Workstation, or move it to another site running ESX, such as your disaster recovery site, and load it on whatever hardware you choose. That hardware independence enables many of the advanced features you think of when it comes to virtualization. Things like vMotion and Distributed Resource Scheduler are all made possible thanks to that hardware independence.

Next we moved on and talked about image level versus file level backups. With traditional, physical server backups you're performing file level backups. You have backup agents on top of each operating system, on each physical server. Those backup agents can have all sorts of problems. You can have licensing issues. They could lose connectivity to the backup server or have authentication problems, and they also take up resources from that operating system, from that physical server, and from the database or applications that are on the physical server.

They're having an impact on the applications and end users that are accessing the physical server, and that's why it can be so much of a pain to schedule your physical server backups and ensure that they don't cause problems on the network or with the physical server or the applications running on them. Traditional, physical server backups that are file level backups have a lot of problems. With virtualization we can now start using image level backups. With image level backups, you're backing up not only the virtual machine disk file, but also the virtual machine configuration file that contains information about the virtual machine hardware.

That image level backup, as we talked about, is portable and can be moved to another physical server. You could restore that image level backup on another ESX host, and it would work without any problem. Image level backups also allow us to do other really cool things. We can replicate those images to a disaster recovery site to have offsite backups without paying a tape service to come pick up our backup tapes. Oh, yeah, you don't have backup tapes anymore. You just have images of these virtual machines that are replicated offsite.

Of course, there are cases where you still want to put those images on some sort of offsite storage media, but without image level backup and virtualization this would all be much more complex. Image level backups also let us mount those images directly from the backup server to get virtual machines back up and running immediately if they go down. They also let us create virtual labs.

Next we talked about change block tracking. Change block tracking is a new feature in vSphere 4, and it requires virtual machine hardware version 7. Virtualization backup applications that talk to vCenter and use the vSphere APIs correctly can access this change block tracking data, which isn't kept inside the guest operating system; it's kept on the ESX servers. Because they know which blocks have changed on those virtual machines, these applications can back up only those blocks. That makes virtualization backup much faster and shrinks your backup window, if you even have a backup window at all once you move to virtualization.

Plus there are so many new possibilities created with virtual machine backups. Now you can simply restore the entire image of a virtual machine instead of trying to restore individual files. I used to create disaster recovery plans, and I don't know if you've ever done that, but it can be a humongous pain with physical servers. You could have a long run book where first you restore the operating system, load drivers on the new physical server at the DR site, then load the backup agent and license it. Oh yeah, you've got to get the backup server up first: restore the backup server, restore the backup server indexes, then restore the files, and there could be thousands or tens of thousands of files to restore. This entire process can take days just to get a single physical server restored and fully functioning.

Virtual machine backups and image level backups make the whole process of disaster recovery and individual server restoration so much easier. You can also replicate these images across a wide area network to a disaster recovery site to have immediate offsite backup. You can create virtual labs for your developers or testers, or for a trial software upgrade. Let's say you're considering upgrading your Exchange server or another server. You could test that in a virtual lab by bringing up multiple virtual machines immediately, mounting them straight off the backup server.

So many possibilities are now available thanks to virtualization and virtual machine backup. It's going to make your life easier and your company much more efficient. Traditional server backup, in many cases, just doesn't understand virtualization. Those products are agent based. They might have been upgraded in the last few years to better adapt to virtualization, but many of them still don't go to the vCenter server. They don't understand the virtual inventory. They don't know where the virtual machines are. They might not be using the VMware APIs; they may be going directly to an ESX host. There are a lot of problems with traditional, physical server backup applications, and they may or may not match up apples-to-apples in features with a backup application designed specifically for virtualization.

That brings me to my last point: what's the right tool for the job? What's the right tool to back up your virtual infrastructure? There are so many different tools out there, and a lot of different options and ways to do virtualization backup. So you really need to educate yourself and make the right choice. I hope that by the time you finish this video training course on virtualization backup, you'll see that there are a lot of things to consider and have a much better understanding of how all this works, the features you should look out for, and what makes virtualization backup and physical server backup so different. In the end, that will let you select the right tool for the job.

Thanks for watching this lesson on why virtual machine backup is so much different than physical server backup.
