Backing up Volumes with Millions of files - Suggestions?

(OP)
Hi All,

Looking for your thoughts on this scenario.

At our site, we have a volume containing ~20,000 user home folders totalling ~30 million files. The volume is on a Netapp Filer, which backs up via NDMP to a Tape Library with 4 LTO4 tape drives.

All other volumes on the NAS back up fine (and fast) with a throughput of 100-200GB/hr, but as soon as the backups reach the above volume, performance is dire.

We had a single subclient to begin with for the above volume, and when the job started, it would sit there for at least 12 hours simply scanning the volume before even beginning to back it up. The throughput would often be only around 5GB/hr and rarely go above 50-60GB/hr, obviously caused by the sheer number of small files it is handling.

The volume has 26 subfolders in it representing the initial letters of the users' surnames. We've now set up a subclient per folder to give us a better idea of the breakdown of each folder in our reports. However, it still takes 4-5 hours to scan many of the folders, and the throughput is still only a few GB/hr on average. Also, for some reason, the overall backup window for this volume is now longer than it was before we split it into separate subclients per folder.

We are looking at implementing synthetic fulls; however, whilst this will help with the *full* backups, it seems likely the incrementals are still going to run into a 2-day window because of the sheer number of files they still have to scan.

Does anyone else have problems backing up volumes with millions of files, and what steps have you taken to improve the situation?

I'm interested in any thoughts/comments/suggestions on this subject.

RE: Backing up Volumes with Millions of files - Suggestions?

With such a scenario, you will most likely not be able to keep a fast drive in streaming mode. Using faster drives will even make the scenario worse, at least for the media and the drive. It could also increase the backup time.

IMHO, a snapshot solution should be what you should consider.

RE: Backing up Volumes with Millions of files - Suggestions?

(OP)
605, when you say 'fast drive' are you referring to the fact that we are using LTO4? And why would using faster drives make the situation and backup time worse?

What do you mean by streaming mode? I've looked at Books Online and the knowledgebase, but I can't find any reference to this term.

When you say a snapshot solution should be what we consider, please can you elaborate? We are already using the snapshot features of the filer on a 30-day cycle for short-term recovery of files, via the shadow copy features, but for DR purposes we obviously still need to back up to tape. So I'm not sure what else you might mean?

RE: Backing up Volumes with Millions of files - Suggestions?

I'll let the previous poster answer your questions as to what he meant, but in general terms, a faster drive can sometimes have problems writing data that is being fed to it slowly.

The problem exists because drives want to write data to tape fast, but when the rate at which data is being sent to them cannot match that high speed, they cannot sustain writing at the intended speed, with one of two consequences.

Firstly, modern drives (which probably includes your LTO4s) can slow the tape down so that it passes over the heads more slowly, in order to match the tape speed to the flow of data being written, although I'm not sure that the speed can be *exactly* matched (perhaps some drives are only 2-speed, or 3 or 4 - you know: normal speed, half speed, quarter speed?).

Secondly, drives that cannot slow down, or cannot slow down enough to exactly match the speed of the incoming data, suffer from an effect called "shoe-shining". This is where the drive isn't being given data at the high speed it expects, so it has to stop the tape, rewind it a little, and start it forward again. This constant stopping, rewinding, and forwarding is reminiscent of someone shining your shoes (rubbing a cloth back and forth over them) and is very bad for performance, and for head life! With the tape going back and forth over the heads, it's claimed that they can actually wear out faster. Performance is sometimes atrocious as a result of shoe-shining, and the data can end up being written even more slowly than the already slow rate at which it arrives! That is why a faster drive sometimes won't improve performance, and can sometimes make it worse. Ask your drive's vendor whether it has variable speeds (and how many) and how it avoids the shoe-shining effect.

In a nutshell then, it would seem (from the information you've given) that you would get the most benefit from trying to ensure that you get the data to the drives faster, as the drives don't appear to be the problem, but I think you already knew that.
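
As a very rough sanity check, you could compare your observed throughput with the speed the drive needs to keep streaming. A minimal sketch follows; the 40 MB/s speed-matching floor is purely an assumed figure for illustration (LTO4 native maximum is about 120 MB/s), so confirm the real numbers with your drive vendor:

```python
# Rough streaming sanity check. The speed-matching floor below is an ASSUMED
# illustrative figure, not a vendor spec - check it with IBM/Qualstar.
MIN_STREAM_MB_S = 40.0    # assumed LTO4 speed-matching floor
MAX_NATIVE_MB_S = 120.0   # LTO4 native (uncompressed) maximum

def gb_per_hr_to_mb_per_s(gb_per_hr):
    return gb_per_hr * 1024.0 / 3600.0

# Throughput figures quoted earlier in this thread.
for label, gb_hr in [("problem volume (worst)", 5),
                     ("problem volume (best)", 60),
                     ("other volumes", 150)]:
    mb_s = gb_per_hr_to_mb_per_s(gb_hr)
    verdict = ("fast enough to stream" if mb_s >= MIN_STREAM_MB_S
               else "below the assumed floor - expect repositioning")
    print(f"{label}: {gb_hr} GB/hr ~ {mb_s:.1f} MB/s -> {verdict}")
```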

It would seem that both the scan and the backup itself are taking a long time. You have done exactly the right thing by placing your user folders under 26 separate folders, labelled A-Z. I have seen some sites that put them all in one folder, with awful results! I wouldn't separate the 26 into 26 separate sub-clients though (too many to run at once).

I suggest 7 sub-clients, each with its full on a different day of the week. Try, by experimentation, grouping the 26 folders into the 7 sub-clients so that they are roughly evenly balanced, both by NUMBER of files and SIZE of data to back up (check Galaxy reports every day for a while to get the balance about right). It is a sad fact that people's surnames are not evenly spread across the alphabet, with lots more starting with "S", for example, than "X". You might find that to be evenly balanced you have to put a lot of folders in one sub-client (for example W, X, Y, Z) but only a couple in another (for example A and S). Note that even 7 sub-clients will be too many if the server doing the backup is not very fast.
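
If it helps, here is a minimal sketch of that grouping as a simple greedy balance. The per-folder numbers are made up for illustration - substitute the file counts and sizes from your own Galaxy reports:

```python
# Greedily pack the 26 surname folders into 7 sub-clients, balancing a blend
# of file count and data size. All numbers below are placeholders.
folders = {
    "A": (1_200_000, 900),   # (file count, size in GB) - made-up values
    "B": (800_000, 600),
    "S": (3_400_000, 2400),
    "W": (500_000, 350),
    # ... add the remaining letters from your Galaxy reports
}

def balance(folders, groups=7):
    total_files = sum(f for f, _ in folders.values())
    total_gb = sum(g for _, g in folders.values())
    # Weight each folder by its share of files plus its share of data,
    # then always drop the next-heaviest folder into the lightest group.
    weighted = sorted(((f / total_files + g / total_gb, letter)
                       for letter, (f, g) in folders.items()), reverse=True)
    bins = [{"letters": [], "weight": 0.0} for _ in range(groups)]
    for weight, letter in weighted:
        lightest = min(bins, key=lambda b: b["weight"])
        lightest["letters"].append(letter)
        lightest["weight"] += weight
    return bins

for day, b in enumerate(balance(folders), start=1):
    print(f"sub-client {day}: {sorted(b['letters'])} (relative weight {b['weight']:.2f})")
```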

Other suggestions:

* Delete unwanted files (and if the users won't do it, do it for them!)

* Make sure there is plenty of free space on the disk (a full disk can cause horrible fragmentation)

* Perform regular defragmentation of the disk, even if Windows says (as it notoriously does) that defragmentation is not needed (look at the fragmentation report yourself).

* Backup a snapshot of your disk rather than the live disk itself (I think this was what the previous poster was getting at). This means take a snapshot of the disk and let the backup run on that. At the end of the backup, dissolve the snapshot. Yes I know that won't make the thing run faster but it does take the load off the live disk (and maybe shift the work to a different server too).

* Best of all, move some of the user folders (eg. M to Z) onto a different disk. There are limits to how much data can be scanned and shifted in short periods of time, particularly with millions of little files to deal with. The problem of poor performance is almost certainly on the Windows file system side, and not in the backup software, which of course just moves data from disk to tape as quickly as the hardware will allow. Windows is probably straining at having to open and close so many files so quickly, with all of the overhead involved in doing that (for any file, regardless of size).

* Actually there is an even better solution: put your user files on a Unix platform. And please do not think that I am joking here. I think you will be surprised at just how much more quickly things happen in a proper operating system. There is unlikely to be any effect on your users by storing their files on a different platform. Oh, and use a Unix system for your Galaxy Media Agent as well. Then watch the data fly!

None of the above are quick fixes to your problem - sorry. You may need to get someone in if this problem continues to cause you difficulties.

Good luck.

RE: Backing up Volumes with Millions of files - Suggestions?

(OP)
Dear Craig,

That's some very interesting information there. I've never heard of the 'shoe-shining' effect, but I will definitely be investigating it further.

I have, however, found a relevant article on the HP forums which states (not necessarily factually) that shoe-shining shouldn't affect modern LTO drives. There is a reference to the speed being variable between 40GB/hr and 120GB/hr, but not to what happens when it drops below 40GB/hr.

See http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1238171504362+28353475&threadId=1212744

I will find out more from our suppliers on this point as we definitely need to know more about this.

I forgot to mention that we do actually snap-mirror the data from the Primary Filer (where the home drives are served from) to a DR filer. It is then the DR filer that is backed up to tape. Therefore we are already backing up a snapshot of the data rather than the data itself (which is possibly what the previous poster was getting at). I should have mentioned that point.

In discussion here, we have already noted a considerable increase in the backup window since we went from a single subclient to 26 subclients. Hence our next test is to bring it back down to 4 subclients, which will give one subclient/job per tape drive and might balance things a bit better.

The 7 subclients (one full for each day of the week) is an interesting idea though, and we will seriously consider that option as well.

Funny you should mention putting the files on a Unix platform. That is the very platform we moved them *off*. We had performance issues serving them from Unix boxes via a Samba layer to Windows clients. I'm not sure of all the details on this side of things, other than that it was causing performance problems that apparently could not be resolved except by moving the data off Unix.

Also note that the OS on our Netapp Filer is Netapp's own 'Ontap' OS, not Windows ;o), so Windows isn't involved in this particular backup issue. Our Netapp also does continuous disk defragmentation as part of its maintenance/housekeeping procedures.

You've given me some food for thought, but I'm interested in any more thoughts/comments on this.

Cheers

Mark

RE: Backing up Volumes with Millions of files - Suggestions?

Hi again.

Well, it looks like you've already covered some of my points, and you make some very interesting observations in reply.

Would you like to share here what you find out from your tape drive vendor?

Good luck with it.

RE: Backing up Volumes with Millions of files - Suggestions?

'Streaming mode' is another word for constant tape motion (no repositioning).

If you cannot stream one drive with your data, you will obviously not be able to stream an additional tape drive that you add to achieve a higher throughput. A lot of users do this and are surprised that the situation becomes worse.

The faster the tape drives run, the more they will 'overshoot' when there is a gap in the data. The more they overshoot, the longer the repositioning time becomes.

LTO4s will slow down and try to adjust, but every tape drive still has a minimum write speed.

RE: Backing up Volumes with Millions of files - Suggestions?

> If you cannot stream one drive with your data...

How can you positively confirm whether a given drive is streaming or not?

RE: Backing up Volumes with Millions of files - Suggestions?

Actually, you can hear it from the drive noise - a bit hard when the drive is inside a jukebox.

Otherwise, the only way to tell is indirectly, via the backup transfer rate (for a given drive speed).

RE: Backing up Volumes with Millions of files - Suggestions?

Or you could have a look at the repositioning information in the drive logs.

RE: Backing up Volumes with Millions of files - Suggestions?

> Or you could have a look at the repositioning information in the drive logs.

Could you be more specific about these logs? Where are they? What are they called? What writes them?

It seems that there are very poor methods for sys admins to see if a drive is streaming properly or not. Listen to it? Come now. What if it's in a noisy environment (and whose aren't)? What if it's inside a big library and you can't get near it? What if it's across the other side of the city, the country, the world, the universe?

Let's all go out in the streets and demonstrate. Let's tell our drive vendors that we want better ways to determine what the drive is doing, and especially how it's coping with streaming. Has it slowed down? By how much? Is that managing to allow streaming or not?

RE: Backing up Volumes with Millions of files - Suggestions?

(OP)
> Actually, you can hear that by the drive noise

Not possible to hear in our setup. We have 4 drives all physically installed next to each other, in a large rack height Qualstar Tape Library, which itself is in a noisy data centre.

We do have good support from Qualstar, so I will see what info they can provide in this area. The drives are actually manufactured by IBM, but we'll see what we can find out.

RE: Backing up Volumes with Millions of files - Suggestions?

Have you considered Synthetic Fulls?  

We have a server with roughly 16 million scanned images (.tif files totaling 350GB or so). It takes a full 72 hours to do a "Regular" full from start to finish, but only about 20 minutes on average to do its nightly incremental, and just over 3 hours to complete a "Synthetic" full.

RE: Backing up Volumes with Millions of files - Suggestions?

> It takes a full 72 hours to do a "Regular" full from start to finish

For one of my servers (7M files / 1.3TB) I have a scan time of 3 hours and a transfer time of 11 hours, so I'm surprised that 16M / 350GB would take so long - what was your scan time / data transfer breakdown?
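
For what it's worth, this is the back-of-the-envelope arithmetic behind my surprise, using only the figures quoted in this thread (the scan/transfer split for the 72-hour job isn't known, so these are overall rates):

```python
# Overall rates from the figures quoted in this thread (illustrative only).
jobs = {
    "7M files / 1.3TB (3h scan + 11h transfer)": (7_000_000, 1300, 14),
    "16M files / 350GB (72h regular full)":      (16_000_000, 350, 72),
}
for name, (files, gb, hours) in jobs.items():
    print(f"{name}: {files / (hours * 3600):,.0f} files/s, {gb / hours:.1f} GB/hr")
```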

> Have you considered Synthetic Fulls?  
Synthetic Fulls may be a good solution here, but you need to make sure that you enable "Verify Synthetic Fulls".
Verify Synthetic Fulls ensures that files that are static in nature are not lost.

Markey164 - just 2 quick questions:
Have you run any performance monitoring to see if the delays are hardware based?
Is it possible that you have on-access antivirus, or something similar, slowing things down?

RE: Backing up Volumes with Millions of files - Suggestions?

I don't have any regular fulls left in my job history to look at the details of how long the scan phase took vs. the data transfer phase, etc. I just remember that the few times we did a regular full, we started the job on Friday night and it would run well into Monday afternoon. Our longest retention period is 1 year, so it has been longer than that since the last regular full was run on this box.

In our case, the server is pretty old hardware (a DL380 G2), and the data lives on an added SCSI shelf consisting of multiple 72GB 10K drives in a RAID 5 config.

We have had to do several restores of various files and folders and have never had a problem.  I am very happy running Synthetic Fulls on this and 3 other servers in our environment where those other 3 are across slow WAN links.

RE: Backing up Volumes with Millions of files - Suggestions?

> In our case, the server is pretty old hardware

Fair call. I guess, given the hardware and storage, 72 hours probably isn't that much of a stretch.


> We have had to do several restores of various files and folders and have never had a problem.

Do you "verify Synthetic Fulls"?

RE: Backing up Volumes with Millions of files - Suggestions?

Perhaps filter the user profiles?

RE: Backing up Volumes with Millions of files - Suggestions?

You should consider the Direct to Disk Option. I don't know if Netapp filers work the same way, but we have an EMC filer that requires no scanning to perform a backup. Craig sounds like he is correct when he says your disks just can't keep up with the LTO4 drives. How many disks do you have allocated for the volume on your filer that is giving you the issues?

What you could do is add a subclient for the volume in question, limit its streams to 2 or 3, change your data path, and set the policy to use a specified number of resources (tape drives) - in other words, match the filer's abilities to the tape drives. You could try changing the chunk size on your tapes, but I doubt that would resolve your scanning issue.

I had a VTL for almost a year emulating an IBM LTO4 drive; it ran very well, but management was a pain. I have since taken the Unix head off of it, offloaded compression to the MA, and moved the CommServe off, and I have literally gone from managing Commvault every day to not having touched it in a month. DDO is the way to go - Commvault handles it so much better, and they will tell you this.

RE: Backing up Volumes with Millions of files - Suggestions?

Look at CommVault's Data Classification Enabler (DCE) - it's designed to significantly speed up the scan by using its own separately maintained change database. It is separately licensed, but it might be worth it for the system affected by your problem. Perhaps CV will give you an evaluation key for it so you can try it out.

Go to Books Online and search for "Data Classification Enabler".

RE: Backing up Volumes with Millions of files - Suggestions?

A Synthetic Full backup policy would ensure that you only back up the incrementals, which will reduce the backup time. DCE will reduce the scan phase.

If, after using these two, you are still not happy, then the Image Level iDA is the way to go; it is block-level, so the fact that the file system is populated with millions of files is irrelevant.

In regards to a backup target: yes, you need to stream the tape drive in order for it to be an efficient device. Using disk targets is an option, and certainly the way to go if you want to use synthetic full backups. The incrementals can go to disk and the full/synthetic full goes to tape.

---------------------------------------
EMCTA & EMCIE - Backup & Recovery
Legato & Commvault Certified Specialist
MCSE

RE: Backing up Volumes with Millions of files - Suggestions?

(OP)
Sorry for the delayed response guys, I've been on holidays.

@ Cabraun - Yes, we have considered synthetic fulls. However, the scan phase itself seems to be 50% of the problem. Even if we move to synthetic fulls, it will still have to scan 30 million files regardless of whether the job is an incremental or a full, so there is still a lengthy delay. Whilst synthetics will help somewhat, I'm not sure this is the optimum solution.

Interesting that your incremental only takes 20 minutes for 16 million images. I wonder if the difference here is something to do with it being NDMP in our case, rather than a regular server backup?

@Psy053

* No, I haven't run any performance testing. What would you advise in this scenario with a Netapp Filer?

* No AV software involved. This scenario is backing up a Netapp Filer. It runs a proprietary Ontap OS, and doesn't have any AV software on it.

@ standiferl - Hmm, searching 'direct to disk' in the Commvault Books Online only returns 2 results, neither of which is relevant. I've heard of the feature, but can't seem to find any documentation on it. Is it perhaps known by another name? I know Commvault change the names of their features from time to time.

In answer to your query on the number of disks in the volume in question, it's a 7.25TB volume, but I'm not sure how many disks it's using, other than that it's obviously several. I don't directly look after and maintain the Netapp, and I'm not sure how I can determine how many disks it is using.

@ CraigMcGill - I've had a read through the Data Classification Enabler documentation, but it only talks about Windows iDataAgents. Can you confirm whether you can use DCE with NDMP? The documentation doesn't mention NDMP either way.

@Calippo - Regarding synthetics, I'm not sure this will entirely solve our problem, as per my reply to Cabraun above. Regarding DCE, the documentation only talks about Windows iDataAgents, so I'm not sure whether you can use it with NDMP?

The Image Level iDA looks interesting. I've not used this before, but I'll look into it.

Thank you for all the responses so far, guys. Hopefully this discussion will help others in a similar situation ;o). We'll keep experimenting and discussing with our consultants, and will report back any progress or solutions we find.

RE: Backing up Volumes with Millions of files - Suggestions?

Sorry Markey

I didn't see any reference to a NAS client - my mistake.
With generic NDMP you can do incrementals and fulls.
If this NAS were a Celerra then, rather than the PAX method using dump, you could do VBB instead, which is block-level and supported by CommVault, but this won't apply for you as your NAS is a NetApp.

I don't know of any other NDMP block-level backup supported via CommVault, but perhaps NetApp has something similar. CommVault has excellent support for both EMC and NetApp.

regards

---------------------------------------
EMCTA & EMCIE - Backup & Recovery
Legato & Commvault Certified Specialist
MCSE

RE: Backing up Volumes with Millions of files - Suggestions?

It looks like you have three options.

First option: pursue NDMP, but consider creating qtrees inside /vol/data so that the files to be backed up are logically partitioned. This will be much faster, as the filer won't need to walk the filesystem to discover files to be sent to tape; it can simply use the inode table attached to a given qtree. Qtrees look just like regular directories to users, so they won't notice any difference or need to make any share configuration changes. However, you can't create qtrees "in place" for existing directories like /vol/data/groups. You will need to create the qtrees and then move the files into them.
Using qtrees you should see a marked reduction in the time for NDMP Phase I to complete.
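
A trivial sketch of generating those qtree-creation commands follows; the qtree names are placeholders (they would have to differ from the existing folder names until the data is moved across), and 'qtree create' is the 7-mode CLI syntax, so check it against your ONTAP release:

```python
# Print one 'qtree create' command per surname letter under /vol/data.
# Placeholder names; 'qtree create' is Data ONTAP 7-mode syntax - verify
# against your ONTAP release before running anything on the filer.
for letter in (chr(c) for c in range(ord("A"), ord("Z") + 1)):
    print(f"qtree create /vol/data/{letter}_users")
# Qtrees can't be created "in place", so the existing per-letter folders
# then have to be copied/moved into the new qtrees from a client
# (e.g. robocopy over CIFS), and the shares repointed if necessary.
```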

Second option: you could bring up a Windows or Unix iDataAgent and map those CIFS/NFS shares from the FS iDataAgent. With this method you can use a synthetic full backup policy (which you can't with NDMP); then the only issue is the scan time, which would be offset by the savings you gain from doing incremental backups.

I recently set up a Windows FS iDA to back up a Celerra. The NAS has 3TB of data and ten CIFS shares over 1Gig Ethernet. We configured a subclient for each share and defined synthetic full backups. The reason we did this rather than NDMP was that we wanted to single-instance the backups to disk via a SIS-enabled storage policy and also have the option of CI (content indexing) support. This works great, by the way, and a lot of other NAS users do this instead of NDMP.

Third option: SnapMirror to Tape. This is a nice NetApp solution to the problem, but it would seem that it is not integrated with the CommVault NDMP client.

That's the end of my MindMeld.

---------------------------------------
EMCTA & EMCIE - Backup & Recovery
Legato & Commvault Certified Specialist
MCSE
