
Help - Translog Recovery takes too long

Status
Not open for further replies.

k108

Programmer
Jul 20, 2005
230
US
Hi,

Need some ideas.

We have a database with a very high number of transactions. As such, the transaction log grows very large in a short amount of time. There is an automatic script running that does a translog BACKUP, then once the backup completes, it RESTORES it to a standby database.

However, because the translog is so big, the restore is taking a very long time. So long, in fact, that the time overlaps with a new backup starting. This is a problem because the backup can't run if the restore is still happening.

Is there a better way to approach this problem?

Thanks!!
 
Check out the FAQ I wrote on how to write your own log shipping: faq962-5754. It's in the Setup and Admin forum.

I've run it at a couple of different companies against some very high-load transaction logs, and it works great. Several other Tek-Tips members have used it as well.

Unfortunately there isn't any way to speed up the restore of the transaction logs themselves. The code in my log shipping scripts will address this issue for you, however.

Denny
MCSA (2003) / MCDBA (SQL 2000)

--Anything is possible. All it takes is a little research. (Me)

[noevil]
(Not quite so old any more.)
 
Ok, thanks. I will take a look at this.
 
Hi mrdenny,

Some questions for you regarding your script:

1.) Is there an advantage to using osql to run the RESTORE commands?

2.) What is the difference between "Quit the job reporting success" vs. "Quit the job reporting failure"? (I couldn't find this info in BOL.)

3.) Your job makes sure that no one is using the database that is about to be restored. If the database is in standby, would this ever be the case? Do you mean like users querying the database, or something else?

4.) What is Northwind.2.log? I am confused about this, as there is no reference to this file previous to the following commands:

exec xp_CMDShell 'del d:\RestoreFolder\Northwind.2.log', no_output /*This removes the last file we processed.*/

exec xp_CMDShell 'move d:\RestoreFolder\Northwind.log d:\RestoreFolder\Northwind.2.log', no_output /*This moves the current file into place for processing.*/

(That is, how do you know what the name of the last file was?)

5.) Referring to question #4 above, is this equivalent to using the INIT clause of the BACKUP LOG command, which simply overwrites existing log files of the same name? And if so, is there a reason NOT to use INIT, given that you do NOT want to save old log files?

(Note: one thing I've noticed is that if a log file is corrupt for some reason, BACKUP can't overwrite the file - so perhaps that is the reasoning here?)

Conclusion:

Where your script differs from our existing script is that we do DIFF backups and we do not save old log files. They are constantly overwritten, so there is only ONE log file to be restored at any given time. In other words, we do this serially... one backup, one restore, one backup, one restore, and so on.

Maybe that is the problem?

But since it is very unlikely we are going to make any significant changes to the way this currently works, I guess the best solution is to make sure that the restore isn't running BEFORE another backup starts. However, I haven't figured out a good way to do that yet!

I guess that is what I'm trying to figure out. Any ideas, again, would be appreciated!
 
1. No, there's no advantage to using osql over QA to run the code.

2. If you set a job step to "Quit the job reporting success", the job will stop running after that step and will be marked as successful. If you set it to "Quit the job reporting failure", it will stop running after that step and will be marked as failed.

3. If the database is in standby then no, this shouldn't be a problem, but anything can happen. Better to be safe. Yes, I am referring to people querying the database (when you open EM it queries all the databases on the SQL Server).

4. Northwind.2.log is the name of the file that is restored from. In my example you back up to the Northwind.log file. When the restore job starts, it renames Northwind.log to Northwind.2.log so that the backup job can continue to run while the restore job is running. This way, even if there is a failure of the primary server, the logs are there on the backup server; they just need to be restored.

Only two file names are used: Northwind.log and Northwind.2.log.

5. If you use the INIT option and there are logs in the file which have not been restored yet, your next restore will not work. You have to have all the logs in order to restore.
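To make the difference concrete, here is a sketch (the database name and paths are illustrative, borrowed from the Northwind example above): NOINIT, the default, appends each backup as a new backup set inside the file, while INIT overwrites everything already in it.

```sql
-- Appends this log backup as a new backup set inside the file,
-- preserving earlier backups that haven't been restored yet.
-- (NOINIT is the default behavior.)
BACKUP LOG Northwind
    TO DISK = 'd:\BackupFolder\Northwind.log'
    WITH NOINIT

-- Overwrites every backup set already in the file. Any unrestored
-- backups inside it are gone, which breaks the restore chain.
BACKUP LOG Northwind
    TO DISK = 'd:\BackupFolder\Northwind.log'
    WITH INIT
```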

My scripts don't do any DIFF backups; it's a pure log shipping method. Within my scripts there is only one physical file; however, that file can have many backups within it.

To find out if the job is still running, create a linked server from the primary server to the backup server and query the backup server for the job status of the restore job. If it's still running, fail the job. If it's not running, continue and back up the database.
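One way to wire this up (a sketch under assumptions - the linked server name BACKUPSRV, the status table, and the database name are all hypothetical, and Denny's actual scripts may check the job status differently): have the restore job set a flag when it starts and clear it when it finishes, and have the backup job check that flag across the linked server as its first step.

```sql
-- Runs on the primary server as the first step of the backup job.
-- BACKUPSRV is an assumed linked server pointing at the standby box;
-- dbo.RestoreStatus is a hypothetical one-row table that the restore
-- job sets to 1 when it starts and back to 0 when it finishes.
DECLARE @IsRunning bit

SELECT @IsRunning = IsRunning
FROM BACKUPSRV.msdb.dbo.RestoreStatus

IF @IsRunning = 1
BEGIN
    -- Fail this step so the job stops (or retries) without backing up.
    RAISERROR('The restore job is still running on the standby server.', 16, 1)
END
ELSE
BEGIN
    BACKUP LOG Northwind
        TO DISK = 'd:\BackupFolder\Northwind.log'
        WITH NOINIT
END
```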

Denny
MCSA (2003) / MCDBA (SQL 2000)

 
(USING CAPS TO DIFFERENTIATE THE TEXT HERE)

4. Northwind.2.log is the name of the file that is restored from. In my example you back up to the Northwind.log file. When the restore job starts, it renames Northwind.log to Northwind.2.log so that the backup job can continue to run while the restore job is running.

Only two file names are used Northwind.log and Northwind.2.log.

DOES THIS RENAMING HAPPEN AUTOMATICALLY? THAT'S WHAT I'M NOT UNDERSTANDING. HOW DOES NORTHWIND.2.LOG GET NAMED, EXACTLY? (I UNDERSTAND THE LOGIC OF USING 2 DIFFERENT NAMES, JUST NOT HOW IT GETS NAMED.)


To find out if the job is still running, create a linked server from the primary server to the backup server and query the backup server for the job status of the restore job. If it's still running, fail the job. If it's not running, continue and back up the database.

YES, I HAVE A WAY TO CHECK IF THE JOB IS RUNNING, BUT I'M NOT SURE HOW TO CODE IT WITHIN MY SCRIPT. THAT IS, IT NEEDS TO BE A RECURSIVE STEP:

1.) CHECK IF RESTORE JOB IS RUNNING
2.) IF NOT RUNNING, START BACKUP
3.) IF RUNNING, DO NOT START THE BACKUP; WAIT AND CHECK AGAIN LATER - CHECK AS MANY TIMES AS NECESSARY, UNTIL THE RESTORE JOB FINISHES AND THE BACKUP CAN BEGIN.

IT'S STEP #3 I'M NOT SURE HOW TO CODE.

THANKS VERY MUCH FOR YOUR HELP!! AT THE VERY LEAST, IT WAS GOOD TO SEE ANOTHER VERSION OF LOG SHIPPING.


 
There is an xp_cmdshell command that renames the file using the move command.

Code:
exec xp_CMDShell 'move d:\RestoreFolder\Northwind.log d:\RestoreFolder\Northwind.2.log'

Set up your job step something like this.

Code:
if ({code that checks if the restore is done}) = 1
BEGIN
    BACKUP DATABASE ...
END
ELSE
BEGIN
    raiserror('The restore isn''t finished yet.', 16, 1)
END

Set the retry on the job step to retry every 1 minute, a maximum of however many times fit in your backup interval.

If you back up every 10 minutes, retry 9 times at 1-minute intervals.

With this, if the restore is still processing, the step will fail and retry. If it's done, the step will back up the database.
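The same retry settings can also be applied in T-SQL instead of through Enterprise Manager; a sketch, where the job name and step id are placeholders for your own:

```sql
-- Equivalent to setting "Retry attempts" and "Retry interval (minutes)"
-- on the job step's Advanced tab.
EXEC msdb.dbo.sp_update_jobstep
    @job_name = 'Log Backup - Northwind',  -- hypothetical job name
    @step_id = 1,
    @retry_attempts = 9,   -- fits a 10-minute backup interval
    @retry_interval = 1    -- minutes between retries
```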

Denny
MCSA (2003) / MCDBA (SQL 2000)

 
Thanks for your help. Actually, I found a much simpler way to do this, by breaking the job out into separate steps - it avoids that whole recursive business and will be a lot more stable.
 
OK, I see how the naming is done - I wasn't looking carefully enough.
 
OK, but I do have ONE more question about your script!

Since you are only using the two files, Northwind.log and Northwind.2.log - why do you need to loop through the backup header of Northwind.2.log to get a log file count?

Isn't it just the one file, Northwind.2.log?

Maybe I am missing a critical concept here?

Thanks

 
Because within that one file there could be more than one backup.
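This is roughly what the header loop relies on (a sketch; the paths follow the Northwind example): RESTORE HEADERONLY returns one row per backup set inside the file, and each set can then be restored by its Position value.

```sql
-- One row per backup set stored in the single physical file;
-- the Position column identifies each set.
RESTORE HEADERONLY
    FROM DISK = 'd:\RestoreFolder\Northwind.2.log'

-- Restore one specific backup set by position, leaving the database
-- in standby so the next log in the sequence can still be applied.
RESTORE LOG Northwind
    FROM DISK = 'd:\RestoreFolder\Northwind.2.log'
    WITH FILE = 1,
         STANDBY = 'd:\RestoreFolder\Northwind_undo.dat'
```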

The backup job in my scripts doesn't look to see if the restore is finished. It doesn't care. It just continues to back up to the file. That way the log is always shipped every five minutes no matter what. This was important for the first system I used this on.

We had to ship the data every five minutes no matter what. Sometimes it took more than five minutes to restore the data, so we would get two or three backups into the file before the next restore could start. Even though the backup system wasn't in sync, if the production server crashed we had the logs and could get it in sync rather easily (within the 5-minute window of lost data).

Denny
MCSA (2003) / MCDBA (SQL 2000)

 
Yes, I see. I guess I don't have to worry about this, because we use INIT on our log file backups, so they get overwritten... there will never be more than one backup per file.

BTW, how were you able to enforce that logs were shipped every 5 minutes? Isn't it dependent on how long the log backup takes? What if there are lots of transactions and the log backup doesn't fit into the 5-minute timeframe?

 
There wasn't anything we could do about a backup that took more than 5 minutes, but there weren't too many times when that happened. Pretty much only after database optimization/reindexing and after the full backup were there more than 5 minutes' worth of logs, so it would take an hour or so for the system to get back in sync.

If the full backup took 2 hours to complete, the t/log backup would take like 30 minutes, then the next one would take like 10 minutes, then the next one like 4 minutes, and the backups were back on schedule. We simply explained to the business that in order to keep the system running fast we had to have some windows where we were allowed to exceed the 5-minute rule.

They were OK with that - not happy with it, but OK with it. We also told them that with the RAID 5 array + hot spares installed in the production system, the odds of a failure while the system was doing the large backups weren't very high.

To give you an idea of the load on the system, it was the ad banner system for a site with ~10,000,000 page views per day. Each ad on the page had 2 hits to the database, and every page had between 3 and 7 ads on it. That's about 405 transactions per second (~35,000,000 per day). The logs on the system were huge.

We had a multi-drive failure on the primary system (it happened about once every 16 months). With this log shipping system up and running we lost 3 minutes' worth of data, which amounted to only a few hundred dollars. The previous time the system failed, before the log shipping was set up, they lost 2.5 days' worth of data (this was long before I got there). That 2.5 days' worth of data cost them a fortune (~$360,000). Once that happened it was determined that it was worth the cost and time to set it up.

Denny
MCSA (2003) / MCDBA (SQL 2000)

 
Hi,

Fortunately our system is not as mission-critical as that! However, I would like to see it running more smoothly :)

I guess the concept that was hard for me to understand is that translogs take "as long as they take" to be backed up and restored. In other words, it's dependent on the number of transactions in the log, yes? You can't put a timer on them. But from what you said, it does sound like you can tune the database to improve the backup/restore times of the logs? How?

Also, when people say "back up the logs more frequently", I'm not sure what that means. We do ours like this: a full backup once a day, then diff backups, then translog backups. Diff and translog backups run all day long. So I don't think I can increase the frequency!
 
Correct - log backups take longer the more transactions you have.

You can't really tune the database to get backup and restore times faster. About all you can do is back up more often.

Why are you doing diff backups all day long? A normal backup schedule should look something like this:

midnight - full backup
3a, 6a, 9a, 12p, 3p, 6p, 9p - diff backup
all the hours that are left - tran backup (if you need to do these more often then by all means do)

Doing a diff backup every time you do a tran log backup defeats the purpose (and will make the backups take a lot longer than needed). If you are going to do a diff backup each time you do a tran backup, you may as well be in simple recovery mode and not bother with the tran log backups.

Denny
MCSA (2003) / MCDBA (SQL 2000)

 
Well, I guess I didn't explain that fully.

One full backup per day, then:

Diff backup 1, then translog backups run for 2 hrs

Diff backup 2, then translog backups run for 2 hrs

Diff backup 3, then translog backups run for 2 hrs

This three-step process repeats throughout the day, so it is close to the 3-hour schedule you describe.

Having 3 different diff backups available at any given time gives us more point-in-time recovery options.

 