Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Replication with a twist.....

Status
Not open for further replies.

SQLBill

MIS
May 29, 2001
7,777
US
SQL Server 2000 SP3

I currently have a 200+ GB database with 9 months of data. The more data added to the database, the slower it gets (obviously). I currently have permission to 'trim' the database to 6 months of data. But I am working on a 'better' solution and need some advice/suggestions.

The majority of my users only need 30 days worth of data (short-term). A small amount need 6 months to a year worth (long-term).

My plan is to have two separate databases (different servers). One will hold 30+ days of data and the other will hold upto a years worth of data.

Issues.
1. The long-term database needs to include the same data that is on the short-term database (other words, it needs to be current) along with the older data.

2. If data is removed from the short-term database, I can't have it removed from the long-term. (on day 31, when I remove data to get it back to 30 days, I don't want anything deleted from the long-term database). This tells me that transaction replication won't work (I could be misunderstanding this type of replication).

3. Both databases need to be up 24/7 and query-able by the users.

Example:
Long-term database holds 1 July 2004 to 1 June 2005
Short-term database holds 1 June 2005 - 30 June 2005
July begins and I want to remove 1 June 2005 from the short-term database. This must not replicate to the long-term database, 1 June 2005 must stay in that database and the new data from 1 July must also be in there. So now I would have:
LT Db: 1 July 2004 to 1 July 2005
ST Db: 2 June 2005 to 1 July 2005

Eventually, data would be removed from the LT Db to keep it at a year. The main issue is replicating the data from the ST Db and being able to purge data from ST Db without it purging from the LT Db.

Transaction Replication seems to be out, as I understand it would duplicate the purge (DELETE) to the long-term database and makes the replicated database an exact copy.

Merge Replication is for several databases into one and that's not the case.

Any suggestions as to how to accomplish this via SQL Server?

A non-SQL Server solution that I am researching is to have the 'data feed' send the data to the two different servers at the same time.

-SQLBill




Posting advice: FAQ481-4875
 
Questions:

1) How concurrent do the two DBs need to be?

2) Which one (assuming you're not going to have 3 dbs) will be the one the users see all the time, the 30 day or the long one?

I'm going to post something that has some flaws, but maybe it'll stimulate discussion on a proper way to do this. Or maybe it'll give you some ideas.

First: Row Filters are your friend. @=)

Have you ever used them before? If not, this is the perfect reason to start. The first caveat is that you can't use JOINs on ROW FILTERS. It has to be a subquery in the WHERE clause or a regular WHERE clause.

I'd recommend giving all Tables that don't easily connect back to the main table or have a date associated with them a DateStamp column with a default of GetDate().

Once you have your Distributor, Publisher & Subscriber set up (Set up Distributor on one of the Subscribers to remove burden from main DB), create your publication(s) with Transactional Replication (requires persistant network connection, but keeps concurrency good). 1 pub for long term. 1 pub for short term (if you need it).

Use the Row Filter to pull only records where Date >= '01/01/2005' (for example). Have the Date @ 30 days + for the short and 356 days for the long.

Now here's the problem. Without manually changing the row filters on a daily/weekly/monthly basis, then truncating the subscriber's tables and re-running the SnapShot from the Publication's properties, the replication will continue to replicate data from that specific date on. I don't know of any good way to have this date change automatically and to not replicate old data.

I'm personally having trouble with PULL subscriptions never running, then breaking Rep when they do run. But this might be because of the type of replication I chose.

My second thought is to do a read-only DB via a log shipping type solution, but I'm not sure if that will suit your needs. One of my employers had me combine the two solutions so that certain dated information replicated to our server and then we log shipped it to their server (which I'm sure you've seen my posts on).


... After I posted all that REP stuff, I just thought of something. Have the 30 day DB face the users. Set up a job to do a complete DB backup every 30 days and a full Truncate Table setup. At this point you can use Snapshot or Transaction Replication. Snapshot you'd run between the backup and the Truncate. Transactional will run all the time. MERGE Replication = BAD in this situation.

The full backup ensures you don't lose the data if something fails in Rep. The Truncate Table job will clear the 30 day DB on a regular basis (or you could just do a Delete MyTable where DateStampCol > (GetDate() - 30 days) kind of thing).

Does any of this make sense? Sorry if I got confusing.



Catadmin - MCDBA, MCSA
"If a person is Microsoft Certified, does that mean that Microsoft pays the bills for the funny white jackets that tie in the back???
 
Replication is out.

It's time to think federated databases.

It's kind of an off ballanced federation, but it still fits the bill.

If you aren't up to speed on federated databases, it's something like this.

shortterm server has an invoices table with data from today to getdate()-30
longterm server has an invoices table with data from getdate()-31 and back.
nightly you move data from shortterm server to longterm server.

To query the data you have a couple of options.
1. Use a view like this.
Code:
create view vw_invoices as
select * from st.db.dbo.invoices
union all
select * from lt.db.dbo.invoices
go
2. If you want to use a proc, name basic idea.
Code:
create proc usp_invoice
    @Invoice_ID int
as
select * 
from st.dbo.dbo.invoices
where Invoice_ID = @Invoice_ID
union all
select * 
from lt.dbo.dbo.invoices
where Invoice_ID = @Invoice_ID
3. The "Denny Version" of #2.
Code:
create proc usp_Invoices
   @Invoice_ID int
as
if exists (select * from Invoices where Invoice_ID = @Invoice_ID)
   select * 
   from st.db.dbo.Invoices 
   where Invoice_ID = @Invoice_ID
else
   select *
   from lt.db.dbo.Invoices
   where Invoice_ID = @Invoice_ID
go
#3 lessesn the load to the long term server becuase it onyl checks that server when the data isn't available on the current system. This way only queries that require the old data are slow. If the data isn't old the query will stay fast.

#1 is the Microsoft standard federated database setup. In thier setup you actually rename your tables to something like p_invoices and name the view invoices so that the front end doesn't need to be modified.

The other bonus with #3 is that if you take the long term server down, the system will still keep running. With the other two options everything will grind to a halt because the union queries will die when they can't hit the other server.

If you need to have all the current data in the other database that isn't a problem, you'll just need to handle that data inserting into the long term system in some method. I'd recommend a nightly push accross, and simply write your reports using a union all accross the two servers to get todays data included.

If you need real time data, look into using Microsoft Message Queue to send the commands to the long term server.

Denny
MCSA (2003) / MCDBA (SQL 2000)

--Anything is possible. All it takes is a little research. (Me)

[noevil]
(Not quite so old any more.)
 
CatAdmin,

1. By concurrent, do you mean how current does the data need to be? They need to both have the same up-to-the-moment data.

2. Both will be seen by users. Just two different groups of users.
One group only cares about data from "this moment" to upto a week or so.
The other group needs data from "this moment" to upto a year old.

Can't make ANY changes to the database/table structure - it's proprietary 3rd party.

I can't get rid of the whole table at one time. This is kind of a first in - first out type of thing. 1 June will be removed when 1 July is in the database.

One thing.....this database ONLY gets INSERTS. It is never UPDATED. And the only time there are DELETES is when I need to purge/remove old data.

MrDenny,

If I follow you, the long term database would not have the data that is in the shortterm database. Right now that can't be the solution. The users of the long term would NOT have access to the short term database. ALso, the 'push' of current data can't happen at hte end of the day, it has to happen frequently to keep the long term database up-to-date.

Both,

Thanks for your input and any further suggestions on this. Because of the requirements, I might have to find a way to 'split' the data feed and have the data sent to both servers at the same time. But I'm still trying to find out if it's possible to do via SQL Server.

-SQLBill

Posting advice: FAQ481-4875
 
That's not a problem.

I was thinking that this all had to be handled by a single front end, that needed to be able to query both database servers.

What you'll be needing will definetly be a hand written system.

I'm assuming that the data would be inserted into the shortterm server then pushed to the long term server?

Based on that, I'd recommend putting triggers on the tables on the short term server that need to be pushed over that insert any data that's put into the shortterm server into the long term server.

If you wanted to make the system a little more redundant you could have the triggers put the data into a holding table on the short term server, then have a job move that data to the long term server every minute (or less if you put the job into a never ending loop).

Denny
MCSA (2003) / MCDBA (SQL 2000)

--Anything is possible. All it takes is a little research. (Me)

[noevil]
(Not quite so old any more.)
 
MrDenny,

You and CatAdmin are showing me that I'm not going to be able to do this via SQL Server. Oh well...

MrDenny said:
I'm assuming that the data would be inserted into the shortterm server then pushed to the long term server?
That's what I was hoping to do.

MrDenny said:
Based on that, I'd recommend putting triggers on the tables on the short term server that need to be pushed over that insert any data that's put into the shortterm server into the long term server.
Can't do that. Putting triggers on the tables violates the third-party software license and means the application may no longer be supported.

Both of you have given me lots of good information. Now I can go to my management and tell them we are going to have to take the incoming data and send it to the two different servers at the same time.

Thank you both for your input.

-SQLBill




Posting advice: FAQ481-4875
 
It kind of sounds like that those who need the historical data may be using it for reporting or analytics. If this is the dase don't replicate your DB but rather build a Data Warehouse which is optimized for reporting. A well designed DW using a star schema and Cubes being used for the reporting can serve years worth of data in mere seconds. You could build an ETL Process to Extract the current data from your existing system(s) and then load it into the DW. DWs can easily hold years worth of data if properly designed.

Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
MDXer,

Both databases will be used for analysis and reporting. The long term will only hold 12 months of data.

Data warehousing is out due to the cost.

-SQLBill

Posting advice: FAQ481-4875
 
SQLBill,

Sorry I wasn't able to reply sooner. Suddenly got swapped at work.

I'm also sorry we weren't able to help more. It's a shame your third party software is so strict. Have you asked them if they have an option for situations like this? They may or may not have run into this problem before.



Catadmin - MCDBA, MCSA
"If a person is Microsoft Certified, does that mean that Microsoft pays the bills for the funny white jackets that tie in the back???
 
The vendor, us and other companies all use the vendors software to do remote monitoring. (Consider ADT - the home security company. Let's say they sold and allowed other companies to use their programs to also monitor homes. That's not what we do, but it will give you an idea.)

The vendor and other companies all maintain only 30 days of data - period. They don't maintain archives. We are the only ones maintaining such a large amount of data. So our situation is unique and the vendor HAS told my upper management "Why are you keeping data that long?".

So the vendor doesn't have a solution for us as keeping that much data isn't in their business plan for their program (even though it's capable of it).

-SQLBill

Posting advice: FAQ481-4875
 
Why not just maintain one db copy? Set up sepatate Query and programming to display data for both groups?
 
SQLBill,

Just ran into an article on TechRepublic that may or may not help you. It doesn't involve replication, but it *does* involve creating a FIFO stack.

Here's the link:


Does this give you any ideas?

My thought is there might be a way to combine replication with the FIFO stack. Not sure how well it would work, though.



Catadmin - MCDBA, MCSA
"If a person is Microsoft Certified, does that mean that Microsoft pays the bills for the funny white jackets that tie in the back???
 
CatAdmin,

I'll take a look at that link as work permits. Thanks.

PeteAmsterdam,

The problem is with the amount of data. Right now I can not keep more than 6 months of data without severely impacting performance. Even 6 months of data is impacting performance. For me, 6 months of data is over 200 GBs (currently 214 GBs of data).

I have a waiver so that I don't need to keep a years worth of data for now, but I'm supposed to find a solutiona and that's what I'm working towards.

-SQLBill

Posting advice: FAQ481-4875
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top