INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

SSIS Insert New Data Only to Backup Tables

SSIS Insert New Data Only to Backup Tables

(OP)
Hello,

I am new to using SSIS and have been tasked with creating a package to be a backup for some tables. Basically I have a total of 6 tables which have data in them that I need to create copies of in another database. I have created the 6 identical tables in the other database but was curious if someone could explain what SSIS tools I would need to use to create a package that would allow me to copy the NEWLY inserted rows from the original 6 tables to the newly created 6 tables for my backup in the other database. I have stumbled my way through having SSIS copy all the rows over but that will become cumbersome in the future as 220,000 rows will quickly turn into millions within the next few years. So I don't want to copy that many rows everyday when most of the data is already going to be inside of the tables already.

The original 6 tables are updated constantly all day and I just need to be able to run an SSIS package once per day to move that data into the backup tables, but again I just want to try and move only the new rows and not copy the entire table over again.

Any help would be appreciated.

I am using SSIS 2012.

Best,

Kevin C

RE: SSIS Insert New Data Only to Backup Tables

several factors get into place on a situation like this.

first question - will the backup tables be on a database on the same instance, and if so should have them on a different instance be an option to this design?

If the answer to the first question is no to the first point or yes to the second point then you have 2 options
1 - CDC or CT (data capture or change tracking) or custom trigger (this last one required a unique way of identifying rows)
2 - copy full tables

if the answer to the first question is Yes and No respectively then you may have another option if all tables have a way of uniquely identifying the rows. This option is to join both live and archive tables on their unique keys and copy over the new records based on this criteria - with high volumes this will eventually get slower with time

and finally if all tables have a "insertedOnDate" column that only gets updated on then this can also be used as a way to limit data transfer only to new rows.

And note that nothing of this is SSIS - it is all data selection.

All options can be done with a SSIS dataflow task - the join option can also be done with a straight insert into the archive table.
And for the purists I am not considering a linked server as an option for this as high volumes would make it quite slow

Regards

Frederico Fonseca
SysSoft Integrated Ltd
www.syssoft-int.com

FAQ219-2884: How Do I Get Great Answers To my Tek-Tips Questions?
FAQ181-2886: How can I maximize my chances of getting an answer?

RE: SSIS Insert New Data Only to Backup Tables

(OP)
I did read up on CDC but I don't think my boss will like that, just a hunch. One concern I do have, about copying the full table every time is that the program that is constantly populating the original 6 tables has had the data go missing whenever the program had an update. So that is why I am creating the backup. So if I am copying the tables over that won't work because if the table has missing data and I copy it would mean that my 6 backup tables will have missing data. So it sounds like I would just need to make an argument for the CDC, which I can just add the changes that are collected in the CDC to my backup table right? I know how it does keep a record of inserts, updates, and deletes but nothing I read on it talks about adding the inserts,updates, and deletes to the table. Sounded more like an auditing tool to see when something was altered and how it was altered.

Thanks again.

RE: SSIS Insert New Data Only to Backup Tables

CDC will keep for a period of time a full copy of each record inserted, deleted and updated (before and after image).
so when using CDC one copies the data directly from the CDC version of the table. This is the one to use if you need the original record inserted into the table before any update is done to that record.

CT will only keep a list of the key fields and metadata about which fields changed within the record.

see https://technet.microsoft.com/en-us/library/cc2805... for details.

It can be used as an auditing tool but it is commonly used for incremental updates of other tables which is precisely what you are after.


the CDC tables are created on the system tables within your database, with schema CDC.
sample would be
dbo.mytable
would have a cdc table created as
cdc.dbo_mytable_ct
with some extra columns used by CDC

it does require the enterprise or developer editions.
see https://technet.microsoft.com/en-us/library/bb5224...

using CDC you would normally have a configuration table that contains the last LSN that was extracted by each table.
you then extract the data from CDC using one of their table functions supplying the stored LSN as the start point of extraction, and the current LSN as the end point.

quite easy to use really



Regards

Frederico Fonseca
SysSoft Integrated Ltd
www.syssoft-int.com

FAQ219-2884: How Do I Get Great Answers To my Tek-Tips Questions?
FAQ181-2886: How can I maximize my chances of getting an answer?

RE: SSIS Insert New Data Only to Backup Tables

Hi Kevin

Try this
Lets call original data the Source, and the Backup will be the Destination

I believe that your Source have Primary keys. After creating the tables on the destination, open you SSIS project.

Pull in a Data Flow Task per each table, and have your OLE DB Source and Destination. In the OLE DB Source, have a query that references both your source and destination tables

e.g.

CODE --> Server

SELECT Column
     FROM SOURCE.dbo.TABLEA A
     
     WHERE  A.PRIMARY_KEY NOT IN (SELECT B.PRIMARY_KEY FROM DESTINATION.dbo.TABLEA B WHERE B.PRIMARY_KEY IS NOT NULL) 

This will return only the new data from your source table.

Then Join OLE DB Source to you OLE DB Destination and make sure your columns are mapped accordingly.

Remember that the above will only work if you do not want the updated records from your source to be be updated in the destination tables.
If you want data to be updated, you need to use a Slow Changing Dimension


Regards

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close