Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to split my tables, and when to use referential integrity? 2

Status
Not open for further replies.

Tekime

Programmer
Jun 5, 2003
30
US
Warning, this is a bit of a novel! Please bear with me :)

Hello,

This is kind of a two part question. I'll give a bit of history so you understand the project.

I'm working on an Access database and set of forms/reports for processing and storing chargebacks for my company. The processes for chargebacks are as follows:

1. We receive a chargeback notification via post. Notifications are sent to data entry (DE).

2. DE will enter the basic notification info (case #, transaction #, merchant account #, reason for chargeback, date of receipt and due date)

3. Notifications are sent to me, I will review the case and enter response info (internal/external reponse to chargeback and date of response)

4. Notifications responses are printed, and sent to accounting for final review. Accounting will review for final mistakes and ship them via Airborne Express and enter final response info (date of airborne, airborne tracking #).

5. We receive a response from the merchant account provider via post, this info is sent to DE.

6. DE enters the MAP final response info (response ID and date)

Right now I have a single table - chargebacks - that stores most of this info. There are several other reference tables for transactions, orders, response types, etc. There will be three different forms - one for data entry, one for myself (response/reporting) and one for Accounting to record the shipping info.

It seems like a little bit too much to split the chargebacks table in three for each of these processes, yet I would like to use referential integrity restraints on these tables. But it's impossible to enter part of the table without some of the required fields.

What would you do? Split the table into two tables? Three tables? At this point I think I might split the table into two, one for DE, one for myself, and have no restraints on the Airborne info. I'm not totally sure thouhg, this is my first big Access project and I really don't want to screw up and regret somethign later on.

Here is a pic of the relationships diagram:



Hopefully this will help you understand the DB a little better. Your help would be greatly appreciated, I know this is a lot to digest... :)

Thank you

 
There are two basic reasons I can think of to use relational integrity.
1. To cascade updates and deletes without having to write code to propagate the operation to child tabes, or alternatively to block them without having to write code to test for references in child tables. Since you're using surrogate keys (autonumbers) for everything (good idea!), the update part probably doesn't apply. You didn't say whether deletions are allowed in the application, but if they aren't then this reason becomes moot.
2. To catch programming errors which would leave the database in a paradoxical state. Once an application is finished and working, RI is normally superfluous, at least until maintenance changes the logic.

That said, I prefer to keep RI active anyway. In the context of an Access application, where you're not usually doing high volumes of transactions and mostly you're in interactive mode, the overhead is negligible. Having the guarantee of integrity is well worth it.

But I think there's another, non-RI reason to consider splitting up your table. With separate tables, you and the accounting department won't get in each other's way when you're trying to update the same chargeback. You can use pessimistic locking, which avoids the nasty surprises you can have with optimistic locking, and the coding is much easier.

The problem with optimistic locking ("No Locks" in the form's Record Locking property) is that, if two users go about updating the same record simultaneously, the second one will get an error message that allows them to review the changes the first user made, and then offers to overwrite the first user's changes, leaving no trace they were ever entered. The alternative is to cancel what they were doing and start over. Users with staunch integrity will be annoyed, and users with less integrity will take the easy way out and overwrite. The only way to avoid this temptation is to write a whole bunch of code to figure out what each user changed independently, and blend their changes if possible. It's a no-win situation.

With pessimistic locking ("Edited Record" in the Record Locking property) the problem doesn't come up, because the second user will be unable to update the record until the first user is done with it. And if the data to be changed is in different tables, even that delay doesn't happen--so much the better!

Rick Sprague
Want the best answers? See faq181-2886
To write a program from scratch, first create the universe. - Paraphrased from Albert Einstein
 
Rick, I truly appreciate the thoughtful reply :)

I didn't mention deletes because I'm not completely sure yet. I think that I will allow DE to delete records, and I will definitely want that ability. While I don't see us removing useful records, I do see the occasional mistake (double-entries maybe?).

I split my table tbl_Chargebacks into two tables: tbl_Chargebacks and tbl_Rebuttals (for the notifications we receive and the rebuttals we prepare respectively). I think this is more in line with the application processes in the first place. I will also create a third table with three values - rebuttalId, airborneNo, and dateShipped. Then Accounting can record outgoing shipping information, and I can use pessimistic locking without worrying about fighting over records.

I had done a little research on locking but your explanation was really a big help. Actually, it was a huge help. I think I was kind of fumbling along the same path you are recommending, but your post really clarified the bigger picture.

Regarding RI, I don't like the idea of any additional overhead unless it's necessary. The tbl_Orders table will most likely be populated with ~100,000 records (and up to 350,000) right off, and will continue to grow after that. For every order in the orders table, there can be one, two, four or ten transactions. As you can see the transactions table is going to grow exponentially in relation to orders.

Every time DE enters a new chargeback they enter a ccNumber into a text field, which triggers some VBA to set the rowsource of another combo box to a list of transactions made on that ccNumber. There are more than a few queries like this in the application, and I'm afraid of too much overhead. Every new chargeback or rebuttal means another query on a 100,000+ record table, which seems liek quite a bit to me. Without practical experience in Access with large record sets, I'm not sure what the impact of RI would be or even if it would matter when I am not performing cascaded deletes, etc.

Thanks again for your help Rick, you really made my morning seeing this post!

 
Whether to use RI with volumes this size is a judgment call. Here are some facts about the impact of RI:

Impact On Storage (disk space):

RI-enforced relationships require unique indexes on the related field(s) in the one-side (parent) table. In most cases, and specifically, in your case, the parent field is the primary key, so you would have a unique index anyway. In this situation, RI requires no extra storage space. Zero. Nada.

RI processing benefits from, but does not require, a non-unique index on the related field(s) in the many-side (child) table. In your case, these are Long fields, and Access will default to indexing them anyway. If you have tailored your indexes, you may have deleted some of these indexes, but you should really add them back if they support RI, so there might be some additional index space required here.

Impact On Processing Time/Network Traffic:

When a row is inserted into a table that is the child table in one or more RI-enforced relationships, the parent table's index for each such relationship has to be searched for a matching key. In most cases two page fetches are sufficient for each relationship, but it could be as many as four. It's also possible none will occur, because the index pages are cached locally and have not aged too much. In addition, I believe there will be lockfile accesses to lock the pages being read.

When a row is deleted from a table that is the parent in one or more RI-enforced relationships, the child table(s) have to be searched for foreign keys referring to the parent table row. This is again usually a two- to four-page search if the foreign keys are indexed; otherwise, a table scan must be done and the entire table must be read. Note: If rows are found, and you allow deletes to cascade, there will be many more page accesses to delete the rows and update the indexes of each child table, plus lockfile accesses for each page updated. Although this sounds intimidating, keep in mind that you would still have to do this in code if you don't let Access do it automatically. (Well, you could allow your database to lose integrity by leaving the child table rows orphaned, but I assume you don't want to do that.)

When a field which is part of an RI-enforced relationship is updated in a parent table, the child table(s) have to be searched just as they were for deletes. Also, if you cascade updates, the child table(s) and their indexes have to be updated. Again, if you don't let Access do this automatically, you'll probably have to do it in code yourself anyway.

When a field which is part of an Ri-enforced relationship is updated in a child table, the parent table has to be searched for the new value of the foreign key. Like insertions, this is an index scan requiring two to four pages. Note, however, that when using surrogate keys, it is unusual to allow updates to foreign keys in child tables, so this situation doesn't occur very often.

In summary, there are two cases where RI may cost you something relative to doing RI checking in your own code:
(1) During insertions, if you already know that the foreign key is valid, leaving RI unenforced will save you an index scan on the parent table.
(2) During updates, if you allow foreign key changes, and if you already know that the updated foreign key is valid, leaving RI unenforced will save you an index scan on the parent table.

Considering the effort of writing your own integrity checking and/or cascading code, and the potential for bugs to creep in and ruin your database's integrity, it's pretty hard to make a case for turning RI off. Furthermore, it makes good sense to start with RI enforced, and turn it off later if necessary, because it's much easier to stop applying constraints to conformant data than it is to apply constraints to nonconformant data.

My personal approach is to always leave RI implemented as fully as possible, unless I can prove that the costs are too much to bear. In other words, I let the burden of proof be born by the argument to remove RI, rather than to retain it.

One more thing: RI has absolutely no overhead for SELECT queries. Your combo box filling VBA code should be unaffected by RI.

And one caveat: The number of page fetches required to search an index, given a complete key, normally depends mainly on the number of levels in the index. My claims of two to four index page fetches are based on statistics from mainframe DB2 databases, where indexes with more than four levels are uncommon. On the one hand, comparing DB2's page size of 4K to Jet's page size of 1K suggests that Jet would need more index levels than DB2, but on the other hand, DB2 tables often have row counts in the millions or tens of millions. Thus, there is some uncertainty about the exact number of page fetches needed to scan a Jet index. However, the number of index levels grows inverse-exponentially (the reciprocal function of exponentially), so I'd be very surprised if a Jet index scan required more than five page fetches at the max.

Rick Sprague
Want the best answers? See faq181-2886
To write a program from scratch, first create the universe. - Paraphrased from Albert Einstein
 
Wow, I've learned a lot these few days. :) I have decided to continue using RI in this and most likely all projects down the road without having a good reason not to.

It makes sense that SELECT queries wouldn't be affected by RI but I wasn't sure about it. Considering the majority of regular-use queries on the large tables will be SELECTs, RI shouldn't impose any additional overhead. All INSERTs on the large tables will be done in batches and imported into the DB.

So far I have been enabling RI on all relationships in this project. I haven't changed any of the indexes generated by Access either. This is a subject of some confusion for me though, and I've been pouring over various articles and documents that address indexing (in Access and generalized RDBMS indexing methodologies). I know I'm straying from my original point, but maybe you can atleast point me in a good direction or recommend some reading on the topic of index selection methodology. Right now I have all indexes created by Access on primary and foreign keys in a relationship. I intend to create indexes on the most commonly searched fields in the DB as well, such as ccNumber, caseNumber, etc. Any tips on index selection?

Thanks again for the helpful post.

 
Use more indexing if most of your activity is querying the database, less if most of your activity is updating.

Try to use indexes that contain the columns in your WHERE clauses/criteria strings/filters. Don't bother indexing fields that you won't use this way.

Indexes on columns that have a very small domain (i.e. few distinct values in the column) aren't as helpful. The best example is Sex--always either M or F, so using an index only eliminates 50% of the rows.

If you have an application function that runs unacceptably slow, figure out the SQL statement underlying it and focus on creating indexes to speed that up. There's no good way to generalize this, so come back to TT if you need help.

Rick Sprague
Want the best answers? See faq181-2886
To write a program from scratch, first create the universe. - Paraphrased from Albert Einstein
 
Oops! Forgot one thing: Always index foreign keys. But you knew that already, dintcha?

Rick Sprague
Want the best answers? See faq181-2886
To write a program from scratch, first create the universe. - Paraphrased from Albert Einstein
 
I was pretty sure I should be indexing my foreign keys!

I'm finishing up this project withing the week, your insight was very helpful :)

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top