
performance problem when counting records when count is more than 10 million

(OP)
I have a performance problem.

I have two tables: one is master data, the other is transaction data, and the transaction table has more than 10 million records.

I need to get the count of transaction records whose key field is in the master table. Please suggest the best solution. I have tried joins and IN ... NOT IN conditions. If you have any way out, let me know.

RE: performance problem when counting records when count is more than 10 million

The solution to this problem can be summarised in one word: indexes. As a minimum, you should have an index tag on the transaction table's foreign key, that is, the field in the transaction table that points back to the master table. This is usually something like a customer ID (if the transactions are invoices or orders), or something similar. I assume you already have indexes on the two tables' primary keys.

If you don't already have such a tag in place, you will notice a huge difference in performance when you add it. But you can go a bit further. If at all possible, make the key fields (the foreign key mentioned above and all primary keys) as short as possible. An integer data type is much more efficient than a long character string or a multi-digit number, for example.
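
Creating such tags might look like this (a minimal sketch with hypothetical table and field names; INDEX ON needs the table opened exclusively):

CODE -->

USE Master EXCLUSIVE
INDEX ON MasterID TAG MasterID   && primary key of the master table

USE Transactions EXCLUSIVE
INDEX ON MasterID TAG MasterID   && foreign key pointing back to Master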

Mike



__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: performance problem when counting records when count is more than 10 million

Quote:

I need to get the count of transaction records whose key field is in the master table.

Sorry, I just noticed that part of your question.

You will need something like this:

CODE -->

SELECT COUNT(TransID) FROM Transactions ;
  WHERE MasterID IN (SELECT MasterID FROM Master) 

where MasterID is the foreign key in Transactions (e.g. CustomerID), matching the primary key of the Master table, and TransID is the transaction table's own key.

But you will still need index tags (on the two IDs) to get acceptable performance.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: performance problem when counting records when count is more than 10 million

(OP)
My query is: SELECT COUNT(*) FROM t1 WHERE t1.a + t1.b IN (SELECT t2.a FROM t2)

where t1 is the transactional data, t2 is the master, t1 has indexes on the a and b fields, and t2 has an index on the a field.

My only problem is that I need the count of t1 data that matches t2.

My query takes 12 seconds on 6 lakh (600,000) records, and I am going to execute it on 12 million records; think of the performance.

RE: performance problem when counting records when count is more than 10 million

The answer is - without a doubt - indexes. If you post your table structure and show exactly what you want out of the child table, that would be the most efficient way to come up with an actual solution.

Larry - TPG

RE: performance problem when counting records when count is more than 10 million

Are t1.a and t1.b character fields? If so, it might be more efficient to have a single index on a + b rather than separate indexes on a and b. It might be worth experimenting.

Also, make sure that a and b are as small as possible. The time taken to retrieve records using an index is proportional to the length of the index key. If you can reduce the size of the two character fields, you will see an improvement.
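
The compound tag could be created like this (a sketch; run it with the transaction table opened exclusively):

CODE -->

USE t1 EXCLUSIVE
INDEX ON a + b TAG ab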

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: performance problem when counting records when count is more than 10 million

(OP)
Mike, both are character fields.

Earlier I used to get the record count using RECCOUNT() minus the deleted records count, which was very fast. Now I have some orphan records in the transactional data which are creating a problem, and I need to join the table with the master data.

RE: performance problem when counting records when count is more than 10 million

Well, what I said about having an index on a + b still stands (as does making the fields as short as possible).

But what exactly is your goal? Is it to find (and perhaps delete?) the orphaned records? If so, it would be much more efficient to find records from t1 that are NOT in t2.

Or are you trying to perform some query on the transaction data, but you are worried that the orphaned records will muck up the results in some way? If so, a normal INNER join should solve that: selecting from t2 inner joined to t1 will exclude the orphans. A sketch of both follows below.
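
For example (a sketch based on the table and field names already used in this thread):

CODE -->

* Count the orphans: t1 rows whose combined key has no match in t2
SELECT COUNT(*) FROM t1 ;
   WHERE t1.a + t1.b NOT IN (SELECT t2.a FROM t2)

* Count only the matching rows, excluding orphans via an inner join
SELECT COUNT(*) FROM t1 ;
   INNER JOIN t2 ON t2.a = t1.a + t1.b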

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: performance problem when counting records when count is more than 10 million

(OP)
I need the correct count of the data in the transactional table that matches the data in the master table. I hope you understand.

The orphan records are not fully orphaned: they have a value in the key field, but that value is not in the master table.
In short, I need RECCOUNT() of the transactional data that matches the master table, with good performance.

I tried creating indexes. I tried joins. But I failed to get good performance.

RE: performance problem when counting records when count is more than 10 million

Are you familiar with Rushmore? Are you using it? It is hard to be more specific without seeing your structure and knowing exactly what you want, but Rushmore is the answer to your problem.

RE: performance problem when counting records when count is more than 10 million

Hi Srinuvelati,

You keep hammering on the orphaned data, but Mike understood that; it has nothing to do with your performance problem. An index on a+b of the t1 table makes VFP use that index to look up whether that compound key exists in the t2 master table.

Separate indexes on a and b don't really help speed up the lookup process. You need an index on a+b, like here:

CODE

Create Cursor t1 (a char(4), b char(4))
Index on a+b tag ab

Create Cursor t2 (a char(8))

* group 1
Insert into t2 values ('aaaabbbb')
Insert into t1 values ('aaaa','bbbb')
Insert into t1 values ('aaaa','bbbb')

* orphaned case
Insert into t1 values ('cccc','dddd')

* group 2
Insert into t2 values ('eeeeffff')
Insert into t1 values ('eeee','ffff')

Select count(*) from t1 inner join t2 on t2.a = t1.a+t1.b
Select t1.a, t1.b, count(*) from t1 inner join t2 on t2.a = t1.a+t1.b group by t1.a, t1.b 

Rushmore optimization reporting shows that the index tag ab is used; separate indexes are not used for join optimization.
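
To see this for yourself, you can switch on VFP's Rushmore optimization report with SYS(3054) before running the query:

CODE

SYS(3054, 11)   && report full Rushmore info, including join optimization
Select count(*) from t1 inner join t2 on t2.a = t1.a+t1.b
SYS(3054, 0)    && switch the report off again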

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: performance problem when counting records when count is more than 10 million

And just to be clear, this will be a speed-up compared with the case without the necessary index, but it is a join optimization, no wonder. It does not turn the query into a single index seek; if you expect this to give you a record count as fast as RECCOUNT(), you're expecting too much.

If you need this count very often and fast, the usual solution is to maintain your own record counters: a metadata table you update with each insert, update, and delete. Yes, this has the risk of becoming wrong if not all code also maintains this metadata table, so you would perhaps put the maintenance into insert/update/delete triggers.
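
Such a counter could look like this (a rough sketch with hypothetical names; the procedure would live in the DBC's stored procedures and be wired up via CREATE TRIGGER, and the tables must belong to a database for triggers to work):

CODE

* Metadata table holding the maintained count
CREATE TABLE counters (tablename C(30), reccnt I)
INSERT INTO counters VALUES ('t1', 0)

* Attach to the transaction table, e.g.:
* CREATE TRIGGER ON t1 FOR INSERT AS UpdCounter('t1', 1)
* CREATE TRIGGER ON t1 FOR DELETE AS UpdCounter('t1', -1)

PROCEDURE UpdCounter(tcTable, tnDelta)
   UPDATE counters SET reccnt = reccnt + m.tnDelta ;
      WHERE ALLTRIM(tablename) == m.tcTable
   RETURN .T.
ENDPROC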

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: performance problem when counting records when count is more than 10 million

Hi Srinuvelati,

Believe it or not, the best response to your question will come when you reply to our question:

If you post your table structure and show exactly what you want out of the child table, that would be the most efficient way to come up with an actual solution.

Why do you simply ignore it?

Koen

RE: performance problem when counting records when count is more than 10 million

In addition to posting your table structure (including all the index expressions), it would be helpful to know whether this is a once-off operation (for example, to get rid of orphaned transactions that are present by accident) or something you need to do on a regular basis.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: performance problem when counting records when count is more than 10 million

Hi Srinuvelati,

Mike is, of course, quite correct. Please do not forget to answer that. Also, I would like to know what the current block size, SYS(2012), is.

Koen

RE: performance problem when counting records when count is more than 10 million

(OP)
Olaf,

The SQL below takes 12 seconds; the SQLs you provided take more time.

SELECT COUNT(*) FROM t1 WHERE t1.a + t1.b IN (SELECT a FROM t1) INTO CURSOR t2

t1 has 35 fields and the index fields' combined length is 11 characters; t2 is a cursor with only one field.

RE: performance problem when counting records when count is more than 10 million

Srinuvelati,

have you read anything we said?
My query is optimized by an index on a+b. Did you create that? Your measurement is meaningless until you create the index that the query needs to run fastest.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: performance problem when counting records when count is more than 10 million

(OP)
I have already mentioned that I have an index on the a+b fields of the transactional data and on the a field of the master data.

RE: performance problem when counting records when count is more than 10 million

You said you have indexes on a and b in the transactional data, not that you have an index on a+b.

CODE -->

Index on a+b tag ab 

So you have such an index?

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: performance problem when counting records when count is more than 10 million

(OP)
Yes, I have that index. There is a good amount of data in the transactional table: 6 lakh (600,000) records.

RE: performance problem when counting records when count is more than 10 million

Srinuvelati, it would also be helpful if you could address the points I raised in my post above, timestamped 11 Dec 19 15:03.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: performance problem when counting records when count is more than 10 million

I am familiar with lakh; it's simply 1,00,000, with the comma placed differently than the Western grouping of every three digits. You already said it's 10 million records.

The quantity of data makes things slower. The time necessary rises in different ways: sometimes exponentially, sometimes quadratically, sometimes only logarithmically. The best you can expect is linear performance when you want to count everything, since this count isn't maintained in the DBF header the way RECCOUNT() is.

Anyway, the number of records has no bearing on which indexes are used for query optimization; that's a qualitative issue, and you need index tags on the expressions used to join or filter.

I think you won't get much faster than what you already have anyway.
I tried 10 million records, both with separate a and b fields and with the normal case of a foreign key that's identical to the main key.

It takes the same time for me, about 10 seconds.
You'll never get this down to less than 1 second. This is always a full scan through the data, no matter whether DBF or CDX.

Edit: Dropping all indexes, I get a duration of 15 seconds, so the indexing helps to speed up the join; but since a full count isn't a selective query, it doesn't profit as much.

If you deleted detail data together with main data, you could make use of VFP9's feature to better optimize queries regarding DELETED() and would not need to join the main data for the count. It obviously depends on your needs for the detail data; as you tell it, you keep orphaned detail data. I don't know your reasons for that; you haven't reacted much to the others' questions either.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: performance problem when counting records when count is more than 10 million

Quote (Olaf)

I am familiar with lakh; it's simply 1,00,000, with the comma placed differently than the Western grouping of every three digits

I am too. But I thought it was 10^5 rather than 10^6 (I might be wrong). But it's true that the placement of commas is non-standard.

I first heard the term "lakh" in the 1960s, in Lionel Davidson's novel The Rose of Tibet, which involved a treasure worth 500 lakh rupees or something similar.

&& End of useless information

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: performance problem when counting records when count is more than 10 million

(OP)
I also feel that the performance of this cannot be increased. Is there any other way of finding how many orphans there are, by using a relation or something else? You are reporting 10 seconds for 10 million records; for me it is taking 10 seconds for 1 million records.

RE: performance problem when counting records when count is more than 10 million

Quote:

it is taking 10 seconds for 1 million records

But what we are saying is that the increase in time is not necessarily linear. Just because it takes 10 seconds for one million, that doesn't mean it will take 100 seconds for 10 million.

But even if it does take 100 seconds, that brings us back to my earlier question (which you still haven't answered): Is this a one-off requirement, or something that you will be doing regularly? If the former ... well, you could have finished the job by now. If the latter, what is the cost of keeping the system busy for a couple of minutes?

And another question: Why are these orphans getting into the transaction file in the first place? This is presumably something you want to avoid (or is it?). If that's right, perhaps you should be devoting your attention to how to prevent them.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

RE: performance problem when counting records when count is more than 10 million

Well, my time may be better because of better hardware. I also don't have your table, only sample data with just the fields mentioned and nothing more, and my CDX has no further tags, etc.

That all plays into the overall time necessary, so any values are only valid for a certain system, not in general. It seems our systems differ by a factor of 10 in performance, in HDD and/or RAM. Or, despite saying so, you don't have the indexes necessary to speed things up a bit more, though you think you do. Anyway, you don't get a factor of 10 out of an index, per my experiments.

If you deleted the orphaned detail transactions, you wouldn't need to join; you could just count.

And Mike, yes, it is 10^5, hence the strange comma positions. But in the initial post the talk was about 10 million anyway.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: performance problem when counting records when count is more than 10 million

There are still possibilities, because:

1. You could try to reindex the tables; recreating the indexes may remove bloat and make the index tags cleaner and faster.

2. You could set up the foreground/background memory with SYS(3050) (see the sketch after this list).
The typical Rushmore bitmap will have 10 million bits, as your table is that long and every record is represented by 1 bit: hits with 1, misses with 0, but still 1 bit per record. That's 10 Mbit = 1.25 MB. Not very much, but that's just for this query. The CDX I created, with just the one necessary index tag, is about 50 MB. You don't need to keep all of it in memory, but since you have no filter conditions other than the join condition, you'll go through all of it, and it likely helps to reserve more memory; why not 100 MB or 256 MB?

3. You could do it like VFP does, and better: maintain a counter. When you use RECCOUNT(), it actually just reads a count from the DBF header and relies on this being the correct record count. In fact, you get errors if that doesn't match the file size as of HEADER()+RECCOUNT()*RECSIZE() (+2 IIRC, but you get the idea).

RECCOUNT() counts deleted rows too; that's its weakness. But you can do better and, as I already suggested, use triggers to maintain the actual count of non-orphaned records, even if you don't cascade deletes. When a record of the main table is deleted, you can subtract as many records of the other table as refer to that record.
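
The memory tuning from point 2 could look like this (a small sketch; sizes are in bytes, and 256 MB is just an example value):

CODE

lnOldMem = Val(Sys(3050, 1))      && current foreground buffer memory
=Sys(3050, 1, 256*1024*1024)      && reserve 256 MB for this session
Select count(*) from t1 inner join t2 on t2.a = t1.a+t1.b
=Sys(3050, 1, lnOldMem)           && restore the previous setting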

Bye, Olaf.


Olaf Doschke Software Engineering
https://www.doschke.name

RE: performance problem when counting records when count is more than 10 million

(OP)
Mike,

I am transferring the FoxPro data to Oracle. I am facing this problem in the initial load, and I believe it will appear in the incremental loads too; I mean, it's not a one-time job. I have no idea where these orphans are getting added, as this application is 25 years old. I want to avoid these orphans every time I transfer. At the same time, I know the immediate problem would be resolved by deleting these records; however, I want to fix this type of issue for future transfers.

Olaf, I will check the SQL with increased memory.

RE: performance problem when counting records when count is more than 10 million

I don't know about Oracle, but with SQL Server you'd go through CSV with a format file, using the BCP tool.

In an incremental load you also won't have that count problem, will you? The biggest data migration I did going directly from DBF into MSSQL went in 10,000-record chunks using prepared SQL statements, partly because that doesn't need as much RAM and partly because SQL Server only handles a certain number of locks in parallel. You find a natural chunk size that works faster than going for all data at once or for a single row at a time. A rough sketch of the idea follows below.

Finding chunks does not necessarily mean you always have exactly N records in them; I queried certain ID ranges, so typically N was just the upper bound.
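
Such a chunked transfer might look roughly like this (a hedged sketch with a hypothetical DSN, target table, and numeric id field to define the ranges; the thread's t1 does not actually have such a field):

CODE

lnConn = SQLCONNECT("TargetDSN", "user", "password")   && hypothetical DSN
lcA = ""
lcB = ""
* Prepare one parameterized INSERT; SQLEXEC() with only the handle
* re-executes the prepared statement
SQLPREPARE(lnConn, "INSERT INTO tgt (a, b) VALUES (?m.lcA, ?m.lcB)")

SELECT MAX(id) FROM t1 INTO ARRAY laMax   && hypothetical id field
lnFrom = 0
DO WHILE lnFrom < laMax(1)
   SELECT a, b FROM t1 ;
      WHERE id > lnFrom AND id <= lnFrom + 10000 INTO CURSOR curChunk
   SELECT curChunk
   SCAN
      lcA = curChunk.a
      lcB = curChunk.b
      SQLEXEC(lnConn)   && send one row via the prepared statement
   ENDSCAN
   lnFrom = lnFrom + 10000
ENDDO
SQLDISCONNECT(lnConn)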

And last but not least: Why not delete the rows you want to purge during the migration process? If they are really orphaned and also of no use in the current frontend, then get rid of them, and also PACK.

Even if you get new orphans, because you have defined neither cascading deletions nor restrictions on master data deletion in referential integrity code, getting rid of data you don't need, even in a side process not integrated into the main application, can help.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name

RE: performance problem when counting records when count is more than 10 million

And just for the sake of fun: once you do delete the orphaned rows, INDEX ON NOT DELETED() TAG deleted BINARY.
The index size now is less than 1.5 MB.

Then SELECT COUNT(*) FROM t1, as you don't need to join t2 anymore. VFP9 Rushmore uses binary indexes on DELETED(), besides also supporting filtered FOR DELETED() indexes and more (see the VFP9 help topic "Indexes Based on Deleted Records"). It breaks down to this on my computer:

Time to count ~10 million records with 5% deleted rows:

Without index: 2.4 seconds
With index: 0.012 seconds
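
Put together, the approach looks like this (a sketch following the commands named above; it assumes exclusive access to t1 and that the orphans really can be discarded):

CODE

Use t1 Exclusive
Delete From t1 Where t1.a + t1.b Not In (Select t2.a From t2)   && remove orphans
Index On Not Deleted() Tag deleted Binary   && VFP9 binary index
Set Deleted On
Select Count(*) From t1    && now optimized via the binary tag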

So that's the most promising way of fast counting in the single table. And even if you want to refrain from modifying your production data, for the migration you could, and perhaps even should, use a data copy, i.e. restore a backup, to work without being influenced by the current application usage of the data. On the copy you can even work with exclusive access, have your migration EXE on the file server, etc.

And then you can not only delete and not only index on DELETED(): you can also PACK, which makes it even better, because RECCOUNT() is then exactly the count.

Since you brought up the migration topic, I'm no longer sure whether this is really just about the count, or about any handling of data without high selectivity, i.e. boiling a result down to just a few records. Most importantly, that's not to be confused with having just one aggregate value: that is a small result set, but to get it you need to go through all the data. High selectivity means only addressing a small number of rows.

Well, from my migration experience I can only say that partitioning the data helps a lot, but I don't see why that would need such a join optimization. Aside from some small tables, when you go for the massive tables and migrate chunks of data, the most natural way in nested or hierarchically structured relationships is going from one head record to all its detail data, and not vice versa.

Bye, Olaf.

Olaf Doschke Software Engineering
https://www.doschke.name
