
Report Speed Tips

lbass (Technical User)
I would like to collect your ideas on improving report speed. I have to chuckle when I see people complaining about reports that take 20 seconds. We have a statewide database with millions of records, hundreds of tables, and an old inefficient server, and if we can get our reports to run in under 15 minutes, we think we have a fast report. We also are unable to upgrade our clients, meaning we can't use the optimal native driver.

The following are some strategies for improving speed that I've picked up, but I would like to hear about other people's methods to expand upon (or correct) this list:

1-Pass all elements of the record selection criteria to the SQL statement (see the sketch of the target SQL after this list).
a- Do not use (most) formulas for record selection
b- Follow the guidelines in synapsevampire's FAQ
2-Perform grouping on server (in our case, it's faster NOT to, though)
3-Link tables on indexed fields whenever possible.
4-Select on indexed fields when possible.
5-Remove unused tables, unused formulas, unused running totals from the report.
6-Use conditional formulas instead of running totals when possible.
7-Use conditional formulas to return a desired field result or not, instead of using suppression to eliminate unwanted records.
8-If using running totals, evaluate the minimum number of times, i.e., on change of group instead of for every record, if this is an option.
9-Group on indexed fields when possible (not sure, just surmising)
10-Group on database fields instead of formulas when possible (not sure, just surmising)
11-Avoid subreports (in most cases), although they may result in a faster report if they eliminate significant row inflation.
12-Using a command as the datasource will be faster than using CR to link tables, select records, etc.
13-If using a command as datasource, develop parameters within the command.
14-When linking, choose as the lead table the one whose indexed field has a single record linking to the multiple corresponding records in the other tables (the "one" side of a one-to-many link).
15-Suppress unnecessary sections; suppress or hide details to facilitate grouping on the server.
16-Avoid nested formulas.
17-Whenever possible, limit records through selection, not suppression.
18-Use SQL expressions to convert fields to be used in record selection instead of using formula functions.
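
To make 1, 3, 4, and 18 concrete, here's a rough sketch (table names, columns, and values are all invented) of the kind of statement I want to see in Database > Show SQL Query once everything is pushed down to the server:

-- Hypothetical statement once selection, indexed joins, and conversions are all passed down
SELECT o.order_id, o.order_date, c.customer_name,
       UPPER(c.region) AS region_upper                             -- SQL expression instead of a formula (18)
FROM   orders o
       INNER JOIN customers c ON c.customer_id = o.customer_id     -- join on indexed fields (3)
WHERE  o.order_date >= TO_DATE('2007-01-01', 'YYYY-MM-DD')         -- selection passed as a WHERE clause (1, 4)
AND    c.region = 'WEST'

If there's no WHERE clause in Show SQL Query, or the converted field only shows up in a Crystal formula, that work is happening on the client after all the rows come back.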

I'm sure I've missed many obvious concepts. Please add any you can think of. Links to other sources are welcome as well.

Thanks for contributing!

-LB
 
Looks pretty comprehensive.

Some of this depends upon your software version, and optimizing tips are also relative to the database being used, so I can only respond generically, but here are a few thoughts:

2-Perform grouping on server (in our case, it's faster NOT to, though)

This isn't always true, but the idea here is to allow the database to do the grouping/aggregates if it can. Most turn this on thinking that a group by clause will magically appear in the SQL statement, which is NOT true unless the report is properly designed to allow for it, and it can hinder performance if it doesn't pass the group by clause.
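
To illustrate, with invented table and column names: when the report is designed so the server can do the work (details hidden or suppressed, groups on database fields, only summaries displayed), the statement passed should carry the aggregates and a GROUP BY, roughly:

SELECT c.region, COUNT(o.order_id) AS order_count, SUM(o.amount) AS total_amount
FROM   orders o INNER JOIN customers c ON c.customer_id = o.customer_id
WHERE  o.order_date >= TO_DATE('2007-01-01', 'YYYY-MM-DD')
GROUP BY c.region

If the option is checked but no GROUP BY ever shows up in Show SQL Query, the server is still shipping detail rows and Crystal is doing the grouping locally.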

9-Group on indexed fields when possible (not sure, just surmising)

Won't matter unless the database is doing the grouping, which means grouping on server would be on and the report properly designed to pass a group by clause. A SQL Command would allow for this.

10-Group on database fields instead of formulas when possible (not sure, just surmising)

I tested this years ago. Certainly, if the database can do your grouping, it's better to do so, but a formula field containing simply {table.field} doesn't have much overhead compared to using the database field directly if the grouping is done locally.

12-Using a command as the datasource will be faster than using CR to link tables, select records, etc.

I doubt that this is true, at least not significantly (there's minimal overhead in the Crystal GUI); more of a factor would be the SQL being passed to the database.

15-Suppress unnecessary sections; suppress or hide details to facilitate grouping on the server.

I thought that you turned off the group on server? If so, I don't think that it will pass a group by clause from the GUI.

Would be a Lovely thing were you to post back your completed list afterwards, many people will benefit from your efforts here.

-k
 
Thanks, SV. I threw in a mixture of concepts around grouping on the server. In most cases that would be faster, I think, so yes, while in some cases I turn it off, in others I don't, and then the concepts to facilitate grouping on the server apply. I haven't tested grouping on server by comparing simple vs. complex grouping, but I have left it on for very simply grouped reports.

About the command as datasource vs. the CR GUI: I've found that I can copy the Show SQL Query from a conventionally developed report into a command and reap huge increases in speed. This doesn't seem like an obvious result, since both are passing SQL to the database, but I am quite confident that this is the case. Maybe the "advantage" of having a slow system is that something that has an impact can easily be identified.
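
For what it's worth, the command ends up looking roughly like this (names changed for illustration); the {?Start Date} parameter is created inside the command editor, so its value is substituted before the statement is ever sent (point 13 in the list):

SELECT o.order_id, o.order_date, c.customer_name, o.amount
FROM   orders o
       INNER JOIN customers c ON c.customer_id = o.customer_id
WHERE  o.order_date >= {?Start Date}
ORDER BY c.customer_name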

I hope a lot of people will respond with more ideas.

-LB
 
I'll try the pasted SQL for performance; I can't understand why it would differ, though. It's still just passing SQL. What type of database and connectivity are you using?

If performance is a factor I use Stored Procedures, which generally prove much faster. I gather that isn't an option for you.

Then as opposed to a SQL Command, I tend to create reusable Views on the database, avoiding embedding this sort of thing in the report.

Not always a choice, but it reaps huge benefits for commonly used info.

-k
 
We don't have permissions to create views or stored procedures, so commands are as good as it gets.

I didn't mention our database and connectivity as I meant for this to be a general discussion, but the database is Oracle (can't determine the version), and the clients are Oracle 8.0.5 for Windows NT. Currently using a CR Oracle ODBC Driver 4.20 (DataDirect). Upgrading the clients statewide is also not an option at this point. CR version 11.0 SP2.

I'm also noticing that the same report run in 11.0 is dramatically slower than in 8.0 when there is a lot of server traffic, but both are equally fast when there is no traffic. Any ideas why this would be? I'm also having the odd experience in 11.0 of a much more complex report running dramatically faster than a scaled down version (fewer groups and fields) of the same report--with the same tables, linking, and SQL passing in both cases. Still testing why this is.

-LB
 
This may not be an option in your situation and it's not the most elegant solution under any circumstances, but we did it and the improvement was phenomenal. We were running a large number of reports against a single (financial) table that had grown to over 30 million rows. It was a total bottleneck even with new indexes and the most efficient SQL we could come up with. So we had the dba split the single table into 20 separate ones, one for each unique site code within, and we split the reports into site-specific reports scheduled to run at different times. We have not had any traffic issues since that point and the new reports run in seconds compared to hours before the change. Chuckle if you like because it is a lot of work to multiply your report count like that, but it solved every run-time and traffic issue we had. Since that change, we have identified another large table whose excess row count was due to historical data that we knew was not being used in reporting. We created another version that filtered out the archive data and saw vast improvements in the reports that hit that table as well. Like I said, these are not "elegant" solutions but the larger the tables get, the more the speed suffers under the best of conditions.
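
In rough terms (real names and criteria omitted), each new table was just a filtered copy with its own indexes, built and refreshed by the dba:

CREATE TABLE gl_detail_site01 AS
  SELECT * FROM gl_detail WHERE site_code = 'SITE01';

CREATE INDEX idx_gl_site01_date ON gl_detail_site01 (post_date);

The archive split was the same idea, with a WHERE on the fiscal year instead of the site code.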
 
Any additional thoughts on this topic?

-LB
 
Your observations are intriguing, LB, especially since I'm working in a limited environment myself right now.

The network traffic issue seems odd if all things are equal, but who knows with Crystal...

A scaled-down version of a report might not be using an index that is available to a report using more fields, due to a compound index, meaning multiple fields.

So if one field is omitted, then the index can't be used, and it's back to a full table scan or a less efficient index.

There are tools for Oracle which allow you to see precisely which indexes are being used, and many other factors; it's not for the faint of heart or newbies.
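
As a made-up example: if the only useful index is a compound one, a query that drops its leading column generally can't use it and falls back to a full scan, and EXPLAIN PLAN will show you what the optimizer actually chose:

CREATE INDEX idx_orders_site_date ON orders (site_code, order_date);

EXPLAIN PLAN FOR
  SELECT * FROM orders WHERE order_date >= TO_DATE('2007-01-01', 'YYYY-MM-DD');

-- On reasonably recent Oracle versions the plan can then be displayed with:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);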

Here's some delightfully romantic bedtime reading on the topic ;)


-k
 
Thanks, SV! The section at the end of the article looks particularly useful for writing commands. Also your idea about the compound index makes sense.

mocgp, Thanks for your comments also. This led me to test out and confirm another conclusion:

19-Set parameters to one value rather than multiple values whenever possible. For example, if you have separate sets of codes depending upon a hierarchy (state, region, city), then instead of using a single parameter that selects one or more cities that then constitute a region or a state, set up a string parameter {?location level} and use a select case statement like:

(
select {?location level}
case "State" : {table.state} = {?state}
case "Region" : {table.region} = {?region}
case "City" : {table.city} = {?city}
)

The lists of values are then set up with "None" as an option for each of the subparameters, and the prompt text instructs users like this: "If the above 'Location Level' = 'State', then select a state; otherwise select 'None'." They don't really have to select "None" for the formula to work; it just makes more sense to the user on the parameter screen. This made one of my reports significantly faster.
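
What I would expect, and will verify in Show SQL Query, is that only the chosen branch reaches the WHERE clause. With illustrative names, a "Region" run should pass something like:

-- new approach (single-value parameter, one branch of the select case):
WHERE  customers.region = 'WEST'

-- old approach (one multi-value city parameter standing in for a region):
WHERE  customers.city IN ('Springfield', 'Shelbyville', 'Capital City')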

-LB
 
Not sure how the report was originally set up, but check Database > Show SQL Query to learn what the differences are.

I see no logical reason why the above criteria would differ from an "All" or "None" scenario from a SQL standpoint, but to add to your dilemma of optimizing SQL, rearranging a WHERE clause can change performance radically.

Stating that the above will increase performance means nothing until it's qualified. I'm glad that it sped one up, but what is in Crystal means nothing; what is passed to the database is everything.

I'd bet that I could write this out another way and net the same performance.

To have some fun with this, copy and paste a query into your query execution tool of choice and rearrange it and execute it.

Also keep in mind that if you run the report one way, then another, the data may be in cache, and that alone will jump the performance.

Getting Crystal to pass SQL can be voodoo, but optimizing queries is a refined, complex issue, and I see no logical reason why your means of passing parms would have any effect on it unless the query built differed from the previous in some way.

Ignoring parts of a record selection formula in different ways nets the same SQL, which is nothing.

-k
 
The way I did it before was to use:

{table.city} = {?city}

Then in a grouping formula, I used:

if {?Location Level} = "City" then
{table.city} else
if {?Location Level} = "Region" then
{table.region} else
if {?Location Level} = "State" then
{table.state}

...where each level represented a rollup of cities.

At times I have wondered about whether the data was somehow already organized in some way from a previous report run. If I'm testing report speed, how can I determine whether this is the case? Or how can I clear the cache (I know absolutely nothing about this, and the server is not under my control)?

-LB
 
I haven't done a complete read of all the suggestions yet, but thought I'd weigh in on "12-Using a command as the datasource will be faster than using CR to link tables, select records, etc".

LB you say that you wanted to keep this generic to the environment, but I've found that for report performance issues, the environment is a very important piece. I originally started out using CR with a SQL Server database. I am now working with a Borland Interbase database. What I've found is that many of the things I learned and used with SQL Server needed to be unlearned/relearned as it performs very differently.

I have found that using a command rather than the CR table linking GUI can result in huge increases in performance with Interbase. By making this change, I had one report go from taking over 17 seconds to run to under 2 seconds. Since in the application some clients will run a batch of a couple hundred reports, this is huge.

When I have some more time, I will give the other suggestions a better read and hopefully have some more information to post.

B
 
Determining what's in cache is tricky. Executing large reports that fill the cache with other tables might do the trick; however, it's difficult to be sure without having sysadmin rights on the database.
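
If a dba is willing to help, the caches can be flushed between test runs; this needs DBA privileges and isn't something to do on a busy production box:

ALTER SYSTEM FLUSH SHARED_POOL;    -- clears parsed statements and plans
ALTER SYSTEM FLUSH BUFFER_CACHE;   -- clears cached data blocks (Oracle 10g and later only)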

-k
 
Hi LB,

Since the current DBMS can't be touched, it may make sense to pursue a data warehouse or a data mart approach.

If there's a chance the organization would be willing to pursue this, propose a meeting and pitch the idea.

If there's no chance of this happening, you may be able to create your own data mart. Using views and materialized views on even a platform as simple as MS Access can yield huge improvements in reporting speed, among other benefits...

- Ido

 
Ido makes a good point, you've no doubt heard me speak of LINKING (not importing) tables into Access in the forums.

Then you can create queries within Access that filter the rows down to manageable entities and perform aggregates where it makes sense, and then you can use Access as the reporting database.

It's fugly, but I've seen it in very large IT shops that can't leverage the database fully.

-k
 
Thanks, bmarks and Ido for your comments as well. There has been discussion about building data marts for system wide use.

I haven't tried the linking/filtering in Access method, but will. I guess this becomes a two-step process--first you'd be running a report in Access to populate the database, and then running a second report (or multiple reports) in Crystal against that new database. I guess the assumption is that the Access database would only be updated periodically? Otherwise, how would a speed savings be realized? Or maybe I'm misunderstanding how to go about this.

-LB
 
No need to run reports in Access. "Linking" makes the original tables behave as if they are local to the Access database. You then create queries in Access...

The only 2-step cases are those where you wish to create "materialized views" in Access by running "Make Table" queries. You can automate such processes using simple Macros.
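
As a sketch with invented names, a Make Table query against a linked table is just ordinary Access SQL with an INTO clause, and it can be rerun on whatever schedule keeps the data fresh enough:

SELECT site_code, order_date, Sum(amount) AS total_amount
INTO local_order_summary
FROM ORDERS_LINKED
WHERE order_date >= #1/1/2007#
GROUP BY site_code, order_date;

Crystal then reports off local_order_summary instead of the live Oracle table.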

- Ido

 
In reference to grouping on server. The server will only process if your report sql has a "where" clause in it.

The more formulas you have, the slower the report will run, because formulas are processed on the local machine; so grouping on them is less efficient than grouping on a database field.

Using Page Number instead of Page N of M is faster because N of M has to process all pages before displaying the first page, in order to calculate the total page count.

selecting "select distinct records" slows performance.

Running totals slows performance.

These are just a few I try to keep in mind while writing a report. I would love to see any other suggestions you receive.

 
To elaborate on Robbie's statement:

"In reference to grouping on server. The server will only process if your report sql has a "where" clause in it. "

Grouping on server must be turned on, and the report designed to allow for it, before a GROUP BY clause appears in the SQL. The same applies to generating a HAVING clause (see below).

"The more formulas you have the slower the report will run because formulas are processed on the local machine so, groupng on them is less efficient than grouping on a database field."

Using SQL Expressions in lieu of formulas is a plus here, perhaps that's part of what Robbie is trying to say.

The significant gain is realized by performing the grouping on the server. If you place a database field in a formula and Crystal has to do the grouping locally in either case, performance will be similar. Formulas slow reports according to their complexity, not simply as a result of being formulas. Placing a date formula in the report header will not crush performance, so don't use this as a generic rule.

"Using page number instead of page n of m is fater becasue n of m has to process all pages before displaying first page in order to calculate total page count. "

It isn't faster overall; it simply makes the first page return faster, and even that depends on many other factors, such as formulas/record selection/Top N/etc., which are based on the total rowset. The trick LB often uses, a formula to count or sum a value combined with a group selection formula, will also impede performance when it isn't passed as a HAVING clause.
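
In other words, the goal is for that group-level filter to reach the server as a HAVING clause, roughly (invented names):

SELECT c.region, SUM(o.amount) AS total_amount
FROM   orders o INNER JOIN customers c ON c.customer_id = o.customer_id
GROUP BY c.region
HAVING SUM(o.amount) > 10000

rather than every group coming back across the wire and a Crystal group selection formula discarding most of them afterwards.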

-k
 
You're right: placing a date formula in the report header will not crush performance. However, formulas, all formulas, are processed on the local machine. So the more formulas you have, the slower the processing, and the more complex the formulas, the slower the processing. Grouping on a formula, therefore, must be done on the local machine. Even 5 or 6 simple formulas are not going to decrease performance, but if you have one of those awful reports with, let's say, 80 or more formulas, you are going to notice an increase in runtime. Just wanted to clarify my point.
 