INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

Another efficiency issue

Another efficiency issue

(OP)
Is it possible to re-code the following in a more efficient way? The code performs a so-called red-black SOR to solve a Poisson equation, dividing an iterative step into two half-steps with the second using updated neighbored values of the first.

CODE

DO ijswap = 0,1
      DO k = 2,n-2
      DO j = 2,m-2
      DO i = 2,l-2
         IF (MOD((i+j+k),2) .EQ. ijswap) THEN ! 
            compute some stuff depending on i,j,k
         END IF 
      END DO
      END DO
      END DO
   END DO ! ijswap 

I'm know that best would be to switch to a state-of-the-art solver, but i need to stay with this one for maintenance reasons. I'm concerned about the IF block again, since this is called within a 3d loop which is performed a several 1000 times during each run of my program. The solver consumes about 2/3 of the code's CPU time.

RE: Another efficiency issue

The mod operation uses division. Try IAND(i+j+k,1). Should be a lot cheaper.

RE: Another efficiency issue

(OP)
Thanks for the hint. I've tried it out, but didn't experience a performance gain. Maybe the compiler has already optimized this?

RE: Another efficiency issue

(OP)
I've just run 2 little test cases with a 2000³ loop, once computing MOD((i+j+k),2), once IAND((i+j+k),1). The former took 20.8 sec, tha latter 20.7. So there is no time to save in this place :)

RE: Another efficiency issue

Presently, the MOD operation and IF are being evaluated i*j*k times; but, if you take them out of the inner-most loop, it could reduce the number of MOD-calls and IF-evaluations down to j*k.

What I am thinking is that the stuff inside the inner-most loop is already only being done every other value of i...but once you know j and k before entering the i loop, you KNOW which values of i will yield an even or odd number and, so, I am thinking of something along the following lines:

CODE

DO ijswap = 0,1
    DO k = 2,n-2
    DO j = 2,m-2
        if (MOD((j+k),2) .EQ. ijswap) then
            istart = 3
        else
            istart = 2
        end if
        DO i = istart,l-2,2
            compute some stuff depending on i,j,k
        END DO
    END DO
    END DO
END DO ! ijswap 
I just wrote this here and have not compiled it or made sure it does "some stuff" for the correct indexes or not...so, you check it and make sure it works and/or that it is not backwards.

Hope this helps.

RE: Another efficiency issue

If the above alone does not do the job, maybe you can split the ijswap loop into two...one for 0 and one for 1; or, you can add an outer " IF (ijswap.eq.0) " to the one proposed above, to correctly set istart for both cases.

RE: Another efficiency issue

(OP)
Thanks for your help, splitting the IF helped a lot. My test case took 599 sec CPU time before i started to tune the code. Replacing most divisions by multiplications reduced that to ~570 sec. Following your suggestion gave a further massive improvement down to 528 sec.

RE: Another efficiency issue

Oh, good.

Say, I read the original post and it mentions something about the second iteration using updated neighboring values...what exactly is this all about? are you talking about actual array entries? I mean, do the values being updated depend only on values that are NOT being updated during the same iteration? If so, you may be able to further "parallelize" or at least open the door for parallel processing with a FORALL or an openmp directive.

RE: Another efficiency issue

(OP)
The routine is part of a numerical flow model, it solves a Poisson equation for pressure. The whole code was designed back in the 90s and still is used by most of my clients on single processor machines (Desktop PC).

The stuff inside the innermost loop looks like this:

CODE

pneu     = (- (p(i+1,j,k)*xpd(i,j,k) + p(i-1,j,k)*xpd(i-1,j,k))*dxi(i)  &
            - (p(i,j+1,k)*ypd(i,j,k) + p(i,j-1,k)*ypd(i,j-1,k))*dyi(j)  &
            - (p(i,j,k+1)*zpd(i,j,k) + p(i,j,k-1)*zpd(i,j,k-1))*dzi(k)  &
            + rs(i,j,k)) &
            * quot(i,j,k)
palt     = p(i,j,k)
p(i,j,k) = ijkxyz(i,j,k)*pneu*omega + palt*(1.e0-omega) 

As you guessed, the new values p(i,j,k) depend on their neighbors only. xpd, ypd, zpd are not modified during the iteration, also rs, quot and ijkxyz are fixed arrays.

RE: Another efficiency issue

Say, I have not had much opportunity to use FORALL, so, I was wondering if you could give it a try and post some performance feedback.

For the case of your three nested do-loops, it would look something like this:

CODE

do ijswap=0,1
    forall(k=2:n-2, j=2:m-2, i=2:l-2, mod(i+j+k,2).eq.ijswap) p(i,j,k) = something
end do 

...pretty elegant and rather concise, if you ask me. What is your opinion about this construct? do you think of it as clear or obfuscated?

Here it is after adding your assignment:

CODE

do ijswap=0,1
    forall(k=2:n-2, j=2:m-2, i=2:l-2, mod(i+j+k,2).eq.ijswap) &
        p(i,j,k) = (1.e0-omega)*p(i,j,k) + &
                   omega*ijkxyz(i,j,k)*(- (p(i+1,j,k)*xpd(i,j,k) + p(i-1,j,k)*xpd(i-1,j,k))*dxi(i)  &
                                        - (p(i,j+1,k)*ypd(i,j,k) + p(i,j-1,k)*ypd(i,j-1,k))*dyi(j)  &
                                        - (p(i,j,k+1)*zpd(i,j,k) + p(i,j,k-1)*zpd(i,j,k-1))*dzi(k)  &
                                        + rs(i,j,k))
end do 

Granted, this puts the call to the MOD function back into being evaluated (i*j*k) times; but I wonder if there is an alternate benefit to using FORALL. As it often happens, payback does not kick in until a problem is large enough; by the way, in your problem, what are the values for l,m,n?

If you would care to give it a try, I would like to hear your thoughts and test (timing) results.

Cheers,

gsal

RE: Another efficiency issue

(OP)
The forall construct appears to consume ~5% more CPU time than the 'splitted IF'. For this case, (l,m,n) = (105,105,43). Next i will compare both versions for a configuration with (l,m,n) = (750,500,23) which rewuired ~7 hours CPU time without any tuning of the program.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close