The following examples illustrate the communication implications of some more complex constructs. The purpose is to show how communication can be quantified, but again the explanations do not necessarily reflect the actual compilation process. Note that the communication requirement for each statement in this section is estimated in isolation, without considering the surrounding context.
Consider the following statements:

    REAL a(1000), b(1000), c(1000)
    !HPF$ DISTRIBUTE (BLOCK) ONTO procs :: a, b
    !HPF$ PROCESSORS procs(10)
    !HPF$ DISTRIBUTE (CYCLIC) ONTO procs :: c
    ...
    ! Assignment 1 (equivalent to Forall 1)
    a(:) = b(:)

    ! Assignment 2 (equivalent to Forall 2)
    a(1:1000) = c(1:1000)

    ! Assignment 3 (equivalent to Forall 3)
    a(2:999) = a(1:998) + a(2:999) + a(3:1000)

    ! Assignment 4 (equivalent to Forall 4)
    c(2:999) = c(1:998) + c(2:999) + c(3:1000)

Some array intrinsics have inherent communication costs as well. For example, consider:

    REAL a(1000), b(1000), scalar
    !HPF$ DISTRIBUTE (BLOCK) ONTO procs :: a, b
    ...
    ! Intrinsic 1
    scalar = SUM( a )
    ! Intrinsic 2
    a = SPREAD( b(1), DIM=1, NCOPIES=1000 )

    ! Intrinsic 3
    a = CSHIFT(a,-1) + a + CSHIFT(a,1)

In general, the inherent communication derives from the mathematical definition of the function. For example, the inherent communication for computing SUM is one element from each processor that stores part of the operand, minus one overall. (Further communication may be needed to store the result.) The optimal communication pattern is highly machine-specific. Similar remarks apply to any accumulation operation; prefix and suffix intrinsics may require a larger volume, depending on the distribution. The SPREAD operation above requires a broadcast from procs(1) to all processors, which may take advantage of available hardware. The CSHIFT operations produce a shift communication pattern (with wraparound). This list of examples illustrating array intrinsics is not meant to be exhaustive.
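The inherent communication of the assignments and intrinsics above can be estimated by comparing the owner of each left-hand-side element with the owner of each operand it reads. The following Python sketch does this under assumptions that are not part of the text: an owner-computes rule, processors numbered 0 to 9, and 1-based indices. The helper names (`block_owner`, `cyclic_owner`, `remote_fetches`, `neighbours`) are hypothetical, and the counts are element counts rather than message counts, since a compiler would normally aggregate elements into messages.

```python
# Sketch only: counts elements that must cross processors, assuming an
# owner-computes rule, 10 processors numbered 0..9, and 1-based indices.
N, P = 1000, 10

def block_owner(i):
    """Processor holding element i under DISTRIBUTE (BLOCK)."""
    return (i - 1) // (N // P)

def cyclic_owner(i):
    """Processor holding element i under DISTRIBUTE (CYCLIC)."""
    return (i - 1) % P

def remote_fetches(lhs_owner, rhs_owner, refs):
    """Distinct (destination processor, operand index) pairs that are not local.

    refs is an iterable of (lhs_index, rhs_index) pairs.
    """
    needed = set()
    for i, j in refs:
        dest = lhs_owner(i)
        if dest != rhs_owner(j):
            needed.add((dest, j))
    return len(needed)

def neighbours(lo, hi, wrap=False):
    """(i, i-1) and (i, i+1) pairs for i in lo..hi, optionally circular."""
    pairs = []
    for i in range(lo, hi + 1):
        for j in (i - 1, i + 1):
            if wrap:
                j = (j - 1) % N + 1  # map 0 -> N and N+1 -> 1
            pairs.append((i, j))
    return pairs

identity = [(i, i) for i in range(1, N + 1)]

counts = {
    # a(:) = b(:), both BLOCK: every operand is local
    "assignment 1": remote_fetches(block_owner, block_owner, identity),
    # a(1:1000) = c(1:1000), BLOCK <- CYCLIC
    "assignment 2": remote_fetches(block_owner, cyclic_owner, identity),
    # three-point stencil on a BLOCK array: only block boundaries
    "assignment 3": remote_fetches(block_owner, block_owner, neighbours(2, 999)),
    # three-point stencil on a CYCLIC array: every neighbour is remote
    "assignment 4": remote_fetches(cyclic_owner, cyclic_owner, neighbours(2, 999)),
    # SUM: one partial result per processor holding part of a, minus one
    "intrinsic 1": len({block_owner(i) for i in range(1, N + 1)}) - 1,
    # SPREAD of b(1): one element needed by every other processor
    "intrinsic 2": remote_fetches(block_owner, block_owner,
                                  [(i, 1) for i in range(1, N + 1)]),
    # CSHIFT by -1 and +1: block boundaries plus the wraparound
    "intrinsic 3": remote_fetches(block_owner, block_owner,
                                  neighbours(1, N, wrap=True)),
}

for name, count in counts.items():
    print(f"{name}: {count} element(s)")
```

Under these assumptions the aligned BLOCK assignment needs no communication, the BLOCK-from-CYCLIC assignment moves 900 of the 1000 elements, and the same three-point stencil costs 18 elements on the BLOCK array but 1996 on the CYCLIC one.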
There are other examples of situations in which nonaligned data must be communicated:

    REAL a(1000), c(100,100), d(100,100)
    !HPF$ ALIGN c(i,j) WITH d(j,i)
    !HPF$ DISTRIBUTE (BLOCK,*) ONTO procs :: d
    ...
    a(1:200) = a(1:200) + a(2:400:2)
    c = c + d

In the first assignment, the use of different strides in the two references to a on the right-hand side will cause communication. The second assignment statement requires either a transpose of c or d, or some more complex pattern that overlaps computation with communication.
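Both statements can be quantified with the same kind of owner comparison. The Python sketch below is illustrative and relies on assumptions the listing does not state: `procs` has 10 processors as in the earlier example, `a` is distributed (BLOCK) as before, and the owner-computes rule applies. The function names are hypothetical.

```python
# Sketch only: assumes procs(10), a distributed (BLOCK), owner-computes,
# and 1-based indices. None of these are fixed by the listing itself.
P = 10

def block_owner(i, extent):
    """Processor holding index i of a BLOCK-distributed dimension."""
    return (i - 1) // (extent // P)

# a(1:200) = a(1:200) + a(2:400:2): iteration k combines a(k) with a(2k).
stride_remote = sum(
    1 for k in range(1, 201)
    if block_owner(k, 1000) != block_owner(2 * k, 1000)
)

# d is distributed (BLOCK,*): the owner depends only on the first subscript.
def d_owner(i, j):
    return block_owner(i, 100)

# ALIGN c(i,j) WITH d(j,i): c(i,j) lives wherever d(j,i) lives.
def c_owner(i, j):
    return d_owner(j, i)

# c = c + d: element (i,j) of d is needed on the processor owning c(i,j).
transpose_remote = sum(
    1
    for i in range(1, 101)
    for j in range(1, 101)
    if c_owner(i, j) != d_owner(i, j)
)

print("strided operands fetched remotely:", stride_remote)
print("elements of d fetched remotely:", transpose_remote)
```

With these assumptions, 150 of the 200 strided operands are remote, and 9000 of the 10000 elements of d are remote; only the elements in the ten diagonal 10-by-10 blocks are already in place, which is why the statement behaves like a transpose.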
A REALIGN directive may change the location of every element of the array. This will cause communication of all elements that change their home processor; in some compilation schemes, data will also be moved to new locations on the same processor. The communication volume is the same as that of an array assignment from an array with the original alignment to another array with the new alignment. The REDISTRIBUTE directive changes the distribution of every array aligned to its operand. Therefore, its cost is similar to the cost of a REALIGN on many arrays simultaneously. Compiler analysis may sometimes detect that data movement is not needed because an array has no values that could be accessed; such analysis and the resulting optimizations are beyond the scope of this document.
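As a rough illustration of the remapping volume, the Python sketch below tabulates which elements change their home processor when a 1000-element array goes from a BLOCK to a CYCLIC distribution over 10 processors. The scenario, the helper `remap_traffic`, and the figure of three identically aligned arrays are all assumptions chosen for the example; the sketch ignores any movement within a processor.

```python
from collections import Counter

# Sketch only: hypothetical remap of a 1000-element array from BLOCK to
# CYCLIC over 10 processors (numbered 0..9), using 1-based indices.
N, P = 1000, 10

def block_owner(i):
    return (i - 1) // (N // P)

def cyclic_owner(i):
    return (i - 1) % P

def remap_traffic(old_owner, new_owner):
    """Map (source, destination) -> number of elements changing home."""
    traffic = Counter()
    for i in range(1, N + 1):
        src, dst = old_owner(i), new_owner(i)
        if src != dst:
            traffic[(src, dst)] += 1
    return traffic

traffic = remap_traffic(block_owner, cyclic_owner)
moved_one_array = sum(traffic.values())

# A redistribution also moves every array aligned to its operand. Assume,
# for illustration, three arrays of this size with identity alignment.
aligned_arrays = 3
moved_total = aligned_arrays * moved_one_array

print("processor pairs exchanging data:", len(traffic))
print("elements moved for one array:", moved_one_array)
print("elements moved for all aligned arrays:", moved_total)
```

In this scenario every processor sends 10 elements to each of the other nine, so the remap is an all-to-all exchange of 900 elements per array, the same volume as assigning a BLOCK array to a CYCLIC one.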