The following examples illustrate the communication implications of some more complex constructs. The purpose is to show how communication can be quantified; as before, the explanations do not necessarily reflect the actual compilation process. Note that the communication requirement of each statement in this section is estimated in isolation, without considering the surrounding context.

Consider the following statements:

```fortran
REAL a(1000), b(1000), c(1000)
!HPF$ PROCESSORS procs(10)
!HPF$ DISTRIBUTE (BLOCK) ONTO procs :: a, b
!HPF$ DISTRIBUTE (CYCLIC) ONTO procs :: c
...
! Assignment 1 (equivalent to Forall 1)
a(:) = b(:)

! Assignment 2 (equivalent to Forall 2)
a(1:1000) = c(1:1000)

! Assignment 3 (equivalent to Forall 3)
a(2:999) = a(1:998) + a(2:999) + a(3:1000)

! Assignment 4 (equivalent to Forall 4)
c(2:999) = c(1:998) + c(2:999) + c(3:1000)
```

Some array intrinsics have inherent communication costs as well. For example, consider:

```fortran
REAL a(1000), b(1000), scalar
!HPF$ DISTRIBUTE (BLOCK) ONTO procs :: a, b
...
! Intrinsic 1
scalar = SUM( a )

! Intrinsic 2
a = SPREAD( b(1), DIM=1, NCOPIES=1000 )

! Intrinsic 3
a = CSHIFT(a,-1) + a + CSHIFT(a,1)
```
In general, the inherent communication derives from the mathematical
definition of the function. For example, the inherent communication
for computing `SUM` is one element (a partial sum) for each processor
storing part of the operand, minus one; for `a` distributed over 10
processors, that is 9 elements. (Further communication may be needed
to store the result.) The optimal communication pattern is very
machine-specific. Similar remarks apply to any accumulation operation;
prefix and suffix intrinsics may require a larger volume, depending on
the distribution. The `SPREAD` operation above requires a broadcast of
`b(1)` from `procs(1)`, which owns it, to all processors; the broadcast
may take advantage of available hardware. The `CSHIFT` operations
produce a shift communication pattern (with wraparound). These
examples of array intrinsics are not meant to be exhaustive.
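The communication volumes implied by the statements above can be tallied with a small model. This is a sketch, not a description of how an HPF compiler works: it assumes the owner-computes rule (the processor owning each left-hand-side element evaluates it) and counts each right-hand-side element once per processor that must receive it. The BLOCK and CYCLIC mappings of 1000 elements onto 10 processors are coded directly; the helper names `block_owner`, `cyclic_owner`, `volume`, `stencil`, and `cshift_pairs` are hypothetical. Intrinsic 3 is treated as the circular counterpart of Assignment 3, and for `SUM` and `SPREAD` the sketch simply applies the element counts given in the paragraph above.

```python
# Minimal model of communication volume for the statements above.
# Assumptions (not from the HPF document): owner-computes rule, and
# volume = number of distinct (element, receiving processor) pairs.
N, P = 1000, 10
SIZE = -(-N // P)  # ceiling(N / P) = 100 elements per BLOCK


def block_owner(i):
    """Processor (1-based) owning element i (1-based) under BLOCK."""
    return (i - 1) // SIZE + 1


def cyclic_owner(i):
    """Processor (1-based) owning element i (1-based) under CYCLIC."""
    return (i - 1) % P + 1


def volume(pairs, lhs_owner, rhs_owner):
    """Count distinct (rhs element, destination) pairs that cross processors.

    pairs: iterable of (i, j) meaning the lhs element i reads rhs element j.
    """
    needed = set()
    for i, j in pairs:
        dest = lhs_owner(i)
        if rhs_owner(j) != dest:
            needed.add((j, dest))
    return len(needed)


def stencil(lo, hi):
    """Pairs for x(lo:hi) = x(lo-1:hi-1) + x(lo:hi) + x(lo+1:hi+1)."""
    return [(i, i + d) for i in range(lo, hi + 1) for d in (-1, 0, 1)]


def cshift_pairs():
    """Pairs for a = CSHIFT(a,-1) + a + CSHIFT(a,1), with wraparound."""
    def wrap(k):
        return (k - 1) % N + 1  # map any index circularly into 1..N
    return [(i, wrap(i + d)) for i in range(1, N + 1) for d in (-1, 0, 1)]


identity = [(i, i) for i in range(1, N + 1)]
counts = {
    "Assignment 1": volume(identity, block_owner, block_owner),
    "Assignment 2": volume(identity, block_owner, cyclic_owner),
    "Assignment 3": volume(stencil(2, 999), block_owner, block_owner),
    "Assignment 4": volume(stencil(2, 999), cyclic_owner, cyclic_owner),
    # SUM: one partial sum per processor holding part of a, minus one.
    "Intrinsic 1": len({block_owner(i) for i in range(1, N + 1)}) - 1,
    # SPREAD: b(1) lives on procs(1) and goes to every other processor.
    "Intrinsic 2": P - 1,
    "Intrinsic 3": volume(cshift_pairs(), block_owner, block_owner),
}

for name, elems in counts.items():
    print(f"{name}: {elems} elements")
```

Run as-is, the model reports 0, 900, 18, and 1996 elements for Assignments 1 to 4, 9 elements for each of Intrinsics 1 and 2, and 20 for Intrinsic 3; the two extra elements relative to Assignment 3 come from the wraparound between `procs(10)` and `procs(1)`.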

Nonaligned data must also be communicated in other situations, for example:

```fortran
REAL a(1000), c(100,100), d(100,100)
!HPF$ ALIGN c(i,j) WITH d(j,i)
!HPF$ DISTRIBUTE (BLOCK,*) ONTO procs :: d
...
a(1:200) = a(1:200) + a(2:400:2)
c = c + d
```
In the first assignment, the two references to `a` on the right-hand
side use different strides (1 and 2), so `a(i)` and `a(2*i)` cannot in
general be stored on the same processor; this will cause communication.
Because `c` is aligned with the transpose of `d`, the second assignment
requires either a transpose of `c` or `d`, or some complex communication
pattern that overlaps computation and communication.
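The nonaligned cases can be quantified the same way. The sketch below assumes, since the fragment does not say, that `a` is BLOCK-distributed over the same 10-processor `procs`, and it again uses the owner-computes rule; `block_owner`, `strided_volume`, and `transpose_volume` are hypothetical helpers. Because `c(i,j)` lives with row `j` of `d` while `d(i,j)` lives with row `i`, only elements in the diagonal blocks are local.

```python
# Sketch: communication for the two nonaligned assignments above.
# Assumptions (hypothetical, not stated in the fragment): a is BLOCK-
# distributed, procs has 10 processors, owner-computes rule.
P = 10


def block_owner(i, n):
    """Processor (1-based) owning index i of an extent-n BLOCK dimension."""
    size = -(-n // P)  # ceiling(n / P)
    return (i - 1) // size + 1


def strided_volume():
    """a(1:200) = a(1:200) + a(2:400:2): element a(i) also reads a(2*i)."""
    moved = set()
    for i in range(1, 201):
        dest = block_owner(i, 1000)
        if block_owner(2 * i, 1000) != dest:
            moved.add((2 * i, dest))  # a(2*i) must reach the owner of a(i)
    return len(moved)


def transpose_volume():
    """c = c + d, with c(i,j) aligned to d(j,i) and d DISTRIBUTE (BLOCK,*)."""
    moved = 0
    for i in range(1, 101):
        for j in range(1, 101):
            c_home = block_owner(j, 100)  # c(i,j) lives with row j of d
            d_home = block_owner(i, 100)  # d(i,j) lives with row i of d
            if c_home != d_home:
                moved += 1  # d(i,j) must reach the owner of c(i,j)
    return moved


strided = strided_volume()
transposed = transpose_volume()
print("strided assignment:", strided, "elements")
print("c = c + d:", transposed, "of", 100 * 100, "elements")
```

Under these assumptions the model reports 150 elements for the strided assignment and 9000 of the 10000 elements for `c = c + d`, which is why transposing one operand is a natural way to implement the second statement.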

A `REALIGN` directive may change the location of every element of
the array. Every element whose home processor changes must be
communicated; in some compilation schemes, data will also be moved to
new locations on the same processor. The communication volume is the
same as that of an array assignment from an array with the original
alignment to another array with the new alignment. The `REDISTRIBUTE`
directive changes the distribution of every array aligned to its
operand, so its cost is similar to that of a `REALIGN` applied to
many arrays simultaneously. Compiler analysis may sometimes detect
that data movement is not needed because an array has no values that
could be accessed; such analysis and the resulting optimizations are
beyond the scope of this document.
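Since the communication volume of a `REALIGN` or `REDISTRIBUTE` equals that of the corresponding array assignment, it can be counted as the number of elements whose home processor changes. The example below is hypothetical: it remaps a 1000-element array from `(BLOCK)` to `(CYCLIC)` over 10 processors, ignores any movement within a processor, and assumes three identically aligned arrays to illustrate that a `REDISTRIBUTE` behaves like a `REALIGN` of many arrays at once. `remap_volume` is an illustrative name.

```python
# Sketch: elements that change home processor when a 1000-element array
# is remapped from BLOCK to CYCLIC over 10 processors (hypothetical case;
# movement between locations on the same processor is not counted).
N, P = 1000, 10
SIZE = -(-N // P)  # ceiling(N / P) = 100 elements per BLOCK


def block_owner(i):
    """Home processor (1-based) of element i under BLOCK."""
    return (i - 1) // SIZE + 1


def cyclic_owner(i):
    """Home processor (1-based) of element i under CYCLIC."""
    return (i - 1) % P + 1


def remap_volume(old_owner, new_owner, n=N):
    """Count elements whose home processor differs between two mappings."""
    return sum(1 for i in range(1, n + 1) if old_owner(i) != new_owner(i))


one_array = remap_volume(block_owner, cyclic_owner)
aligned_arrays = 3  # hypothetical number of arrays aligned with the operand
print("one array:", one_array, "elements")
print(aligned_arrays, "aligned arrays:", aligned_arrays * one_array, "elements")
```

It reports 900 elements for one array, the same volume as assigning a CYCLIC array to a BLOCK one (Assignment 2), and 2700 for the three aligned arrays.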

Thu Jul 21 17:05:43 CDT 1994