The following examples illustrate the communication implications of
some more complex constructs. The purpose is to show how communication
can be quantified, but again the explanations do not necessarily
reflect the actual compilation process. It is important to note that
the communication requirement for each statement in this section is
estimated without considering the surrounding context.
Consider the following statements:
REAL a(1000), b(1000), c(1000)
!HPF$ PROCESSORS procs(10)
!HPF$ DISTRIBUTE (BLOCK) ONTO procs :: a, b
!HPF$ DISTRIBUTE (CYCLIC) ONTO procs :: c
...
! Assignment 1 (equivalent to Forall 1)
a(:) = b(:)
! Assignment 2 (equivalent to Forall 2)
a(1:1000) = c(1:1000)
! Assignment 3 (equivalent to Forall 3)
a(2:999) = a(1:998) + a(2:999) + a(3:1000)
! Assignment 4 (equivalent to Forall 4)
c(2:999) = c(1:998) + c(2:999) + c(3:1000)
There are other examples of situations in which nonaligned data must be
communicated:
A REALIGN directive may change the location of every element of
the array. This will cause communication of all elements that change
their home processor; in some compilation schemes, data will also be
moved to new locations on the same processor. The communication volume
is the same as that of an array assignment from an array with the
original alignment to another array with the new alignment.
The REDISTRIBUTE directive changes the distribution of every array
aligned to the operand of the REDISTRIBUTE. Therefore, its cost
is similar to the cost of a REALIGN on many arrays
simultaneously. Compiler analysis may sometimes detect that data
movement is not needed because an array has no values that could be
accessed; such analysis and the resulting optimizations are beyond the
scope of this document.
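The communication volume of the four assignments above can be estimated by counting the operands that live on a different processor from the element being assigned. The following Python sketch is only a model, not a description of what an HPF compiler does. It assumes that the owner of each left-hand-side element computes it, that BLOCK uses blocks of ceiling(n/p) elements, and that processors are numbered from 0; the helper names `block_owner`, `cyclic_owner`, and `comm_volume` are hypothetical. Because a REALIGN or REDISTRIBUTE moves the same volume as an assignment between the old and new mappings, the same counting applies to remapping.

```python
def block_owner(i, n=1000, p=10):
    """Processor (0-based) owning 1-based element i under BLOCK."""
    block = -(-n // p)  # ceiling(n / p) elements per processor
    return (i - 1) // block


def cyclic_owner(i, n=1000, p=10):
    """Processor (0-based) owning 1-based element i under CYCLIC."""
    return (i - 1) % p


def comm_volume(lhs_owner, indices, refs):
    """Count distinct (array, element, destination processor) transfers.

    refs lists each right-hand-side reference as
    (array name, owner function, index offset). The owner of the
    left-hand-side element is assumed to compute it, so every operand
    it does not own must be fetched.
    """
    moved = set()
    for i in indices:
        dest = lhs_owner(i)
        for name, owner, offset in refs:
            j = i + offset
            if owner(j) != dest:
                moved.add((name, j, dest))
    return len(moved)


def stencil(name, owner):
    """References x(i-1), x(i), x(i+1) to the same array."""
    return [(name, owner, k) for k in (-1, 0, 1)]


volumes = {
    1: comm_volume(block_owner, range(1, 1001), [("b", block_owner, 0)]),
    2: comm_volume(block_owner, range(1, 1001), [("c", cyclic_owner, 0)]),
    3: comm_volume(block_owner, range(2, 1000), stencil("a", block_owner)),
    4: comm_volume(cyclic_owner, range(2, 1000), stencil("c", cyclic_owner)),
}
for number, count in volumes.items():
    print(f"Assignment {number}: {count} elements communicated")
```

Under these assumptions the model reports 0, 900, 18, and 1996 elements for Assignments 1 through 4: identical mappings need no communication, BLOCK versus CYCLIC moves most of the array, a BLOCK stencil touches only block boundaries, and a CYCLIC stencil fetches both neighbours of every element.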
Some array intrinsics have inherent communication costs as well.
For example, consider:
REAL a(1000), b(1000), scalar
!HPF$ DISTRIBUTE (BLOCK) ONTO procs :: a, b
...
! Intrinsic 1
scalar = SUM( a )
! Intrinsic 2
a = SPREAD( b(1), DIM=1, NCOPIES=1000 )
! Intrinsic 3
a = CSHIFT(a,-1) + a + CSHIFT(a,1)
In general, the inherent communication derives from the mathematical
definition of the function. For example, the inherent communication
for computing SUM is one element for each processor storing part
of the operand, minus one. (Further communication may be needed to
store the result.) The optimal communication pattern is very
machine-specific. Similar remarks apply to any accumulation operation;
prefix and suffix intrinsics may require a larger volume, depending on
the distribution. The SPREAD operation above requires a broadcast
from procs(1) to all processors, which may take advantage of
available hardware. The CSHIFT operations produce a shift
communication pattern (with wraparound). This list of examples
illustrating array intrinsics is not meant to be exhaustive.
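The counts for SUM and CSHIFT can be checked with a small Python model. This is a sketch under stated assumptions rather than the way any particular HPF implementation works: it assumes 10 processors and a BLOCK distribution of 1000 elements, and it uses a binary combining tree as one possible reduction pattern (the text notes that the best pattern is machine-specific). The names `tree_sum` and `cshift_volume` are hypothetical.

```python
def tree_sum(partials):
    """Combine per-processor partial sums pairwise.

    Returns (total, transfers, rounds). Each transfer sends one
    partial sum from one processor to another.
    """
    vals = list(partials)
    transfers = rounds = 0
    stride = 1
    while stride < len(vals):
        for dst in range(0, len(vals) - stride, 2 * stride):
            vals[dst] += vals[dst + stride]  # processor dst+stride sends to dst
            transfers += 1
        stride *= 2
        rounds += 1
    return vals[0], transfers, rounds


def cshift_volume(shift, n=1000, p=10):
    """Elements that cross a processor boundary in CSHIFT(a, shift).

    Uses the Fortran definition result(i) = a(1 + MODULO(i + shift - 1, n))
    and a BLOCK distribution with ceiling(n / p) elements per processor.
    """
    block = -(-n // p)

    def owner(i):
        return (i - 1) // block

    return sum(
        owner(i) != owner(1 + (i + shift - 1) % n) for i in range(1, n + 1)
    )


a = list(range(1, 1001))
partials = [sum(a[q * 100:(q + 1) * 100]) for q in range(10)]
total, transfers, rounds = tree_sum(partials)
print(total, transfers, rounds)
print(cshift_volume(-1) + cshift_volume(1))
```

With 10 processors the tree performs 9 transfers, matching the "one element per processor, minus one" figure, and finishes in 4 rounds. Each CSHIFT by one position moves 10 elements, one per block boundary including the wraparound, so Intrinsic 3 moves 20 elements in this model.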
As a further example, consider:
REAL a(1000), c(100,100), d(100,100)
!HPF$ ALIGN c(i,j) WITH d(j,i)
!HPF$ DISTRIBUTE (BLOCK,*) ONTO procs :: d
...
a(1:200) = a(1:200) + a(2:400:2)
c = c + d
In the first assignment, the two references to a on the right-hand side
use different strides, which will cause communication.
The second assignment statement requires either a transpose of c or d,
because c(i,j) is aligned with d(j,i), or a more complex communication
pattern that overlaps computation with communication.
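Both effects can be quantified with the same kind of ownership count. The Python sketch below is illustrative only. The snippet does not show how a is distributed or how many processors procs has, so the sketch assumes a is distributed (BLOCK) and that there are 10 processors; `block_owner` is a hypothetical helper, and the owner of each left-hand-side element is assumed to perform the computation.

```python
def block_owner(i, n, p):
    """Processor (0-based) owning 1-based index i under BLOCK."""
    block = -(-n // p)  # ceiling(n / p) indices per processor
    return (i - 1) // block


P = 10  # assumed processor count; not stated in the snippet

# a(1:200) = a(1:200) + a(2:400:2): element k needs a(2*k).
# Assumes a is distributed (BLOCK) over P processors.
strided = sum(
    block_owner(k, 1000, P) != block_owner(2 * k, 1000, P)
    for k in range(1, 201)
)

# c = c + d, with c(i,j) aligned to d(j,i) and d distributed (BLOCK,*):
# d(i,j) lives on the owner of row i, while c(i,j) lives with d(j,i),
# that is, on the owner of row j.
transposed = sum(
    block_owner(i, 100, P) != block_owner(j, 100, P)
    for i in range(1, 101)
    for j in range(1, 101)
)

print(f"strided assignment: {strided} of 200 operands are remote")
print(f"transposed assignment: {transposed} of 10000 operands are remote")
```

In this model 150 of the 200 strided operands and 9000 of the 10000 elements of d are remote, which is why the second statement amounts to an almost complete transpose.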