Simple Communication Examples

Next: Aggregate Communication Examples Up: The HPF Model Previous: The HPF Model

Simple Communication Examples

The following examples illustrate the communication requirements of scalar assignment statements. The purpose is to illustrate the implications of data distribution specifications on communication requirements for parallel execution. The explanations given do not necessarily reflect the actual compilation process.

Consider the following statements:

      REAL a(1000), b(1000), c(1000), x(500), y(0:501)
      INTEGER inx(1000)
!HPF$ DISTRIBUTE (BLOCK) ONTO procs :: a, b, inx
!HPF$ ALIGN x(i) WITH y(i+1)
      ...
      a(i) = b(i)                    ! Assignment 1
      x(i) = y(i+1)                  ! Assignment 2
      a(i) = c(i)                    ! Assignment 3
      a(i) = a(i-1) + a(i) + a(i+1)  ! Assignment 4
      c(i) = c(i-1) + c(i) + c(i+1)  ! Assignment 5
      x(i) = y(i)                    ! Assignment 6
      a(i) = a(inx(i)) + b(inx(i))   ! Assignment 7

In this example, the PROCESSORS directive specifies a linear arrangement of 10 processors. The DISTRIBUTE directives recommend to the compiler that the arrays a, b, and inx should be distributed among the 10 processors with blocks of 100 contiguous elements per processor. The array c is to be cyclically distributed among the processors with c(1), c(11), ..., c(991) mapped onto processor procs(1); c(2), c(12), ..., c(992) mapped onto processor procs(2); and so on. The complete mapping of arrays x and y onto the processors is not specified, but their relative alignment is indicated by the ALIGN directive. The ALIGN statement causes x(i) and y(i+1) to be stored on the same processor for all values of i, regardless of the actual distribution chosen by the compiler for x and y (y(0) and y(1) are not aligned with any element of x). The PROCESSORS, DISTRIBUTE, and ALIGN directives are discussed in detail in Section

In Assignment 1 (a(i) = b(i)), the identical distribution of a and b ensures that for all i, a(i) and b(i) are mapped to the same processor. Therefore, the statement requires no communication.

In Assignment 2 (x(i) = y(i+1)), there is no inherent communication. In this case, the relative alignment of the two arrays matches the assignment statement for any actual distribution of the arrays.

Although Assignment 3 (a(i) = c(i)) looks very similar to the first assignment, the communication requirements are very different due to the different distributions of a and c. Array elements a(i) and c(i) are mapped to the same processor for only 10%of the possible values of i. (This can be seen by inspecting the definitions of BLOCK and CYCLIC in Section .) The elements are located on the same processor if and only if [(i -1)]/100] = (i - 1) mod 10. For example, the assignment involves no inherent communication (i.e., both a(i) and c(i) are on the same processor) if i = 1 or i = 102, but does require communication if i = 2.

In Assignment 4 (a(i) = a(i-1) + a(i) + a(i+1)), the references to array a are all on the same processor for about 98%of the possible values of i. The exceptions to this are i = 100k for any k = 1, 2, ..., 9, (when a(i) and a(i-1) are on procs(k) and a(i+1) is on procs(k+1)) and i = 100k + 1 for any k = 1, 2, ..., 9 (when a(i) and a(i+1) are on procs(k+1) and a(i-1) is on procs(k)). Thus, except for ``boundary" elements on each processor, this statement requires no inherent communication.

Assignment 5, c(i) = c(i-1) + c(i) + c(i+1), while superficially similar to Assignment 4, has very different communication behavior. Because the distribution of c is CYCLIC rather than BLOCK, the three references c(i), c(i-1), and c(i+1) are mapped to three distinct processors for any value of i. Therefore, this statement requires communication for at least two of the right-hand side references, regardless of the implementation strategy.

The final two assignments have very limited information regarding the communication requirements. In Assignment 6 (x(i) = y(i)) the only information available is that x(i) and y(i+1) are on the same processor; this has no logical consequences for the relationship between x(i) and y(i). Thus, nothing can be said regarding communication in the statement without further information. In Assignment 7 (a(i) = a(inx(i)) + b(inx(i))), it can be proved that a(inx(i)) and b(inx(i)) are always mapped to the same processor. Similarly, it is easy to deduce that a(i) and inx(i) are mapped together. Without knowledge of the values stored in inx, however, the relation between a(i) and a(inx(i)) is unknown, as is the relationship between a(i) and b(inx(i)).

The inherent communication for a sequence of assignment statements is the union of the communication requirements for the individual statements. An array element used in several statements contributes to the total inherent (i.e. minimal) communication only once (assuming an optimizing compiler that eliminates common subexpressions), unless the array element may have been changed since its last use. For example, consider the code below:

      REAL a(1000), b(1000), c(1000)
!HPF$ DISTRIBUTE (CYCLIC) ONTO procs :: a, b, c
      ...
      a(i) = b(i+2)                     ! Statement 1
      b(i) = c(i+3)                     ! Statement 2
      b(i+2) = 2 * a(i+2)               ! Statement 3
      c(i) = a(i+1) + b(i+2) + c(i+3)   ! Statement 4

Statements 1 and 2 each require one array element to be communicated for any value of i. Statement 3 has no inherent communication. To simplify the discussion, assume that all four statements are executed on the processor storing the array element being assigned.

Then, for Statement 4:

Element a(i+1) induces communication, since it is not local and was not communicated earlier;
Element b(i+2) induces communication, since it is nonlocal and has changed since its last use; and
Element c(i+3) does not induce new communication, since it was used in statement 2 and not changed since.

Thus, the minimum total inherent communication in this program fragment is four array elements. It is important to note that this is a minimum. Some compilation strategies may produce communication for element c(i+3) in the last statement.

Next: Aggregate Communication Examples Up: The HPF Model Previous: The HPF Model

paula@erc.msstate.edu
Thu Dec 8 16:17:11 CST 1994