
DISTRIBUTE and REDISTRIBUTE Directives

The DISTRIBUTE directive specifies a mapping of data objects to abstract processors in a processor arrangement. For example,

      REAL SALAMI(10000)
!HPF$ DISTRIBUTE SALAMI(BLOCK)
specifies that the array SALAMI should be distributed across some set of abstract processors by slicing it uniformly into blocks of contiguous elements. If there are 50 processors, the directive implies that the array should be divided into groups of 200 elements, with SALAMI(1:200) mapped to the first processor, SALAMI(201:400) mapped to the second processor, and so on. If there is only one processor, the entire array is mapped to that processor as a single block of 10000 elements.

The block size may be specified explicitly:

      REAL WEISSWURST(10000)
!HPF$ DISTRIBUTE WEISSWURST(BLOCK(256))
This specifies that groups of exactly 256 elements should be mapped to successive abstract processors. (There must be at least ⌈10000/256⌉ = 40 abstract processors if the directive is to be satisfied. The fortieth processor will contain a partial block of only 16 elements, namely WEISSWURST(9985:10000).)
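
The block arithmetic in the two examples above can be checked with a short script. This is a minimal sketch in Python rather than HPF; the helper name block_bounds is hypothetical, and it assumes a lower bound of 1 and inclusive bounds, as in the text.

```python
def block_bounds(n, m, k):
    """Return the inclusive (first, last) indices of block k (1-based)
    when n elements are cut into blocks of m contiguous elements.
    Returns None if block k would hold no elements."""
    first = 1 + m * (k - 1)
    if first > n:
        return None
    return first, min(m * k, n)

# SALAMI(10000) over 50 processors: the default block size is ceil(10000/50).
salami_block = -(-10000 // 50)      # ceiling division for positive integers
print(salami_block, block_bounds(10000, salami_block, 1),
      block_bounds(10000, salami_block, 2))

# WEISSWURST(10000) with BLOCK(256): how many blocks, and what the last one holds.
blocks_needed = -(-10000 // 256)
print(blocks_needed, block_bounds(10000, 256, blocks_needed))
```

The last call reports the partial block on the fortieth processor; asking for a forty-first block returns None because no elements remain.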

HPF also provides a cyclic distribution format:

      REAL DECK_OF_CARDS(52)
!HPF$ DISTRIBUTE DECK_OF_CARDS(CYCLIC)
If there are 4 abstract processors, the first processor will contain DECK_OF_CARDS(1:49:4), the second processor will contain DECK_OF_CARDS(2:50:4), the third processor will contain DECK_OF_CARDS(3:51:4), and the fourth processor will contain DECK_OF_CARDS(4:52:4). Successive array elements are dealt out to successive abstract processors in round-robin fashion.
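
The round-robin dealing can be reproduced in a few lines. This is an illustrative Python sketch, not HPF; the function name cyclic_elements is hypothetical, and indices are 1-based as in Fortran.

```python
def cyclic_elements(n, p, k):
    """Indices (1-based) of an n-element array held by processor k (1-based)
    under a CYCLIC distribution over p processors."""
    return list(range(k, n + 1, p))

for k in range(1, 5):
    held = cyclic_elements(52, 4, k)
    # Report each processor's share in Fortran triplet form first:last:stride.
    print(f"processor {k}: DECK_OF_CARDS({held[0]}:{held[-1]}:4), {len(held)} cards")
```

Each of the four processors ends up with 13 cards, and the triplets match those listed above.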

Distributions may be specified independently for each dimension of a multidimensional array:

      INTEGER CHESS_BOARD(8,8), GO_BOARD(19,19)
!HPF$ DISTRIBUTE CHESS_BOARD(BLOCK, BLOCK)
!HPF$ DISTRIBUTE GO_BOARD(CYCLIC,*)
The CHESS_BOARD array will be carved up into contiguous rectangular patches, which will be distributed onto a two-dimensional arrangement of abstract processors. The GO_BOARD array will have its rows distributed cyclically over a one-dimensional arrangement of abstract processors. (The ``*'' specifies that GO_BOARD is not to be distributed along its second axis; thus an entire row is to be distributed as one object. This is sometimes called ``on-processor'' distribution.)
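
To make the per-dimension behaviour concrete, the following Python sketch computes which abstract processor owns a given element. The directives above do not name a processor arrangement, so the 2 x 2 arrangement for CHESS_BOARD and the 4-processor arrangement for GO_BOARD are assumptions chosen purely for illustration, and all function names are hypothetical. The formulas anticipate the formal definitions given later in this section.

```python
def cd(j, k):
    """Ceiling division for positive integers."""
    return (j + k - 1) // k

def block_owner(j, d, p):
    """Processor index (1-based) owning index j of a dimension of size d
    distributed BLOCK over p processors."""
    return cd(j, cd(d, p))

def cyclic_owner(j, p):
    """Processor index (1-based) owning index j under CYCLIC over p processors."""
    return 1 + (j - 1) % p

# Assumption: CHESS_BOARD(BLOCK, BLOCK) onto a hypothetical 2 x 2 arrangement.
def chess_owner(i, j):
    return (block_owner(i, 8, 2), block_owner(j, 8, 2))

# Assumption: GO_BOARD(CYCLIC, *) onto a hypothetical 4-processor arrangement.
# The second index is ignored because that axis is not distributed.
def go_owner(i, j):
    return cyclic_owner(i, 4)

print(chess_owner(1, 1), chess_owner(4, 5), chess_owner(8, 8))
print([go_owner(i, 1) for i in range(1, 10)])
```

Under these assumptions each processor holds a 4 x 4 patch of CHESS_BOARD, and every element of a given GO_BOARD row lands on the same processor.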

The REDISTRIBUTE directive is similar to the DISTRIBUTE directive but is considered executable. An array (or template) may be redistributed at any time, provided it has been declared DYNAMIC (see Section 3.5). Any other arrays currently ultimately aligned with an array (or template) when it is redistributed are also remapped to reflect the new distribution, in such a way as to preserve alignment relationships (see Section 3.4). (This can require a lot of computational and communication effort at run time; the programmer must take care when using this feature.)

The DISTRIBUTE directive may appear only in the specification-part of a scoping unit. The REDISTRIBUTE directive may appear only in the execution-part of a scoping unit. The principal difference between DISTRIBUTE and REDISTRIBUTE is that DISTRIBUTE must contain only a specification-expr as the argument to a BLOCK or CYCLIC option, whereas in REDISTRIBUTE such an argument may be any integer expression. Another difference is that DISTRIBUTE is an attribute, and so can be combined with other attributes as part of a combined-directive, whereas REDISTRIBUTE is not an attribute (although a REDISTRIBUTE statement may be written in the style of attributed syntax, using ``::'' punctuation).

Formally, the syntax of the DISTRIBUTE and REDISTRIBUTE directives is:

H303	distribute-directive	is	DISTRIBUTE distributee dist-directive-stuff

H304	redistribute-directive	is	REDISTRIBUTE distributee dist-directive-stuff
					or	REDISTRIBUTE dist-attribute-stuff :: distributee-list

H305	dist-directive-stuff	is	dist-format-clause [ dist-onto-clause ]

H306	dist-attribute-stuff	is	dist-directive-stuff
					or	dist-onto-clause

H307	distributee		is		object-name
					or	template-name

H308	dist-format-clause	is	( dist-format-list )
					or	* ( dist-format-list )
					or	*

H309	dist-format		is	BLOCK  [ ( int-expr ) ]
					or	CYCLIC [ ( int-expr ) ]
					or	*

H310	dist-onto-clause		is	ONTO dist-target

H311	dist-target		is	processors-name
					or	* processors-name
					or	*

Constraint:	An object-name mentioned as a distributee
		must be a simple name and not a subobject designator.

Constraint:	An object-name mentioned as a distributee may not
		appear as an alignee.

Constraint:	An object-name mentioned as a distributee may not
		have the POINTER attribute.

Constraint:	A distributee that appears in a REDISTRIBUTE
		directive must have the DYNAMIC attribute (see Section
		3.5).

Constraint:	If a dist-format-list is specified, its length must
		equal the rank of each distributee.

Constraint:	If both a dist-format-list and a processors-name
		appear, the number of elements of the dist-format-list
		that are not ``*'' must equal the rank of the named
		processor arrangement.

Constraint:	If a processors-name appears but not a 
		dist-format-list, the rank of each distributee
		must equal the rank of the named processor arrangement.

Constraint:	If either the dist-format-clause or the dist-target
		in a DISTRIBUTE directive begins with ``*'' then
		every distributee must be a dummy argument.

Constraint:	Neither the dist-format-clause nor the dist-target
		in a REDISTRIBUTE may begin with ``*''.

Constraint:	Any int-expr appearing in a dist-format of a
		DISTRIBUTE directive must be a specification-expr.

Note that the possibility of a DISTRIBUTE directive of the form

!HPF$ DISTRIBUTE dist-attribute-stuff :: distributee-list

is covered by syntax rule H301 for a combined-directive.


Examples:
!HPF$ DISTRIBUTE D1(BLOCK)
!HPF$ DISTRIBUTE (BLOCK,*,BLOCK) ONTO SQUARE:: D2,D3,D4
The meanings of the alternatives for dist-format are given below.

Define the ceiling division function CD(J,K) = (J+K-1)/K (using Fortran integer arithmetic with truncation toward zero).

Define the ceiling remainder function CR(J,K) = J-K*CD(J,K).
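
A few worked values may help in reading these two definitions. The sketch below is in Python, not Fortran; Python's // operator floors rather than truncates, so it agrees with the Fortran definition only for positive operands, which is the only case assumed here.

```python
def cd(j, k):
    """Ceiling division CD(J,K) = (J+K-1)/K.
    Assumes j >= 1 and k >= 1, where flooring and truncation agree."""
    return (j + k - 1) // k

def cr(j, k):
    """Ceiling remainder CR(J,K) = J - K*CD(J,K).
    For positive operands the result lies between -(k-1) and 0."""
    return j - k * cd(j, k)

for j in (1, 7, 8, 14, 100):
    print(j, cd(j, 7), cr(j, 7))
```

Note that CR is zero exactly when J is a multiple of K and is negative otherwise, which is why later formulas add it to the block size.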

The dimensions of a processor arrangement appearing as a dist-target are said to correspond in left-to-right order with those dimensions of a distributee for which the corresponding dist-format is not *. In the example above, processor arrangement SQUARE must be two-dimensional; its first dimension corresponds to the first dimensions of D2, D3, and D4 and its second dimension corresponds to the third dimensions of D2, D3, and D4.
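
The left-to-right correspondence, together with the rank constraint stated earlier, can be sketched as follows. This is an illustrative Python model with hypothetical function names; a dist-format-list is represented simply as a list of strings.

```python
def corresponding_dims(dist_formats):
    """Map each processor-arrangement dimension (1-based) to the distributee
    dimension (1-based) it corresponds to, skipping '*' entries."""
    distributed = [i for i, f in enumerate(dist_formats, start=1) if f != "*"]
    return {proc_dim: arr_dim for proc_dim, arr_dim in enumerate(distributed, start=1)}

def check_rank(dist_formats, processors_rank):
    """Constraint: the number of non-'*' formats must equal the rank of the
    named processor arrangement."""
    return len(corresponding_dims(dist_formats)) == processors_rank

formats = ["BLOCK", "*", "BLOCK"]     # as in DISTRIBUTE (BLOCK,*,BLOCK) ONTO SQUARE
print(corresponding_dims(formats))    # processor dimension -> distributee dimension
print(check_rank(formats, 2))
```

For the SQUARE example this pairs processor dimension 1 with distributee dimension 1 and processor dimension 2 with distributee dimension 3.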

Let d be the size of a distributee in a certain dimension and let p be the size of the processor arrangement in the corresponding dimension. For simplicity, assume all dimensions have a lower bound of 1. Then BLOCK(m) means that a distributee position whose index along that dimension is j is mapped to an abstract processor whose index along the corresponding dimension of the processor arrangement is CD(j,m) (note that m*p >= d must be true), and is position number m+CR(j,m) among positions mapped to that abstract processor. The first distributee position in abstract processor k along that axis is position number 1+m*(k-1).

The block size m must be a positive integer.

BLOCK by definition means the same as BLOCK(CD(d,p)).
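
The BLOCK(m) rules above translate directly into code. The following Python sketch is an illustration only; block_map and default_block_size are hypothetical names, and positive operands with a lower bound of 1 are assumed throughout.

```python
def cd(j, k):
    return (j + k - 1) // k           # valid for positive j and k

def cr(j, k):
    return j - k * cd(j, k)

def block_map(j, m, d, p):
    """Return (processor, local position) for index j under BLOCK(m),
    for a dimension of size d over p processors (all 1-based)."""
    if m < 1:
        raise ValueError("block size m must be a positive integer")
    if m * p < d:
        raise ValueError("BLOCK(m) requires m*p >= d")
    return cd(j, m), m + cr(j, m)

def default_block_size(d, p):
    """BLOCK with no argument means BLOCK(CD(d,p))."""
    return cd(d, p)

m = default_block_size(100, 16)
print(m, block_map(1, m, 100, 16), block_map(7, m, 100, 16),
      block_map(8, m, 100, 16), block_map(100, m, 100, 16))
```

With d = 100 and p = 16 the default block size is 7, and index 1+m*(k-1) always comes out as local position 1 on processor k, in agreement with the text. A block size of 6 is rejected here because 6*16 < 100.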

CYCLIC(m) means that a distributee position whose index along that dimension is j is mapped to an abstract processor whose index along the corresponding dimension of the processor arrangement is 1+MODULO(CD(j,m)-1,p). The first distributee position in abstract processor k along that axis is position number 1+m*(k-1).

The block size m must be a positive integer.

CYCLIC by definition means the same as CYCLIC(1).

CYCLIC(m) and BLOCK(m) imply the same distribution when m*p >= d, but BLOCK(m) additionally asserts that the distribution will not wrap around in a cyclic manner, which a compiler cannot determine at compile time if m is not constant. Note that CYCLIC and BLOCK (without argument expressions) do not imply the same distribution unless p >= d, a degenerate case in which the block size is 1 and the distribution does not wrap around.
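
The relationship between the two formats can be checked by brute force. This Python sketch compares the processor index given by each formula for every position; the function names are hypothetical, and Python's % operator is used in place of MODULO, with which it agrees for a positive divisor.

```python
def cd(j, k):
    return (j + k - 1) // k            # positive operands assumed

def block_owner(j, m):
    """Processor index for index j under BLOCK(m)."""
    return cd(j, m)

def cyclic_owner(j, m, p):
    """Processor index for index j under CYCLIC(m) over p processors."""
    return 1 + (cd(j, m) - 1) % p

def same_distribution(m, d, p):
    """True if the BLOCK(m) and CYCLIC(m) formulas agree for every index 1..d."""
    return all(block_owner(j, m) == cyclic_owner(j, m, p) for j in range(1, d + 1))

print(same_distribution(7, 100, 16))   # 7*16 >= 100, so no wrap-around occurs
print(same_distribution(3, 100, 16))   # 3*16 <  100, so CYCLIC(3) wraps
print(same_distribution(1, 10, 16))    # p >= d: the degenerate case in the text
```

The second comparison is only a check of the formulas: BLOCK(3) itself would not satisfy the m*p >= d requirement for this array.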

Suppose that we have 16 abstract processors and an array of length 100:

!HPF$ PROCESSORS SEDECIM(16)
      REAL CENTURY(100)
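Distributing the array BLOCK (which in this case would mean the same as BLOCK(7)):
!HPF$ DISTRIBUTE CENTURY(BLOCK) ONTO SEDECIM
results in this mapping of array elements onto abstract processors: processor k holds CENTURY(7*k-6:7*k) for k = 1 through 14, processor 15 holds CENTURY(99:100), and processor 16 holds no elements.

Distributing the array BLOCK(8):

!HPF$ DISTRIBUTE CENTURY(BLOCK(8)) ONTO SEDECIM
results in this mapping of array elements onto abstract processors: processor k holds CENTURY(8*k-7:8*k) for k = 1 through 12, processor 13 holds CENTURY(97:100), and processors 14 through 16 hold no elements.

Distributing the array CYCLIC (which means the same as CYCLIC(1)):

!HPF$ DISTRIBUTE CENTURY(CYCLIC) ONTO SEDECIM
results in this mapping of array elements onto abstract processors: processor k holds CENTURY(k:100:16), so processors 1 through 4 hold seven elements each and processors 5 through 16 hold six each.

Distributing the array CYCLIC(3):

!HPF$ DISTRIBUTE CENTURY(CYCLIC(3)) ONTO SEDECIM
results in this mapping of array elements onto abstract processors: blocks of three consecutive elements are dealt to processors 1 through 16 in turn, wrapping around after processor 16. Processor 1 holds CENTURY(1:3), CENTURY(49:51), and CENTURY(97:99); processor 2 holds CENTURY(4:6), CENTURY(52:54), and CENTURY(100); processor 16 holds CENTURY(46:48) and CENTURY(94:96).

An array may also be distributed so that some processors hold no elements at all. For example,

!HPF$ DISTRIBUTE CENTURY(BLOCK(256)) ONTO SEDECIM
results in having only one non-empty block (a partially filled one at that, having only 100 elements) on processor 1, with processors 2 through 16 having no elements of the array.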
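
The element-to-processor mappings for the CENTURY examples can be tabulated mechanically from the definitions of BLOCK(m) and CYCLIC(m). The following Python sketch does so; it is an illustration under the section's simplifying assumption of a lower bound of 1, and the function name mapping is hypothetical.

```python
from collections import defaultdict

def cd(j, k):
    return (j + k - 1) // k                    # positive operands assumed

def mapping(d, p, kind, m=None):
    """Return {processor: [indices]} for a 1-based dimension of size d
    distributed over p processors. kind is 'BLOCK' or 'CYCLIC'."""
    if m is None:
        # Defaults from the text: BLOCK means BLOCK(CD(d,p)), CYCLIC means CYCLIC(1).
        m = cd(d, p) if kind == "BLOCK" else 1
    if kind == "BLOCK" and m * p < d:
        raise ValueError("BLOCK(m) requires m*p >= d")
    table = defaultdict(list)
    for j in range(1, d + 1):
        block = cd(j, m)
        owner = block if kind == "BLOCK" else 1 + (block - 1) % p
        table[owner].append(j)
    # Include processors that hold nothing, so empty ones are visible.
    return {k: table.get(k, []) for k in range(1, p + 1)}

for kind, m in [("BLOCK", None), ("BLOCK", 8), ("CYCLIC", None),
                ("CYCLIC", 3), ("BLOCK", 256)]:
    label = kind if m is None else f"{kind}({m})"
    print(label)
    for k, held in mapping(100, 16, kind, m).items():
        print(f"  {k:2d}: {held}")
```

Processors that print an empty list hold no elements of CENTURY under that distribution.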

A DISTRIBUTE or REDISTRIBUTE directive must not cause any data object associated with the distributee via storage association (COMMON or EQUIVALENCE) to be mapped such that storage units of a scalar data object are split across more than one abstract processor. See Section for further discussion of storage association.

The statement form of a DISTRIBUTE or REDISTRIBUTE directive may be considered an abbreviation for an attributed form that happens to mention only one distributee.





paula@erc.msstate.edu
Thu Dec 8 16:17:11 CST 1994