a presentation by

Disclaimer

                this is an informal talk by a poor Fortraner, do not expect to learn a lot,

              
                probably you already know more Fortran than me...

              
This talk is designed for a heterogeneous class, mixing newbies and experienced programmers
Fortran is NOT a legacy language!
  • Fortran is the BEST general-purpose programming language for SCIENTIFIC COMPUTING:
    • near-best performance with ease;
    • the best ARRAY HANDLING you will ever see;
    • HPC ready;
    • nearly all the math functions you will ever want built into the language;
  • the first high-level language to be STANDARDIZED;
Believe me…
Matlab is a TOY compared to Fortran!

Fortran (the FORmula TRANslating system) was originally developed by IBM in the mid-1950s

  • FORTRAN, first released by IBM in 1956
  • FORTRAN II, released by IBM in 1958
  • FORTRAN IV, released in 1962, standardized
  • FORTRAN 66, appeared in 1966 as an ANSI standard
  • FORTRAN 77, appeared in 1977, structured features
  • Fortran 90, 1992 ANSI standard, free form, modules
  • Fortran 95, a few extensions
  • Fortran 2003, object oriented programming
  • Fortran 2008, a few extensions
  • Fortran 2015, incoming standard, mainly parallel features
Fortran is not dead; it is worth learning!

Three main reasons to learn Fortran:

  1. it is a de-facto standard in scientific computing;
  2. there are tons of codes for doing anything;
  3. it has a brilliant future.
Which one should I use?
Obviously, the last one!
  • free-format source code;
  • names can consist of up to 31 characters;
  • dynamic memory handling:
    • ability to operate on arrays (or array sections) as a whole;
  • sane programming style:
    • generic names;
    • optional arguments and calls with keywords;
    • operator overloading;
  • recursive, pure and elemental procedures;
  • derived types:
    • Object Oriented Programming.
Modern Fortran compilers are backwards compatible!
Developing new code in Modern Fortran while mixing in old legacy code is effortless!
Use Free Format!
  • ASCII text up to 132 columns;
  • case insensitive i.e. PROGRAM is the same as ProGram;
  • using mixed case for statements and variables is not considered a good programming practice; be considerate to your collaborators who will be modifying the code.
  • use whatever convention you are comfortable with…
  • … but ADOPT a convention! Do not code as it comes;
  • be consistent with your convention throughout.
Bad practice
                    program My_bAD
  integer :: NabCD
     parameter (naBcd = 10)
character(nabcD) :: abc = 'hello'
print*, ABC
end program MY_bad

                  
Good practice
                    program my_good

  integer, parameter      :: chars_number=10
  character(chars_number) :: greetings='hello'

  print '(A)', trim(greetings)
end program my_good

                  

(Dis)Sectioning a Fortran Program
A Fortran program consists of one or more PROGRAM UNITS
                program
...
end program

              
                module
...
end module

              
                subroutine
...
end subroutine

              
                function
...
end function

              
  • the PROGRAM unit is the main program; there must be exactly one main unit;
  • the PROGRAM unit can contain one or more subprogram units such as SUBROUTINE or FUNCTION;
  • the subprogram units should be used to perform simple tasks: functions or subroutines of more than 50/100 lines are badly designed!
  • subprogram units related to each other should be wrapped into a MODULE:
    • being MODULAR should be an imperative for you!
  • inside each program (sub)unit define data and operations on data.
Your First Fortran Program
                program the_greeter
print '(A)', 'Hello World!'
end program the_greeter

              
Your First MODULAR Fortran Program
                module the_greeter_worker
contains
  subroutine say_something(what)
  character(*), intent(in) :: what !< what to say
  print '(A)', what
  end subroutine say_something
endmodule the_greeter_worker

program the_modular_greeter
use the_greeter_worker
call say_something(what='Hello modular World!')
end program the_modular_greeter

              

The standard Fortran file extension is .f90 (for standard 90+), e.g. the_greeter.f90

Fortran is a compiled language

Building a Fortran application means going through 3 steps:

  1. write the sources;
  2. generate the compiled objects;
  3. link the executable.
            # generate the_greeter.o compiled object
gfortran -c the_greeter.f90

# generate the_greeter executable
gfortran the_greeter.o -o the_greeter

# generate the_greeter executable in one command
gfortran the_greeter.f90 -o the_greeter

          
Compilers

The best Fortran compilers you will ever see are also FREE!

  1. GNU gfortran, https://gcc.gnu.org/wiki/GFortran;
  2. Intel Fortran, https://software.intel.com/en-us/fortran-compilers, free for non commercial use;
How to define data containers
Fortran provides 5 fundamental, intrinsic data-types:
INTEGERs
INTEGERs are integer values, which can be represented exactly by the computer
REALs
REALs are floating point numbers, representing real numbers; they are not exact representations of real numbers, having only a finite number of significant digits: the FINITE PRECISION issue
CHARACTERs
CHARACTERs are text characters, usually encoded as ASCII
LOGICALs
LOGICALs are boolean variables, which can store a value of either .TRUE. or .FALSE.
COMPLEX
COMPLEX variables are effectively just two real variables, one storing the real component of a number, the other storing the imaginary component
The BAD coming from the past…

Fortran is capable of implicit typing of variables, i.e. a variable obtains a type on the basis of its FIRST CHARACTER… a nightmare!

                |ABCDEFGH|IJKLMN |OPQRSTUVWXYZ|
|________|_______|____________|
| real   |integer| real       |

              
Explicit is BETTER than implicit!
For backward compatibility reasons, Fortran defaults to IMPLICIT typing… always start program units with the statement implicit none to disable it
                program explicit

implicit none

end program explicit

              
                module explicit

implicit none

end module explicit

              
                subroutine explicit
! if not contained into
! a program/module with
! "implicit none"

implicit none

end subroutine explicit

              
Explicitly declare all variables
                TYPE[, attributes] [::] variable

              

where TYPE is an intrinsic type (or a derived one, not yet seen), attributes are optional attributes of the declaration and variable is, obviously, the variable name. The double colon :: is optional, but highly recommended!

Initialize variables at their declaration

It is possible, and highly recommended, to assign an initial value to all variables. It can be very helpful to initialize them at their declaration

                integer, parameter :: cars_number=100

              

Assignment at declaration is NOT ALWAYS POSSIBLE, nor always DESIRABLE: for example, initializing a local variable inside a procedure gives it the implicit SAVE attribute (see the gotchas later).

Constants types

Constants are classified into two categories:

  • literal constants, which do not have a name;
  • named constants, which obviously have a name.
Literal Constants
                  100
3.14159265
'Hello World!'
-3.134324390984353454e23 ! subtle mistakes...

                

Avoid using literal constants: MAGIC NUMBERS are not clear!

Named Constants
                  integer,       parameter :: cars_number=100
real,          parameter :: pi=3.14159265
character(12), parameter :: greetings='Hello World!'
real,          parameter :: wrong=-3.134324390984353454e23 ! subtle mistakes...

                

The PARAMETER attribute is your CONSTANT friend!

What?
Arrays are collections of values of the same type
How?

Arrays are declared either by means of the dimension attribute or by directly defining a dimension alongside the array name, e.g.

                      integer, dimension(2:15) :: array_with_explicit_bounds
integer, dimension(11)   :: array_with_implicit_bounds
integer                  :: array_with_explicit_dimensions(3)

                    

Why?
Highly efficient handling of collections of values, especially for contiguous (in memory) elements

Defaults
  • individual elements are accessed by subscripting the array;
  • default lower bound is 1 and can be omitted;
  • up to 7 dimensions;
  • in contrast to C/C++, Fortran arrays are column major.

Bad Practice
                      integer, dimension(5, 3) :: array ! avoid magic numbers
integer                  :: c
integer                  :: r
do r=1, 5
  do c=1, 3
    print*, array(r, c) ! wrong memory access
  enddo
enddo

                    
Good Practice
                      integer, parameter                           :: rows_number=5
integer, parameter                           :: cols_number=3
integer, dimension(rows_number, cols_number) :: array
integer                                      :: c
integer                                      :: r
do c=1, cols_number
  do r=1, rows_number
    print*, array(r, c) ! right column-major memory access
  enddo
enddo

                    
              real :: a(0:20), b(3, 0:5, -10:10)

            
Terminology
  • rank: number of dimensions;
                  `a` has rank 1 and `b` has rank 3

                
  • bounds: upper and lower limits of each dimension of the array;
                  `a` has bounds 0:20 and `b` has bounds 1:3, 0:5 and -10:10

                
  • extent: number of elements in each dimension;
                  `a` has extent 21 and `b` has extents 3, 6 and 21

                
  • size: total number of elements;
                  `a` has size 21 and `b` has size 378

                
  • shape: the shape of an array is the list of its extents (one per dimension);
                  `a` has shape (21) and `b` has shape (3, 6, 21)

                

Note that

  • arrays are conformable if they share a shape;
  • the bounds do not have to be the same
                  c(4:6) = d(1:3)

                
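These quantities can be checked directly with the array inquiry intrinsics; a minimal sketch using the a and b declared above:

  real :: a(0:20), b(3, 0:5, -10:10)

  print *, lbound(a), ubound(a) ! 0  20
  print *, size(a), size(b)     ! 21  378
  print *, shape(b)             ! 3  6  21
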
Arrays Computations, (one of the place) where Fortran wins!

Array handling is superb in Fortran; computations on arrays are very efficient

                  integer, parameter              :: elems_number=10
real, parameter                 :: delta=1.0/elems_number
real, dimension(0:elems_number) :: abscissa
real, dimension(0:elems_number) :: temperature
integer                         :: e

! array computations

! array constructor
abscissa = [(delta*e, e=0, elems_number)]

! elemental function operating on each array element
temperature = sin(abscissa)/2.0

print "(A)", "Abscissa, Temperature"
do e=0, elems_number
  print "(E15.6, 1X, E15.6)", abscissa(e), temperature(e)
enddo

                
Arrays Allocations, dynamic memory handling

Arrays can have a statically defined size (at compile time) or a dynamic size (at run time). Dynamic allocation allows efficient handling of memory with very little (negligible or null) overhead: dynamic array allocation is not evil!

                  integer                         :: elems_number
real                            :: delta
real, allocatable, dimension(:) :: abscissa
real, allocatable, dimension(:) :: temperature
integer                         :: e

! input from user
print "(A)", "Enter the number of elements"
read *, elems_number
delta=1.0/elems_number

! allocate arrays
allocate(abscissa(0:elems_number))
allocate(temperature(0:elems_number))

! array computations

! array constructor
abscissa = [(delta*e, e=0, elems_number)]

! elemental function operating on each array element
temperature = sin(abscissa)/2.0

print "(A)", "Abscissa, Temperature"
do e=0, elems_number
  print "(E15.6, 1X, E15.6)", abscissa(e), temperature(e)
enddo

                
Finite Precision Issue
  • computers have a finite memory => finite precision representation of data
  • particularly important for numbers: algorithms (especially those not robust/well posed) can be affected by round-off problems
Subtle mistakes
                  real,    parameter :: pi_greek=3.14159265358979323    ! you will not get what you expect...
real*8,  parameter :: pi_greek=3.14159265358979323    ! still wrong...
real*16, parameter :: pi_greek=3.14159265358979323    ! no improvements, still wrong...
real*16, parameter :: pi_greek=3.14159265358979323_16 ! sometimes works, sometimes fails...

                
How to Deal with Finite Precision?
  • BE Parametric;
  • BE Portable;
  • BE Conscious (of what you are doing).

USE KIND SPECIFICATION!

Specify the KIND of all intrinsic types, BE Parametric!
                  integer(kind=...)            :: a_parametric_integer
real(kind=...)               :: a_parametric_real
character(kind=..., len=...) :: a_parametric_character
logical(kind=...)            :: a_parametric_logical
complex(kind=...)            :: a_parametric_complex

                

The default value of the kind of each intrinsic type is PROCESSOR DEPENDENT: do not expect to obtain the same default kind on all OS/Compiler/CPU combinations (hereafter architecture).

Which kind parameter should I use?

Again, Magic Numbers are the Evil also for Kinds!
                  integer(kind=4) :: a_bad_parametric_integer
real(kind=4)    :: a_bad_parametric_real
...

                

The definition KIND=4 is not a standard, portable definition; it does not mean anything you can rely on: on some (indeed most) architectures it means 4 bytes, but this is not guaranteed.

Let Compiler select kinds for you, BE Portable!

By means of the built-in functions selected_xxx_kind

                  ! parametric portable kinds
integer, parameter        :: I4 = selected_int_kind(9)             ! 4-bytes
integer, parameter        :: R8 = selected_real_kind(15,307)       ! 8-bytes
integer, parameter        :: CK = selected_char_kind(name='ASCII') ! ascii kind

! parametrized portable definition of variables
integer(kind=I4)          :: a_good_parametric_integer
real(kind=R8)             :: a_good_parametric_real
character(kind=CK, len=2) :: a_good_parametric_character

                

By means of the intrinsic module (F03+) iso_fortran_env

                  ! parametric portable kinds
use, intrinsic :: iso_fortran_env, only : I4 => int32, R8 => real64, character_kinds
integer, parameter        :: CK = character_kinds(1)

! parametrized portable definition of variables
integer(kind=I4)          :: a_good_parametric_integer
real(kind=R8)             :: a_good_parametric_real
character(kind=CK, len=2) :: a_good_parametric_character

                
Literal Constants have their own kind, BE Conscious!
                  use, intrinsic :: iso_fortran_env, only : R8 => real64

implicit none
real,          parameter :: pi_greek_1=3.14159265358979323
real*8,        parameter :: pi_greek_2=3.14159265358979323
real(kind=R8), parameter :: pi_greek_3=atan(1.)*4.
real(kind=R8), parameter :: pi_greek_4=atan(1._R8)*4._R8

print '(E23.15E3)', pi_greek_1
print '(E23.15E3)', pi_greek_2
print '(E23.15E3)', pi_greek_3
print '(E23.15E3)', pi_greek_4

                

What do you expect? The output could be surprising…

                   0.314159274101257E+001
 0.314159274101257E+001
 0.314159274101257E+001
 0.314159265358979E+001

                
Arithmetic Operators
  • +: addition
  • -: subtraction
  • *: multiplication
  • /: division
  • **: exponentiation
Relational Operators
  • ==: equal to
  • /=: not equal to
  • <: less than
  • <=: less than or equal to
  • >: greater than
  • >=: greater than or equal to
Logical Operators
  • .and.: intersection
  • .or.: union
  • .not.: negation
  • .eqv.: logical equivalence
  • .neqv.: exclusive or
Character Operators
  • //: concatenation
Operators Precedence
  • all operator evaluation is carried out from left to right
  • arithmetic operators have the highest precedence while logical operators have the lowest precedence
  • the order of operator precedence can be changed using parentheses, '(' and ')'
  • a user can define his/her own operators:
    • a user-defined monadic (unary) operator has a higher precedence than arithmetic operators, while
    • user-defined dyadic (binary) operators have a lower precedence than logical operators.
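
A small illustration of the default precedence (the values are only for demonstration):

  real    :: x = 2.0
  logical :: in_range

  print *, 1.0 + 2.0*x**2          ! ** before *, * before +: prints 9.0
  print *, (1.0 + 2.0)*x**2        ! parentheses change the order: prints 12.0
  in_range = x > 1.0 .and. x < 3.0 ! relational before logical: .true.
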
[Figure: table of Fortran intrinsic functions]
A very large set of (optimized) intrinsic functions, especially for mathematical computations
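
For instance, a minimal sketch (the values are arbitrary):

  real :: samples(5) = [1.0, 4.0, 9.0, 16.0, 25.0]

  print *, sqrt(samples)                        ! elemental: applied element-by-element
  print *, sum(samples), maxval(samples)        ! array reductions
  print *, exp(1.0), log(10.0), atan2(1.0, 1.0) ! scalar math intrinsics
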
If (then-else) construct

The King of all conditionals, the if construct. The minimal usage

                  if (expression) statement

                

where expression is a valid logical expression. If the expression is .true. (when evaluated), the statement is executed. A slightly more complex usage

                  [name_of_if:] if (expression1) then
  statement1
else if (expression2) then
  statement2
else
  statement3
end if [name_of_if]

                
  • the if-then-else construct allows branching of algorithms;
  • each logical expression (expression1, expression2, ...) is evaluated in sequence;
  • when a statement block is executed, the if-then-else construct is left (i.e. control goes to end if);
  • the last else block is executed (if present) if all the previous expressions are .false..
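
A concrete (named) example, assuming a real variable temperature is already defined:

  state: if (temperature > 100.0) then
    print '(A)', 'gas'
  else if (temperature > 0.0) then
    print '(A)', 'liquid'
  else
    print '(A)', 'solid'
  end if state
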
select case construct

Often the if-then-else construct is not the optimal branching control; the select case construct can be more efficient

                  [name_of_case:] select case(expression)
case(selectors1)
  statement1
case(selectors2)
  statement2
case default
  statement3
end select [name_of_case]

                

where expression is a generic expression (not only of logical type) and selectors1, selectors2,... are lists of (constant) values or ranges of values.

  • a range of values is expressed by a colon, e.g. case(2:11) means all cases where expression is in [2,11];
  • the values in selectors must be unique;
  • the expression is evaluated only one time! (efficient branching);
  • the first matching case is executed;
  • the case default block is executed (if present) if none of the previous cases match.
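
A concrete example, assuming an integer variable month is already defined:

  select case(month)
  case(12, 1, 2)
    print '(A)', 'winter'
  case(3:5)
    print '(A)', 'spring'
  case(6:8)
    print '(A)', 'summer'
  case(9:11)
    print '(A)', 'autumn'
  case default
    print '(A)', 'not a valid month'
  end select
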
do loop (definite)

The minimal definite loop is of the form

                  [name_of_do:] do counter=lower_bound, upper_bound [, stride]
end do [name_of_do]

                

where counter is an integer variable, lower/upper_bound are the loop bounds and the optional stride is the stride used (default 1). If the stride is negative a count-down is performed, e.g.

                  [name_of_do:] do counter=upper_bound, lower_bound, negative_stride
end do [name_of_do]

                
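A couple of concrete examples (the counter i is assumed to be declared as an integer):

  do i=1, 10, 2 ! i takes 1, 3, 5, 7, 9
    print *, i
  end do

  countdown: do i=3, 1, -1 ! i takes 3, 2, 1
    print *, i
  end do countdown
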
do while loop (conditional)
                  [name_of_do:] do while (expression)
end do [name_of_do]

                

where expression is evaluated at the beginning of each loop step and if .true. the statements are executed.
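
A minimal sketch: halve a residual until it drops below a tolerance

  real :: residual

  residual = 1.0
  do while (residual > 1.0e-6)
    residual = residual/2.0
  end do
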

do loop (indefinite)

It is possible, and often desirable, to loop indefinitely and check for the occurrence of some condition at run time

                  [name_of_do:] do
end do [name_of_do]

                

The loop execution is controlled by means of the exit statement (to break the loop) and the cycle statement (to continue the loop with the next iteration)

                indefinite: do
  if (expression1) exit indefinite
  if (expression2) cycle indefinite
end do indefinite

              

If the name of the loop is omitted, the innermost enclosing loop is exited or cycled.

IO Facility
  • IO is accomplished by operations on files
  • but files are not strictly files in Fortran…
  • files are handled by unit numbers;
  • there are also internal units for data transfer (see the sketch below);
  • Input => read, Output => print, write;
  • IO can be formatted or unformatted;
  • File => open, close;
  • inquire;
  • user-defined derived-type IO

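A small sketch of the internal units mentioned above: a character variable is used as the IO target, which is handy for number-to-string conversion

  character(32) :: buffer
  integer       :: step

  step = 42
  write(buffer, '(A, I0)') 'step-', step ! write INTO the string (internal unit)
  print '(A)', trim(buffer)              ! prints "step-42"
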
read has a long list of arguments…

Sequential

Formatted

                      READ (eunit, format [, advance] [, asynchronous]
      [, blank] [, decimal] [, id] [, pad] [, pos]
      [, round] [, size] [, iostat] [, err] [, end]
      [, eor] [, iomsg]) [io-list]

                    

Formatted, List-Directed

                      READ (eunit, *[, asynchronous] [, blank]
      [, decimal] [, id] [, pad] [, pos]
      [, round] [, size] [, iostat] [, err] [, end]
      [, iomsg]) [io-list]

                    

Formatted, Namelist

                      READ (eunit, nml-group[, asynchronous] [, blank]
      [, decimal] [, id] [, pad] [, pos] [, round]
      [, size] [, iostat] [, err] [, end] [, iomsg])

                    

Unformatted

                      READ (eunit [, asynchronous] [, id] [, pos]
      [, iostat] [, err][, end] [, iomsg]) [io-list]

                    

Direct-Access

Formatted

                      READ (eunit, format, rec [, asynchronous] [, blank]
      [, decimal] [, id] [, pad] [, pos] [, round]
      [, size] [, iostat] [, err] [, iomsg])
      [io-list]

                    

Unformatted

                      READ (eunit, rec [, asynchronous] [, id] [, pos]
      [, iostat] [, err] [, iomsg]) [io-list]

                    
Internal
                      READ (iunit, format [, nml-group] [, iostat] [, err]
      [, end] [, iomsg]) [io-list]

                    

Namelist

                      READ (iunit, nml-group [, iostat] [, err] [, end]
      [, iomsg]) [io-list]

                    

write has a long list of arguments too…

Sequential

Formatted

                      WRITE (eunit, format [, advance] [, asynchronous]
       [, decimal] [, id] [, pos] [, round]
       [, sign] [, iostat] [ , err] [, [iomsg])
       [io-list]

                    

Formatted, List-Directed

                      WRITE (eunit, * [, asynchronous] [, decimal]
       [, delim] [, id] [, pos] [, round] [, sign]
       [, iostat] [, err] [, iomsg])[io-list]

                    

Formatted, Namelist

                      WRITE (eunit, nml-group [, asynchronous] [, decimal]
       [, delim] [, id] [, pos] [, round] [, sign]
       [, iostat] [, err] [, iomsg])

                    

Unformatted

                      WRITE (eunit [, asynchronous] [, id] [, pos]
       [, iostat] [, err] [, iomsg])[io-list]

                    

Direct-Access

Formatted

                      WRITE (eunit, format, rec [, asynchronous]
       [, decimal] [, delim] [, id] [, pos]
       [, round] [, sign] [, iostat] [, err]
       [, iomsg])[io-list]

                    

Unformatted

                      WRITE (eunit, rec [, asynchronous] [, id] [, pos]
       [, iostat] [ , err] [, iomsg])[io-list]

                    
Internal
                      WRITE (iunit, format [, nml-group] [, iostat]
       [ , err] [, iomsg])[io-list]

                    

Namelist

                      WRITE (iunit, nml-group [, iostat] [ , err]
       [, iomsg])[io-list]

                    

Avoid magic numbers, again!
Units 5 and 6 are usually standard input and output: a better approach is to use iso_fortran_env
Bad Practice
                  read(5, *) user_input
if (error_occur) then
  write(*, '(A)') error_message
else
  write(6, '(A)') output_message
endif

                
Good Practice
                  use, intrinsic :: iso_fortran_env, only : error_unit, input_unit, output_unit
read(input_unit, *) user_input
if (error_occur) then
  write(error_unit, '(A)') error_message
else
  write(output_unit, '(A)') output_message
endif

                
The basics
  • open, connect Fortran logical unit to a file or device; declare attributes for read and write operations;
  • close, terminate connection between logical unit and file or device
  • rewind, position sequential or direct access file at the beginning of the file (the initial point);
  • inquire, request information on the status of specified properties of a file or logical unit.
Open
                      OPEN ([UNIT=] io-unit[, FILE= name] [, ERR= label]
      [, IOMSG=msg-var] [, IOSTAT=i-var], slist)

                    

where slist is a list of one or more specifiers, e.g. access='stream', form='unformatted', etc…
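
A minimal open/write/close sketch (the file name results.dat is just an example):

  integer :: file_unit, ios

  open(newunit=file_unit, file='results.dat', form='formatted', &
       action='write', status='replace', iostat=ios)
  if (ios /= 0) stop 'cannot open results.dat'
  write(file_unit, '(A)') 'hello file'
  close(file_unit)
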

Close
                      CLOSE ([UNIT=] io-unit[, STATUS = p] [, ERR= label]
       [, IOMSG=msg-var] [, IOSTAT=i-var])

                    
Rewind
                      REWIND ([UNIT=] io-unit[, ERR= label]
        [, IOMSG=msg-var] [, IOSTAT=i-var])

                    

Inquire

Inquiring by File

                      INQUIRE (FILE=name[, ERR=label] [, ID=id-var]
         [, IOMSG=msg-var] [, SIZE=sz]
         [, IOSTAT=i-var] slist)

                    

Inquiring by Unit

                      INQUIRE ([UNIT=]io-unit [, ERR=label] [, ID=id-var]
         [, IOMSG=msg-var] [, SIZE=sz]
         [, IOSTAT=i-var] slist)

                    

Inquiring by Output List

                      INQUIRE (IOLENGTH=len) out-item-list

                    

KISS, Keep It Simple and Stupid!
  • avoid overly long logic: it is difficult to debug and maintain;
  • split complex goals into a set of small and simple sub-goals;
  • exploit Fortran subprogram units, i.e. subroutine and function.

This allows you to

  • re-use code-blocks;
  • repeat the same operations on different datasets;
  • hide (local) implementation details for a safer namespace.
Bad Practice
                      do i=1, N
  a(i) = cos(b(i-1)) + sin(c(i+1))
end do
do i=1, N
  d(i) = cos(e(i-1)) + sin(f(i+1))
end do
do i=1, N
  g(i) = cos(h(i-1)) + sin(p(i+1))
end do

                    
Good Practice
                      subroutine do_something(a, b, c)
...
end subroutine do_something

call do_something(a=a, b=b, c=c)
call do_something(a=d, b=e, c=f)
call do_something(a=g, b=h, c=p)

                    

Declaration
                  [prefix [prefix]] subroutine subroutine_name [(d-arg-list)] [lang-binding]
! modules access, if any
use module_1
...
implicit none
! variables (dummy arguments and local ones) definition
...
! executable statements
[contains]
  ! internal subprogram units
end subroutine subroutine_name

                

where

  • prefix (optional) is one of elemental, impure, module, pure, recursive;
  • d-arg-list (optional) is the list of one or more dummy arguments;
  • lang-binding (optional) is the C binding BIND (C[, name=ext-name]).
Usage

Exchange data through its dummy arguments list or by IO (impure)

                  call subroutine_name [(d-arg-list)]

                
Declaration
                  [prefix [prefix]] FUNCTION function_name [(d-arg-list)] [suffix] [lang-binding]
! modules access, if any
use module_1
...
implicit none
! variables (dummy arguments and local ones) definition
...
! executable statements
[contains]
  ! internal subprogram units
end function function_name

                

where

  • prefix (optional) is one of data type-specifier, elemental, impure, module, pure, recursive;
  • d-arg-list (optional) is the list of one or more dummy arguments;
  • suffix (optional) is result(result_name);
  • lang-binding (optional) is the C binding BIND (C[, name=ext-name]).
Usage

Exchange data through its returned value or by dummy arguments list (impure) or by IO (impure)

                  my_result = function_name([d-arg-list])

                
Specialities
  • call by reference: only the memory addresses of the arguments are passed;
    • any change to arguments changes the actual argument;
  • always specify the intent of dummy arguments;
  • compile-time checking of calling-signature mistakes is possible only if the procedure interface is explicit;
    • the actual and dummy arguments must correspond in type, kind and rank.
Interface
  • the interface block is a powerful structure introduced in Fortran 90;
  • it gives a calling procedure full knowledge of the types and characteristics of the dummy arguments;
  • it provides a way to execute some safety checks when compiling the program;
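
A minimal sketch of an interface block for an external procedure (cross_section is a hypothetical function; when procedures live in modules, the explicit interface is provided automatically):

  interface
    function cross_section(energy) result(sigma)
      real, intent(in) :: energy
      real             :: sigma
    end function cross_section
  end interface
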
Intent
  • the intent attribute was introduced in Fortran 90 and is recommended as it
    • allows compilers to check for coding errors;
    • facilitates efficient compilation and optimization;
  • declare if a dummy argument is
    • Input: intent(in)
    • Output: intent(out)
    • Both: intent(inout)
  • a dummy declared as intent(in) in a procedure cannot be changed during the execution of the procedure.
Optional
  • allow defaults to be used for missing arguments
  • make some procedures easier to use
  • once an argument has been omitted all subsequent arguments must be keyword arguments
  • the present intrinsic can be used to check for missing arguments
                  subroutine get_temperature(vel, boltz, temperature)
use parameters
real(dp), dimension(:,:), intent(in)  :: vel
real(dp), optional,       intent(in)  :: boltz
real(dp),                 intent(out) :: temperature
integer(ip)                           :: i
real(dp)                              :: ke
real(dp)                              :: kb ! local Boltzmann constant

if (present(boltz)) then
  kb = boltz
else
  kb = boltz_parameter
endif

ke = 0._dp
do i = 1, natom
   ke = ke + dot_product(vel(i,:), vel(i,:))
end do
temperature = mass * ke / (3._dp * kb * real(natom - 1, dp))

end subroutine get_temperature

                
Keyword Arguments
  • allow arguments to be specified in any order
  • make it easy to add an extra argument - no need to modify any calls
  • helps improve readability of the program
  • are used when a procedure has optional arguments
  • once a keyword is used, all subsequent arguments must be keyword arguments
                  ! equivalent callings
call get_temperature(velocity, boltzman, temp)
call get_temperature(vel=velocity, boltz=boltzman, temperature=temp)
call get_temperature(vel=velocity, temperature=temp, boltz=boltzman)
call get_temperature(velocity, temperature=temp, boltz=boltzman)

                

Always prefer KEYWORD ARGUMENT style!

                  ! a lot of this kind of callings
call baseline_sub(arg1=foo, arg2=bar, arg4=baz)
...
! adding a new optional argument to baseline_sub does not
! require modifying all the previous calls

! new callings
call baseline_sub(arg1=foo, arg2=bar, arg4=baz, arg3=fofo)

                
Modern Style
  • free format
  • attributes
  • implicit none
  • do, exit, cycle, case
Fixing the flaws
  • allocatable arrays
  • derived types
Module-oriented Programming
  • internal subprograms
  • private, public, protected
  • contains
  • use
  • explicit interfaces
  • optional arguments & intent
Formula translation
  • array syntax, where statement
  • extended & user-defined operators
Odds
  • fortran pointers
  • command line arguments
  • environment variables
  • preprocessor
  • interoperability with C (binding)
A lot of new features…
  • Derived type enhancements: parameterized derived types, improved control of accessibility, improved structure constructors, and finalizers.
  • Object oriented programming support: type extension and inheritance, polymorphism, dynamic type allocation, and type-bound procedures.
  • Data manipulation enhancements: allocatable components, deferred type parameters, VOLATILE attribute, explicit type specification in array constructors and allocate statements, pointer enhancements, extended initialization expressions, and enhanced intrinsic procedures.
  • Input/output enhancements: asynchronous transfer, stream access, user specified transfer operations for derived types, user specified control of rounding during format conversions, named constants for preconnected units, the flush statement, regularization of keywords, and access to error messages.
  • Procedure pointers.
  • Support for the exceptions of the IEEE Floating Point Standard (IEEE 1989).
  • Interoperability with the C programming language.
  • Support for international usage: access to ISO 10646 4-byte characters and choice of decimal or comma in numeric formatted input/output.
  • Enhanced integration with the host operating system: access to command line arguments, environment variables, and processor error messages.
A minor revision of Fortran 2003, but still A lot of new features…
  • do concurrent construct
  • contiguous attribute
  • block construct
  • error stop statement
  • internal procedures can be passed as actual arguments
  • procedure pointers can point to an internal procedure
  • maximum rank increased to 15
  • newunit= in open statement
  • g0 edit descriptor
  • unlimited format item
  • new intrinsic (acosh, asinh…)
  • shell commands: execute_command_line
  • iso_fortran_env: compiler_version and compiler_options
… but the 2 revolutions are
  • submodules
  • coarrays
Module, what?
  • introduced in Fortran 90
  • allow to write object based code
  • a module is a program unit whose functionality can be exploited by other programs, which attach to it via the use statement.
  • can contain
    • global object declaration: replaces Fortran 77 COMMON and INCLUDE statements
    • interface declaration
    • procedure declaration (since modules already contain explicit interface, an interface statement is not required)
  • within a module, functions and subroutines are called module procedures
  • module procedures can contain internal procedures
  • module objects that retain their values should be given a save attribute
  • modules can be used by procedures and other modules
  • modules can be compiled separately
  • by default, all module entities are public
  • to restrict the visibility of the module procedure only to the module, use the private statement
Bad Practice

Old-fashioned Fortran library

                      subroutine foo(arg1, arg2, arg3)
real    :: arg1, arg2(*), arg3
integer :: i
! GOD is real...
god = sum(arg2)
i = bar(arg2)
arg3 = god/i
end subroutine foo

integer function bar(arg)
real :: arg(*)
bar = size(arg, dim=1)
end function bar

                    

Good Practice

Modern, robust Fortran module library

                      module library
implicit none
private
public :: foo
contains
  pure subroutine foo(arg1, arg2, arg3)
    real, intent(in)  :: arg1
    real, intent(in)  :: arg2(:)
    real, intent(out) :: arg3
    real              :: god
    integer           :: i
    god = sum(arg2)
    i = bar(arg2)
    arg3 = god/i
  end subroutine foo

  pure function bar(arg) result(bar_size)
    real, intent(in) :: arg(:)
    integer          :: bar_size
    bar_size = size(arg, dim=1)
  end function bar
end module library

                    

Must-have of Module

Always start with Implicit none statement

                  module library
! optional use-association of other modules
implicit none
end module library

                

Hide ALL! Expose only the minimum

                  module library
private ! all PRIVATE!
public :: what_really_concern ! expose only what really concern
protected :: read_only_objects
end module library

                

Rename entities on use-association for names-collision handling

                  module library
use another_library, only : external_concern => what_really_concern
private
public :: what_really_concern
end module library

                
Avoid Globals, as much as possible!
  • once upon a time there was COMMON BLOCK… the evil!
  • now Module can be used as a more robust GLOBALS CONTAINER
                      module globals
implicit none
private
public :: foo, bar, baz, initialize
protected :: baz

integer :: foo = 0
real    :: bar = 0.0
logical :: baz = .false.

contains
  subroutine initialize
  foo = 1
  bar = 2.0
  baz = .true.
  end subroutine initialize
end module globals

                    

Pros
  • all details of the data structure & order is isolated to a single source file. No need to replicate COMMON BLOCKs throughout each subroutine or use the non-Fortran 77 INCLUDE as a work-around.
  • avoid problems with unequal COMMON BLOCK definitions.
  • can use all data variables by default, or select and rename a subset.
  • encapsulate data together methods!

Cons
  • all the cons of Globals…
    • thread-un-safe: all use-associated users can (concurrently) modify public data;
    • passing globals by use-association obscures the signatures of procedures;

The revolution
  • extending the language!
  • strongly couple data-structure definition and methods operating on them!
    • safety
    • clearness
    • conciseness
    • reusability
How?
  • defined by user
  • can include different intrinsic types and other derived types
  • components are accessed using the percent operator (%)
  • by default, only the assignment operator (=) is defined for derived types
                  type :: line
  real :: x(2) ! abscissa
  real :: y(2) ! ordinate
end type line

type(line) :: a_line

                
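Component access and intrinsic assignment in practice, a minimal sketch using the line type above:

  type(line) :: a_line, another_line

  a_line%x = [0.0, 1.0] ! components are accessed with %
  a_line%y = [0.0, 2.0]
  another_line = a_line ! intrinsic assignment copies all components
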
Pack related-data
Exploit derived types for less error-prone procedure usage
Bad Practice
                      integer           :: Ni
integer           :: Nj
integer           :: Nk
real, allocatable :: centers(:,:,:)
real, allocatable :: nodes(:,:,:)
real, allocatable :: normals(:,:,:)
real, allocatable :: faces(:,:,:)
real, allocatable :: volumes(:,:,:)

call compute_metrics(Ni, Nj, Nk,    &
                     centers,nodes, &
                     normals, faces, &
                     volumes)

                    

Good Practice
                        type :: mesh
  integer           :: Ni = 0
  integer           :: Nj = 0
  integer           :: Nk = 0
  real, allocatable :: centers(:,:,:)
  real, allocatable :: nodes(:,:,:)
  real, allocatable :: normals(:,:,:)
  real, allocatable :: faces(:,:,:)
  real, allocatable :: volumes(:,:,:)
end type mesh
type(mesh) :: a_mesh
call compute_metrics(grid=a_mesh)

                      

Data hiding and Encapsulation
Exploit derived types to avoid globals
Bad Practice
                      module globals
implicit none
integer           :: Ni
integer           :: Nj
integer           :: Nk
real, allocatable :: centers(:,:,:)
real, allocatable :: nodes(:,:,:)
real, allocatable :: normals(:,:,:)
real, allocatable :: faces(:,:,:)
real, allocatable :: volumes(:,:,:)
end module globals

program bad
use globals
implicit none
call compute_metrics(Ni, Nj, Nk,    &
                     centers,nodes, &
                     normals, faces, &
                     volumes)
end program bad

                    

Good Practice
                        module mesh_t
implicit none
private
public :: mesh
type :: mesh
  integer           :: Ni = 0
  integer           :: Nj = 0
  integer           :: Nk = 0
  real, allocatable :: centers(:,:,:)
  real, allocatable :: nodes(:,:,:)
  real, allocatable :: normals(:,:,:)
  real, allocatable :: faces(:,:,:)
  real, allocatable :: volumes(:,:,:)
end type mesh
end module mesh_t
program good
use mesh_t, only: mesh
implicit none
type(mesh) :: a_mesh
call compute_metrics(grid=a_mesh)
end program good

                      

The Problem

Apply the same PROCEDURE to different arguments (for type, kind and/or rank), e.g. convert a number to a string

                  integer :: step_number
real    :: time
real    :: delta
integer :: unit
integer :: t
delta = 0.1
time  = 0.0
do t=1, 100
  time = time + delta
  open(newunit=unit, file='clock-'//string(t)//'.dat')
  write(unit, "(A)") 'current time: '//string(time)
  close(unit)
enddo

                

Say you want the STRING function to accept both the integer t and the real time, in the same way as the many built-ins like cos, sin, etc... You can develop a STRING_INTEGER and a STRING_REAL, but what if you also need to accept different kinds? You then need STRING_INTEGER_8, STRING_INTEGER_16, STRING_INTEGER_32... and a different call for each argument type/kind/rank,

                ... file='clock-'//string_integer_32(t)//'.dat')
... 'current time: '//string_real_32(time)

              

Too ERROR PRONE!

The Solution

Generic Names avoid the plethora of different names by exploiting dispatch resolved at compile time (no penalty!)

                      module generic
implicit none
private
public :: string
interface string
  module procedure string_integer, string_real
endinterface
contains
  elemental function string_integer(n) result(str)
  integer, intent(in) :: n
  character(11)       :: str
  write(str, '(I11)') n
  end function string_integer
  elemental function string_real(n) result(str)
  real, intent(in) :: n
  character(13)    :: str
  write(str, '(E13.6E2)') n
  end function string_real
end module generic

                    

Very CONCISE, CLEAR and ROBUST

                      use generic
integer :: step_number
real    :: time
real    :: delta
integer :: unit
integer :: t
delta = 0.1
time  = 0.0
do t=1, 100
  time = time + delta
  open(newunit=unit, &
    file='clock-'//string(t)//'.dat')
  write(unit, "(A)") &
    'current time: '//string(time)
  close(unit)
enddo

                    
PENF, a concrete application
Portability Environment for Fortran poor people

A KISS library for exploiting codes portability for modern (2003+) Fortran projects

https://github.com/szaghi/PENF

The Problem

Create an algebra for your new shining type, e.g. for your vector class

                  type, public :: Vector
  real :: x = 0.0 ! Cartesian component in x direction.
  real :: y = 0.0 ! Cartesian component in y direction.
  real :: z = 0.0 ! Cartesian component in z direction.
end type Vector

                

It would be fantastic to be able to do VECTORIAL COMPUTATION… wouldn't it?

                type(vector) :: vector1
type(vector) :: vector2
type(vector) :: vector3
...
vector3 = vector1 + vector2
vector3 = vector1 - vector2
vector3 = 2.0*vector1 + vector2
vector3 = vector1 - 1.4
vector3 = vector1.cross.vector2
vector3 = vector1.dot.vector2

              

Well, you can! Exploit OPERATORS OVERLOADING (without penalties!)

The Solution
                      module vector_t
implicit none
private
public :: vector, operator (+), operator(.cross.)
type, public :: Vector
  real :: x = 0.0 ! Cartesian component in x direction.
  real :: y = 0.0 ! Cartesian component in y direction.
  real :: z = 0.0 ! Cartesian component in z direction.
end type Vector
interface operator (+)
  module procedure add
end interface
interface operator (.cross.)
  module procedure crossproduct
end interface
contains
  elemental function add(left, right) result(summ)
  type(Vector), intent(in) :: left
  type(Vector), intent(in) :: right
  type(Vector)             :: summ
  summ%x = left%x + right%x
  summ%y = left%y + right%y
  summ%z = left%z + right%z
  end function add
  elemental function crossproduct(vec1, vec2) result(cross)
  type(Vector), intent(in) :: vec1
  type(Vector), intent(in) :: vec2
  type(Vector)             :: cross
  cross%x = (vec1%y * vec2%z) - (vec1%z * vec2%y)
  cross%y = (vec1%z * vec2%x) - (vec1%x * vec2%z)
  cross%z = (vec1%x * vec2%y) - (vec1%y * vec2%x)
  end function crossproduct
end module vector_t

                    
VecFor, a concrete application
Vector algebra class for Fortran poor people

A KISS pure Fortran OOD class for computing Vectorial (3D) algebra

https://github.com/szaghi/VecFor

Bad Practice
                      subroutine print_char(this, header)
character(len=*),  intent(in) :: this
logical, optional, intent(in) :: header
if (present(header).and.header) then
  print *, 'This is the header'
endif
print *, this
end subroutine print_char

                    
Good Practice
                      subroutine print_char(this, header)
character(len=*),  intent(in) :: this
logical, optional, intent(in) :: header
if (present(header)) then
  if (header) print *, 'This is the header'
endif
print *, this
end subroutine print_char

                    
Why?
The compiler is allowed to evaluate the header argument before the present function is evaluated. If the header argument is not in fact present an out of bounds memory reference could occur, which could cause a failure.
Bad Practice
                      program intent_gotcha
type mytype
  integer :: x
  real    :: y
end type mytype
type (mytype) :: a
a%x = 1 ; a%y = 2.
call assign(a)
! a%y COULD BE UNDEFINED HERE
print *, a
contains
  subroutine assign(this)
  type (mytype), intent (out) :: this
  this%x = 2
  end subroutine assign
end program intent_gotcha

                    
Good Practice
                      program intent_gotcha
type mytype
  integer :: x
  real    :: y
end type mytype
type (mytype) :: a
a%x = 1 ; a%y = 2.
call assign(a)
print *, a
contains
  subroutine assign(this)
  type (mytype), intent (out) :: this
  this%x = 2 ; this%y = 2.
  end subroutine assign
end program intent_gotcha

                    
Why?
When intent(out) is used with a derived type, any component not assigned in a procedure could become undefined on exit. For example, even though a%y was defined on entry to this routine, it could become undefined on exit because it was never assigned within the routine.
Bad Practice
                      real function kinetic_energy(v)
real, dimension(:), intent(in) :: v
integer                        :: i
real                           :: ke = 0.0
do i = 1, size(v)
   ke = ke + v(i)**2
enddo
kinetic_energy = .5*ke
end function kinetic_energy

                    
Good Practice
                      real function kinetic_energy(v)
real, dimension(:), intent(in) :: v
integer                        :: i
real                           :: ke
ke = 0.
do i = 1, size(v)
   ke = ke + v(i)**2
enddo
kinetic_energy = .5*ke
end function kinetic_energy

                    
Why?
A local variable that is initialized when declared has an implicit SAVE attribute. ke is initialized only the first time the function is called. On subsequent calls the old value of ke is retained.
Paradigms… what?

As poor Fortran people we are used to simply translating formulas; we do not care about paradigms… What paradigm is the following?

                  subroutine solr(toll,Qm,gamma1,a1,u1,p1,rho1,gamma4,a4,u4,p4,rho4,u,p,a,fF)
  real*8 a2,p2,rho2,a3,p3,rho3,ps,us
  real*8 pn,pn_1
  real*8 a,u,p
  real*8 w1,w4,toll,Qm
  real*8 delta,gamma,gm1
  ...
  gm1=gamma-1.d0
  delta=gm1/2.d0
  alfa=(p1/p4)**(gm1/(2.d0*gamma))
  beta=alfa*delta/a1+delta/a4
  ...

                

Aside from being very ugly, the above example shows that data is HIGHLY DECOUPLED from the operations on it. This is typical of the PROCEDURAL PROGRAMMING paradigm

  • TOP to down approach
    • focus on the operations, not on the data
      • data handling unsafe, GLOBALS are often necessary
    • hard to add new functionality or change the work flow without going back and modifying all other parts of the program
    • not easily reusable, so programmers must often reinvent the wheel
    • difficult to maintain
Change your point of view!
  • down to TOP approach
  • key idea
                  The real world can be accurately described as a collection of OBJECTS that interact

                
  • the focus is on WHAT WE DO, not on HOW WE DO it:
    • the real world is made of OBJECTS, not of algorithms:
      • identify the base classes of objects that dominate the problem;
      • identify the rules of interaction between classes;
      • describe the problem by defining the dynamics of the classes.
How-To-OOP, the Pillars
  • Encapsulation;
  • Inheritance;
  • Polymorphism.
What?
Encapsulation is about grouping functionality (methods) and related data (members) together into a coherent data structure: a CLASS.

The IMPLEMENTATION is hidden from the rest of the program behind the Application Program Interface.

Do you need to know exactly how every aspect of a car works (engine, carburettor, alternator…)? No - you need to know how to use the steering wheel, brakes, accelerator…

In practice
  • restrict access to data (and data hiding):
    • more control;
    • easy to debug;
      • easy to maintain;
    • avoid side effects;
      • easy to parallelize.
The Consequence
  • design by-contract
    • modularize;
    • abstraction;
  • develop by tests.
Define a Contract

Define what your Object is and how it interacts

                      module gas_t
use vecfor, only : vector
implicit none
private
public :: gas ! Only the class GAS is public!
type :: gas
  private ! hide all members!
  ! the suffix '_' means internal implementation
  real         :: Cp_=0.0
  real         :: Cv_=0.0
  real         :: density_=0.0
  real         :: energy_=0.0
  type(vector) :: momentum_
  contains
    private
    ! the setter (constructor)
    procedure, public :: initialize
    ! return the members value (the getters)
    procedure, public :: density
    procedure, public :: momentum
    procedure, public :: energy
    ! interact with the rest of the World...
    procedure, public :: temperature
    procedure, public :: speed_of_sound
    ...
end type gas
contains
! follow the implementation
...
end module gas_t

                    

Respect the Contract!

Define your World using the Contract

                      use gas_t, only : gas
implicit none

type(gas) :: air     ! a 'gas' object (instance)
real      :: density ! a density value

! respect the contract, use only public methods
call air%initialize(Cp=1004.8, &
                    Cv=716.0,  &
                    density=0.125)

! a well designed API is self-explanatory...
if (air%speed()>air%speed_of_sound()) then
  ! do your supersonic stuffs...
end if

                    

So, where is encapsulation? The user of the GAS class

  • is not aware of how the speed of sound is computed…
    • he/she simply uses the speed of sound;
  • cannot manipulate gas data directly…
    • a bad user cannot manually make the density negative;
  • can re-use the same names defined in the gas_t module:
    • encapsulation allows easy handling of name collisions (safe namespaces).

What?
Inheritance enables new objects to take on the properties of existing objects, the BASE CLASS. A child inherits visible properties and methods from its parent while adding additional properties and methods of its own.
In practice
  • re-use existing classes:
    • reduce code-duplication:
      • easy to debug;
      • easy to maintain;
    • inherit members/methods;
    • re-use implementation;
  • class hierarchies;
    • abstract is better, but special cases happen…
    • specialize only when/where necessary.
The Consequence
  • design from down to UP…
    • from very down! Abstract all!
      • specialize => inherit from abstract to concrete cases
    • avoid reinventing the wheel
Extend a Previous Contract

Extend a Previous Contract adding new behaviors

                      module gas_multifluid_t
use gas_t, only : gas
implicit none
private
public :: gas_multifluid
! inherit from gas
type, extends(gas) :: gas_multifluid
  private
  integer           :: species_number=0
  real, allocatable :: concentrations(:)
  contains
    private
    ! overridden methods
    procedure, public :: initialize
    ...
end type gas_multifluid
contains
! follow the implementation
...
end module gas_multifluid_t

                    

Always respect the Contract!

Define your new World using the new Contract

                      use gas_t, only : gas
use gas_multifluid_t, only : gas_multifluid
implicit none
type(gas)            :: air
type(gas_multifluid) :: mixture
! respect the contract, use only public methods
call air%initialize(Cp=1004.8, &
                    Cv=716.0,  &
                    density=0.125)
call mixture%initialize(Cp=1004.8,            &
                        Cv=716.0,             &
                        species_number=3,     &
                        concentrations=[1.0,  &
                                        0.0,  &
                                        0.0], &
                        density=0.125)
! a well designed API is self-explanatory...
if (air%speed()>air%speed_of_sound()) then
  ! do your supersonic stuffs...
end if

                    

So, where is inheritance? The developer of the MULTIFLUID GAS class

  • does not need to re-implement all the behaviors of the gas class:
    • a multifluid gas is a special gas… avoid reinventing the wheel;

While the user of the MULTIFLUID GAS class

  • finds the same facilities as the previous gas class:
    • no need to re-learn from scratch;
  • can re-use his/her own applications designed for gas objects…

What?
Polymorphism is the concept that multiple types of objects might be able to work in a given situation.

If you needed to write a message on a piece of paper, you could use a pen, pencil, marker or even a crayon. You only require that the item you use can fit in your hand and can make a mark when pressed against the paper.

In other words, it means one method with multiple implementations for a certain class of actions. Which implementation is used is decided at compile time or at run time (static/dynamic). Overloading is static polymorphism, while overriding is dynamic polymorphism.

In practice
  • design for family of classes (objects);
  • exploit inheritance;
  • exploit encapsulation.
The Consequence
  • be ABSTRACT!
Define a Polymorphic Contract

Develop your algorithms for families of classes… as you do for your daily physics

                      module gas_fluxes_t
use gas_t, only : gas
use gas_multifluid_t, only : gas_multifluid
use vecfor, only : vector
implicit none
private
public :: fluxes
type :: fluxes
  real         :: mass_=0.0
  real         :: energy_=0.0
  type(vector) :: momentum_
  contains
 ...
   procedure, public :: compute
end type fluxes
contains
  subroutine compute(self, gas_cell)
  ! fluxes at interfaces to be computed
  class(fluxes), intent(inout) :: self
  ! gas into the cell from which compute fluxes
  class(gas),    intent(in)    :: gas_cell
  ...
  select type(gas_cell)
  class is (gas_multifluid)
  ...
  class is (gas)
  ...
  end select
  end subroutine compute
end module gas_fluxes_t

                    

Always respect the Contract… polymorphically it is even easier!
                        use gas_t, only : gas
use gas_multifluid_t, only : gas_multifluid
use gas_fluxes_t, only : fluxes
implicit none
type(gas)            :: air
type(gas_multifluid) :: mixture
type(fluxes)         :: air_fluxes
type(fluxes)         :: mixture_fluxes
! respect the contract, use only public methods
call air%initialize(Cp=1004.8, &
                    Cv=716.0,  &
                    density=0.125)
call mixture%initialize(Cp=1004.8,            &
                        Cv=716.0,             &
                        species_number=3,     &
                        concentrations=[1.0,  &
                                        0.0,  &
                                        0.0], &
                        density=0.125)
! a well designed API is self-explanatory…
if (air%speed()>air%speed_of_sound()) then
  call air_fluxes%compute(gas_cell=air)
  call mixture_fluxes%compute(gas_cell=mixture)
end if

                      

The Problems… are Abstract!
In mathematical/physical fields, problems are NATURALLY abstract… Let us consider the Ordinary Differential Equations (ODE) problem $$ \begin{cases} U_t = R(t, U) \\ U_0 = F \end{cases} $$ where

  • Ut = dU/dt is time derivative of a field U;
  • U is the vector of state variables being a function of the time-like independent variable t;
  • R is the (vectorial) residual function;
  • F is the (vectorial) initial conditions function.

Such a mathematical formulation is ubiquitous in the mathematical modelling of physical problems (fluid dynamics, chemistry, biology, evolutionary-anthropology, etc…)

U, R and F have different forms for each problem: in a procedural programming paradigm YOU NEED TO DEVELOP A DIFFERENT IMPLEMENTATION FOR EACH PROBLEM!… no, we can be smarter…

ODE Problem is Abstract!

The mathematical formulation of ODE is already ABSTRACT:

  • the state-field U is an abstract concept:
    • it is required to be (optionally) a function of time;
    • it is required to be integrable;
  • the residual function R is an abstract concept:
    • it is required to be (optionally) a function of time;
    • it is required to be a function of U;
Bad Practice

Develop an ODE solver for each system of equations:

  • Oscillation Equations (ODE);
  • Lorenz Equations (ODE);
  • Burgers Equations (PDE);
  • Euler Equations (PDE);
  • Navier-Stokes Equations (PDE);

Good Practice
Develop an ODE solver for an Abstract ODE system!

One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them

(Very) High-Level programming, the AIM

We want to code a formula like the following (Euler solver) $$ U(t+\Delta t) = U(t) + \Delta t U_t $$ into a Fortran code like the following

                  u = u + Dt * u%t(t=t)

                

The code is very, very similar to the Math… this is our aim!

Abstract Calculus Pattern will save us!
Abstract Calculus Pattern (ACP) is based on the Abstract Data Type (ADT) concept: the ADT is the base class upon which the (abstract) ODE problem is solved.
FOODIE, a real application

FOODIE, Fortran Object-Oriented Differential-equations Integration Environment

https://github.com/Fortran-FOSS-Programmers/FOODIE


Type Abstract

How to define an Abstract Class in (modern) Fortran?

                  type, abstract :: integrand
  contains
    procedure(time_derivative), deferred, public :: t
    ...
end type integrand

                
Abstract keyword

The ABSTRACT keyword implies

  • you cannot instantiate a variable of type integrand, only one of its (concrete) extensions:
    • an abstract type is really a CONTRACT;
  • you can (and want to) develop against the abstract type integrand, ignoring the implementation:
    • you need to be aware of (and use) only its public interface!

Deferred keyword

The DEFERRED keyword implies

  • you can (and want to) ignore the implementation details of this method;
  • you can (and want to) rely only on its public interface;
  • each (concrete) extension of the abstract type integrand must provide the implementation of each deferred method.
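
To fix ideas, here is a minimal, self-contained sketch (deliberately much simpler than FOODIE's integrand; the names scalar_integrand and decay are purely illustrative) of how ABSTRACT and DEFERRED enforce the contract: the abstract type cannot be instantiated, and the concrete extension must implement the deferred method.

module scalar_integrand_m
  implicit none
  private
  public :: scalar_integrand, decay

  type, abstract :: scalar_integrand
    contains
      ! deferred: the CONTRACT only says "a time derivative exists"
      procedure(time_derivative), deferred, public :: dt
  end type scalar_integrand

  abstract interface
    pure function time_derivative(self, t) result(dstate_dt)
    import :: scalar_integrand
    class(scalar_integrand), intent(in) :: self
    real, optional,          intent(in) :: t
    real                                :: dstate_dt
    end function time_derivative
  end interface

  ! a concrete extension MUST implement every deferred method
  type, extends(scalar_integrand) :: decay
    real :: lambda = 1.0
    real :: value  = 1.0
    contains
      procedure, public :: dt => decay_dt
  end type decay

contains
  pure function decay_dt(self, t) result(dstate_dt)
  class(decay),   intent(in) :: self
  real, optional, intent(in) :: t
  real                       :: dstate_dt
  dstate_dt = -self%lambda * self%value
  end function decay_dt
end module scalar_integrand_m

program contract_demo
use scalar_integrand_m, only : scalar_integrand, decay
implicit none
! type(scalar_integrand) :: u ! would NOT compile: the type is abstract
class(scalar_integrand), allocatable :: u
allocate(decay :: u)
print *, u%dt() ! dispatches to decay_dt, prints -1.0
end program contract_demo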

The Abstract Integrand

How does our (abstract) Integrand look?

                  type, abstract :: integrand
  contains
    ! deferred methods
    procedure(time_derivative),      deferred, public :: t
    procedure(symmetric_operator),   deferred         :: integrand_multiply_integrand
    procedure(integrand_op_real),    deferred         :: integrand_multiply_real
    procedure(real_op_integrand),    deferred         :: real_multiply_integrand
    procedure(symmetric_operator),   deferred         :: add
    procedure(symmetric_operator),   deferred         :: sub
    procedure(assignment_integrand), deferred         :: assign_integrand
    ! public interface
    generic, public :: operator(+) => add
    generic, public :: operator(-) => sub
    generic, public :: operator(*) => integrand_multiply_integrand, &
                                      real_multiply_integrand, &
                                      integrand_multiply_real
    generic, public :: assignment(=) => assign_integrand
end type integrand

                
The Abstract Integrand Public Interface

The Public Interface is very, very simple, and matches the Math definition of U

  • 3 operators (+, -, *);
                      generic, public :: operator(+)
    generic, public :: operator(-)
    generic, public :: operator(*)

                
  • the assignment (=);
                      generic, public :: assignment(=)

                
  • a time derivative method (%t).
                      procedure(time_derivative), deferred, public :: t

                

With only the knowledge of the INTERFACE of these 4 ingredients we can build ALL ODE solvers… just remember the Euler formula…

                  u = u + Dt * u%t(t=t)

                
The Abstract Integrand Interfaces
How to define an ABSTRACT method (operator/assignment)? By means of abstract interfaces…
The Abstract Interfaces
                      abstract interface
  function real_op_integrand(lhs, rhs) &
    result(operator_result)
  import :: integrand
  real,             intent(in)  :: lhs
  class(integrand), intent(in)  :: rhs
  class(integrand), allocatable :: operator_result
  end function real_op_integrand

  function time_derivative(self, t) &
    result(dState_dt)
  import :: integrand
  class(integrand),  intent(in) :: self
  real,    optional, intent(in) :: t
  class(integrand), allocatable :: dState_dt
  end function time_derivative

  function integrand_op_real(lhs, rhs) &
    result(operator_result)
  import :: integrand
  class(integrand), intent(in)  :: lhs
  real,             intent(in)  :: rhs
  class(integrand), allocatable :: operator_result
  end function integrand_op_real
end interface

                    

The Abstract Integrand
                        type, abstract :: integrand
  contains
    procedure(symmetric_operator),   deferred :: &
      integrand_multiply_integrand
    procedure(integrand_op_real),    deferred :: &
      integrand_multiply_real
    procedure(real_op_integrand),    deferred :: &
      real_multiply_integrand
    procedure(symmetric_operator),   deferred :: add
    …
end type integrand

                      

For all DEFERRED methods you must provide an

EXPLICIT ABSTRACT INTERFACE

The Abstract Interfaces (continued)
                      abstract interface
  function symmetric_operator(lhs, rhs) &
    result(operator_result)
  import :: integrand
  class(integrand), intent(in)  :: lhs
  class(integrand), intent(in)  :: rhs
  class(integrand), allocatable :: operator_result
  endfunction symmetric_operator

  subroutine assignment_integrand(lhs, rhs)
  import :: integrand
  class(integrand), intent(inout) :: lhs
  class(integrand), intent(in)    :: rhs
  endsubroutine assignment_integrand
end interface

                    

The Whole Source
                      module foodie_adt_integrand
implicit none
private
public :: integrand

type, abstract :: integrand
  contains
    procedure(time_derivative),      deferred :: t
    procedure(symmetric_operator),   deferred :: &
      integrand_multiply_integrand
    procedure(integrand_op_real),    deferred :: &
      integrand_multiply_real
    procedure(real_op_integrand),    deferred :: &
      real_multiply_integrand
    procedure(symmetric_operator),   deferred :: add
    procedure(symmetric_operator),   deferred :: sub
    procedure(assignment_integrand), deferred :: &
      assign_integrand
    generic :: operator(+) => add
    generic :: operator(-) => sub
    generic :: operator(*) => integrand_multiply_integrand,&
                              real_multiply_integrand, &
                              integrand_multiply_real
    generic :: assignment(=) => assign_integrand
endtype integrand

abstract interface
  function time_derivative(self, t) result(dState_dt)
  import :: integrand, R_P
  class(integrand),       intent(IN) :: self
  real(R_P),    optional, intent(IN) :: t
  class(integrand), allocatable      :: dState_dt
  end function time_derivative
  ...

                    

The Whole Source (continued)
                          …
  function integrand_op_real(lhs, rhs) &
    result(operator_result)
  import :: integrand, R_P
  class(integrand), intent(IN)  :: lhs
  real(R_P),        intent(IN)  :: rhs
  class(integrand), allocatable :: operator_result
  end function integrand_op_real
  !
  function real_op_integrand(lhs, rhs) &
    result(operator_result)
  import :: integrand, R_P
  real(R_P),        intent(IN)  :: lhs
  class(integrand), intent(IN)  :: rhs
  class(integrand), allocatable :: operator_result
  end function real_op_integrand
  !
  function symmetric_operator(lhs, rhs) &
    result(operator_result)
  import :: integrand
  class(integrand), intent(IN)  :: lhs
  class(integrand), intent(IN)  :: rhs
  class(integrand), allocatable :: operator_result
  end function symmetric_operator
  !
  subroutine assignment_integrand(lhs, rhs)
  import :: integrand
  class(integrand), intent(INOUT) :: lhs
  class(integrand), intent(IN)    :: rhs
  end subroutine assignment_integrand
end interface
end module foodie_adt_integrand

                      

Using the Abstract Integrand
You may be thinking that with such a simple abstract integrand we cannot do anything serious… WRONG! You can build OPERATIVE procedures on it. Let us see the Euler solver built on it
module foodie_integrator_euler_explicit
use foodie_adt_integrand, only : integrand
implicit none
private
public :: euler_explicit_integrator
!
type :: euler_explicit_integrator
  private
  contains
    private
    procedure, nopass, public :: integrate
end type euler_explicit_integrator
contains
  subroutine integrate(U, Dt, t)
  class(integrand),    intent(inout) :: U
  real(R_P),           intent(in)    :: Dt
  real(R_P), optional, intent(in)    :: t
  U = U + U%t(t=t) * Dt
  return
  end subroutine integrate
end module foodie_integrator_euler_explicit

Compare with the math $$ U(t+\Delta t) = U(t) + \Delta t U_t $$ It is almost the same…

  • no knowledge of the concrete implementation of U is assumed;
  • only the public API of U is used, e.g. U%t, U+U, U*Dt...;
  • the Euler integrator accepts ALL integrand extensions!
use foodie_integrator_euler_explicit
use my_integrand
type(euler_explicit_integrator) :: solver
type(my_integrand)              :: U
...
call solver%integrate(U=U, Dt=0.1)

Well, 20 lines for an ODE solver, and it is ABSTRACT! It can solve ANY ODE!

Using the Abstract Integrand (continued)
You may be thinking that the Euler solver is trivial and that you cannot do much more… WRONG AGAIN! Let us see the Low Storage Runge-Kutta solver
module foodie_integrator_low_storage_runge_kutta
use foodie_adt_integrand, only : integrand
implicit none
private
public :: ls_runge_kutta_integrator
!
type :: ls_runge_kutta_integrator
  private
  integer           :: stages=0 !< Number of stages.
  real, allocatable :: A(:)     !< Low storage *A* coefficients.
  real, allocatable :: B(:)     !< Low storage *B* coefficients.
  real, allocatable :: C(:)     !< Low storage *C* coefficients.
  contains
    private
    ...
    procedure, pass(self), public :: integrate
endtype ls_runge_kutta_integrator
contains
  ...
  subroutine integrate(self, U, stage, Dt, t)
  class(ls_runge_kutta_integrator), intent(in)    :: self
  class(integrand),                 intent(inout) :: U
  class(integrand),                 intent(inout) :: stage(1:)
  real(R_P),                        intent(in)    :: Dt
  real(R_P),                        intent(in)    :: t
  integer(I_P)                                    :: s
  select type(stage)
  class is(integrand)
    stage(1) = U
    stage(2) = U*0._R_P
    do s=1, self%stages
      stage(2) = stage(2) * self%A(s) + &
        stage(1)%t(t=t + self%C(s) * Dt) * Dt
      stage(1) = stage(1) + stage(2) * self%B(s)
    enddo
    U = stage(1)
  endselect
  end subroutine integrate
end module foodie_integrator_low_storage_runge_kutta

Compare with the math $$ \begin{matrix} K_1 = U^n \\ K_2 = 0 \\ \left.\begin{matrix} K_2 = A_s K_2 + \Delta t R(t^n + C_s \Delta t, K_1) \\ K_1 = K_1 + B_s K_2 \end{matrix}\right\} s=1,2,...N_s\\ U^{n+1} = K_1 \end{matrix} $$ It is almost the same…

  • no knowledge of the concrete implementation of U is assumed;
  • only the public API of U is used, e.g. U%t, U+U, U*Dt...;
  • the Runge-Kutta integrator accepts ALL integrand extensions!
use foodie_integrator_low_storage_runge_kutta
use my_integrand
type(ls_runge_kutta_integrator) :: solver
type(my_integrand)              :: U
type(my_integrand)              :: stage(1:2) ! the two low storage registers
...
call solver%integrate(U=U, stage=stage, Dt=0.1, t=0.0)

Well, 20 lines for a Low Storage RK ODE solver, and it is ABSTRACT! It can solve ANY ODE!

How Difficult is it to define a Concrete ODE upon the abstract integrand?
It is really dead-simple, just RESPECT THE CONTRACT!
                      module oscillation_ode
use foodie, only : integrand
implicit none
private
public :: oscillation
type, extends(integrand) :: oscillation
  private
  real :: f=0.0
  real :: U(1:2)=[0.0, 0.0]
  contains
    procedure, pass(self), public :: t => &
      dOscillation_dt
    procedure, pass(lhs),  public :: &
      integrand_multiply_integrand => &
      oscillation_multiply_oscillation
    procedure, pass(lhs),  public :: &
      integrand_multiply_real => &
      oscillation_multiply_real
    procedure, pass(rhs),  public :: &
      real_multiply_integrand => &
      real_multiply_oscillation
    procedure, pass(lhs),  public :: add => &
      add_oscillation
    procedure, pass(lhs),  public :: sub => &
      sub_oscillation
    procedure, pass(lhs),  public :: &
      assign_integrand => &
      oscillation_assign_oscillation
end type oscillation
contains
  ...
end module oscillation_ode

                    

$$ \begin{matrix} U_t = R(U) \\ U = [v_1, v_2] \\ R(U) = [-f v_2, f v_1] \end{matrix} $$

                        ...
  function dOscillation_dt(self, t) result(dState_dt)
  class(oscillation),     intent(in) :: self
  real(R_P),    optional, intent(in) :: t
  class(integrand),  allocatable     :: dState_dt
  allocate(oscillation :: dState_dt)
  select type(dState_dt)
  class is(oscillation)
    dState_dt = self
    dState_dt%U(1) = -self%f * self%U(2)
    dState_dt%U(2) =  self%f * self%U(1)
  endselect
  end function dOscillation_dt

  function oscillation_multiply_oscillation(lhs, rhs) &
    result(opr)
  class(oscillation), intent(in) :: lhs
  class(integrand),   intent(in) :: rhs
  class(integrand), allocatable  :: opr
  allocate(oscillation :: opr)
  select type(opr)
  class is(oscillation)
    opr = lhs
    select type(rhs)
    class is (oscillation)
      opr%U = lhs%U * rhs%U
    endselect
  endselect
  end function oscillation_multiply_oscillation
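
The remaining deferred methods are elided in the slide; as a hint, here is a minimal sketch of how the add binding declared above (add => add_oscillation) could be implemented, following exactly the same allocate/select type pattern as the multiplication (the actual FOODIE example may differ in details):

  function add_oscillation(lhs, rhs) result(opr)
  ! sketch of the deferred "add": same allocate/select type pattern as above
  class(oscillation), intent(in) :: lhs
  class(integrand),   intent(in) :: rhs
  class(integrand), allocatable  :: opr
  allocate(oscillation :: opr)
  select type(opr)
  class is(oscillation)
    opr = lhs
    select type(rhs)
    class is (oscillation)
      opr%U = lhs%U + rhs%U
    endselect
  endselect
  end function add_oscillation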

                    

Why bother with Abstracts?

So, to integrate the Oscillation ODE we have passed through the EXTENSION of the ABSTRACT INTEGRAND: this is an overhead, so why bother with it, you are thinking. Because:

  • for ODE schemes developers:
    • work with only one, standard, derived type without the necessity to care about what the type is actually modelling (it could be the velocity field of a turbulent flow, the magnetic field of a circuit, etc…);
    • express the ODE solver formula with very high-level language; this ensures:
      • clearness!
      • fast-developing!
    • rely only on the time-derivative function and on the overloaded operator definitions;
    • be completely agnostic about the actual implementation of the time-derivative residual function!
  • for FOODIE clients:
    • a simple, standard API for many ODE solvers:
      • fast-developing of new schemes;
      • robustness: the same ODE solver is applied to different problems… cross-validation;

OpenMP benchmarks

[figure: strong scaling (images/openmp-strong-scaling-comparison.png)]
[figure: weak scaling (images/openmp-weak-scaling-comparison.png)]

CAF benchmarks

[figure: strong scaling (images/caf-strong-scaling-comparison.png)]
[figure: weak scaling (images/caf-weak-scaling-comparison.png)]
The Long List
  • Pure Fortran implementation;
  • KISS and user-friendly:
    • simple API, presently based on Rouson's Abstract Data Type Pattern;
    • easy building and porting on heterogeneous architectures;
  • comprehensive solvers set out-of-the-box:
    • explicit schemes:
      • Adams-Bashforth schemes:
        • 1 step, namely the forward explicit Euler scheme, 1st order accurate;
        • 2 to 16 steps, 2nd to 16th order accurate, respectively;
      • Euler (forward explicit) scheme, 1st order accurate;
      • Leapfrog, 2nd order accurate:
        • unfiltered leapfrog, 2nd order accurate, mostly unstable;
        • Robert-Asselin filtered leapfrog, 1st order accurate;
        • Robert-Asselin-Williams filtered leapfrog, 3rd order accurate;
      • Runge-Kutta schemes:
        • low-storage schemes:
          • 1 stage, namely the forward explicit Euler scheme, 1st order accurate;
          • 5 stages, 4th order accurate, 2N registers;
          • 6 stages, 4th order accurate, 2N registers;
          • 7 stages, 4th order accurate, 2N registers;
          • 12 stages, 4th order accurate, 2N registers;
          • 14 stages, 4th order accurate, 2N registers;
The Long List (continued)
  • comprehensive solvers set out-of-the-box:
    • explicit schemes:
      • Runge-Kutta schemes:
        • TVD/SSP schemes:
          • 1 stage, namely the forward explicit Euler scheme, 1st order accurate;
          • 2 stages, 2nd order accurate;
          • 3 stages, 3rd order accurate;
          • 5 stages, 4th order accurate;
        • embedded (adaptive) schemes:
          • Heun-Euler, 2 stages, 2nd order accurate;
          • Runge-Kutta-Cash-Karp, 6 stages, 5th order accurate;
          • Prince-Dormand, 7 stages, 4th order accurate;
          • Calvo, 9 stages, 6th order accurate;
          • Feagin, 17 stages, 10th order accurate;
    • implicit schemes:
      • Adams-Moulton schemes:
        • 0 steps, 1st order accurate;
        • 1 step, 2nd order accurate;
        • 2 steps, 3rd order accurate;
        • 3 steps, 4th order accurate;
      • Backward Differentiation Formula schemes:
        • 1 step, namely the backward implicit Euler scheme, 1st order accurate;
        • 2 to 6 steps, 2nd to 6th order accurate, respectively;
    • predictor-corrector schemes:
      • Adams-Bashforth-Moulton schemes:
        • 1 step, AB(1)-AM(0), 1st order accurate;
        • 2 steps, AB(2)-AM(1), 2nd order accurate;
        • 3 steps, AB(3)-AM(2), 3rd order accurate;
        • 4 steps, AB(4)-AM(3), 4th order accurate;
What is OpenMP?
  • de-facto standard Application Program Interface (API) to write shared memory parallel applications in C, C++ and Fortran
    • based on compiler directives, run-time routines and environment variables;
  • means Open specifications for Multi Processing, maintained by the OpenMP Architecture Review Board, http://www.openmp.org;
  • based on the fork-join model:
    • workers (threads) do the work in parallel regions;
    • they cooperate through shared memory;
  • unlike MPI, it is based on memory accesses instead of explicit messages;
    • local parallelization model of the serial code;
    • allows incremental parallelization.
Shared Memory
  • all processors access all memory as global address space;
  • multiple processors can operate independently but share the same memory resources;
    • changes in a memory location effected by one processor are visible to all other processors; did someone say race condition?
  • historically, shared memory machines have been classified as UMA and NUMA, based upon memory access times.
Uniform Memory Access (UMA)
  • most commonly represented today by Symmetric Multiprocessor (SMP) machines;
  • identical processors;
  • equal access and access times to memory;

Sometimes called CC-UMA - Cache Coherent UMA. Cache coherent means if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level.

Non-Uniform Memory Access (NUMA)
  • often made by physically linking two or more SMPs;
  • one SMP can directly access memory of another SMP;
  • not all processors have equal access time to all memories
    • memory access across link is slower

[figure: Shared Memory, courtesy of LLNL (images/shared_mem.png)]
Pros
  • user-friendly programming perspective to memory;
  • data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs;
Cons
  • lack of scalability between memory and CPUs;
  • programmer responsibility for synchronization constructs that ensure “correct” access of global memory.
[figure: Distributed Memory, courtesy of LLNL (images/distributed_mem.png)]
Pros
  • memory is scalable with the number of processors;
  • cost effectiveness: can use commodity, off-the-shelf processors and networking.
Cons
  • the programmer is responsible for many of the details associated with data communication between processors;
  • difficult to map existing data structures, based on global memory, to this memory organization;
  • non-uniform memory access times - data residing on a remote node takes longer to access than node local data.
[figure: Fork-Join Model, courtesy of LLNL (images/fork_join2.gif)]
The model
  • all OpenMP programs begin as a single process: the master thread. The master thread executes sequentially until the first parallel region construct is encountered;
  • FORK: the master thread then creates a team of parallel threads:
    • the statements in the program that are enclosed by the parallel region construct are then executed in parallel among the various team threads;
    • JOIN: when the team threads complete the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread.

The number of parallel regions and the threads that comprise them are arbitrary.

3 Pillars
  • Compiler Directives
  • Environment Variables
  • Runtime Library Routines

It is not part of the Fortran Standard!

Compiler Directives
  • they are comments in your source code and are ignored by compilers unless you tell them otherwise:
    • serial and parallel codes live together!
      • a serial compilation ignores the directives;
      • a compilation with OpenMP support takes them into account;
  • mark a block of code;
  • specify to the compiler how to run the code block in parallel:
    • Dividing blocks of code among threads
    • Distributing loop iterations between threads
    • Serializing sections of code
    • Synchronization of work among threads
                !$OMP some_directives
! fortran statements to be done in parallel
do i=1...

              
Compiler Directives are based on Clauses
  • syntax: directive [clause[ clause]...]
  • specify additional information to the directives:
    • variables handling:
      • which are shared among all threads (the default);
      • which are private to each thread;
      • how to initialize the private ones;
      • what the default is;
    • execution control:
      • how many threads in the team;
      • how to distribute the work;

ATTENTION: they may alter the code semantics: the code can be correct in serial but not in parallel, or vice versa.
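
As a minimal sketch of both points (the names are illustrative, not from the slides): the directive below carries a private clause, and it is exactly that clause which keeps the parallel semantics equal to the serial ones; dropping private(tmp) would introduce a data race on tmp.

program clause_semantics
! directive [clause[ clause]...] in action: private(tmp) preserves the
! serial semantics; without it every thread would share the same tmp
implicit none
integer, parameter :: n = 1000
integer :: i
real    :: tmp, a(n), b(n)
b = 1.0
!$omp parallel do private(tmp)
do i = 1, n
  tmp  = 2.0 * b(i)
  a(i) = tmp + 1.0
end do
!$omp end parallel do
print *, a(1), a(n)
end program clause_semantics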

The most used envs
  • OMP_NUM_THREADS: sets number of threads
  • OMP_STACKSIZE size [B|K|M|G]: size of the stack for threads
  • OMP_DYNAMIC {TRUE|FALSE}: dynamic thread adjustment
  • OMP_SCHEDULE "schedule[,chunk]": iteration scheduling scheme
  • OMP_PROC_BIND {TRUE|FALSE}: bound threads to processors
  • OMP_NESTED {TRUE|FALSE}: nested parallelism

To set them

  • In csh/tcsh:
                setenv OMP_NUM_THREADS 4

              
  • In sh/bash:
                export OMP_NUM_THREADS=4

              
The most used functions
  • Query/specify some specific feature or setting:
    • omp_get_thread_num(): get thread ID (0 for master thread)
    • omp_get_num_threads(): get number of threads in the team
    • omp_set_num_threads(int n): set number of threads
  • Allow you to manage fine-grained access (lock):
    • omp_init_lock(lock_var): initializes the OpenMP lock variable lock_var of type omp_lock_t
  • Timing functions:
    • omp_get_wtime(): returns elapsed wallclock time
    • omp_get_wtick(): returns timer precision

Functions interface

                use omp_lib

! or old-style include

include 'omp_lib.h'
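
A minimal sketch (hypothetical program, assuming an OpenMP-enabled compilation) exercising some of the routines listed above:

program omp_runtime_demo
! query thread identity/team size and time a parallel region
use omp_lib
implicit none
double precision :: t0, t1
t0 = omp_get_wtime()
!$omp parallel
print '(A,I0,A,I0)', 'I am thread ', omp_get_thread_num(), &
                     ' of ', omp_get_num_threads()
!$omp end parallel
t1 = omp_get_wtime()
print '(A,F12.6,A,E12.3,A)', 'elapsed: ', t1 - t0, &
                             ' s (timer precision ', omp_get_wtick(), ' s)'
end program omp_runtime_demo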

              
Parallel construct
  • create a parallel region
    • a construct is the lexical extent to which an executable directive applies
    • a region is the dynamic extent to which an executable directive applies
    • a parallel region is a block of code executed by all threads in the team
                !$omp parallel
! some code to execute in parallel
print "(A)", 'Hello (parallel) World!'
!$omp end parallel

              
  • Inside a parallel region, the variables of the serial program can be essentially shared or private
    • shared: there is only one instance of the data
      • Data is accessible by all threads in the team
      • Threads can read and write the data simultaneously
      • All threads access the same address space
    • private: each thread has a copy of the data
      • No other thread can access this data
      • Changes are only visible to the thread owning the data
      • Values are undefined on entry and exit

Variables are shared by default, but with the clause default(none) there is no implicit default: you have to scope all variables explicitly, the sane way!
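
A minimal sketch of that sane way (illustrative names): with default(none) the compiler refuses to guess, so every variable referenced in the region must be scoped explicitly as shared or private.

program default_none_demo
! every variable referenced in the region is scoped explicitly
use omp_lib
implicit none
integer :: my_id
real    :: shared_value
shared_value = 42.0
!$omp parallel default(none) shared(shared_value) private(my_id)
my_id = omp_get_thread_num()
print '(A,I0,A,F6.1)', 'thread ', my_id, ' sees ', shared_value
!$omp end parallel
end program default_none_demo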

Data races, the critical construct
  • A data race happens when two or more threads access the same (=shared) memory location
    • Asynchronously and
    • Without holding any common exclusive locks and
    • At least one of the accesses is a write/store
    • In this case the resulting values are undefined
  • The block of code inside a critical construct is executed by only one thread at a time
  • It is a synchronization to avoid simultaneous access to shared data
sum = 0
!$omp parallel private(i, MyThreadID, NumThreads, psum)
MyThreadID = omp_get_thread_num()
NumThreads = omp_get_num_threads()
psum = 0
do i=MyThreadID*N/NumThreads+1, min((MyThreadID+1)*N/NumThreads, N)
  psum = psum + x(i)
end do
!$omp critical
sum = sum + psum
!$omp end critical
!$omp end parallel
              
What have we learned?
  • Essentially, for a parallelization it could be enough to use:
    • the parallel construct
    • the critical construct
    • the omp_get_thread_num() function
    • and the omp_get_num_threads() function
  • But we need to distribute the serial work among threads
  • And doing it by hand is tiring
  • The worksharing constructs automate the process
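
As a preview of what the next slides automate, here is a minimal sketch (illustrative names) of the same shared sum written with a worksharing construct plus a reduction clause, instead of the hand-made decomposition shown above:

program worksharing_sum
! the manual thread decomposition collapses into one directive
implicit none
integer, parameter :: n = 1000
integer :: i
real    :: x(n), total
call random_number(x)
total = 0.0
!$omp parallel do reduction(+: total)
do i = 1, n
  total = total + x(i)
end do
!$omp end parallel do
print *, 'total = ', total
end program worksharing_sum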
Worksharing Constructs
  • distribute the execution of the associated parallel region over the threads
  • a worksharing region has no barrier on entry, but an implied barrier exists at its end
    • if a nowait clause is present, an implementation may omit the barrier at the end of the worksharing region
  • The OpenMP API defines the following worksharing constructs:
    • do loop construct
    • sections construct
    • single construct
    • workshare construct
The King of Constructs
  • The iterations of the loop are distributed over the threads that already exist in the team
  • The iteration variable of the loop is made private by default
  • The inner loops are executed sequentially by each thread
  • Beware the data-sharing attribute of the inner loop iteration variables
    • In Fortran they are private by default
                program doexample
integer, parameter:: n=10
integer:: i, a(n), b(n), c(n)
!$omp parallel
!$omp do
do i=1, n
  a(i) = i
  b(i) = i
  c(i) = 0
end do
!$omp end do
!$omp do
do i=1, n
  c(i) = a(i) + b(i)
end do
!$omp end do
!$omp end parallel
end program doexample

              
Sectioning your algorithm
  • distribute structured blocks of code among threads in the team
    • each thread receives a section
  • when a thread has finished executing its section, it receives another one
    • if there are no more sections to execute, threads wait for the others to finish
                program vec_add_sections
integer, parameter :: n=100
integer            :: i
real               :: a(n), b(n), c(n), d(n)
do i = 1, n
  a(i) = i * 1.5
  b(i) = i + 22.35
enddo
!$omp parallel shared(a,b,c,d), private(i)
!$omp sections

!$omp section
  do i = 1, n
    c(i) = a(i) + b(i)
  enddo

!$omp section
  do i = 1, n
     d(i) = a(i) * b(i)
  enddo

!$omp end sections nowait
!$omp end parallel
end

              
This is mine!
  • the first thread that reaches it executes the associated block
  • the other threads in the team wait at the implicit barrier at the end of the construct unless a nowait clause is specified
                integer, parameter :: n=100
integer            :: i
real               :: a(n), b(n), c(n), d(n)
...
!$omp parallel
!$omp single
  a = b / c
  print*, a
!$omp end single

!$omp do
do i = 1, n
  d(i) = a(i) * b(i)
enddo
!$omp end parallel

              
Array syntax all-together!
  • The structured block enclosed in the workshare construct is divided into units of work that are then assigned to the threads such that each unit is executed by one thread only once
  • It is supported only in Fortran, in order to parallelize the array syntax
                program workshare
integer, parameter :: n=100
integer            :: i, j
real               :: aa(n,n), bb(n,n)
real               :: cc(n,n), dd(n,n)
real               :: first, last

do i = 1, n
  do j = 1, n
    aa(j,i) = i * 1.0
    bb(j,i) = j + 1.0
  enddo
enddo

!$omp parallel shared(aa,bb,cc,dd,first,last)

!$omp workshare
  cc = aa * bb
  dd = aa + bb
  first = cc(1,1) + dd(1,1)
  last = cc(n,n) + dd(n,n)
!$omp end workshare nowait

!$omp end parallel

end

              
Data-sharing Attributes
  • in a parallel construct the data-sharing attributes are implicitly determined by the DEFAULT clause, if present
    • if no default clause is present they are SHARED
  • Certain variables have a predetermined data-sharing attributes
    • Variables with automatic storage duration that are declared in a scope inside a construct are private
    • Objects with dynamic storage duration are shared
    • The loop iteration variable(s) in the associated do-loop(s) of a do construct is (are) private
    • A loop iteration variable for a sequential loop in a parallel construct is private in the innermost such construct that encloses the loop (only Fortran)
    • Variables with static storage duration that are declared in a scope inside the construct are shared
Data-sharing Attributes
  • Explicitly determined data-sharing attributes are those that are referenced in a given construct and are listed in a data-sharing attribute clause
  • shared(list): there is only one instance of the objects in the list accessible by all threads in the team
  • private(list): each thread has a copy of the variables in the list
  • firstprivate(list): same as private but all variables in the list are initialized with the value that the original object had before entering the parallel construct
  • lastprivate(list): same as private but the thread that executes the sequentially last iteration or section updates the value of the objects in the list
  • The default clause sets the implicit default
    • default(none|shared|private|firstprivate)
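
A minimal sketch (illustrative names) of firstprivate and lastprivate at work:

program first_last_private
! firstprivate: private copies start from the value set outside the region
! lastprivate: the sequentially last iteration writes the value back
implicit none
integer, parameter :: n = 8
integer :: i, offset, last_i
offset = 100
last_i = -1
!$omp parallel do firstprivate(offset) lastprivate(last_i)
do i = 1, n
  offset = offset + i
  last_i = i
end do
!$omp end parallel do
print *, 'last_i = ', last_i   ! 8, from the last iteration
end program first_last_private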
                    !$omp parallel do reduction(+:sum)
do i=1, n
  sum = sum + x(i)
end do
!$omp end parallel do

                  

Reduction

  • With the Data-Sharing attributes clause reduction(op:list)
    • For each list item, a private copy is created in each implicit task
    • The local copy is initialized appropriately according to the operator (for example, if op is + they are initialized to 0, if op is * they are initialized to 1)
    • After the end of the region, the original list item is updated with the values of the private copies using the specified operator
  • Supported operators for a reduction clause are:
    • +, *, -, .and., .or., .eqv., .neqv., max, min, iand, ior, ieor
  • Reduction variables must be SHARED variables
  • The reduction clause is valid on parallel, do loop and sections constructs
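
For instance, a minimal sketch (illustrative names) of a reduction with the max operator; each private copy starts from the operator's identity element and the partial maxima are combined at the end of the region:

program reduction_max
! each thread reduces its own chunk, partial maxima are combined at the end
implicit none
integer, parameter :: n = 1000
integer :: i
real    :: x(n), biggest
call random_number(x)
biggest = -huge(0.0)
!$omp parallel do reduction(max: biggest)
do i = 1, n
  biggest = max(biggest, x(i))
end do
!$omp end parallel do
print *, 'max = ', biggest
end program reduction_max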

Advanced OpenMP
  • Synchronization functions
    • barrier construct;
    • atomic construct;
  • Task Parallelism (main addition to OpenMP 3.0, enhanced in 3.1 and 4.0)
    • Allows parallelizing irregular problems
    • Unbounded loop
    • Recursive algorithms
    • Producer/consumer schemes
    • Multiblock grids, Adaptive Mesh Refinement

Not covered today!

Counting the Prime Numbers
                    subroutine prime_number(n, total)
! Return the number of primes between 1 and N.
implicit none
integer, intent(in)  :: n
integer, intent(out) :: total
integer              :: i
integer              :: j
integer              :: prime
total = 0
!$omp parallel &
!$omp shared(n) &
!$omp private(i, j, prime)
!$omp do reduction (+: total)
do i=2, n
  prime = 1
  do j=2, i - 1
    if (mod(i, j) == 0) then
      prime = 0
      exit
    end if
  end do
  total = total + prime
end do
!$omp end do
!$omp end parallel
return
end

                  
Matrix Multiplication
                    use omp_lib
real, parameter    :: pi = 3.141592653589793D+00
integer, parameter :: n=2000
real               :: a(n,n), b(n,n), c(n,n)
integer            :: i, j, k
real               :: s, angle
s = 1.0D+00 / sqrt(real(n, kind = 8))
!$omp parallel shared (a, b, c, n, s) &
!$omp private (angle, i, j, k)
!$omp do
do i = 1, n
  do j = 1, n
    angle = 2.0D+00 * pi * (i - 1) * (j - 1) / real(n)
    a(i,j) = s * (sin(angle) + cos(angle))
  end do
end do
!$omp end do
!$omp do
do i = 1, n
  do j = 1, n
    b(i,j) = a(i,j)
  end do
end do
!$omp end do
!$omp do
do i = 1, n
  do j = 1, n
    c(i,j) = 0.0D+00
    do k = 1, n
      c(i,j) = c(i,j) + a(i,k) * b(k,j)
    end do
  end do
end do
!$omp end do
!$omp end parallel

                  
My Zen of Fortran
  1. standard compliance is better than custom or extended;
  2. beautiful is better than ugly;
    1. readability counts;
    2. explicit is better than impl.;
    3. simple is better than CoMpleX;
    4. CoMpleX is better than c0mp1|c@ted;
    5. flat is better than nested;
    6. s p a r s e is better than dense;
  3. fast is better than slow:
    1. vector is better than loop;
    2. matrix is better than vector;
    3. strided is better than scattered;
    4. contiguous is better than strided;
    5. broadcasting is a great idea, use where possible;
My Zen of Fortran (continued)
  1. slow is better than unmaintainable;
  2. make it look like the math;
  3. special cases aren’t special enough to break rules…
  4. although practicality beats purity;
  5. pure procedure is better than impure
  6. although practicality beats purity again;
  7. private is better than public;
  8. errors should never pass silently
  9. unless errors are explicitly silenced;
  10. to be continued