Changelog

v1.0.6 (2026-03-02)

New features

  • Add coverage analysis, install script, and project polish (1edd81a)

v1.0.5 (2026-02-20)

Documentation

  • Update Giacomo Rossi's email to AMD address (c1aa19e)

v1.0.4 (2026-02-20)

CI/CD

  • Switch to official GitHub Pages actions deployment (8442804)

v1.0.3 (2026-02-20)

CI/CD

  • Fix broken docs deployment workflow (0fd44ca)

v1.0.2 (2026-02-20)

CI/CD

  • Remove coverage analysis workflow and simplify CI pipeline (61f1d18)

v1.0.1 (2026-02-20)

Bug fixes

  • Correct bug in device initialization (1b9e302)
  • Bug fix in dev_init (87124a8)

Documentation

  • Refactor README documentation, complete API description (1bccb22)

Miscellaneous

  • Update nvidia cc version for ADM (0556bb0)
  • Add c_int (4ee7dd9)
  • Remove unnecessary macro definitions (61cf3c1)
  • Add AMD compiler temporary workaround for has_device_addr (41a2e90)
  • Add AMD Flang compiler and remove IFX compiler preprocessing macro (6b46a9a)
  • Remove workaround for has_device_addr clause not supported by AMD compiler (ae3c980)
  • Introduced map for mapped variable, has_device_addr for device pointers only (d82a3b3)
  • Remove DEVICEVAR clause from OpenMP loops: ifx and gfortran preprocessors cannot expand the __VA_ARGS__ macro (f997a23)
  • Add some compiler proof tests (4830b2b)
  • Support transposed assign for all ranks (525d93f)
  • Exploit cpp preprocessor to reduce code verbosity (d99e29f)

New features

  • Add device memory get info method (0c12b07)
  • Add assign procedures, change memcpy API, add I2P support (f6be6ec)
  • Support transposed assign (b7b31ad)
  • Support transposed assign (bf7d7f3)
  • Add routine to save device memory status (c857163)
  • Add makefile to build static library (with only NVF) (32ea531)
  • Add CI pipelines, VitePress docs site, and release tooling (eb42cf9)

Refactoring

  • Refactor mpi handler object for adam (5a6c0d9)

v1.0.0-alpha (2024-09-12)

Documentation

  • Improve README documentation (0889028)

Miscellaneous

  • Start (fba31c2)
  • Update readme (2d35af2)
  • Clean memcpy test (4b50d7d)
  • Clean memcpy test comments (eb74517)
  • Add memcpy test for derived type memory (86dee45)

  • Use fptr_dev in alloc routine as storage_size argument (69ec34e)
  • Fix variable name (11f7326)
  • Add openacc specifications doc (2a65b9a)
  • Add all rank alloc for R8P and R4P

Add procedures for all ranks (up to 7) for the R8P and R4P kinds. Only alloc has been tested; memcpy is still to be tested. (ca573c2)

  • Complete memcpy for all real kinds and ranks (3708a5c)

  • Complete alloc/memcpy for reals and integers

Complete the implementation and tests of the acc_malloc/acc_memcpy_from_device/acc_memcpy_to_device wrappers for R8P, R4P, I8P, I4P, and I1P at all ranks (up to rank 7). (cc0c05e)

  • Update readme (8be29a0)
  • Complete free routines (52a4dbc)
  • Update readme (2330b8e)
  • Add some runtime routines for multi devices

Side effects: acc_get_num_devices does not seem to work on my laptop. (b872b57)

  • Rename library before integrating OpenMP offloading (cc767fc)

  • Refactor library in modules

Split the library into modules and exploit the preprocessor to switch between OpenACC and OpenMP. (3c369a5)

  • Adopt dst-src convention for memcpy

The memcpy API now passes the dst pointer (out) before the src pointer (in), even though passing an out-dummy before an in-dummy sounds odd to me. (dc98bd1)

  • OpenMP backend improvement

Add many runtime routines to the OpenMP backend.

Seamless Unified API improvement:

  • dev_alloc unified API
  • dev_free unified API
  • dev_memcpy NOT yet unified
  • device handling NOT yet unified

The OpenMP backend currently has only one test (aside from the trivial use test): the alloc_free test. (bc6af93)

  • Unified API almost complete (for the basic aims)

Device handling, memory allocation, and copy are now unified; the seamless integration of the OpenACC and OpenMP backends is nearly complete.

A taste example has been added and reported in the README (which has also been improved with some documentation).

All the tests can now be compiled with both backends, but only the OpenACC tests have also been executed and checked; the OpenMP-compiled tests have not yet been executed on Intel GPUs. (d1a5326)

  • Fix bugs in OpenMP backend

Fix bugs in the OpenMP backend and add a script and FoBiS rules to run the tests automatically. (7b66e60)

  • Update readme (8f29a1c)
  • Partially support GNU gfortran and improve documentation (7ee2b2f)

  • Improve fobos, doc and laplace case study (983056e)

  • Add host fallback backend

Refactor the library sources: all backends are now defined in single files, without per-backend splitting, by exploiting a few cpp macros. Add a host fallback backend, which allocates, deallocates, and copies memory on the host instead of the device (with some support for OpenMP multi-thread parallelization on the host).

Much of the magic happens by including the fundal.H macro definitions. (2ead60c)

  • Add MPI test

The test compiles but does not work as expected (at least not completely). (4f8f02a)

  • Change tags mpi test (243d939)
  • Implement unstructured memory approach suggested by Mat Colgrove (NVIDIA DEV Mod)

Add the unstructured memory approach as suggested by Mat: the FUNDAL API is expanded to enable the "unstructured memory approach", exploiting the "acc enter data", "acc exit data", and "acc update" directives encapsulated in runtime routines equivalent to those of the "pointer memory approach".

Why: to overcome NVIDIA SDK bugs in MPI communication with the pointer memory approach.

Side effects: the unified API is incomplete, because the equivalent OpenMP unstructured memory approach is still missing. (0f343f3)

  • Add OpenMP directives to unstructured routines (9e854bb)
  • Fix variable names in some OpenMP pragmas (9bd16e9)
  • Add fobos oac-mpi-nvf-debug rules (3ed3408)
  • Add deviceptr multi GPU MPI test

Note: for RTX 4070 GPUs the following environment setting is necessary:

export UCX_MEMTYPE_CACHE=n

See https://forums.developer.nvidia.com/t/mpi-send-openacc-acc-malloc-fail-with-nvfortran-but-work-with-c/305839/1 (4509d93)

  • Add dev_init method in device handling (305bbf7)