Changelog
v1.0.6 (2026-03-02)
New features
- Add coverage analysis, install script, and project polish (1edd81a)
v1.0.5 (2026-02-20)
Documentation
- Update Giacomo Rossi's email to AMD address (c1aa19e)
v1.0.4 (2026-02-20)
CI/CD
- Switch to official github pages actions deployment (8442804)
v1.0.3 (2026-02-20)
CI/CD
- Fix broken docs deployment workflow (0fd44ca)
v1.0.2 (2026-02-20)
CI/CD
- Remove coverage analysis workflow and simplify CI pipeline (61f1d18)
v1.0.1 (2026-02-20)
Bug fixes
Documentation
- Refactor README documentation, complete API description (1bccb22)
Miscellaneous
- Update nvidia cc version for ADM (0556bb0)
- Add c_int (4ee7dd9)
- Remove unnecessary macro definitions (61cf3c1)
- Add AMD compiler temporary workaround for has_device_addr (41a2e90)
- Add AMD Flang compiler and remove IFX compiler preprocessing macro (6b46a9a)
- Remove workaround for has_device_addr clause not supported by AMD compiler (ae3c980)
- Introduced map for mapped variable, has_device_addr for device pointers only (d82a3b3)
- Remove DEVICEVAR clause from OpenMP loops: ifx and gfortran preprocessors cannot expand __VA_ARGS__ macro (f997a23)
- Add some compilers proofs tests (4830b2b)
- Support transposed assign for all ranks (525d93f)
- Exploit cpp preprocessor to reduce code verbosity (d99e29f)
New features
- Add device memory get info method (0c12b07)
- Add assign procedures, change memcpy API, add I2P support (f6be6ec)
- Support transposed assign (b7b31ad)
- Support transposed assign (bf7d7f3)
- Add routine to save device memory status (c857163)
- Add makefile to build static library (with only NVF) (32ea531)
- Add CI pipelines, VitePress docs site, and release tooling (eb42cf9)
Refactoring
- Refactor mpi handler object for adam (5a6c0d9)
v1.0.0-alpha (2024-09-12)
Documentation
- Improve README documentation (0889028)
Miscellaneous
- Start (fba31c2)
- Update readme (2d35af2)
- Clean memcpy test (4b50d7d)
- Clean memcpy test comments (eb74517)
- Add memcpy test for derived type memory (86dee45)
- Use fptr_dev in alloc routine as storage_size argument (69ec34e)
- Fix variable name (11f7326)
- Add openacc specifications doc (2a65b9a)
- Add all rank alloc for R8P and R4P: add procedures for all ranks (up to 7) for R8P and R4P kinds; only alloc is tested, memcpy is still to be tested. (ca573c2)
- Complete memcpy for all real kinds and ranks (3708a5c)
- Complete alloc/memcpy for reals and integers: complete implementation and test of acc_malloc/acc_memcpy_from_device/acc_memcpy_to_device wrappers for R8P, R4P, I8P, I4P, I1P of all ranks (up to rank 7). (cc0c05e)
- Update readme (8be29a0)
- Complete free routines (52a4dbc)
- Update readme (2330b8e)
- Add some runtime routines for multi devices
  Side effects: acc_get_num_devices seems not to work on my laptop. (b872b57)
- Rename library before integrating OpenMP offloading (cc767fc)
- Refactor library in modules: split the library into modules and exploit the pre-processor to switch between OpenACC/OpenMP. (3c369a5)
- Adopt dst-src convention for memcpy: even though passing the out-dummy before the in-dummy sounds odd to me, I have changed the memcpy API convention to pass the dst pointer (out) before the src pointer (in). (dc98bd1)
- OpenMP backend improvement: add many runtime routines to the OpenMP backend.
  Seamless Unified API improvement:
  - dev_alloc unified API
  - dev_free unified API
  - dev_memcpy NOT yet unified
  - device handling NOT yet unified
  The OpenMP backend currently has only one test (aside from the trivial use test), the alloc_free test. (bc6af93)
- Unified API almost complete (for the basic aims): device handling, memory allocation and copy are now unified; the seamless integration of the OpenACC and OpenMP backends is nearly complete.
  A taste example has been added and reported in the README (also improved with some documentation).
  All the tests can now be compiled with both backends, but only the OpenACC ones have also been executed and checked; the OpenMP-compiled tests have not yet been executed on Intel GPUs. (d1a5326)
- Fix bugs in openmp backend and add script and FoBiS rules to automatically run tests. (7b66e60)
- Update readme (8f29a1c)
- Partially support GNU gfortran and improve documentation (7ee2b2f)
- Improve fobos, doc and laplace case study (983056e)
- Add host fallback backend: refactor library sources, with all backends defined in single files (no per-backend splitting) by exploiting a few cpp macros; add a host fallback backend which allocates/deallocates/copies memory on the host instead of the device (with some support for OpenMP multi-thread parallelization on the host).
  Much of the magic happens by including the fundal.H macro definitions. (2ead60c)
- Add MPI test: compiles, but does not work as expected (at least not completely). (4f8f02a)
- Change tags mpi test (243d939)
- Implement unstructured memory approach suggested by Mat Colgrove (NVIDIA DEV Mod): FUNDAL API expanded to enable the "unstructured memory approach" exploiting "acc data enter/exit/update" pragmas encapsulated in runtime routines equivalent to the "pointer memory approach".
  Why: overcoming NVIDIA SDK bugs on MPI comm with the pointer memory approach.
  Side effects: unified API incomplete, the equivalent OpenMP unstructured memory is still missing. (0f343f3)
- Add OpenMP directives to unstructured routines (9e854bb)
- Fix variable names in some OpenMP pragmas (9bd16e9)
- Add fobos oac-mpi-nvf-debug rules (3ed3408)
- Add deviceptr multi GPU MPI test
  Note: for RTX 4070 GPUs the following environment setting is necessary:
  export UCX_MEMTYPE_CACHE=n
  See https://forums.developer.nvidia.com/t/mpi-send-openacc-acc-malloc-fail-with-nvfortran-but-work-with-c/305839/1 (4509d93)
- Add dev_init method in device handling (305bbf7)
- Add MPI handler object (3d35ad1)