PGI Workstation is available in three language versions:
- PGI Fortran Workstation—Fortran only
- PGI C/C++ Workstation—C and C++ only
- PGI Fortran/C/C++ Workstation—combined Fortran and C/C++
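PGI Fortran Workstation includes The Portland Group's parallelizing/optimizing Fortran 2003, FORTRAN 77 and HPF compilers for Linux, Apple Mac OS X and Windows workstations. Its features and quality help you develop reliable, advanced scientific applications.
PGI C/C++ Workstation includes The Portland Group's parallelizing/optimizing OpenMP C++ and ANSI C compilers. All C++ functions are compatible with Fortran and C functions, so you are free to write programs in any of the three languages.
PGI Workstation includes the PGDBG OpenMP graphical debugger and the PGPROF graphical performance profiler, so users can compile, debug and profile on a single workstation with the PGI compilers installed.
PGI Workstation can also be extended with the optional PGI Accelerator product. See the product edition comparison page for licensing details.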
PGI 2018 version 18.7
PGI Accelerator Features and Enhancements
- Draft OpenACC 3.0 true deep copy directives for aggregate data structures in Fortran, C and C++
- PGI Compiler Assisted Software Testing (PCAST) can detect when results diverge between CPU and GPU code versions
- Default CUDA version now matches installed CUDA Driver
- Changed the default NVIDIA GPU compute capability list
- Changed the default size for PGI_ACC_POOL_ALLOC_MINSIZE to 128B
All PGI Compilers
- The beta LLVM-based code generator for Linux/x86-64 and Linux/OpenPOWER platforms is now based on LLVM 6.0. Executables generated with the LLVM-based code generator run an average of 15% faster on several important benchmarks.
- Interoperable with GNU versions up to and including GCC 8.1
- Fortran 2008 SUBMODULE support
- Assignments to allocatable variables changed to match Fortran 2003 semantics on both host and device
- Debug metadata is now generated for module variables when using the LLVM-based code generator
- Free-form source lines can now be up to 1000 characters
- All Fortran CHARACTER entities now use 64-bit integer lengths
OpenMP 4.5 (LLVM code generator only)
- Improved efficiency of combined "distribute parallel" loop constructs
- DWARF debug information for OpenMP thread private variables
- Changed the output of the pgaccelinfo utility
- Changed the output of the pgcpuid utility
Deprecations and Eliminations
- Dropped support for NVIDIA Fermi GPUs
- Last release to include bundled Microsoft toolchain components
Plus 33 user-requested enhancements and fixes
PGI 2018 Key Features
Accelerate Your HPC Applications with Tesla V100 GPUs
PGI OpenACC and CUDA Fortran now support CUDA 9.2 running on Tesla Volta GPUs. Tesla V100 offers more memory bandwidth, more streaming multiprocessors, next generation NVLink and new microarchitectural features that add up to better performance and programmability. For OpenACC and CUDA Fortran programmers, Tesla V100 offers improved hardware support and performance for CUDA Unified Memory features on both x86-64 and OpenPOWER processor-based systems. With PGI 2018, you get the best of both worlds — world-class CPU performance plus comprehensive GPU support.
PGI in the Cloud
PGI Community Edition compilers for Linux/x86-64 are now available as a container image on the NVIDIA GPU Cloud (NGC) and as an Amazon Machine Image (AMI) on the Amazon Web Services (AWS) Marketplace. These images provide OpenACC-enabled Fortran, C and C++ compilers supporting the latest multicore CPUs and NVIDIA GPUs including the Volta V100 family. NGC users can pull the PGI container to develop HPC applications on Alibaba Cloud, AWS, Google Cloud Platform, the Oracle Cloud Infrastructure or on local workstations and HPC systems. AWS users can run the PGI AMI on a variety of AWS-supported platforms. PGI in the Cloud is ideal for users who want to build, test, benchmark and run their own applications in the cloud using the latest NVIDIA GPUs, and for development and deployment of cloud-based parallel programming education and training.
PGI Auto-compare for OpenACC
Results can diverge between programs running on a CPU versus a GPU due to programming errors, precision of numerical intrinsics, or variations in compiler optimizations. This new compiler option in PGI 18.7 causes OpenACC compute regions to run redundantly on both the CPU and GPU. When data is copied from the GPU back to the CPU at data region boundaries, GPU results are compared with those computed on the CPU. Auto-compare works on both structured and unstructured data regions, with difference reports controlled by environment variables. With OpenACC Auto-compare you can quickly pinpoint where results start to diverge and adapt your program or compiler options as needed. Read more about the new auto-compare feature on the PGI Compiler Assisted Software Testing overview page.
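Auto-compare automates a check that programmers otherwise write by hand. You run a compute region on the accelerator, run the same computation on the host, and compare the two results within a tolerance. The C++ sketch below performs that comparison manually so the idea is visible. The function names and the relative-tolerance test are illustrative assumptions, not the PCAST implementation. In PGI 18.7 the automatic version is enabled through a compiler option (`-ta=tesla:autocompare`); check the PCAST documentation for the exact spelling in your release. A compiler without OpenACC support ignores the `#pragma acc` line and runs the loop serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical example: out = alpha * x + y computed in an OpenACC compute region.
std::vector<double> axpy_acc(double alpha, const std::vector<double>& x,
                             const std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::vector<double> out(n);
    const double* xp = x.data();
    const double* yp = y.data();
    double* op = out.data();
    #pragma acc parallel loop copyin(xp[0:n], yp[0:n]) copyout(op[0:n])
    for (std::size_t i = 0; i < n; ++i)
        op[i] = alpha * xp[i] + yp[i];
    return out;
}

// The same computation on the host, used as the reference result.
std::vector<double> axpy_host(double alpha, const std::vector<double>& x,
                              const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = alpha * x[i] + y[i];
    return out;
}

// Returns the index of the first element whose relative difference exceeds
// rel_tol, or -1 if the two result vectors agree everywhere.
long first_divergence(const std::vector<double>& a, const std::vector<double>& b,
                      double rel_tol) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double scale = std::max(std::fabs(a[i]), std::fabs(b[i]));
        if (std::fabs(a[i] - b[i]) > rel_tol * scale)
            return static_cast<long>(i);
    }
    return -1;
}
```

Reporting the first diverging index serves the same goal as the auto-compare reports: it shows where accelerator and host results start to differ.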
OpenACC Deep Copy Directives
PGI 18.7 includes an implementation of the draft OpenACC 3.0 true deep copy directives in Fortran, C and C++. Many modern HPC applications make extensive use of deeply nested aggregate data structures, such as Fortran derived types, C++ classes and C structs. With true deep copy directives you can specify a subset of members to move between host and device memory within the declaration of an aggregate. This includes support for named policies, which allow distinct sets of members to be copied at different points in a program. Once the deep copy pattern is defined, a single data clause (copy(a)) can be used to copy the selected members of the aggregate, including dynamically allocated members, some of which can themselves be aggregate structures with dynamically allocated members.
LLVM-based Code Generator
PGI 2018 compilers for Linux/x86-64 platforms include an optional LLVM-based code generator that delivers performance improvements of up to 15% on many HPC applications. OpenACC and CUDA Fortran are fully supported with the LLVM-based code generator, and it enables support for OpenMP 4.5 features on the latest multicore x86-64 and OpenPOWER CPUs. It can be invoked with a simple compiler command-line option, using compiler path settings, or using the environment modules commands included in PGI installations. The LLVM-based code generator will become the default on x86-64 targets in a future PGI release. Get started using it now to see improved multicore CPU performance, take advantage of the latest OpenMP features, and simplify migration to future PGI releases.
Support for the Latest CPUs
Multicore CPU performance remains one of the key strengths of the PGI compilers, which now support the latest generation of HPC CPUs including Intel Skylake, IBM POWER9 and AMD EPYC. PGI Fortran 2003, C11 and C++14 compilers deliver state-of-the-art SIMD vectorization and benefit from newly optimized single and double precision numerical intrinsic functions on Linux x86, Linux OpenPOWER, and macOS. See the benchmarks section for PGI 2018 performance results on a variety of HPC industry standard benchmarks.
Full OpenACC 2.6
All PGI compilers now support the latest OpenACC features on both Tesla GPUs and multicore CPUs. New OpenACC 2.6 features include manual deep copy directives, the serial compute construct, if_present clause in the host_data construct, no_create data clause, attach/detach clauses, acc_get_property API routines and improved support for Fortran optional arguments. Other OpenACC features added or enhanced include cache directive refinements and support for named constant arrays in Fortran modules.
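The manual deep copy directives mentioned above let you move an aggregate and the data behind its pointer members in separate steps. The following is a minimal sketch that assumes a hypothetical `Field` struct with one dynamically allocated array. The struct is copied to the device first and its array second. The OpenACC 2.6 runtime then attaches the device copy of the pointer member to the device copy of the array. Without an OpenACC compiler the directives are ignored and the loop runs on the host.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical aggregate with a dynamically allocated member.
struct Field {
    std::size_t n;
    double* data;
};

// Multiplies every element of f->data by factor inside a compute region.
void scale_field(Field* f, double factor) {
    assert(f != nullptr && f->data != nullptr);
    const std::size_t n = f->n;
    // Step 1: copy the struct itself (its pointer member still holds a host address).
    #pragma acc enter data copyin(f[0:1])
    // Step 2: copy the array; the runtime attaches f->data on the device.
    #pragma acc enter data copyin(f->data[0:n])
    #pragma acc parallel loop present(f[0:1])
    for (std::size_t i = 0; i < n; ++i)
        f->data[i] *= factor;
    // Copy the array back (detaching the pointer), then release the struct.
    #pragma acc exit data copyout(f->data[0:n])
    #pragma acc exit data delete(f[0:1])
}
```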
OpenACC for CUDA Unified Memory
PGI compilers leverage Pascal and Volta GPU hardware features, NVLink and CUDA Unified Memory to simplify OpenACC programming on GPU-accelerated x86-64 and OpenPOWER processor-based servers. When OpenACC allocatable data is placed in CUDA Unified Memory, no explicit data movement or data directives are needed. This simplifies GPU acceleration of applications that make extensive use of allocatable data, and allows you to focus on parallelization and scalability of your algorithms. See the OpenACC and CUDA Unified Memory PGInsider post for details.
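When data is placed in CUDA Unified Memory, a compute region can access heap-allocated arrays without data clauses or data directives. This sketch assumes PGI's managed-memory mode (the `-ta=tesla:managed` suboption), which places dynamic allocations in Unified Memory. The function name `squares` is hypothetical. Without managed memory you would normally add an explicit clause such as `copyout(p[0:n])` to the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Returns a heap-allocated array with v[i] == i * i.
// No data clauses: under Unified Memory the GPU accesses the heap allocation directly.
std::unique_ptr<double[]> squares(std::size_t n) {
    assert(n > 0);
    std::unique_ptr<double[]> v(new double[n]);
    double* p = v.get();
    #pragma acc parallel loop
    for (std::size_t i = 0; i < n; ++i)
        p[i] = static_cast<double>(i) * static_cast<double>(i);
    return v;
}
```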
OpenMP 4.5 for Multicore CPUs
OpenMP 4.5 support, previously available in the PGI compilers for Linux/OpenPOWER, comes to the PGI Fortran, C and C++ compilers on Linux/x86-64 with PGI 2018. You can now use PGI to compile OpenMP 4.5 programs for parallel execution across all the cores of a multicore CPU or server. TARGET regions are implemented with the multicore host as the default target, and PARALLEL and DISTRIBUTE loops are parallelized across all OpenMP threads.
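The following is a minimal sketch of an OpenMP 4.5 combined construct of the kind described above. With no GPU offload target, the TARGET region runs on the multicore host, and the DISTRIBUTE/PARALLEL loop is spread across the OpenMP threads. The explicit `map(tofrom: sum)` follows OpenMP 4.5 rules, under which scalars referenced in a target region are otherwise firstprivate. The function `dot` is a hypothetical example. If the code is compiled without OpenMP, the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <vector>

// Dot product using an OpenMP 4.5 TARGET TEAMS DISTRIBUTE PARALLEL FOR construct.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    const double* ap = a.data();
    const double* bp = b.data();
    double sum = 0.0;
    #pragma omp target teams distribute parallel for reduction(+:sum) \
        map(to: ap[0:n], bp[0:n]) map(tofrom: sum)
    for (long i = 0; i < n; ++i)
        sum += ap[i] * bp[i];
    return sum;
}
```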
New C++17 Features
Release 2018 of the PGI C++ compiler introduces partial support for the C++17 standard when compiling with --c++17 or -std=c++17. Supported C++17 core language features are available on all supported macOS versions and on Linux systems with GCC 5 or newer. New C++ language features include compile-time conditional statements (constexpr if), structured bindings, selection statements with initializers, fold expressions, inline variables, constexpr lambdas, and lambda capture of *this by value.
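The following is a compact sketch that uses each C++17 core language feature listed above. It is plain standard C++17 (compiled with --c++17 or -std=c++17). All names are hypothetical, and it does not cover everything C++17 added.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// Inline variable: a single definition shared across translation units.
inline constexpr int kBase = 10;

// Fold expression: sums any number of arguments.
template <typename... Ts>
constexpr auto sum_all(Ts... xs) { return (xs + ... + 0); }

// constexpr if: the discarded branch is never instantiated for a given T.
template <typename T>
std::string describe(T value) {
    if constexpr (std::is_integral_v<T>)
        return "integral:" + std::to_string(value);
    else
        return "other";
}

// constexpr lambda: usable in constant expressions.
constexpr auto square = [](int x) { return x * x; };
static_assert(square(kBase) == 100);

// Selection statement with initializer: 'it' is scoped to the if statement.
int lookup(const std::map<std::string, int>& m, const std::string& key) {
    if (auto it = m.find(key); it != m.end())
        return it->second;
    return -1;
}

// Structured bindings over map entries.
int sum_values(const std::map<std::string, int>& m) {
    int total = 0;
    for (const auto& [name, value] : m)
        total += value;
    return total;
}

// Lambda capture of *this by value: the lambda keeps its own copy of the object.
struct Counter {
    int count = 0;
    auto snapshot() const { return [*this] { return count; }; }
};
```

Capturing `*this` by value matters when the lambda may outlive the object or run elsewhere. The snapshot keeps the value `count` had when it was taken.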
Intel AVX-512 Support
Intel AVX-512 CPU instructions, available on the latest-generation Skylake CPUs, perform twice as many floating-point operations per instruction as the previous-generation AVX2 SIMD instructions. AVX-512 doubles both the vector register width, to 512 bits, and the number of vector registers, which can help improve the performance of HPC applications.
PGI Unified Binary for Tesla and Multicore
Use OpenACC to build applications for both GPU acceleration and parallel execution across all the cores of a multicore server. When you run the application on a GPU-enabled system, the OpenACC regions will offload and execute on the GPU. When the same application executable is run on a system without GPUs installed, the OpenACC regions will be executed in parallel across all CPU cores in the system. If you develop commercial or production applications, now you can accelerate your code with OpenACC and deploy a single binary usable on any system, with or without GPUs.
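The same OpenACC source serves both targets described above, and only the build changes. This sketch assumes a build with both targets enabled (in PGI releases of this period, `-ta=tesla,multicore`; check the compiler documentation for your version). In such a build, the reduction below is offloaded on a GPU system and spread across the CPU cores on a system without GPUs. `mean` is a hypothetical example. Without OpenACC the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Arithmetic mean computed with an OpenACC reduction.
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    const std::size_t n = v.size();
    const double* p = v.data();
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum) copyin(p[0:n])
    for (std::size_t i = 0; i < n; ++i)
        sum += p[i];
    return sum / static_cast<double>(n);
}
```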
Use C++14 Lambdas with Capture in OpenACC Regions
C++ lambda expressions provide a convenient way to define anonymous function objects at the location where they are invoked or passed as arguments. The auto type specifier can be applied to lambda parameters to create a polymorphic lambda-expression. With PGI compilers you can use lambdas in OpenACC compute regions in your C++ programs. Using lambdas with OpenACC is useful for a variety of reasons. One example is to drive code generation customized to different programming models or platforms. C++14 has opened up doors for more and more lambda use cases, especially for polymorphic lambdas, and all of those capabilities are now usable in your OpenACC programs.
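The following is a minimal sketch of a C++14 polymorphic lambda with captures used inside an OpenACC compute region. The helper names `transform_acc` and `scale_and_shift` are hypothetical. The lambda is passed by value, so its captured `a` and `b` travel with it into the region. Without OpenACC the pragma is ignored and the loop runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Applies op to every element of v inside an OpenACC compute region.
template <typename F>
void transform_acc(std::vector<float>& v, F op) {
    const std::size_t n = v.size();
    float* p = v.data();
    #pragma acc parallel loop copy(p[0:n])
    for (std::size_t i = 0; i < n; ++i)
        p[i] = op(p[i]);
}

// Builds a polymorphic (auto-parameter) lambda that captures a and b by value.
void scale_and_shift(std::vector<float>& v, float a, float b) {
    auto affine = [a, b](auto x) { return a * x + b; };
    transform_acc(v, affine);
}
```

Passing the operation as a template parameter lets the same loop body be customized per call site, which is one of the code-generation uses mentioned above.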
Enhanced Profiling Features
The new CPU Detail View shows a breakdown of the time spent on the CPU by each thread. Three call tree options let you profile by caller, by callee, or by file and line number. You can view time for all threads together or individually, quickly sort events by minimum or maximum time, and more. Other new features include an option to adjust the program counter sampling frequency and an enhanced NVLink topology display that shows the NVLink version.
- PGFORTRAN™ native OpenMP and auto-parallel Fortran 2003 compiler with CUDA extensions
- PGF77® native OpenMP and auto-parallel FORTRAN 77 compiler
- PGHPF® native data parallel compiler with full HPF language support (Linux only)
- PGCC® OpenMP and auto-parallel ANSI and K&R C99 compiler
- PGC++® OpenMP and auto-parallel C++ compiler with CUDA-x86 extensions
- PGDBG® OpenMP and MPI parallel graphical debugger
- PGPROF® OpenMP and MPI parallel graphical performance profiler
- Full support for the PGI Accelerator™ programming model on x64+GPU (PGFORTRAN and PGCC only)
- Full 64-bit support on multi-core AMD64 and Intel 64
- Intel 64 and AMD Opteron optimizations including SSE4.2/AVX, SSE4a/ABM, prefetching, use of extended register sets, and 64-bit addressing
- PGI Unified Binary™ technology combines into a single executable or object file code optimized for multiple AMD64 processors, Intel 64 processors or NVIDIA GPUs.
- Complete uniform development environment across 64-bit and 32-bit AMD and Intel processor-based systems running Linux, Mac OS X or Windows
- Full support for Fortran 2003
- Full support for ANSI C99
- Full support for OpenMP 3.0 on up to 256 cores
- Support for 64-bit integers (-r8/-i8 compilation flags)
- Highly tuned Intel MMX and SSE intrinsics library routines (C/C++ only)
- One pass interprocedural analysis (IPA)
- Interprocedural optimization of libraries
- Profile feedback optimization
- Function inlining including library functions
- Vectorization, loop interchange, loop splitting
- Memory hierarchy and memory allocation optimizations including huge pages support
- Loop unrolling, loop fusion, and cache tiling
- Enhanced auto-parallelization of loops specifically optimized for multi-core processors
- Concurrent subroutine call support
- Extensive vectorization/optimization directives/pragmas support
- State-of-the-art dependence analysis and global optimization
- Invariant conditional removal
- Tuning for non-uniform memory access (NUMA) architectures
- Process/CPU affinity support in SMP/OpenMP applications
- Support for creating shared objects on Linux, dynamic libraries on Mac OS X and DLLs on Windows
- Tracking ANSI C++ Standard—EDG 4.1 C++ front-end
- C++ Class member templates
- C++ partial specialization and ordering
- C++ explicit template qualification
- C/C++ extended asm support
- GNU style template instantiation
- GNU linkonce support
- Integrated cpp pre-processing
- Cray/DEC/IBM extensions (including Cray POINTERs & DEC STRUCTURES/UNIONS)
- Support for SGI-compatible DOACROSS in PGF77 and PGF95
- Threads-based auto-parallelization using Fortran
- Threads-based auto-parallelization of FOR loops in C/C++
- Full native OpenMP parallelization directives in Fortran
- Full native OpenMP parallelization pragmas in C/C++
- Byte swapping I/O for RISC/UNIX interoperability
- Full support for Common Compiler Feedback Format compiler optimization listings
- User modules support simplifies switching between multiple compiler environments/versions
- Includes optimized ACML (LAPACK/BLAS/FFT) math library supported on all targets
- Supports multi-threaded execution with Intel Math Kernel Libraries (MKL) 10.1 and later
- Optional PGI compiled IMSL Fortran numerical library available
- UNIX-compatible build/edit environment for Windows, including the BASH shell, vi editor, make, tar, gzip, sed, grep, awk, and over 100 other shell commands!
- Pre-validated de facto standard support libraries including NetCDF, F95 OpenGL, ATLAS, ScaLAPACK, FFTW, MPICH, MPICH2 and LAM MPI
- Interoperable with TotalView* (Linux only) and Allinea DDT.
- Fully interoperable with gcc, g77, and gdb
- Unconditional 30 day money back guarantee
|Component||Requirement|
|Processor||64-bit OpenPOWER or 64-bit x86 (including AMD64 and Intel 64) processor-based workstation or server with one or more single-core or multi-core microprocessors|
|GPU (optional)||NVIDIA CUDA-enabled GPU with compute capability 3.0 or later (Fermi GPUs, compute capability 2.x, are no longer supported)|
|Memory||16 MB or more|
|Disk space||1.5 GB during installation; 700 MB for the installed software|
|Peripherals||Mouse or compatible pointing device for the optional graphical user interfaces|
|Other||Adobe Acrobat Reader for viewing documentation|
|Feature||PGI Community Edition||PGI Professional Edition|
|Duration||1 year for each release||Perpetual|
|Releases per Year||1–2||6–9|
|Archive Release Access|
|GNU Compatible C++14|
|MPI Processes||16 local||up to 256 local or remote|
|Restricted Content Access||With sign up|
|Premier Service Option|
Note: The PGI Debugger and the PGI Profiler support up to 64 OpenMP threads.
* Effective with the PGI 2017 release, GPU support is no longer included with PGI products for macOS.
|Feature||PGI Workstation||PGI Server||PGI CDK||PGI Visual Fortran||Notes|
|Fortran & C/C++||2|
|OS X only|
|Multi-platform||volume packs only|
|Debugging with PGDBG®||16||16||64||256||16||16||3|
|Node-locked Single User|
|Multi-user Network Floating|
1. On Windows, beginning with the PGI 2016 release, the C/C++ only language option is not available.
2. On Windows, beginning with the PGI 2016 release, C++ is not included in the combined language package.
3. All PGI products support debugging and profiling of up to 64 OpenMP threads.
The PGI Workstation, PGI Server and PGI CDK Cluster Development Kit licenses differ as follows:
- PGI Workstation: for deployment on a single node-locked system. This license allows up to 16 MPI processes to be debugged using PGDBG, as in mpirun -np 16 -dbg=pgdbg foo. All 16 processes must be running on the same system as PGDBG.
- PGI Server: for deployment on any number of networked nodes, with options for up to 2, 5, 10, 25 or 50 simultaneous users. This license also allows up to 16 MPI processes to be debugged using PGDBG (for example, mpirun -np 16 -dbg=pgdbg foo). All 16 processes must be running on one node.
- PGI CDK: for deployment on any number of networked nodes, with options for up to 2, 5, 10, 25 or 50 simultaneous users. This license allows up to 64 or 256 MPI processes to be debugged using PGDBG, and the processes can run on different cluster nodes as designated by the machines.LINUX file, but only if the PGI CDK daemons have been installed.
Note that the PGI CDK installer installs daemons on the cluster slave nodes that PGDBG uses for cluster debugging. These daemons are not included in the PGI Workstation or PGI Server installation packages.
PGI Server
Fortran and C/C++ for 64-bit x64 and 32-bit x86 processor-based systems.
PGI Server is PGI's multi-user scientific and engineering compiler and tool bundle for multi-user systems and workgroups. PGI Server is available in three language versions:
- PGI Fortran Server — Fortran only
- PGI C/C++ Server — C and C++ only
- PGI Fortran/C/C++ Server — combined Fortran and C/C++
PGI CDK Cluster Development Kit
Parallel Fortran, C and C++ Compilers & Tools for Programming HPC Clusters
In combination with the Linux or Windows HPC Server 2008 operating systems, the PGI CDK® Cluster Development Kit® compilers and development tools enable use of networked clusters of AMD or Intel x64 processor-based workstations and servers to tackle the largest scientific computing applications. For Linux, the PGI CDK includes pre-configured versions of MPI for Ethernet or InfiniBand, and a pre-configured batch queueing system. On Windows HPC Server 2008, the PGI CDK integrates with MSMPI and the job scheduler to enable development, debugging and tuning of high-performance MPI or hybrid MPI/OpenMP applications written in Fortran, C or C++.
PGI Visual Fortran
Parallel Fortran Compilers and Tools for Microsoft Windows
PGI Visual Fortran® (PVF®) brings the PGI suite of high-performance 64-bit and 32-bit parallel Fortran compilers to Microsoft Windows developers using Microsoft Visual Studio.
PGI Accelerator
Using PGI Accelerator™ compilers, programmers can accelerate applications on x64+accelerator platforms by adding OpenACC compiler directives to existing high-level, standard-compliant Fortran and C programs and then recompiling with the appropriate compiler options.
IMSL Fortran Library
PGI offers the IMSL Fortran Numerical Library from Visual Numerics, Inc. (VNI) for use with PGI Visual Fortran and the Windows versions of PGI Workstation. The PGI IMSL Fortran library is restricted to a maximum of four multi-core processors. IMSL libraries for other PGI languages, operating systems, release versions, and seat and processor counts are also available.