The Parallel Unstructured Mesh Infrastructure (PUMI) is designed to support the representation of, and operations on, unstructured meshes as needed for the execution of mesh-based simulations on massively parallel computers. In PUMI, the mesh...
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators
Ahmad Abdelfattah, David Keyes, Hatem Ltaief
Article No.: 18
KBLAS is an open-source, high-performance library that provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs. Since performance of dense matrix-vector multiplication is hindered by the overhead of memory...
A Radix-Independent Error Analysis of the Cornea-Harrison-Tang Method
Article No.: 19
Assuming floating-point arithmetic with a fused multiply-add operation and rounding to nearest, the Cornea-Harrison-Tang method aims to evaluate expressions of the form ab + cd with high relative accuracy. In this article, we provide...
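The idea of evaluating ab + cd accurately with an FMA can be illustrated with a small sketch. The step ordering below follows the scheme as it is commonly described (round each product, recover each product's rounding error exactly with one FMA, then sum the high and low parts); it is an assumption for illustration, not a transcription from the article. The helper `fma` is a hypothetical emulation built on exact rational arithmetic, and the function name `cht_sum_of_products` is likewise made up here.

```python
from fractions import Fraction


def fma(x, y, z):
    """Emulate a fused multiply-add: x*y + z computed exactly, then rounded once.

    Hypothetical helper: Fraction arithmetic is exact, and converting a
    Fraction to float rounds to nearest, so only one rounding occurs.
    """
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def cht_sum_of_products(a, b, c, d):
    """Sketch of an FMA-based evaluation of a*b + c*d (assumed step ordering)."""
    p1 = a * b            # rounded product a*b
    e1 = fma(a, b, -p1)   # rounding error of p1 (exact barring underflow/overflow)
    p2 = c * d            # rounded product c*d
    e2 = fma(c, d, -p2)   # rounding error of p2
    p = p1 + p2           # sum of the high parts
    e = e1 + e2           # sum of the low parts
    return p + e          # final rounded result


# Usage: a case where the naive evaluation cancels catastrophically.
a, b = 1.0 + 2.0**-30, 1.0 - 2.0**-30   # a*b = 1 - 2**-60 exactly
c, d = -1.0, 1.0
naive = a * b + c * d
compensated = cht_sum_of_products(a, b, c, d)
```

In this example the exact value of ab + cd is -2^-60. The naive evaluation returns 0.0 because ab rounds to 1.0 in double precision, while the compensated sketch recovers -2^-60.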
Matrix Multiplication Over Word-Size Modular Rings Using Approximate Formulas
Brice Boyer, Jean-Guillaume Dumas
Article No.: 20
The Bini-Capovani-Lotti-Romani approximate formula (or border rank) for matrix multiplication achieves a better complexity than Strassen’s matrix multiplication formula. In this article, we show a novel way to use the approximate formula in the...
A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure
Shen Wang, Xiaoye S. Li, François-Henry Rouet, Jianlin Xia, Maarten V. De Hoop
Article No.: 21
We present a structured parallel geometry-based multifrontal sparse solver using hierarchically semiseparable (HSS) representations and exploiting the inherent low-rank structures. Parallel strategies for nested dissection ordering (taking low...
An Efficient Hybrid Algorithm for the Separable Convex Quadratic Knapsack Problem
Timothy A. Davis, William W. Hager, James T. Hungerford
Article No.: 22
This article considers the problem of minimizing a convex, separable quadratic function subject to a knapsack constraint and a box constraint. An algorithm called NAPHEAP has been developed to solve this problem. The algorithm solves the...
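To make the problem class concrete, here is a minimal sketch that solves a small instance by plain bisection on the Lagrange multiplier of the knapsack constraint. This is not NAPHEAP; it is a textbook baseline. The exact problem form is an assumption: minimize the sum of 0.5*d_i*x_i^2 - y_i*x_i subject to sum a_i*x_i = r and lo_i <= x_i <= hi_i, with every d_i > 0. All names (`solve_quadratic_knapsack`, `d`, `y`, `a`, `lo`, `hi`, `r`) are hypothetical.

```python
def solve_quadratic_knapsack(d, y, a, lo, hi, r, tol=1e-12, max_iter=200):
    """Bisection on the multiplier lam of the constraint sum(a_i * x_i) == r.

    Assumes d_i > 0 and that the instance is feasible. For a fixed lam the
    minimizer separates per variable: x_i = clip((y_i - lam*a_i)/d_i, lo_i, hi_i).
    """
    def x_of(lam):
        return [min(max((yi - lam * ai) / di, li), ui)
                for di, yi, ai, li, ui in zip(d, y, a, lo, hi)]

    def residual(lam):
        # Nonincreasing in lam, since each a_i * x_i(lam) has slope -a_i**2/d_i or 0.
        return sum(ai * xi for ai, xi in zip(a, x_of(lam))) - r

    # Expand a bracket [lam_lo, lam_hi] with residual(lam_lo) >= 0 >= residual(lam_hi).
    lam_lo, lam_hi = -1.0, 1.0
    for _ in range(100):
        if residual(lam_lo) >= 0.0:
            break
        lam_lo *= 2.0
    else:
        raise ValueError("could not bracket the multiplier from below")
    for _ in range(100):
        if residual(lam_hi) <= 0.0:
            break
        lam_hi *= 2.0
    else:
        raise ValueError("could not bracket the multiplier from above")

    # Standard bisection on the monotone residual.
    for _ in range(max_iter):
        mid = 0.5 * (lam_lo + lam_hi)
        if residual(mid) > 0.0:
            lam_lo = mid
        else:
            lam_hi = mid
        if lam_hi - lam_lo <= tol:
            break
    return x_of(0.5 * (lam_lo + lam_hi))


# Usage: split one unit between two identical variables confined to [0, 1].
x = solve_quadratic_knapsack(d=[1.0, 1.0], y=[0.0, 0.0], a=[1.0, 1.0],
                             lo=[0.0, 0.0], hi=[1.0, 1.0], r=1.0)
```

Bisection is chosen here only because it is short and robust on a monotone residual; it makes no attempt to exploit the piecewise-linear structure of the residual.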
Algorithm 960: POLYNOMIAL: An Object-Oriented Matlab Library of Fast and Efficient Algorithms for Polynomials
Jorge Delgado, Juan Manuel Peña
Article No.: 23
The design and implementation of a Matlab object-oriented software library for working with polynomials is presented. The construction and evaluation of polynomials in Bernstein form are motivated and justified. Efficient constructions for the...
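As a reminder of what evaluation in Bernstein form involves, the following sketch implements the classical de Casteljau recurrence on the interval [0, 1]. It is a generic textbook method written in Python for illustration and is not claimed to be the algorithm used by the library; the function name `de_casteljau` is hypothetical.

```python
def de_casteljau(coeffs, t):
    """Evaluate a polynomial given by its Bernstein coefficients on [0, 1].

    coeffs[i] multiplies the Bernstein basis function B_{i,n}(t), where
    n = len(coeffs) - 1. Each pass replaces neighbouring coefficients by a
    convex combination, so for t in [0, 1] no cancellation is introduced.
    """
    if not coeffs:
        raise ValueError("at least one coefficient is required")
    b = list(coeffs)                  # work on a copy
    n = len(b)
    for level in range(1, n):
        for i in range(n - level):
            b[i] = (1.0 - t) * b[i] + t * b[i + 1]
    return b[0]


# Usage: coefficients (0, 1/2, 1) of degree 2 represent p(t) = t.
value = de_casteljau([0.0, 0.5, 1.0], 0.3)
```

The recurrence costs O(n^2) operations per evaluation point, which is the usual trade-off accepted for its numerical stability.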
Algorithm 961: Fortran 77 Subroutines for the Solution of Skew-Hamiltonian/Hamiltonian Eigenproblems
Peter Benner, Vasile Sima, Matthias Voigt
Article No.: 24
Skew-Hamiltonian/Hamiltonian matrix pencils λS − H appear in many applications, including linear-quadratic optimal control problems, H∞-optimization, certain multibody systems, and many other areas in applied mathematics,...
Algorithm 962: BACOLI: B-spline Adaptive Collocation Software for PDEs with Interpolation-Based Spatial Error Control
Jack Pew, Zhi Li, Paul Muir
Article No.: 25
BACOL and BACOLR are (Fortran 77) B-spline adaptive collocation packages for the numerical solution of 1D parabolic Partial Differential Equations (PDEs). The packages have been shown to be superior to other similar packages, especially for...
Remark on “Algorithm 916: Computing the Faddeyeva and Voigt Functions”: Efficiency Improvements and Fortran Translation
Mofreh R. Zaghloul
Article No.: 26
This remark describes efficiency improvements to Algorithm 916 [Zaghloul and Ali 2011]. It is shown that the execution time required by the algorithm, when run at its highest accuracy, may be improved by more than a factor of 2. A better...