**PUMI**: Parallel Unstructured Mesh Infrastructure

Daniel A. Ibanez, E. Seegyoung Seol, Cameron W. Smith, Mark S. Shephard

Article No.: 17

DOI: 10.1145/2814935

The Parallel Unstructured Mesh Infrastructure (PUMI) is designed to support the representation of, and operations on, unstructured meshes as needed for the execution of mesh-based simulations on massively parallel computers. In PUMI, the mesh...

**KBLAS**: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators

Ahmad Abdelfattah, David Keyes, Hatem Ltaief

Article No.: 18

DOI: 10.1145/2818311

KBLAS is an open-source, high-performance library that provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs. Since performance of dense matrix-vector multiplication is hindered by the overhead of memory...

**A Radix-Independent Error Analysis of the Cornea-Harrison-Tang Method**

Claude-Pierre Jeannerod

Article No.: 19

DOI: 10.1145/2824252

Assuming floating-point arithmetic with a fused multiply-add operation and rounding to nearest, the Cornea-Harrison-Tang method aims to evaluate expressions of the form *ab* + *cd* with high relative accuracy. In this article, we provide...

**Matrix Multiplication Over Word-Size Modular Rings Using Approximate Formulas**

Brice Boyer, Jean-Guillaume Dumas

Article No.: 20

DOI: 10.1145/2829947

The Bini-Capovani-Lotti-Romani approximate formula (or border rank) for matrix multiplication achieves a better complexity than Strassen’s matrix multiplication formula. In this article, we show a novel way to use the approximate formula in the...

**A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure**

Shen Wang, Xiaoye S. Li, François-Henry Rouet, Jianlin Xia, Maarten V. De Hoop

Article No.: 21

DOI: 10.1145/2830569

We present a structured parallel geometry-based multifrontal sparse solver using hierarchically semiseparable (HSS) representations and exploiting the inherent low-rank structures. Parallel strategies for nested dissection ordering (taking low...

**An Efficient Hybrid Algorithm for the Separable Convex Quadratic Knapsack Problem**

Timothy A. Davis, William W. Hager, James T. Hungerford

Article No.: 22

DOI: 10.1145/2828635

This article considers the problem of minimizing a convex, separable quadratic function subject to a knapsack constraint and a box constraint. An algorithm called NAPHEAP has been developed to solve this problem. The algorithm solves the...

**Algorithm 960**: POLYNOMIAL: An Object-Oriented Matlab Library of Fast and Efficient Algorithms for Polynomials

Jorge Delgado, Juan Manuel Peña

Article No.: 23

DOI: 10.1145/2814567

The design and implementation of a Matlab object-oriented software library for working with polynomials is presented. The construction and evaluation of polynomials in Bernstein form are motivated and justified. Efficient constructions for the...

**Algorithm 961**: Fortran 77 Subroutines for the Solution of Skew-Hamiltonian/Hamiltonian Eigenproblems

Peter Benner, Vasile Sima, Matthias Voigt

Article No.: 24

DOI: 10.1145/2818313

Skew-Hamiltonian/Hamiltonian matrix pencils λS − H appear in many applications, including linear-quadratic optimal control problems, H∞-optimization, certain multibody systems, and many other areas in applied mathematics,...

**Algorithm 962**: BACOLI: B-spline Adaptive Collocation Software for PDEs with Interpolation-Based Spatial Error Control

Jack Pew, Zhi Li, Paul Muir

Article No.: 25

DOI: 10.1145/2818312

BACOL and BACOLR are (Fortran 77) B-spline adaptive collocation packages for the numerical solution of 1D parabolic Partial Differential Equations (PDEs). The packages have been shown to be superior to other similar packages, especially for...

**Remark on “Algorithm 916: Computing the Faddeyeva and Voigt Functions”**: Efficiency Improvements and Fortran Translation

Mofreh R. Zaghloul

Article No.: 26

DOI: 10.1145/2806884

This remark describes efficiency improvements to Algorithm 916 [Zaghloul and Ali 2011]. It is shown that the execution time required by the algorithm, when run at its highest accuracy, may be improved by more than a factor of 2. A better...