ACM Transactions on

Mathematical Software (TOMS)

Latest Articles

High-performance Implementation of Elliptic Curve Cryptography Using Vector Instructions

Elliptic curve cryptosystems are considered an efficient alternative to conventional systems such as DSA and RSA. Recently, Montgomery and Edwards... (more)

Adjoint Code Design Patterns

Adjoint methods have become fundamental ingredients of the scientific computing toolbox over the past decades. Large-scale parameter sensitivity analysis, uncertainty quantification, and nonlinear optimization would otherwise turn out computationally infeasible. The symbolic derivation of adjoint mathematical models for relevant problems in science... (more)

Enclosing Chebyshev Expansions in Linear Time

We consider the problem of computing rigorous enclosures for polynomials represented in the Chebyshev basis. Our aim is to compare and develop algorithms with a linear complexity in terms of the polynomial degree. A first category of methods relies on a direct interval evaluation of the given Chebyshev expansion in which Chebyshev polynomials are... (more)

The Implementation of the Colored Abstract Simplicial Complex and Its Application to Mesh Generation

We introduce the Colored Abstract Simplicial Complex library (CASC): a new, modern, and header-only... (more)

Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators

We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with... (more)

Computing Hypergeometric Functions Rigorously

We present an efficient implementation of hypergeometric functions in arbitrary-precision interval arithmetic. The functions 0F1, 1F1, 2F1, and 2F0 (or the Kummer U-function) are supported for unrestricted complex parameters and argument, and, by extension, we cover exponential and trigonometric integrals, error functions, Fresnel integrals,... (more)

Tree Partitioning Reduction: A New Parallel Partition Method for Solving Tridiagonal Systems

Solving tridiagonal linear-equation systems is a fundamental computing kernel in a wide range of scientific and engineering applications, and its computation can be modeled with parallel algorithms. These parallel solvers are typically designed to compute problems whose data fit in a common shared-memory space where all the cores taking part in the... (more)

Improving the Flexibility and Robustness of Model-based Derivative-free Optimization Solvers

We present two software packages for derivative-free optimization (DFO): DFO-LS for nonlinear least-squares problems and Py-BOBYQA for general... (more)

Algorithm 995: An Efficient Parallel Anisotropic Delaunay Mesh Generator for Two-Dimensional Finite Element Analysis

A bottom-up approach to parallel anisotropic mesh generation is presented by building a mesh generator starting from the basic operations of vertex insertion and Delaunay triangles. Applications focusing on high-lift design or dynamic stall, or numerical methods and modeling test cases, still focus on two-dimensional domains. This automated... (more)

Algorithm 996: BBCPOP: A Sparse Doubly Nonnegative Relaxation of Polynomial Optimization Problems With Binary, Box, and Complementarity Constraints

The software package BBCPOP is a MATLAB implementation of a hierarchy of sparse doubly nonnegative relaxations of a class of polynomial optimization (minimization) problems (POPs) with binary, box, and complementarity (BBC) constraints. Given a POP in the class and a relaxation order, BBCPOP constructs a simple conic optimization problem (COP),... (more)

Algorithm 997: pySDC—Prototyping Spectral Deferred Corrections

In this article, we present the Python framework pySDC for solving collocation problems with spectral deferred correction (SDC) methods and their time-parallel variant PFASST, the parallel full approximation scheme in space and time. pySDC features many implementations of SDC and PFASST, from simple implicit timestepping to high-order... (more)

Algorithm 998: The Robust LMI Parser—A Toolbox to Construct LMI Conditions for Uncertain Systems

The ROLMIP (Robust LMI Parser) is a toolbox specialized in control theory for uncertain linear systems, built to work under MATLAB jointly with YALMIP, to ease the programming of sufficient Linear Matrix Inequality (LMI) conditions that, if feasible, assure the validity of parameter-dependent LMIs in the entire set of uncertainty considered. This... (more)


About TOMS

The purpose of the ACM Transactions on Mathematical Software (TOMS) is to communicate important research results addressing the development, evaluation and use of mathematical software...


TOMS Replicated Computational Results (RCR)

ACM TOMS has a process called Replicated Computational Results (RCR) in which replicating the computational data shown in an article is part of the review process. This effort is intended to assist in improving the quality and reproducibility of scientific publication for both TOMS and for the computational science community as a whole. Manuscripts that successfully complete the RCR process receive the RCR designation when published.

If you are interested in participating in this initiative, either as an author or reviewer, please contact the TOMS Editor-in-Chief. Details of the TOMS RCR Initiative are available here.

Welcome Video for Associate Editors

ACM has produced a short video explaining the role of new Associate Editors. Even if you aren't new to the job; it's still worth a view!


Forthcoming Articles
A Tight I/O Lower Bound for Matrix Multiplication

A tight lower bound for required I/O when computing a matrix-matrix multiplication on a processor with two layers of memory is established. Prior work obtained weaker lower bounds by reasoning about the number of phases needed to perform C:=AB, where each phase is a series of operations involving S reads and writes to and from fast memory, and S is the size of fast memory. A lower bound on the number of phases was then determined by obtaining an upper bound on the number of scalar multiplications performed per phase. This paper follows the same high level approach, but improves the lower bound by considering C:=AB+C instead of C:=AB, and obtains the maximum number of scalar fused multiply-adds (FMAs) per phase instead of scalar additions. Key to obtaining the new result is the decoupling of the per-phase I/O from the size of fast memory. The new lower bound is 2mnk/S - 2S where S is the size of fast memory. The constant for the leading term is an improvement of a factor 4/2. A theoretical algorithm that attains the lower bound is given, and how the state-of-the-art Goto's algorithm also in some sense meets the lower bound is discussed.

Algorithm XXX: Mongoose, A Graph Coarsening and Partitioning Library

Partitioning graphs is a common and useful operation in many areas, from parallel computing to VLSI design to sparse matrix algorithms. In this paper, we introduce Mongoose, a multilevel hybrid graph par- titioning algorithm and library. Building on previous work in multilevel partitioning frameworks and com- binatoric approaches, we introduce novel stall-reducing and stall-free coarsening strategies, as well as an efficient hybrid algorithm leveraging 1) traditional combinatoric methods and 2) continuous quadratic pro- gramming formulations. We demonstrate how this new hybrid algorithm outperforms either strategy in isolation, and we also compare Mongoose to METIS and demonstrate its effectiveness on large and social networking (power law) graphs.

Replicated Computational Results (RCR) Report for ?Code Generation for Generally Mapped Finite Elements?

?Code Generation for Generally Mapped Finite Elements? includes performance results for the finite element methods discussed in that manuscript. The authors provided a Zenodo archive with the Firedrake components and dependencies used, as well as the scripts that generated the results. The software was installed on two similar platforms; then, new results were gathered and compared to the original results. After completing this process, the results have been deemed replicable by the reviewer.

High-Performance Derivative Computations using CoDiPack

There are several AD tools available, which all implement different strategies for the reverse mode of AD. The major strategies are primal value taping (implemented e.g. by ADOL-c) and Jacobi taping (implemented e.g. by adept and dco/c++). Especially for Jacobi taping, recent advances by using expression templates make this approach very attractive for large scale software. The current implementations are either closed source or miss essential features and flexibility. Therefore, we present the new AD tool CoDiPack (Code Differentiation Package) in this paper. It is specifically designed for a minimal memory consumption and optimal runtime, such that it can be used for the differentiation of large scale software. An essential part of the design of CoDiPack is the modular layout and the recursive data structures, which do not only allow the efficient implementation of the Jacobi taping approach, but will also enable other approaches like the primal value taping or new research ideas. We will also present the performance value of CoDiPack on a generic PDE example and on the SU2 code.

On Kummer Lines With Full Rational 2-torsion and Their Usage in Cryptography

A paper by Karati and Sarkar at Asiacrypt'17 has pointed out the potential for Kummer lines in genus one, by observing that its SIMD-friendly arithmetic is competitive with the status quo. A more recent preprint explores the connection with (twisted) Edwards curves. In this paper we extend this work and significantly simplify their treatment. We show that their Kummer line is the \(x\)-line of a Montgomery curve translated by a point of order two, and exhibit a natural isomorphism to a twisted Edwards curve. Moreover, we show that the Kummer line presented by Gaudry and Lubicz can be obtained via the action of a point of order two on the \(y\)-line of an Edwards curve. The maps connecting these curves and lines are all very simple. As a result, a cryptographic implementation can use the arithmetic that is optimal for its instruction set at negligible cost.

IPscatt¿a MATLAB Toolbox for the Inverse Medium Problem in Scattering

IPscatt is a free toolbox in MATLAB that solves the inverse medium problem in scattering in two and three dimensions. It uses the Helmholtz equation as a physical model of scattering and provides point sources or plane waves as incident fields as well as simulated near or far field data with noise. Further, it can deal with real-world data in two dimensions from Institute Fresnel. A rapid variational regularization scheme minimizes the defect, i.e. the difference between the predicted data from the model and the given data, as well as penalty terms promoting sparsity, total variation and physical bounds. These regularization scheme relies on a primal-dual algorithm and was introduced in [F. Bürgel, K. S. Kazimierski, and A. Lechleiter 2017 Journal of Computational Physics, 339:130]. This paper provides a survey on the mathematical concepts, connects them with the implementation and shows the practical usage of the toolbox via guides.

Graph Coloring Based Parallel Push-Relabel Algorithm for the Maximum Flow Problem

The maximum flow problem is one of the most common network flow problems. This problem involves finding the maximum possible amount of flow between two designated nodes on a network with arcs having flow capacities. The push-relabel algorithm is one of the fastest algorithms to solve this problem. We present a shared memory parallel push-relabel algorithm. Graph coloring is used to avoid collisions between threads for concurrent push and relabel operations. In addition, excess values of target nodes are updated using atomic instructions to prevent race conditions. The experiments show that our algorithm is competitive for wide graphs with low diameters. Results from two different data sets are included, computer vision problems and DIMACS challenge problems. These are compared with existing push-relabel and pseudoflow implementations. We show that high speedup rates are possible using our coloring based parallelization technique on sparse networks. However, we also observe that the pseudoflow algorithm runs faster than the push-relabel algorithm on dense and long networks.

Product algebras for Galerkin discretisations of boundary integral operators and their applications

Operator products occur naturally in a range of regularized boundary integral equation formulations. However, while a Galerkin discretisation only depends on the domain space and the test (or dual) space of the operator, products require a notion of the range. In the boundary element software package Bempp we have implemented a complete operator algebra that depends on knowledge of the domain, range and test space. The aim was to develop a way of working with Galerkin operators in boundary element software that is as close to working with the strong form on paper as possible while hiding the complexities of Galerkin discretisations. In this paper, we demonstrate the implementation of this operator algebra and show, using various Laplace and Helmholtz example problems, how it significantly simplifies the definition and solution of a wide range of typical boundary integral equation problems.

Algorithm xxx: Fast and accurate evaluation of a generalized incomplete gamma function

We present a computational procedure to evaluate the integral ?xy sp-1 e-?s ds, for 0 ? x < y ?+?, ? = ±1, p > 0, which generalizes the lower (x=0) and upper (y=+?) incomplete gamma functions. To allow for large values of x, y, and p while avoiding under/overflow issues in the standard double precision floating point arithmetic, we use an explicit normalization that is much more efficient than the classical ratio with the complete gamma function. The generalized incomplete gamma function is estimated with continued fractions, integrations by parts, or, when x ? y, with Romberg numerical integration algorithm. We show that the accuracy reached by our algorithm improves a recent state-of-the-art method by two orders of magnitude, and is essentially optimal considering the limitations imposed by the floating point arithmetic. Moreover, the admissible parameter range of our algorithm (0 ? p,x,y ? 1015) is much larger than competing algorithms and its robustness is assessed through massive usage in an image processing application.

Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem

In this paper, we present an algorithm for the singular value decomposition (SVD) of a bidiagonal matrix by means of the eigenpairs of an associated symmetric tridiagonal matrix. The algorithm is particularly suited for the computation of a subset of singular values and corresponding vectors. We focus on a sequential version of the algorithm, and discuss special cases and implementation details. We use a large set of bidiagonal matrices to assess the accuracy of the implementation in single and double precision, as well as to identify potential shortcomings. We show that the algorithm can be up to three orders of magnitude faster than existing algorithms, which are limited to the computation of a full SVD. We also show time comparisons of an implementation that uses the strategy discussed in the paper as a building block for the computation of the SVD of general matrices.

A Wolfe line search algorithm for vector optimization

In a recent paper, Lucambio Pérez and Prudente extended the Wolfe conditions for the vector-valued optimization. Here, we propose a line search algorithm for finding a step size satisfying the strong Wolfe conditions in the vector optimization setting. Well definedness and finite termination results are provided. We discuss practical aspects related to the algorithm and present some numerical experiments illustrating its applicability. Codes supporting this paper are written in Fortran 90 and are freely available for download.

Code generation for generally mapped finite elements

Many classical finite elements such as the Argyris and Bell elements have long been absent from high-level PDE software. Building on recent theoretical work, we describe how to implement very general finite element transformations in FInAT and hence into the Firedrake finite element system. Numerical results evaluate the new elements, comparing them to existing methods for classical problems. For a second order model problem, we find that new elements give smooth solutions at a mild increase in cost over standard Lagrange elements. For fourth-order problems, however, the newly-enabled methods significantly outperform interior penalty formulations. We also give some advanced use cases, solving the nonlinear Cahn-Hilliard equation and some biharmonic eigenvalue problems (including Chladni plates) using $C^1$ discretizations.

Error analysis of some operations involved in the Cooley-Tukey Fast Fourier Transform

We are interested in obtaining error bounds for the classical FFT algorithm in floating-point arithmetic, for the 2-norm as well as for the infinity norm. For that purpose we also give some results on the relative error of the complex multiplication by a root of unity, and on the largest value that can take the real or imaginary part of one term of the FFT of a vector $x$, assuming that all terms of $x$ have real and imaginary parts less than some value $b$.

FloatX: A C++ Library for Customized Floating-Point Arithmetic

We present FloatX (Float eXtended), a \CC framework to investigate the effect of leveraging customized floating-point formats in numerical applications. Among other properties, FloatX facilitates an incremental transformation of the code, relies on hardware-supported floating-point types as back end to preserve efficiency, and incurs no storage overhead. The paper discusses in detail the design principles, programming interface and datatype casting rules behind FloatX. Furthermore, it demonstrates FloatX's usage and benefits via several case studies from well-known numerical dense linear algebra libraries, such as BLAS and LAPACK; the Ginkgo library for sparse linear systems; and two neural network applications related with image processing and text recognition.

PETSc DMNetwork: A Scalable Network PDE-Based Multiphysics Simulator

A scientific framework for simulations of large-scale networks, such as is required for the analysis of critical infrastructure interaction and interdependencies, is needed for applications on exascale computers. Such a framework must be able to manage heterogeneous physics and unstructured topology, and must be reusable. To this end we have developed DMNetwork, a class in PETSc that provides data and topology management and migration for network problems, along with multiphysics solvers to exploit the problem structure. It eases the application development cycle by providing the necessary infrastructure through simple abstrac- tions to define and query the network. This paper presents the design of the DMNetwork, illustrates its user interface, and demonstrates its ability to solve large network problems through the numerical simulation of a water pipe network with more than 2 billion variables on extreme-scale computers using up to 30,000 processor cores.

All ACM Journals | See Full Journal Index

Search TOMS
enter search term and/or author name