Pseudo-random number generators (PRNGs) play an important role in both computer simulation and computer security, yet there is currently a wide divide between the types of PRNGs used in the two areas. For PRNGs in computer security applications, the security concern (in terms of forward and backward predictability) is extremely important. For PRNGs in computer simulation applications, the HELP property is important, where HELP stands for high-dimensional equi-distribution, efficiency, long period-length, and portability. In recent years, many PRNGs satisfying the HELP property have been proposed in the computer simulation literature. However, most of them are linear generators and thus share the same weakness of predictability. The major aim of this paper is to propose a general class of secure generators, called SAFE (secure and fast encryption) generators, obtained by properly mixing two baseline generators with the HELP property into a secure generator that retains the HELP property. Specifically, we propose a general mutual-shuffling method that applies to certain linear generators, such as the currently most popular MT19937 generator and large-order multiple recursive generators, to construct secure PRNGs by breaking their linear structure and masking the generated values.
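To illustrate the general shuffling-and-masking idea (this is a classic MacLaren-Marsaglia-style sketch, not the paper's SAFE construction; the LCG parameters, table size, and XOR mask are illustrative assumptions):

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """A minimal linear congruential generator, standing in for a baseline generator."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def shuffled_stream(seed_a, seed_b, k=64):
    """Mix two generators: one fills a table, the other both selects
    which entry to emit and masks the output, hiding the linear structure."""
    ga = lcg(seed_a)
    gb = lcg(seed_b, a=22695477)
    table = [next(ga) for _ in range(k)]
    while True:
        r = next(gb)
        j = r >> 26            # top 6 bits select a table slot (k = 64 = 2**6)
        out = table[j] ^ r     # mask the table value with the other stream
        table[j] = next(ga)    # refill the consumed slot
        yield out
```

Neither table lookups nor the XOR mask alone would defeat prediction of a single linear generator; the point of the sketch is that the output mixes state from two independent recurrences.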
We present high-performance GPU implementations of the matrix-vector multiplication (matvec) and compression operations for the H2-variant of hierarchical matrices. H2 matrices, an algebraic generalization of fast multipole methods (FMM), are space- and time-efficient representations of dense matrices that exploit the low-rank structure of matrix blocks at different levels of granularity, employing both a hierarchical block partitioning and hierarchical bases for the block representations. These two operations are at the core of algebraic operations on hierarchical matrices: the matvec is a ubiquitous operation in numerical algorithms, and compression is a key building block for algebraic operations that require periodic recompression during execution. The difficulties in developing efficient GPU algorithms come primarily from the irregular tree data structures that underlie the hierarchical representations; the key to performance is to expose fine-grained parallelism by recasting the computations on flattened trees and marshaling the irregularly laid out data in ways that allow batched linear algebra operations to be performed. Our numerical results on covariance matrices from 2D and 3D spatial statistics problems show the high efficiency our routines achieve: over 550 GB/s for the bandwidth-limited matrix-vector operation and over 850 GFLOP/s for the compression operation on the P100 Pascal GPU.
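The low-rank block structure that hierarchical matrices exploit can be illustrated at the level of a single admissible block: a rank-r block A = U Vᵀ can be applied to a vector in O((m+n)r) rather than O(mn) operations. A minimal pure-Python sketch of this leaf-level idea (it shows none of the nested bases, tree flattening, or GPU batching of the paper):

```python
def lowrank_matvec(U, V, x):
    """Apply A = U @ V^T to x without ever forming A.
    U is m-by-r, V is n-by-r, x has length n; cost is O((m + n) * r)."""
    r = len(U[0])
    # t = V^T x  (length r)
    t = [sum(V[i][k] * x[i] for i in range(len(x))) for k in range(r)]
    # y = U t    (length m)
    return [sum(U[i][k] * t[k] for k in range(r)) for i in range(len(U))]
```

An H2 matvec applies this idea recursively, sharing the bases U and V across all blocks of a row or column cluster, which is what makes the representation both space- and time-efficient.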
The EASAL software implements a suite of algorithms for understanding the structure and geometric properties of rigid point sets that are pairwise constrained by distance intervals. The algorithms explore the configuration spaces of these constraint systems using classical results for the stratification of configuration spaces, new results for efficient sampling by convex parametrization, and intuitive visualization. This paper provides a high-level description of the state of the art, the key theoretical underpinnings, the major algorithms and their implementation, and major applications such as the free energy and kinetics of assembly of supramolecular structures or of clusters in colloidal and soft materials.
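The basic object in such a constraint system is a configuration of points together with pairwise distance intervals it must satisfy. A minimal feasibility check might look like the following sketch (the function name, data layout, and tolerance are illustrative assumptions, not EASAL's API):

```python
import math

def satisfies_intervals(points, intervals, tol=1e-12):
    """Check whether a configuration of points satisfies every pairwise
    distance-interval constraint.  `intervals` maps an index pair (i, j)
    to an allowed distance range (lo, hi)."""
    for (i, j), (lo, hi) in intervals.items():
        d = math.dist(points[i], points[j])
        if not (lo - tol <= d <= hi + tol):
            return False
    return True
```

EASAL's contribution is not this pointwise test but the systematic, stratified exploration and convex parametrization of the set of all configurations passing it.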
In this work we develop a validated numerics method for the solution of linear ordinary differential equations (LODEs). A wide range of algorithms (e.g., Runge-Kutta, collocation, spectral methods) exist for numerically computing approximations of the solutions. Most of these come with proofs of asymptotic convergence, but the error bounds provided are usually non-constructive. However, in some domains, such as critical systems and computer-aided mathematical proofs, one needs validated effective error bounds. We focus on both the theoretical and practical complexity analysis of a so-called a posteriori quasi-Newton validation method, which relies mainly on a fixed-point argument for a contracting map. Specifically, given a polynomial approximation obtained by some numerical algorithm and expressed in the Chebyshev basis, our algorithm efficiently computes an accurate and rigorous error bound. To this end, we study theoretical properties such as compactness, convergence, and invertibility of the associated linear integral operators and their truncations in a suitable coefficient space of Chebyshev series. We then analyze the almost-banded matrix structure of these operators, which allows for very efficient numerical algorithms both for the numerical solution of LODEs and for the rigorous computation of the approximation error. Finally, several representative examples show the advantages of our algorithms as well as their theoretical and practical limits.
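A basic building block when working with approximations expressed in the Chebyshev basis is the stable evaluation of a truncated Chebyshev series. A minimal sketch of Clenshaw's recurrence (a standard algorithm, shown here only as background for the abstract, not as the paper's validation method):

```python
def clenshaw(c, x):
    """Evaluate p(x) = sum_k c[k] * T_k(x), where T_k is the k-th
    Chebyshev polynomial of the first kind, via Clenshaw's recurrence."""
    b1 = b2 = 0.0
    for ck in reversed(c[1:]):
        b1, b2 = ck + 2.0 * x * b1 - b2, b1
    return c[0] + x * b1 - b2
```

The a posteriori validation described in the abstract then bounds the distance between such a computed polynomial and the true LODE solution by a fixed-point argument, rather than re-deriving the approximation itself.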
Sparse matrix-vector multiplication (SpMV) is an important kernel in both traditional high-performance computing and emerging data-intensive applications. Previous SpMV libraries are optimized through either application-specific or architecture-specific approaches, which makes them complicated to use in real applications. In this work we develop an auto-tuning system (SMATER) to bridge the gap between specific optimizations and general-purpose use. SMATER provides programmers a unified interface based on the compressed sparse row (CSR) format, implicitly choosing the best format and the fastest implementation for any input sparse matrix at runtime. SMATER leverages a machine-learning model and a retargetable back-end library to quickly predict the best combination. Performance parameters are extracted from 2,386 matrices in the UF sparse matrix collection. The experiments show that SMATER achieves impressive performance while remaining portable across state-of-the-art x86 multi-core processors, NVIDIA GPUs, and Intel Xeon Phi accelerators. Compared with the Intel MKL library, SMATER runs more than 3 times faster. We further demonstrate its adaptivity in an algebraic multigrid solver from the hypre library and report above 20% performance improvement.
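For readers unfamiliar with the CSR format that SMATER exposes to programmers, a reference SpMV over the three standard CSR arrays (a textbook sketch, not SMATER's tuned implementation):

```python
def spmv_csr(data, indices, indptr, x):
    """y = A @ x for A stored in CSR form:
    data    -- nonzero values, row by row
    indices -- column index of each nonzero
    indptr  -- indptr[i]:indptr[i+1] delimits row i's nonzeros"""
    y = []
    for i in range(len(indptr) - 1):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y.append(s)
    return y
```

Auto-tuners such as SMATER keep this CSR-based interface but may internally convert the matrix to a different storage format (and pick a matching kernel) when the sparsity pattern makes that faster.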
BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization. A key difference with respect to existing high-performance implementations of BLAS is that the computational performance is optimized for small- to medium-scale matrices, i.e., for sizes up to a few hundred. BLASFEO comes with three different implementations: a high-performance implementation aiming at providing the highest performance for matrices fitting in cache, a reference implementation providing portability and embeddability and optimized for very small matrices, and a wrapper to standard BLAS and LAPACK providing high performance on large matrices. The three implementations of BLASFEO together provide high-performance dense linear algebra routines for matrices ranging from very small to large. Compared to both open-source and proprietary highly tuned BLAS libraries, for matrices of size up to about one hundred the high-performance implementation of BLASFEO is about 20-30% faster than the corresponding level-3 BLAS routines and 2-3 times faster than the corresponding LAPACK routines.
Riemannian optimization is the task of finding an optimum of a real-valued function defined on a Riemannian manifold. It has been a topic of much interest over the past few years due to many applications, including computer vision, signal processing, and numerical linear algebra. The substantial background required to successfully design and apply Riemannian optimization algorithms is a significant impediment for many potential users. Therefore, multiple packages, such as Manopt (in Matlab) and Pymanopt (in Python), have been developed. This paper describes ROPTLIB, a C++ library for Riemannian optimization. Unlike prior packages, ROPTLIB simultaneously achieves the following goals: i) it has user-friendly interfaces in Matlab, Julia, and C++; ii) users need not implement manifold- and algorithm-related objects; iii) it is computationally efficient thanks to its C++ core; iv) it implements state-of-the-art generic Riemannian optimization algorithms, including quasi-Newton algorithms; and v) it is based on object-oriented programming, allowing users to rapidly add new algorithms and manifolds. The code and a manual can be downloaded from http://www.math.fsu.edu/
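To make the manifold machinery such packages automate more concrete, here is a minimal sketch of Riemannian gradient descent on the unit sphere, minimizing the Rayleigh quotient xᵀAx: the Euclidean gradient is projected onto the tangent space, and renormalization serves as the retraction (this is a generic textbook scheme, not ROPTLIB's API; the function name and parameters are illustrative):

```python
import math

def rayleigh_min_on_sphere(A, x0, step=0.05, iters=500):
    """Riemannian gradient descent for min_{||x||=1} x^T A x.
    The minimizer is an eigenvector for the smallest eigenvalue of A."""
    x = x0[:]
    n = len(x)
    for _ in range(iters):
        # Euclidean gradient of x^T A x (A symmetric): g = 2 A x
        g = [2.0 * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Project onto the tangent space of the sphere at x
        xg = sum(xi * gi for xi, gi in zip(x, g))
        rg = [gi - xg * xi for gi, xi in zip(g, x)]
        # Take a step, then retract back onto the sphere by normalizing
        x = [xi - step * ri for xi, ri in zip(x, rg)]
        nrm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / nrm for xi in x]
    return x
```

Libraries like ROPTLIB supply the gradient projection, retraction, and vector transport for each supported manifold, so the user only provides the cost function and its Euclidean derivatives.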
Permutation problems are combinatorial optimization problems whose solutions are naturally encoded as permutations. Due to their complexity, stemming principally from the factorial cardinality of the search space, they have been a recurrent topic in the artificial intelligence and operations research communities. Recently, among the vast number of metaheuristic algorithms, new advances in estimation of distribution algorithms (EDAs) have shown outstanding performance on some permutation problems. These novel EDAs implement distance-based exponential probability models such as the Mallows and Generalized Mallows models. In this paper, we present a Matlab package, perm_mateda, for estimation of distribution algorithms on permutation problems, implemented as an extension to the Mateda-2.0 toolbox of EDAs. In particular, we provide implementations of the Mallows and Generalized Mallows EDAs under the Kendall's tau, Cayley, and Ulam distances. In addition, four classical permutation problems have also been implemented: the Traveling Salesman Problem, the Permutation Flowshop Scheduling Problem, the Linear Ordering Problem, and the Quadratic Assignment Problem.
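The Mallows model under Kendall's tau assigns a permutation probability proportional to exp(-θ·d(π, π₀)), where d is the number of pairwise discordances with a central permutation π₀. Because the Kendall distance to the identity decomposes into independent inversion-table entries, the model can be sampled exactly entry by entry. A minimal Python sketch centered at the identity (an illustration of the model, not perm_mateda's Matlab code):

```python
import math
import random

def sample_mallows_kendall(n, theta, rng):
    """Sample a permutation of 0..n-1 from the Mallows model under
    Kendall's tau, centered at the identity, with spread parameter theta.
    Each inversion-table entry code[j] in {0..n-1-j} is drawn independently
    with probability proportional to exp(-theta * code[j])."""
    code = []
    for j in range(n):
        m = n - 1 - j
        w = [math.exp(-theta * v) for v in range(m + 1)]
        r = rng.random() * sum(w)
        v = 0
        while r > w[v]:
            r -= w[v]
            v += 1
        code.append(v)
    # Decode the inversion table: position j takes the code[j]-th
    # smallest item still unused.
    items = list(range(n))
    return [items.pop(code[j]) for j in range(n)]
```

As θ grows, samples concentrate around the central permutation; θ = 0 gives the uniform distribution, which is what lets the EDA interpolate between exploration and exploitation.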