Package org.apfloat.internal

Default implementations of the apfloat Service Provider Interface (SPI).

The org.apfloat.internal package contains four different implementations of the apfloat SPI, each based on a different primitive element type:

  • LongBuilderFactory, based on element type long: This is the default implementation used by apfloat. It uses the 64-bit long integer as the elementary type for all data storage and manipulation. It is usually faster than the int version on 64-bit JVMs, which are the common case today. In some places it also uses double arithmetic, so for decent performance the processor should be able to perform double-precision floating-point operations as well as conversions between double and long. For example, on x86-64 and SPARC the 64-bit long version is faster than the 32-bit int version. You can use the long implementation on 32-bit platforms too, however the performance per element is less than half that of the int version, even if roughly twice as much data is processed per element. The upside is that this implementation can do much bigger calculations: up to about 3.5 * 10^15 digits in radix 10.
  • IntBuilderFactory, based on element type int: It works well for 32-bit platforms that perform integer operations fast (including integer multiplication), and can multiply doubles and convert between double and int with adequate performance. This applies to most workstations today (Intel x86 processors and compatibles, in particular processors with SSE2 support, and most RISC architectures). You can do calculations up to roughly 226 million digits (in radix 10) with this implementation, which should be enough for most purposes.
  • DoubleBuilderFactory, based on element type double: This implementation exists mostly as a curiosity. It will typically perform worse than the long version, and it's only able to do calculations with about 1/20 of the long version's maximum digit length. The only situation where using the double version might make sense is on a platform that performs floating-point arithmetic well but integer arithmetic extremely badly. Finding such a platform today might be difficult, so generally it's advisable to use the long version instead if you have a 64-bit platform or need the most extreme precision.
  • FloatBuilderFactory, based on element type float: This version is also only a curiosity. The main downside is that it can only perform calculations up to about 1.3 million radix-10 digits. The per-digit performance is also typically less than that of the int version. Unless you have a computer that performs floating-point arithmetic extraordinarily well compared to integer arithmetic, it's always advisable to use the long or int version instead.
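
Which implementation is used can be selected at runtime through the ApfloatContext. As a minimal sketch (using the public apfloat API; the builder factory classes are the ones listed above):

    import org.apfloat.Apfloat;
    import org.apfloat.ApfloatContext;
    import org.apfloat.ApfloatMath;
    import org.apfloat.internal.IntBuilderFactory;

    public class SelectImplementation {
        public static void main(String[] args) {
            // Replace the default LongBuilderFactory with the int implementation
            ApfloatContext ctx = ApfloatContext.getContext();
            ctx.setBuilderFactory(new IntBuilderFactory());

            // Numbers created from now on use int elements internally
            Apfloat two = new Apfloat(2, 1000);            // 2 to 1000 digits
            System.out.println(ApfloatMath.sqrt(two));
        }
    }

The builder factory should generally be selected before any numbers are created.
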
For example, the relative performance of the above implementations on some CPUs is as follows (bigger percentage means better performance):
Relative performance of implementations

Type     Pentium 4   Athlon XP   Athlon 64 (32-bit)   Athlon 64 (64-bit)   UltraSPARC II
Int      100%        100%        100%                 100%                 100%
Long     40%         76%         59%                  95%                  132%
Double   45%         63%         59%                  94%                  120%
Float    40%         43%         46%                  42%                  82%

(Test was done with apfloat 1.1 using Sun's Java 5.0 server VM calculating π to one million digits with no disk storage.)

Compared to the java.math.BigInteger class with different digit sizes, the apfloat relative performance with the same CPUs is as follows:

[Figure: Apfloat and BigInteger comparison]

(Test was done with apfloat 1.1 using Sun's Java 5.0 server VM calculating 3n and converting the result to decimal.)

This benchmark suggests that for small numbers – less than roughly 200 decimal digits in size – the BigInteger / BigDecimal classes are probably faster, even by an order of magnitude. Using apfloats is only beneficial for numbers that have at least a couple hundred digits, or of course if some mathematical functions are needed that are not available for BigIntegers or BigDecimals. The results are easily explained by the smaller overhead that BigIntegers have due to their simpler implementation. As the size of the mantissa grows, the O(n log n) complexity of apfloat's FFT-based multiplication makes apfloat considerably faster than the steady O(n^2) implementation of the BigInteger class. For numbers with millions of digits, multiplication using BigIntegers would be simply infeasible, whereas for apfloat it is not a problem at all.
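
To see where the crossover lies on a given machine, one can time the two libraries directly. A rough sketch, not a rigorous benchmark (Apint and its multiply method are part of the public apfloat API; note that modern BigInteger versions may use sub-quadratic algorithms for large operands):

    import java.math.BigInteger;
    import org.apfloat.Apint;

    public class CrossoverSketch {
        public static void main(String[] args) {
            for (int digits : new int[] { 100, 1000, 100000 }) {
                // An n-digit number consisting of all nines
                BigInteger b = BigInteger.TEN.pow(digits).subtract(BigInteger.ONE);
                Apint a = new Apint(b.toString());

                long t0 = System.nanoTime();
                b.multiply(b);
                long t1 = System.nanoTime();
                a.multiply(a);
                long t2 = System.nanoTime();

                System.out.printf("%d digits: BigInteger %d us, Apint %d us%n",
                        digits, (t1 - t0) / 1000, (t2 - t1) / 1000);
            }
        }
    }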

All of the above apfloat implementations have the following features (some of the links point to the int version, but all four versions have similar classes):

The implementation-specific exceptions thrown by the apfloat library all extend the base class ApfloatInternalException. This exception, or one of its subclasses, can be thrown in different situations, for example:
  • Backing storage failure. For example, if a number is stored on disk, an IOException can be thrown by any of the disk operations, e.g. if a file can't be created, or can't be written to because the disk is full.
  • Operands have different radixes. This is a limitation allowed by the specification.
  • Some other internal limit is exceeded, e.g. the maximum transform length that is mathematically possible for the implementation.
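
For example, mixing radixes in a single operation is expected to fail with such an exception. A minimal sketch (catching ApfloatRuntimeException, the public base class that ApfloatInternalException extends):

    import org.apfloat.Apfloat;
    import org.apfloat.ApfloatRuntimeException;

    public class RadixMismatch {
        public static void main(String[] args) {
            Apfloat decimal = new Apfloat("1.5", 50, 10);   // radix 10
            Apfloat hex = new Apfloat("1.8", 50, 16);       // radix 16
            try {
                System.out.println(decimal.add(hex));       // mixed radixes
            } catch (ApfloatRuntimeException e) {
                // ApfloatInternalException and its subclasses are caught here
                System.err.println("Operation failed: " + e.getMessage());
            }
        }
    }
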
Note in particular that numbers which take a lot of space are stored on disk in temporary files. By default these files have the extension *.ap and are created in the current working directory. When the objects are garbage collected, the temporary files are deleted. However, garbage collection may not work perfectly at all times, and in general there are no guarantees that it will happen at all. So, depending on the program being executed, it may be beneficial to explicitly call System.gc() at some point to ensure that unused temporary files are deleted. However, VM vendors generally warn against calling it too often, since it may seriously degrade performance, so figuring out how to call it optimally may be difficult. If file deletion fails for some reason, some temporary files may be left on disk after the program exits. These files can be safely removed after the program has terminated.
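
As a sketch of the explicit cleanup described above (whether a given number is actually disk-backed depends on the configured memory threshold):

    import org.apfloat.Apfloat;
    import org.apfloat.ApfloatMath;

    public class TempFileCleanup {
        public static void main(String[] args) {
            // A number this large is typically backed by an *.ap file on disk
            Apfloat pi = ApfloatMath.pi(2000000);

            pi = null;      // drop the last reference to the disk-backed number
            System.gc();    // request collection so the unused temp file can be deleted
        }
    }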

Many parts of the program are parallelized, i.e. processed with multiple threads in parallel. Parallelization is done where it has been easy to implement and where it is efficient. E.g. the "six-step" NTT is parallelized, because the data is in matrix form in memory and it's easy and highly efficient to process the rows of the matrix in parallel. Other places where parallelization is implemented are the in-place multiplication of transform results and the carry-CRT operation. However, in both of these algorithms the processing is parallelized only if the data is in memory; if the data is stored on disk, the irregular disk seeking could make a parallel algorithm highly inefficient.
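
The number of threads used by the parallelized parts can be controlled through the ApfloatContext. A minimal sketch (setNumberOfProcessors is part of the public ApfloatContext API):

    import org.apfloat.ApfloatContext;

    public class ParallelSetup {
        public static void main(String[] args) {
            ApfloatContext ctx = ApfloatContext.getContext();
            // Let the parallelized steps, e.g. the "six-step" NTT, use all cores
            ctx.setNumberOfProcessors(Runtime.getRuntime().availableProcessors());
            System.out.println("Using " + ctx.getNumberOfProcessors() + " threads");
        }
    }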

Many sections of the code are not parallelized, where it's obvious that parallelization would not bring any benefit. Examples of such cases are addition, subtraction and matrix transposition. While parallel algorithms for these operations could certainly be implemented, they would not bring any performance improvement: the bottleneck in these operations is memory or I/O bandwidth, not CPU processing time. The CPU processing in addition and subtraction is trivial; in matrix transposition it's outright nonexistent, as the algorithm only moves data from one place to another. Even if all the data were stored in memory, memory bandwidth would be the bottleneck. E.g. in addition, the algorithm only needs a few CPU cycles per element processed; moving the data from main memory to CPU registers and back likely takes significantly more time than the addition itself. Parallelization would therefore not improve efficiency at all: the total CPU load might appear to increase, but measured in wall-clock time the execution would not be any faster.
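
To illustrate why addition is memory-bound, consider a hypothetical element-wise addition loop (a sketch, not the library's actual code; it assumes most-significant-first element order and a base of at most 2^62 so the sum cannot overflow a long):

    // Per element: two loads and one store (24 bytes of memory traffic for
    // longs) against only a couple of ALU operations.
    static long add(long[] a, long[] b, long[] sum, long base) {
        long carry = 0;
        for (int i = a.length - 1; i >= 0; i--) {   // least significant first
            long t = a[i] + b[i] + carry;
            carry = (t >= base) ? 1 : 0;
            sum[i] = (carry == 1) ? t - base : t;
        }
        return carry;                               // overall carry out
    }

With this little computation per element, adding more threads would only add contention for the same memory bus.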

Since the core functionality of the apfloat implementation is based on the original C++ version of apfloat, no significant new algorithms have been added (although the architecture has otherwise been greatly beautified, e.g. by separating the different implementations behind an SPI and applying all kinds of patterns everywhere). Thus, there are no alternative implementations for e.g. using a floating-point FFT instead of an NTT, as the SPI (org.apfloat.spi) might suggest. However, the default implementation does implement all the patterns suggested by the SPI – in fact the SPI was designed for the default implementation.

The class diagram for an example apfloat that is stored on disk is shown below. Note that all the aggregate classes can be shared by multiple objects that point to the same instance. For example, multiple Apfloats can point to the same ApfloatImpl, multiple ApfloatImpls can point to the same DataStorage, etc. This sharing happens in various situations, e.g. when calling floor(), multiplying by one, etc.:

[Figure: Implementation class diagram]

The sequence diagram for creating a new apfloat that is stored on disk is as follows. Note that the FileStorage class is a private inner class of the DiskDataStorage class:

[Figure: New Apfloat sequence diagram]

The sequence diagram for multiplying two apfloats is as follows. In this case an NTT-based convolution is used, and the resulting apfloat is stored in memory:

[Figure: Multiplication sequence diagram]

Most of the files in the apfloat implementations are generated from templates, where a template tag is replaced by int/long/float/double or Int/Long/Float/Double. The byte size of the element type is also templatized and replaced by 4/8/4/8. The only files that are individually implemented for each element type are listed below (a hypothetical illustration of the template expansion follows the list):

BaseMath.java
CRTMath.java
ElementaryModMath.java
ModConstants.java
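
For illustration, the template expansion works roughly as follows; the tag name below is hypothetical and the actual tags in the apfloat source may differ:

    public class TemplateExpansionDemo {
        // Hypothetical template line, with "rawtype" as the element-type tag:
        //     public static rawtype[] createArray(int size) { return new rawtype[size]; }
        // After substituting rawtype -> long (and the element byte size 8
        // where needed), the generated long version would read:
        public static long[] createArray(int size) {
            return new long[size];
        }
    }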