FABE13-HX is a high-performance C math library that delivers fast trigonometric functions (sin, cos, sincos) using SIMD vectorization. Powered by the Ψ-Hyperbasis algorithm, it outperforms traditional math libraries by up to 8.4× while keeping the maximum difference from standard libm at or below 2e-11.
FABE13-HX revolutionizes trigonometric computation for:
- Machine Learning & AI Acceleration - Optimize neural network performance
- Scientific Simulations & HPC - Accelerate physics, engineering, and computational modeling
- Real-time Signal Processing - Enhance DSP, audio, and sensor data analysis
- Graphics & Visualization Systems - Improve rendering performance
- Embedded Computing - Efficient performance on resource-constrained systems
- ⚡ Up to 8.4× Faster Than Standard Math Libraries across various platforms and input sizes
- Cross-Architecture Optimization with support for AVX512F, AVX2+FMA (x86), NEON (ARM)
- 🎯 High Precision with maximum error ≤ 2e-11 compared to standard libm
- 🧠 Novel Rational-Function Architecture based on Ψ-Hyperbasis instead of traditional polynomials
- 🔢 Extreme-Range Support accurate up to |x| ≈ 1e308 via advanced Payne–Hanek reduction
- 🧩 Unified API for both scalar and vectorized operations
- 🛡️ Robust Error Handling with proper NaN/Inf/0 behavior
Designed for numerical computing, AI acceleration, and scientific simulation, it replaces traditional polynomial approximations with a fused rational + correction model that's more efficient and vectorization-friendly.
fabe13/                   # Core source
├── fabe13.c              # HX implementation
├── fabe13.h              # Public API
└── benchmark_fabe13.c    # Benchmark main
tests/
└── test_fabe13.c         # Optional unit tests
CMakeLists.txt            # Cross-platform CMake
Makefile                  # Minimalist legacy build
build.sh                  # Recommended build script (cross-platform)
./build.sh
This script:
- Cleans and configures the build (Release mode)
- Enables both benchmarking and testing
- Compiles using aggressive -Ofast, -ffast-math, -march=native flags
- Runs all unit tests and benchmarks automatically
mkdir -p build && cd build
cmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON
make
./fabe13_test
./fabe13_benchmark
make all
make run-benchmark
FABE13-HX delivers consistent speedups over standard libm across platforms and input sizes. These benchmarks highlight its advantage for both cloud-based and local environments.
- 🟨 FABE13-HX: SIMD-accelerated (AVX2+FMA, Ψ-core)
- 🔴 libm: Standard C math (math.h)
- Input size: N ∈ [10 ... 1,000,000,000] doubles
- Timing: Full-array sincos() throughput
- Aligned memory: 64 bytes
- 🎯 Accuracy: ≤ 2e-11 max diff (sin/cos)
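The setup above (64-byte aligned buffers, full-array timing, throughput in M ops/sec, speedup ratio) can be sketched with a few helpers. This is a minimal illustration, not the code in benchmark_fabe13.c; every function name below is hypothetical, and it assumes a POSIX system that provides posix_memalign and clock_gettime.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* exposes posix_memalign and clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: allocate n doubles on a 64-byte boundary.
   Returns NULL on failure; release the buffer with free(). */
double *alloc_aligned_doubles(size_t n)
{
    void *p = NULL;
    if (n == 0 || posix_memalign(&p, 64, n * sizeof(double)) != 0)
        return NULL;
    return (double *)p;
}

/* Returns 1 if p sits on a 64-byte boundary, 0 otherwise. */
int is_aligned_64(const void *p)
{
    return ((uintptr_t)p % 64u) == 0;
}

/* Monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Throughput in millions of elements per second. */
double mops_per_sec(size_t n, double seconds)
{
    assert(seconds > 0.0);
    return (double)n / seconds / 1e6;
}

/* Speedup of a candidate over a baseline, both given as elapsed seconds. */
double speedup(double baseline_seconds, double candidate_seconds)
{
    assert(candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}
```

A timing run would take now_seconds() before and after one full-array call, then pass the elapsed time to mops_per_sec() and speedup(). For very small arrays the elapsed time falls below the clock resolution, so repeating the call and averaging is a sensible precaution.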
✅ FABE13-HX is consistently faster than libm, up to 8.4× for large inputs.
- Platform: Replit Linux
- SIMD: AVX2 + FMA
- Compiler: Clang 14 (nix)
- libm: GNU math.h
🟨 FABE13-HX outperforms libm with up to 8.4× higher throughput on AppleClang (AVX2).
- Platform: macOS 14.x (MacBook Pro 16")
- SIMD: AVX2 + FMA
- Compiler: AppleClang 16.0
- libm: macOS system math.h
FABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)
Benchmark Alignment: 64 bytes
8.4× throughput improvement for large array processing compared to standard libm
Array Size | FABE13 (sec) | Libm (sec) | FABE13 (M ops/sec) | Libm (M ops/sec) | Speedup |
---|---|---|---|---|---|
10 | 0.0000 | 0.0000 | 50.00 | 50.00 | 1.00x |
100 | 0.0000 | 0.0000 | 166.67 | 71.43 | 2.33x |
1,000 | 0.0000 | 0.0000 | 185.19 | 72.46 | 2.56x |
10,000 | 0.0001 | 0.0001 | 173.01 | 71.02 | 2.44x |
100,000 | 0.0006 | 0.0009 | 177.12 | 115.82 | 1.53x |
1,000,000 | 0.0016 | 0.0072 | 614.85 | 138.34 | 4.44x |
10,000,000 | 0.0164 | 0.0720 | 611.30 | 138.95 | 4.40x |
100,000,000 | 0.1673 | 0.7296 | 597.63 | 137.07 | 4.36x |
1,000,000,000 | 1.8044 | 10.4989 | 554.19 | 95.25 | 5.82x |
FABE13: 0.0016 sec | 614.85 M ops/sec
libm: 0.0072 sec | 138.34 M ops/sec
Speedup: 4.44x
Memory: Allocated 0.04 GB
Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)
CPU: 100.0% utilization for both implementations
Max diff vs libm: sin=1.224e-11, cos=1.225e-11
- All test cases maintain acceptable numerical accuracy compared to libm
- Maximum difference observed: ~10⁻¹¹ for both sin and cos operations
- Properly handles edge cases (0, inf, nan) with correct behavior
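A "max diff vs libm" figure like the one reported above can be computed as the largest absolute element-wise difference between two output arrays. The helper below is a hedged sketch of that metric; max_abs_diff is a hypothetical name and is not taken from the project's test code.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute element-wise difference between a and b.
   A difference that evaluates to NaN (for example NaN inputs or inf - inf)
   is skipped, because every comparison with NaN is false. */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Because NaN differences are skipped, special values (0, inf, nan) are better checked separately from the bulk accuracy comparison.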
// Core rational transformation
Ψ(x) = x / (1 + (3/8)x²)
// sin(x) approximation
sin(x) ≈ Ψ ⋅ (1 - a1⋅Ψ² + a2⋅Ψ⁴ - a3⋅Ψ⁶)
// cos(x) approximation
cos(x) ≈ 1 - b1⋅Ψ² + b2⋅Ψ⁴ - b3⋅Ψ⁶
This allows both functions to share a unified base, optimizing performance and memory access.
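To make the shared-base idea concrete, here is a small sketch that evaluates the two forms above in Horner style, computing Ψ and Ψ² once and reusing them for both outputs. The coefficient values a1..a3 and b1..b3 are not given in this README, so the sketch takes them as parameters; the psi_coeffs struct and both function names are hypothetical, and this is not the code in fabe13.c.

```c
#include <assert.h>
#include <stddef.h>

/* Core rational transformation: psi(x) = x / (1 + (3/8) x^2). */
double psi(double x)
{
    return x / (1.0 + 0.375 * x * x);
}

/* Hypothetical container for the six coefficients; values must be supplied
   by the caller because the README does not list them. */
typedef struct {
    double a1, a2, a3; /* sin-side coefficients */
    double b1, b2, b3; /* cos-side coefficients */
} psi_coeffs;

/* Evaluates
     sin form: p * (1 - a1 p^2 + a2 p^4 - a3 p^6)
     cos form:      1 - b1 p^2 + b2 p^4 - b3 p^6
   with p = psi(x), using Horner's scheme in p^2. */
void psi_sincos_form(double x, const psi_coeffs *k,
                     double *sin_out, double *cos_out)
{
    assert(k != NULL && sin_out != NULL && cos_out != NULL);
    double p = psi(x);
    double p2 = p * p; /* shared by both polynomials */
    double sin_poly = 1.0 + p2 * (-k->a1 + p2 * (k->a2 - p2 * k->a3));
    double cos_poly = 1.0 + p2 * (-k->b1 + p2 * (k->b2 - p2 * k->b3));
    *sin_out = p * sin_poly;
    *cos_out = cos_poly;
}
```

Both polynomials depend only on p2, which is what lets a single evaluation of Ψ feed the sin and cos outputs together.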
#include "fabe13/fabe13.h"
// Scalar API
double fabe13_sin(double x);
double fabe13_cos(double x);
double fabe13_sinc(double x); // sin(x)/x
double fabe13_tan(double x);
double fabe13_cot(double x);
double fabe13_atan(double x);
double fabe13_asin(double x); // [-1, 1]
double fabe13_acos(double x); // [-1, 1]
// SIMD vector API
void fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);
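Since the vector API takes its element count as an int, callers that track lengths as size_t may want a thin wrapper that feeds the array through in int-sized chunks. The sketch below shows one way to do that against any function with the fabe13_sincos signature. The names sincos_fn, sincos_chunked, and stub_kernel are hypothetical; stub_kernel is only a stand-in so the sketch compiles without the library, and it does not compute sin or cos.

```c
#include <assert.h>
#include <stddef.h>

/* Function-pointer type matching the fabe13_sincos prototype shown above. */
typedef void (*sincos_fn)(const double *in, double *sin_out,
                          double *cos_out, int n);

/* Hypothetical wrapper: process `total` elements in chunks of at most
   max_chunk elements, so each call receives a count that fits in an int. */
void sincos_chunked(sincos_fn fn, const double *in, double *sin_out,
                    double *cos_out, size_t total, int max_chunk)
{
    assert(fn != NULL && max_chunk > 0);
    size_t done = 0;
    while (done < total) {
        size_t left = total - done;
        int n = left > (size_t)max_chunk ? max_chunk : (int)left;
        fn(in + done, sin_out + done, cos_out + done, n);
        done += (size_t)n;
    }
}

/* Stand-in kernel for demonstration only: copies in[i] to sin_out[i] and
   -in[i] to cos_out[i]. It is NOT a sin/cos implementation. */
void stub_kernel(const double *in, double *sin_out, double *cos_out, int n)
{
    for (int i = 0; i < n; ++i) {
        sin_out[i] = in[i];
        cos_out[i] = -in[i];
    }
}
```

With the library linked, the same wrapper could be called as sincos_chunked(fabe13_sincos, in, s, c, total, INT_MAX), with INT_MAX taken from <limits.h>.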
- ✅ Branchless Quadrant Correction
- ✅ NaN/Inf/0-safe logic
- ✅ Prefetch-friendly & unrolled scalar fallback
- ✅ SIMD-ready backend design (NEON / AVX2 / AVX512)
- ✅ Precision-preserving range reduction
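As an illustration of what branchless quadrant correction can look like, the sketch below maps the sine and cosine of a reduced argument r back to the original angle x = r + q·(π/2) using table lookups instead of if/else. It assumes the reduction step has already produced r and the quadrant index q. The function name apply_quadrant is hypothetical, and this is a generic technique rather than the library's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Given s = sin(r), c = cos(r) and quadrant q (only q mod 4 matters),
   produce sin(x) and cos(x) for x = r + q * (pi/2):
     q = 0: ( s,  c)    q = 1: ( c, -s)
     q = 2: (-s, -c)    q = 3: (-c,  s)
   Selection and sign flips use array indexing, so the source has no
   conditional branches. */
void apply_quadrant(double s, double c, unsigned q,
                    double *sin_out, double *cos_out)
{
    static const double sign[2] = { 1.0, -1.0 };
    const double base[2] = { s, c };
    unsigned swap = q & 1u; /* odd quadrants swap sin and cos */

    assert(sin_out != NULL && cos_out != NULL);
    *sin_out = sign[(q >> 1) & 1u] * base[swap];
    *cos_out = sign[((q + 1u) >> 1) & 1u] * base[swap ^ 1u];
}
```

Multiplying by ±1.0 is exact in floating point, so the correction adds no rounding error. In a SIMD backend the same selection would typically be done with blend and sign-bit XOR instructions.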
- Extended SIMD Ψ-Hyperbasis implementation (AVX2 / NEON / AVX512)
- Additional functions: cosm1, expm1, log1p with Ψ-Hyperbasis optimization
- Single-precision float32 support (fabe13_sinf, etc.)
- Ultra-fast LUT-based variants for performance-critical applications
- Language bindings for Python, Rust, and C++
- Documentation and examples for common use cases
MIT License © 2025 Faruk Alpay
See LICENSE
Faruk Alpay
https://Frontier2075.com
https://lightcap.ai
FABE13-HX is part of the Lightcap Initiative, building the most precise and elegant math primitives in open source.