High-accuracy SIMD sin/cos/sincos library in C with AVX2, AVX-512, and NEON support. Full-range reduction. Fast at scale. Portable by design.

farukalpay/FABE

FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing

FABE13-HX is a C math library that computes trigonometric functions (sin, cos, sincos) with SIMD vectorization. It is built on the Ψ-Hyperbasis algorithm and runs up to 8.4× faster than traditional math libraries, with a maximum difference of 2e-11 from libm.

🚀 Why Choose FABE13-HX for Your Numerical Computing Needs

FABE13-HX targets workloads that evaluate sin and cos over large arrays:

  • Machine Learning & AI Acceleration - Optimize neural network performance
  • Scientific Simulations & HPC - Accelerate physics, engineering, and computational modeling
  • Real-time Signal Processing - Enhance DSP, audio, and sensor data analysis
  • Graphics & Visualization Systems - Improve rendering performance
  • Embedded Computing - Efficient performance on resource-constrained systems

💡 Key Features & Performance Benefits

  • ⚡ Up to 8.4× Faster Than Standard Math Libraries across various platforms and input sizes
  • 🔄 Cross-Architecture Optimization with support for AVX512F, AVX2+FMA (x86), NEON (ARM)
  • 🎯 High Precision with maximum error ≤ 2e-11 compared to standard libm
  • 🧠 Novel Rational-Function Architecture based on Ψ-Hyperbasis instead of traditional polynomials
  • 🔢 Extreme-Range Support accurate up to |x| ≈ 1e308 via advanced Payne–Hanek reduction
  • 🧩 Unified API for both scalar and vectorized operations
  • 🛡️ Robust Error Handling with proper NaN/Inf/0 behavior

Designed for numerical computing, AI acceleration, and scientific simulation, it replaces traditional polynomial approximations with a fused rational + correction model that's more efficient and vectorization-friendly.


📂 Project Structure

fabe13/                 # Core source
├── fabe13.c            # HX implementation
├── fabe13.h            # Public API
└── benchmark_fabe13.c  # Benchmark main

tests/
└── test_fabe13.c       # Optional unit tests

CMakeLists.txt          # Cross-platform CMake
Makefile                # Minimalist legacy build
build.sh                # Recommended build script (cross-platform)

⚙️ Build Instructions

✅ Recommended: build.sh

./build.sh

This script:

  • Cleans and configures the build (Release mode)
  • Enables both benchmarking and testing
  • Compiles using aggressive -Ofast, -ffast-math, -march=native flags
  • Runs all unit tests and benchmarks automatically

🛠️ Manual CMake

mkdir -p build && cd build
cmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON
make
./fabe13_test
./fabe13_benchmark

🧱 Makefile (Legacy)

make all
make run-benchmark

🚀 FABE13-HX vs libm: Performance Benchmarks

The benchmarks below show FABE13-HX matching or beating standard libm across platforms and input sizes, in both cloud-based and local environments.

📊 Performance Overview

  • 🟨 FABE13-HX: SIMD-accelerated (AVX2+FMA, Ψ-core)
  • 🔴 libm: Standard C math (math.h)
  • 🧠 Input size: N ∈ [10 ... 1,000,000,000] doubles
  • ⚙️ Timing: Full-array sincos() throughput
  • 📐 Aligned memory: 64 bytes
  • 🎯 Accuracy: ≤ 2e-11 max diff (sin/cos)

🌐 Replit (Cloud / Linux, AVX2 Clang)

FABE13-HX vs libm β€” Replit

βœ… FABE13-HX is consistently faster than libm β€” up to 8.4Γ— for large inputs.

  • Platform: Replit Linux
  • SIMD: AVX2 + FMA
  • Compiler: Clang 14 (nix)
  • libm: GNU math.h

🍎 MacBook Pro (macOS AVX2, AppleClang)

FABE13-HX vs libm β€” macOS

🟨 FABE13-HX outperforms libm with up to 8.4Γ— higher throughput on AppleClang (AVX2).

  • Platform: macOS 14.x (MacBook Pro 16")
  • SIMD: AVX2 + FMA
  • Compiler: AppleClang 16.0
  • libm: macOS system math.h

📊 Performance Overview

FABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)
Benchmark Alignment: 64 bytes

📈 Scaling with Array Size

Throughput improves by up to 8.4× over standard libm for large arrays. In the NEON run tabulated below, the speedup peaks at 5.82× at N = 1,000,000,000.

ARM64/AArch64 Performance (NEON)

Array Size    | FABE13 (sec) | Libm (sec) | FABE13 (M ops/sec) | Libm (M ops/sec) | Speedup
--------------|--------------|------------|--------------------|------------------|--------
10            | 0.0000       | 0.0000     | 50.00              | 50.00            | 1.00x
100           | 0.0000       | 0.0000     | 166.67             | 71.43            | 2.33x
1,000         | 0.0000       | 0.0000     | 185.19             | 72.46            | 2.56x
10,000        | 0.0001       | 0.0001     | 173.01             | 71.02            | 2.44x
100,000       | 0.0006       | 0.0009     | 177.12             | 115.82           | 1.53x
1,000,000     | 0.0016       | 0.0072     | 614.85             | 138.34           | 4.44x
10,000,000    | 0.0164       | 0.0720     | 611.30             | 138.95           | 4.40x
100,000,000   | 0.1673       | 0.7296     | 597.63             | 137.07           | 4.36x
1,000,000,000 | 1.8044       | 10.4989    | 554.19             | 95.25            | 5.82x

🔍 Detailed Benchmark Snapshot (N = 1,000,000)

FABE13:  0.0016 sec  |  614.85 M ops/sec
libm:    0.0072 sec  |  138.34 M ops/sec
Speedup: 4.44x

Memory: Allocated 0.04 GB
        Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)
CPU:    100.0% utilization for both implementations

Max diff vs libm: sin=1.224e-11, cos=1.225e-11

🔬 Precision Analysis

  • All test cases stay within the 2e-11 maximum difference from libm
  • Maximum difference observed: about 1.2e-11 for both sin and cos (N = 1,000,000 run)
  • Handles the edge cases 0, Inf, and NaN correctly

🔬 Core Algorithm (Ψ-Hyperbasis)

// Core rational transformation
Ψ(x) = x / (1 + (3/8)x²)

// sin(x) approximation
sin(x) ≈ Ψ ⋅ (1 - a1⋅Ψ² + a2⋅Ψ⁴ - a3⋅Ψ⁶)

// cos(x) approximation
cos(x) ≈ 1 - b1⋅Ψ² + b2⋅Ψ⁴ - b3⋅Ψ⁶

Both approximations are polynomials in Ψ², so Ψ and Ψ² are computed once and shared between sin and cos, which saves arithmetic and memory accesses.
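The shape of that evaluation can be sketched in scalar C. The coefficient values a1..a3 and b1..b3 are not listed in this README, so the sketch takes them as input through a hypothetical `psi_coeffs` struct; `psi` and `psi_sincos` are also illustrative names, and the library's actual kernel may be organized differently.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical container for the six coefficients (values not given here). */
typedef struct {
    double a1, a2, a3;   /* sin-side coefficients */
    double b1, b2, b3;   /* cos-side coefficients */
} psi_coeffs;

/* Rational transformation: psi(x) = x / (1 + (3/8) x^2). */
static double psi(double x)
{
    return x / (1.0 + 0.375 * x * x);
}

/* Evaluate both forms from one shared psi and psi^2 using Horner's rule:
     sin ~ p * (1 - a1 p^2 + a2 p^4 - a3 p^6)
     cos ~     1 - b1 p^2 + b2 p^4 - b3 p^6                                */
static void psi_sincos(double x, const psi_coeffs *k, double *s, double *c)
{
    assert(k && s && c);
    double p  = psi(x);
    double p2 = p * p;
    *s = p * (1.0 - p2 * (k->a1 - p2 * (k->a2 - p2 * k->a3)));
    *c = 1.0 - p2 * (k->b1 - p2 * (k->b2 - p2 * k->b3));
}
```

With x = 2 the transformation gives Ψ = 2 / 2.5 = 0.8, and both outputs reuse the same Ψ² = 0.64. In a vectorized version each line would map onto a handful of multiply-add operations per lane.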


📊 Public API

#include "fabe13/fabe13.h"

// Scalar API
double fabe13_sin(double x);
double fabe13_cos(double x);
double fabe13_sinc(double x);  // sin(x)/x
double fabe13_tan(double x);
double fabe13_cot(double x);
double fabe13_atan(double x);
double fabe13_asin(double x);  // [-1, 1]
double fabe13_acos(double x);  // [-1, 1]

// SIMD vector API
void fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);

🧠 Design Highlights

  • βœ… Branchless Quadrant Correction
  • βœ… NaN/Inf/0-safe logic
  • βœ… Prefetch-friendly & unrolled scalar fallback
  • βœ… SIMD-ready backend design (NEON / AVX2 / AVX512)
  • βœ… Precision-preserving range reduction

🔭 Future Development Roadmap

  • Extended SIMD Ψ-Hyperbasis implementation (AVX2 / NEON / AVX512)
  • Additional functions: cosm1, expm1, log1p with Ψ-Hyperbasis optimization
  • Single-precision float32 support (fabe13_sinf, etc.)
  • Ultra-fast LUT-based variants for performance-critical applications
  • Language bindings for Python, Rust, and C++
  • Documentation and examples for common use cases

📜 License

MIT License © 2025 Faruk Alpay
See LICENSE


🧬 Author

Faruk Alpay
https://Frontier2075.com
https://lightcap.ai

FABE13-HX is part of the Lightcap Initiative, building the most precise and elegant math primitives in open source.
