FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing

FABE13-HX is a high-performance C math library that computes trigonometric functions (sin, cos, sincos) with SIMD vectorization. Built on the Ψ-Hyperbasis algorithm, it runs up to 8.4× faster than standard libm in the benchmarks below while keeping the maximum difference from libm at or below 2e-11.

🚀 Why Choose FABE13-HX for Your Numerical Computing Needs

FABE13-HX revolutionizes trigonometric computation for:

Machine Learning & AI Acceleration - Optimize neural network performance
Scientific Simulations & HPC - Accelerate physics, engineering, and computational modeling
Real-time Signal Processing - Enhance DSP, audio, and sensor data analysis
Graphics & Visualization Systems - Improve rendering performance
Embedded Computing - Efficient performance on resource-constrained systems

💡 Key Features & Performance Benefits

⚡ Up to 8.4× Faster Than Standard Math Libraries across various platforms and input sizes
🔄 Cross-Architecture Optimization with support for AVX512F, AVX2+FMA (x86), NEON (ARM)
🎯 High Precision with maximum error ≤ 2e-11 compared to standard libm
🧠 Novel Rational-Function Architecture based on Ψ-Hyperbasis instead of traditional polynomials
🔢 Extreme-Range Support accurate up to |x| ≈ 1e308 via advanced Payne–Hanek reduction
🧩 Unified API for both scalar and vectorized operations
🛡️ Robust Error Handling with proper NaN/Inf/0 behavior

FABE13-HX is designed for numerical computing, AI acceleration, and scientific simulation. It replaces traditional polynomial approximations with a fused rational + correction model that is more efficient and vectorization-friendly.

📂 Project Structure

fabe13/                 # Core source
├── fabe13.c            # HX implementation
├── fabe13.h            # Public API
└── benchmark_fabe13.c  # Benchmark main

tests/
└── test_fabe13.c       # Optional unit tests

CMakeLists.txt          # Cross-platform CMake
Makefile                # Minimalist legacy build
build.sh                # Recommended build script (cross-platform)

⚙️ Build Instructions

✅ Recommended: `build.sh`

./build.sh

This script:

Cleans and configures the build (Release mode)
Enables both benchmarking and testing
Compiles using aggressive -Ofast, -ffast-math, -march=native flags
Runs all unit tests and benchmarks automatically

🛠️ Manual CMake

mkdir -p build && cd build
cmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON
make
./fabe13_test
./fabe13_benchmark

🧱 Makefile (Legacy)

make all
make run-benchmark

🚀 FABE13-HX vs libm — Performance Benchmarks

FABE13-HX delivers consistent speedups over standard libm, across platforms and input sizes. These benchmarks highlight its advantage for both cloud-based and local environments.

📊 Performance Overview

🟨 FABE13-HX: SIMD-accelerated (AVX2+FMA, Ψ-core)
🔴 libm: Standard C math (math.h)
🧠 Input size: N ∈ [10 ... 1,000,000,000] doubles
⚙️ Timing: Full-array sincos() throughput
📐 Aligned memory: 64 bytes
🎯 Accuracy: ≤ 2e-11 max diff (sin/cos)
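
The measurement setup listed above (full-array timing over 64-byte aligned buffers) could be reproduced along the following lines. This is a minimal sketch, not the code in benchmark_fabe13.c: the helper names `alloc_aligned64`, `now_seconds`, `mops_per_sec`, and `time_sincos` are hypothetical, and `timespec_get` is assumed as a portable C11 wall clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: allocate room for n doubles on a 64-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 64. */
double *alloc_aligned64(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + 63u) & ~(size_t)63u;
    if (bytes == 0) bytes = 64;
    return aligned_alloc(64, bytes);
}

/* Hypothetical helper: wall-clock time in seconds (C11 timespec_get). */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Throughput in millions of elements per second. */
double mops_per_sec(size_t n, double seconds)
{
    assert(seconds > 0.0);
    return (double)n / seconds / 1e6;
}

/* Function-pointer type with the same shape as the vector sincos API. */
typedef void (*sincos_fn)(const double *in, double *sin_out,
                          double *cos_out, int n);

/* Time one full-array call and return the elapsed seconds. */
double time_sincos(sincos_fn fn, const double *in,
                   double *sin_out, double *cos_out, int n)
{
    double t0 = now_seconds();
    fn(in, sin_out, cos_out, n);
    return now_seconds() - t0;
}
```

Passing `fabe13_sincos` and a libm-based loop with the same signature to `time_sincos` would give the two timings whose ratio is the speedup. A real benchmark would also repeat each run and keep the best or median time.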

🌐 Replit (Cloud / Linux, AVX2 Clang)

✅ FABE13-HX is consistently faster than libm — up to 8.4× for large inputs.

Platform: Replit Linux
SIMD: AVX2 + FMA
Compiler: Clang 14 (nix)
libm: GNU math.h

🍎 MacBook Pro (macOS AVX2, AppleClang)

🟨 FABE13-HX outperforms libm with up to 8.4× higher throughput on AppleClang (AVX2).

Platform: macOS 14.x (MacBook Pro 16")
SIMD: AVX2 + FMA
Compiler: AppleClang 16.0
libm: macOS system math.h

📊 Performance Overview

FABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)
Benchmark Alignment: 64 bytes

📈 Scaling with Array Size

Speedup over standard libm grows with array size: up to 8.4× on the AVX2 platforms above, and up to 5.82× in the NEON measurements below.

ARM64/AArch64 Performance (NEON)

| Array Size | FABE13 (sec) | libm (sec) | FABE13 (M ops/sec) | libm (M ops/sec) | Speedup |
|---|---|---|---|---|---|
| 10 | 0.0000 | 0.0000 | 50.00 | 50.00 | 1.00x |
| 100 | 0.0000 | 0.0000 | 166.67 | 71.43 | 2.33x |
| 1,000 | 0.0000 | 0.0000 | 185.19 | 72.46 | 2.56x |
| 10,000 | 0.0001 | 0.0001 | 173.01 | 71.02 | 2.44x |
| 100,000 | 0.0006 | 0.0009 | 177.12 | 115.82 | 1.53x |
| 1,000,000 | 0.0016 | 0.0072 | 614.85 | 138.34 | 4.44x |
| 10,000,000 | 0.0164 | 0.0720 | 611.30 | 138.95 | 4.40x |
| 100,000,000 | 0.1673 | 0.7296 | 597.63 | 137.07 | 4.36x |
| 1,000,000,000 | 1.8044 | 10.4989 | 554.19 | 95.25 | 5.82x |

🔍 Detailed Benchmark Snapshot (N = 1,000,000)

FABE13:  0.0016 sec  |  614.85 M ops/sec
libm:    0.0072 sec  |  138.34 M ops/sec
Speedup: 4.44x

Memory: Allocated 0.04 GB
        Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)
CPU:    100.0% utilization for both implementations

Max diff vs libm: sin=1.224e-11, cos=1.225e-11

🔬 Precision Analysis

All benchmarked array sizes stay within the stated bound of 2e-11 relative to libm
Maximum difference observed at N = 1,000,000: 1.224e-11 for sin and 1.225e-11 for cos
Edge-case inputs (0, Inf, NaN) are handled correctly
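
A "max diff vs libm" figure like the one reported above can be obtained by comparing the two result arrays element by element. The following is a minimal sketch of such a comparison. The function name `max_abs_diff` and its NaN policy are assumptions made for illustration, not taken from the FABE13-HX test code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Largest absolute difference between two result arrays.
   Assumed policy: a pair where both values are NaN counts as agreement
   (NaN in, NaN out), while a NaN on only one side is reported as INFINITY
   so that it can never pass a tolerance check. */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (isnan(a[i]) && isnan(b[i]))
            continue;
        if (isnan(a[i]) || isnan(b[i]))
            return INFINITY;
        double d = fabs(a[i] - b[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Calling it once on the sine outputs and once on the cosine outputs, then checking each result against 2e-11, would mirror the accuracy bound stated in this README.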

🔬 Core Algorithm (Ψ-Hyperbasis)

// Core rational transformation
Ψ(x) = x / (1 + (3/8)x²)

// sin(x) approximation
sin(x) ≈ Ψ ⋅ (1 - a1⋅Ψ² + a2⋅Ψ⁴ - a3⋅Ψ⁶)

// cos(x) approximation
cos(x) ≈ 1 - b1⋅Ψ² + b2⋅Ψ⁴ - b3⋅Ψ⁶

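Because both approximations are built from the same Ψ value, it is computed once per input and shared between sin and cos, which saves arithmetic and memory accesses. The coefficients a1 to a3 and b1 to b3 are constants whose values are not listed here.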
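
The shape of this evaluation can be sketched in scalar C as follows. This is an illustration of the formulas above, not the library's implementation: the README does not list the coefficient values, so they are passed in through a hypothetical `psi_coeffs` struct, and the function names `psi` and `psi_sincos` are likewise assumptions. The input is assumed to be already range-reduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for one coefficient triple (a1..a3 or b1..b3).
   The actual values are not given in this README. */
typedef struct {
    double k1, k2, k3;
} psi_coeffs;

/* Core rational transformation: psi(x) = x / (1 + (3/8) x^2). */
double psi(double x)
{
    return x / (1.0 + 0.375 * x * x);
}

/* Evaluate both approximations from one shared psi value.
   The bracket 1 - k1*p2 + k2*p2^2 - k3*p2^3 is written in Horner form. */
void psi_sincos(double x, psi_coeffs a, psi_coeffs b,
                double *sin_out, double *cos_out)
{
    assert(sin_out != NULL && cos_out != NULL);
    double p  = psi(x);
    double p2 = p * p; /* shared by the sin and cos brackets */

    double sin_poly = 1.0 - p2 * (a.k1 - p2 * (a.k2 - p2 * a.k3));
    double cos_poly = 1.0 - p2 * (b.k1 - p2 * (b.k2 - p2 * b.k3));

    *sin_out = p * sin_poly;
    *cos_out = cos_poly;
}
```

With all coefficients set to zero the sketch returns Ψ(x) and 1, which makes the role of the correction terms easy to see in isolation.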

📊 Public API

#include "fabe13/fabe13.h"

// Scalar API
double fabe13_sin(double x);
double fabe13_cos(double x);
double fabe13_sinc(double x);  // sin(x)/x
double fabe13_tan(double x);
double fabe13_cot(double x);
double fabe13_atan(double x);
double fabe13_asin(double x);  // [-1, 1]
double fabe13_acos(double x);  // [-1, 1]

// SIMD vector API
void fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);
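
The vector API writes sines and cosines into two separate arrays, which suits loops that consume both values per element. As a hedged illustration of the caller's side, the sketch below rotates a batch of 2-D points using such arrays. The helper `rotate_points` is hypothetical and not part of the library. The `fabe13_sincos` call appears only in a comment, so the snippet compiles without the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical consumer of the sincos output arrays: rotate each point
   (x[i], y[i]) in place by the angle whose sine is s[i] and cosine is c[i].

   Intended call sequence when the library is linked:
       #include "fabe13/fabe13.h"
       fabe13_sincos(angles, s, c, n);   // fills s[] and c[]
       rotate_points(s, c, x, y, n);
*/
void rotate_points(const double *s, const double *c,
                   double *x, double *y, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        double xr = x[i] * c[i] - y[i] * s[i];
        double yr = x[i] * s[i] + y[i] * c[i];
        x[i] = xr;
        y[i] = yr;
    }
}
```

Keeping the sine and cosine outputs in separate contiguous arrays lets a loop like this one be vectorized by the compiler as well.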

🧠 Design Highlights

✅ Branchless Quadrant Correction
✅ NaN/Inf/0-safe logic
✅ Prefetch-friendly & unrolled scalar fallback
✅ SIMD-ready backend design (NEON / AVX2 / AVX512)
✅ Precision-preserving range reduction
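
Branchless quadrant correction can be illustrated with plain bit operations. The sketch below assumes the input has been reduced to x = r + q·(π/2) and that s ≈ sin(r) and c ≈ cos(r) are already known. It then swaps and negates the two values according to q without any `if`. The function name `quadrant_fixup` and the mask-based approach are assumptions for illustration. The library's actual SIMD code may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern and back.
   memcpy is the portable way to do this without aliasing problems. */
static uint64_t to_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

static double from_bits(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Given s = sin(r), c = cos(r) and quadrant index q (only q mod 4 matters):
     q = 0: ( s,  c)    q = 1: ( c, -s)
     q = 2: (-s, -c)    q = 3: (-c,  s)                                  */
void quadrant_fixup(double s, double c, unsigned q,
                    double *sin_out, double *cos_out)
{
    assert(sin_out != NULL && cos_out != NULL);
    uint64_t sb = to_bits(s);
    uint64_t cb = to_bits(c);

    /* All-ones mask when q is odd: selects the swapped operands. */
    uint64_t swap = (uint64_t)0 - (uint64_t)(q & 1u);
    uint64_t sin_bits = (sb & ~swap) | (cb & swap);
    uint64_t cos_bits = (cb & ~swap) | (sb & swap);

    /* Flip the IEEE-754 sign bit: sin is negated for q = 2, 3,
       cos is negated for q = 1, 2. */
    sin_bits ^= (uint64_t)((q >> 1) & 1u) << 63;
    cos_bits ^= (uint64_t)(((q + 1u) >> 1) & 1u) << 63;

    *sin_out = from_bits(sin_bits);
    *cos_out = from_bits(cos_bits);
}
```

The same select-and-xor pattern maps directly onto SIMD blend and xor instructions, which is why this form is convenient for vectorized code paths.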

🔭 Future Development Roadmap

Extended SIMD Ψ-Hyperbasis implementation (AVX2 / NEON / AVX512)
Additional functions: cosm1, expm1, log1p with Ψ-Hyperbasis optimization
Single-precision float32 support (fabe13_sinf, etc.)
Ultra-fast LUT-based variants for performance-critical applications
Language bindings for Python, Rust, and C++
Documentation and examples for common use cases

📜 License

🧬 Author

Faruk Alpay
https://Frontier2075.com
https://lightcap.ai

FABE13-HX is part of the Lightcap Initiative — building the most precise and elegant math primitives in open source.
