Skip to content
/ FFAccel Public

A high-performance CUDA implementation of finite field arithmetic operations over prime fields F_p, optimized for cryptographic and mathematical computing applications.

License

Notifications You must be signed in to change notification settings

Kars07/FFAccel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

CUDA Finite Field Operations

A high-performance CUDA implementation of finite field arithmetic operations over prime fields F_p, optimized for cryptographic and mathematical computing applications.

Features

  • Complete Finite Field Arithmetic: Addition, subtraction, multiplication, division, exponentiation, and modular inverse
  • GPU-Accelerated Operations: Massively parallel computation using CUDA kernels
  • Multiple Inverse Algorithms: Both Extended Euclidean Algorithm and Fermat's Little Theorem implementations
  • Batch Operations: Efficient array-based operations with optimized memory management
  • Matrix Operations: GPU-accelerated matrix multiplication and addition in finite fields
  • Polynomial Evaluation: Horner's method implementation for efficient polynomial computation
  • Performance Profiling: Integrated NVTX markers for detailed performance analysis
  • Production Ready: Comprehensive error handling and memory management

Requirements

  • CUDA Toolkit: Version 11.0 or higher
  • GPU: NVIDIA GPU with Compute Capability 3.5+
  • Compiler: nvcc with C++11 support
  • Libraries: NVTX3 for profiling (optional)

Installation & Building

# Clone the repository
git clone https://github.com/yourusername/cuda-finite-field.git
cd cuda-finite-field

# Compile with nvcc
nvcc -o finite_field finitefieldFp.cu -lnvToolsExt

# For debug build
nvcc -g -G -o finite_field_debug finitefieldFp.cu  -lnvToolsExt

# For optimized build
nvcc -O3 -o finite_field_optimized finitefieldFp.cu  -lnvToolsExt

Quick Start

#include "finitefieldFp.h"

int main() {
    const fp_t p = 2147483647; // Prime field F_p
    FiniteFieldFp ff(p);
    
    const int n = 1000000;
    fp_t *a = new fp_t[n];
    fp_t *b = new fp_t[n];
    fp_t *result = new fp_t[n];
    
    // Initialize your data
    for (int i = 0; i < n; i++) {
        a[i] = i % p;
        b[i] = (i * 2) % p;
    }
    
    // Perform batch operations
    ff.add_arrays(a, b, result, n);        // Addition
    ff.mul_arrays(a, b, result, n);        // Multiplication
    ff.inv_arrays(a, result, n);           // Modular inverse
    
    // Matrix operations
    ff.matrix_multiply(A, B, C, m, n, k);
    
    delete[] a; delete[] b; delete[] result;
    return 0;
}

Mathematical Operations

Basic Arithmetic

  • Addition: (a + b) mod p
  • Subtraction: (a - b) mod p
  • Multiplication: (a × b) mod p
  • Division: a × b⁻¹ mod p
  • Negation: (-a) mod p

Advanced Operations

  • Modular Exponentiation: a^b mod p (binary method)
  • Modular Inverse: a⁻¹ mod p (Extended Euclidean + Fermat's Little Theorem)
  • Polynomial Evaluation: Horner's method in F_p
  • Matrix Operations: Addition and multiplication in F_p

Performance Features

GPU Optimization

  • Coalesced Memory Access: Optimized memory access patterns
  • Constant Memory: Prime stored in constant memory for fast access
  • Optimal Thread Configuration: 256 threads per block for most operations
  • Memory Management: Efficient allocation and deallocation patterns

Profiling Integration

// Built-in NVTX profiling markers
nvtxRangePush("add_arrays");
// ... operation code ...
nvtxRangePop();

Use with NVIDIA Nsight Compute/Systems for detailed performance analysis.

Applications

Cryptography

  • Elliptic Curve Cryptography: Point operations on elliptic curves
  • RSA Operations: Modular exponentiation for encryption/decryption
  • Digital Signatures: DSA, ECDSA signature verification
  • Key Exchange: Diffie-Hellman key generation

Blockchain & Cryptocurrency

  • Hash Functions: Finite field operations in cryptographic hashes
  • Zero-Knowledge Proofs: zk-SNARKs and zk-STARKs implementations
  • Consensus Algorithms: Cryptographic verification at scale

Scientific Computing

  • Error-Correcting Codes: Reed-Solomon, BCH codes
  • Signal Processing: Finite field transforms
  • Computational Number Theory: Research applications

Benchmarks

Performance on NVIDIA RTX 4090:

  • 1M Element Addition: ~0.5ms
  • 1M Element Multiplication: ~0.8ms
  • 1M Element Inverse: ~15ms (Extended Euclidean)
  • 1M Element Inverse: ~12ms (Fermat's Little Theorem)

Benchmarks include memory transfer overhead

🔧 Configuration

Supported Data Types

typedef uint32_t fp_t;      // Field elements (32-bit)
typedef uint64_t fp_wide_t; // Intermediate calculations

Prime Selection

  • Default: 2147483647 (2³¹ - 1, Mersenne prime)
  • Supports any 32-bit prime
  • Ensure prime fits in fp_t data type

🐛 Error Handling

Comprehensive error checking with CUDA_CHECK macro:

#define CUDA_CHECK(call) do { \
    cudaError_t err = call; \
    if (err != cudaSuccess) { \
        printf("CUDA error at %s:%d - %s\n", __FILE__, __LINE__, cudaGetErrorString(err)); \
        exit(1); \
    } \
} while(0)

Testing

# Run the included test suite
./finite_field

# Expected output:
# Addition test: 0 + 0 = 0 (mod 2147483647)
# Multiplication test: 0 * 0 = 0 (mod 2147483647)
# Inverse test: inv(1) = 1 (mod 2147483647)

Performance Profiling

# Profile with Nsight Compute
ncu --set full ./finite_field

# Profile with Nsight Systems
nsys profile ./finite_field

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow CUDA best practices
  • Add NVTX markers for new operations
  • Include comprehensive error checking
  • Write unit tests for new functionality
  • Update documentation

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • NVIDIA CUDA Toolkit documentation
  • "A Handbook of Integer Sequences" by Sloane
  • Cryptographic research community
  • CUDA programming community

Contact

Future Enhancements

  • Support for larger prime fields (64-bit, 128-bit)
  • Montgomery ladder for elliptic curve operations
  • Batch inversion algorithms
  • Multi-GPU support
  • Python bindings
  • Integration with cryptographic libraries
  • Benchmarking suite with multiple GPU architectures

Star this repository if you find it useful!

About

A high-performance CUDA implementation of finite field arithmetic operations over prime fields F_p, optimized for cryptographic and mathematical computing applications.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages