A utility for benchmarking NumPy performance, particularly useful for testing hardware acceleration through the Accelerate Framework on Apple Silicon.
numpybench is a benchmarking tool designed to test NumPy performance by running a series of matrix operations. It's particularly useful for:
- Verifying that NumPy is using hardware-accelerated BLAS/LAPACK routines (the Accelerate Framework on Apple Silicon)
- Comparing performance between different NumPy builds
- Testing the impact of the Accelerate Framework on macOS
- Measuring performance improvements from hardware optimizations
When NumPy is compiled and linked against the Accelerate Framework on Apple Silicon, you can expect roughly an 8-9x performance improvement over pre-compiled binaries.
numpybench [-o OUTPUT] [-s] [-c COUNT]
-o, --output
: Optional output file for detailed results
-s, --skip-tests
: Skip time-consuming tests
-c, --count
: Number of iterations for each test (default: 1)
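The options above map naturally onto Python's standard `argparse` module. The following is only a sketch of how such a command line could be declared, not numpybench's actual source: the function name `build_parser` and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented numpybench options."""
    parser = argparse.ArgumentParser(
        prog="numpybench",
        description="Benchmark NumPy matrix operations.",
    )
    parser.add_argument("-o", "--output", default=None,
                        help="optional output file for detailed results")
    parser.add_argument("-s", "--skip-tests", action="store_true",
                        help="skip time-consuming tests")
    parser.add_argument("-c", "--count", type=int, default=1,
                        help="number of iterations for each test (default: 1)")
    return parser


if __name__ == "__main__":
    # Parse an example command line instead of sys.argv, for illustration.
    args = build_parser().parse_args(["-c", "5", "-o", "results.txt"])
    print(args.output, args.skip_tests, args.count)
```

With no arguments, this parser yields `count == 1`, `output is None`, and `skip_tests == False`, matching the documented defaults.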
The script runs the following matrix operations on 2500x2500 matrices:
- Matrix multiplication (np.dot)
- Matrix transposition (np.transpose)
- Eigenvalue computation (np.linalg.eigvals)
- Fourier transformation (np.fft.fft)
- Summation (np.sum)
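The operations listed above can be timed with a few lines of NumPy. This is a minimal sketch under stated assumptions, not the tool's own implementation: the helper names `time_call` and `run_benchmarks`, the fixed seed, and the use of `time.perf_counter` for wall-clock timing are illustrative choices.

```python
import time

import numpy as np


def time_call(func, *args, count=1):
    """Return the mean wall-clock seconds over `count` calls to func(*args)."""
    start = time.perf_counter()
    for _ in range(count):
        func(*args)
    return (time.perf_counter() - start) / count


def run_benchmarks(size=2500, count=1, seed=0):
    """Time the five documented operations on random size x size matrices."""
    rng = np.random.default_rng(seed)  # random test matrices
    a = rng.random((size, size))
    b = rng.random((size, size))
    return {
        "dot": time_call(np.dot, a, b, count=count),
        "transpose": time_call(np.transpose, a, count=count),
        "eigvals": time_call(np.linalg.eigvals, a, count=count),
        "fft": time_call(np.fft.fft, a, count=count),
        "sum": time_call(np.sum, a, count=count),
    }


if __name__ == "__main__":
    # A small size keeps this demo fast; the documented size is 2500.
    for name, seconds in run_benchmarks(size=200).items():
        print(f"{name:<10} {seconds:.6f} s")
```

Note that `np.transpose` returns a view rather than copying data, so its timing is expected to be very small compared with `np.dot` or `np.linalg.eigvals`, which do the heavy BLAS/LAPACK work.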
The output includes:
- Timestamp for each operation
- NumPy configuration details
- Execution time for each matrix operation
- Virtual environment information
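The timestamp and virtual environment details in the list above could be gathered along the following lines. This is a hedged sketch using only the standard library: `in_virtualenv` and `report_header` are hypothetical names, and the dictionary layout is an assumption rather than numpybench's real output format.

```python
import os
import sys
from datetime import datetime


def in_virtualenv() -> bool:
    """Return True when running inside a venv or virtualenv environment."""
    # Inside a virtual environment, sys.prefix points at the environment
    # while sys.base_prefix keeps the base interpreter's location.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def report_header() -> dict:
    """Collect run metadata to print ahead of the timing results."""
    return {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "python": sys.version.split()[0],
        "virtualenv": in_virtualenv(),
        "prefix": sys.prefix,
        # Set by the activate scripts; may be None if not activated.
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
    }


if __name__ == "__main__":
    for key, value in report_header().items():
        print(f"{key}: {value}")
```

Comparing `sys.prefix` with `sys.base_prefix` works even when the environment's interpreter is invoked directly without activation, which is why the sketch does not rely on the `VIRTUAL_ENV` variable alone.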
- Basic benchmark:
  numpybench
- Run tests with 5 iterations:
  numpybench -c 5
- Save results to a file:
  numpybench -o results.txt
- Skip time-consuming tests:
  numpybench -s
On Apple Silicon:
- With standard NumPy: Baseline performance
- With Accelerate Framework: 8-9x performance improvement
- Key operations like matrix multiplication and FFT show the most significant improvements
To get maximum performance on Apple Silicon, NumPy should be compiled with:
CFLAGS="-I/System/Library/Frameworks/vecLib.framework/Headers -Wl,-framework -Wl,Accelerate -framework Accelerate" pip install numpy==1.26.* --force-reinstall --no-deps --no-cache --no-binary :all: --no-build-isolation --compile -Csetup-args=-Dblas=accelerate -Csetup-args=-Dlapack=accelerate -Csetup-args=-Duse-ilp64=true
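After a build like the one above, it can help to confirm which BLAS/LAPACK backend NumPy actually reports. The sketch below captures the text printed by `np.show_config()` and searches it for the word "accelerate". The helper names are hypothetical, and the substring check is a heuristic: the exact layout of the configuration output varies between NumPy versions.

```python
import contextlib
import io

import numpy as np


def numpy_config_text() -> str:
    """Capture the text that np.show_config() prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


def mentions_accelerate(config_text: str) -> bool:
    """Heuristic: does the build configuration mention Accelerate?"""
    return "accelerate" in config_text.lower()


if __name__ == "__main__":
    text = numpy_config_text()
    print("NumPy version:", np.__version__)
    print("Accelerate mentioned in build config:", mentions_accelerate(text))
```

If the check reports `False` on Apple Silicon, NumPy is likely linked against a different BLAS, such as the OpenBLAS bundled with pre-compiled wheels, and the benchmark numbers should be read as the baseline case.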
- Uses NumPy's random number generation for test matrices
- Measures wall clock time for operations
- Includes NumPy configuration information in output
- Supports virtual environment detection