ParaGraph: Accelerating Graph Indexing through GPU-CPU Parallel Processing for Efficient Cross-modal ANNS
This repository contains the code for ParaGraph, our paper at DaMoN 2025, a SIGMOD workshop.
The master branch contains the codebase for the ParaGraph paper.
This section guides you through setting up the project and reproducing the experiments presented in our paper.
All *bin data files follow the structure below (consistent with the Big-ANN competition format):
- Number of vectors: uint32 (4 bytes)
- Dimension of vectors: uint32 (4 bytes)
- Vector data: the vector components, listed sequentially.
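The layout above can be read and written with Python's standard library alone. The sketch below assumes float32 components (as in Big-ANN .fbin files) and little-endian byte order; write_bin and read_bin are hypothetical helper names, not scripts from this repository.

```python
import os
import struct
import tempfile


def write_bin(path, vectors):
    """Write equal-length vectors as: uint32 num, uint32 dim, then num * dim float32 values.

    Assumption: little-endian byte order and float32 components.
    """
    num = len(vectors)
    dim = len(vectors[0]) if num else 0
    with open(path, "wb") as f:
        # Header: number of vectors, then their dimension, each a 4-byte uint32.
        f.write(struct.pack("<II", num, dim))
        for vec in vectors:
            if len(vec) != dim:
                raise ValueError("all vectors must have the same dimension")
            # Components of each vector follow the previous vector directly.
            f.write(struct.pack(f"<{dim}f", *vec))


def read_bin(path):
    """Return (num, dim, rows), where rows holds one list of floats per vector."""
    with open(path, "rb") as f:
        num, dim = struct.unpack("<II", f.read(8))
        payload = f.read(4 * num * dim)
    if len(payload) != 4 * num * dim:
        raise ValueError("file is shorter than its header claims")
    flat = struct.unpack(f"<{num * dim}f", payload)
    rows = [list(flat[i * dim:(i + 1) * dim]) for i in range(num)]
    return num, dim, rows


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.fbin")
    write_bin(path, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    print(read_bin(path))  # (2, 3, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

For real datasets with millions of vectors, numpy.fromfile(path, dtype=numpy.float32, offset=8).reshape(num, dim) reads the same layout far faster on a little-endian machine.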
- You can obtain the required datasets from the RoarGraph repository.
- We utilize Python scripts for the necessary data transformations.
- For index construction, we use only the base vector set (base) and the corresponding ground truth data (gt).
The base vector data (base_data) is stored as a num × dim matrix, where:
- num: the total number of vectors.
- dim: the dimensionality of each vector.
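Because the matrix is stored row by row after the 8-byte header, vector i starts at byte offset 8 + i · dim · 4, assuming 4-byte float32 components. The hypothetical helper read_vector below uses that offset to fetch a single vector without loading the whole file.

```python
import os
import struct
import tempfile

HEADER_BYTES = 8     # uint32 num + uint32 dim
COMPONENT_BYTES = 4  # assumption: float32 components


def vector_offset(i, dim):
    """Byte offset where row i of the num x dim matrix starts (row-major)."""
    return HEADER_BYTES + i * dim * COMPONENT_BYTES


def read_vector(path, i):
    """Read only vector i from a *bin file (little-endian float32 assumed)."""
    with open(path, "rb") as f:
        num, dim = struct.unpack("<II", f.read(HEADER_BYTES))
        if not 0 <= i < num:
            raise IndexError(f"vector index {i} out of range for num={num}")
        f.seek(vector_offset(i, dim))
        return list(struct.unpack(f"<{dim}f", f.read(dim * COMPONENT_BYTES)))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.fbin")
    with open(path, "wb") as f:
        # A 3 x 2 matrix with rows [0, 1], [2, 3], [4, 5].
        f.write(struct.pack("<II", 3, 2))
        f.write(struct.pack("<6f", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0))
    print(read_vector(path, 1))  # [2.0, 3.0]
```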
Note: These two parameters, num and dim, must be pre-defined in the source code to match the dataset you use.
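One way to find the values of num and dim for a dataset is to read them from its file header. The sketch below (inspect_bin is a hypothetical name) also checks that the file size agrees with the header, assuming 4-byte components; it is a convenience check, not a script shipped with the repository.

```python
import os
import struct
import sys


def inspect_bin(path, component_bytes=4):
    """Return (num, dim) from a *bin header and verify that the file size matches."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) != 8:
        raise ValueError(f"{path}: file too short to contain a header")
    num, dim = struct.unpack("<II", header)
    expected = 8 + num * dim * component_bytes
    actual = os.path.getsize(path)
    if actual != expected:
        raise ValueError(f"{path}: size {actual} B, but header implies {expected} B")
    return num, dim


if __name__ == "__main__":
    # Usage: python <this_script> base.fbin [more files...]
    for p in sys.argv[1:]:
        num, dim = inspect_bin(p)
        print(f"{p}: num={num} dim={dim}")
```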
To keep GPU memory usage manageable, the ground truth (gt) data is handled differently: the number of ground-truth neighbors recorded for each query vector (often denoted gt_num or top_k) is set to 128.
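If a downloaded ground-truth file stores more than 128 neighbors per query, it can be cut down before use. This sketch assumes the Big-ANN ground-truth layout (uint32 query count, uint32 K, then all neighbor IDs as uint32, then all distances as float32) and that each query's neighbors are sorted nearest-first; please verify both against your files. truncate_gt is a hypothetical helper, and it refuses files with fewer than 128 neighbors rather than guessing how to pad them.

```python
import struct
import sys

TARGET_K = 128  # neighbors kept per query


def truncate_gt(src, dst, k=TARGET_K):
    """Keep only the first k neighbors (IDs and distances) of every query."""
    with open(src, "rb") as f:
        nq, old_k = struct.unpack("<II", f.read(8))
        if old_k < k:
            raise ValueError(f"{src} has only {old_k} neighbors per query, need {k}")
        ids = struct.unpack(f"<{nq * old_k}I", f.read(4 * nq * old_k))
        dists = struct.unpack(f"<{nq * old_k}f", f.read(4 * nq * old_k))
    new_ids, new_dists = [], []
    for q in range(nq):
        row = q * old_k
        # Assumption: neighbors are sorted nearest-first, so the prefix is the top k.
        new_ids.extend(ids[row:row + k])
        new_dists.extend(dists[row:row + k])
    with open(dst, "wb") as f:
        f.write(struct.pack("<II", nq, k))
        f.write(struct.pack(f"<{nq * k}I", *new_ids))
        f.write(struct.pack(f"<{nq * k}f", *new_dists))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        truncate_gt(sys.argv[1], sys.argv[2])
    else:
        print("Usage: python <this_script> INPUT_GT OUTPUT_GT")
```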
- CMake v3.24 or newer
- g++ v9.4 or newer
- CPU with AVX-512 support
- Python v3.8 or newer
- Required Python packages: numpy
- NVIDIA GPU
- CUDA Toolkit
- cuDNN
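A quick way to check the CPU-side requirements listed above on Linux is sketched below. It assumes that cmake and g++ print an X.Y version in their --version output and that /proc/cpuinfo reports the avx512f (AVX-512 Foundation) flag; it is an illustrative helper, not part of the repository.

```python
import platform
import re
import shutil
import subprocess
import sys

MIN_VERSIONS = {"cmake": (3, 24), "g++": (9, 4)}


def parse_version(text):
    """Extract the first 'X.Y' version number found in a tool's --version output."""
    m = re.search(r"(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def has_avx512(cpuinfo_text):
    """True if the /proc/cpuinfo flags include AVX-512 Foundation (avx512f)."""
    return re.search(r"\bavx512f\b", cpuinfo_text) is not None


def main():
    ok = sys.version_info >= (3, 8)
    print(f"Python {platform.python_version()}: {'ok' if ok else 'too old'}")
    for tool, minimum in MIN_VERSIONS.items():
        if shutil.which(tool) is None:
            print(f"{tool}: not found")
            continue
        out = subprocess.run([tool, "--version"], capture_output=True, text=True).stdout
        ver = parse_version(out)
        status = "ok" if ver and ver >= minimum else "too old or unknown"
        print(f"{tool} {ver}: {status}")
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX-512:", "yes" if has_avx512(f.read()) else "no")
    except OSError:
        print("AVX-512: cannot check (/proc/cpuinfo unavailable)")


if __name__ == "__main__":
    main()
```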
Install the following system libraries:
sudo apt update
sudo apt install -y libaio-dev libgoogle-perftools-dev clang-format libboost-all-dev libmkl-full-dev
Configuring the GPU environment (NVIDIA driver, CUDA Toolkit, and cuDNN) is not covered here; please follow NVIDIA's official installation guides.
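After setting up the GPU environment, a minimal sanity check is to confirm that the driver utility (nvidia-smi) and the CUDA compiler (nvcc) are on PATH. The sketch below does only that; it does not verify cuDNN, and missing_tools is a hypothetical helper name.

```python
import shutil
import subprocess

GPU_TOOLS = ("nvidia-smi", "nvcc")  # NVIDIA driver utility and CUDA compiler


def missing_tools(tools=GPU_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [t for t in tools if which(t) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        # nvidia-smi -L lists visible GPUs; nvcc --version reports the CUDA Toolkit version.
        subprocess.run(["nvidia-smi", "-L"], check=False)
        subprocess.run(["nvcc", "--version"], check=False)
```

The which parameter is injectable so the logic can be exercised without a GPU machine.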
git clone https://github.com/9p6p/ParaGraph.git
cd ParaGraph
Follow these steps to compile the project:
mkdir -p build
cd build
cmake .. && make -j
To build the ParaGraph index, run the provided script:
bash run_paragraph.sh
This project is licensed under the MIT License.
For questions, feel free to reach out to me at dev@alayadb.ai.