SPTAG: A library for fast approximate nearest neighbor search

Last update: Jan 01, 2023

Overview

SPTAG: A library for fast approximate nearest neighbor search

SPTAG

SPTAG (Space Partition Tree And Graph) is a library for large scale vector approximate nearest neighbor search scenario released by Microsoft Research (MSR) and Microsoft Bing.

Introduction

This library assumes that the samples are represented as vectors and that the vectors can be compared by L2 distances or cosine distances. Vectors returned for a query vector are the vectors that have smallest L2 distance or cosine distances with the query vector.

SPTAG provides two methods: kd-tree and relative neighborhood graph (SPTAG-KDT) and balanced k-means tree and relative neighborhood graph (SPTAG-BKT). SPTAG-KDT is advantageous in index building cost, and SPTAG-BKT is advantageous in search accuracy in very high-dimensional data.

How it works

SPTAG is inspired by the NGS approach [WangL12]. It contains two basic modules: index builder and searcher. The RNG is built on the k-nearest neighborhood graph [WangWZTG12, WangWJLZZH14] for boosting the connectivity. Balanced k-means trees are used to replace kd-trees to avoid the inaccurate distance bound estimation in kd-trees for very high-dimensional vectors. The search begins with the search in the space partition trees for finding several seeds to start the search in the RNG. The searches in the trees and the graph are iteratively conducted.

Highlights

Fresh update: Support online vector deletion and insertion
Distributed serving: Search over multiple machines

Build

Requirements

swig >= 3.0
cmake >= 3.12.0
boost >= 1.67.0

Fast clone

set GIT_LFS_SKIP_SMUDGE=1
git clone https://github.com/microsoft/SPTAG

OR

git config --global filter.lfs.smudge "git-lfs smudge --skip -- %f"
git config --global filter.lfs.process "git-lfs filter-process --skip"

Install

For Linux:

mkdir build
cd build && cmake .. && make

It will generate a Release folder in the code directory which contains all the build targets.

For Windows:

mkdir build
cd build && cmake -A x64 ..

It will generate a SPTAGLib.sln in the build directory. Compiling the ALL_BUILD project in the Visual Studio (at least 2019) will generate a Release directory which contains all the build targets.

For detailed instructions on installing Windows binaries, please see here

Using Docker:

docker build -t sptag .

Will build a docker container with binaries in /app/Release/.

Verify

Run the SPTAGTest (or Test.exe) in the Release folder to verify all the tests have passed.

Usage

The detailed usage can be found in Get started. There is also an end-to-end tutorial for building vector search online service using Python Wrapper in Python Tutorial. The detailed parameters tunning can be found in Parameters.

References

Please cite SPTAG in your publications if it helps your research:

@inproceedings{ChenW21,
  author = {Qi Chen and 
            Bing Zhao and 
            Haidong Wang and 
            Mingqin Li and 
            Chuanjie Liu and 
            Zengzhong Li and 
            Mao Yang and 
            Jingdong Wang},
  title = {SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search},
  booktitle = {35th Conference on Neural Information Processing Systems (NeurIPS 2021)},
  year = {2021}
}

@manual{ChenW18,
  author    = {Qi Chen and
               Haidong Wang and
               Mingqin Li and 
               Gang Ren and
               Scarlett Li and
               Jeffery Zhu and
               Jason Li and
               Chuanjie Liu and
               Lintao Zhang and
               Jingdong Wang},
  title     = {SPTAG: A library for fast approximate nearest neighbor search},
  url       = {https://github.com/Microsoft/SPTAG},
  year      = {2018}
}

@inproceedings{WangL12,
  author    = {Jingdong Wang and
               Shipeng Li},
  title     = {Query-driven iterated neighborhood graph search for large scale indexing},
  booktitle = {ACM Multimedia 2012},
  pages     = {179--188},
  year      = {2012}
}

@inproceedings{WangWZTGL12,
  author    = {Jing Wang and
               Jingdong Wang and
               Gang Zeng and
               Zhuowen Tu and
               Rui Gan and
               Shipeng Li},
  title     = {Scalable k-NN graph construction for visual descriptors},
  booktitle = {CVPR 2012},
  pages     = {1106--1113},
  year      = {2012}
}

@article{WangWJLZZH14,
  author    = {Jingdong Wang and
               Naiyan Wang and
               You Jia and
               Jian Li and
               Gang Zeng and
               Hongbin Zha and
               Xian{-}Sheng Hua},
  title     = {Trinary-Projection Trees for Approximate Nearest Neighbor Search},
  journal   = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume    = {36},
  number    = {2},
  pages     = {388--403},
  year      = {2014
}

Contribute

This project welcomes contributions and suggestions from all the users.

We use GitHub issues for tracking suggestions and bugs.

License

The entire codebase is under MIT license

SPTAG: A library for fast approximate nearest neighbor search

Related tags

Overview

SPTAG: A library for fast approximate nearest neighbor search

SPTAG

Introduction

How it works

Highlights

Build

Requirements

Fast clone

Install

Verify

Usage

References

Contribute

License

Owner

Microsoft

Minimal But Practical Image Classifier Pipline Using Pytorch, Finetune on ResNet18, Got 99% Accuracy on Own Small Datasets.

CN24 is a complete semantic segmentation framework using fully convolutional networks

Genetic feature selection module for scikit-learn

Geometric Sensitivity Decomposition

Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

A framework for joint super-resolution and image synthesis, without requiring real training data

Code for the paper "M2m: Imbalanced Classification via Major-to-minor Translation" (CVPR 2020)

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

A Topic Modeling toolbox

In the AI for TSP competition we try to solve optimization problems using machine learning.

Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL)

Unsupervised Pre-training for Person Re-identification (LUPerson)

Official Pytorch implementation for "End2End Occluded Face Recognition by Masking Corrupted Features, TPAMI 2021"

Use VITS and Opencpop to develop singing voice synthesis; Maybe it will VISinger.

BESS: Balanced Evolutionary Semi-Stacking for Disease Detection via Partially Labeled Imbalanced Tongue Data

The PASS dataset: pretrained models and how to get the data - PASS: Pictures without humAns for Self-Supervised Pretraining

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

Simple Tensorflow implementation of Toward Spatially Unbiased Generative Models (ICCV 2021)

Code for "Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance" at NeurIPS 2021

This repository contains code from the paper "TTS-GAN: A Transformer-based Time-Series Generative Adversarial Network"