Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

Overview


Non-Metric Space Library (NMSLIB)

Important Notes

  • NMSLIB is generic yet fast; see the results of ANN benchmarks.
  • A standalone implementation of our fastest method HNSW also exists as a header-only library.
  • All the documentation (including usage of the Python bindings and the query server, descriptions of methods and spaces, building the library, etc.) can be found on this page.
  • For generic questions/inquiries, please use the Gitter chat (https://gitter.im/nmslib/Lobby); the GitHub issues page is for bugs and feature requests.

Objectives

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core library does not have any third-party dependencies. The library has been gaining popularity; in particular, it has become a part of Amazon Elasticsearch Service.

The goal of the project is to create an effective and comprehensive toolkit for searching in generic and non-metric spaces. Even though the library contains a variety of metric-space access methods, our main focus is on generic and approximate search methods, in particular, on methods for non-metric spaces. NMSLIB is possibly the first library with principled support for non-metric space searching.

NMSLIB is an extendible library, which means that it is possible to add new search methods and distance functions. NMSLIB can be used directly in C++ and Python (via Python bindings). In addition, it is also possible to build a query server, which can be used from Java (or other languages supported by Apache Thrift, version 0.12). Java has a native client, i.e., it works on many platforms without requiring a C++ library to be installed.
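
For a first impression, here is a minimal Python sketch (the library installs via pip install nmslib; the parameter values below are illustrative, not recommendations):

    import nmslib
    import numpy as np

    # Index 1,000 random 16-dimensional vectors with HNSW and run one query.
    data = np.random.randn(1000, 16).astype(np.float32)

    index = nmslib.init(method='hnsw', space='l2')
    index.addDataPointBatch(data)
    index.createIndex({'M': 16, 'efConstruction': 100}, print_progress=False)

    ids, distances = index.knnQuery(data[0], k=5)
    print(ids, distances)  # the first hit should be data[0] itself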

Authors: Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak. With contributions from Ben Frederickson, Lawrence Cayton, Wei Dong, Avrelin Nikita, Dmitry Yashunin, Bob Poekert, @orgoro, @gregfriedland, Scott Gigante, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.

Brief History

NMSLIB started as a personal project of Bilegsaikhan Naidan, who created the initial code base, the Python bindings, and participated in earlier evaluations. The most successful class of methods, neighborhood/proximity graphs, is represented by the Hierarchical Navigable Small World Graph (HNSW) due to Malkov and Yashunin (see the publications below). Other particularly useful methods include a modification of the VP-tree due to Boytsov and Naidan (2013), a Neighborhood APProximation index (NAPP) proposed by Tellez et al. (2013) and improved by David Novak, as well as a vanilla uncompressed inverted file.

Credits and Citing

If you find this library useful, feel free to cite our SISAP paper [BibTex] as well as other papers listed at the end. One crucial contribution to cite is the fast Hierarchical Navigable Small World graph (HNSW) method [BibTex]. Please also check out the stand-alone HNSW implementation by Yury Malkov, which is released as the header-only HNSWLib library.

License

The code is released under the Apache License Version 2.0 (http://www.apache.org/licenses/). Older versions of the library included additional components with different licenses (this does not apply to NMSLIB 2.x):

  • The LSHKIT, which is embedded in our library, is distributed under the GNU General Public License, see http://www.gnu.org/licenses/.
  • The k-NN graph construction algorithm NN-Descent due to Dong et al. 2011 (see the links below), which is also embedded in our library, seems to be covered by a free-to-use license, similar to Apache 2.
  • The FALCONN library's license is MIT.

Funding

Leonid Boytsov was supported by the Open Advancement of Question Answering Systems (OAQA) group and by NSF grant #1618159: "Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond". Bileg was supported by the iAd Center.

Related Publications

The most important related papers are listed below in chronological order:

Comments
  • Add support to build aarch64 wheels


    Travis-CI allows for the creation of aarch64 wheels.

    Build: https://travis-ci.com/github/janaknat/nmslib/builds/205780637

    There are 8-9 failures when testing hnsw. Any suggestions on how to fix these? A majority of the failures are due to expected=0.99 and calculated=~0.98.

    Tagging @jmazanec15 since he added ARM compatibility.

    opened by janaknat 33
  • Speed up pip install


    Currently pip installing is slow, since there is a compile step. Is there any way to speed it up? On my MacBook:

    time pip install --no-cache nmslib
    Collecting nmslib
      Downloading https://files.pythonhosted.org/packages/e1/95/1f7c90d682b79398c5ee3f9296be8d2640fa41de24226bcf5473c801ada6/nmslib-1.7.3.6.tar.gz (255kB)
        100% |████████████████████████████████| 256kB 8.8MB/s 
    Requirement already satisfied: pybind11>=2.0 in .../virtualenv/python3.6/lib/python3.6/site-packages (from nmslib) (2.2.4)
    Requirement already satisfied: numpy in .../virtualenv/python3.6/lib/python3.6/site-packages (from nmslib) (1.15.4)
    Installing collected packages: nmslib
      Running setup.py install for nmslib ... -
    done
    Successfully installed nmslib-1.7.3.6
    
    real	3m11.091s
    

    Would it be a good idea to provide pre-compiled wheels over pip? That would also simplify the process of finding the pybind11 headers (I had to do something special to copy them in for pip when running with a --target dir).
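
    (A hedged aside: even once wheels are published, a natively optimized build can still be forced from source with pip's standard --no-binary flag, e.g. pip install --no-binary :all: nmslib, assuming a working C++ toolchain.)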

    opened by matthen 33
  • Can't load index?


    Hi, this might be more of a question than a problem with the library. I have created an index with NAPP and saved it using saveIndex. However, when I load it with loadIndex, I get the following error:

    Check failed: A previously saved index is apparently used with a different data set, a different data set split, and/or a different gold standard file! (detected an object index >= #of data points

    Am I doing something wrong?

    Thanks for the help.

    EDIT: The message doesn't make sense to me because I'm not "using the index with a data set", I'm just loading it.

    EDIT2: I'm using the Python interface.
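
    For reference, a sketch of a save/load round trip that also persists the data, assuming nmslib >= 1.8, where saveIndex/loadIndex accept save_data/load_data flags (the NAPP parameter is illustrative):

    import nmslib
    import numpy as np

    data = np.random.randn(1000, 16).astype(np.float32)

    index = nmslib.init(method='napp', space='cosinesimil')
    index.addDataPointBatch(data)
    index.createIndex({'numPivot': 256})
    # Store the data alongside the index, not just the index structure.
    index.saveIndex('napp.index', save_data=True)

    new_index = nmslib.init(method='napp', space='cosinesimil')
    new_index.loadIndex('napp.index', load_data=True)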

    enhancement 
    opened by zommerfelds 31
  • Custom Metrics


    Hello,

    I want to perform NN search on a dataset of genomes. For this task, the distance between 2 data points is calculated by a custom script. Is there a way I can incorporate this without having to create the entire NN search algorithm myself, modifying only some parts of your code?

    opened by Chokerino 30
  • Python process crashes: 'pybind11::error_already_set'


    nmslib is the only lib in our project that relies on pybind11, and we could narrow the crashes down to the Dask nodes that use nmslib. When we disable the nodes that use nmslib, it doesn't crash.

    terminate called after throwing an instance of 'pybind11::error_already_set'
      what():  TypeError: '>=' not supported between instances of 'int' and 'NoneType'
    
    At:
      /opt/conda/envs/jobnet-env/lib/python3.6/logging/__init__.py(1546): isEnabledFor
      /opt/conda/envs/jobnet-env/lib/python3.6/logging/__init__.py(1293): debug
    
    /usr/local/bin/entrypoint.sh: line 46:    21 Aborted                 (core dumped) python scripts/cli.py "${@:2}"
    

    Version:

    - nmslib~=1.7.2
    - pybind11=2.2
    
    opened by lukin0110 28
  • Make failed in linking Boost library


    Hello,

    I am facing an error in this step:

    [ 75%] Linking CXX executable ../release/experiment

    All of the errors looked like this:

    undefined reference to `boost::program_options:

    I installed the latest library versions and checked that libboost 1.58 is compatible with g++ 4.9. I think it may be related to C++11; however, it returns errors with both g++ 4.9 and 4.7.

    This is my system information:

    -- The C compiler identification is GNU 4.9.3
    -- The CXX compiler identification is GNU 4.9.3
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Build type: Release
    -- GSL using gsl-config /usr/bin/gsl-config
    -- Using GSL from /usr
    -- Found GSL.
    -- Found Eigen3: /usr/include/eigen3 (Required is at least version "3")
    -- Found Eigen3.
    -- Boost version: 1.58.0
    -- Found the following Boost libraries:
    --   system
    --   filesystem
    --   program_options
    -- Found BOOST.

    I also installed Clang and LLDB 3.6. I tried many possible solutions but could not fix this :(.

    opened by nguyenv7 26
  • Python wrapper crashes while retrieving nearest neighbors when M>100


    Hi, I am working on a problem where I need to retrieve ~500 nearest neighbors out of a million points. I am using the Python wrapper for the HNSW method. The code works perfectly well if I set the parameter M <= 100, but with M greater than 100 the code crashes while retrieving nearest neighbors (no issues while building the model) with an "invalid next size" error. Any idea why this might be happening? Thanks, Himanshu
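
    For context, a hedged sketch of the setup being described (data and parameter values are illustrative, not taken from the report):

    import nmslib
    import numpy as np

    data = np.random.randn(10000, 32).astype(np.float32)

    index = nmslib.init(method='hnsw', space='l2')
    index.addDataPointBatch(data)
    index.createIndex({'M': 120, 'efConstruction': 400})  # M > 100 reportedly triggers the crash
    index.setQueryTimeParams({'efSearch': 600})
    ids, dists = index.knnQuery(data[0], k=500)  # retrieving ~500 neighbors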

    bug 
    opened by hjain689 25
  • Incorrect distances returned for all-zero query


    An all-zero query vector will result in NMSLib incorrectly reporting a distance of zero for its nearest neighbours (see example below). Is this related to #187? Is there a suggested workaround?

    # Training set (CSR sparse matrix)
    X.todense()
    # Out:
    # matrix([[4., 2., 3., 1., 0., 0., 0., 0., 0.],
    #         [2., 1., 0., 0., 3., 0., 1., 2., 1.],
    #         [4., 2., 0., 0., 3., 1., 0., 0., 0.]], dtype=float32)
    
    # Query vector (CSR sparse matrix)
    r.todense()
    # Out:
    # matrix([[0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
    
    # Train and query
    import nmslib
    index = nmslib.init(
        method='hnsw',
        space='cosinesimil_sparse_fast',
        data_type=nmslib.DataType.SPARSE_VECTOR,
        dtype=nmslib.DistType.FLOAT)
    index.addDataPointBatch(X)
    index.createIndex()
    index.knnQueryBatch(r, k=3)
    # Out:
    # [(array([2, 1, 0], dtype=int32), array([0., 0., 0.], dtype=float32))]
    
    # Note that distances are all 0, which is incorrect!
    # Same result for dense training & query vectors.
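
    One possible caller-side guard (a sketch, not an official fix): skip all-zero rows, for which cosine similarity is undefined, and query only the remaining ones:

    import numpy as np
    from scipy.sparse import csr_matrix

    def knn_query_nonzero(index, queries: csr_matrix, k: int):
        # Cosine similarity is undefined for all-zero vectors, so return None
        # for those rows instead of querying the index with them.
        norms = np.asarray(queries.multiply(queries).sum(axis=1)).ravel()
        mask = norms > 0
        results = [None] * queries.shape[0]
        if mask.any():
            for row, res in zip(np.flatnonzero(mask),
                                index.knnQueryBatch(queries[mask], k=k)):
                results[row] = res
        return results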
    
    bug 
    opened by lsorber 24
  • Jaccard for the HNSW method with sparse features


    Hi,

    I want to know if HNSW provides Jaccard (similarity or distance, does not matter), besides cosine, for sparse features. There are scenarios in which Jaccard outperforms cosine.

    The Python notebooks provided show the following metrics: l2, l2sqr_sift, cosinesimil_sparse.

    According to space_sparse_scalar.h, the following metrics seem to be implemented, or in preparation, for sparse features:

        #define SPACE_SPARSE_COSINE_SIMILARITY "cosinesimil_sparse"
        #define SPACE_SPARSE_ANGULAR_DISTANCE "angulardist_sparse"
        #define SPACE_SPARSE_NEGATIVE_SCALAR "negdotprod_sparse"
        #define SPACE_SPARSE_QUERY_NORM_NEGATIVE_SCALAR "querynorm_negdotprod_sparse"

    What does each of these metrics mean? I also saw cosinesimil_sparse_fast in a few files. What is it, and how does it compare to cosinesimil_sparse? Is it ready for use?

    I can provide a Jaccard implementation for sparse vectors, given 2 vectors implemented as hash tables, but I haven't found out how to integrate it into the code. It would also be preferable to check which metrics are already available. The closest clue I got was to expand the following files: distcomp_scalar.cc, hnsw.cc, and hnsw_distfunc_opt.cc, but I am not sure which steps to take. I saw some mentions of Jaccard in space_sparse_jaccard.cc and distcomp.h. But no examples are given.

    Thanks in advance.

    opened by icarocd 24
  • pybind11.h not found when installing using pip


    I'm trying to install python bindings on Ubuntu 16.04 machine:

    $ pip3 install pybind11 nmslib
    Collecting nmslib
      Using cached https://files.pythonhosted.org/packages/de/eb/28b2060bb1750426c5618e3ad6ce830ac3cfd56cb3eccfb799e52d6064db/nmslib-1.7.2.tar.gz
    Requirement already satisfied: pybind11>=2.0 in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (2.2.2)
    Requirement already satisfied: numpy in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (1.14.2)
    Building wheels for collected packages: nmslib
      Running setup.py bdist_wheel for nmslib ... error
      Complete output from command /homes/alexandrov/.virtualenvs/pytorch/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-0y71oxa4/nmslib/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-916r1rr9 --python-tag cp35:
      running bdist_wheel
      running build
      running build_ext
      creating tmp
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpwekdswov.cpp -o tmp/tmpwekdswov.o -std=c++14
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpyyphh022.cpp -o tmp/tmpyyphh022.o -fvisibility=hidden
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      building 'nmslib' extension
      creating build
      creating build/temp.linux-x86_64-3.5
      creating build/temp.linux-x86_64-3.5/nmslib
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/method
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/space
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I./nmslib/similarity_search/include -Iinclude -Iinclude -I/homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c nmslib.cc -o build/temp.linux-x86_64-3.5/nmslib.o -O3 -march=native -fopenmp -DVERSION_INFO="1.7.2" -std=c++14 -fvisibility=hidden
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      nmslib.cc:16:31: fatal error: pybind11/pybind11.h: No such file or directory
      compilation terminated.
      error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    

    Clearly, pybind11 headers were not installed on my machine. This library is not packaged for apt-get (at least not for Ubuntu 16.04), so I needed to manually install from source.

    It would be nice if the nmslib install script took care of this.

    opened by taketwo 23
  • Optimized index raises RuntimeError on load when saved with `negdotprod` space


    Basically, this is what I am trying to do:

    import nmslib
    
    space = 'negdotprod'
    
    vectors = [[1, 2], [3, 4], [5, 6]]
    
    index = nmslib.init(space=space, method='hnsw')
    index.addDataPointBatch(vectors)
    index.createIndex(
        {'M': 15, 'efConstruction': 200, 'skip_optimized_index': 0, 'post': 0}
    )
    index.saveIndex('test.index')
    
    new_index = nmslib.init(space=space, method='hnsw')
    new_index.loadIndex('test.index')
    

    and it raises

    Check failed: totalElementsStored_ == this->data_.size() The number of stored elements 3 doesn't match the number of data points ! Did you forget to re-load data?
    Traceback (most recent call last):
      File "8.py", line 15, in <module>
        new_index.loadIndex('test.index')
    RuntimeError: Check failed: The number of stored elements 3 doesn't match the number of data points ! Did you forget to re-load data?
    

    If I change the space variable to cosinesimil, it works just fine. It seems that data points are not stored, even though the hnsw method with skip_optimized_index=0 is used.

    opened by chomechome 22
  • Unable to pip install nmslib, including historic versions


    Hey sorry to bother you,

    I've been trying to install scispacy via pip on Windows 10 using Python 3.10.0 today, and it keeps failing due to errors about nmslib. I've tried pip installing nmslib versions 1.7.3.6, 1.8, and 2.1.1.

    None of them has worked, though, curiously. I've had a long look around scispacy's GitHub and yours, but nothing I've read has given me any solutions.

    I've also flagged it with scispacy on their GitHub. Anyway, I have no idea what's going on but just thought I'd let you know. Cheers, kind regards, Chris

    opened by Cbezz 5
  • Strict typing is needed: Using wrong input can cause distances to be all one, e.g., with cosinesimil_sparse/HNSW when calling knnQueryBatch on a dense array


    Hey, I'm trying to use nmslib's HNSW with a csr_matrix containing sparse vectors.

    Creating the index works fine, adding the data and setting query time params too:

        items = ["foo is a kind of thing", "bar is another one", "this bar is a real one!", "I prefer to use a foo"] # etc, len=3000
        similar_items_index = nmslib.init(
            space="cosinesimil_sparse",
            method="hnsw",
            data_type=nmslib.DataType.SPARSE_VECTOR,
            dtype=nmslib.DistType.FLOAT,
        )
        vectorizer = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
        embeddings: csr_matrix = vectorizer.fit_transform(items)
        similar_items_index.addDataPointBatch(embeddings)
        similar_items_index.createIndex({"M": 128, "efConstruction": 32, "post": 2}, print_progress=False)
        similar_items_index.setQueryTimeParams({"ef": 512})
    

    But when I search with knnQueryBatch, all the returned distances are equal to 1:

    similar_items_index.knnQueryBatch([query_embedding], 5)[0]
    

    -> Knn results: ids, with distances all set to 1

    Am I missing something in the proper usage of HNSW with sparse vector data?

    Setup for reproduction
    • This uses the text-similarity data from Kaggle, downloaded in /tmp/. Any other text dataset should be fine, as computing similarity scores is not required to see the problem with returned distances.
    
    import csv
    from typing import Dict
    
    import nmslib
    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    
    CSV_PATH = "/tmp/data/"
    
    
    def main():
        similar_items_index = nmslib.init(
            space="cosinesimil_sparse",
            method="hnsw",
            data_type=nmslib.DataType.SPARSE_VECTOR,
            dtype=nmslib.DistType.FLOAT,
        )
        items = set()
        ids: Dict[str, int] = {}
        rids: Dict[int, str] = {}
        similarities = {}
        for file in [
            f"{CSV_PATH}/similarity-test.csv",
            f"{CSV_PATH}/similarity-train.csv",
        ]:
            with open(file) as f:
                reader = csv.reader(f, delimiter=",", quotechar="|")
                header = next(reader)
                for i, l in enumerate(reader):
                    desc_x = l[header.index("description_x")]
                    desc_y = l[header.index("description_y")]
                    similar = bool(l[header.index("same_security")])
                    id = len(items)
                    if desc_x not in items:
                        items.add(desc_x)
                        ids[desc_x] = id
                        rids[id] = desc_x
                        id_x = id
                        id += 1
                    else:
                        id_x = ids[desc_x]
                    if desc_y not in items:
                        items.add(desc_y)
                        ids[desc_y] = id
                        rids[id] = desc_y
                        id_y = id
                        id += 1
                    else:
                        id_y = ids[desc_y]
                    if similar:
                        similarities[id_x] = id_y
                        similarities[id_y] = id_x
        print(f"Loaded {len(items)}, total {len(similarities)/2} pairs of similar queries.")
        vectorizer = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
        embeddings: csr_matrix = vectorizer.fit_transform(items)
        print("Embedded items, adding datapoints..")
        similar_items_index.addDataPointBatch(embeddings)
        print("Creating index..")
        similar_items_index.createIndex({"M": 128, "efConstruction": 32, "post": 2}, print_progress=False)
        print("Setting index query params..")
        similar_items_index.setQueryTimeParams({"ef": 512})
        print("Searching...")
        score = 0
        total_similar = 0
        for item_id, item in enumerate(items):
            query_embedding = vectorizer.transform([item]).getrow(0).toarray()
            top_50, distances = similar_items_index.knnQueryBatch([query_embedding], 50)[0]
            top_50_texts = [rids[t] for t in top_50]
            try:
                expected = similarities[item_id]
                expected_text = rids[expected]
                if expected:
                    score += 1 if expected in top_50 else 0
            except KeyError:
                continue  # No similar noted on this item.
            total_similar += 1
        print(
            f"After querying {len(items)} of which {total_similar}, we found the similar item in the top50 {score} times."
        )
    
    
    if __name__ == "__main__":
        main()
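
    A hedged observation tied to the issue title: the query above is densified with toarray() before being passed to a sparse-space index, while a SPARSE_VECTOR index expects sparse input. A sketch of the sparse alternative:

    # Pass the CSR row itself instead of a dense array.
    query_embedding = vectorizer.transform([item])
    top_50, distances = similar_items_index.knnQueryBatch(query_embedding, k=50)[0]

    This illustrates what the stricter typing requested in the title would enforce; it is not a confirmed fix.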
    
    opened by PLNech 6
  • More encompassing approach for Mac M1 chips


    On a Mac, platform.processor may return i386 even on a Mac M1. The code below should be more accurate. See the Stack Overflow comment, another Stack Overflow comment, and a Stack Overflow post for more information / validation that the uname approach is more all-encompassing.
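
    A minimal sketch of the uname-based check described above (the helper name is illustrative; the actual change is in the PR diff):

    import platform

    def is_apple_silicon() -> bool:
        # platform.processor() can report "i386" under Rosetta on Apple Silicon;
        # platform.uname().machine reports the real architecture ("arm64").
        return platform.system() == "Darwin" and platform.uname().machine == "arm64"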

    I was personally running into this problem and the following fix solved it for me.

    This PR is a slightly edited solution to what is contained in https://github.com/nmslib/nmslib/pull/485 with many thanks to @netj for getting this started.

    opened by JewlsIOB 3
  • Calling setQueryTimeParams results in a SIGSEGV


    Hi there! Trying to perform knnQuery on an indexed csr_matrix, I got the issue reported in #480 from this code:

            model = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
            embeddings = model.fit_transform(corpus_tfidf)
            logger.info(f"Creating vector index from a {len(corpus_tfidf)} corpus embedded as {embeddings.shape}...")
            index = nmslib.init(method="hnsw", space="cosinesimil_sparse", data_type=nmslib.DataType.SPARSE_VECTOR, dtype=nmslib.DistType.FLOAT)
            logger.info("Adding datapoints to index...")
            index.addDataPointBatch(embeddings)
            logger.info("Creating final index...")
            index.createIndex()
    
            logger.info(f"Search neighbors for first embedding {embeddings[0]}")
            index.knnQuery(embeddings[0])
    

    As described in #480, this results in an IndexError: tuple index out of range.

    When trying to apply the index.setQueryTimeParams({'efSearch': efS, 'algoType': 'old'}) workaround mentioned in another issue, it results in a segmentation fault.

    I can reproduce it with the following minimal example; it looks like the call errors even without arguments:

    index = nmslib.init(method="hnsw", space="cosinesimil_sparse", data_type=nmslib.DataType.SPARSE_VECTOR, dtype=nmslib.DistType.FLOAT)
    print("Setting index queryParams...")
    index.setQueryTimeParams()
    print("Adding datapoints to index...")
    

    ->

    Setting index queryParams...
    Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
    

    Env info

    • python -V -> Python 3.7.11
    • pip freeze | grep nmslib -> nmslib==2.1.1
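
    For contrast, a sketch of the conventional ordering, where the index is populated and built before query-time parameters are set (an illustration, not a confirmed fix for the segfault):

    import nmslib
    from scipy.sparse import random as sparse_random

    data = sparse_random(1000, 100, density=0.05, format="csr", dtype="float32")
    index = nmslib.init(method="hnsw", space="cosinesimil_sparse",
                        data_type=nmslib.DataType.SPARSE_VECTOR,
                        dtype=nmslib.DistType.FLOAT)
    index.addDataPointBatch(data)
    index.createIndex()                          # build first...
    index.setQueryTimeParams({"efSearch": 100})  # ...then set query-time params
    ids, dists = index.knnQuery(data.getrow(0), k=5)
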
    opened by PLNech 3
  • NMSLIB doesn't work on Windows 11


    Hello,

    We use nmslib as the default engine for TensorFlow Similarity due to its broad compatibility with various OSes. We got multiple reports, which I was able to confirm, that nmslib doesn't install on Windows 11, potentially related to issue #498.

    Do you have any idea if/when you will be able to take a look at this? With the increased adoption of Win11, it is becoming problematic for us.

    Thanks :)

    opened by ebursztein 15
Releases (v2.1.1)
  • v2.1.1(Feb 3, 2021)

    Note: We unfortunately had deployment issues. As a result, we had to delete several versions between 2.0.6 and 2.1.1. If you installed one of these versions, please delete it and install a more recent version (>=2.1.1).

    The current build focuses on:

    1. Providing more efficient ("optimized") implementations for spaces: negdotprod, l1, linf.
    2. Binaries for ARM 64 (aarch64).
  • v2.0.6(Apr 16, 2020)

  • v2.0.5(Nov 7, 2019)

    The main objective of this release is to provide binary wheels. For compatibility reasons, we need to stick to basic SSE2 instructions. However, when the Python library is imported, it prints a message suggesting that a more efficient version can be installed from sources (and tells how to do this).

    Furthermore, this release removes a lot of old code, which speeds up compilation by 70%:

    1. Non-performing methods
    2. Double-indices

    This is a step towards a more lightweight NMSLIB library.

  • v1.8.1(Jun 23, 2019)

  • v1.8(Jun 6, 2019)

    This is a clean-up release focusing on several important issues:

    1. Fixing a bug with knnQuery #370
    2. Added the possibility to save/load data efficiently from the Python bindings (and the query server) #356. The Python notebooks are updated accordingly.
    3. We now have a bit Jaccard space (many thanks, @gregfriedland).
    4. Upgraded the query server to use a recent Apache Thrift.
    5. Importantly, the documentation is reorganized quite a bit:
      5.1 There is now a single entry point for all the docs.
      5.2 Most of the docs are now online; only the fairly technical description of search spaces and methods remains in the PDF manual.
  • v1.7.3.6(Oct 4, 2018)

  • v1.7.3.4(Aug 6, 2018)

  • v1.7.3.2(Jul 13, 2018)

  • v1.7.3.1(Jul 9, 2018)

  • v1.7.2(Feb 20, 2018)

    1. Improved concurrency in Python (preventing hanging in a certain situation: https://github.com/searchivarius/nmslib/issues/291).
    2. Improved ParallelFor: passing the thread ID and not starting threads in single-thread mode.
  • v1.7(Feb 4, 2018)

  • v1.6(Dec 15, 2016)

    Here is the list of changes for version 1.6 (the manual isn't updated yet):

    We especially thank the following people for the fixes:

    • Bileg Naidan (@bileg)
    • Bob Poekert (@bobpoekert)
    • @orgoro
    1. We simplified the build by excluding code that requires third-party libraries from the core library. In other words, the core library does not have any third-party dependencies (not even Boost). To build the full version of the library, run cmake as follows: cmake . -DWITH_EXTRAS=1
    2. It should now be possible to build on a Mac.
    3. We improved the Python bindings (thanks to @bileg) and their installation process (thanks to @bobpoekert):
      1. We merged our generic and vector bindings into a single module. We upgraded to a more standard installation process via distutils. You can run: python setup.py build and then sudo python setup.py install.
      2. We improved our support for sparse spaces: you can pass data in the form of a numpy sparse array!
      3. There are now batch multi-threaded querying and addition of data.
      4. addDataPoint* functions return the position of an inserted entry. This can be useful if you use the function getDataPoint.
      5. For examples of using the Python API, please see the *.py files in the folder python_bindings.
      6. Note that to execute unit tests you need: python-numpy, python-scipy, and python-pandas.
    4. Because we got rid of Boost, we, unfortunately, do not support command-line options WITHOUT arguments. Instead, you have to pass the value 0 or 1.
    5. However, the utility experiment (experiment.exe) now accepts the option recallOnly. If this option has argument 1, then the only effectiveness metric computed is recall. This is useful for evaluation of HNSW, because (for efficiency reasons) HNSW does not return proper distance values (e.g., for L2 it's a squared distance, not the original one). This makes it impossible to compute effectiveness metrics other than recall (returning wrong distance values would also lead to experiment terminating with an error message).
    6. Additional spaces:
      1. negdotprod_sparse: negative inner (dot) product. This is a sparse space.
      2. querynorm_negdotprod_sparse: query-normalized inner (dot) product, i.e., the dot product divided by the query norm.
      3. renyi_diverg: Rényi divergence. It has the parameter alpha (see the formula after this list).
      4. ab_diverg: α-β-divergence. It has two parameters: alpha and beta.
    7. Additional search methods:
      1. simple_invindx: A classical inverted index with document-at-a-time processing (via a priority queue). It doesn't have parameters, but works only with the sparse space negdotprod_sparse.
      2. falconn: we ported (created a wrapper for) a June 2016 version of the FALCONN library.
        1. Unlike the original implementation, our wrapper works directly with sparse vector spaces as well as with dense vector spaces.
        2. However, our wrapper has to keep a duplicate copy of the data, so this method is useful mostly as a benchmark.
        3. Our wrapper directly supports a data centering trick, which can boost performance sometimes.
        4. Most parameters (hash_family, cross_polytope, hyperplane, storage_hash_table, num_hash_bits, num_hash_tables, num_probes, num_rotations, seed, feature_hashing_dimension) merely map to FALCONN parameters.
        5. Setting the additional parameters norm_data and center_data tells the wrapper to center and normalize the data. Our implementation of the centering (which is unfortunately done before the hashing trick is applied) for sparse data is horribly inefficient, so we wouldn't recommend using it. Besides, it doesn't seem to improve results. Just in case, the number of sparse dimensions used for centering is controlled by the parameter max_sparse_dim_to_center.
        6. Our FALCONN wrapper would normally use the distance provided by NMSLIB, but you can force using FALCONN's distance function implementation by setting: use_falconn_dist to 1.
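
    For reference, the Rényi divergence used by renyi_diverg (item 6.3 above) has the standard textbook definition; this formula is quoted from the literature, not from the NMSLIB manual:

        D_{\alpha}(P \| Q) = \frac{1}{\alpha - 1} \log \sum_i p_i^{\alpha} q_i^{1-\alpha}, \quad \alpha > 0, \ \alpha \neq 1

    Larger alpha puts more weight on large ratios p_i / q_i.
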
  • v1.5.3(Jul 11, 2016)

  • v1.5.2(Jul 2, 2016)

  • v1.5.1(Jun 1, 2016)

  • v1.5(May 20, 2016)

    1. A new efficient method: a hierarchical (navigable) small-world graph (HNSW), contributed by Yury Malkov (@yurymalkov). Works with g++, Visual Studio, Intel Compiler, but doesn't work with Clang yet.
    2. A query server, which can have clients in C++, Java, Python, and other languages supported by Apache Thrift
    3. Python bindings for vector and non-vector spaces
    4. Improved performance of two core methods SW-graph and NAPP
    5. Better handling of the gold standard data in the benchmarking utility experiment
    6. Updated API that permits search methods to serialize indices
    7. Improved documentation (e.g., we added tuning guidelines for best methods)
Airborne Optical Sectioning (AOS) is a wide synthetic-aperture imaging technique

AOS: Airborne Optical Sectioning Airborne Optical Sectioning (AOS) is a wide synthetic-aperture imaging technique that employs manned or unmanned airc

JKU Linz, Institute of Computer Graphics 39 Dec 09, 2022
Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation

CorrNet This project provides the code and results for 'Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation'

Gongyang Li 13 Nov 03, 2022
Flybirds - BDD-driven natural language automated testing framework, present by Trip Flight

Flybird | English Version Behavior-driven development (BDD) is a way of thinking about the software process, or

Ctrip, Inc. 706 Dec 30, 2022
The aim of this project is to build an AI bot that can play the Wordle game, or more generally Squabble

Wordle RL The aim of this project is to build an AI bot that can play the Wordle game, or more generally Squabble I know there are more deterministic

Aditya Arora 3 Feb 22, 2022
The mini-AlphaStar (mini-AS, or mAS) - mini-scale version (non-official) of the AlphaStar (AS)

A mini-scale reproduction code of the AlphaStar program. Note: the original AlphaStar is the AI proposed by DeepMind to play StarCraft II.

Ruo-Ze Liu 216 Jan 04, 2023
Deep Learning (with PyTorch)

Deep Learning (with PyTorch) This notebook repository now has a companion website, where all the course material can be found in video and textual for

Alfredo Canziani 6.2k Jan 07, 2023
Cycle Consistent Adversarial Domain Adaptation (CyCADA)

Cycle Consistent Adversarial Domain Adaptation (CyCADA) A pytorch implementation of CyCADA. If you use this code in your research please consider citi

Hyunwoo Ko 2 Jan 10, 2022
GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily

GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily Abstract Graph Neural Networks (GNNs) are widely used on a

10 Dec 20, 2022
Repo for paper "Dynamic Placement of Rapidly Deployable Mobile Sensor Robots Using Machine Learning and Expected Value of Information"

Repo for paper "Dynamic Placement of Rapidly Deployable Mobile Sensor Robots Using Machine Learning and Expected Value of Information" Notes I probabl

Berkeley Expert System Technologies Lab 0 Jul 01, 2021
Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22)

Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22) Ok-Topk is a scheme for distributed training with sparse gradients

Shigang Li 9 Oct 29, 2022
This repository contains the source code for the paper First Order Motion Model for Image Animation

!!! Check out our new paper and framework improved for articulated objects First Order Motion Model for Image Animation This repository contains the s

13k Jan 09, 2023
A PyTorch Toolbox for Face Recognition

FaceX-Zoo FaceX-Zoo is a PyTorch toolbox for face recognition. It provides a training module with various supervisory heads and backbones towards stat

JDAI-CV 1.6k Jan 06, 2023
RRL: Resnet as representation for Reinforcement Learning

Resnet as representation for Reinforcement Learning (RRL) is a simple yet effective approach for training behaviors directly from visual inputs. We demonstrate that features learned by standard image

Meta Research 21 Dec 07, 2022
This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

Ju He 307 Jan 03, 2023
Memory-Augmented Model Predictive Control

Memory-Augmented Model Predictive Control This repository hosts the source code for the journal article "Composing MPC with LQR and Neural Networks fo

Fangyu Wu 1 Jun 19, 2022
Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

RecycleD Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN

Yunan Zhu 23 Nov 05, 2022
Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On

Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On [Project website] [Dataset] [Video] Abstract We propose a new g

71 Dec 24, 2022
Deep Learning Head Pose Estimation using PyTorch.

Hopenet is an accurate and easy to use head pose estimation network. Models have been trained on the 300W-LP dataset and have been tested on real data with good qualitative performance.

Nataniel Ruiz 1.3k Dec 26, 2022
The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

ECIR Reproducibility Paper: Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study This code corresponds to the reproducibility

ielab 3 Mar 31, 2022
This is the official code for the paper "Tracker Meets Night: A Transformer Enhancer for UAV Tracking".

SCT This is the official code for the paper "Tracker Meets Night: A Transformer Enhancer for UAV Tracking" The spatial-channel Transformer (SCT) enhan

Intelligent Vision for Robotics in Complex Environment 27 Nov 23, 2022