Title: AdANNS: A Framework for Adaptive Semantic Search
Paper: https://arxiv.org/pdf/2305.19435.pdf
Announcement: https://twitter.com/adityakusupati/status/1668295320445517824
Authors: Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain and Ali Farhadi
Word Count: Approximately 5690
Estimated Read Time: 18-20 minutes
Source Code: The source code is available at https://github.com/RAIVNLab/AdANNS
Summary: The paper proposes AdANNS, a framework that leverages adaptive representations to improve the accuracy-compute tradeoff of Approximate Nearest Neighbor Search (ANNS) systems. Traditional ANNS systems use a single rigid, fixed-dimensional representation at every stage of index construction and inference, which can be sub-optimal because different stages have different accuracy and compute requirements.
AdANNS utilizes Matryoshka Representations (MRs) which have nested representations of varying dimensionalities. This allows AdANNS to use lower-dimensional representations for clustering and quantization to optimize accuracy and compute, while using higher-dimensional representations for precise re-ranking when feasible.
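The core idea can be sketched in plain NumPy. The example below assumes Matryoshka-style embeddings, where the first d coordinates of a full D-dimensional vector are themselves a usable d-dimensional representation; the function name `adaptive_search` and all sizes are illustrative, not taken from the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Matryoshka-style embeddings: the first d coordinates of the
# full D-dimensional vector are assumed to form a valid d-dim embedding.
D, n = 256, 1000
db = rng.standard_normal((n, D)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

# The query is a lightly perturbed copy of item 0, so item 0 is its true neighbor.
query = db[0] + 0.01 * rng.standard_normal(D).astype(np.float32)
query /= np.linalg.norm(query)

def adaptive_search(db, q, d_search, d_rerank, shortlist=50, k=5):
    """Score cheaply with a short embedding prefix, then re-rank a
    shortlist with a longer (here: full-dimensional) prefix."""
    coarse = db[:, :d_search] @ q[:d_search]           # cheap low-d scoring
    cand = np.argpartition(-coarse, shortlist)[:shortlist]
    fine = db[cand, :d_rerank] @ q[:d_rerank]          # precise re-ranking
    return cand[np.argsort(-fine)[:k]]

top = adaptive_search(db, query, d_search=32, d_rerank=D)
```

The cheap pass touches only 32 of 256 coordinates per database item; the expensive full-dimensional pass touches just the 50 shortlisted candidates.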
AdANNS improves existing ANNS building blocks like inverted file index (AdANNS-IVF) and quantization (AdANNS-OPQ). It also combines the two to create better composite ANNS indices (AdANNS-IVFOPQ and AdANNS-DiskANN) that achieve higher accuracy with lower compute cost compared to baselines.
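A minimal NumPy sketch of the AdANNS-IVF idea follows, under the same Matryoshka-prefix assumption: the inverted-file cells are built by clustering a low-dimensional prefix, and candidates inside the probed cells are scored with the full dimensionality. The tiny k-means loop and the name `ivf_search` are illustrative, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)
D, n, n_cells = 128, 2000, 16
db = rng.standard_normal((n, D)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

d_cluster = 16  # low-dimensional prefix used to build the IVF cells

# Tiny k-means on the low-d prefix (illustrative only).
cent = db[rng.choice(n, n_cells, replace=False), :d_cluster].copy()
for _ in range(10):
    dists = ((db[:, :d_cluster, None] - cent.T[None]) ** 2).sum(1)
    assign = np.argmin(dists, axis=1)
    for c in range(n_cells):
        members = db[assign == c, :d_cluster]
        if len(members):
            cent[c] = members.mean(0)

def ivf_search(q, n_probe=4, k=5):
    """Probe the closest cells via the low-d prefix, re-rank with full dims."""
    cells = np.argsort(((cent - q[:d_cluster]) ** 2).sum(1))[:n_probe]
    cand = np.flatnonzero(np.isin(assign, cells))
    scores = db[cand] @ q
    return cand[np.argsort(-scores)[:k]]

q = db[7] + 0.01 * rng.standard_normal(D).astype(np.float32)
q /= np.linalg.norm(q)
res = ivf_search(q)
```

The design choice mirrors the paper's observation: cell assignment tolerates coarse geometry, so a short prefix suffices there, while final ranking benefits from the full representation.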
Overall, AdANNS shows gains of up to 1.5% in accuracy at the same compute cost over existing techniques, and matches baseline accuracy while being up to 90x faster in deployment. It also generalizes across search structures, modalities, and encoders.
Applicability to Large Language Models: AdANNS's techniques of using adaptive representations to optimize accuracy and compute could potentially be applied to Large Language Models. Some applications could be:
- Using lower-dimensional word/token embeddings for clustering or quantization during training, and higher-dimensional embeddings for precise fine-tuning or inference.
- Dynamically switching between dimensionalities during inference based on available compute, to trade off latency against accuracy.
- Improving the efficiency of nearest neighbor searches within the model, which is useful for tasks like entity linking, knowledge retrieval, etc.

Overall, the techniques presented in the paper are likely to be broadly applicable across machine learning models that rely on accurate but efficient semantic search and retrieval.
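The dynamic-switching application above can be sketched as a simple budget-driven policy. Everything here is a hypothetical illustration (the dimension list, the linear cost model, and the name `pick_dim` are all assumptions, not from the paper):

```python
# Hypothetical nested Matryoshka dimensionalities available at inference time.
NESTED_DIMS = (64, 128, 256, 512, 768)

def pick_dim(latency_budget_ms, cost_per_dim_ms=0.004):
    """Return the largest nested dimensionality whose estimated scoring
    cost fits the latency budget, assuming cost is linear in dimension.
    Falls back to the smallest dimension if nothing fits."""
    feasible = [d for d in NESTED_DIMS if d * cost_per_dim_ms <= latency_budget_ms]
    return feasible[-1] if feasible else NESTED_DIMS[0]
```

For example, a tight budget selects a short prefix while a generous one selects the full embedding, without retraining or re-indexing per configuration.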