Title: AdANNS: A Framework for Adaptive Semantic Search

Paper: https://arxiv.org/pdf/2305.19435.pdf

Twitter thread: https://twitter.com/adityakusupati/status/1668295320445517824

Authors: Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain and Ali Farhadi

Word Count: Approximately 5690

Estimated Read Time: 18-20 minutes

Source Code: The source code is available at https://github.com/RAIVNLab/AdANNS

Summary: The paper proposes AdANNS, a framework that leverages adaptive representations to improve the accuracy-compute tradeoff of Approximate Nearest Neighbor Search (ANNS) systems. Traditional ANNS systems use a single rigid representation at every stage of index construction and inference, which can be suboptimal.

AdANNS builds on Matryoshka Representations (MRs), single embeddings that nest usable representations of increasing dimensionality. This lets AdANNS use low-dimensional representations for clustering and quantization, where compute dominates, and high-dimensional representations for precise re-ranking when feasible (see the sketch below).
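As a rough illustration of what the nesting buys (the names below are hypothetical and this is not the paper's code), a single MR embedding can simply be sliced to the granularity each stage needs:

```python
import numpy as np

# `full_embedding` stands in for the output of an MR-trained encoder.
rng = np.random.default_rng(0)
full_embedding = rng.standard_normal(2048).astype(np.float32)

def mr_slice(embedding: np.ndarray, d: int) -> np.ndarray:
    """First d dimensions of an MR embedding, re-normalized.

    With rigid representations this truncation would wreck accuracy;
    MR training makes each prefix a coherent lower-dimensional embedding.
    """
    prefix = embedding[:d]
    return prefix / np.linalg.norm(prefix)

coarse = mr_slice(full_embedding, 64)    # cheap view for clustering/quantization
fine = mr_slice(full_embedding, 2048)    # precise view for re-ranking
```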

AdANNS improves existing ANNS building blocks such as the inverted file index (AdANNS-IVF) and optimized product quantization (AdANNS-OPQ). It also combines the two into stronger composite ANNS indices (AdANNS-IVFOPQ and AdANNS-DiskANN) that reach higher accuracy at lower compute cost than the baselines.
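To make the IVF variant concrete, here is a minimal toy sketch (numpy only, plain k-means, hypothetical names; not the authors' implementation): cluster construction and list probing run on a low-dimensional MR prefix, while exact re-ranking uses the full-dimensional vectors.

```python
import numpy as np

def build_ivf(database: np.ndarray, d_cluster: int, n_lists: int, n_iter: int = 10):
    """Toy IVF: k-means over only the first d_cluster (MR prefix) dimensions."""
    coarse = database[:, :d_cluster]
    rng = np.random.default_rng(0)
    centroids = coarse[rng.choice(len(coarse), size=n_lists, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid, then recompute centroids.
        assign = np.argmin(((coarse[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for k in range(n_lists):
            members = coarse[assign == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids, assign

def search(query, database, centroids, assign, d_cluster, n_probe=4, top_k=10):
    """Probe the nearest lists with the low-dim query; re-rank with full dims."""
    probe = np.argsort(((centroids - query[:d_cluster]) ** 2).sum(-1))[:n_probe]
    candidates = np.flatnonzero(np.isin(assign, probe))
    # Exact full-dimensional distances only for the shortlisted candidates.
    dists = ((database[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(dists)[:top_k]]

# Toy usage: 5k database vectors of dimension 256, clustered on the first 32 dims.
db = np.random.default_rng(1).standard_normal((5_000, 256)).astype(np.float32)
centroids, assign = build_ivf(db, d_cluster=32, n_lists=64)
print(search(db[0], db, centroids, assign, d_cluster=32)[:3])
```

Because only the shortlisted candidates are scored in full dimensionality, the coarse stage stays cheap while the final ranking keeps full precision, which is the accuracy-compute lever the paper exploits.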

Overall, AdANNS shows gains of up to 1.5% in accuracy at the same compute cost as existing techniques, and matches baseline accuracy while being up to 90x faster in deployment. It also generalizes across search structures, modalities, and encoders.

Applicability to Large Language Models: AdANNS's use of adaptive representations to trade off accuracy and compute could potentially carry over to Large Language Models. Some possible applications:

- Using lower-dimensional word/token embeddings for clustering or quantization during training, and higher-dimensional embeddings for precise fine-tuning or inference.
- Dynamically switching between dimensionalities at inference time based on available compute, trading latency against accuracy (a rough sketch of this idea closes the post).
- Improving the efficiency of nearest-neighbor searches within the model, which is useful for tasks like entity linking and knowledge retrieval.

Overall, the techniques presented in the paper are likely to be broadly applicable across machine learning models that rely on accurate but efficient semantic search and retrieval.
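As a toy illustration of the dynamic-switching idea above (the cost model and every constant below are invented purely for illustration):

```python
# Pick the largest nested MR dimensionality whose re-ranking cost fits a budget.
MR_DIMS = (64, 128, 256, 512, 1024, 2048)  # nested granularities
COST_PER_DIM_US = 0.004                    # assumed per-dimension scoring cost

def pick_dimension(latency_budget_us: float, n_candidates: int) -> int:
    """Largest MR prefix whose shortlist-scoring cost fits the latency budget."""
    feasible = [d for d in MR_DIMS
                if d * n_candidates * COST_PER_DIM_US <= latency_budget_us]
    return feasible[-1] if feasible else MR_DIMS[0]

# Tighter budgets fall back to coarser (cheaper) representations.
print(pick_dimension(latency_budget_us=500.0, n_candidates=1000))   # -> 64
print(pick_dimension(latency_budget_us=5000.0, n_candidates=1000))  # -> 1024
```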
