/vector_stores/search - Search Vector Store

Search a vector store for relevant chunks based on a query and file attributes filter. This is useful for retrieval-augmented generation (RAG) use cases.

Overview

| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Tracked per search operation |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Supported LLM Providers | OpenAI, Azure OpenAI, Bedrock, Vertex RAG Engine | Full vector stores API support across providers |

Usage

LiteLLM Python SDK

Non-streaming example

Search Vector Store - Basic
import asyncio
import litellm

async def main():
    # Search the vector store for chunks relevant to the query
    response = await litellm.vector_stores.asearch(
        vector_store_id="vs_abc123",
        query="What is the capital of France?"
    )
    print(response)

asyncio.run(main())

Synchronous example

Search Vector Store - Sync
import litellm

# Search the vector store for chunks relevant to the query
response = litellm.vector_stores.search(
    vector_store_id="vs_abc123",
    query="What is the capital of France?"
)
print(response)
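
Both calls return OpenAI-style search results: a data list of scored results, each carrying text chunks under content. The snippet below is a minimal sketch of pulling the matched text back out, assuming that shape; if your LiteLLM version returns a response object rather than a dict, use attribute access instead of key access.

Extract Search Results (sketch)
# Sketch: extract scored text chunks, assuming the OpenAI-style shape
# {"data": [{"score": ..., "content": [{"text": ...}]}]}
for result in response["data"]:
    for part in result["content"]:
        print(f"score={result['score']:.3f}  text={part['text'][:80]}")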

LiteLLM Proxy Server

  1. Setup config.yaml

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  # Vector store settings can be added here if needed
  2. Start proxy

litellm --config /path/to/config.yaml

  3. Test it with OpenAI SDK!
OpenAI SDK via LiteLLM Proxy
from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy
client = OpenAI(
    base_url="http://0.0.0.0:4000",
    api_key="sk-1234",  # Your LiteLLM API key
)

search_results = client.vector_stores.search(
    vector_store_id="vs_abc123",
    query="What is the capital of France?",
    max_num_results=5
)
print(search_results)
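
You can also call the proxy's OpenAI-compatible REST route directly. The sketch below assumes the proxy is running at the address and key from the example above, and that the route mirrors OpenAI's POST /v1/vector_stores/{vector_store_id}/search:

Direct REST call via requests (sketch)
import requests

# Call the proxy's OpenAI-compatible vector store search route directly.
# The URL, vector store ID, and API key are placeholders from the steps above.
resp = requests.post(
    "http://0.0.0.0:4000/v1/vector_stores/vs_abc123/search",
    headers={"Authorization": "Bearer sk-1234"},
    json={"query": "What is the capital of France?", "max_num_results": 5},
)
print(resp.json())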

Setting Up Vector Stores

To use vector store search, configure your vector stores in the vector_store_registry. See the Vector Store Configuration Guide for:

  • Provider-specific configuration (Bedrock, OpenAI, Azure, Vertex AI, PG Vector)
  • Python SDK and Proxy setup examples
  • Authentication and credential management
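
For orientation, a proxy-side registration typically looks like the sketch below; treat the field names (vector_store_name, custom_llm_provider) and values as placeholders and check the configuration guide for the authoritative schema:

vector_store_registry sketch (config.yaml)
vector_store_registry:
  - vector_store_name: "my-knowledge-base"      # placeholder display name
    litellm_params:
      vector_store_id: "vs_abc123"              # ID of your existing store
      custom_llm_provider: "openai"             # e.g. openai, azure, bedrock, vertex_ai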

Using Vector Stores with Chat Completionsโ€‹

Pass vector_store_ids in chat completion requests to automatically retrieve relevant context. See Using Vector Stores with Chat Completions for implementation details.
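
A minimal sketch of this pattern with the Python SDK, assuming the store is registered in the vector_store_registry and that your LiteLLM version accepts the vector_store_ids parameter on completion calls:

Chat Completion with Vector Store Context (sketch)
import litellm

# Relevant chunks from the vector store are retrieved and added as context
# before the request is sent to the model.
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    vector_store_ids=["vs_abc123"],
)
print(response.choices[0].message.content)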