/vector_stores/search - Search Vector Store

Search a vector store for relevant chunks based on a query and file attributes filter. This is useful for retrieval-augmented generation (RAG) use cases.

Overview

| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Tracked per search operation |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Supported LLM Providers | OpenAI, Azure OpenAI, Bedrock, Vertex RAG Engine | Full vector stores API support across providers |

Usage

LiteLLM Python SDK

Non-streaming example

Search Vector Store - Basic
import asyncio
import litellm

async def main():
    # Search the vector store for chunks relevant to the query
    response = await litellm.vector_stores.asearch(
        vector_store_id="vs_abc123",
        query="What is the capital of France?"
    )
    print(response)

asyncio.run(main())

Synchronous example

Search Vector Store - Sync
import litellm

# Search the vector store for chunks relevant to the query
response = litellm.vector_stores.search(
    vector_store_id="vs_abc123",
    query="What is the capital of France?"
)
print(response)
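
Both calls return OpenAI-style search results: a data list of scored results, each carrying text chunks under content. The snippet below is a minimal sketch of pulling the matched text back out, assuming that shape; if your LiteLLM version returns a response object rather than a dict, use attribute access instead of key access.

Extract Search Results (sketch)
# Sketch: extract scored text chunks, assuming the OpenAI-style shape
# {"data": [{"score": ..., "content": [{"text": ...}]}]}
for result in response["data"]:
    for part in result["content"]:
        print(f"score={result['score']:.3f}  text={part['text'][:80]}")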

LiteLLM Proxy Server

  1. Setup config.yaml

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  # Vector store settings can be added here if needed
  2. Start proxy

litellm --config /path/to/config.yaml

  3. Test it with OpenAI SDK!
OpenAI SDK via LiteLLM Proxy
from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy
client = OpenAI(
    base_url="http://0.0.0.0:4000",
    api_key="sk-1234",  # Your LiteLLM API key
)

search_results = client.vector_stores.search(
    vector_store_id="vs_abc123",
    query="What is the capital of France?",
    max_num_results=5
)
print(search_results)
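
You can also call the proxy's OpenAI-compatible REST route directly. The sketch below assumes the proxy is running at the address and key from the example above, and that the route mirrors OpenAI's POST /v1/vector_stores/{vector_store_id}/search:

Direct REST call via requests (sketch)
import requests

# Call the proxy's OpenAI-compatible vector store search route directly.
# The URL, vector store ID, and API key are placeholders from the steps above.
resp = requests.post(
    "http://0.0.0.0:4000/v1/vector_stores/vs_abc123/search",
    headers={"Authorization": "Bearer sk-1234"},
    json={"query": "What is the capital of France?", "max_num_results": 5},
)
print(resp.json())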

Setting Up Vector Stores

To use vector store search, configure your vector stores in the vector_store_registry. See the Vector Store Configuration Guide for:

  • Provider-specific configuration (Bedrock, OpenAI, Azure, Vertex AI, PG Vector)
  • Python SDK and Proxy setup examples
  • Authentication and credential management
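
For orientation, a proxy-side registration typically looks like the sketch below; treat the field names (vector_store_name, custom_llm_provider) and values as placeholders and check the configuration guide for the authoritative schema:

vector_store_registry sketch (config.yaml)
vector_store_registry:
  - vector_store_name: "my-knowledge-base"      # placeholder display name
    litellm_params:
      vector_store_id: "vs_abc123"              # ID of your existing store
      custom_llm_provider: "openai"             # e.g. openai, azure, bedrock, vertex_ai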

Using Vector Stores with Chat Completionsโ€‹

Pass vector_store_ids in chat completion requests to automatically retrieve relevant context. See Using Vector Stores with Chat Completions for implementation details.
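
A minimal sketch of this pattern with the Python SDK, assuming the store is registered in the vector_store_registry and that your LiteLLM version accepts the vector_store_ids parameter on completion calls:

Chat Completion with Vector Store Context (sketch)
import litellm

# Relevant chunks from the vector store are retrieved and added as context
# before the request is sent to the model.
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    vector_store_ids=["vs_abc123"],
)
print(response.choices[0].message.content)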