Semantic Caching in Generative AI Chatbots

Reduce LLM latency and cost by over 95% using OpenAI, LiteLLM, Qdrant, and Sentence Transformers!

Marie Stephen Leo · Published in Towards AI · 7 min read · Mar 11, 2024


Image generated by Author using DALL·E 3

Latency and cost are significant challenges with LLM-based chatbots today. The problem is even more pronounced in Retrieval Augmented Generation (RAG) agents, where we must make multiple calls to the LLM before returning an answer to the user.
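Semantic caching attacks both problems at once: if a new question is semantically close enough to one we've already answered, we return the stored answer instead of calling the LLM again. As a rough orientation before we dive into the details, here is a minimal sketch of the idea using Sentence Transformers for embeddings and Qdrant as the vector store. The collection name, the 0.9 similarity threshold, and the model choice are illustrative assumptions, not the article's exact settings:

```python
import uuid

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces 384-dimensional embeddings
encoder = SentenceTransformer("all-MiniLM-L6-v2")
qdrant = QdrantClient(":memory:")  # in-memory Qdrant, fine for a sketch
qdrant.create_collection(
    collection_name="semantic_cache",  # illustrative name
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
llm = OpenAI()


def ask(question: str, threshold: float = 0.9) -> str:
    vector = encoder.encode(question).tolist()

    # Cache lookup: find the most similar previously asked question
    hits = qdrant.search(
        collection_name="semantic_cache", query_vector=vector, limit=1
    )
    if hits and hits[0].score >= threshold:
        return hits[0].payload["answer"]  # cache hit: no LLM call at all

    # Cache miss: call the LLM, then store the question/answer pair
    answer = llm.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model, swap for your own
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    qdrant.upsert(
        collection_name="semantic_cache",
        points=[
            PointStruct(
                id=str(uuid.uuid4()), vector=vector, payload={"answer": answer}
            )
        ],
    )
    return answer
```

A cache hit costs one embedding computation and one vector search, typically a few milliseconds, versus a full LLM round trip on a miss; that gap is where the latency and cost savings come from.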
