Build a Custom AI-Based Chatbot Using LangChain, Weaviate, and Streamlit

Skanda Vivek
Published in Towards AI
9 min read · Aug 9, 2023

As organizations race to build customized LLMs, a question I am often asked is: what tools are out there to streamline this process?

In this article, I show you how to build a fully functional application for chatting with a bot built on top of your own documents. The application uses ChatGPT/GPT-4 (or any other large language model) to extract information from document data stored as embeddings in a vector database, and LangChain for prompt chaining. Here’s a preview:

Docs QA Bot | Skanda Vivek
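
To make the retrieval idea concrete before we build the real thing, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not the app's actual code: the three-dimensional vectors are made up and stand in for real embeddings, the in-memory list `store` stands in for the vector database, and `retrieve` and `build_prompt` are hypothetical helper names. It ranks stored chunks by cosine similarity to the question's embedding and places the best match into a prompt for the LLM.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "vector store": (chunk text, embedding) pairs. The vectors are invented
# for illustration; real embeddings have hundreds or thousands of dimensions.
store = [
    ("Refunds are issued within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.", [0.1, 0.9, 0.1]),
    ("Support is available by email.", [0.0, 0.2, 0.9]),
]


def retrieve(query_embedding, store, k=1):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


question = "How long do refunds take?"
query_embedding = [0.8, 0.2, 0.1]  # pretend embedding of the question
top_chunks = retrieve(query_embedding, store, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In the real application, an embedding model produces the vectors, the vector database performs the similarity search, and the prompt is sent to the LLM.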

So let’s dive in!

Building the app🏗️

First, create a new folder named `app` to hold the application's source code; it also serves as the entry point for the Streamlit application. Then create folders for the individual tasks: extracting text from a PDF, creating text embeddings, storing the embeddings, and finally chatting. The `app` directory looks like this:

App Directory Structure | Skanda Vivek

PDF Upload

Upload a PDF and extract text for further processing.

from PyPDF2 import PdfReader
import streamlit as st


@st.cache_data()
def extract_text(_file):
    """
    Extract the text of every page in a PDF.

    :param _file: the PDF file to extract (the leading underscore tells
        Streamlit's cache not to hash this argument)
    :return: the concatenated text of all pages
    """
    content = ""
    reader = PdfReader(_file)

    # Scrape text from every page and append it to the running string
    for page in reader.pages:
        content += page.extract_text()

    return content

Code Link:

https://github.com/LLM-Projects/docs-qa-bot/blob/main/app/extract.py
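
The extraction loop above concatenates pages directly, so the last word of one page can run into the first word of the next. As a hedged variant, the sketch below joins pages with a separator and skips pages that yield no text. `join_page_text` is a hypothetical helper, not part of the linked repository, and `FakePage` exists only so the example runs without a PDF. The helper accepts any objects with an `extract_text()` method, so with PyPDF2 you would pass `reader.pages`.

```python
def join_page_text(pages, separator="\n"):
    """Join the text of each page, skipping pages with no extractable text."""
    parts = []
    for page in pages:
        text = page.extract_text()
        if text:  # skip None or empty strings, e.g. image-only pages
            parts.append(text)
    return separator.join(parts)


class FakePage:
    """Stand-in for a PDF page object, used only to exercise the helper."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


pages = [FakePage("First page."), FakePage(""), FakePage("Third page.")]
combined = join_page_text(pages)
print(combined)
```

Keeping a separator between pages also makes it easier to split the text into chunks later without cutting through fused words.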
