SaaS-based Engineering Tool Onboarding with AI Assistance

A Prototype Solution for Improving the Engineering Tool Onboarding Experience by Leveraging LLMs and Multi-Agents

Jun Li
Towards AI


Image generated by Author using DALL-E

Table of contents

· Overview
· Application design
· Implement the application
Define graph nodes
Define graph edges
Memory and states
Add new services
Chat UI
Model choices
· Pathway to production
· Conclusion
· Complete code and demo

Overview

Nowadays, companies and enterprises tend to use SaaS-based engineering tools to equip their engineering teams across diverse aspects of their work, including daily engineering tasks, security, and product releases. Large organizations usually have standardized processes for onboarding their teams to the selected SaaS platforms, as well as for training and educating the teams on using the services.

Platform teams in organizations usually use forms and pipelines to automate Git repo or JIRA support ticket creation so that internal teams can onboard to SaaS services. The user teams then need to participate in standard training sessions, go through a list of learning objects, and read a series of internal or external documentation to prepare themselves for using the engineering tools.

However, there are several areas for improvement:

  • The teams may experience delays in receiving support from the platform team during the onboarding process, and the processes for different engineering tools may be scattered, making them challenging to remember.
  • The current training format is quite standardized and does not offer personalized learning paths tailored to the user’s existing knowledge level or preferences.
  • Users often need to consult extensive documentation to find answers, which can be time-consuming and may lead to the platform team dedicating additional time to provide support and answer questions.

The rise of generative AI based on large language models (LLMs) makes it possible to change how users are onboarded to SaaS platforms and how they continue learning after onboarding within an organization. In this article, I will explore a prototype solution that leverages LLMs and agents to improve the onboarding experience for SaaS-based engineering tools. From the technology perspective, it also explores how we can use LangChain and LangGraph to create a multi-agent LLM application.

Application design

This prototype application provides three essential features:

  • Onboarding: Conversational interactive onboarding process.
  • Learning path advice: Recommend learning paths on a service based on the user’s knowledge level or preferences.
  • Q&A: Answer users’ questions about a service by referring to documents or searching the web.

It uses GitHub Enterprise Cloud, Snyk, and LaunchDarkly as example SaaS platforms. The application is a chatbot that puts all the services and features in one place. It uses LangChain and LangGraph to orchestrate all the agents and, by leveraging the LLM, intelligently route each request to the right agent to take action. Behind the scenes, the agent calls one or multiple functions to achieve the expected goals. The application is designed so that each agent can use a different model. For instance, the “Router Agent” can use the OpenAI GPT-3.5 model while the “Onboarding Agent” uses the OpenAI GPT-4 model instead. The diagram below shows the application’s high-level design.

Application high-level design — Diagramed by Author

The “Router Agent” is like a brain that determines where the request should go once it receives the request or query from the user. For example, if the user says, “I want to onboard to GitHub”, then the “Router Agent” knows it should go to the “Onboarding Agent” for service “GitHub” to process the onboarding.

The “Onboarding Agent” calls tools and functions to process the onboarding. It has multiple units or sub-agents, each of which can take different actions, like collecting information from the user, calling SaaS platform APIs or pre-built onboarding pipelines, etc.

The “LearningPath Agent” recommends suitable learning paths or objects from the collections of all available learning paths/objects based on the user’s request or description. For example, if the user says, “I have 5 years of experience using GitHub, what learning path do you recommend if I want to learn more?” then the agent will respond to the user with advanced-level learning objects.

The “Query Agent” answers the user’s questions about a service. It uses RAG (retrieval-augmented generation) to process the semantic search for a service based on the user’s query on the existing knowledge base. It can also provide web search by using the Tavily search tool when needed.

The “General Agent” handles user requests for cases unrelated to services or features.

Implement the application

The application is implemented in Python with the LangChain framework and its extension library LangGraph. Since the application is a prototype, we only use dummy data, partial official SaaS platform documentation, and downloaded public PDF tutorials for the demo. The primary purpose of this prototype is to show the approach using LLMs and agents rather than the onboarding itself, so the actual onboarding process, such as SaaS platform API calls and onboarding pipelines, is not involved. As shown in the diagram below, we will build a LangGraph workflow graph for agent routing and coordination.

DAG workflow with agents — Diagramed by Author

Define graph nodes

Each node is essentially a LangChain chain with bound functions, a tool executor, or a LangChain agent executor with bound tools.

Router node

The graph starts with the “router” node, which is the entry point for receiving the user’s request/query. It then determines which node to go to next. The “router” node binds a function to the LLM to identify which service and feature the user’s request targets. When the model is invoked, the response should indicate the service and feature via function calling. We can use a prompt template (Jinja format) like the one below to guide the language model.

Your job is to help identify what service and feature the user is looking at.

The supported features are:

- onboarding: Help the user onboard to the service.
- learningpath: Provide advice on the learning paths/objects for the service depends on user's knowledge level or preferences. User can request to learn or know something about the service.
- query: Answer user's questions about the service other than onboarding and learningpaths.

The services support feature "onboarding" are:
{% for service in onboarding_services %}
- {{service}}
{% endfor %}

The services support feature "learningpath" are:
{% for service in learningpath_services %}
- {{service}}
{% endfor %}

The services support feature "query" are:
{% for service in query_services %}
- {{service}}
{% endfor %}

If the user requests a service or feature that is not supported, tell them what is supported.

If the user requests something unrelated to any service or feature, respond politely and generally.
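For illustration, here is a minimal sketch of how a Jinja template like this could be rendered and prepended as the system message. The helper name get_general_messages mirrors the one used in the snippets below, but the template path and service lists here are assumptions, not the repository’s actual code.

from jinja2 import Template
from langchain_core.messages import SystemMessage

# Hypothetical helper: render the Jinja router prompt and prepend it to the conversation.
def get_general_messages(messages, template_path: str = "prompts/router_prompt.jinja"):
    with open(template_path) as f:
        template = Template(f.read())
    sys_prompt = template.render(
        onboarding_services=["github", "snyk", "launchdarkly"],
        learningpath_services=["github", "snyk", "launchdarkly"],
        query_services=["github", "snyk", "launchdarkly"],
    )
    return [SystemMessage(content=sys_prompt)] + list(messages)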

We then use a Pydantic model to define the fields so that the LLM’s function-call response follows the expected schema.

class ServiceAndFeatureInput(BaseModel):
    service: str = Field(description="The service the user is looking at. The value should be lowercase.", title="Service", default="")
    feature: str = Field(description="The supported features, should be one of: onboarding, learningpath, query", title="Feature", default="")

For example, if the user requests: “I want to onboard to Snyk”, then the “router” node will resolve the service as “snyk” and the feature as “onboarding” and then route the next action to onboarding nodes. The below code snippet defines the “router” node that binds the function to determine service and feature based on the user’s requests. We define the “function_call” argument with the function name to tell the model this function must be called when dealing with the user’s request.

router_model = self.llm.bind_functions(functions=[convert_to_openai_function(t) for t in [get_service_and_feature]], function_call="get_service_and_feature")
router_chain = get_general_messages | router_model
router = partial(call_model, model=router_chain, base_model=self.llm)
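For reference, here is a hypothetical sketch of what the get_service_and_feature function bound above might look like; the actual definition lives in the repository, but it is essentially a tool whose arguments follow the ServiceAndFeatureInput schema.

from langchain_core.tools import tool

@tool(return_direct=False, args_schema=ServiceAndFeatureInput)
def get_service_and_feature(service: str, feature: str) -> dict:
    """Identify the service and feature the user's request targets."""
    # The LLM fills in the arguments; the routing logic later reads them
    # from the function_call arguments on the AI message.
    return {"service": service, "feature": feature}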

Onboarding nodes

The onboarding process includes:

  • Collecting the required information for onboarding a service from the user.
  • Confirming with the user whether they want to continue or abort the onboarding once all the information is collected.
  • Taking action to onboard or abort.

To simulate the process, we can create a Pydantic model with dummy fields for each service (GitHub, Snyk, LaunchDarkly). Taking GitHub as an example, the Pydantic model can be defined as below. This tells the model which fields to check and fill during the conversation with the user.

class GitHubInput(BaseModel):
    service: str = Field(description="The service you are onboarding to", title="Service", default="github")
    name: str = Field(description="Your full name", title="Full Name", default=None)
    email: EmailStr = Field(description="Your email", title="Email", default=None)
    organization: str = Field(description="Your organization", title="Organization", default=None)
    role: str = Field(description="Your role", title="Role", default=None)

To implement the onboarding process, we need to define five nodes.

  • CollectInfoAgent: Determines whether it should continue collecting information or go to confirm the information once everything is collected.
class OnboardService():

    serviceArgSchema: Type[BaseModel] = None

    def __init__(self, service: str):
        self.service = service

    async def onboard(self, user_data):
        raise NotImplementedError("onboard method must be implemented in the subclass")

# this snippet is from 'onboarding_utils.py'
# the serviceArgSchema will be the pydantic model as mentioned above, like GitHubInput
def collect_info_tool(workflow, onboardService):
    collect_info_partial = functools.partial(collect_info, workflow=workflow)
    return StructuredTool.from_function(
        func=collect_info_partial,
        name="collect_info",
        description="Collects the information from the user, must collect all information required for onboarding.",
        return_direct=False,
        args_schema=onboardService.serviceArgSchema,
    )

# This snippet is from the 'MainWorkflow' class.
# onboard_svc is the encapsulated class with service github, snyk or launchdarkly
collect_info_func = collect_info_tool(workflow=self, onboardService=onboard_svc)
model = llm.bind_functions(functions=[convert_to_openai_function(t) for t in [collect_info_func]], function_call="auto")
collect_info_agent = partial(call_model, model=model, base_model=llm)
  • CollectInfoAction: Collects the required information for the onboarding service via conversations with the user. It is the tool that updates the user data and checks whether all required information has been filled.
tool_executor = ToolExecutor([collect_info_func])
collect_info_action = partial(call_tool, tool_executor=tool_executor)
  • ConfirmInfoAgent: Once all the information is collected, it returns an AI message asking the user whether they want to continue or abort onboarding. The LLM then processes the user’s response to determine “yes” or “no.” An additional argument, “need_confirm”, in the AI message tells the model to handle the user’s response specially. This model binds the function that parses the user’s response as “yes” or “no.”
def confirm_information(state, workflow):
    """Confirm the information the user has provided and see if the user wants to continue or abort the onboarding process"""
    messages = state["messages"]
    service, _ = get_service_and_feature_from_messages(messages)
    if not service or service == "":
        raise ValueError("Service is not provided.")
    user_data = get_user_data(messages)
    missing_field = has_all_info(service, workflow, user_data)
    if missing_field == "":
        msg = "Thanks for providing all the information. Do you want to continue the onboarding(yes/no) or update the information?"
    else:
        msg = f"Still need to collect information for {missing_field}."
    return {
        "messages": [
            AIMessage(
                content=msg,
                additional_kwargs={"service": service, "need_confirm": "yes"},
            )
        ]
    }

def get_confirm_messages(messages):
    sys_msg = """
    Your job is to help identify if the user wants to continue or abort the onboarding process.
    For other options, tell them to provide the correct choice (yes or no).
    """
    messages = [SystemMessage(content=sys_msg)] + messages
    return messages

def call_model(state, model, base_model):
    ...
    if confirm and isinstance(last_message, HumanMessage):
        _llm = base_model.bind_functions(functions=[convert_to_openai_function(t) for t in [confirm_onboarding]], function_call="confirm_onboarding")
        chain = get_confirm_messages | _llm
        response = chain.invoke([HumanMessage(content=last_message.content)])
    ...

class ConfirmOnboardInput(BaseModel):
    choice: str = Field(description="Continue onboard or abort, should be one of: yes, no", title="Confirm", default=None)

@tool(return_direct=False, args_schema=ConfirmOnboardInput)
def confirm_onboarding(state):
    """Confirm the onboarding process"""
    messages = state["messages"]
    data = get_user_data(messages)
    if not data or len(data) == 0:
        msg = "No user data provided."
    else:
        msg = "User data provided:\n" + " \n".join([f"{key}: {value}" for key, value in data.items()])
    return {
        "messages": [
            AIMessage(
                content=msg,
                additional_kwargs={},
            )
        ]
    }
  • OnboardAgent: Uses the collected data as input to onboard to the SaaS platform. The onboard method of the “onboardService” is called asynchronously. We add additional arguments to the AI message to mark the onboarding as completed. In this prototype application, we simply print the collected data, since we don’t focus on the onboarding operation itself.
def onboard_service(state, onboardService):
    """Onboards the user to the service"""
    messages = state["messages"]
    service, _ = get_service_and_feature_from_messages(messages)
    if not service or service == "":
        raise ValueError("Service is not provided.")
    user_data = get_user_data(messages)
    if not user_data or len(user_data) == 0:
        msg = "No user data provided."
    else:
        msg = f"You are onboarding to {service} with the following information:\n \n " + " \n".join([f"{key}: {value}" for key, value in user_data.items()]) + " \n\n Once it is completed, you will be notified."
        # The line below takes the onboard action asynchronously
        asyncio.run(onboardService.onboard(user_data))
    state["messages"].clear()
    return {
        "messages": [
            AIMessage(
                content=msg,
                additional_kwargs={"onboard_status": "completed", "service": service}
            )
        ]
    }

class GitHubOnboardService(OnboardService):
    serviceArgSchema: Type[BaseModel] = GitHubInput

    # Just simply print the collected data
    async def onboard(self, user_data):
        print(user_data)

# OnboardService is the base class that encapsulates the service and its pydantic model
onboard_svc_func = partial(onboard_service, onboardService=onboard_svc)
  • OnboardAbort: Aborts the onboarding when “no” is detected in the user’s response. It returns an AI message to inform the user that the onboarding has been aborted. Likewise, it appends additional arguments to mark the onboarding as completed.
def onboard_abort(state):
    """Aborts the onboarding process"""
    messages = state["messages"]
    service, _ = get_service_and_feature_from_messages(messages)
    if not service or service == "":
        raise ValueError("Service is not provided.")
    state["messages"].clear()
    return {
        "messages": [
            AIMessage(
                content=f"Onboarding process for {service} has been aborted. Please let me know if you need any other help.",
                additional_kwargs={"onboard_status": "completed", "service": service}
            )
        ]
    }

Learning path node

This node is the agent that binds two tools to give the user recommended learning paths and objects. The agent automatically picks the appropriate tool to call. In the prototype, it uses dummy data stored in YAML.

  • Lookup learning objects: Looks up all the available learning objects for the service.
  • Retrieve learning paths: The agent determines the user’s knowledge level or preferences by analyzing the user’s request or description and recommends suitable learning paths and objects. The user’s knowledge level is categorized as “beginner”, “intermediate”, or “advanced”.
# in learningpath_input.py
class GitHubLearningPathInput(BaseModel):
    level: str = Field(description="GitHub knowledge level", title="GitHub Knowledge Level", default=None)
    preferences: List[str] = Field(description="Get the list of topics from available GitHub learning objects based preferences.", title="GitHub Learning Preferences", default=[])

# in base_learningpath_tool.py
class BaseLearningPathTool(BaseTool):
    ...
    def retrieve_learningpath(self, level: str, preferences: List[str]) -> Any:
        if len(preferences) == 0 and (not level or level not in ["beginner", "intermediate", "advanced"]):
            return "Sorry, I can't find an appropriate learning path for you. Or you can just simply choose from beginner, intermediate, or advanced."
        learningObjects = Configs.get_service_config(self.service, "learningObjects", "learningpath")
        if len(preferences) > 0:
            df = generate_learning_objects_dataframe()
            topics_with_preference = search_learning_objects(self.service, preferences, df)
            learningObjects = list(filter(lambda x: x["topic"] in topics_with_preference, learningObjects))
            response_prefix = f"Based on your preferences, here are the recommended learning objects on service {self.service}: \n"
        else:
            learningObjects = list(filter(lambda x: x["level"] == level, learningObjects))
            response_prefix = f"Here are the recommended learning objects to enhance your skill level on service {self.service} based on your current knowledge level as \"{level}\": \n"
        return unfold_learning_objects(learningObjects, response_prefix)
    ...

# in agent_utils.py
def create_func_agent(name: str, sys_prompt_path: str, sys_prompt_data: dict, tools: List, llm: ChatOpenAI):
    sys_template = PromptTemplate.from_file(sys_prompt_path)
    sys_prompt = sys_template.format(**sys_prompt_data)
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", sys_prompt),
            MessagesPlaceholder(variable_name="messages"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ]
    )
    func_agent = create_openai_functions_agent(llm, tools, prompt)
    executor = AgentExecutor(agent=func_agent, tools=tools, verbose=True)
    return name, executor

# in main_workflow.py. Pass the lookup learning objects tool and the retrieve tool to the agent
name, learningpath_agent = create_func_agent(name=f"{service}_learningpath", sys_prompt_path=prompt_path, sys_prompt_data=prompt_data, tools=tools, llm=llm)
learningpath_node = partial(agent_node, agent=learningpath_agent, name=name)

To capture the user’s preferences from the conversation more accurately, the application simulates a similarity search. All the available learning objects’ topics and descriptions are embedded with the OpenAI embedding model (“text-embedding-ada-002”), and the resulting vectors are persisted in a CSV file that can be loaded as a Pandas data frame. When carrying out the similarity search, it embeds the query, calculates the cosine similarity against each learning object topic and description, and then ranks the cosine similarity scores from high to low to pick the top n records.

def cosine_similarity(a: List[float], b: List[float]) -> float:
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def search_learning_objects(service: str, queries: List[str], df: pd.DataFrame, top_n: int = 3) -> List[str]:
    df_temp = df.copy()
    df_temp = df_temp[df_temp["service"] == service]
    results = {}
    for query in queries:
        query_vector = embed_text(query)
        df1 = df_temp.copy()
        df1["similarity"] = df1["vector"].apply(lambda x: cosine_similarity(x, query_vector))
        query_results = df1.sort_values(by="similarity", ascending=False).head(top_n)["topic"].values.tolist()
        for result in query_results:
            if result not in results:
                results[result] = 1
    return list(results.keys())
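The embed_text and generate_learning_objects_dataframe helpers are not shown above. Here is a minimal sketch of what they might look like, assuming the vectors were pre-computed with LangChain’s OpenAIEmbeddings and stored in a hypothetical CSV file; the file name and column layout are assumptions.

import ast
from typing import List

import pandas as pd
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

def embed_text(text: str) -> List[float]:
    # Embed a single query string with the same model used for the learning objects.
    return embeddings.embed_query(text)

def generate_learning_objects_dataframe(csv_path: str = "learning_objects_embeddings.csv") -> pd.DataFrame:
    # Load the pre-computed topic/description vectors; the "vector" column is
    # stored as a stringified list in the CSV, so convert it back to a Python list.
    df = pd.read_csv(csv_path)
    df["vector"] = df["vector"].apply(ast.literal_eval)
    return df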

Query node

This node is the agent that binds two tools to answer the user’s questions on a service. One tool is “TavilySearchResults”, which can be used for web searches via the Tavily API. The other is the retriever tool, which retrieves the top k relevant documents from the Chroma vector store as context for the LLM to answer the user’s query.

# Define the query node with bound tools against a service
name, query_agent = create_func_agent(name=f"{service}_query", sys_prompt_path=prompt_path, sys_prompt_data=prompt_data, tools=tools, llm=llm)
query_node = partial(agent_node, agent=query_agent, name=name)
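The tools passed to create_func_agent above are not shown. A rough sketch of how the two tools could be assembled, assuming LangChain’s Tavily search tool and retriever-tool helper; the tool names and descriptions here are made up, and github_retriever refers to the ServiceRetriever instance created in the next snippet.

from langchain_community.tools.tavily_search import TavilySearchResults
from langchain.tools.retriever import create_retriever_tool

# Web search tool backed by the Tavily API (requires TAVILY_API_KEY in the environment).
web_search_tool = TavilySearchResults(max_results=3)

# Retriever tool over the service's Chroma index.
retriever_tool = create_retriever_tool(
    github_retriever.get_retriever(search_type="mmr", top_k=2),
    name="github_docs_search",
    description="Search the GitHub documentation and tutorials for relevant context.",
)

tools = [web_search_tool, retriever_tool]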

The application implements a basic RAG system that embeds and vectorizes document sources from some example PDFs and official SaaS documentation sites. The Chroma vector store is persisted and indexed locally. Before running the application, the document loading and Chroma persistence step should be run, and it only needs to run once; the data can then be retrieved quickly by the agent. Taking “GitHub” as an example, below is the code snippet that creates the retriever, loads the documents, and persists the data.

class ServiceRetriever():
    ...
    def load_documents(self):
        ...
        for item in service_retrievers:
            type, paths = item["type"], item["paths"]
            if type == "pdf":
                ...
            elif type == "web":
                web_loader = WebBaseLoader(paths)
                docs = web_loader.aload()
                self.documents.extend(docs)
        ...

    def persist(self, search_type: str="mmr", top_k: int=2):
        if not self.documents or len(self.documents) == 0:
            raise ValueError("No documents to persist")
        if self.retriever is None:
            self.retriever = self.get_retriever(search_type=search_type, top_k=top_k)
        self.vectorstore.persist()
        self.retriever.add_documents(self.documents)

github_retriever = ServiceRetriever(service="github")
github_retriever.load_documents()
github_retriever.persist()
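The get_retriever method referenced above is elided. Here is a minimal sketch of how it might construct the Chroma vector store and retriever; the collection name, persist directory, and embedding model are assumptions, not the repository’s actual values.

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

class ServiceRetriever():
    ...
    def get_retriever(self, search_type: str = "mmr", top_k: int = 2):
        # Open (or create) the persisted Chroma collection for this service and
        # expose it as a retriever with the requested search behaviour.
        self.vectorstore = Chroma(
            collection_name=f"{self.service}_docs",
            embedding_function=OpenAIEmbeddings(model="text-embedding-ada-002"),
            persist_directory=f"./chroma/{self.service}",
        )
        return self.vectorstore.as_retriever(
            search_type=search_type, search_kwargs={"k": top_k}
        )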

Define graph edges

The graph can have normal and conditional edges connecting the nodes. A normal edge connects two nodes in a direction the path always follows. In the application graph, we create a normal edge between the “CollectInfoAction” node and the “CollectInfoAgent” node; after the “CollectInfoAction” node is executed, it must go back to the “CollectInfoAgent” node to determine the next action. We also define normal edges between the last action nodes, such as “OnboardAgent”, and the “END” node, a special node in LangGraph that means the execution finishes.

A conditional edge is more complex. It has a start node with multiple branches, each branch reaching a different node, and the branch taken is selected by condition rules. For example, “ConfirmInfoAgent” can have conditional edges to “OnboardAgent”, “OnboardAbort”, and itself. The code snippet below is the condition function that defines all the routes amongst the conditional edges.

def route(state, workflow):
    messages = state["messages"]
    last_message = messages[-1]
    # If there is no function call, then finish
    if "function_call" not in last_message.additional_kwargs:
        return "End"
    else:
        if last_message.additional_kwargs["function_call"]["name"] == "get_service_and_feature":
            choices = json.loads(last_message.additional_kwargs["function_call"]["arguments"])
            if "service" in choices.keys() and "feature" in choices.keys():
                service, feature = choices["service"], choices["feature"]
                if service == "" or feature == "":
                    return "general"
                else:
                    if feature == "onboarding":
                        return f"{service}_CollectInfoAgent"
                    else:
                        return f"{service}_{feature}"
            else:
                return "general"
        else:
            service, _ = get_service_and_feature_from_messages(messages)
            if not service or service == "":
                raise ValueError("Service is not provided.")
            if last_message.additional_kwargs["function_call"]["name"] in ["collect_info", "initiate_onboarding"]:
                user_data = get_user_data(messages)
                if has_all_info(service, workflow, user_data) == "":
                    return f"{service}_ConfirmInfoAgent"
                else:
                    return f"{service}_CollectInfoAction"
            elif last_message.additional_kwargs["function_call"]["name"] == "confirm_onboarding":
                data = json.loads(last_message.additional_kwargs["function_call"]["arguments"])
                if "choice" in data.keys():
                    if data["choice"] == "yes":
                        return f"{service}_OnboardAgent"
                    elif data["choice"] == "no":
                        return f"{service}_OnboardAbort"
                    else:
                        return f"{service}_ConfirmInfoAgent"
                else:
                    return f"{service}_ConfirmInfoAgent"
            else:
                return "End"

Once we build the graph, we have set up the application backbone. Below is the code snippet showing the method that builds the graph.

# State class
class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    service: Annotated[str, Field(description="The service the user is on", title="Service", default="")]
    feature: Annotated[str, Field(description="The feature the user is on", title="Feature", default="")]
    thread_id: Annotated[str, Field(description="The thread id for the conversation", title="Thread ID", default="")]

class MainWorkflow():

    def __init__(self, llm: ChatOpenAI, memory: BaseCheckpointSaver):
        ...
        self.workflow = StateGraph(AgentState)
        ...

    def build_graph(self):
        router_model = self.llm.bind_functions(functions=[convert_to_openai_function(t) for t in [get_service_and_feature]], function_call="get_service_and_feature")
        router_chain = get_general_messages | router_model
        router = partial(call_model, model=router_chain, base_model=self.llm)

        general_chain = get_general_messages | self.llm
        general_node = partial(call_model, model=general_chain, base_model=self.llm)

        self.workflow.add_node("router", router)
        nodes = {}

        if len(self.onboarding_nodes) == 0 or len(self.learningpath_nodes) == 0 or len(self.query_nodes) == 0:
            raise ValueError("No nodes have been registered")
        for name, node in self.onboarding_nodes.items():
            self.workflow.add_node(name, node)
            nodes[name] = name
        for name, node in self.learningpath_nodes.items():
            self.workflow.add_node(name, node)
            nodes[name] = name
        for name, node in self.query_nodes.items():
            self.workflow.add_node(name, node)
            nodes[name] = name
        self.workflow.add_node("general", general_node)
        nodes["general"] = "general"
        nodes["End"] = END
        route_func = partial(route, workflow=self)
        self.workflow.add_conditional_edges("router", route_func, nodes)
        for service in self.onboarding_services:
            self.workflow.add_conditional_edges(f"{service}_CollectInfoAgent", route_func, {
                f"{service}_CollectInfoAction": f"{service}_CollectInfoAction",
                f"{service}_ConfirmInfoAgent": f"{service}_ConfirmInfoAgent",
                f"{service}_OnboardAgent": f"{service}_OnboardAgent",
                f"{service}_OnboardAbort": f"{service}_OnboardAbort",
                "End": END
            })
            self.workflow.add_conditional_edges(f"{service}_ConfirmInfoAgent", route_func, {
                f"{service}_ConfirmInfoAgent": f"{service}_ConfirmInfoAgent",
                f"{service}_OnboardAgent": f"{service}_OnboardAgent",
                f"{service}_OnboardAbort": f"{service}_OnboardAbort",
                "End": END
            })
            self.workflow.add_edge(f"{service}_CollectInfoAction", f"{service}_CollectInfoAgent")
            self.workflow.add_edge(f"{service}_OnboardAgent", END)
            self.workflow.add_edge(f"{service}_OnboardAbort", END)

        for service in self.learningpath_services:
            self.workflow.add_edge(f"{service}_learningpath", END)

        for service in self.query_services:
            self.workflow.add_edge(f"{service}_query", END)

        self.workflow.add_edge("general", END)
        self.workflow.set_entry_point("router")
        self.graph = self.workflow.compile(checkpointer=self.memory)
        return self.graph

Memory and states

LangGraph supports checkpoints for message memories. In this prototype application, in-memory SQLite is selected as the memory provider. It uses the “thread_id” attribute in the RunnableConfig to mark the current thread. When the thread id changes, all the states across the graph are refreshed, including messages. Because of this, we can dynamically change the thread id when a context change is detected, such as a service or feature switch during the conversation, or when the onboarding process completes.
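A small sketch of how the in-memory SQLite checkpointer and the thread_id-scoped config might be wired up; the module path follows the LangGraph version used at the time of writing and may differ in newer releases.

import uuid
from langgraph.checkpoint.sqlite import SqliteSaver

# In-memory SQLite checkpointer used as the graph's memory provider.
memory = SqliteSaver.from_conn_string(":memory:")

# Each conversation thread is identified by a thread_id in the RunnableConfig;
# switching to a new thread_id effectively resets the graph state.
config = {"configurable": {"thread_id": str(uuid.uuid4())}}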

When the model is invoked, a tool is executed, or the agent executor is invoked, we can add a state check and return the new state data when the context has changed. Taking the model-calling scenario as an example, we can detect a context change as below:

def call_model(state, model, base_model):
    ...
    cur_service, cur_feature = state["service"], state["feature"]
    if service and service != "" and feature and feature != "" and cur_service and cur_service != "" and cur_feature and cur_feature != "":
        if cur_service != service or cur_feature != feature:
            thread_id = str(uuid.uuid4())
            return {"messages": [response], "service": service, "feature": feature, "thread_id": thread_id}

# Then update the state in the graph
def update_state(self, config: RunnableConfig, prev_msgs: List[BaseMessage]):
    cur_state = self.graph.get_state(config=config).values
    prev_msgs.clear()
    if cur_state:
        messages = cur_state["messages"]
        cur_thread_id = cur_state["thread_id"]
        if cur_thread_id and cur_thread_id != "":
            if config["configurable"]["thread_id"] != cur_thread_id:
                if len(messages) > 1:
                    # add the last human message and the last AI message as context for new state
                    for message in messages[::-1]:
                        if isinstance(message, HumanMessage):
                            prev_msgs.append(message)
                            break
                    prev_msgs.append(messages[-1])
                config["configurable"]["thread_id"] = cur_thread_id

When the onboarding process for a service is complete (processed or aborted), we can add the onboard status to the AIMessage additional kwargs so that it can be detected from the state. Here is the code snippet to update the graph state when onboarding is completed.

def update_state_after_onboarding(self, config: RunnableConfig):
    cur_state = self.graph.get_state(config=config).values
    if cur_state:
        messages = cur_state["messages"]
        if isinstance(messages[-1], AIMessage) and "onboard_status" in messages[-1].additional_kwargs and messages[-1].additional_kwargs["onboard_status"] == "completed":
            print("Onboarding completed, updating thread id")
            thread_id = str(uuid.uuid4())
            config["configurable"]["thread_id"] = thread_id

In the graph, before and after a user message is processed, we just need to update the state whenever a new thread id is applied.

Add new services

The application is designed to easily scale to new services. Taking onboarding services as an example, you can create a subclass of “OnboardService”, define the service argument schema that contains the required fields for onboarding, and then implement the “onboard” method.

class SomeServiceInput(BaseModel):
    service: str = Field(description="The service you are onboarding to", title="Service", default="some_service")
    field1: str = Field(description="This is field1", title="Field1", default="")
    ...

class SomeServiceOnboard(OnboardService):
    serviceArgSchema: Type[BaseModel] = SomeServiceInput

    async def onboard(self, user_data):
        # process onboard
        pass

Then, you register this service with the graph, which automatically expands the new nodes and edges accordingly.

some_service = SomeServiceOnboard(service="some_service")
main_workflow.register_onboarding_service(service="some_service", onboard_svc=some_service, llm=llm)

You can register a service for any number of features. Let’s say the new service is available for the “onboarding” feature but not for the “learning path” and “query” features; then you just need to register it for onboarding.

Chat UI

The application uses Streamlit to quickly set up the chat window. It supports message history throughout the conversation with different roles (“user” and “assistant”). It also splits the response messages into chunks to simulate streaming.

import streamlit as st
...

st.title("SaaS-based Engineering Tool Onboarding Assistant")

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if user_input := st.chat_input("Tell me what do you want to do?"):
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    with st.chat_message("assistant"):
        message_placeholder = st.empty()

        st.session_state.workflow.update_state_after_onboarding(st.session_state.config)
        response = st.session_state.graph.invoke({"messages": st.session_state.prev_msgs + [HumanMessage(content=user_input)], "thread_id": st.session_state.config["configurable"]["thread_id"]}, config=st.session_state.config)
        st.session_state.workflow.update_state(st.session_state.config, st.session_state.prev_msgs)
        final_resp = ""
        # simulate streaming
        for chunk in re.split(r'(\s+)', response["messages"][-1].content):
            final_resp += chunk + " "
            time.sleep(0.01)
            message_placeholder.markdown(final_resp)
        st.session_state.messages.append({"role": "assistant", "content": response["messages"][-1].content})
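The session-state setup elided above could look roughly like the sketch below; the keys mirror the ones referenced in the snippet, while main_workflow refers to the MainWorkflow instance created as described in the next section. This is an assumption about the wiring, not the repository’s exact code.

import uuid
import streamlit as st

# One-time initialization of the Streamlit session state used by the chat loop above.
if "workflow" not in st.session_state:
    st.session_state.messages = []    # chat history rendered in the UI
    st.session_state.prev_msgs = []   # carried-over context after a thread switch
    st.session_state.workflow = main_workflow              # MainWorkflow built earlier
    st.session_state.graph = main_workflow.build_graph()   # compiled LangGraph graph
    st.session_state.config = {"configurable": {"thread_id": str(uuid.uuid4())}}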

Here is a screenshot of the chat window when the application is started.

The screenshot with the Chat UI when the application is started — By Author

Model choices

The application has been tested with the “gpt-3.5-turbo” and “gpt-4-turbo-preview” models, which support function calling and tool execution well. It roughly groups the models into a “router model”, used by the “router” node, and a “service model”, shared by the other nodes. You can also update the code to assign a specific model to a node. The GPT-3.5 model produces decent results, responds faster than GPT-4, and is much cheaper, making it more cost-effective, while the GPT-4 model performs better on question answering and routing decisions.

# the model name is defined in general.yaml
router_model = ChatOpenAI(model_name=Configs.general_configs()["router_model"], temperature=0)
service_model = ChatOpenAI(model_name=Configs.general_configs()["service_model"], temperature=0)
...
main_workflow = MainWorkflow(llm=router_model, memory=memory)
main_workflow.register_onboarding_service(service="github", onboard_svc=github_onboarding_service, llm=service_model)
...

## general.yaml
# For GPT-4 model, use "gpt-4-turbo-preview"
router_model: "gpt-3.5-turbo"
service_model: "gpt-3.5-turbo"

Pathway to production

When we talk about taking this application from prototype to production, there is a series of items we need to cover. Here, I list some items that I think we should consider from a software engineering perspective.

  • Some of the design and code need to be refactored to make it more modular and easier to scale. It can be a standalone product or a component to integrate into a bigger engineering tooling platform or ecosystem.
  • The onboarding process can be enhanced to allow users to update the information before processing and attaching support documents such as approval evidence.
  • Deal with exceptions and edge cases for the language model response to provide a more friendly user experience.
  • Build a system that can manage and provide a full landscape of learning paths and objects for a service. For example, we can have more granular levels to meet different levels of audience and associate each learning object with links and materials that the users can directly access.
  • Build infrastructure for the RAG system to host and manage the vector DB and indexes on the knowledge base rather than persisting them locally. Implement a more advanced RAG system rather than just the naive one; LlamaIndex could be a better choice for RAG and indexing. RAG is key to the query feature: it should have a fast and robust query response, an enhanced retriever, and a query engine that can also handle a long context window.
  • Enhanced memory and state management. Use a more robust memory provider, such as a Redis cluster, rather than in-memory SQLite. The context switch for state changes in the current prototype is not perfect; changing the thread id on context changes sometimes confuses the LLM and leads to inaccurate responses. A better approach would be to look up the history for each context in Redis, with corresponding memory operations in the application, which can increase the accuracy of the LLM responses.
  • Improve prompts and output formats. Improve the prompts to guide the language models and generate more optimized responses. Also, enhance the output formats to have a better conversation experience.
  • Organizations or enterprises usually use Microsoft Teams or Slack as their instant message applications. It would be ideal to integrate the application into them, such as a Microsoft Teams chatbot.
  • Set up monitoring on the application, including LLM performance and API calling status. This can help us understand the application's health status and API usage. We can also add a feedback mechanism on the LLM response to help us adjust the application behavior to optimize the response.
  • Additional model support. If we need to support models other than OpenAI’s, we need to select models that support function calling or tool execution, such as Mistral AI models or Anthropic Claude models, and the code needs to be updated to adapt to those models.

Conclusion

This article demonstrates a prototype solution that leverages OpenAI LLMs and multiple agents to improve the SaaS-based engineering tool onboarding experience, including conversational, interactive onboarding, personalized learning path/object recommendations, and service Q&A with RAG and web search. It explores the application design, the implementation, and the pathway to production from a software engineering perspective. It covers the design and use of the LangChain framework and its extension library LangGraph to implement multi-agent routing and coordination with function calling and tool execution.

Complete code and demo

The complete code and demo can be found in this GitHub repository. The repository instructions show the steps to prepare the environment and data and run the application in more detail. The source code references code from several other repositories and articles.
