
How to use Streaming in LangChain and Streamlit

·6 mins
Alejandro AO
I’m a software engineer building AI applications. I publish weekly video tutorials where I show you how to build real-world projects. Feel free to visit my YouTube channel or Discord and join the community.

Introduction

Streaming is a powerful feature of many LLM providers and a crucial element in any LLM-powered application. It allows you to receive the generated text in real-time, as it is being generated. This way, you don’t have to wait for the entire text to be ready before you can start showing it to the user. This is especially useful for chatbots, where you want to show the user the response as soon as possible (like ChatGPT). This feature can improve the UX of your application and make it feel more responsive.

In this tutorial, we will be using LangChain to interact with the LLM provider and Streamlit to create the front-end of the app. Let’s get to it.

The chatbot that we will be building will have the following features:

  • It will stream the response from the LLM as it is being generated.
  • It will use LangChain to interact with the LLM.
  • It will use Streamlit to create the front-end of the app.
  • It will remember the chat history and show it to the user.

What is LCEL?

LangChain Expression Language (LCEL) is the declarative syntax used to compose chains in LangChain. It is a simple and powerful language that lets you define the structure of your chain in a pipeline-like manner, connecting components with the | operator. Here is a simple example of an LCEL chain:

chain = prompt | model | output_parser

If you are not familiar with the concept of a chain, you can think of it as a pipeline that takes an input, processes it through a series of steps, and then produces an output. You can learn more about it in the LangChain documentation.
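To make that snippet concrete, here is a minimal runnable version of such a chain (a sketch, assuming an OpenAI API key is available in your environment; the prompt text and the {topic} variable are placeholders for illustration):

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# the three components of the chain: prompt template -> chat model -> string parser
prompt = ChatPromptTemplate.from_template("Tell me a short fact about {topic}")
model = ChatOpenAI()
output_parser = StrOutputParser()

chain = prompt | model | output_parser

# invoke() runs the whole pipeline and returns the parsed string
print(chain.invoke({"topic": "containers"}))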

A very convenient part of LCEL is that you can stream the output of a chain as it is generated.

How to stream a response from your LLM

Usually, when you create a chain in LangChain, you would have to use the method chain.invoke() to generate the output. This method will return the output of the chain as a whole. However, if you want to stream the output, you can use the method chain.stream() instead. This method will return a generator that will yield the output as it is generated.
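For instance, reusing the chain sketched in the previous section, the streaming version looks like this (each chunk arrives as a plain string, since StrOutputParser is the last step of the chain):

# stream() returns a generator; each iteration yields the next piece of text
for chunk in chain.stream({"topic": "containers"}):
    print(chunk, end="", flush=True)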

Let’s take a look at how it works. Here is a simple example that streams the output of the model directly, this time using the asynchronous astream() method (the loop is wrapped in an async function so the snippet runs as a regular script):

import asyncio

from langchain_openai import ChatOpenAI

llm = ChatOpenAI()
chunks = []

async def main():
    # astream() yields the response incrementally, one AIMessageChunk at a time
    async for chunk in llm.astream("hello. tell me what is the difference between kubernetes and docker"):
        chunks.append(chunk)
        print(chunk.content, end="|", flush=True)

asyncio.run(main())

# |K|ubernetes| and| Docker| are| both| popular| tools| used| in| the| world| of| container|ization|,| but| they| serve| different| purposes| and| have| different| functionalities|.
...

This code prints every token as it is generated. In this example, we added a vertical bar (|) between tokens to make it easier to see them arrive one by one.

But note that each generated token is returned as an instance of AIMessageChunk. Just like AIMessage and HumanMessage, AIMessageChunk is a class that represents a piece of generated text. These chunks are additive, which means that they can be concatenated to form the full response.
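Because the chunks are additive, you can rebuild the full message with the + operator. A small sketch, assuming the chunks list was filled by the loop above:

# merging AIMessageChunk objects back into a single message
full_message = chunks[0]
for chunk in chunks[1:]:
    full_message = full_message + chunk

print(full_message.content)  # the complete response text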

A regular Streamlit Chatbot

Now that we have seen how to stream the response from the LLM using LangChain, let’s look at a way to stream it in Streamlit. Streamlit is a great way to create simple web applications in Python with minimal code.

Let’s create the structure of a simple chatbot that is already working. Consider this code:

import streamlit as st
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate


load_dotenv()

# app config
st.set_page_config(page_title="Streamlit Chatbot", page_icon="🤖")
st.title("Chatbot")

def get_response(user_query, chat_history):
    template = """
    You are a helpful assistant. Answer the following questions considering the history of the conversation:

    Chat history: {chat_history}

    User question: {user_question}
    """

    prompt = ChatPromptTemplate.from_template(template)
    llm = ChatOpenAI()
    chain = prompt | llm | StrOutputParser()

    return chain.invoke({
        "chat_history": chat_history,
        "user_question": user_query,
    })

# session state
if "chat_history" not in st.session_state:
    st.session_state.chat_history = [
        AIMessage(content="Hello, I am a bot. How can I help you?"),
    ]

    
# conversation
for message in st.session_state.chat_history:
    if isinstance(message, AIMessage):
        with st.chat_message("AI"):
            st.write(message.content)
    elif isinstance(message, HumanMessage):
        with st.chat_message("Human"):
            st.write(message.content)

# user input
user_query = st.chat_input("Type your message here...")
if user_query is not None and user_query != "":
    st.session_state.chat_history.append(HumanMessage(content=user_query))

    with st.chat_message("Human"):
        st.markdown(user_query)

    with st.chat_message("AI"):
        response = get_response(user_query, st.session_state.chat_history)
        st.write(response)

    st.session_state.chat_history.append(AIMessage(content=response))

The code above is a very simple chatbot application that can hold a conversation with the user. But it has one problem: it waits for the entire response to be generated before showing anything to the user. This can make the application feel unresponsive and slow.

Stream the response in Streamlit

To stream the response in Streamlit, we can use st.write_stream(), a method introduced in Streamlit 1.31 (so make sure your version is at least that). This method writes the content of a generator to the app as it is produced. This way, we can use the chain.stream() method to stream the response from the LLM to the app.
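Before wiring it into the chatbot, here is a minimal sketch of st.write_stream() on its own, with a hand-rolled generator and no LLM involved (the slow_stream name and the word list are just for illustration):

import time

import streamlit as st

def slow_stream():
    # st.write_stream() accepts any generator of strings, not just LLM output
    for word in ["Streaming", "text,", "one", "word", "at", "a", "time."]:
        yield word + " "
        time.sleep(0.2)

st.write_stream(slow_stream())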

Let’s update our get_response function to use the chain.stream() method:

def get_response(user_query, chat_history):
    template = """
    You are a helpful assistant. Answer the following questions considering the history of the conversation:

    Chat history: {chat_history}

    User question: {user_question}
    """

    prompt = ChatPromptTemplate.from_template(template)
    llm = ChatOpenAI()
    chain = prompt | llm | StrOutputParser()

    return chain.stream({
        "chat_history": chat_history,
        "user_question": user_query,
    })

Now we can call that function in the Streamlit app and use the st.write_stream() method to stream the response to the app:

# user input
user_query = st.chat_input("Type your message here...")
if user_query is not None and user_query != "":
    st.session_state.chat_history.append(HumanMessage(content=user_query))

    with st.chat_message("Human"):
        st.markdown(user_query)

    with st.chat_message("AI"):
        response = st.write_stream(get_response(user_query, st.session_state.chat_history))

    st.session_state.chat_history.append(AIMessage(content=response))

And that’s it! Now the chatbot will show the response as it is being generated. This will make the application feel more responsive and improve the user experience.

Note that we assign the response variable on the same line as the call to st.write_stream(). This is because st.write_stream() returns the full concatenated response once the generator is exhausted. That way, we can use the response variable to append the generated message to the chat history.

If we were instead to assign the result of get_response() to the response variable before calling st.write_stream(), the variable would hold the generator object itself, and the chat history would be appended with the generator object instead of the generated message. I will let you try that on your side and see what I mean.

Conclusion

In this article, we saw how to create a very simple LLM-powered chatbot that uses streaming to improve the user experience. To do this, we used LangChain and Streamlit to keep the development as fast and simple as possible. I hope you found this useful.

Don’t forget to join the community on Discord to meet cool people who are also building generative AI applications.