
Streamlit App: Create a Machine Learning App with Python and Logistic Regression

·19 mins
Alejandro AO
I’m a software engineer building AI applications. I publish weekly video tutorials where I show you how to build real-world projects. Feel free to visit my YouTube channel or Discord and join the community.

In this article, we will create a web application that predicts whether a tumor is malignant or benign. To do that, we will first train a model using the Logistic Regression algorithm. Then we will use the model to predict the diagnosis of a tumor. And finally, we will use Streamlit to create the web application.

We will use the Wisconsin Breast Cancer Dataset to train our model. So let’s get started! Also, feel free to check out the video version of this article right here 👇

The dataset
#

The dataset contains 569 observations and 32 variables: an ID number, the diagnosis (M = malignant, B = benign), and 30 numeric features computed from images of cell nuclei. We will use the 30 features to train our model, and the diagnosis is the label that the model learns to predict and that we evaluate it against.

This dataset does need a bit of cleaning. The ID number is not useful for our model, so we will drop it. There is also an empty column called Unnamed: 32, so we will drop it as well. Finally, we will encode the diagnosis variable as numbers using the map method of the pandas Series.

# Import pandas and load the dataset
import pandas as pd

df = pd.read_csv('data.csv')

# Drop the ID number
df = df.drop(['id'], axis=1)

# Drop the Unnamed: 32 column
df = df.drop(['Unnamed: 32'], axis=1)

# Encode the diagnosis variable
df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})

The model
#

We will use the LogisticRegression class from sklearn.linear_model to train our model. But first, we need to standardize the data. We will use the StandardScaler class from sklearn.preprocessing to do that.

We standardize the data because Logistic Regression, as implemented in scikit-learn, is sensitive to the scale of the features. Imagine one of your predictors is in the range of 0 to 1 and another predictor is in the range of 0 to 100. The model applies a regularization penalty to its coefficients by default, and that penalty treats features on different scales unevenly; the solver can also take longer to converge. StandardScaler rescales every predictor to a mean of 0 and a standard deviation of 1, so all the predictors end up on a comparable scale.
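To make the rescaling concrete, here is a minimal sketch of the arithmetic that standardization performs on a single column: subtract the column mean, then divide by the column's population standard deviation. The values and the helper name standardize are made up for illustration; in this tutorial the actual work is done by StandardScaler.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation."""
    column_mean = mean(column)
    column_std = pstdev(column)
    return [(x - column_mean) / column_std for x in column]


# Hypothetical columns on very different scales
smoothness = [0.08, 0.10, 0.12]
area = [400.0, 650.0, 900.0]

smoothness_z = standardize(smoothness)
area_z = standardize(area)
```

Although the raw columns differ by several orders of magnitude, both end up with a mean of 0 and a standard deviation of 1, so neither dominates because of its units.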

# Standardize the features
from sklearn.preprocessing import StandardScaler

features = df.drop('diagnosis', axis=1)
scaler = StandardScaler()
scaler.fit(features)
scaled_features = scaler.transform(features)

# Create the dataframe, keeping the feature column names
df_feat = pd.DataFrame(scaled_features, columns=features.columns)

# Create the X and y variables
X = df_feat
y = df['diagnosis']

Now we can train our model using the LogisticRegression class from sklearn.linear_model.

# Create the model
from sklearn.linear_model import LogisticRegression

logmodel = LogisticRegression()
logmodel.fit(X, y)

Test the model
#

There you go. We have our model ready. But how do we know if our model is any good? We can use the train_test_split function from sklearn.model_selection to split the dataset into a training set and a test set. We will use the training set to train our model and the test set to test our model.

# Split the dataset into a training set and a test set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

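The model above was fitted on every row, including the ones that just landed in the test set, so scoring it as is would overstate its performance. We refit it on the training set only and then test it on the test set.

# Refit on the training set, then test the model
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)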

We can use the classification_report function from sklearn.metrics to get a report of the model’s performance.

# Print the report
from sklearn.metrics import classification_report

print(classification_report(y_test, predictions))

In the run behind this article, the report showed an accuracy of 98.25%, which is very good. Since the split is random, your exact number will vary. We might do even better by using the GridSearchCV class from sklearn.model_selection to find the best parameters for our model. But for now, since this tutorial is about creating a web application, let’s focus on that.
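One common way to keep the test rows away from both the scaler and the model is to chain the two steps and fit the chain on the training set only. The sketch below is an illustration rather than the code used in this article: it uses synthetic data from make_classification as a stand-in for the cleaned dataset, and the variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 30 cleaned features and the 0/1 diagnosis
X, y = make_classification(n_samples=300, n_features=30, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

# The scaler and the model are fitted together, on the training rows only
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

# The test rows are scaled with the training statistics before prediction
predictions = pipeline.predict(X_test)
test_accuracy = pipeline.score(X_test, y_test)
```

With this arrangement, a single fitted object holds both the scaling statistics and the coefficients, so the two cannot get out of sync.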

Now that we have our model, we can use it to predict whether a tumor is malignant or benign. Just as we did above, we call the model's predict method. It also works when you pass a list that contains one list of 30 feature values, in the same column order as the training data. In the application that we are going to create, these values will come from the user.

# Predict the diagnosis of a tumor
logmodel.predict([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])

In the code above, predict returns a NumPy array with one number. This number is the diagnosis of the tumor. If the number is 1, the tumor is malignant. If the number is 0, the tumor is benign. Of course, the actual values of the features will be different, and because the model was trained on standardized features, real measurements must first go through the same scaler (scaler.transform) before they are passed to predict. This is just an example of the call.

Save the model and the scaler
#

Now that we have our model and our scaler, we need to save them (or export them) so that we can use them in our Streamlit app. The app runs as a separate script, so it has no access to the objects we created here. Without the fitted model, there is nothing to call predict on.

The same goes for the scaler. We cannot just create a new StandardScaler in the Streamlit app, because it would not know the means and standard deviations learned from the training data. We need the same scaler that we used to train the model.

To do this, we will use the pickle module from Python. It will allow us to save the model and the scaler.

In case you don’t know, the pickle module serializes Python objects to a file so that they can be loaded again later. It works with most objects: a list, a dictionary, a dataframe, a model, a scaler, etc. This is super useful when we want to export a model that we built to another project (or Streamlit app). Keep in mind that loading a pickle file can run arbitrary code, so only load files that you created or trust.

Let’s save our model. We will save the model as model.pkl.

# Save the model (the with block closes the file for us)
import pickle

with open('model.pkl', 'wb') as f:
    pickle.dump(logmodel, f)

Now you should have the file model.pkl in your project folder. This is the file that we will use in our Streamlit app. Now let’s also save the scaler. We will save the scaler as scaler.pkl.

# Save the scaler
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)
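As a quick check that saving and loading behave as expected, here is a minimal round-trip sketch. It uses a plain dictionary as a stand-in for the model or scaler and writes to a temporary folder; the file name artifact.pkl and the dictionary contents are made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the fitted model or scaler; any picklable object works the same way
artifact = {"name": "demo-scaler", "mean": [14.1, 19.3], "scale": [3.5, 4.3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "artifact.pkl"

    # Write: the with block closes the file even if dump raises
    with open(path, "wb") as f:
        pickle.dump(artifact, f)

    # Read it back, as the Streamlit app would do at startup
    with open(path, "rb") as f:
        restored = pickle.load(f)
```

The restored object is an equal copy of the original, not the same object in memory, which is exactly what we want when moving it into another project.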

Create the web application
#

Now that we have our model, we can create the web application. Let’s set up the project. We will create a new folder called app. We will also create a new file called app.py inside the app folder. This is the file that we will use to create the web application. Inside this folder we will structure our project like this:

app
├── app.py
├── model.pkl
└── scaler.pkl

Note that we have added the model.pkl and scaler.pkl files to the app folder.

Install Streamlit
#

Streamlit is a Python library that makes it easy to create web applications. We will use Streamlit to create our web application. To install Streamlit, we will use the pip command.

pip install streamlit

Set up the project
#

Now let’s open the app.py file and start coding. We will start by importing the necessary libraries.

import streamlit as st
import pickle
import pandas as pd

To make the app more robust and by convention, we should add the following code:

if __name__ == '__main__':
    main()

This code tests whether the file is being run directly or imported, and it calls the main function only when the file is run directly. If the file is imported, the main function will not run. Note that these two lines belong at the bottom of the file, after main has been defined.

Now we can start creating the app in the main function. We will start by adding a title to the app and some configurations. We will also add the page icon and the page layout.

# Configure the page
st.set_page_config(page_title="Breast Cancer Diagnosis",
                    page_icon="👩‍⚕️",
                    layout="wide",
                    initial_sidebar_state="expanded")

This function allows us to set the title of the app, the icon, the layout, and the initial state of the sidebar. The theme of the app is configured separately; we will use the default theme for now.

Let’s just add a simple header to the app to see that everything is working.

# Add a header
st.title("Breast Cancer Diagnosis")

Now we can run the app from the terminal. We will use the cd command to go to the app folder and then start the app with the streamlit run command.

cd app
streamlit run app.py

If everything is working, you should see the following output in the terminal:

You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://

Now open the link in your browser. You should see a page with the title “Breast Cancer Diagnosis” and the header “Breast Cancer Diagnosis”.

Set up the container and columns
#

Now we can set up the structure of the app. We will use the container function from streamlit to set up the structure. This is just a block that allows us to organize the app. We can add multiple containers to the app and put things inside them.

In order to write inside the container, we have two approaches. We can store the container in a variable and call methods on it later. Or we can open it in a with block and write inside it directly. We will use the second approach.

Let’s remove the header that we previously added to actually add the structure. We will add a container with a title and a description of how to use the application.

    # Set up the structure
    with st.container():
        st.title("Breast Cancer Diagnosis")
        st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app predicts using a machine learning model whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar. ")

Now we can run the app again. You should see the title and the description.

Now let’s create two columns under the title and description, but still inside the container. We will use the columns function from streamlit to create the columns. These columns will be stored in the variables col1 and col2, and we will define them so that the first column is four times as wide as the second. Finally, we will write inside the columns using with blocks.

    # Set up the structure
    with st.container():
        st.title("Breast Cancer Diagnosis")
        st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app predicts using a machine learning model whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar. ")
        col1, col2 = st.columns([4,1])
        with col1:
            st.write("Column 1")
        with col2:
            st.write("Column 2")

Now we can run the app again. You should see the title and the description, with two columns below them. Let’s add the sidebar to the app.

Add the sidebar
#

Now we can add the sidebar to the app. We will use the sidebar function from streamlit to add the sidebar. Inside the sidebar, we will add the sliders to update the measurements.

To make our code clearer, we will create a function called add_sidebar that will add the sidebar to the app. We will add the sliders to the sidebar in this function.

Now, there are a lot of predictors in our model, so we can think of the sliders as a way to update by hand the measurements that we receive from the cytology lab. We don’t need a button to update the measurements because the sliders will update the measurements automatically.

Also, the sliders require a minimum and a maximum value. But how can we know the minimum and maximum values for each predictor? For this exercise, since our training set is small, we will use the minimum and maximum values from it. But in a real application, we would need to do one of two things:

  1. We could look up the theoretical minimum and maximum values for each predictor and use those.
  2. Or, if the training set is too big to ship with the app, we could export just the minimum and maximum values from it (in order to avoid exporting the whole training set).
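The second option can be sketched as follows. This is only an illustration under stated assumptions: the rows are made-up values, the helper compute_bounds is hypothetical, and the resulting JSON string could be written to a small file (for example one named slider_bounds.json) that ships with the app instead of the dataset.

```python
import json

# Hypothetical training rows; a real project would read these from the dataset
training_rows = [
    {"radius_mean": 12.0, "texture_mean": 15.5},
    {"radius_mean": 20.5, "texture_mean": 22.0},
    {"radius_mean": 9.8, "texture_mean": 30.1},
]


def compute_bounds(rows):
    """Return {feature: {"min": ..., "max": ..., "mean": ...}} for each column."""
    bounds = {}
    for key in rows[0]:
        values = [row[key] for row in rows]
        bounds[key] = {
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
    return bounds


bounds = compute_bounds(training_rows)

# Serialize the bounds; this string is what would be saved next to the app
serialized = json.dumps(bounds, indent=2)

# The app would load the bounds back and feed them to the sliders
restored = json.loads(serialized)
```

The sliders then only need three numbers per predictor, no matter how many rows the training set has.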

And yes. I had ChatGPT write the labels for me. It’s just faster.

# Load the data
import pandas as pd
def load_data():
    data = pd.read_csv("data/data.csv")
    return data

data = load_data()

# Add the sidebar
def add_sidebar(data):
    st.sidebar.header("Cell Nuclei Measurements")
    
    # Define the labels
    slider_labels = [
        ("Radius (mean)", "radius_mean"),
        ("Texture (mean)", "texture_mean"),
        ("Perimeter (mean)", "perimeter_mean"),
        ("Area (mean)", "area_mean"),
        ("Smoothness (mean)", "smoothness_mean"),
        ("Compactness (mean)", "compactness_mean"),
        ("Concavity (mean)", "concavity_mean"),
        ("Concave points (mean)", "concave points_mean"),
        ("Symmetry (mean)", "symmetry_mean"),
        ("Fractal dimension (mean)", "fractal_dimension_mean"),
        ("Radius (se)", "radius_se"),
        ("Texture (se)", "texture_se"),
        ("Perimeter (se)", "perimeter_se"),
        ("Area (se)", "area_se"),
        ("Smoothness (se)", "smoothness_se"),
        ("Compactness (se)", "compactness_se"),
        ("Concavity (se)", "concavity_se"),
        ("Concave points (se)", "concave points_se"),
        ("Symmetry (se)", "symmetry_se"),
        ("Fractal dimension (se)", "fractal_dimension_se"),
        ("Radius (worst)", "radius_worst"),
        ("Texture (worst)", "texture_worst"),
        ("Perimeter (worst)", "perimeter_worst"),
        ("Area (worst)", "area_worst"),
        ("Smoothness (worst)", "smoothness_worst"),
        ("Compactness (worst)", "compactness_worst"),
        ("Concavity (worst)", "concavity_worst"),
        ("Concave points (worst)", "concave points_worst"),
        ("Symmetry (worst)", "symmetry_worst"),
        ("Fractal dimension (worst)", "fractal_dimension_worst"),
    ]

    input_dict = {}

    # Add the sliders
    for label, key in slider_labels:
        input_dict[key] = st.sidebar.slider(
            label,
            min_value=float(data[key].min()),
            max_value=float(data[key].max()),
            value=float(data[key].mean())
        )
    
    return input_dict

The function add_sidebar returns a dictionary with the measurements. We will use this dictionary to make predictions with the model every time the user updates the measurements.
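Because the model expects the measurements in the same column order as the training data, it can help to turn the dictionary into a row explicitly instead of relying on the order in which the keys were inserted. The sketch below shows one way to do that; the helper to_feature_row and the three-feature example are assumptions made for illustration.

```python
def to_feature_row(input_dict, feature_order):
    """Return the values as a single row, ordered according to feature_order."""
    missing = [name for name in feature_order if name not in input_dict]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    return [[input_dict[name] for name in feature_order]]


# Hypothetical column order used during training
feature_order = ["radius_mean", "texture_mean", "perimeter_mean"]

# The dictionary keys arrive in a different order here on purpose
slider_values = {"perimeter_mean": 92.0, "radius_mean": 14.1, "texture_mean": 19.3}

row = to_feature_row(slider_values, feature_order)
```

The nested list has the shape of one sample with several features, which is the shape that scikit-learn's transform and predict methods expect.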

Now we can add the sidebar to the app. This way, our main function now looks like this:

def main():

    st.set_page_config(page_title="Breast Cancer Diagnosis",
                    page_icon="👩‍⚕️",
                    layout="wide",
                    initial_sidebar_state="expanded")

    # Add the sidebar (data is the DataFrame returned by load_data)
    input_dict = add_sidebar(data)

    # Add the structure
    with st.container():
        st.title("Breast Cancer Diagnosis")
        st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app predicts using a machine learning model whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar. ")
        col1, col2 = st.columns([4, 1])
        with col1:
            st.write("Column 1")
        with col2:
            st.write("Column 2")

Great! Now we can run the app again. You should see the sidebar with the sliders. Now we can start filling the columns with the data! The first column will have a radar chart with the measurements. The second column will have the prediction.

Let’s start with the radar chart.

Add the radar chart
#

Now let’s create a radar chart with the measurements of the sliders. We will use the plotly library to create the radar chart. And it will be rendered when the user updates the measurements. We will get the measurements from the dictionary that the add_sidebar function returns.

Also, keep in mind that the data from the dictionary (the data from the sliders) is not scaled, so some of the measurements will be very small and others will be very big. This is not great for the radar chart. So we will scale the data before we create the radar chart.

But wait. Here is an issue. Do you remember that we saved a scaler in the pickle file? We could probably use it to scale the data. But that scaler takes a list of all 30 measurements, so we would need to create a list with the measurements from the dictionary and then scale the list. It also standardizes the values rather than mapping them into the 0 to 1 range that the radar chart uses.

But we don’t want to do that. We want to scale the values one by one so that each lands between 0 and 1. So we will quickly create a function that scales the data from the dictionary using the minimum and maximum of each predictor. Here is the function:

def get_scaled_values_dict(values_dict):
    """Scale each value to [0, 1] using the min and max of its predictor in the training data."""
    data = load_data()
    X = data.drop(['diagnosis'], axis=1)

    scaled_dict = {}

    for key, value in values_dict.items():
        max_val = X[key].max()
        min_val = X[key].min()
        scaled_value = (value - min_val) / (max_val - min_val)
        scaled_dict[key] = scaled_value

    return scaled_dict
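The same min-max arithmetic can be written as a small pure function that takes the bounds as an argument, which avoids reading the CSV file every time a slider moves. This is a sketch under assumptions: the helper min_max_scale, the bounds, and the values are all made up, and the guard for a zero-width range is an addition that the function above does not have.

```python
def min_max_scale(values, bounds):
    """Map each value into [0, 1] using the (low, high) pair for its key."""
    scaled = {}
    for key, value in values.items():
        low, high = bounds[key]
        span = high - low
        # A constant column has no range; return 0.0 instead of dividing by zero
        scaled[key] = 0.0 if span == 0 else (value - low) / span
    return scaled


# Hypothetical (min, max) pairs taken from the training data
bounds = {
    "radius_mean": (5.0, 30.0),
    "area_mean": (100.0, 2500.0),
    "constant_feature": (1.0, 1.0),
}
values = {"radius_mean": 17.5, "area_mean": 700.0, "constant_feature": 1.0}

scaled = min_max_scale(values, bounds)
```

Here a radius of 17.5 sits exactly halfway between its bounds, so it maps to 0.5, while the area of 700 maps to 0.25.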

Now we can use this function to scale the data from the dictionary inside the add_radar_chart function. Here is the function:

# Import the libraries
import plotly.graph_objects as go

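# Load the model and the scaler saved during training
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

def add_radar_chart(input_dict):
    # Scale the values to the 0 to 1 range (the traces below read from input_data)
    input_data = get_scaled_values_dict(input_dict)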

    # Create the radar chart
    fig = go.Figure()

    # Add the traces
    fig.add_trace(
        go.Scatterpolar(
            r=[input_data['radius_mean'], input_data['texture_mean'], input_data['perimeter_mean'],
                input_data['area_mean'], input_data['smoothness_mean'], input_data['compactness_mean'],
                input_data['concavity_mean'], input_data['concave points_mean'], input_data['symmetry_mean'],
                input_data['fractal_dimension_mean']],
            theta=['Radius', 'Texture', 'Perimeter', 'Area', 'Smoothness', 'Compactness', 'Concavity', 'Concave Points',
                   'Symmetry', 'Fractal Dimension'],
            fill='toself',
            name='Mean'
        )
    )

    fig.add_trace(
        go.Scatterpolar(
            r=[input_data['radius_se'], input_data['texture_se'], input_data['perimeter_se'], input_data['area_se'],
                input_data['smoothness_se'], input_data['compactness_se'], input_data['concavity_se'],
                input_data['concave points_se'], input_data['symmetry_se'], input_data['fractal_dimension_se']],
            theta=['Radius', 'Texture', 'Perimeter', 'Area', 'Smoothness', 'Compactness', 'Concavity', 'Concave Points',
                   'Symmetry', 'Fractal Dimension'],
            fill='toself',
            name='Standard Error'
        )
    )

    fig.add_trace(
        go.Scatterpolar(
            r=[input_data['radius_worst'], input_data['texture_worst'], input_data['perimeter_worst'],
                input_data['area_worst'], input_data['smoothness_worst'], input_data['compactness_worst'],
                input_data['concavity_worst'], input_data['concave points_worst'], input_data['symmetry_worst'],
                input_data['fractal_dimension_worst']],
            theta=['Radius', 'Texture', 'Perimeter', 'Area', 'Smoothness', 'Compactness', 'Concavity', 'Concave Points',
                   'Symmetry', 'Fractal Dimension'],
            fill='toself',
            name='Worst'
        )
    )

    # Update the layout
    fig.update_layout(
        polar=dict(
            radialaxis=dict(
                visible=True,
                range=[0, 1]
            )
        ),
        showlegend=True,
        autosize=True
    )

    return fig
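The three traces above follow the same pattern: ten base measurements, each with a mean, se, or worst suffix. As an optional refactor, the values for each trace could be collected with a loop. The sketch below only builds the lists and leaves the plotting out; the helper build_trace_values and the all-0.5 input are assumptions made for illustration.

```python
# Base measurement names as they appear in the column names of the dataset
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]

# Trace name shown in the legend -> column suffix
groups = {"Mean": "mean", "Standard Error": "se", "Worst": "worst"}


def build_trace_values(scaled_values):
    """Return {trace name: list of the ten scaled values for that suffix}."""
    return {
        trace_name: [scaled_values[f"{feature}_{suffix}"] for feature in base_features]
        for trace_name, suffix in groups.items()
    }


# Hypothetical input: every scaled measurement set to 0.5
scaled_values = {
    f"{feature}_{suffix}": 0.5
    for feature in base_features
    for suffix in groups.values()
}

trace_values = build_trace_values(scaled_values)
```

Each list can then be passed as the r argument of one go.Scatterpolar trace, with the trace name taken from the dictionary key.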

And now we can add the radar chart to the app. This way, our main function now looks like this:

def main():

    st.set_page_config(page_title="Breast Cancer Diagnosis",
                    page_icon="👩‍⚕️",
                    layout="wide",
                    initial_sidebar_state="expanded")

    # Add the sidebar (data is the DataFrame returned by load_data)
    input_dict = add_sidebar(data)

    # Add the structure
    with st.container():
        st.title("Breast Cancer Diagnosis")
        st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app predicts using a machine learning model whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar. ")
        col1, col2 = st.columns([4, 1])
        with col1:
            radar_chart = add_radar_chart(input_dict)
            st.plotly_chart(radar_chart, use_container_width=True)
        with col2:
            st.write("Column 2")

In the code above, we are adding the radar chart to the first column specifying that we want to use the full width of the column.

Great! Now we can run the app again. You should see the sidebar with the sliders. And you should see the radar chart. Try to update the measurements using the sliders. You should see the radar chart updating.

Add the prediction
#

Now let’s add the prediction. We will add some content to the second column. We will add the prediction and the probability of the prediction. Also, we will add some text to explain the prediction.

We will use the model and the scaler that we saved in the pickle file (because –don’t forget– we need to use the same scaler that we used to train the model).

Our function will take the input data and the model and the scaler as arguments. It will write the prediction and the probability of the prediction in our column. Here is the function:

def display_predictions(input_data, model, scaler):
    import numpy as np

    # Build one row of features (in slider order) and scale it with the training scaler
    input_array = np.array(list(input_data.values())).reshape(1, -1)
    input_data_scaled = scaler.transform(input_array)
    prediction = model.predict(input_data_scaled)

    st.subheader('Cell cluster prediction')
    st.write("The cell cluster is: ")

    if prediction[0] == 0:
        st.write("<span class='diagnosis bright-green'>Benign</span>",
                 unsafe_allow_html=True)
    else:
        st.write("<span class='diagnosis bright-red'>Malignant</span>",
                 unsafe_allow_html=True)

    # Column 0 is class 0 (benign), column 1 is class 1 (malignant)
    probabilities = model.predict_proba(input_data_scaled)[0]
    st.write("Probability of being benign: ", probabilities[0])
    st.write("Probability of being malignant: ", probabilities[1])

    st.write("This app can assist medical professionals in making a diagnosis, but should not be used as a substitute for a professional diagnosis.")

In the code above, we first scale the input data. Then, we use the model to make the prediction. We use the predict_proba method to get the probability of the prediction. We then write the prediction and the probability in the column. We also add some text to explain the prediction.
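To see where those two probabilities come from, here is a minimal sketch of what a binary logistic regression computes for one scaled row: a weighted sum of the features passed through the sigmoid function. The weights, bias, and input are made up, and the helper predict_proba_row is hypothetical; the fitted model does the equivalent with its own learned coefficients.

```python
import math


def predict_proba_row(weights, bias, scaled_row):
    """Return [P(class 0), P(class 1)] for one already-scaled row."""
    z = bias + sum(w * x for w, x in zip(weights, scaled_row))
    p_malignant = 1.0 / (1.0 + math.exp(-z))  # sigmoid
    return [1.0 - p_malignant, p_malignant]


# Made-up coefficients and a made-up scaled input with three features
weights = [0.8, -0.3, 1.2]
bias = -0.5
scaled_row = [1.0, 0.5, 0.25]

probabilities = predict_proba_row(weights, bias, scaled_row)

# The predicted class is 1 (malignant) when its probability exceeds 0.5
predicted_class = int(probabilities[1] > 0.5)
```

The two probabilities always add up to 1, which is why the app can show them side by side as benign and malignant.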

Now we can add the function to our main function. This way, our main function now looks like this:

def main():
    st.set_page_config(page_title="Breast Cancer Diagnosis",
                    page_icon="👩‍⚕️",
                    layout="wide",
                    initial_sidebar_state="expanded")

    # Add the sidebar (data is the DataFrame returned by load_data)
    input_dict = add_sidebar(data)

    # Add the structure
    with st.container():
        st.title("Breast Cancer Diagnosis")
        st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app predicts using a machine learning model whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar. ")
        col1, col2 = st.columns([4, 1])
        
        with col1:
            radar_chart = add_radar_chart(input_dict)
            st.plotly_chart(radar_chart, use_container_width=True)
        
        with col2:
            display_predictions(input_dict, model, scaler)

In the code above, we are adding the function to the second column. We are also passing the input data, the model, and the scaler as arguments.

Great! Now we can run the app again. You should see the sidebar with the sliders. And you should see the radar chart. Try to update the measurements using the sliders. You should see the radar chart updating. You should also see the prediction and the probability of the prediction.

But now, let’s add some style to the prediction. Did you notice that we used the unsafe_allow_html argument in the st.write function? This is because we want to add some HTML to the text. We can then add custom css classes to the content and target those classes in our style.css file. Let’s do that.

Add some style
#

We will create a style.css file in the same folder as our app.py file. We will add the following content to the file:

/* streamlit styles */
.block-container {
    height: 100vh;
    padding: 1rem 2rem;
}

/* graph and diagnosis container */
.css-z5fcl4 > div:nth-child(1) { /* replace */
    height: 100%;
    padding: 0;
}

/* make chart full height */
div.css-1sdqqxz div { /* replace */
    height: 100% !important;
    padding: 0 !important;
}

/* diagnosis box */
.css-j5r0tf { /* replace */
    padding: 1rem;
    border-radius: 0.5rem;
    background-color: #7E99AB;
}

/* sidebar */
.css-1vq4p4l { /* replace */
    padding-top: 1.5rem;
}

h3 {
    font-size: 1.5rem;
}

.diagnosis {
    color: #fff;

    padding: 0.2rem 0.5rem;
    border-radius: 0.5rem;
}

.bright-red {
    background-color: rgb(255, 75, 75);
}
.bright-green {
    background-color: #01DB4B;
    color: #000;
}

In the code above, we are adding some custom css classes. We are also targeting some of the default streamlit css classes and adding some custom css to them. Note that the generated class names (the ones starting with css-) are not stable: they can differ between apps and Streamlit versions. You can find the css classes by inspecting the elements in your browser. Once you find them, you can replace them in the style.css file where I added the comment /* replace */.

Now we can add the style.css file to our app. We will add the following code to our app.py file:

with open("style.css") as f:
    st.markdown('<style>{}</style>'.format(f.read()), unsafe_allow_html=True)

In the code above, we are opening the style.css file and adding the content to the app. We are also using the unsafe_allow_html argument to allow the style.css file to be added to the app.

Now we can run the app again. You should see your app with some new styles! The prediction should appear on a green background if it is benign and on a red background if it is malignant.

Conclusion
#

Great job! You did it! In this tutorial, we learned how to build a machine learning app with streamlit. We learned how to add a sidebar to the app and how to add a radar chart to the app. We then added a prediction to the app that updates as we update the measurements in the sidebar. We even added some custom styles to the app! If you want to check the final code, you can find it on GitHub.
