In this article, we will create a web application that predicts whether a tumor is malignant or benign. To do that, we will first train a model using the Logistic Regression algorithm. Then we will use the model to predict the diagnosis of a tumor. And finally, we will use Streamlit to create the web application.
We will use the Wisconsin Breast Cancer Dataset to train our model. So let's get started! Also, feel free to check out the video version of this article right here.
The dataset#
The dataset contains 569 observations and 32 variables. Two of them are the ID number and the diagnosis (M = malignant, B = benign); the other 30 are the cell measurements that we will use as features to train our model, with the diagnosis as the target.
This dataset does need a bit of cleaning. The ID number is not useful for our model, so we will drop it. There is also a column called `Unnamed: 32` that is completely empty, so we will drop it as well. Finally, we will encode the diagnosis variable using the `map` function from `pandas`.
import pandas as pd

# Import the dataset
df = pd.read_csv('data.csv')
# Drop the ID number
df = df.drop(['id'], axis=1)
# Drop the Unnamed: 32 column
df = df.drop(['Unnamed: 32'], axis=1)
# Encode the diagnosis variable
df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})
The model#
We will use the `LogisticRegression` class from `sklearn.linear_model` to train our model. But first, we need to normalize the data. We will use the `StandardScaler` class from `sklearn.preprocessing` to do that.
We normalize the data because the Logistic Regression algorithm is sensitive to the scale of the features. Imagine one of your predictors is in the range of 0 to 1 and another predictor is in the range of 0 to 100. The Logistic Regression algorithm will give more weight to the predictor in the range of 0 to 100. By normalizing the data, we make sure that all the predictors are in the same range.
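To see what this does in practice, here is a minimal sketch (with made-up numbers) of how the scaler rescales two toy predictors that live on very different scales:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Two toy predictors: one in the 0-1 range, one in the 0-100 range
toy = np.array([[0.2, 10.0],
                [0.5, 50.0],
                [0.8, 90.0]])

print(StandardScaler().fit_transform(toy))
# Both columns now have mean 0 and unit variance,
# so neither one dominates just because of its units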
# Normalize the data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(df.drop('diagnosis', axis=1))
scaled_features = scaler.transform(df.drop('diagnosis', axis=1))
# Create the dataframe
df_feat = pd.DataFrame(scaled_features, columns=df.drop('diagnosis', axis=1).columns)
# Create the X and y variables
X = df_feat
y = df['diagnosis']
Now we can train our model using the `LogisticRegression` class from `sklearn.linear_model`.
# Create the model
from sklearn.linear_model import LogisticRegression

logmodel = LogisticRegression()
logmodel.fit(X, y)
Test the model#
There you go. We have our model ready. But how do we know if our model is any good? We can use the `train_test_split` function from `sklearn.model_selection` to split the dataset into a training set and a test set. We will use the training set to train our model and the test set to test it.
# Split the dataset into a training set and a test set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
To get an honest estimate, we should fit the model on the training set only, so that the test set stays unseen during training. Let's refit the model on the training set and then test it on the test set.
# Refit the model on the training set and test it on the test set
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
We can use the `classification_report` function from `sklearn.metrics` to get a report of the model's performance.
# Print the report
from sklearn.metrics import classification_report

print(classification_report(y_test, predictions))
The report shows that our model has an accuracy of around 98%. That is very good. We could certainly do better by using `GridSearchCV` from `sklearn.model_selection` to find the best hyperparameters for our model. But for now, since this tutorial is about creating a web application, let's focus on that.
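If you are curious, here is a minimal sketch of what that tuning could look like (the parameter grid is just an illustrative choice):

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Illustrative grid: try a few regularization strengths
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

grid = GridSearchCV(LogisticRegression(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print(grid.best_params_)  # best value of C found by cross-validation
print(grid.best_score_)   # mean cross-validated accuracy for that value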
Now that we have our model, we can use it to predict whether a tumor is malignant or benign. But how do we do that? Just as we did above, we can call the `predict` method of our trained model to get the diagnosis of a tumor. It also works when we pass it a list of feature values, and that list will eventually come from the user of the application that we are going to create.
# Predict the diagnosis of a tumor
logmodel.predict([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
In the code above, the `predict` method returns an array with one number, which is the diagnosis of the tumor: 1 means malignant and 0 means benign. Of course, the actual values of the features will be different; this is just an example.
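One important detail: because the model was trained on scaled features, any new list of measurements should go through the same scaler before calling `predict`, otherwise the result is meaningless. A minimal sketch, using a hypothetical list of 30 raw values:

# new_measurements is a hypothetical list of 30 raw feature values
new_measurements = [[14.1, 20.4, 92.0] + [0.1] * 27]

scaled_measurements = scaler.transform(new_measurements)
print(logmodel.predict(scaled_measurements))  # array([1]) or array([0])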
Save the model and the scaler#
Now that we have our model and our scaler, we need to save them (or export them) so that we can use them in our Streamlit app. But why do we need to save them? We can just call the model's `predict` method to predict the diagnosis of a tumor. Why do we need to save the model? And why do we need to save the scaler?
The answer is that we need to save the model and the scaler because we need to use them in our Streamlit app. We cannot call the `predict` method in our Streamlit app if we don't have the model there.
Similarly, we cannot call the scaler's `transform` method to normalize the data in our Streamlit app without the scaler. Saving it is important because we cannot just create a new scaler in the Streamlit app: it would be a different scaler! We need the exact same one that we used to train the model.
To do this, we will use the `pickle` module from Python. It will allow us to save the model and the scaler. In case you don't know, the `pickle` module is used to save Python objects: we can save a list, a dictionary, a dataframe, a model, a scaler, and so on. We can even save a function. This is super useful when we want to export a model that we built to another project (or Streamlit app).
Let's save our model. We will save it as `model.pkl`.
# Save the model
import pickle

pickle.dump(logmodel, open('model.pkl', 'wb'))
Now you should have the file `model.pkl` in your project folder. This is the file that we will use in our Streamlit app. Now let's also save the scaler, as `scaler.pkl`.
# Save the scaler
pickle.dump(scaler, open('scaler.pkl', 'wb'))
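If you want to sanity-check the export, you can load both files back and run a quick prediction, something like this (paths are relative to where you run Python):

# Load the exported model and scaler back into memory
loaded_model = pickle.load(open('model.pkl', 'rb'))
loaded_scaler = pickle.load(open('scaler.pkl', 'rb'))

# Reuse the first row of the original features as a quick smoke test
sample = df.drop('diagnosis', axis=1).iloc[[0]]
print(loaded_model.predict(loaded_scaler.transform(sample)))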
Create the web application#
Now that we have our model, we can create the web application. Let's set up the project. We will create a new folder called `app`, and inside it a new file called `app.py`. This is the file that we will use to create the web application. We will structure the project like this:
app
├── app.py
├── data
│   └── data.csv
├── model.pkl
└── scaler.pkl
Note that we have added the `model.pkl` and `scaler.pkl` files to the `app` folder, along with a copy of the dataset in `data/data.csv` (we will need it later to set the slider ranges).
Install Streamlit#
Streamlit is a Python library that makes it easy to create web applications. We will use Streamlit to create our web application. To install Streamlit, we will use the `pip` command.
pip install streamlit
Set up the project#
Now let's open the `app.py` file and start coding. We will start by importing the necessary libraries.
import streamlit as st
import pickle
import pandas as pd
By convention, and to make the app easier to reuse, we should add the following code at the bottom of the file:
if __name__ == '__main__':
    main()
This code checks whether the file is being run directly or imported. In short, the `main` function will only run when the file is executed directly; if the file is imported from somewhere else, `main` will not be called.
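Putting the pieces together, the skeleton of our app.py at this point looks something like this:

import streamlit as st
import pickle
import pandas as pd

def main():
    # The app code from the next sections will go here
    pass

if __name__ == '__main__':
    main()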
Now we can start creating the app in the `main` function. We will start by adding a title to the app and some configuration: the page icon and the page layout.
# Add a title and configure the page
st.set_page_config(page_title="Breast Cancer Diagnosis",
                   page_icon="👩‍⚕️",
                   layout="wide",
                   initial_sidebar_state="expanded")
This function allows us to set the title of the app, the icon, the layout, and the initial state of the sidebar. We can also set the theme of the app. We will use the default theme for now.
Let’s just add a simple header to the app to see that everything is working.
# Add a header
st.title("Breast Cancer Diagnosis")
Now we can run the app with the `streamlit run` command from the terminal: first `cd` into the `app` folder, then run the app.
cd app
streamlit run app.py
If everything is working, you should see the following output in the terminal:
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Network URL: http://
Now open the link in your browser. You should see a page with the title “Breast Cancer Diagnosis” and the header “Breast Cancer Diagnosis”.
Set up the container and columns#
Now we can set up the structure of the app. We will use `st.container` to do it: a container is just a block that helps us organize the app, and we can add several of them and put things inside. There are two ways to write inside a container: create the container first and then write into it, or write inside it directly with a `with` block. We will use the second approach.
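To make the two approaches concrete, here is a small sketch of both (the text is just placeholder content):

# Approach 1: create the container first, then write into it later
block = st.container()
block.write("Written into the container after creating it")

# Approach 2: write inside the container directly with a `with` block
with st.container():
    st.write("Written inside the container directly")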
Let's remove the header that we previously added and build the actual structure: a container with a title and a description of how to use the application.
# Set up the structure
with st.container():
    st.title("Breast Cancer Diagnosis")
    st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app uses a machine learning model to predict whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar.")
Now we can run the app again. You should see the title and the description.
Now let's create two columns under the title and description, but still inside the container. We will use the `st.columns` function to create them. The columns will be stored in the variables `col1` and `col2`, and we will make the first column four times wider than the second. Finally, we will write inside each column using a `with` block.
# Set up the structure
with st.container():
    st.title("Breast Cancer Diagnosis")
    st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app uses a machine learning model to predict whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar.")
    col1, col2 = st.columns([4, 1])
    with col1:
        st.write("Column 1")
    with col2:
        st.write("Column 2")
Now we can run the app again. You should see the title, the description, and the two columns underneath. Let's add the sidebar to the app.
Add the sidebar#
Now we can add the sidebar to the app using `st.sidebar`. Inside the sidebar, we will add the sliders that update the measurements. To keep the code clear, we will create a function called `add_sidebar` that builds the sidebar and its sliders.
Now, there are a lot of predictors in our model, so we can think of the sliders as a way to update by hand the measurements that we receive from the cytology lab. We don’t need a button to update the measurements because the sliders will update the measurements automatically.
Also, the sliders require a minimum and a maximum value. But how can we know the minimum and maximum values for each predictor? For this exercise, since our dataset is small, we will use the minimum and maximum values from the training data. But in a real application, we would need to do one of two things:
- We could use the known theoretical minimum and maximum values for each predictor.
- Or, if the training set is too big to ship with the app, we could export just the minimum and maximum values from it (a quick sketch of this follows the list).
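Here is a minimal sketch of that export, run from the training script, assuming we pickle the per-feature ranges into a hypothetical file called slider_ranges.pkl:

# Export only the per-feature minimums and maximums, not the whole training set
raw_features = df.drop('diagnosis', axis=1)
feature_ranges = {
    col: (float(raw_features[col].min()), float(raw_features[col].max()))
    for col in raw_features.columns
}
pickle.dump(feature_ranges, open('slider_ranges.pkl', 'wb'))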
And yes. I had ChatGPT write the labels for me. It’s just faster.
# Load the data (this assumes you copied data.csv into the app folder, under data/)
import pandas as pd

def load_data():
    data = pd.read_csv("data/data.csv")
    return data
# Add the sidebar
def add_sidebar():
    st.sidebar.header("Cell Nuclei Measurements")
    data = load_data()

    # Define the labels
    slider_labels = [
        ("Radius (mean)", "radius_mean"),
        ("Texture (mean)", "texture_mean"),
        ("Perimeter (mean)", "perimeter_mean"),
        ("Area (mean)", "area_mean"),
        ("Smoothness (mean)", "smoothness_mean"),
        ("Compactness (mean)", "compactness_mean"),
        ("Concavity (mean)", "concavity_mean"),
        ("Concave points (mean)", "concave points_mean"),
        ("Symmetry (mean)", "symmetry_mean"),
        ("Fractal dimension (mean)", "fractal_dimension_mean"),
        ("Radius (se)", "radius_se"),
        ("Texture (se)", "texture_se"),
        ("Perimeter (se)", "perimeter_se"),
        ("Area (se)", "area_se"),
        ("Smoothness (se)", "smoothness_se"),
        ("Compactness (se)", "compactness_se"),
        ("Concavity (se)", "concavity_se"),
        ("Concave points (se)", "concave points_se"),
        ("Symmetry (se)", "symmetry_se"),
        ("Fractal dimension (se)", "fractal_dimension_se"),
        ("Radius (worst)", "radius_worst"),
        ("Texture (worst)", "texture_worst"),
        ("Perimeter (worst)", "perimeter_worst"),
        ("Area (worst)", "area_worst"),
        ("Smoothness (worst)", "smoothness_worst"),
        ("Compactness (worst)", "compactness_worst"),
        ("Concavity (worst)", "concavity_worst"),
        ("Concave points (worst)", "concave points_worst"),
        ("Symmetry (worst)", "symmetry_worst"),
        ("Fractal dimension (worst)", "fractal_dimension_worst"),
    ]

    input_dict = {}

    # Add the sliders
    for label, key in slider_labels:
        input_dict[key] = st.sidebar.slider(
            label,
            min_value=float(data[key].min()),
            max_value=float(data[key].max()),
            value=float(data[key].mean())
        )

    return input_dict
The `add_sidebar` function returns a dictionary with the measurements. We will use this dictionary to make a prediction with the model every time the user updates the measurements.
Now we can add the sidebar to the app. This way, our main function now looks like this:
def main():
    st.set_page_config(page_title="Breast Cancer Diagnosis",
                       page_icon="👩‍⚕️",
                       layout="wide",
                       initial_sidebar_state="expanded")

    # Add the sidebar
    input_dict = add_sidebar()

    # Add the structure
    with st.container():
        st.title("Breast Cancer Diagnosis")
        st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app uses a machine learning model to predict whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar.")
        col1, col2 = st.columns([4, 1])
        with col1:
            st.write("Column 1")
        with col2:
            st.write("Column 2")
Great! Now we can run the app again. You should see the sidebar with the sliders. Now we can start filling the columns with the data! The first column will have a radar chart with the measurements. The second column will have the prediction.
Let’s start with the radar chart.
Add the radar chart#
Now let's create a radar chart with the measurements from the sliders. We will use the `plotly` library to create the radar chart, and it will re-render whenever the user updates the measurements. We will get the measurements from the dictionary returned by the `add_sidebar` function.
Also, keep in mind that the data from the dictionary (the slider values) is not scaled, so some measurements will be very small and others very big. This does not work well for a radar chart, so we will rescale the values before plotting them.
But wait, didn't we save a scaler in a `pickle` file? We could try to use it here, but that scaler standardizes all 30 measurements at once to zero mean and unit variance, which is not what we want for a chart whose axis goes from 0 to 1. What we want is each value rescaled between the minimum and maximum of that predictor in the training data, so we will write a quick helper function for that. Here is the function:
def get_scaled_values_dict(values_dict):
    # Scale each value based on the min and max of that predictor in the training data
    data = load_data()
    X = data.drop(['diagnosis'], axis=1)

    scaled_dict = {}
    for key, value in values_dict.items():
        max_val = X[key].max()
        min_val = X[key].min()
        scaled_value = (value - min_val) / (max_val - min_val)
        scaled_dict[key] = scaled_value

    return scaled_dict
Now we can use this helper to scale the dictionary values inside the `add_radar_chart` function. Here is the function:
# Import the libraries
import plotly.graph_objects as go

# Load the model and the scaler that we exported earlier (we will need them for the prediction)
model = pickle.load(open("model.pkl", "rb"))
scaler = pickle.load(open("scaler.pkl", "rb"))

def add_radar_chart(input_data):
    # Scale the values
    input_data = get_scaled_values_dict(input_data)

    # Create the radar chart
    fig = go.Figure()

    # Add the traces
    fig.add_trace(
        go.Scatterpolar(
            r=[input_data['radius_mean'], input_data['texture_mean'], input_data['perimeter_mean'],
               input_data['area_mean'], input_data['smoothness_mean'], input_data['compactness_mean'],
               input_data['concavity_mean'], input_data['concave points_mean'], input_data['symmetry_mean'],
               input_data['fractal_dimension_mean']],
            theta=['Radius', 'Texture', 'Perimeter', 'Area', 'Smoothness', 'Compactness', 'Concavity',
                   'Concave Points', 'Symmetry', 'Fractal Dimension'],
            fill='toself',
            name='Mean'
        )
    )
    fig.add_trace(
        go.Scatterpolar(
            r=[input_data['radius_se'], input_data['texture_se'], input_data['perimeter_se'], input_data['area_se'],
               input_data['smoothness_se'], input_data['compactness_se'], input_data['concavity_se'],
               input_data['concave points_se'], input_data['symmetry_se'], input_data['fractal_dimension_se']],
            theta=['Radius', 'Texture', 'Perimeter', 'Area', 'Smoothness', 'Compactness', 'Concavity',
                   'Concave Points', 'Symmetry', 'Fractal Dimension'],
            fill='toself',
            name='Standard Error'
        )
    )
    fig.add_trace(
        go.Scatterpolar(
            r=[input_data['radius_worst'], input_data['texture_worst'], input_data['perimeter_worst'],
               input_data['area_worst'], input_data['smoothness_worst'], input_data['compactness_worst'],
               input_data['concavity_worst'], input_data['concave points_worst'], input_data['symmetry_worst'],
               input_data['fractal_dimension_worst']],
            theta=['Radius', 'Texture', 'Perimeter', 'Area', 'Smoothness', 'Compactness', 'Concavity',
                   'Concave Points', 'Symmetry', 'Fractal Dimension'],
            fill='toself',
            name='Worst'
        )
    )

    # Update the layout
    fig.update_layout(
        polar=dict(
            radialaxis=dict(
                visible=True,
                range=[0, 1]
            )
        ),
        showlegend=True,
        autosize=True
    )

    return fig
And now we can add the radar chart to the app. This way, our main function now looks like this:
def main():
    st.set_page_config(page_title="Breast Cancer Diagnosis",
                       page_icon="👩‍⚕️",
                       layout="wide",
                       initial_sidebar_state="expanded")

    # Add the sidebar
    input_dict = add_sidebar()

    # Add the structure
    with st.container():
        st.title("Breast Cancer Diagnosis")
        st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app uses a machine learning model to predict whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar.")
        col1, col2 = st.columns([4, 1])
        with col1:
            radar_chart = add_radar_chart(input_dict)
            st.plotly_chart(radar_chart, use_container_width=True)
        with col2:
            st.write("Column 2")
In the code above, we add the radar chart to the first column, specifying that it should use the full width of the column.
Great! Now we can run the app again. You should see the sidebar with the sliders. And you should see the radar chart. Try to update the measurements using the sliders. You should see the radar chart updating.
Add the prediction#
Now let’s add the prediction. We will add some content to the second column. We will add the prediction and the probability of the prediction. Also, we will add some text to explain the prediction.
We will use the model and the scaler that we saved with `pickle` (because, don't forget, we need the same scaler that we used to train the model).
Our function will take the input data, the model, and the scaler as arguments, and it will write the prediction and its probabilities in our column. Here is the function:
import numpy as np

def display_predictions(input_data, model, scaler):
    # Scale the input exactly as the training data was scaled
    input_array = np.array(list(input_data.values())).reshape(1, -1)
    input_data_scaled = scaler.transform(input_array)

    # Make the prediction and get the class probabilities
    prediction = model.predict(input_data_scaled)
    probabilities = model.predict_proba(input_data_scaled)[0]

    st.subheader('Cell cluster prediction')
    st.write("The cell cluster is: ")

    if prediction[0] == 0:
        st.write("<span class='diagnosis bright-green'>Benign</span>",
                 unsafe_allow_html=True)
    else:
        st.write("<span class='diagnosis bright-red'>Malignant</span>",
                 unsafe_allow_html=True)

    st.write("Probability of being benign: ", probabilities[0])
    st.write("Probability of being malignant: ", probabilities[1])

    st.write("This app can assist medical professionals in making a diagnosis, but should not be used as a substitute for a professional diagnosis.")
In the code above, we first scale the input data. Then we use the model to make the prediction and the `predict_proba` method to get the probability of each class. Finally, we write the prediction and the probabilities in the column, along with some text to put the prediction in context.
Now we can add the function to our main function. This way, our main function now looks like this:
def main():
    st.set_page_config(page_title="Breast Cancer Diagnosis",
                       page_icon="👩‍⚕️",
                       layout="wide",
                       initial_sidebar_state="expanded")

    # Add the sidebar
    input_dict = add_sidebar()

    # Add the structure
    with st.container():
        st.title("Breast Cancer Diagnosis")
        st.write("Please connect this app to your cytology lab to help diagnose breast cancer from your tissue sample. This app uses a machine learning model to predict whether a breast mass is benign or malignant based on the measurements it receives from your cytology lab. You can also update the measurements by hand using the sliders in the sidebar.")
        col1, col2 = st.columns([4, 1])
        with col1:
            radar_chart = add_radar_chart(input_dict)
            st.plotly_chart(radar_chart, use_container_width=True)
        with col2:
            display_predictions(input_dict, model, scaler)
In the code above, we are adding the function to the second column. We are also passing the input data, the model, and the scaler as arguments.
Great! Now we can run the app again. You should see the sidebar with the sliders, the radar chart, and the prediction with its probabilities. Try updating the measurements with the sliders: the radar chart and the prediction should update as well.
But now, let's add some style to the prediction. Did you notice that we used the `unsafe_allow_html` argument in the `st.write` function? That is because we want to add some HTML to the text. We can then add custom CSS classes to the content and target those classes in our `style.css` file. Let's do that.
Add some style#
We will create a `style.css` file in the same folder as our `app.py` file and add the following content to it:
/* streamlit styles */
.block-container {
height: 100vh;
padding: 1rem 2rem;
}
/* graph and diagnosis container */
.css-z5fcl4 > div:nth-child(1) { /* replace */
height: 100%;
padding: 0;
}
/* make chart full height */
div.css-1sdqqxz div { /* replace */
height: 100% !important;
padding: 0 !important;
}
/* diagnosis box */
.css-j5r0tf { /* replace */
padding: 1rem;
border-radius: 0.5rem;
background-color: #7E99AB;
}
/* sidebar */
.css-1vq4p4l { /* replace */
padding-top: 1.5rem;
}
h3 {
font-size: 1.5rem;
}
.diagnosis {
color: #fff;
padding: 0.2rem 0.5rem;
border-radius: 0.5rem;
}
.bright-red {
background-color: rgb(255, 75, 75);
}
.bright-green {
background-color: #01DB4B;
color: #000;
}
In the code above, we add some custom CSS classes, and we also target some of Streamlit's default, auto-generated classes (the ones starting with css-) to restyle them. Note that these generated class names are different for each app: you can find yours by inspecting the elements in your browser, and then replace them in the `style.css` file wherever I added the /* replace */ comment.
Now we can add the `style.css` file to our app by adding the following code to our `app.py` file:
with open("style.css") as f:
st.markdown('<style>{}</style>'.format(f.read()), unsafe_allow_html=True)
In the code above, we open the `style.css` file and inject its content into the app inside a style tag. We use the `unsafe_allow_html` argument so that Streamlit renders it as HTML instead of displaying it as plain text.
Now we can run the app again. You should see your app with some new styles! The prediction now sits in a styled box, with a green highlight when it is benign and a red highlight when it is malignant.
Conclusion#
Great job! You did it! In this tutorial, we learned how to build a machine learning app with Streamlit: we added a sidebar with sliders, a radar chart, and a prediction that updates as the measurements change. We even added some custom styles to the app! If you want to check the final code, you can find it on GitHub.