Guide to model training — Part 6: Save & Load

First published on January 4, 2022

Last updated at January 12, 2022

 

Nathaniel Tjandra

TLDR

After creating multiple models, it’s hard to keep track of all of them, especially in a collaborative work environment. Learn how to save and load your models using Pickle!

Outline

  • Recap

  • Before we begin

  • Model differences

  • Saving the model

  • Loading the model

  • Conclusion

Recap

In the last part, we successfully created a model for a holiday remarketing campaign. To review it, we’ll need to share the model results with our cross-functional team and staff. We’ll want our data scientists and data analysts to be able to access the model without rebuilding it every time they close their computers.

Lost progress

Before we begin

In this guide, we’ll cover how to export our machine learning model and import it back in using Python. No prior knowledge is required. In the previous part, we finished training a classification model, so now we’ll export it. The dataset can also be found

.

Model differences

When creating a model, it’s worth noting that the results may change each time you run it because of a random starting value, also known as a seed. Therefore, even with the same data, the model may give different results from one run to the next. For instance, our model is trained with Logistic Regression, which is a discriminative machine learning algorithm.

Create a model using Logistic Regression

Discriminative algorithms

This doesn’t mean that the algorithm is discriminatory. Rather, it tries to draw a line through our data to represent a boundary, which is also referred to as the decision boundary. It then classifies each data point depending on which side of the boundary it ends up on. In our case, the two sides are whether or not a user will click on the remarketing email.
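To make the idea of a decision boundary concrete, here is a minimal sketch in plain Python. The weights, intercept, and feature values are hypothetical numbers chosen for illustration, not values from our trained model, and 0.5 is assumed as the probability cutoff.

```python
import math

# Hypothetical weights and intercept, chosen only for illustration.
WEIGHTS = [1.5, -2.0]
INTERCEPT = 0.25


def click_probability(features):
    """Apply the logistic (sigmoid) function to a linear score."""
    score = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-score))


def will_click(features):
    """Classify a user by which side of the decision boundary they fall on."""
    return click_probability(features) >= 0.5


user_a = [2.0, 0.5]  # linear score 2.25, on the "click" side
user_b = [0.0, 1.0]  # linear score -1.75, on the "no click" side
print(will_click(user_a), will_click(user_b))
```

The boundary itself is the set of points where the linear score is exactly zero, which is where the predicted probability equals 0.5.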

Decision boundary (Source: Vidhya)

Splitting the data

When data is split into a train and test set, the rows that land in each set are not guaranteed to be the same each time because there’s no set seed. In this case, each time the model is trained, the algorithm uses a pseudo-random value, so repeated splits are highly unlikely to be the same.

As unlikely as it seems, repeats do happen (Source: Dilbert)

To keep it consistent, we set random_state equal to a constant value. For this model, I’ve chosen 3493 as my seed so that the resulting splits are always the same, which makes the results easier to replicate.
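As a rough sketch of what a fixed seed buys us, the snippet below splits the same data twice with scikit-learn's train_test_split and compares the results. The placeholder arrays and the test_size of 0.3 are assumptions for illustration; only the seed 3493 comes from this guide.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 rows with 2 features each, and alternating labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

SEED = 3493  # the seed chosen in this guide

# Two splits that share a random_state contain exactly the same rows.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.3, random_state=SEED
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.3, random_state=SEED
)

same_split = np.array_equal(X_test_a, X_test_b)
print(same_split)
```

Leaving random_state unset would let each call draw a fresh pseudo-random value, so the two test sets would usually differ.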

Saving the model

To save our model, we’ll use the pickle module in Python. Pickle starts by pickling the data, converting it through serialization into a byte stream. This byte stream encodes the hierarchy, or structure, of the original model so that it can be rebuilt later. Note that pickle handles booleans, integers, strings, lists, dictionaries, functions and classes defined at the top level of a module, and other built-in Python data types, as well as objects built from them. Objects that hold numpy arrays, such as our model, can be pickled as well, although large arrays are often saved more efficiently using

, which has a similar syntax.

As for why it’s called Pickle… (Source: PngItem)

Remember the Pickle

Why it’s called pickle remains shrouded in mystery, but a fun way to remember the name is to think about why people pickle food. Traditionally, many cultures practice pickling as a form of preservation and storage. A longer shelf life means they can come back to the food later without it spoiling. Likewise, data scientists aren’t going to finish optimizing a model in one sitting, nor will developers share their computers.

Pickling isn’t only for pickles. There’s kimchi too! (Source: ABCNews)

Dumping Pickle

The simplest way to save a model is as a bytes object assigned directly to a variable. This can be useful if you don’t need it as a file or want to experiment with different models in the same sitting. When using .dumps (with an s), the model is stored in a bytes object.
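A minimal sketch of pickling to a variable might look like this. The model here is a small stand-in dictionary with made-up values so the snippet runs on its own; a fitted scikit-learn estimator can be passed to pickle.dumps in the same way.

```python
import pickle

# Stand-in for the trained model (hypothetical values, for illustration only).
model = {
    "algorithm": "logistic regression",
    "coefficients": [0.8, -1.3],
    "intercept": 0.2,
}

# dumps (with an s) returns the pickle as a bytes object instead of writing a file.
pickled_model = pickle.dumps(model)
print(type(pickled_model))
```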

The command to create a pickle file is pickle.dump, which converts the model into a pickle and writes it to a file. First, we’ll open the file with write access, write our pickle into it, and then close it.

Remember to specify wb (write binary) so the program can write bytes to the file.
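Here is one way the file version could look. The file name model.pkl and the temporary directory are assumptions for illustration, and the model is again a stand-in dictionary rather than our trained classifier.

```python
import os
import pickle
import tempfile

# Stand-in for the trained model (hypothetical values, for illustration only).
model = {"coefficients": [0.8, -1.3], "intercept": 0.2}

# Hypothetical file location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "model.pkl")

# "wb" opens the file for writing in binary mode.
# The with block closes the file for us once the pickle is written.
with open(path, "wb") as f:
    pickle.dump(model, f)

print(os.path.getsize(path) > 0)
```

Using a with block covers the open, write, and close steps in one place, so the file is closed even if the write fails partway.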

Loading the model

Similar to saving the model, we’ll use pickle again to load our data, this time with the load and loads functions. Once you have a pickle file, you can open it and retrieve the original data using pickle.load. Likewise, pickle.loads takes a bytes object instead.

From variable

Remember to specify rb (read binary) so the program can read bytes from the file.
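Both loading routes can be sketched together as follows. As before, the stand-in dictionary and the model.pkl path are assumptions for illustration, and the snippet first writes the file so it can run on its own.

```python
import os
import pickle
import tempfile

# Stand-in for the trained model (hypothetical values, for illustration only).
model = {"coefficients": [0.8, -1.3], "intercept": 0.2}
path = os.path.join(tempfile.gettempdir(), "model.pkl")  # hypothetical path

# Save once so there is something to load.
with open(path, "wb") as f:
    pickle.dump(model, f)

# From file: "rb" opens the file for reading in binary mode.
with open(path, "rb") as f:
    model_from_file = pickle.load(f)

# From variable: loads (with an s) takes the bytes object directly.
pickled_model = pickle.dumps(model)
model_from_bytes = pickle.loads(pickled_model)

print(model_from_file == model, model_from_bytes == model)
```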

Checking the Pickle

Next, we’ll run the loaded model on the same split of X_train and X_test and evaluate the scores. A loaded model should always give the same results as the original, since it’s the same model and the same data.

The scores all match, so the pickle is correct.
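A check along these lines can be sketched with scikit-learn. The synthetic data from make_classification, the test_size of 0.3, and max_iter=1000 are assumptions standing in for the campaign dataset and settings used in this series.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic placeholder data instead of the remarketing dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=3493)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=3493
)

# Train once, then round-trip the fitted model through pickle.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
loaded_model = pickle.loads(pickle.dumps(model))

# Score both models on the same test split and compare.
original_score = model.score(X_test, y_test)
loaded_score = loaded_model.score(X_test, y_test)
print(original_score == loaded_score)
```

Because the loaded model is never refit, any difference in the two scores would point to a problem with the pickle rather than with the training.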

Conclusion

Thus ends this segment on saving and loading your machine learning model. We hope that you’re able to remember the pickle, the process of pickling, and will pickle and share your machine learning models. In the next series, we’ll export our results stored as

to evaluate our model metrics more thoroughly.