Product developers’ guide to cross joins

First published on October 21, 2021

 

6 minute read

Nathaniel Tjandra

TLDR

Learn how to use AI to select your next outfit that matches the occasion. Featuring Pandas and the merge function to find all combinations.

Introduction

We’ve all been there: staring into the closet, admiring all the clothes, but not quite seeing all the different ways they can be combined. In this follow-up guide, I’ll go over how we can use AI to make our selection more transparent by listing every combination we can wear and when we should wear it. Let’s get started by looking at our wardrobe!

What’s in your wardrobe? (Source: Anna Mu)

Before we begin

First, please make sure you know the basics of sorting and filtering in Pandas, as we’ll be using it frequently. For a quick refresher, read our

for information on importing, exporting, and processing all our data, and our guide to

. With that out of the way, let’s start by downloading the wardrobe dataset and importing it into Google Colab.

My closet

The wardrobe dataset contains information on 5 different shirts and 5 different pairs of pants, along with a label for whether each item is casual wear. We’ll split this dataset into 2 portions, 1 for shirts and another for pants.

Shirts: T-Shirt, Sweatshirt, Vest, Tuxedo, and Suit

Pants: Slacks, Jeans, Shorts, Cargo pants, and Overalls
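As a rough sketch of the split described above, the snippet below builds a small stand-in for the wardrobe dataset and separates it into shirts and pants. The column names (item, type, casual) and the casual labels are assumptions made for illustration; the real file may be laid out differently.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded wardrobe dataset. The column
# names and the casual labels are assumptions, not the real schema.
wardrobe = pd.DataFrame({
    "item": ["T-Shirt", "Sweatshirt", "Vest", "Tuxedo", "Suit",
             "Slacks", "Jeans", "Shorts", "Cargo pants", "Overalls"],
    "type": ["shirt"] * 5 + ["pants"] * 5,
    "casual": [True, True, False, False, False,
               False, True, True, True, True],
})

# Split into two portions, renaming columns so they stay distinct
# when the two tables are combined later.
shirts = (wardrobe[wardrobe["type"] == "shirt"]
          .rename(columns={"item": "shirt", "casual": "shirt_casual"})
          .drop(columns="type")
          .reset_index(drop=True))
pants = (wardrobe[wardrobe["type"] == "pants"]
         .rename(columns={"item": "pants", "casual": "pants_casual"})
         .drop(columns="type")
         .reset_index(drop=True))

print(shirts)
print(pants)
```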

So many options

Cartesian Product

Using our datasets, we’re going to create every combination by computing the Cartesian product. This is an operation where every object in one set is paired with every object in the other set. In other words, we pair each shirt with each pair of pants to calculate all possible combinations. This is a many-to-many relationship, where many objects connect with many other objects.
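To make the idea concrete outside of Pandas, here is a minimal sketch using itertools.product from Python’s standard library. It only uses the item names listed earlier and pairs every shirt with every pair of pants.

```python
from itertools import product

shirts = ["T-Shirt", "Sweatshirt", "Vest", "Tuxedo", "Suit"]
pants = ["Slacks", "Jeans", "Shorts", "Cargo pants", "Overalls"]

# Every shirt paired with every pair of pants: 5 x 5 = 25 pairs.
outfits = list(product(shirts, pants))

print(len(outfits))  # 25
print(outfits[:3])   # the first shirt paired with the first three pants
```

The size of a Cartesian product is the product of the sizes of its inputs, which is why 5 shirts and 5 pants give 25 outfits.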

Source: Mathstopia

Cross Join

Another term for the Cartesian product is a cross join: a join that crosses each object in one table with each object in the other. In Pandas, a many-to-many relationship can be specified by passing a column name to the on parameter of merge, which pairs up rows whose values in that column match. In this case, we want everything to match with everything, so we add a new column called ‘keys’ to both tables and store the same value in every row.

Once the key is created, we call merge on the keys to perform the Cartesian product.
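A minimal sketch of the key-and-merge approach might look like the following. The DataFrame and column names other than ‘keys’ are assumptions for illustration. The last step shows the how="cross" option that merge accepts in pandas 1.2 and later, which gives the same pairs without a helper column.

```python
import pandas as pd

shirts = pd.DataFrame(
    {"shirt": ["T-Shirt", "Sweatshirt", "Vest", "Tuxedo", "Suit"]})
pants = pd.DataFrame(
    {"pants": ["Slacks", "Jeans", "Shorts", "Cargo pants", "Overalls"]})

# Give every row in both tables the same key so that each shirt
# matches each pair of pants when we merge on it.
shirts["keys"] = 1
pants["keys"] = 1

# Merge on the shared key, then drop the helper column.
outfits = shirts.merge(pants, on="keys").drop(columns="keys")
print(len(outfits))  # 25

# pandas 1.2+ alternative: a built-in cross join, no helper column needed.
outfits_cross = (shirts.drop(columns="keys")
                 .merge(pants.drop(columns="keys"), how="cross"))
print(len(outfits_cross))  # 25
```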

Result of the cross join

Our result shows that we have 25 different outfits. We’ll use this finalized data to answer questions about the wardrobe and pick the right outfit for each special occasion, whether it’s a date, a casual hangout, a trip, or something else. Let’s start by looking at common scenarios.

Scenarios

What could be worn to a date?

You’ve got to dress to impress by picking formal clothing.

Slacks with a vest, tux, or suit. (Source: Date Night)
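One way to express this filter, sketched under assumptions: the column names and the casual labels below are illustrative guesses, chosen so that only slacks count as formal pants. The filter keeps outfits where neither piece is casual.

```python
import pandas as pd

# Assumed labels: True means casual, False means formal.
shirts = pd.DataFrame({
    "shirt": ["T-Shirt", "Sweatshirt", "Vest", "Tuxedo", "Suit"],
    "shirt_casual": [True, True, False, False, False],
})
pants = pd.DataFrame({
    "pants": ["Slacks", "Jeans", "Shorts", "Cargo pants", "Overalls"],
    "pants_casual": [False, True, True, True, True],
})

# Cross join via a shared key, as in the article.
outfits = (shirts.assign(keys=1)
           .merge(pants.assign(keys=1), on="keys")
           .drop(columns="keys"))

# Formal outfits: neither the shirt nor the pants is casual.
date_outfits = outfits[~outfits["shirt_casual"] & ~outfits["pants_casual"]]
print(date_outfits[["shirt", "pants"]])
```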

How about playing at an Amusement Park?

Casual clothing for fun and games!

You’ve got a lot of outfits to choose from.
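The casual filter could be written with DataFrame.query, as in this sketch. The column names and casual labels are again assumptions for illustration rather than the dataset’s actual values.

```python
import pandas as pd

# Assumed labels: True means casual, False means formal.
shirts = pd.DataFrame({
    "shirt": ["T-Shirt", "Sweatshirt", "Vest", "Tuxedo", "Suit"],
    "shirt_casual": [True, True, False, False, False],
})
pants = pd.DataFrame({
    "pants": ["Slacks", "Jeans", "Shorts", "Cargo pants", "Overalls"],
    "pants_casual": [False, True, True, True, True],
})

outfits = (shirts.assign(keys=1)
           .merge(pants.assign(keys=1), on="keys")
           .drop(columns="keys"))

# Casual outfits: both pieces are labeled casual.
park_outfits = outfits.query("shirt_casual and pants_casual")
print(len(park_outfits))  # 8 with the labels assumed above
print(park_outfits[["shirt", "pants"]])
```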

What about going to class?

Wear clothes that match. The last thing you want is to be bullied because of the way you dress. There are 11 different combinations in which the shirt and pants match, meaning both pieces are casual or both are formal.

Outfits that match (Source: Tumblr)
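Matching can be checked by comparing the two casual labels directly, as sketched below. The labels are assumptions for illustration (2 casual shirts, 4 casual pants), which reproduce the count of 11: 2 x 4 casual outfits plus 3 x 1 formal outfits.

```python
import pandas as pd

# Assumed labels: True means casual, False means formal.
shirts = pd.DataFrame({
    "shirt": ["T-Shirt", "Sweatshirt", "Vest", "Tuxedo", "Suit"],
    "shirt_casual": [True, True, False, False, False],
})
pants = pd.DataFrame({
    "pants": ["Slacks", "Jeans", "Shorts", "Cargo pants", "Overalls"],
    "pants_casual": [False, True, True, True, True],
})

outfits = (shirts.assign(keys=1)
           .merge(pants.assign(keys=1), on="keys")
           .drop(columns="keys"))

# An outfit matches when both pieces carry the same casual label.
matching = outfits[outfits["shirt_casual"] == outfits["pants_casual"]]
print(len(matching))  # 11 with the labels assumed above
```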

Anything special we could wear for hiking?

When hiking you want to be prepared, so wear cargo pants with lots of pockets, along with any casual shirt.

A happy camper wears cargo pants and a t-shirt or a sweatshirt. (Source: National Geographic)
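Here the filter combines a specific item with a label, as in this sketch. The column names and casual labels are assumptions for illustration.

```python
import pandas as pd

# Assumed labels: True means casual, False means formal.
shirts = pd.DataFrame({
    "shirt": ["T-Shirt", "Sweatshirt", "Vest", "Tuxedo", "Suit"],
    "shirt_casual": [True, True, False, False, False],
})
pants = pd.DataFrame({
    "pants": ["Slacks", "Jeans", "Shorts", "Cargo pants", "Overalls"],
    "pants_casual": [False, True, True, True, True],
})

outfits = (shirts.assign(keys=1)
           .merge(pants.assign(keys=1), on="keys")
           .drop(columns="keys"))

# Hiking outfits: cargo pants with any casual shirt.
hiking = outfits[(outfits["pants"] == "Cargo pants") & outfits["shirt_casual"]]
print(hiking[["shirt", "pants"]])
```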

Finally, what about a leisure-filled vacation?

You may wear whatever outfit you wish. You’re on vacation and it doesn’t need to match. You have 25 different combinations to choose from.

A vacation to the Caribbean (Source: Jade Mountain)

Conclusion

Beyond saving time and helping you wear the right attire for the occasion, cross joins are an efficient way to generate every combination of two datasets. Combined with filters, they can greatly reduce the cost and increase the speed of data preprocessing. Speaking of which, our outfit isn’t complete yet. It’s become customary to wear a mask out in public. Cross join your new combinations with another dataset of masks to complete the outfit.

Just in time for Halloween! (Source: Disney)
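As a starting point for that exercise, here is a rough sketch that cross joins the 25 outfits with a masks table. The masks listed are made up for illustration, since the article does not provide a masks dataset.

```python
import pandas as pd

shirts = pd.DataFrame(
    {"shirt": ["T-Shirt", "Sweatshirt", "Vest", "Tuxedo", "Suit"]})
pants = pd.DataFrame(
    {"pants": ["Slacks", "Jeans", "Shorts", "Cargo pants", "Overalls"]})

# Hypothetical masks dataset; swap in your own.
masks = pd.DataFrame(
    {"mask": ["Cloth mask", "Surgical mask", "Halloween mask"]})

# First cross join: every shirt with every pair of pants.
outfits = (shirts.assign(keys=1)
           .merge(pants.assign(keys=1), on="keys")
           .drop(columns="keys"))

# Second cross join: every outfit with every mask.
complete = (outfits.assign(keys=1)
            .merge(masks.assign(keys=1), on="keys")
            .drop(columns="keys"))

print(len(complete))  # 75, which is 25 outfits x 3 masks
```

Each extra cross join multiplies the row count, so it usually pays to filter before joining in another table.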