Using artificial intelligence to differentiate between human and synthetic hair wigs with Amazon SageMaker

Olalekan Elesin
6 min read · May 2, 2020


Hair retailers that sell high volumes of hair extensions must ensure that hair type and quality are not compromised when fulfilling customer orders. Today this check is usually done by human sales agents, who judge the quality and type of an extension by eye based on experience. That approach is error prone and arguably not the best use of the creativity of their sales people. It is also a classic case for computer vision.

To let hair retailers focus on growing their business, one can build an application that uses a custom image classification model to detect the quality and type of an extension and notify the retailer at the point of buying in bulk from the vendor. This provides a powerful, scalable, and simple solution for quality control. In this tutorial, we will use Amazon SageMaker to train and deploy a hair quality control machine learning model. In a later tutorial, we will automate this machine learning pipeline with AWS Step Functions. Check out my previous tutorial to learn about Amazon SageMaker and AWS Step Functions.

To simulate a production scenario, we will train an image classification model with the Amazon SageMaker built-in algorithm, using an example dataset of images scraped from an online hair vendor, GlamorousRemiHair.com. Code snippets for scraping the images are provided as well.

Required Steps:

  1. Define a hypothesis.
  2. Download the example Jupyter Notebook.
  3. Run it on your local machine, create an Amazon SageMaker Notebook Instance, or launch Amazon SageMaker Studio.
  4. Scrape and prepare the image dataset and upload it to S3.
  5. Use the Jupyter Notebook to train and deploy an image classification model with Amazon SageMaker.
  6. Use test images to make predictions in your Jupyter Notebook.

Define our Hypothesis

If we are able to differentiate between synthetic and human hair wigs from images right before customers complete their purchase, then we can help customers build trust with hair vendors. We will know that we succeeded if we achieve 65% validation accuracy with our machine learning model.

Download Jupyter Notebook

The starting point for this tutorial is to download my example Jupyter Notebook, which you can run on your local machine, on an Amazon SageMaker Notebook Instance, or in Amazon SageMaker Studio. I made use of Amazon SageMaker Studio.

Scrape Image Dataset

In the example notebook you downloaded, we scrape our image dataset from GlamorousRemiHair.com and make sure the images are in the right format for the Amazon SageMaker Image Classification algorithm.

Functions to scrape images from GlamorousRemiHair.com
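
The scraping functions themselves are embedded in the notebook. As a minimal sketch of the idea, assuming the product images sit in plain <img> tags and using hypothetical category URLs, they boil down to something like this:

import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def scrape_images(category_url: str, target_dir: str):
    """Download every image found on a category page into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    soup = BeautifulSoup(requests.get(category_url).text, 'html.parser')
    sources = [img.get('src') for img in soup.find_all('img') if img.get('src')]
    for idx, src in enumerate(sources, start=1):
        image = requests.get(urljoin(category_url, src))
        with open(os.path.join(target_dir, f'{idx}.jpg'), 'wb') as f:
            f.write(image.content)

# hypothetical category pages, one call (and one folder) per class
scrape_images('https://glamorousremihair.com/human-hair', 'images_to_classify/human-hair')
scrape_images('https://glamorousremihair.com/synthetic-hair', 'images_to_classify/synthetic-hair')

After scraping, the images end up in one folder per class: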
images_to_classify
├── human-hair
│   ├── 1.jpg
│   ├── 2.jpg
│   ├── 3.jpg
│   └── . . .
└── synthetic-hair
    ├── 1.jpg
    ├── 2.jpg
    ├── 3.jpg
    └── . . .
Synthetic and Human Hair wig images. Courtesy: GlamorousRemiHair.com

Prepare Image Dataset and Upload to S3

The Amazon SageMaker Image Classification algorithm requires the dataset to be in Apache MXNet RecordIO format or referenced through .lst files. Luckily, Apache MXNet provides a script that does this in a few lines of code.

Create .lst files with the Apache MXNet im2rec converter
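
A minimal sketch of that step, assuming the im2rec.py script from the Apache MXNet tools directory is available locally; the output prefix and split ratio are illustrative:

import subprocess

# im2rec.py ships with Apache MXNet (tools/im2rec.py).
# --list writes .lst index files instead of RecordIO,
# --recursive assigns one class label per sub-folder,
# --train-ratio 0.75 splits into wigs_train.lst and wigs_val.lst
subprocess.run(
    ['python', 'im2rec.py', '--list', '--recursive', '--train-ratio', '0.75',
     'wigs', 'images_to_classify/'],
    check=True)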

Once the data preparation is complete, we upload the prepared dataset to Amazon S3. I also had a quick look at the contents of the generated .lst file.
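
A sketch of the upload and the quick sanity check, assuming the default SageMaker session and an illustrative bucket/prefix layout (the exact channel layout in the notebook may differ):

import sagemaker

session = sagemaker.Session()
bucket = session.default_bucket()     # or your own bucket name
prefix = 'wig-classification'         # illustrative key prefix

# upload the images and the generated .lst index files to S3
session.upload_data(path='images_to_classify', bucket=bucket, key_prefix=f'{prefix}/train')
session.upload_data(path='images_to_classify', bucket=bucket, key_prefix=f'{prefix}/validation')
session.upload_data(path='wigs_train.lst', bucket=bucket, key_prefix=f'{prefix}/train_lst')
session.upload_data(path='wigs_val.lst', bucket=bucket, key_prefix=f'{prefix}/validation_lst')

# quick look: each .lst line is "index <tab> class-label <tab> relative image path"
with open('wigs_train.lst') as f:
    for line in f.readlines()[:5]:
        print(line.strip())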

Train and deploy Image Classification Model

Our image classification model is trained on an Amazon SageMaker GPU instance, ml.p2.xlarge. To save up to 70% on training costs, I used the managed spot settings on Amazon SageMaker. You can read more about Amazon SageMaker managed spot training here. See the image below:

Managed Spot Training configuration on Amazon SageMaker
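
A sketch of that estimator configuration, assuming SageMaker Python SDK v2 parameter names (v1 prefixes the spot settings with train_) and illustrative wait/run limits:

import sagemaker
from sagemaker import image_uris

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# built-in image classification container for the current region
training_image = image_uris.retrieve('image-classification', session.boto_region_name)

image_classifier = sagemaker.estimator.Estimator(
    training_image,
    role,
    instance_count=1,
    instance_type='ml.p2.xlarge',
    use_spot_instances=True,   # managed spot training
    max_run=3600,              # illustrative cap on training time (seconds)
    max_wait=7200,             # illustrative cap on spot wait plus run time
    output_path=f's3://{session.default_bucket()}/wig-classification/output',
    sagemaker_session=session)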

With all training infrastructure configuration in place, we then set our hyperparameters. The Amazon SageMaker Image Classification algorithm comes with default hyperparameters, but you can get creative and adjust them to your use case. For production-grade models, it is recommended to tune your hyperparameters. This can be done using Amazon SageMaker Automatic Model Tuning.
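
For reference, a minimal sketch of what that tuning could look like for our estimator; the search ranges and job budget below are illustrative (the built-in algorithm exposes validation:accuracy as a tunable objective):

from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

tuner = HyperparameterTuner(
    image_classifier,
    objective_metric_name='validation:accuracy',
    hyperparameter_ranges={
        'learning_rate': ContinuousParameter(0.001, 0.1),
        'mini_batch_size': IntegerParameter(8, 32)},
    max_jobs=10,            # illustrative tuning budget
    max_parallel_jobs=2)

# tuner.fit(data_channels)  # same input channels as a normal training job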

We set our hyperparameters, point the estimator at our training and validation datasets, and train our custom image classification model.
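
A sketch of that cell, continuing with the same SDK v2 naming; the hyperparameter values and channel locations are illustrative rather than the exact ones from the notebook:

from sagemaker.inputs import TrainingInput

image_classifier.set_hyperparameters(
    num_layers=18,                 # ResNet depth used by the built-in algorithm
    image_shape='3,224,224',
    num_classes=2,                 # human hair vs. synthetic hair
    num_training_samples=89,       # illustrative: size of the training split
    epochs=10,
    learning_rate=0.01,
    use_pretrained_model=1)        # transfer learning from a pretrained network

data_channels = {
    'train': TrainingInput(f's3://{bucket}/{prefix}/train', content_type='application/x-image'),
    'validation': TrainingInput(f's3://{bucket}/{prefix}/validation', content_type='application/x-image'),
    'train_lst': TrainingInput(f's3://{bucket}/{prefix}/train_lst', content_type='application/x-image'),
    'validation_lst': TrainingInput(f's3://{bucket}/{prefix}/validation_lst', content_type='application/x-image')}

image_classifier.fit(inputs=data_channels, logs=True)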

Once model training is completed, we see the following:

  • Our model ran for 10 epochs, and on the 10th epoch we have a training accuracy of 1.0 and a validation accuracy of ~0.96. This means our model is able to accurately distinguish between our image classes, human hair and synthetic hair.
  • We saved 70.4% on training costs. Instead of paying $1.26, the hourly cost of an ml.p2.xlarge instance, we paid ~$0.38.

We then deploy our model with Amazon SageMaker’s beautiful one-liner:

wig_classifier = image_classifier.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge')

Testing Predictions

Our Amazon SageMaker endpoint is now deployed and ready to make predictions. For this, we download an image, convert it to a Python bytearray, and call the predict method on our endpoint. To handle this in a single flow, I created a simple function that returns a response similar to the predictions from Amazon Rekognition (I'm a huge fan of Amazon Rekognition).

import json

def predict(file_name: str):
    """Classify a local image with the deployed endpoint and return
    a Rekognition-style response."""
    # read the test image as raw bytes
    with open(file_name, 'rb') as f:
        payload = bytearray(f.read())
    # tell the endpoint we are sending a raw image
    wig_classifier.content_type = 'application/x-image'
    result = json.loads(wig_classifier.predict(payload))
    # the model returns one probability per class, in label order
    return {
        'Labels': [
            {
                'Name': 'Human Hair',
                'Confidence': "%.16f" % float(result[0])
            },
            {
                'Name': 'Synthetic Hair',
                'Confidence': "%.16f" % float(result[1])
            }
        ]
    }
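
Calling the helper on a locally saved test image (the file name here is hypothetical) returns a dictionary like the ones below:

predict('synthetic_wig_sample.jpg')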

Synthetic Hair

Source: Aliexpress.com Synthetic Hair
{
"Labels": [
{
"Name": "Human Hair",
"Confidence": "0.0000056653252614"
},
{
"Name": "Synthetic Hair",
"Confidence": "0.9999942779541016"
}
]
}

Human Hair

Source: Aliexpress.com Human Hair
{
"Labels": [
{
"Name": "Human Hair",
"Confidence": "0.9999984502792358"
},
{
"Name": "Synthetic Hair",
"Confidence": "0.0000015822050727"
}
]
}

Curly Hair

Source: Aliexpress.com
{
"Labels": [
{
"Name": "Human Hair",
"Confidence": "0.5440295934677124"
},
{
"Name": "Synthetic Hair",
"Confidence": "0.4559704363346100"
}
]
}

Clean up

Finally, we clean up our deployed Amazon SageMaker Endpoint to save costs.

wig_classifier.delete_endpoint()

Conclusion

This is my first recipe with Amazon SageMaker Studio. There are several next steps one could take to improve the model for real-life use cases. One I tried was an image segmentation model to identify hair quality at the pixel level. Another I'm looking into is data augmentation techniques, e.g. changing image angles, replacing backgrounds with real-life backgrounds, etc. Depending on the techniques combined, augmenting our image dataset could easily take us from 119 images to 52,000 images. This would increase the number of training samples and likely improve the accuracy of the model.
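
As a rough illustration of that idea, assuming Pillow is installed and using hypothetical file paths, even simple flips and rotations multiply each source image several times over:

import os
from PIL import Image

def augment(file_name: str, out_dir: str):
    """Write simple variants (horizontal flip plus rotations) of one image."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(file_name))[0]
    img = Image.open(file_name)
    img.transpose(Image.FLIP_LEFT_RIGHT).save(os.path.join(out_dir, f'{stem}_flipped.jpg'))
    for angle in (90, 180, 270):
        img.rotate(angle, expand=True).save(os.path.join(out_dir, f'{stem}_rot{angle}.jpg'))

# hypothetical paths: one source image becomes four augmented variants
augment('images_to_classify/human-hair/1.jpg', 'augmented/human-hair')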

I hope to share more with you. If you enjoyed reading this, kindly share and comment. You can reach me via email, follow me on Twitter or connect with me on LinkedIn. Can’t wait to hear from you!!

Jupyter Notebook available below:

Follow through video tutorial
