CS 194-26: Project 4

Facial Keypoint Detection with Neural Networks

Imaani Choudhuri
Part 1: Nose Tip Detection

The first part of this project consists of training a neural network to predict the location of the nose tip in a facial image.


Here are some images sampled from my dataloader with the true nose keypoint:
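The dataloader pairs each face image with its ground-truth nose keypoint. A minimal sketch of what such a dataset class might look like (the class name and data layout are my assumptions, not the exact code used):

```python
import torch
from torch.utils.data import Dataset

class NoseDataset(Dataset):
    """Sketch: grayscale face images paired with a single (x, y) nose keypoint."""
    def __init__(self, images, keypoints):
        # images: sequence of HxW grayscale arrays; keypoints: matching (x, y) pairs
        self.images = images
        self.keypoints = keypoints

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # add a channel dimension so the image is 1xHxW, as a conv net expects
        img = torch.as_tensor(self.images[idx], dtype=torch.float32).unsqueeze(0)
        kp = torch.as_tensor(self.keypoints[idx], dtype=torch.float32)
        return img, kp
```

Wrapping this in a `DataLoader` then yields the batches sampled above.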

Here is the training and validation MSE loss during the training process (green is validation, blue is training):
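The curves above come from a standard regression training loop: after each epoch I record the average training MSE and the average validation MSE. A sketch of that loop (hyperparameter defaults here are illustrative, not the exact values used in this part):

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=25, lr=1e-3):
    """Sketch of the training loop: Adam + MSE, recording per-epoch losses."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    train_losses, val_losses = [], []
    for _ in range(epochs):
        # training pass
        model.train()
        total = 0.0
        for imgs, kps in train_loader:
            opt.zero_grad()
            loss = criterion(model(imgs), kps)
            loss.backward()
            opt.step()
            total += loss.item()
        train_losses.append(total / len(train_loader))

        # validation pass (no gradients)
        model.eval()
        total = 0.0
        with torch.no_grad():
            for imgs, kps in val_loader:
                total += criterion(model(imgs), kps).item()
        val_losses.append(total / len(val_loader))
    return train_losses, val_losses
```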

Here are some results. As you can see, the network localizes the nose very well in the fifth and eighth images of the second batch, but misses in the second, fourth and eighth images of the first batch. I suspect this is because the neural net confuses the nose region with other changes in facial topography, like the cheek recesses in the failure cases.

Part 2: Full Facial Keypoints Detection

The second part of this project trains a network to predict all 58 keypoints on the same dataset, with a small neural network. Specifically, my architecture used 5 convolutional layers and two linear layers, with max-pooling (after each convolution) and ReLU after each layer (except the last). My hyperparameters were a learning rate of 1e-3 with the Adam optimizer and MSE loss, a batch size of 4, and 12 epochs of training.
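The architecture described above can be sketched as follows; the channel counts and the 120×160 input resolution are my assumptions for illustration, not the exact values used:

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Sketch of the Part 2 net: 5 conv layers + 2 linear layers,
    each conv followed by ReLU and max-pooling; no ReLU on the output."""
    def __init__(self, num_keypoints=58):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # after 5 poolings, a 120x160 input shrinks to a 3x5 feature map
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 3 * 5, 256), nn.ReLU(),
            nn.Linear(256, num_keypoints * 2),  # (x, y) per keypoint
        )

    def forward(self, x):  # x: (N, 1, 120, 160)
        return self.head(self.features(x))
```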

Here are some images sampled from my dataloader with the true keypoints:

Here is the training and validation MSE loss during the training process (green is validation, blue is training):

Here are some results. As you can see, the first image in the first batch and the last image in the second batch are quite good predictions, while the last image of the first batch and the third image of the second batch are quite far off. The failure cases are mainly turned faces, but not all turned faces fail. It appears that the neural net sometimes recognizes half the face but incorrectly predicts which side the rest of the face is on. It tends to choose whichever option is more centered, which makes sense given that the majority of faces in the dataset are front-facing and centered.

Here are the learned filters for the first convolutional layer. (I computed them for the subsequent layers as well, but those images were too large to include.)
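Extracting these filters amounts to reading off the first `Conv2d`'s weight tensor and rescaling each filter to [0, 1] so it can be shown as a grayscale image. A sketch of that step (the helper name is mine):

```python
import torch

def first_layer_filters(model):
    """Sketch: pull out the first Conv2d's weights and min-max
    normalize each filter to [0, 1] for display as a grayscale image."""
    conv1 = next(m for m in model.modules() if isinstance(m, torch.nn.Conv2d))
    w = conv1.weight.detach().clone()  # (out_channels, in_channels, k, k)
    for f in w:  # per-filter normalization
        f -= f.min()
        if f.max() > 0:
            f /= f.max()
    return w
```

Each slice of the returned tensor can then be passed straight to an image viewer or `imshow`-style plotting call.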

Part 3: Train With Larger Dataset

In the final part, we do the same as in Part 2 but on a much larger dataset. I used the recommended architecture: resnet18, modified to take one channel as input and to output 68×2 coordinates. My hyperparameters were similar to Part 2, a learning rate of 1e-3 with the Adam optimizer and MSE loss, a batch size of 6, and 10 epochs.

Here is the MSE loss over training epochs:

Here are some results.

Thanks for checking out my submission! :D