Facial Keypoint Detection with Neural Networks

Matthew Hallac

Overview

For this project, I use neural networks to automatically detect key points on faces.

Nose tip detection

For the first part, given an image of a face, I try to detect only the tip of the nose. I downsampled the images to 80x60 grayscale and used three convolutional layers with 3x3 kernels, each followed by a max pool and a ReLU. The number of output channels went from 16 to 20 to 32. These layers were followed by two fully connected layers (1280 to 120 to 2), whose two outputs are the x and y coordinates of the nose tip.
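The layer sizes above can be checked with a short PyTorch sketch. It assumes the 80x60 images are 80 pixels wide and 60 tall, that the convolutions are unpadded, and that pooling is 2x2. Under those assumptions the flattened feature size works out to exactly 32 * 5 * 8 = 1280, which matches the first fully connected layer. The class name `NoseTipNet` and the ReLU between the two fully connected layers are assumptions, not details from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoseTipNet(nn.Module):
    """Three 3x3 conv layers (16, 20, 32 channels) + two FC layers (1280 -> 120 -> 2).

    Assumes grayscale input of shape (N, 1, 60, 80), unpadded convolutions and
    2x2 max pooling, so the flattened feature size is 32 * 5 * 8 = 1280.
    """

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3)   # 60x80 -> 58x78, pool -> 29x39
        self.conv2 = nn.Conv2d(16, 20, kernel_size=3)  # 29x39 -> 27x37, pool -> 13x18
        self.conv3 = nn.Conv2d(20, 32, kernel_size=3)  # 13x18 -> 11x16, pool -> 5x8
        self.fc1 = nn.Linear(32 * 5 * 8, 120)
        self.fc2 = nn.Linear(120, 2)                   # (x, y) of the nose tip

    def forward(self, x):
        # Max pool, then ReLU, as described in the text
        # (the two commute because ReLU is monotonic).
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = F.relu(F.max_pool2d(self.conv3(x), 2))
        x = torch.flatten(x, start_dim=1)
        x = F.relu(self.fc1(x))  # assumed nonlinearity between the FC layers
        return self.fc2(x)


model = NoseTipNet()
out = model(torch.zeros(4, 1, 60, 80))  # batch of 4 blank images
print(out.shape)  # torch.Size([4, 2])
```

Without padding, each 3x3 convolution trims two pixels per dimension, and each pool halves (rounding down). This is why the spatial size shrinks to 5x8 rather than the 7x10 you would get with `padding=1`.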

In the failure cases above, the nose tip was incorrectly placed below the eye. This could be due to the small amount of data used for training. The failure case on the left places the point closer to the center of the image than it actually is. This could be because data augmentation was not implemented for this step, so the network gets it wrong when the face is not perfectly centered.

Full facial keypoints detection

Now, given a face, I try to detect 58 keypoints. I used data augmentation with random rotations, random translations, and color jitter.
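With rotations and translations, the keypoint labels must be moved by exactly the same transform as the pixels, or the augmented labels no longer match the face. The following is a minimal NumPy sketch of that idea, not the implementation used here. It assumes grayscale images stored as (H, W) arrays and keypoints as (K, 2) arrays of (x, y) pixel coordinates. The function names and the ranges of up to 15 degrees and 10 pixels are hypothetical. Color jitter is left out because it only changes pixel intensities, so the keypoints stay where they are.

```python
import numpy as np


def rotation_translation_matrix(angle_deg, tx, ty, center):
    """2x3 affine matrix: rotate by angle_deg about `center`, then shift by (tx, ty).

    Coordinates are (x, y) pixels with y pointing down, so a positive angle
    appears clockwise on screen.
    """
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    ctr = np.asarray(center, dtype=float)
    t = ctr - R @ ctr + np.array([tx, ty], dtype=float)
    return np.hstack([R, t[:, None]])


def warp_image(img, M):
    """Nearest-neighbour warp: output pixel p samples the input at A^-1 (p - t).

    Output pixels whose source falls outside the image are filled with 0.
    """
    h, w = img.shape
    A, t = M[:, :2], M[:, 2]
    A_inv = np.linalg.inv(A)
    ys, xs = np.mgrid[0:h, 0:w]
    out_xy = np.stack([xs.ravel(), ys.ravel()]).astype(float)
    src = A_inv @ (out_xy - t[:, None])
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w)


def augment(img, keypoints, rng, max_angle=15.0, max_shift=10):
    """Apply one random rotation + translation to both the image and its keypoints."""
    h, w = img.shape
    angle = rng.uniform(-max_angle, max_angle)
    tx, ty = rng.integers(-max_shift, max_shift + 1, size=2)
    M = rotation_translation_matrix(angle, tx, ty, center=(w / 2, h / 2))
    new_keypoints = keypoints @ M[:, :2].T + M[:, 2]  # same transform as the pixels
    return warp_image(img, M), new_keypoints


rng = np.random.default_rng(0)
face = np.zeros((60, 80))
points = np.array([[40.0, 30.0], [50.0, 20.0]])
aug_face, aug_points = augment(face, points, rng)
```

Points that are pushed outside the frame are not clipped or dropped in this sketch. A real pipeline would need to decide how to handle them, for example by keeping the ranges small enough that faces stay in view.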

The architecture of my network was as follows:

I trained for 25 epochs with a learning rate of 0.001, which I found gave the best results.

In the first failure case above, the network detected the eye as the nose. In the second failure case, I believe the network picked up the shadow under the man's chin and followed it, producing an incorrect outline of the face.

Above you can also see some examples of learned filters from the first and second convolutional layers.
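The stated hyperparameters (25 epochs, learning rate 0.001) fit into a standard PyTorch regression loop like the one sketched below. The writeup does not say which optimizer or loss was used, so Adam and mean squared error are assumptions, as is the function name `train`. The toy data and the single linear layer at the bottom are placeholders used only to exercise the loop. They are not the architecture used in this project.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, epochs=25, lr=1e-3, device="cpu"):
    """Keypoint regression loop; Adam + MSE are assumptions, epochs/lr come from the text."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []  # mean training loss per epoch
    for _ in range(epochs):
        model.train()
        total, count = 0.0, 0
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            preds = model(images)                      # (N, 2 * K)
            loss = loss_fn(preds, targets.flatten(1))  # targets (N, K, 2) -> (N, 2 * K)
            loss.backward()
            optimizer.step()
            total += loss.item() * images.size(0)
            count += images.size(0)
        history.append(total / count)
    return history


# Toy usage with random tensors and a placeholder linear model.
torch.manual_seed(0)
K = 58                                     # keypoints per face
images = torch.randn(8, 1, 24, 32)         # hypothetical small grayscale inputs
targets = torch.rand(8, K, 2)              # (x, y) in normalized [0, 1] coordinates
loader = DataLoader(TensorDataset(images, targets), batch_size=4, shuffle=True)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(24 * 32, 2 * K))
history = train(toy_model, loader)         # 25 epochs at lr = 0.001
```

Flattening the (N, K, 2) targets to (N, 2K) lets the network end in a plain linear layer with 2K outputs, one x and one y per keypoint.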

Train with a larger dataset

For the larger dataset, I used the ResNet-18 model from PyTorch (torchvision) with some modifications. The architecture was as follows:

Here are my results on the test dataset:

Here are the results on my own images: