The function get_landmarks() takes an image in the form of a numpy array, and
returns a 68x2 element matrix, each row of which corresponds to the
x, y coordinates of a particular feature point in the input image.
The feature extractor (predictor) requires a rough bounding box as input to
the algorithm. This is provided by a traditional face detector (detector),
which returns a list of rectangles, each of which corresponds to a face in
the image.
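A minimal sketch of how such a function can be put together with dlib follows; the model filename and the choice to take only the first detected face are assumptions:

```python
import dlib
import numpy as np

# Assumed path to dlib's trained 68-point shape model.
PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(PREDICTOR_PATH)

def get_landmarks(im):
    # The detector returns a list of face rectangles; take the first face.
    rects = detector(im, 1)
    # The predictor turns the image and a bounding box into 68 (x, y) points.
    return np.array([[p.x, p.y] for p in predictor(im, rects[0]).parts()])
```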
So at this point we have our two landmark matrices, each row giving the
coordinates of a particular facial feature (e.g. the 30th row gives the
coordinates of the tip of the nose). We’re now going to work out how to rotate, translate, and
scale the points of the first vector such that they fit as closely as possible
to the points in the second vector, the idea being that the same transformation
can be used to overlay the second image over the first.
To put it more mathematically, we seek \( T \), \( s \), and \( R \) such that

\[
\sum_i \left\lVert s R p_i^T + T - q_i^T \right\rVert^2
\]

is minimized, where \( R \) is an orthogonal 2x2 matrix, \( s \) is a
scalar, \( T \) is a 2-vector, and \( p_i \) and \( q_i \) are the rows
of the landmark matrices calculated above.
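This is the orthogonal Procrustes problem, which has a standard closed-form solution via the singular value decomposition. Here's a sketch of that solution; the function name and the choice to return a 2x3 affine matrix (usable with cv2.warpAffine) are my own:

```python
import numpy as np

def transformation_from_points(points1, points2):
    # Sketch of the orthogonal Procrustes solution; assumes points1 and
    # points2 are 68x2 arrays of corresponding landmarks.
    points1 = points1.astype(np.float64)
    points2 = points2.astype(np.float64)

    # 1. Translate both point sets so their centroids sit at the origin.
    c1, c2 = points1.mean(axis=0), points2.mean(axis=0)
    points1 -= c1
    points2 -= c2

    # 2. Normalise the scale of each point set.
    s1, s2 = points1.std(), points2.std()
    points1 /= s1
    points2 /= s2

    # 3. The rotation R minimising the residual comes from the SVD of the
    #    correlation matrix between the two normalised point sets.
    U, _, Vt = np.linalg.svd(points1.T @ points2)
    R = (U @ Vt).T

    # Assemble a 2x3 affine matrix mapping points1 onto points2:
    # x -> (s2/s1) * R @ x + (c2 - (s2/s1) * R @ c1).
    scale = s2 / s1
    return np.hstack([scale * R, (c2 - scale * R @ c1)[:, None]])
```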
This function attempts to change the colouring of im2 to match that of im1.
It does this by dividing im2 by a Gaussian blur of im2, and then
multiplying by a Gaussian blur of im1. The idea here is that of an RGB
scaling colour correction, but instead of a constant scale factor across
all of the image, each pixel has its own localised scale factor.
With this approach, differences in lighting between the two images can be
accounted for, to some degree. For example, if image 1 is lit from one side
but image 2 has uniform lighting, then the colour-corrected image 2 will
appear darker on the unlit side as well.
That said, this is a fairly crude solution to the problem, and an
appropriately sized Gaussian kernel is key. Too small, and facial features
from the first image will show up in the second. Too large, and the kernel
strays outside of the face area for the pixels being overlaid, and
discolouration occurs. Here a kernel size of 0.6 * the pupillary distance is used.
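A sketch of such a colour-correction function follows. The eye landmark index ranges follow the standard 68-point layout; the function name and the +1 guard against division by zero are assumptions:

```python
import cv2
import numpy as np

# In the 68-point model, points 36-41 outline the right eye, 42-47 the left.
RIGHT_EYE_POINTS = list(range(36, 42))
LEFT_EYE_POINTS = list(range(42, 48))

COLOUR_CORRECT_BLUR_FRAC = 0.6

def correct_colours(im1, im2, landmarks1):
    # Kernel size: 0.6 * the pupillary distance, taken between the mean
    # positions of the two eyes' landmark points.
    blur_amount = int(COLOUR_CORRECT_BLUR_FRAC * np.linalg.norm(
        landmarks1[LEFT_EYE_POINTS].mean(axis=0) -
        landmarks1[RIGHT_EYE_POINTS].mean(axis=0)))
    if blur_amount % 2 == 0:
        blur_amount += 1  # cv2.GaussianBlur requires an odd kernel size

    im1_blur = cv2.GaussianBlur(im1, (blur_amount, blur_amount), 0)
    im2_blur = cv2.GaussianBlur(im2, (blur_amount, blur_amount), 0)

    # Guard against division by zero in very dark regions.
    im2_blur = im2_blur.astype(np.float64) + 1.0

    # Per-pixel scaling: divide im2 by its own blur, multiply by im1's blur.
    return im2.astype(np.float64) * im1_blur.astype(np.float64) / im2_blur
```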
4. Blending features from the second image onto the first
A mask is used to select which parts of image 2 and which parts of image 1
should be shown in the final image:
Regions with value 1 (shown white here) correspond to areas where image 2
should show, and regions with value 0 (shown black here) correspond to areas
where image 1 should show. Values in between 0 and 1 correspond to a mixture
of image 1 and image 2.
Here’s the code to generate the above:
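(What follows is a sketch rather than a verbatim listing: the OVERLAY_POINTS index groups and the draw_convex_hull helper are assumptions, while the 11-pixel FEATHER_AMOUNT comes from the description below.)

```python
import cv2
import numpy as np

FEATHER_AMOUNT = 11  # feathering size in pixels; must be odd for GaussianBlur

# Assumed landmark groupings: one hull around the brows and eyes, one around
# the nose and mouth.
OVERLAY_POINTS = [
    list(range(17, 27)) + list(range(36, 48)),  # brows + eyes
    list(range(27, 35)) + list(range(48, 61)),  # nose + mouth
]

def draw_convex_hull(im, points, color):
    hull = cv2.convexHull(points.astype(np.int32))
    cv2.fillConvexPoly(im, hull, color=color)

def get_face_mask(im, landmarks):
    mask = np.zeros(im.shape[:2], dtype=np.float64)
    # Draw each feature group as a filled white convex polygon.
    for group in OVERLAY_POINTS:
        draw_convex_hull(mask, landmarks[group], color=1)
    # Replicate to three channels so the mask can multiply an RGB image.
    mask = np.stack([mask] * 3, axis=-1)
    # Feather outwards: blur and threshold to dilate the white region, then
    # blur again to leave a soft ramp between 1 and 0 at the edge.
    mask = (cv2.GaussianBlur(mask, (FEATHER_AMOUNT, FEATHER_AMOUNT), 0) > 0) * 1.0
    mask = cv2.GaussianBlur(mask, (FEATHER_AMOUNT, FEATHER_AMOUNT), 0)
    return mask
```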
Let’s break this down:
A routine get_face_mask() is defined to generate a mask for an image and a
landmark matrix. It draws two convex polygons in white: one surrounding the
eye area, and one surrounding the nose and mouth area. It then feathers the
edge of the mask outwards by 11 pixels. The feathering helps hide any
remaining discontinuities.
Such a face mask is generated for both images. The mask for the second image
is transformed into image 1’s coordinate space, using the same transformation
as in step 2.
The masks are then combined into one by taking an element-wise maximum.
Combining both masks ensures that the features from image 1 are covered up,
and that the features from image 2 show through.
Finally, the mask is applied to give the final image:
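As a sketch (warped_mask2 and warped_corrected_im2 stand for the mask and colour-corrected second image after warping in steps 2 and 3; the names are assumed):

```python
# Element-wise maximum of the two masks, then a per-pixel linear blend.
combined_mask = np.max([get_face_mask(im1, landmarks1), warped_mask2], axis=0)

output_im = im1 * (1.0 - combined_mask) + warped_corrected_im2 * combined_mask
```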