12 years in 15 seconds: Aligning and condensing a self-portrait time-lapse

The above is derived from the images in this timelapse video by Noah Kalina (not me!). Used with permission.

Introduction

Following on from my previous experiments with face alignment, I got to wondering whether the same techniques could be applied to photo-a-day time-lapse projects, such as Noah Kalina’s 12.5-year (and counting) epic.

The idea is to account for incidental variances to achieve a smooth, yet extremely fast-forwarded view of the subject. Specifically, the variances being eliminated are:

• Face position: The position, orientation and scale of the face within the frame.
• Lighting: Differences in illumination colour and/or white balance.
• Pose: Differences in facial pose, and lighting direction.

As usual, I’m attacking this problem in Python. The code is relatively short (~250 lines), although I’m using dlib, OpenCV and numpy to do the heavy lifting. Source code is available here.

Aligning

As a first step, let’s account for facial position by rotating, translating and scaling images to match the first. Here’s the code to do that:

Here, the get_landmarks() function uses dlib to extract facial landmark features:

…and orthogonal_procrustes() generates a transformation matrix which maps one set of landmark features onto another. This transformation is used by warp_im() to translate, rotate and scale images to line up with the first image. For more details refer to my Switching Eds post, which uses an identical approach in steps 1 and 2 to align the images.
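To make the landmark-matching step concrete, here is a minimal numpy-only sketch of a Procrustes-style fit. It is not the code from the repository: the name similarity_procrustes() and the choice to return a 2x3 matrix are my assumptions for illustration, and it does not guard against degenerate inputs (collinear landmarks, or a fit that comes out as a reflection).

```python
import numpy as np

def similarity_procrustes(points_src, points_dst):
    """Fit rotation + uniform scale + translation mapping points_src onto
    points_dst. Both inputs are (N, 2) arrays of corresponding landmarks.
    Returns a 2x3 matrix M such that dst ~= M[:, :2] @ src + M[:, 2].
    (Hypothetical helper, written for illustration only.)"""
    src = np.asarray(points_src, dtype=np.float64)
    dst = np.asarray(points_dst, dtype=np.float64)

    # Remove translation by centring both point sets on their centroids.
    c_src = src.mean(axis=0)
    c_dst = dst.mean(axis=0)
    src0 = src - c_src
    dst0 = dst - c_dst

    # Remove scale by normalising each set to unit Frobenius norm.
    s_src = np.linalg.norm(src0)
    s_dst = np.linalg.norm(dst0)
    src0 /= s_src
    dst0 /= s_dst

    # Orthogonal Procrustes: the rotation minimising ||src0 @ R.T - dst0||
    # comes from the SVD of the 2x2 cross-covariance matrix.
    U, _, Vt = np.linalg.svd(src0.T @ dst0)
    R = (U @ Vt).T

    # Recombine scale, rotation and translation into one affine matrix.
    scale = s_dst / s_src
    translation = c_dst - scale * (R @ c_src)
    return np.hstack([scale * R, translation[:, np.newaxis]])
```

A 2x3 matrix of this shape is what OpenCV’s cv2.warpAffine() takes, so a fit like this could feed an image-warping step directly.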

After correcting for face position, you get a video that looks like this:

There are still a few obvious discontinuities. One variance we can easily iron out is the overall change in colour on the face due to different lighting and/or white balance settings.

The correction works by computing a mask for each image:

…based on the convex hull of the landmark points. This is then multiplied by the image itself:

The sum of the pixels in the masked image is then divided by the sum of the values in the mask, to give an average colour for the face:

Each image’s RGB values are then scaled so that its average face colour matches that of the first image:

…where ref_color is the average face colour of the first image, saved from the first iteration.
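The three steps above (mask, average, rescale) fit in a few lines of numpy. This is a sketch under assumptions rather than the original listing: the function names are mine, the mask is taken as already computed (a float array that is 1 inside the face hull and 0 outside), and images are assumed to be float arrays so the scaling does not overflow. It also assumes no channel of the face average is zero.

```python
import numpy as np

def average_face_colour(im, mask):
    """Mean colour over the masked region.
    im:   (H, W, 3) float image.
    mask: (H, W) float mask, 1.0 inside the face and 0.0 outside."""
    masked_im = im * mask[:, :, np.newaxis]        # zero out non-face pixels
    return masked_im.sum(axis=(0, 1)) / mask.sum() # per-channel average

def match_face_colour(im, mask, ref_color):
    """Scale each channel so the face's average colour equals ref_color."""
    color = average_face_colour(im, mask)
    return im * (ref_color / color)                # broadcasts over H and W
```

Note the result is not clipped, so values may exceed the usual 0 to 255 range and would need clamping before being written out as an 8-bit image.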

Here’s the first 5 seconds with colour correction applied:

Speeding up

The above is looking pretty good, but there are still some issues causing a lack of smoothness:

• Minor variations in facial pose.
• Changes in lighting direction.

Given these perturbations are more or less random for each frame, the best we can do is select a subset of frames which is in some sense smooth.

To solve this I went the graph theory route:

Here I’ve split the video into 10-frame layers, with full connections from each layer to the next. The weight of each edge measures how different the two frames are, with the goal being to find the shortest path from Start to End; frames on the selected path are used in the output video. By doing this the total “frame difference” is minimized. Because the number of edges on any Start-to-End path is fixed by the graph structure, this is equivalent to minimizing the average frame difference.

The metric used for frame difference is the Euclidean norm of the difference between the pair of images, after being masked to the face area. Here’s the code to calculate weights, a dict of dicts where weights[n1][n2] gives the weight of the edge between nodes n1 and n2:
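As an illustration of that structure, here is one way the layered graph could be built. This is a sketch, not the repository code: build_weights(), the "start" and "end" node labels and the frames_per_layer parameter are names I have made up, and I am assuming each input image has already been multiplied by its face mask. It needs at least one frame.

```python
import numpy as np

def build_weights(masked_ims, frames_per_layer=10):
    """Build a dict of dicts describing the layered frame graph.
    masked_ims: list of equally sized float arrays, each already masked
    to the face area. Frame indices double as node names."""
    n = len(masked_ims)
    # Consecutive runs of frames form the layers.
    layers = [list(range(i, min(i + frames_per_layer, n)))
              for i in range(0, n, frames_per_layer)]

    # Zero-cost edges from the start node into the first layer.
    weights = {"start": {n1: 0.0 for n1 in layers[0]}}

    # Full connections between each layer and the next, weighted by the
    # Euclidean norm of the difference between the two masked frames.
    for cur, nxt in zip(layers, layers[1:]):
        for n1 in cur:
            weights[n1] = {
                n2: float(np.linalg.norm(masked_ims[n1] - masked_ims[n2]))
                for n2 in nxt
            }

    # Zero-cost edges from the last layer to the end node.
    for n1 in layers[-1]:
        weights[n1] = {"end": 0.0}
    weights["end"] = {}
    return weights
```

With 10 frames per layer, each frame has at most 10 outgoing edges, so the number of image comparisons grows linearly with the length of the video.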

And because the graph is in fact a directed acyclic graph, we can use a simplified version of Dijkstra’s Algorithm to find the shortest path:
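For reference, the textbook way to exploit acyclicity is to relax edges in topological order, which needs no priority queue. The sketch below does that over a weights dict of dicts as described above; it is my own illustration and may differ from the simplification used in the actual source. The function names are hypothetical, and it assumes the end node is reachable from the start node.

```python
def topological_order(weights, start):
    """Nodes reachable from start, ordered so every edge points forwards.
    Uses an iterative depth-first search to avoid deep recursion."""
    order, seen = [], {start}
    stack = [(start, iter(weights.get(start, {})))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child not in seen:
                seen.add(child)
                stack.append((child, iter(weights.get(child, {}))))
                break
        else:
            # All children finished: record the node in post-order.
            order.append(node)
            stack.pop()
    order.reverse()
    return order

def dag_shortest_path(weights, start, end):
    """Return (path, total_weight) for the cheapest start-to-end path."""
    dist = {start: 0.0}
    prev = {}
    for n1 in topological_order(weights, start):
        # dist[n1] is final here, as all of n1's predecessors came earlier.
        for n2, w in weights.get(n1, {}).items():
            d = dist[n1] + w
            if n2 not in dist or d < dist[n2]:
                dist[n2] = d
                prev[n2] = n1

    # Walk the predecessor links back from end to start.
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[end]
```

Each node and edge is visited a constant number of times, so the running time is linear in the size of the graph. The frames to keep are the nodes on the returned path, excluding the two endpoints.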

Which yields the final, smoother (although shorter) video:
