The above is derived from the images in this timelapse video by Noah Kalina (not me!). Used with permission.
Following on from my previous experiments with face alignment, I got to wondering whether the same techniques could be applied to photo-a-day timelapse projects, such as Noah Kalina’s 12.5 year (and counting) epic.
The idea is to account for incidental variances to achieve a smooth, yet extremely fast-forwarded view of the subject. Specifically, the variances being eliminated are:
As usual, I’m attacking this problem in Python. The code is relatively short (~250 lines), although I’m using dlib, OpenCV and numpy to do the heavy lifting. Source code is available here.
As a first step, let’s account for facial position by rotating, translating and scaling images to match the first. Here’s the code to do that:
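A rough sketch of that step (the file paths and loop structure here are placeholders; `get_landmarks()`, `orthogonal_procrustes()` and `warp_im()` are covered below):

```python
import glob

import cv2

# Rough sketch of the alignment loop. get_landmarks(), orthogonal_procrustes()
# and warp_im() are described below; the input/output paths are placeholders.
ref_landmarks = None
for i, fname in enumerate(sorted(glob.glob("frames/*.jpg"))):
    im = cv2.imread(fname)
    landmarks = get_landmarks(im)
    if ref_landmarks is None:
        # The first image's landmarks act as the alignment reference.
        ref_landmarks = landmarks
    M = orthogonal_procrustes(ref_landmarks, landmarks)
    warped = warp_im(im, M, im.shape)
    cv2.imwrite("aligned/{:05d}.jpg".format(i), warped)  # output dir assumed to exist
```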
Here, the get_landmarks() function uses dlib to extract facial landmark features:
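A minimal sketch, using dlib’s frontal face detector together with the standard 68-point shape predictor (the model path is a placeholder):

```python
import dlib
import numpy

# dlib's frontal face detector and 68-point landmark predictor. The model
# path below is a placeholder for wherever the standard predictor file lives.
PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(PREDICTOR_PATH)

def get_landmarks(im):
    # Detect the face, then return its 68 landmarks as a 68x2 matrix of
    # (x, y) coordinates. Assumes exactly one face per image.
    rects = detector(im, 1)
    return numpy.matrix([[p.x, p.y] for p in predictor(im, rects[0]).parts()])
```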
…and orthogonal_procrustes() generates a transformation matrix which maps one set of landmark features onto another. This transformation is used by warp_im() to translate, rotate and scale images to line up with the first image. For more details refer to my Switching Eds post, which uses an identical approach in steps 1 and 2 to align the images.
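Since the approach matches Switching Eds, the two functions look roughly like this (a sketch; the exact argument order and return format are assumptions):

```python
import cv2
import numpy

def orthogonal_procrustes(points1, points2):
    # Return an affine transformation matrix [s*R | T] such that s*R*p1 + T
    # approximates p2 in a least-squares sense (the classic orthogonal
    # Procrustes solution via SVD). Argument order here is an assumption.
    points1 = points1.astype(numpy.float64)
    points2 = points2.astype(numpy.float64)

    c1 = numpy.mean(points1, axis=0)
    c2 = numpy.mean(points2, axis=0)
    points1 -= c1
    points2 -= c2

    s1 = numpy.std(points1)
    s2 = numpy.std(points2)
    points1 /= s1
    points2 /= s2

    U, S, Vt = numpy.linalg.svd(points1.T * points2)
    R = (U * Vt).T

    return numpy.vstack([numpy.hstack(((s2 / s1) * R,
                                       c2.T - (s2 / s1) * R * c1.T)),
                         numpy.matrix([0., 0., 1.])])

def warp_im(im, M, dshape):
    # Warp im with the inverse of M (WARP_INVERSE_MAP), so an image whose
    # landmarks are points2 is mapped into the frame of points1.
    output_im = numpy.zeros(dshape, dtype=im.dtype)
    cv2.warpAffine(im, M[:2], (dshape[1], dshape[0]),
                   dst=output_im,
                   borderMode=cv2.BORDER_TRANSPARENT,
                   flags=cv2.WARP_INVERSE_MAP)
    return output_im
```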
After correcting for face position, you get a video that looks like this:
There are still a few obvious discontinuities. One variance we can easily iron out is the overall change in colour on the face due to different lighting and/or white balance settings.
The correction works by computing a mask for each image:
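A minimal sketch of the mask computation (the function name get_face_mask is my own):

```python
import cv2
import numpy

def get_face_mask(im, landmarks):
    # Fill the convex hull of the landmark points with 1.0 to mark the face
    # region, leaving the rest of the image at 0.0.
    mask = numpy.zeros(im.shape[:2], dtype=numpy.float64)
    hull = cv2.convexHull(numpy.array(landmarks, dtype=numpy.int32))
    cv2.fillConvexPoly(mask, hull, 1.0)
    # Repeat the mask across three channels so it can be multiplied with a
    # colour image directly.
    return numpy.repeat(mask[:, :, numpy.newaxis], 3, axis=2)
```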
…based on the convex hull of the landmark points. This is then multiplied by the image itself:
The sum of the pixels in the masked image is then divided by the sum of the values in the mask, to give an average colour for the face:
Images’ RGB values are then scaled such that their average face colour matches that of the first image:
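Putting the last three steps together, the correction might look something like this sketch (only ref_color is from the original; `im` is a frame and `mask` the face mask from above):

```python
import numpy

def correct_colours(im, mask, ref_color):
    # `im` is a BGR frame, `mask` the three-channel face mask from above, and
    # `ref_color` is None on the first iteration. (Names other than ref_color
    # are my own.)
    masked_im = im * mask
    # Sum of masked pixel values over sum of mask values: the average colour
    # of the face, per channel.
    mean_color = masked_im.sum(axis=(0, 1)) / mask.sum(axis=(0, 1))
    if ref_color is None:
        # The first face's average colour becomes the reference.
        ref_color = mean_color
    # Scale RGB values so this frame's average face colour matches it.
    corrected = im * (ref_color / mean_color)
    return numpy.clip(corrected, 0, 255).astype(numpy.uint8), ref_color
```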
…where ref_color is the colour of the first face, saved from the first iteration.
Here’s the first 5 seconds with colour correction applied:
The above is looking pretty good, but there are still some issues causing lack of smoothness:
Given these perturbations are more or less random for each frame, the best we can do is select a subset of frames which is in some sense smooth.
To solve this I went the graph theory route:
Here I’ve split the video into layers of 10 frames each, with full connections from each layer to the next. The weight of each edge measures how different the two frames are, and the goal is to find the shortest path from Start to End; frames on the selected path are used in the output video. By doing this the total “frame difference” is minimized, and because the path length is fixed by the graph structure, this is equivalent to minimizing the average frame difference.
The metric used for frame difference is the Euclidean norm of the difference between the pair of images, after being masked to the face area. Here’s the code to calculate weights, a dict of dicts where weights[n1][n2] gives the weight of the edge between nodes n1 and n2:
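A sketch under some assumptions: `ims` is the list of aligned, colour-corrected frames, `mask` is a shared face mask, and `START`/`END` are artificial nodes joined to the first and last layers by zero-weight edges:

```python
import numpy

LAYER_SIZE = 10

def frame_distance(im1, im2, mask):
    # Euclidean norm of the difference between two frames, restricted to the
    # face area. Using a single shared mask here is an assumption.
    diff = (im1.astype(numpy.float64) - im2.astype(numpy.float64)) * mask
    return numpy.linalg.norm(diff)

# Frames are grouped into consecutive layers of LAYER_SIZE; every frame in
# one layer is connected to every frame in the next. START and END are
# artificial nodes joined to the first and last layers by zero-weight edges.
num_layers = len(ims) // LAYER_SIZE
START, END = -1, len(ims)

weights = {START: {n: 0.0 for n in range(LAYER_SIZE)}}
for layer in range(num_layers - 1):
    for n1 in range(layer * LAYER_SIZE, (layer + 1) * LAYER_SIZE):
        weights[n1] = {n2: frame_distance(ims[n1], ims[n2], mask)
                       for n2 in range((layer + 1) * LAYER_SIZE,
                                       (layer + 2) * LAYER_SIZE)}
for n1 in range((num_layers - 1) * LAYER_SIZE, num_layers * LAYER_SIZE):
    weights[n1] = {END: 0.0}
```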
And because the graph is in fact a directed acyclic graph, we can use a simplified version of Dijkstra’s Algorithm to find the shortest path:
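A sketch of that shortest-path step, assuming the node numbering and weights dict from above (relaxing nodes in increasing index order, which is a topological order for this DAG):

```python
import math

def shortest_path(weights, start, end):
    # Edges always go from lower to higher node index, so visiting nodes in
    # sorted order is a topological order: relax each outgoing edge once,
    # with no need for Dijkstra's priority queue.
    dist = {start: 0.0}
    prev = {}
    for n1 in sorted(weights):
        if n1 not in dist:
            continue
        for n2, w in weights[n1].items():
            if dist[n1] + w < dist.get(n2, math.inf):
                dist[n2] = dist[n1] + w
                prev[n2] = n1
    # Walk back from END along the recorded predecessors.
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return list(reversed(path))[1:-1]  # drop the artificial START/END nodes

selected_frames = shortest_path(weights, START, END)
```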
This yields the final, smoother (although shorter) video: