Tuesday, 7 January 2014

Shards: Building an image out of triangles

This blog was originally supposed to be about small snippets or projects in Python. After more than a year of it being active and only two meagre posts about C++ I thought it might be time to get this going. So ...

Around eight months ago I stumbled on the blog post Genetic algorithms, Mona Lisa and JavaScript + Canvas by Jacob Seidelin where a simple program is presented to approximate an image by a finite number of polygons (triangles in his examples). The purpose of the post is to explore the use of genetic algorithms for this task and was actually written in response to an even earlier post by Roger Alsing - Genetic Programming: Evolution of Mona Lisa.

A genetic algorithm in this application is really just a stochastic optimisation algorithm. The population is a set of states - the vertices and colours of the polygons - which are the DNA. The perturbations to the colours and vertices are the mutations, and the fitness is measured by constructing the approximation and checking how close it is to the original image. Minor details can vary, but in essence, for each iteration the population is mutated - which can include mixing the DNA from different states - based on the fitness of each DNA strand to define a new population. Hopefully overtime this population will comprise only of DNA strands with high fitness.

After reading each of these posts, I thought it might be interesting to see how well a more standard optimisation algorithm would fare. Since the interim results of the algorithm are kind of fun to look at I thought it would be worth writing this up. All code is available on github.

Problem formulation

For simplicity we will consider approximating an image by $N$ overlaid triangles with fixed opacity $\alpha \in (0, 1]$.

Let $\mathcal{I}(\mathbf{x}) \colon \mathcal{D} \to \mathbb{U}^3$ denote the input colour image where $\mathcal{D} \subset \mathbb{R}^2$ is the image domain and $\mathbb{U}$ is the unit interval $[0,1]$. Let $\Delta$ denote the 6-tuple which specifies the geometry of a triangle so that $\Delta = (x_0, y_0, x_1, y_1, x_2, y_2)$. The mask defined by $\Delta$ is the function $M[\Delta](\mathbf{x}) \colon \mathcal{D} \to \{0, 1\}$ which returns $1$ when $\mathbf{x}$ is inside or on the boundary of the triangle, and $0$ otherwise. Given an initial approximate image $\mathcal{J}_0$, the image formed by overlaying the triangle $\Delta$ with colour $\mathbf{c} \in \mathbb{U}^3$ and opacity $\alpha$ is given by:

\[ \mathcal{J}_1(\mathbf{x}) = \mathbf{c} \alpha M[\Delta](\mathbf{x}) + \mathcal{J}_0(\mathbf{x})(1 - \alpha M[\Delta](\mathbf{x})) \]

Therefore, if $\mathcal{J}_n$ denotes the approximation after $n$ triangles then $\mathcal{J}_{n + 1}$ is similarly defined:

\[ \mathcal{J}_{n + 1}(\mathbf{x}) = \mathbf{c}_{n + 1} \alpha M[\Delta_{n + 1}](\mathbf{x}) + \mathcal{J}_n(\mathbf{x})(1 - \alpha M[\Delta_{n + 1}](\mathbf{x})) \]

which can be unwrapped so that the final approximation $\mathcal{J}_N$ - which is dependent on $\mathcal{T} = \{\Delta_i\}_{i = 1}^N$ and $\mathcal{C} = \{\mathbf{c}_i\}_{i = 1}^N$ - is given by:

\[ \mathcal{J}_N(\mathbf{x} | \mathcal{T}, \mathcal{C}) = \mathcal{J}_0(\mathbf{x})\prod_{i = 1}^N \left(1 - \alpha M[\Delta_i](\mathbf{x}) \right) + \sum_{i = 1}^N \mathbf{c}_i \alpha M[\Delta_i](\mathbf{x}) \prod_{j = i + 1}^N \left(1 - \alpha M[\Delta_j](\mathbf{x}) \right) \]

The advantage of the unwrapped equation is that it confirms that the triangles and their colours are not exchangeable - triangles in lower layers ($i$ closer to 1) are attenuated by triangles in the higher layers ($i$ closer to $N$).

With $\mathcal{J}_N$ defined, the least squares reconstruction error is:

\[ E(\mathcal{T}, \mathcal{C}) = \frac{1}{2} \sum_{\mathbf{x} \in \mathcal{D}} \left\| \mathcal{I}(\mathbf{x}) - \mathcal{J}_N(\mathbf{x} | \mathcal{T}, \mathcal{C}) \right\|^2 \]

Optimisation methods and results

With $E(\mathcal{T}, \mathcal{C})$ defined we are now free to choose the optimisation method to solve for $\mathcal{T}$ and $\mathcal{C}$.

For simplicity, an initial approach to consider is a greedy one where you solve for each triangle and colour one at a time.  That is, given $\mathcal{J}_n$, solve for $\Delta_{n + 1}$ and $\mathbf{c}_{n + 1}$ by minimising the reconstruction error of $\mathcal{J}_{n + 1}$:

\[ E_{n + 1}(\Delta_{n + 1}, \mathbf{c}_{n + 1}) = \frac{1}{2} \times \sum_{\mathbf{x} \in \mathcal{D}} \left\| \mathcal{I}(\mathbf{x}) - \left\{ \mathbf{c}_{n + 1} \alpha M[\Delta_{n + 1}](\mathbf{x}) + \mathcal{J}_n(\mathbf{x})(1 - \alpha M[\Delta_{n + 1}](\mathbf{x})) \right\} \right\|^2 \]

Since we are aiming to start with the simplest optimisation algorithm first, we can solve for $\Delta_{n + 1}$ and $\mathbf{c}_{n + 1}$ in alternation - fix $\Delta_{n + 1}$ and solve for $\mathbf{c}_{n + 1}$ then fix $\mathbf{c}_{n + 1}$ and solve for $\Delta_{n + 1}$. This is initialised by randomly sampling $\Delta_{n + 1}$. This whole process is repeated multiple times and the final result with the lowest reconstruction error is retained.

This is appealing - as a first approach - because when $\Delta_{n + 1}$ is fixed $\mathbf{c}_{n + 1}$ can be solved in closed form (if you ignore the constraint that $\mathbf{c} \in \mathbb{U}^3$ until afterwards). To solve for $\Delta_{n + 1}$, the Levenberg-Marquardt (LM) algorithm is an appropriate and simple choice because the reconstruction error is a sum of square terms. It is provided in scipy as scipy.optimize.leastsq which is a wrapper to MINPACK's LMDER. However, this is where we run into a problem - LM requires that $E_{n + 1}$ is differentiable in $\Delta_{n + 1}$ but it is not because of the hard (non-smooth in $\mathbf{x}$) mask $M[\Delta](\mathbf{x})$.

The solution is to use a soft mask which is commonly found in level set segmentation methods. Let $D[\Delta](\mathbf{x})$ represent the signed distance transform of the triangle which is negative inside the triangle and positive outside (Figure 1). The soft mask can then be defined as:

\[ \hat M[\Delta](\mathbf{x}) = \frac{1}{1 + e^{k D[\Delta](\mathbf{x})}} \]

which is a sigmoid mirrored about the $y$-axis. For $k > 0$ this means that $0.5 < \hat M[\Delta](\mathbf{x}) < 1.0$ inside the triangle and $0 < \hat M[\Delta](\mathbf{x}) < 0.5$ outside (Figure 1).

Figure 1: The signed distance transform for a triangle (left) and the soft mask (right).

Using this initial approach (GreedyAlt) with $\alpha = 0.3, k = 4.0$ and $10$ restarts per triangle, the reconstruction results can be seen in Figure 2. The reconstruction errors from left to right for $N = 50$ are $70.56$, $263.19$, $335.76$, $243.46$, and $136.94$ respectively. For $N = 100$ they are $33.66$, $150.52$, $190.70$, $129.07$, and $75.36$ respectively. On an Intel Core i7-4702MQ, the average running time for $N = 50$ was around 25 minutes and for $N = 100$ was around 40 minutes.

Figure 2: Results of GreedyAlt. First row is ground-truth, second row is with $N = 50$, third row is with $N = 100$.

These initial results are quite good but we can still do better. Instead of using alternation we can solve for $\Delta_{n + 1}$ and $\mathbf{c}_{n + 1}$ jointly (GreedyJoint). To ensure that $\mathbf{c} \in \mathbb{U}^3$, it is convenient to define it as the output of a sigmoid and some other parameter $\mathbf{t} \in \mathbb{R}^3$ so that it is strictly between $0$ and $1$. These results are shown in Figure 3. The reconstruction errors from left to right for $N = 50$ are $64.42$, $248.24$, $311.59$, $221.77$, and $128.55$ respectively. For $N = 100$ they are $30.35$, $136.15$, $161.13$, $97.91$, and $60.76$ respectively.

Figure 3: Results of GreedyJoint. First row is ground-truth, second row is with $N = 50$, third row is with $N = 100$.
The addition of each triangle up to $N = 500$ for the Mona Lisa is shown below (reconstruction error of $6.46$):


video

The reconstruction error is lower for GreedyJoint because by solving $\Delta_{n + 1}$ and $\mathbf{c}_{n + 1}$ jointly both the position and colour are updated at each iteration of LM.

But can we squeeze anything more out? Yes! Given $\mathcal{T}$ and $\mathcal{C}$ from GreedyJoint we can optimise $E(\mathcal{T}, \mathcal{C})$ with all variables jointly (AllJoint). That is, we optimise the position and colours of all triangles simultaneously. The reconstruction errors from left to right for $N = 50$ are: $48.96$, $220.02$, $280.47$, $178.04$, and $128.55$ (unchanged from GreedyJoint) respectively. Limiting the maximum function evaluations to 200, the average time for AllJoint was just under 40 minutes. These results are shown along with the output from Seidelin in Figure 4.

Figure 4: Final comparisons. First row is ground-truth, second row is with AllJoint with $N = 50$, third row is output from Seidelin.

Implementation

The Python code which implements the solvers and generates the output is available on GitHub. All of the details can be gleaned from the source code, but the basic project organisation is as follows:
  • shard.py: Implements Shard which provides $\hat M[\Delta]$ and its derivatives.
  • solve.py: Implements fit_shard and colour_shard for GreedyAlt, fit_and_colour_shard for GreedyJoint, and fit_and_colour_shards for AllJoint.
  • reconstruct.py: Implements ShardReconstructor which manages the state for GreedyAlt and GreedyJoint.
  • solve_multiple_shards.py: Main program to solve with GreedyAlt and GreedyJoint.
  • refine_multiple_shards_joint.py Main program to solve with AllJoint.
  • colate_progress.py: Helper script which uses ffmpeg to join progress figures into a movie.
Example commands can be found in misc/commands.txt and misc/run_*.bat.

So what? What's next?

So in the end, I think the main point of this post was to just show that deterministic non-linear optimisation methods like Levenberg-Marquardt are not too difficult to use and can do really quite well. Furthermore, joint optimisation of all variables can allow you to achieve even better reconstruction error, but at the "cost" of a more involved implementation.

So where to next? Well, the whole process could be sped up enormously by making using of the sparsity of the problem. In the current implementation all residuals and Jacobians are calculated densely. This simplified the solver implementation significantly and meant that I could use scipy.optimize.leastsq. However, it is inefficient because for small LM updates many of the residuals are essentially unchanged. Another performance bottleneck is the calculation of some of the Jacobians by finite differences. This was done purely for initial implementation simplicity but it is slow.

Part of me would like to see this run on some big images and at a decent speed so maybe at some point in the future I will come back to it. In the mean time ...