Example of Denoising Dirty Documents with AutoEncoders

Samuel Chemama
4 min read · Dec 26, 2020

An autoencoder (AE) is a particular kind of neural network used for unsupervised tasks. Autoencoders have many applications, such as fraud detection, data compression, and image denoising.

GENERATIVE MODEL

A generative model is a model that tries to learn the underlying distribution of the data.

For example, in the image below, the distribution of x is modeled using 3 Gaussian distributions. Knowing the distribution, we can generate new samples.
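As a minimal sketch of this idea in Python (the mixture weights, means, and standard deviations below are made up for illustration), here is how new samples can be drawn once the distribution is known:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture of 3 Gaussians: weights, means, standard deviations.
weights = np.array([0.3, 0.5, 0.2])
means = np.array([-2.0, 0.0, 3.0])
stds = np.array([0.5, 1.0, 0.8])

# To generate a new sample: first pick a component according to the weights,
# then draw from the corresponding Gaussian.
components = rng.choice(3, size=1000, p=weights)
samples = rng.normal(means[components], stds[components])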

How does an AE work?

We can represent it like this:

x is the input; here, it is the pixels of the input image. Z is called the latent space: a space of smaller dimension than the input.

For example, if the shape of the input image is (28, 28), there are 784 pixels, and Z can be a space of only 32 dimensions. The goal of the latent space is to represent the input as well as possible, but in a smaller dimension.

We can compare this to PCA, which extracts the most relevant information from the data. To build Z, we use a neural network (or a CNN) with a decreasing number of units per layer. This step is called the ENCODER.

After that, we have to do the opposite: from the latent space Z, we want to reconstruct the original input. This is the DECODER's step, so x̂ has the same shape as the input x.
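A minimal Keras sketch of both steps for the (28, 28) example above (the intermediate layer size of 128 is an assumption; only 784 and 32 come from the example):

from tensorflow.keras import layers, models

# ENCODER: decreasing number of units per layer, down to the latent space Z.
inputs = layers.Input(shape=(784,))
h = layers.Dense(128, activation='relu')(inputs)
z = layers.Dense(32, activation='relu')(h)  # latent space Z, dimension 32

# DECODER: the mirror image, reconstructing x̂ with the same shape as x.
h = layers.Dense(128, activation='relu')(z)
outputs = layers.Dense(784, activation='sigmoid')(h)  # pixel values in [0, 1]

autoencoder = models.Model(inputs, outputs)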

When we fit the model, it learns the optimal weights to obtain the best Z (the one that best represents the data) and to minimize the loss between the input and the reconstructed image. If the model were perfect, the output image would be exactly the same as the input image.
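Continuing the sketch above (x_train is assumed to be an array of flattened images of shape (n, 784) with pixel values in [0, 1]; the number of epochs and the batch size are arbitrary):

# The input is also the target: the model learns to reconstruct x from Z
# while minimising the pixel-wise loss between x and x̂.
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(x_train, x_train, epochs=20, batch_size=128)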

Thanks to the encoder step, the model learns and keeps only the most relevant information: this is how it erases the noise.

Example

To give an example, we will work on the following competition: https://www.kaggle.com/c/denoising-dirty-documents

The goal is to denoise images. For this competition, we work as in a supervised setting. We have 3 datasets:

1) A dataset of dirty documents (called X_train)

2) The same documents but without noise (called y_train)

3) A test dataset to see how our model performs (called X_test)

We process the images of each folder, as sketched below.
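A possible preprocessing sketch (the folder names, the PIL-based loading, and the common (540, 420) target size are assumptions; adapt them to your copy of the competition data):

import os
import numpy as np
from PIL import Image

# Load every image in a folder in grayscale, resize it to a common shape,
# and scale pixel values to [0, 1].
def load_folder(folder, size=(420, 540)):  # PIL expects (width, height)
    images = []
    for name in sorted(os.listdir(folder)):
        img = Image.open(os.path.join(folder, name)).convert('L').resize(size)
        images.append(np.array(img, dtype='float32') / 255.0)
    return np.expand_dims(np.array(images), axis=-1)  # shape (n, 540, 420, 1)

X_train = load_folder('train')          # dirty documents
y_train = load_folder('train_cleaned')  # the same documents without noise
X_test  = load_folder('test')           # test documents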

Now we build a model that takes a dirty image as input and the corresponding clean image as target. The model will learn a good latent space and reconstruct an output image as similar as possible to the clean image. The latent space keeps only the relevant information from the input image, so it won't keep the noise.

The input space has the shape of the input image, here (540, 420, 1), and as we saw above, the output space must be the same size. For the latent space Z, we choose a (270, 210, 64) space (with just one MaxPooling layer to halve the spatial dimensions).

As we can see, we build a very simple CNN model. We start from an input image of shape (540, 420, 1), arrive at a latent space of shape (270, 210, 64), and from it reconstruct an image of shape (540, 420, 1): it's a symmetric model.
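A minimal Keras sketch of such a symmetric model (kernel sizes and the single convolution per side are assumptions; only the shapes (540, 420, 1) and (270, 210, 64) come from the description above):

from tensorflow.keras import layers, models

inputs = layers.Input(shape=(540, 420, 1))

# ENCODER: one convolution with 64 filters, then one MaxPooling layer,
# giving a latent space of shape (270, 210, 64).
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
z = layers.MaxPooling2D((2, 2), padding='same')(x)

# DECODER: the mirror image, upsampling back to (540, 420, 1).
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(z)
x = layers.UpSampling2D((2, 2))(x)
outputs = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

denoiser = models.Model(inputs, outputs)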

As we saw previously, we want the output image to look as much as possible like the clean target image. To do this, we minimize the mean squared error (MSE) loss, because each pixel is a continuous variable.
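Training then pairs each dirty image with its clean target (the number of epochs and the batch size are arbitrary choices):

# MSE penalises the pixel-wise difference between the reconstructed image
# and the clean target, since each pixel is a continuous value in [0, 1].
denoiser.compile(optimizer='adam', loss='mse')
denoiser.fit(X_train, y_train, epochs=30, batch_size=8)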

RESULTS

After training, we can apply our model to the test set and generate new images without noise.
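A possible inference sketch (the output folder and file naming are assumptions):

import os
import numpy as np
from PIL import Image

# Denoise the test images and save the reconstructions as PNG files.
preds = denoiser.predict(X_test)  # shape (n, 540, 420, 1)
os.makedirs('cleaned', exist_ok=True)
for i, p in enumerate(preds):
    Image.fromarray((p[:, :, 0] * 255).astype('uint8')).save(f'cleaned/{i}.png')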

Wow!! Impressive results: the model can reconstruct the images without noise! Note: the third image is blurry, but we could do better with a more powerful model.
