How to safely blur or pixelize text and images (an experiment)
A lot of hype built up in late December, when reports came out about a new tool, Depix, that could supposedly recover text from pixelized screenshots. Essentially, if you used some sort of pixelation to hide sensitive information in a screenshot, like your password, this tool claimed to be able to undo that process (at least if you used a linear box filter) and recover that information.
The screenshots provided as proof certainly seemed strong:
The recovered image looks pretty readable, and if the tool could deliver on its promises, it could be a game changer. Here at CyberNews, we often pixelize sensitive data, like PII or passwords, when we discuss a data breach or investigation.
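For context, a “linear box filter” pixelization replaces each square block of pixels with the average of that block. The Python sketch below is a minimal, hypothetical illustration on a grayscale image stored as a list of rows; the function name `pixelize` and the toy image are our own, not code from Depix. It shows why the process is lossy: two blocks with different contents can average to exactly the same value.

```python
def pixelize(img, block):
    """Box-filter pixelization of a grayscale image (hypothetical helper).

    `img` is a list of rows of ints (0-255). Each block x block tile is
    replaced by the rounded mean of its pixels; tiles at the right and
    bottom edges may be smaller if the size isn't a multiple of `block`.
    Returns a new image and leaves the input untouched.
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for top in range(0, height, block):
        for left in range(0, width, block):
            ys = range(top, min(top + block, height))
            xs = range(left, min(left + block, width))
            mean = round(sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs)))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


if __name__ == "__main__":
    image = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [0, 255, 255, 0],  # the bottom-left and bottom-right tiles differ...
        [255, 0, 0, 255],
    ]
    for row in pixelize(image, 2):
        print(row)  # ...but both end up as the same flat gray (128)
```

Once two different patterns collapse to the same average, nothing in that block alone tells you which one was there. Depix’s approach is to guess candidate text and check whether it pixelizes to the same values.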
Ever curious, we decided to see what kinds or levels of pixelization would be strong enough to beat the tool.
Going further, since we also pixelize faces (for example, in screenshots of leaked passports or other IDs), we wanted to see what kind of deblurring or depixelization would uncover the real faces. Similar to the text depixelization, we set up a few tests, hoping to use the results to make our pixelization even stronger.
Unfortunately, it didn’t really get that far. Spoiler alert: none of these tools or methods were able to uncover any text or images in any useful way.
Nonetheless, as is often the case in Hollywood family movies, the real lesson was in the journey all along. With that, we’ll show you how to future-proof your censored screenshots so that no one will be able to uncover that sensitive information.
Testing the methods
In this experiment, I wanted to look at multiple ways to decensor pixelized or blurred text and images. Of course, there’ll be a chorus of people clamoring that it’s very difficult, if not impossible, to uncover censored images.
And the reasoning is pretty solid: text has a limited number of options (letters, numbers, and special characters) in a large, but limited, number of arrangements, while images can be pretty much anything in pretty much any arrangement.
And, naturally, we think the same. Nonetheless, to be on the safe side, we’re looking at some simple “solutions” that have been proposed for decensoring these images.
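To put rough numbers on the “large, but limited” point, here is a small Python calculation. It assumes text is drawn from the 94 printable, non-space ASCII characters (letters, digits, and punctuation) and compares that with a tiny 32 × 32 grayscale image with 256 possible shades per pixel. Both the character set and the image size are illustrative assumptions, not figures from our tests.

```python
import string

# Assumed character set: upper/lowercase letters, digits, and punctuation
# (string.punctuation holds the 32 ASCII symbols), 94 characters in total.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def text_keyspace(length, alphabet=ALPHABET):
    """Number of distinct strings of exactly `length` characters."""
    return len(alphabet) ** length


def image_keyspace(width, height, shades=256):
    """Number of distinct grayscale images of the given size."""
    return shades ** (width * height)


if __name__ == "__main__":
    print(f"8-character strings: {text_keyspace(8):,}")
    # The image count is far too large to print usefully, so show its digit count.
    print(f"32x32 grayscale images: a {len(str(image_keyspace(32, 32)))}-digit number")
```

Even the text figure is huge, but it is at least enumerable in principle, and a box filter lets a tool compare small pieces at a time. Nothing comparable narrows down an arbitrary face.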
For the images, we tried the method proposed by Somdev Sangwan to uncover pixelized images via the blur function. For text, we used Depix by Sipke Mellema.
Pixelized faces, once blurred, may not reveal much
Sangwan’s method uses the Gaussian blur tool in GIMP, although Photoshop or other image editing software would work just as well.
First, you need a pixelized image. Then, you blur it until the squares disappear. After that, you roughly invert the image to bring out the edges, apply a Soft Light blend, and end up with a less blurry image. Finally, if you still can’t quite make out the person, load that image into a reverse image search engine like Yandex (arguably the best one) and look for a match.
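Only the first step of that process, blurring until the squares disappear, is easy to pin down in code. The sketch below is a hypothetical, pure-Python separable Gaussian blur on a grayscale image stored as a list of rows; it does not reproduce the invert and Soft Light steps, and the sigma value and helper names are our own choices. It measures the largest jump between neighboring pixels to show that the hard block edges get smoothed out.

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian weights out to about 3 sigma, normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_line(values, kernel):
    """Convolve one row or column with the kernel, clamping at the edges."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [
        sum(w * values[min(max(i + k - radius, 0), last)] for k, w in enumerate(kernel))
        for i in range(len(values))
    ]


def gaussian_blur(img, sigma):
    """Separable Gaussian blur: blur every row, then every column."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_line(row, kernel) for row in img]
    cols = [blur_line(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def max_step(img):
    """Largest difference between horizontally adjacent pixels."""
    return max(abs(row[x + 1] - row[x]) for row in img for x in range(len(row) - 1))


if __name__ == "__main__":
    # A toy 8x8 "pixelized" image: two flat 4-pixel-wide blocks side by side.
    mosaic = [[40] * 4 + [200] * 4 for _ in range(8)]
    smooth = gaussian_blur(mosaic, sigma=1.5)
    print("max step before:", max_step(mosaic))
    print("max step after: ", round(max_step(smooth), 1))
```

Note that blurring only spreads the averaged block values around; it cannot bring back detail that the pixelization already discarded, which matches what we saw with the faces below.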
Theoretically, that sounds great. Let’s see how it actually worked. I tried it on two faces, starting with Barack Obama’s.
In the top left, you can see the pixelized version, and in the bottom right, the final result of this process. (Don’t ask me what the material difference is between the top-right blur and the bottom-right blur.)
When I plugged it into Yandex’s reverse image search, I got a hit:
OK, not too bad, right? But that image of Obama is so popular, you probably could’ve guessed it without any depixelization. Beyond that, the picture was a bit on the higher-resolution side.
For us, the more likely scenario is a lower-resolution image of a little-known person. Enter: me.
I ran the same process on my own face, and here are the results:
Immediately, you can see a lot more noise on my face from the initial pixelization. Both of the blurring steps look similar again. Uploading that to Yandex, we get this:
Not really that good. Let’s see how the text experiment goes.
Depixing a small, inconsequential part
Using Mellema’s Python-powered tool involves the following steps:
- Cut out the pixelized text you want to Depix as a single rectangle
- Create a De Bruijn sequence and take a screenshot of it. A De Bruijn sequence is the shortest string in which every possible combination of a given length, drawn from a given set of characters, appears exactly once. Ideally, you should take the screenshot of this sequence with the same tool and settings used to create the pixelized password. So, for our experiment, I used Notepad for both, keeping the same font size.
- Run the Depix script
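Generating the sequence itself is easy to script. The sketch below uses the standard recursive construction based on Lyndon words to build a sequence containing every length-`n` combination of an assumed character set exactly once, then appends the first `n - 1` characters so that the wrap-around combinations also appear when the text is rendered as a single line for the screenshot. The function names and the digits-only alphabet are illustrative assumptions; this is not code from the Depix repository.

```python
def de_bruijn(alphabet, n):
    """Cyclic De Bruijn sequence B(k, n) over `alphabet` (k = len(alphabet)).

    Every string of length n over the alphabet appears exactly once as a
    substring when the result is read cyclically.
    """
    k = len(alphabet)
    a = [0] * (n + 1)  # working buffer; only indices 0..n are used
    sequence = []

    def db(t, p):
        if t > n:
            # Emit the current Lyndon word if its length p divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def linear_de_bruijn(alphabet, n):
    """Non-cyclic version: repeat the first n-1 characters at the end."""
    cyclic = de_bruijn(alphabet, n)
    return cyclic + cyclic[: n - 1]


if __name__ == "__main__":
    text = linear_de_bruijn("0123456789", 2)
    print(len(text), text)  # every two-digit pair appears once; paste into the editor
```

The sequence grows as k to the power n, so it is only practical for small character sets and short combinations.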
Pretty simple, right? To understand how it works, think of the difference between encryption and hashing. When you encrypt something, like a password, it is meant to be decrypted. Hashing, however, is a one-way mathematical function: it’s never meant to be undone. Just as blended fruit can never be unblended, hashed information can never be “dehashed.”
So, the way researchers and others “dehash” a password, for example, is by hashing a wide variety of common passwords until one produces the same hashed outcome. Let’s say some hashed password has this value:
5f4dcc3b5aa765d61d8327deb882cf99
And I know that it was hashed using a particular algorithm, in this case MD5. I’ll hash a large library of passwords and see which one matches the hashed value above:
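That dictionary approach takes only a few lines of Python with the standard `hashlib` module. The five-word list below is a hypothetical stand-in for the large password lists used in practice; the target is the MD5 value above, which happens to be the hash of “password”.

```python
import hashlib


def crack_md5(target_hex, wordlist):
    """Return the first word whose MD5 digest matches target_hex, or None.

    This never "reverses" the hash; it just hashes guesses and compares.
    """
    target_hex = target_hex.lower()
    for candidate in wordlist:
        if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target_hex:
            return candidate
    return None


if __name__ == "__main__":
    target = "5f4dcc3b5aa765d61d8327deb882cf99"
    words = ["123456", "qwerty", "letmein", "password", "dragon"]  # hypothetical list
    print(crack_md5(target, words))  # prints: password
```

If the real password isn’t in the list, the search simply comes back empty, which is why this only works against common or predictable passwords.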
Depix works in much the same way. This time, it takes the screenshot of the De Bruijn sequence, pixelizes blocks of it, and compares them against the pixelized text to find matching patterns.
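Here is a deliberately tiny model of that idea, not Depix’s actual code: render each candidate character as a hypothetical 4 × 4 bitmap, pixelize it with the same box filter, and keep the candidates whose mosaic is closest to the pixelized target. The made-up glyphs for “1” and “l” show why results can be ambiguous: both average down to exactly the same blocks.

```python
def pixelize(img, block):
    """Replace each block x block tile with the rounded mean of its pixels."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for top in range(0, height, block):
        for left in range(0, width, block):
            ys = range(top, min(top + block, height))
            xs = range(left, min(left + block, width))
            mean = round(sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs)))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def mosaic_distance(a, b):
    """Sum of absolute pixel differences between two same-sized images."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def best_matches(pixelized_target, candidates, block):
    """Labels of all candidates whose pixelized bitmap is closest to the target."""
    scores = {label: mosaic_distance(pixelize(bitmap, block), pixelized_target)
              for label, bitmap in candidates.items()}
    best = min(scores.values())
    return sorted(label for label, score in scores.items() if score == best)


# Hypothetical 4x4 glyph bitmaps (255 = ink, 0 = background).
GLYPHS = {
    "1": [[0, 255, 0, 0] for _ in range(4)],
    "l": [[255, 0, 0, 0] for _ in range(4)],
    "7": [[255, 255, 255, 255],
          [0, 0, 0, 255],
          [0, 0, 255, 0],
          [0, 255, 0, 0]],
}

if __name__ == "__main__":
    print(best_matches(pixelize(GLYPHS["7"], 2), GLYPHS, 2))  # ['7']
    print(best_matches(pixelize(GLYPHS["1"], 2), GLYPHS, 2))  # ['1', 'l']
```

The real tool has to handle full screenshots and many more candidates, but collisions like this one, plus any mismatch in font, size, or block alignment, are the kind of thing that can turn the output into noise.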
Now, with all that said and done, how did it do?
Terribly.
This was my original password and the pixelized version of just the password, using Notepad:
After running Depix, this is what I got:
Nice. Reading through the discussion on GitHub, it seems many users had similar problems:
Nonetheless, I read through one of the proposed solutions, in which Mellema advised using the unpixelized version of the password in place of the De Bruijn sequence as a way to prove that Depix works. That didn’t make much sense to me, since an attacker wouldn’t have the unpixelized password in a real scenario, but I did it anyway. This is the result:
Hmm.
Much ado about nothing
So, what have we learnt today? Pretty much, that current methods for deblurring or depixing (or, in general, decensoring) censored images or text are not really effective in any practical sense.
However, keep in mind that this is the current state of things. Will that change in the future? Almost certainly; it’s just a matter of when.
For now, we can make some recommendations based on how image decensoring tools work: destroy as much of the original information as possible.
A few ways to do that:
- The most recommended option is a simple, solid, pitch-black box over the information you want to censor, whether it’s text or an image. It can’t be transparent in any way, so opacity should be 100%. This is why redacted information in government documents is typically blacked out.
- Pixelize down to a very low resolution, using large blocks. If you prefer the aesthetics of pixelization to a black, CIA-style redaction box, it’s probably best to pixelize a few times.
- Avoid blurring, swirling, or applying any other mildly destructive process to your text or images, since such processes may be reversible. When in doubt, use a solid black box.
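Why a solid box beats blurring can be shown with the same kind of toy grayscale image (a list of rows of pixel values). In the hypothetical `black_box` helper below, every pixel in the rectangle is set to 0, so two images that differ only inside the box come out identical: there is nothing left for any tool to reverse. In a real image editor or library, the equivalent is a filled rectangle at 100% opacity; the helper name and coordinates here are illustrative.

```python
def black_box(img, top, left, height, width):
    """Return a copy of a grayscale image with a solid black rectangle.

    Every pixel inside the rectangle becomes 0, regardless of what was there.
    """
    out = [row[:] for row in img]
    for y in range(top, min(top + height, len(img))):
        for x in range(left, min(left + width, len(img[0]))):
            out[y][x] = 0
    return out


if __name__ == "__main__":
    # Two 3x4 images that differ only in the middle two columns (the "secret").
    secret_a = [[9, 10, 250, 9], [9, 200, 30, 9], [9, 90, 60, 9]]
    secret_b = [[9, 77, 11, 9], [9, 5, 180, 9], [9, 44, 222, 9]]
    redacted_a = black_box(secret_a, 0, 1, 3, 2)
    redacted_b = black_box(secret_b, 0, 1, 3, 2)
    print(redacted_a == redacted_b)  # True: the redacted images are identical
```

Compare this with pixelization or blurring, where the output still depends on what was underneath; that dependence is exactly what decensoring tools try to exploit.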