Some raw data in the Epstein Files survived DOJ redactions

The Department of Justice (DOJ) appears to have failed to redact all Epstein files completely: some blacked-out documents also contain raw email data that enables the complete reconstruction of email attachments. Computer forensic experts are already working to uncover hidden pieces.

Mahmoud Al-Qudsi, the founder of NeoSmart Technologies, has shared a method and tools for reconstructing PDF documents from Epstein emails.

This is possible because the DOJ released Epstein emails in their raw code, with attachments preserved as base64-encoded data. However, there is also a challenge – the files were scanned from printouts, and text recognition tools fail to recognize some characters, making reconstruction labor-intensive.

“It’s safe to say that Pam Bondi’s DOJ did not put its best and brightest on this undertaking,” said Al-Qudsi in a blog post, which also mentions other blunders, such as forgetting to redact passwords and other credentials.

Don't miss our latest stories on Google News. Add us as your Preferred Source on Google

Add us as your Preferred Source on Google.

Accidental find

Al-Qudsi explains that the method to unredact some files was discovered by accident. The expert was searching the Epstein Files for some emails for another story when they came across “a curious artifact.”

Some Epstein emails were unusually long, containing pages of base64 code. Base64 is the way the SMTP email protocol encodes attachments for transmission over the wire.

Essentially, all images, PDFs, or other attachments are sent within the body of the original email as scrambled text, and the DOJ released this raw data along with the emails.

“The unlucky intern that was assigned to the documents in question didn’t realize the significance of what they were looking at and didn’t see the point in censoring seemingly meaningless page after page of hex content,” Al-Qudsi said.

Challenge: DOJ used a font with nearly identical “1” and “l” characters

Converting the exposed base64 back to the original attachments is not straightforward. Before the release, the documents were printed, then scanned again.

“The real problem is that the text is rendered in possibly the worst typeface for the job at hand: Courier New,” Al-Qudsi explains.

OCR tools struggle to correctly recognize the symbols due to low-quality scans and the similarity between the “1” and “l” characters.

“In the Epstein PDFs released by the DOJ, we only have low-quality JPEG scans at a fairly small point size,” Al-Qudsi said.

Al-Quisi posted a “nerdsnipe” challenge to decode the discovered email attachment using brute-force or other methods. The community member on Hacker News already shared the first page of the recovered document – it appears to be an invitation to a public event. “Nerds” used AI assistance to develop a script.

While an impressive discovery, this method’s value is limited. Searching the released Epstein Files library for “base64” currently returns only 173 results, and even fewer of them contain actual attachments.

The label for PDF attachments, “application/pdf,” returns 70 search results, but most of them do not contain any attachments. There may be additional attachments that aren’t picked up by the search.

This method will not help to recreate redacted emails, chats, and the majority of other Epstein files. And emails with raw code exposed seem to be isolated occurrences.

Despite that, the discovery and challenge itself attracted massive interest among cybersecurity professionals on Reddit.

“There needs to be an ‘Epstein Files’ Village at the next DEFCON, where everyone can work on unmasking perpetrators,” one of the cyber pros suggested.

Unlock more exclusive Cybernews content on YouTube

Cyber pros found a way to unredact some Epstein PDFs: emails contain raw code

More from Cybernews

Accidental find

Challenge: DOJ used a font with nearly identical “1” and “l” characters