Researchers Invent Method to Hide Info in Plain Text

Columbia scientists have devised a new technique for embedding secret messages in documents. The method is the first of its kind to be document-independent and to preserve the hidden message when a document is printed or converted.

Columbia Engineering announced in a statement last week that a team of its researchers has invented a new method for hiding information in plain text. The method, named FontCode, embeds secret information in ordinary text by imperceptibly altering the shapes of the letters.

Font perturbations

FontCode works by applying small “font perturbations” to letters. The perturbations encode a hidden message that a decoder can later recover. One of the major advantages of the method is that it works with most fonts and document types, whether text or image, and preserves the hidden data even in printed or converted files.

FontCode is compatible with most word processing, editing and drawing programs. Because every letter can carry one of these minute perturbations, the length of a secret message is limited only by the amount of text available.

“Changing any letter, punctuation mark, or symbol into a slightly different form allows you to change the meaning of the document,” said Chang Xiao, a PhD student and co-creator of the code. “This hidden information, though not visible to humans, is machine-readable just as barcodes and QR codes are instantly readable by computers. However, unlike barcodes and QR codes, FontCode doesn’t mar the visual aesthetics of the printed material, and its presence can remain secret.”

Although it is not the first method to hide messages in text, FontCode is the first to be “document-independent and to retain the secret information even when a document or an image with text (PNG, JPG) is printed or converted to another file type.”

Beyond espionage

“While there are obvious applications for espionage, we think FontCode has even more practical uses for companies wanting to prevent document tampering or protect copyrights, and for retailers and artists wanting to embed QR codes and other metadata without altering the look or layout of a document,” said Changxi Zheng, associate professor of computer science and co-creator of the code.

To make decoding robust to recognition errors, the researchers made use of the 1700-year-old Chinese Remainder Theorem. The theorem, first stated by the Chinese mathematician Sunzi (Sun Tzu), guarantees that a system of simultaneous linear congruences with pairwise coprime moduli has a solution that is unique modulo the product of the moduli.

The team uses this ancient result from number theory to recover the original message even when not all of the perturbed letters can be correctly recognized. Their studies found that messages could be accurately decoded even when 25% of the letter perturbations were not identifiable.
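
To make the idea concrete, here is a minimal Python sketch of redundancy built on the Chinese Remainder Theorem. It is an illustration under stated assumptions, not FontCode's actual encoder: the moduli, the function names and the sample message are all invented for the example. A number is stored as five remainders, and any three of them are enough to rebuild it.

```python
from math import prod

# Hypothetical, pairwise coprime moduli chosen only for this illustration.
MODULI = (7, 11, 13, 15, 17)
# Number of remainders that must survive for decoding to succeed.
NEEDED = 3
# Messages must be smaller than the product of the NEEDED smallest moduli,
# so that any NEEDED remainders pin the message down uniquely.
LIMIT = prod(sorted(MODULI)[:NEEDED])  # 7 * 11 * 13 = 1001


def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise coprime moduli m_i."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(a, -1, m) is the modular inverse of a (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % total


def encode(message):
    """Turn one integer into five redundant remainders."""
    if not 0 <= message < LIMIT:
        raise ValueError(f"message must be in [0, {LIMIT})")
    return [message % m for m in MODULI]


def decode(known):
    """Rebuild the message from a {modulus: remainder} dict of survivors."""
    if len(known) < NEEDED:
        raise ValueError("too few remainders survived")
    moduli = list(known)
    return crt([known[m] for m in moduli], moduli)


if __name__ == "__main__":
    remainders = encode(842)
    # Pretend the remainders for moduli 11 and 15 could not be read.
    survivors = {m: r for m, r in zip(MODULI, remainders) if m not in (11, 15)}
    print(decode(survivors))
```

In this toy setup two of the five remainders can be lost without losing the message. The tolerance that a real system achieves depends on how many extra remainders it chooses to store.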

“Imagine having three unknown variables,” explained Zheng. “With three linear equations, you should be able to solve for all three. If you increase the number of equations from three to five, you can solve the three unknowns as long as you know any three out of the five equations.”
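
Zheng's analogy can be checked directly with a short Python sketch. The unknown values and the coefficients below are made up for the illustration. The coefficients are deliberately chosen as (1, t, t²) so that any three of the five equations are independent, which is an assumption of this example rather than a detail taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def solve(rows, rhs):
    """Solve a square linear system exactly with Gauss-Jordan elimination."""
    n = len(rows)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(rows, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


# Three hypothetical unknowns that play the role of the hidden message.
UNKNOWNS = [4, -1, 7]
# Five equations with coefficients (1, t, t^2) for t = 1..5.
EQUATIONS = [(1, t, t * t) for t in range(1, 6)]
# Right-hand side of each equation, computed from the true unknowns.
RHS = [sum(c * u for c, u in zip(row, UNKNOWNS)) for row in EQUATIONS]


def recover(indices):
    """Recover the unknowns from the three equations at the given indices."""
    return solve([EQUATIONS[i] for i in indices], [RHS[i] for i in indices])


if __name__ == "__main__":
    for chosen in combinations(range(5), 3):
        print(chosen, [int(v) for v in recover(chosen)])
```

Every one of the ten possible choices of three equations yields the same three values, so two equations can go missing without any loss.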

The team's findings have been published in a paper entitled “FontCode: Embedding Information in Text Documents using Glyph Perturbation.” Its authors plan to extend the method to other languages and character sets, including Chinese.

“We are excited about the broad array of applications for FontCode,” said Zheng, “from document management software, to invisible QR codes, to protection of legal documents. FontCode could be a game changer.”