This post is a high-level introduction to hiding messages in images using Fourier Transforms on the color data. This technique is less susceptible to accidental destruction than techniques like Least Significant Bit steganography, while remaining far more challenging to detect than metadata-based approaches like storing secret messages in image comments. No background in steganography or Fourier Transforms is expected. This post is largely based on “Image Steganography and Steganalysis”, by Mayra Bachrach and Frank Shih.
Our objective is to hide secret messages, whether they be text or arbitrary files, inside images. Images are an attractive secret-message envelope, since they can be transferred in a number of ways (texting, emails, posting on forums, sharing through Google Photos, etc), and do not raise suspicions in many contexts. Before discussing Fourier Transform steganography, we’ll talk about some simpler approaches as context.
Most images include metadata for storing various statistics about the image contents. This metadata can be viewed and edited with tools like exiftool:
% exiftool qr.png ExifTool Version Number : 11.01 File Name : qr.png Directory : . File Size : 9.7 kB File Modification Date/Time : 2019:04:27 20:10:55-04:00 File Access Date/Time : 2019:04:27 20:10:57-04:00 File Inode Change Date/Time : 2019:04:27 20:10:56-04:00 File Permissions : rw-rw-rw- File Type : PNG File Type Extension : png MIME Type : image/png Image Width : 246 Image Height : 246 Bit Depth : 8 Color Type : RGB with Alpha Compression : Deflate/Inflate Filter : Adaptive Interlace : Noninterlaced Image Size : 246x246 Megapixels : 0.061
A first trivial attempt at message hiding is simply putting your secret message in one of these metadata fields. Unfortunately, this is easy to detect with automated tools, as most images won’t have many human-readable strings in them. This data may also be accidentally deleted, as many web services strip image data intentionally, or unintentionally lose it when translating from one image format (like PNG) to another (like JPG).
A slightly more sophisticated solution is Least Significant Bit steganography. The synopsis is:
Every pixel’s color is represented as three bytes, for Red, Green, and Blue
A change to the least significant bit will result in a nearly-identical color, and the difference will not be perceptible to the human eye
We can represent our secret message as a binary sequence
We can set the least significant bit of each pixel to the bits from our secret message
Done! And no longer trivially detectable! Even if someone does find your message, it will be hard to prove it is a secret message if it’s encrypted. Unfortunately this method is also susceptible to accidental breaks: If an image is resized or translated to another format then it will be recompressed, and these least significant bits are likely to be damaged in the process.
We want an equally secret message encoding system that is as difficult to detect, but less susceptible to damage.
The math behind Fourier Transforms is complicated, but the intuition is not. Consider a sine wave:
This wave can be described as a frequency and amplitude - let those be x- and y-coordinates in 2-D space:
We can add a second wave, with a different frequency:
And when we combine the two signals we can represent the combination as two points in frequency-amplitude space, representing the two waves we’ve added:
(The code used to generate the above images can be found here)
This leads to three conclusions:
Any arbitrarily complicated signal can be represented as a series of sine waves, layered on top of one another
A finite-length signal can be represented with a finite number of sine waves
The original signal can be reconstructed by taking the constituent sine waves and combining them
The Discrete Fourier Transform is an algorithm that derives these waves, given a signal of finite length described as discrete samples.
Why would we want to represent waves in this way? A plethora of reasons. Noise reduction, by deleting all waves with an amplitude below a specific threshold. Noise isolation, by deleting all waves not within a specific frequency range (such as the human vocal range). Noise correction, by shifting the frequency of some waves (as used in auto-tune). Ultimately, the Fourier Transform is the foundation of much audio, video, and image compression and manipulation.
The paper is a little hand-wavey on this part, but images can be expressed as a layering of sine and cosine waves with input based on the pixel coordinates. As with the one-dimensional Fourier transforms used in audio analysis, this process can be reversed to reproduce the original image. The number of samples and waves used determines the accuracy of the approximation, and thus the accuracy when inverting the function to recreate the original image. This video goes into further detail on the process, which is commonly used for image editing and compression.
Next, the user expresses their embedded message as a Fourier series. This can be done in a variety of ways, from adapting the waveform of an audio message, to encoding text as a bitsequence and solving the Fourier series for that sequence, to simply Fourier encoding a second image. Once the user has a message encoded as a Fourier series they can easily superimpose the signal by adding coefficients to the corresponding polynomials in the image matrix. The matrix can then be reversed, translating from the frequency domain back to the spatial image domain. The effect is a slight dithering, or static, applied to the image. By shifting the frequency of the hidden message up or down the user may adjust the static properties until a subtle effect is achieved.
The steganographic data can be relatively easily extracted given a copy of the original image. Comparing the pixels of the original and modified image can demonstrate that something has been changed, but not a discernible pattern that can be distinguished from artifacting resulting from lossy image compression, such as what one would see by switching the data format from PNG to JPEG. However, by converting both images to their Fourier matrix representation and subtracting from each other, anyone can retrieve the polynomial representing the encoded message. If the message was frequency adjusted to minimize visual presence, it must now be frequency shifted back, before decoding from Fourier to the original format (audio, bitsequence, etc).
If the unaltered image is not available, because the photo is an original rather than something taken from the web, then a simple delta is impossible. Instead, statistical analysis is necessary. Once again, the Fourier transform is critical, as it allows for pattern recognition and signal detection, differentiating between normal image frequencies and the structured data resulting from layering a message on top of the image.
The same Fourier-delta technique can be used for the more difficult task of detecting and extracting steganography of an unknown format. In this case, we are given an image, and need to establish both whether there is a hidden message, and preferably, what it is. Given an arbitrary image, we first need to establish a baseline. We can perform a reverse image search and find similar images, with an identical appearance but different hashes. We then compare each baseline image to the possibly steganographic image by converting both to Fourier matrices and calculating a delta, as above. We must then perform noise reduction to remove minor perturbations such as image re-encoding and re-sizing artifacting. If the remaining delta is statistically significant, then there is evidence of a secret signal. This completes the first step, identifying the presence of a steganographic message.
Unfortunately, interpreting this identified message is beyond the scope of the paper. Both participants in a secret conversation can pre-agree on an encoding scheme such as audio, bitstrings, or an embedded image. Given only a frequency spectrum, an analyst needs to attempt multiple encodings until something meaningful is produced. Particularly if the frequency-shifting outlined above has been performed, this is an extremely tedious process, better suited (at least so far) to manual inspection and intuitive analysis than a purely automated design.
Network Science is a relatively young discipline, which uses a small number of basic models to represent a wide variety of network phenomenon, ranging from human social interactions, to food webs, to the structure of the power grid.
This post focuses on social communities, historically modeled as random networks, where every person has a percent change of having a social connection to any other person. The following is an example of a random network, where the circular nodes represent people, and the edges between circles indicate a social link:
Of particular interest to me are information cascades, where some “state” is passed along from node to node, rippling through the network. These “states” can represent the spread of disease, a meme, a political ideology, or any number of things that can be passed on through human interaction.
A starting point for understanding information cascades is disease propagation. Traditionally, infection models run in discrete timesteps, and assume that the infection has a percent chance of spreading across an edge each turn, and will continue to spread until the infected are no longer contagious. This is an adequate model for large-scale disease propagation, where each disease has a different contagious period and infection rate. This basic model is less accurate when applied to small communities, where the differences between individuals cannot be statistically ignored, or when applied to the spread of information rather than disease.
Two research papers by Santa Fe Institute researchers extend the infection model to better reflect human behavior and repurpose it to examine the spread of ideas. Duncan Watts’ “A Simple Model of Global Cascades on Random Networks” proposes that each individual has a different susceptibility to infection, which they call an “Activation Threshold”, represented as a percentage of a node’s peers that must be infected before the node will be infected.
To understand this idea, consider a social platform like Twitter or Google+ in its infancy. Some people are early adopters - as soon as they hear a friend is on the platform they will join the platform to participate. Other people are more apprehensive, but once a majority of their peers have Facebook accounts then they, too, will make Facebook accounts so they do not miss out on significant social community. Early adopters have a low activation threshold, apprehensive people have a high threshold.
The result is that a cascade can only succeed in overwhelming a community if it reaches a “critical mass” where adoption is high enough to pass every member’s activation threshold:
If the activation threshold is too high, or the cascade starts in a poor location where many peers have high thresholds, then the cascade will be unable to reach a critical mass, then the infection will fizzle out. This may explain why some ideas, like Twitter or Facebook, succeed, while others like Google+ or Voat, do not. A failed cascade can convert a small cluster of users, but is ultimately contained by high threshold peers:
Akbarpour and Jackson propose a radically different, if simple, extension in “Diffusion in Networks and the Virtue of Burstiness”. They argue that it is insufficient for two people to have a social link for an idea to spread - the users must meet in the same place at the same time. People fall in one of three patterns:
Active for long periods, inactive for long periods. Think of office workers that are only at work from 9 to 5. If two offices are geographically separated, they can only share information while their work days overlap.
Active then inactive in quick succession. Perhaps someone that takes frequent breaks from work to have side conversations with their neighbors.
Random activity patterns. Likely better for representing online communities where someone can check in from their phone at any time.
While random networks are simple, human communities are rarely random. Since their introduction in 1999, human communities have more frequently been modeled as scale-free networks. In a scale-free network, new members of the community prefer to connect to well-connected or “popular” members. This results in a small number of highly connected, very social individuals referred to as “hubs”, each with many peers with far fewer connections. Here is an example scale-free network demonstrating the hub-clustering phenomenon:
My objective is to combine the above two papers, and apply the results to scale-free networks. In this configuration, the activity pattern and activation threshold of the hubs is of paramount importance, since the hubs act as gatekeepers between different sections of the network.
When a hub is infected, the cascade will spread rapidly. The hub is connected to a large number of peers, and makes up a significant percentage of the neighbors for each of those peers, since most nodes in the network only have a handful of connections. This means an infected hub can overcome even a relatively high activation threshold.
Compromising even a single hub will allow the cascade to reach new branches of the network and dramatically furthers infection:
However, infecting a hub is challenging, because hubs have so many peers that even a low activation threshold can prove to be an obstacle. Without capturing a hub, the spread of a cascade is severely hindered:
Even with highly susceptible peers, well-protected hubs can isolate a cascade to a district with ease:
So far, I have implemented Watts’ idea of “activation thresholds” on scale-free networks, and have implemented-but-not-thoroughly-explored the activity models from the Burstiness paper. The next step is to examine the interplay between activity types and activation thresholds.
These preliminary results suggest that highly centralized, regimented networks are better able to suppress cascades, but when a cascade does spread, it will be rapid, dramatic, and destabilizing. A fully decentralized, random network has far more frequent and chaotic cascades. Is there a mid-level topology, where small cascades are allowed to occur frequently, but the thread of a catastrophic cascade is minimized? I would like to investigate this further.
Taking a break from recent social architecture posts, here’s some more technical security content. It’s a pretty soft introduction to reverse engineering and binary patching for those unfamiliar with the topic.
One of the tools I use in my social research recently became abandonware. The product requires annual registration, but earlier this year the developer’s website disappeared, no one has been able to reach them by email for months, and I literally cannot give them my money to reactivate the product. Unfortunately, I am still reliant on the tool for my work. With no alternative, let’s see if the software can be reactivated through more technical means.
The tool in question is a graph layout plugin for Cytoscape, so it’s distributed as a JAR file. JAR files are bundled executables in Java - they’re literally ZIP archives with a few specific files inside.
The goal is to find where the software checks whether it’s registered, and patch the binary so it always returns true.
Since this is Java, we’ll start with a decompiler. The goal of a decompiler is to go from compiled bytecode back to human-readable sourcecode in a language like Java or C. On platforms like x86 this is often very messy and unreliable, both because x86 is disgustingly complicated, and because there are many valid C programs that could compile to the same assembly. Fortunately, today we’re working with JVM bytecode, which is relatively clean and well-understood, so the JVM to Java decompilers are surprisingly capable.
I’ll be using JD-Gui, the “Java Decompiler”. Sounds like the tool for the job. Open up JD-GUI, tell it to open the JAR file, and we’re presented with a screen like this:
Wandering through the
.class files in
com (where most of the code resides), we eventually find reference to
public static boolean isActivated(), which sure sounds promising. Here’s the method definition:
If either the product is activated, or it’s still in a trial period, the function returns true. This appears to be our golden ticket. Let’s change this method so that it always returns true.
There are two techniques for patching Java code. The apparently simple option would be to use a decompiler to get Java code out of this JAR, change the Java code, then recompile it back in to a JAR. However, this assumes the decompiler was 100% perfectly accurate, and the JAR you get at the end is going to look rather different than the one you started with. Think throwing a sentence in to Google Translate and going from English to Japanese back to English again.
The cleaner technique is to leave the JAR intact, and patch this one method while it is still JVM bytecode. The decompiler helped find the target, but we’ll patch at the assembly level.
First, we’ll need to extract contents of the JAR. Most JVM bytecode editors work at the level of
.class files, and won’t auto-extract and repack the JAR for us.
$ mkdir foo $ cp foo.jar foo $ cd foo $ jar -xf foo.jar
Now all the individual files visible in the JD-Gui sidebar are unpacked on the filesystem to edit at our leisure. Let’s grab the Java Bytecode Editor, open the
DualLicenseManager.class file, and scroll to the
Above is the assembly for the short Java if-statement we saw in JD-GUI. If
isProductActivated() returns true, jump to line 14 and push a 1 on the stack before jumping to line 19. Else, if
isTrialActivated() returns false, jump to line 18 and push a 0 on the stack. Then, return the top item on the stack.
Uglier than the Java, but not hard to follow. The patch is trivially simple - change the
iconst_0 to an
iconst_1, so that no matter what, the method always pushes and returns a 1.
Then we save the method, and it’s time to re-pack the JAR.
Re-creating the JAR is only slightly more complicated than unpacking:
$ jar -cvf foo.jar * $ jar -uvfm foo.jar META-INF/MANIFEST.MF
For some reason creating a JAR from the contents of a folder ignores the manifest and creates a new one. Since we specifically want to include the contents of the manifest file (which include some metadata necessary for the plugin to connect with Cytoscape), we explicitly update the manifest of the JAR with the manifest.mf file.
From here, we can reinstall the plugin in Cytoscape and it runs like a charm.
Often, this kind of binary patching is less straightforward. This work would have been much more time-consuming (though not much more challenging) if the executable had not included symbols, meaning none of the method names are known. Most C and C++ executables do not include symbols, so you are forced to learn what each function does by reading the assembly or looking at the context in which the function is called. This is done for performance more than security, since including symbols makes the executable larger, and is only helpful for the developers during debugging.
More security-minded engineers will use tools like Packers to obfuscate how the code works and make it more difficult to find the relevant method. These are not insurmountable, but usually require watching the packer decompress the main program in memory, waiting for the perfect moment, then plucking it from memory to create an unpacked executable.
Another option is including some kind of checksumming so that the program can detect that it has been tampered with, and refuses to run, or modifies its behavior. This is rarely helpful however, since the reverse engineer can simply look for the appropriate
hasBeenTamperedWith() function and patch it to always return false. An insidious programmer could try to hide this kind of code, but it’s a cat-and-mouse game with the reverse engineer that will only buy them time.
Ultimately, this tool was built by scientists, not battle-hardened security engineers, and included no such counter-measures. Should they resurface I will gladly continue paying for their very useful software.
There’s been a lot of discourse recently about the responsibility social media giants like Facebook and Twitter have to police their communities. Facebook incites violence by allowing false-rumors to circulate. Twitter only recently banned large communities of neo-nazis and white supremacists organizing on the site. Discord continues to be an organizational hub for Nazis and the Alt-Right. There’s been plenty of discussion about why these platforms have so little moderatorship, ranging from their business model (incendiary content drives views and is beneficial for Facebook), to a lack of resources, to a lack of incentive.
I’d like to explore a new side of the issue: Why should a private company have the role of a cultural censor, and how can we redesign our social media to democratize censorship?
To be absolutely clear, censorship serves an important role in social media in stopping verbal and emotional abuse, stalking, toxic content, and hate speech. It can also harm at-risk communities when applied too broadly, as seen in recent well-intentioned U.S. legislation endangering sex workers.
Censorship within the context of social media is not incompatible with free-speech. First, Freedom of Speech in the United States is largely regarded to apply to government criticism, political speech, and advocacy of unpopular ideas. These do not traditionally include speech inciting immediate violence, obscenity, or inherently illegal content like child pornography. Since stalking, abuse, and hate-speech do not contribute to a public social or political discourse, they fall squarely outside the domain of the USA’s first amendment.
Second, it’s important to note that censorship in social media means a post is deleted or an account is locked. Being banned from a platform is more akin to exile than to arrest, and leaves the opportunity to form a new community accepting of whatever content was banned.
Finally there’s the argument that freedom of speech applies only to the government and public spaces, and is inapplicable to a privately-owned online space like Twitter or Facebook. I think had the U.S. Bill of Rights been written after the genesis of the Internet this would be a non-issue, and we would have a definition for a public commons online. Regardless, I want to talk about what should be, rather than what is legally excusable.
Corporations have public perceptions which effect their valuations. Therefore, any censorship by the company beyond what is legally required will be predominantly focused on protecting the ‘image’ of the company and avoiding controversy so they are not branded as a safe-haven for bigots, nazis, or criminals.
Consider Apple’s censorship of the iOS App Store - repeatedly banning drone-strike maps with minimal explanatory feedback. I don’t think Apple made the wrong decision here; they reasonably didn’t want to be at the epicenter of a patriotism/pro-military/anti-war-movement debate, since it has nothing to do with their corporation or public values. However, I do think that it’s unacceptable that Apple is in the position of having this censorship choice to begin with. A private corporation, once they have sold me a phone, should not have say over what I can and cannot use that phone to do. Cue Free Software Foundation and Electronic Frontier Foundation essays on the rights of the user.
The same argument applies to social media. Facebook and Twitter have a vested interest in limiting conversations that reflect poorly on them, but do not otherwise need to engender a healthy community dynamic.
Sites like Reddit that are community-moderated have an advantage here: Their communities are self-policing, both via the main userbase downvoting inappropriate messages until they are hidden, and via appointed moderators directly removing unacceptable posts. This works well in large subreddits, but since moderators have authority only within their own sub-communities there are still entire subreddits accepting of or dedicated to unacceptable content, and there are no moderators to review private messages or ban users site wide. A scalable solution will require stronger public powers.
The privacy, anonymity, and counter-cultural communities have been advocating “federated” services like Mastadon as an alternative to centralized systems like Twitter and Facebook. The premise is simple: Anyone can run their own miniature social network, and the networks can be linked at will to create a larger community.
Privacy Researcher Sarah Jamie Lewis has written about the limitations of federated models before, but it boils down to “You aren’t creating a decentralized democratic system, you’re creating several linked centralized systems, and concentrating power in the hands of a few.” With regards to censorship this means moving from corporate censors to a handful of individual censors. Perhaps an improvement, but not a great one. While in theory users could react to censorship by creating a new Mastadon instance and flocking to it, in reality users are concentrated around a handful of large servers where the community is most vibrant.
A truly self-regulatory social community should place control over censorship of content in the hands of the public, exclusively. When this leads to a Tyranny of the Majority (as I have no doubt it would), then the effected minorities have an incentive to build a new instance of the social network where they can speak openly. This is not an ideal solution, but is at least a significant improvement over current power dynamics.
Community censorship may take the form of voting, as in Reddit’s “Upvotes” and “Downvotes”. It may involve a majority-consensus to expel a user from the community. It may look like a more sophisticated republic, where representatives are elected to create a temporary “censorship board” that removes toxic users after quick deliberation. The key is to involve the participants of the community in every stage of decision making, so that they shape their own community standards instead of having them delivered by a corporate benefactor.
Care needs to be taken to prevent bots from distorting these systems of governance, and giving a handful of users de-facto censorship authority. Fortunately, this is a technical problem that’s been explored for a long time, and can be stifled by deploying anti-bot measures like CAPTCHAs, or by instituting some system like “voting for representatives on a blockchain”, where creating an army of bot-votes would become prohibitively expensive.
This should be not only compatible, but desirable, for social media companies. Allowing the community to self-rule shifts the responsibility for content control away from the platform provider, and means they no longer need to hire enormous translator and moderator teams to maintain community standards.
I recently got to see a talk at the Chaos Communication Congress titled “When the Dutch secret service knocks on your door”, with the following description:
This is a story of when the Dutch secret service knocked on my door just after OHM2013, what some of the events that lead up to this, our guesses on why they did this and how to create an environment where we can talk about these things instead of keeping silent.
Since the talk was not recorded, the following is my synopsis and thoughts. This post was written about a week after the talk, so some facts may be distorted by poor memory recall.
The speaker was approached by members of the Dutch secret service at his parents’ house. They initially identified themselves as members of the department of the interior, but when asked whether they were part of the secret service, they capitulated.
The agents began by offering all-expenses-paid travel to any hackathon or hackerspace. All the speaker needed to do was write a report about their experience and send it back. A relatively harmless act, but it means they would be an unannounced informant in hacker communities.
When the author refused, the agents switched to harder recruitment techniques. They pursued the author at the gym, sat nearby in cafes when the author held meetings for nonprofits, and likely deployed an IMSI catcher to track them at a conference.
Eventually, the author got in contact with other members of the hacker community that had also been approached. Some of them went further through the recruitment process. The offers grew, including “attend our secret hacker summer camp, we’ll let you play with toys you’ve never heard of,” and “If you want to hack anything we can make sure the police never find out.” In either of these cases the recruit is further indebted to the secret service, either by signing NDAs or similar legal commitments to protect government secrets, or by direct threat, wherein the government can restore the recruit’s disappeared criminal charges at any time.
I have two chief concerns about this. First, given how blatant the secret service was in their recruitment attempts, and that we only heard about their attempts in December of 2017, we can safely assume many people accepted the government’s offer. Therefore, there are likely many informants working for the secret service already.
Second, this talk was about the Netherlands - a relatively small country not known for their excessive surveillance regimes like the Five Eyes. If the Netherlands has a large group of informants spying on hackerspaces and conferences around the globe, then many other countries will as well, not to mention more extreme measures likely taken by countries with more resources.
From this, we can conclude there are likely informants in every talk at significant conferences. Every hackerspace with more than token attendance is monitored. This is not unprecedented - the FBI had a vast array of informants during the COINTELPRO era that infiltrated leftist movements throughout the United States (along with much less savory groups like the KKK), and since shortly after 9/11 has used a large group of Muslim informants to search for would-be terrorists.