Not so long ago, we wrote about methods that Mordechai Guri and his colleagues at Ben-Gurion University devised to extract information from a device that is not only not connected to the Internet, but also physically isolated from the network. At the Black Hat USA 2020 conference, another researcher from Ben-Gurion University presented a report on a related topic. Ben Nassi spoke about a visual eavesdropping method that he and his colleagues call Lamphone.
We’ll talk about how Lamphone works below, but let’s start with a short digression into the history of the issue.
How is it possible to see sound?
One well-known technology for remotely recording sound using so-called visual methods is the laser microphone. This technique is pretty straightforward.
The people wiretapping a conversation direct a laser beam operating in the infrared range (i.e., invisible to the human eye) at a suitable surface (typically window glass) in the room where the conversation is taking place. The beam reflects off the surface and hits the receiver. Sound waves create vibrations on the surface of the object, which in turn change the behavior of the reflected laser beam. The receiver records the changes, which are eventually converted into a sound recording of the conversation.
The technology has been in use since the Cold War era, and it has turned up in many spy films. You have probably seen it depicted in one of them. Several companies produce ready-made devices for laser eavesdropping, and their declared operating range extends to 500 or even 1,000 meters. For those worried about being the target of laser eavesdropping, however, here are two pieces of good news: First, laser microphones are very expensive; and second, manufacturers sell laser microphones only to government agencies (or so they claim).
However, according to Nassi, the active nature of laser microphones is a serious drawback. For that form of eavesdropping to work, you need to “illuminate” a surface with a laser beam, and that means an IR detector can discover it.
Several years ago, a group of researchers at the Massachusetts Institute of Technology proposed an alternative method of “visual recording” that was completely passive. Their idea was largely the same: Sound waves create vibrations on the surface of an object. Vibrations, of course, can be recorded.
To register the vibrations, the researchers used a high-speed camera at several thousand frames per second. By comparing the frames from the camera (with the help of a computer), they were able to replicate sound from the sequence of video frames.
That method also has a drawback, however, and it is a biggie. The amount of computing resources required to convert the massive amount of visual information from the high-speed camera into sound was extraordinary. Even using an extremely powerful workstation, the MIT researchers needed 2–3 hours to analyze a 5-second video recording, so the approach is clearly not a good one for picking up conversations on the fly.
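To get a feel for why the data volume is the bottleneck, here is a deliberately naive sketch (ours, not MIT’s — their actual phase-based motion analysis is far more sophisticated): even in the crudest version, every single sample of the recovered audio signal costs an entire video frame of pixel data.

```python
# Deliberately simplified illustration of frame-based sound recovery.
# Each frame is a flat list of pixel brightness values; the tiny
# frame-to-frame change in average brightness stands in for the
# surface vibration. The real MIT pipeline analyzes per-pixel motion,
# which is why it needed hours of computation for seconds of video.

def frames_to_signal(frames):
    """Return one 'audio' sample per frame: the mean pixel brightness."""
    return [sum(frame) / len(frame) for frame in frames]

# At 2,000 fps, a 5-second clip is 10,000 frames; at 1 megapixel each,
# that is 10 billion pixel reads before any real motion analysis begins.
example = frames_to_signal([[10, 20, 30], [40, 50, 60]])
```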
How Lamphone works
Nassi and his colleagues have come up with a new “visual eavesdropping” technique they call Lamphone. The main idea of the method is using a lightbulb (hence the name of the technique) as an object from which you can capture the vibrations caused by sound.
A lightbulb is not only a commonplace object, but it is also a bright one. Therefore, someone using a lightbulb’s vibrations does not need to waste computing resources on analyzing extremely subtle changes in the image. All they need to do is point a powerful telescope at the lightbulb; the telescope focuses the light flux from the bulb onto an electro-optical sensor.
The lightbulb does not emit light in different directions perfectly uniformly (interestingly, the unevenness also varies across the different types of lightbulbs, being quite high for incandescent and LED bulbs but much lower for fluorescent ones). Because of this unevenness, the bulb’s vibrations — induced by sound waves — slightly alter the intensity of the light flux that the electro-optical sensor captures. And those changes are sufficiently perceptible for recording. Having recorded the changes and made a number of simple transformations, the researchers were able to restore the sound from the resulting “light recording.”
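The transformations involved really can be simple. The sketch below is our own illustration, not the researchers’ actual pipeline: it assumes a list of light-intensity readings taken at a fixed sampling rate, removes the bulb’s steady glow (the DC component) with a moving average, and normalizes what remains to the standard audio range.

```python
import math

def intensity_to_audio(samples, window=64):
    """Subtract a local moving average (the bulb's steady light level)
    from each reading, leaving only the small sound-induced
    fluctuations, then normalize the result to [-1, 1]."""
    n = len(samples)
    audio = []
    for i in range(n):
        lo = max(0, i - window // 2)
        hi = min(n, i + window // 2 + 1)
        baseline = sum(samples[lo:hi]) / (hi - lo)  # local mean = steady glow
        audio.append(samples[i] - baseline)
    peak = max(abs(v) for v in audio) or 1.0  # avoid dividing by zero
    return [v / peak for v in audio]

# A faint 50 Hz "hum" riding on a constant light level, sampled at 4 kHz:
sig = [100.0 + 0.5 * math.sin(2 * math.pi * 50 * t / 4000) for t in range(4000)]
wave = intensity_to_audio(sig)
```

In practice one would also band-pass filter to the speech range and denoise, but the core idea — tiny intensity fluctuations on top of a large constant signal — is exactly this cheap to extract, which is what separates Lamphone from the compute-hungry camera approach.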
To test their method, the researchers installed a listening device on a pedestrian bridge 25 meters from the window of the testing room, in which sound was played through a speaker. By pointing a telescope at a lightbulb in the room, the researchers were able to record the light variations and convert them into a sound recording.
The resulting recordings turned out to be quite comprehensible. For example, Shazam successfully identified the test songs “Let It Be” by the Beatles and “Clocks” by Coldplay, and Google’s speech recognition service correctly transcribed the words of Donald Trump from one of his campaign speeches.
Does Lamphone present a practical threat?
Nassi and his colleagues have succeeded in developing a truly functional method of “visual eavesdropping.” More important, the method is completely passive and therefore cannot be picked up by any detector.
Note as well that unlike with the method pioneered by researchers at MIT, the calculations for decoding Lamphone recordings are extremely simple. Because the processing does not require vast computing resources, Lamphone can be used in real time.
However, Nassi admits that during the experiment, the sound in the test room was played at a very high volume. Therefore, for the moment, the results of the experiment may be mainly of theoretical interest. On the other hand, we should not underestimate the simplicity of the methods used to convert the “light recording” into sound. The technique might possibly be further refined using machine-learning algorithms, for example, which excel at these types of tasks.
At this point, the researchers assess the current feasibility of applying this technique in practice as neither extremely difficult nor easy, but somewhere in between. However, they foresee the method potentially becoming more practical — if someone can apply more sophisticated algorithms for converting the electro-optical sensor’s readings into sound recordings.