Hackers can take control of the world’s most popular voice assistants by whispering to them in frequencies humans can’t hear.
Chinese researchers have discovered a terrifying vulnerability in voice assistants from Apple, Google, Amazon, Microsoft, Samsung, and Huawei. It affects every iPhone and Macbook running Siri, any Galaxy phone, any PC running Windows 10, and even Amazon’s Alexa assistant.
Using a technique called the DolphinAttack, a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants. This relatively simple translation process lets them take control of gadgets with just a few words uttered in frequencies none of us can hear.
The researchers didn’t just activate basic commands like “Hey Siri” or “Okay Google,” though. They could also tell an iPhone to “call 1234567890” or tell an iPad to FaceTime the number. They could force a Macbook or a Nexus 7 to open a malicious website. They could order an Amazon Echo to “open the backdoor” (a pin would also be required, an August spokesperson clarifies). Even an Audi Q3 could have its navigation system redirected to a new location. “Inaudible voice commands question the common design assumption that adversaries may at most try to manipulate a [voice assistant] vocally and can be detected by an alert user,” the research team writes in a paper just accepted to the ACM Conference on Computer and Communications Security.
In other words, Silicon Valley has designed human-friendly UI with a huge security oversight. While we might not hear the bad guys talking, our computers clearly can. “From a UX point of view, it feels like a betrayal,” says Ame Elliott, design director at the nonprofit SimplySecure. “The premise of how you interact with the device is ‘tell it what to do,’ so the silent, surreptitious command is shocking.”
To hack each voice assistant, the researchers used a smartphone with about $3 of additional hardware, including a tiny speaker and amp. In theory, their methods, which are now public, are duplicatable by anyone with a bit of technical know-how and just a few bucks in their pocket.
In some cases, these attacks could only be made from inches away, though gadgets like the Apple Watch were vulnerable from within several feet. In that sense, it’s hard to imagine an Amazon Echo being hacked with DolphinAttack. An intruder who wanted to “open the backdoor” would already need to be inside your home, close to your Echo. But hacking an iPhone seems like no problem at all. A hacker would nearly need to walk by you in a crowd. They’d have their phone out, playing a command in frequencies you wouldn’t hear, and you’d have your own phone dangling in your hand. So maybe you wouldn’t see as Safari or Chrome loaded a site, the site ran code to install malware, and the contents and communications of your phone were open season for them to explore.
The exploit is enabled by a combination of hardware and software problems, the researchers explain in their paper. The microphones and software that power voice assistants like Siri, Alexa, and Google Home can pick up inaudible frequencies–specifically, frequencies above the 20KhZ limits of human ears.
According to Gadi Amit, founder of NewDealDesign and industrial designer of products like the Fitbit, the design of such microphones make them difficult to secure from this type of attack. “Microphones’ components themselves vary in type, but most use air pressures that probably cannot be blocked from ultrasounds,” Amit explains. Basically, the most popular mics of today transform turbulent air–or sound waves–into electrical waves. Blocking those super-hearing capabilities might be impossible.
That means it’s up to software to decipher what’s human speech and what’s machine speech. In theory, Apple or Google could just command their assistants to never obey orders from someone speaking at 20kHz with a digital audio filter: “Wait, this human is telling me what to do in a vocal range they can’t possibly speak! I’m not going to listen to them!” But according to what the Zhejiang researchers found, every major voice assistant company exhibited vulnerability with commands stated above 20kHz.
Why would the Amazons and Apples of the world leave such a gaping hole that could, potentially, be so easily plugged by software? We don’t know yet, though we’ve reached out to Apple, Google, Amazon, Microsoft, Samsung, and Huawei for comment. But at least two theories are perfectly plausible, and both come down to making voice assistants more user-friendly.
The first is that voice assistants actually need ultrasonics just to hear people well, compared to analyzing a voice without those high frequencies. “Keep in mind that the voice analyzing software might need every bit of ‘hint’ in your voice to create its understanding,” says Amit of filtering out the highest frequencies in our voice systems. “So there might be a negative effect that lowers the comprehension score of the whole system.” Even though people don’t need ultrasonics to hear other people, maybe our computers rely upon them as a crutch.
The second is that some companies are already exploiting ultrasonics for their own UX, including phone-to-gadget communication. Most notably, Amazon’s Dash Button pairs with the phone at frequencies reported to be around 18kHz, and Google’s Chromecast uses ultrasonic pairing, too.