The tinfoil interpretation that LLMs can spy on you is shortsighted and a bit paranoid; it would require LLM providers to actually run a prompt asking what you are doing.
However, any system with a mic, like your cellphone listening for a "Hey Siri" prompt, or your fridge, could theoretically be coupled with an LLM on an ad hoc basis to get a fuller picture of what's going on.
Pretty cool: if an attacker, or a government with a warrant, can get an audio stream, they can get some clues, although of course not probative evidence.
If you're interested in this concept, it's not new; the alarm has been sounded since the Android Facebook app required motion sensor permissions in Android 4.
> the alarm has been sounded since the Android Facebook app required motion sensor permissions in Android 4.
Serves as a useful reminder that even if someone doesn't care that these companies collect this data now, they are storing it, sometimes indefinitely, and as technology advances they will be able to make more use of it than they could at the time you agreed to share it with them.
It's like all the ransomware gangs hoarding the encrypted data they stole, waiting for a quantum computing breakthrough to be able to decrypt it.
Not sure what to do about it, if anything, but the average person is severely under-equipped and undereducated to deal with this and protect themselves from the levels of surveillance that are soon to come.
I tried denying the sensor permission to most apps and my battery tanked. My guess is a few of them sit in a busy loop trying to get the data, with no handling for the permission being denied, because it's granted on 99.99999% of devices.
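Roughly the anti-pattern I suspect; a minimal Python sketch, not actual Android code, and the function names are invented:

```python
import time

def read_sensor():
    """Stand-in for a sensor read; raises when the permission is denied."""
    raise PermissionError("sensor permission not granted")

def handle(sample):
    print(sample)

# The suspected anti-pattern: retry immediately on failure, assuming
# the permission is always granted. On the rare device where it isn't,
# this spins the CPU and drains the battery.
def naive_loop():
    while True:
        try:
            handle(read_sensor())
        except PermissionError:
            continue  # no backoff, no give-up: a busy loop

# What proper handling would look like: back off, or stop asking.
def polite_loop():
    while True:
        try:
            handle(read_sensor())
        except PermissionError:
            time.sleep(60)  # or unregister and stop polling entirely
```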
> The researchers ran the audio and motion data through smaller models that generated text captions and class predictions, then fed those outputs into different LLMs (Gemini-2.5-pro and Qwen-32B) to see how well they could identify the activity.
Maybe I'm not understanding it, but as I read it, the LLMs weren't really important: all they did was further interpret the outputs of a fronting audio-to-text classifier model.
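As I read it, the flow is roughly this (a sketch under my own reading; the function names, prompt, and `llm.complete` interface are invented, not from the paper):

```python
# The LLM never sees raw sensor data, only text produced by smaller
# front-end models. Everything below is illustrative, not the paper's code.

def audio_caption(audio_clip) -> str:
    # stand-in for an audio captioning model
    return "water running, dishes clinking"

def motion_classes(imu_window) -> list[str]:
    # stand-in for an IMU activity classifier
    return ["standing", "arm movement"]

def identify_activity(llm, audio_clip, imu_window) -> str:
    prompt = (
        f"Audio caption: {audio_caption(audio_clip)}\n"
        f"Motion predictions: {', '.join(motion_classes(imu_window))}\n"
        "Which everyday activity best explains both signals?"
    )
    # llm is a stand-in for Gemini-2.5-pro or Qwen-32B behind some API
    return llm.complete(prompt)
```

The LLM is just the last, text-only hop; you could swap in any captioner or classifier up front.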
Doesn't the smartphone already far surpass the Telescreen's capabilities and presence? It does more, and we carry it in our pockets.
Do people not realize we're beyond 1984? In 1984 the tech wasn't always listening; rather, it had the capacity to. Much of the point was that not knowing meant you'd act as if you were being listened to, just in case. It was a reference to totalitarian states where you don't know if you can talk freely to your neighbor or if they'd turn you in, where people end up creating a doublespeak.
In 1984 the idea was that there were not enough people to listen to everyone all the time, but the mere possibility was enough. Of course, for us with AI, things are considerably worse. Also, telescreens were mandatory. We are not there with cell phones in a de jure sense, but certainly in a de facto sense. And if enough people carry phones, it doesn't matter if a few stragglers don't; they will get caught in the net unless they live as hermits, in which case who cares about them. All the pieces are in place; there is no reason we cannot have a global North Korea.
Something that annoys me about the title is that the LLMs aren't taking in the raw data (LLMs are for text, after all). The raw data is fed through audio and motion models that produce natural-language descriptions, which are then fed to the LLM.
Unrelated: yeah, this article is a little creepy, but damn is it interesting technically.
A more positive interpretation of Apple's research interest here is that devices like the Watch could better differentiate between "the wearer just fell and we should call 911" and "the wearer is playing with their kids".
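Something like this (entirely hypothetical labels and logic, not Apple's actual heuristics):

```python
# Invented example of how fusing motion and audio context could cut
# false-positive SOS calls: a hard impact alone isn't enough; the
# surrounding audio has to look like distress rather than play.

def should_call_emergency(motion_label: str, audio_caption: str) -> bool:
    hard_impact = motion_label == "fall_detected"
    sounds_like_play = any(
        w in audio_caption for w in ("laughter", "children playing")
    )
    sounds_like_distress = any(
        w in audio_caption for w in ("groaning", "call for help", "prolonged silence")
    )
    return hard_impact and sounds_like_distress and not sounds_like_play
```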
> However, any system with a mic, like your cellphone listening for a "Hey Siri" prompt, or your fridge, could theoretically be coupled with an LLM on an ad hoc basis to get a fuller picture of what's going on.
> Pretty cool: if an attacker, or a government with a warrant, can get an audio stream, they can get some clues, although of course not probative evidence.
https://par.nsf.gov/servlets/purl/10028982
https://arxiv.org/pdf/2109.13834.pdf
> Maybe I'm not understanding it, but as I read it, the LLMs weren't really important: all they did was further interpret the outputs of a fronting audio-to-text classifier model.
You don't need them, but they are one way to do it that people know how to implement.
Identifying patterns is fairly amenable to analytic approaches; interpreting them, less so.
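For example (made-up labels): identification can be a lookup table, but interpreting combinations the table never anticipated is where the LLM helps:

```python
# Identifying the pattern: a fixed mapping from classifier labels to
# activities is enough when the combination was anticipated.
LABEL_TO_ACTIVITY = {
    ("water_running", "arm_scrubbing"): "washing dishes",
    ("keyboard_typing", "sitting_still"): "working at a computer",
}

def identify(audio_label: str, motion_label: str) -> str:
    return LABEL_TO_ACTIVITY.get((audio_label, motion_label), "unknown")

# Interpreting it: for pairs the table never anticipated, e.g.
# ("water_running", "sitting_still"), you need something that can
# reason from what the labels mean -- which is where the LLM comes in.
```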