> According to the report, Anthropic was holding talks with Amazon, the company’s major investor and partner, and voice-focused AI startup ElevenLabs, to possibly drive future voice features for Claude.
> It’s unclear which of those partnerships, if any, came to fruition.
Things I like:
1. Start and stop button. I love this explicit control over who is talking when.
2. Ability to upload files while the voice chat is going. Great idea. Often I use gpt voice chat for studying, and it's annoying when I need to add another PDF to the context, since I have to stop the chat, upload, and then restart the voice session.
3. Real-time text display during voice chat. I asked it to take the derivative of a function I described, and it outlined its steps in text, not just a transcription of what it was saying.
Things I hate:
1. The transcription is terrible. It took me 10 tries during the conversation to describe f(x) = x^2. Looking back on the transcriptions, they're literally nonsense.
2. There was a buggy moment when the voice conversation started but it was still demoing all the voice options simultaneously. Needs some polishing.
I thought transcription was a solved problem by now. I run Whisper at home and it's blazing fast and accurate with the large model <3. If Anthropic's is much worse, they need to up their game. Or just use Whisper until they do.
Yet, using Abacus.AI's mobile app, you don't need a talk / no-talk UI control. It detects when you interject. That would be a nice feature for Claude as well.
There was a seemingly odd quick sequence of announcements from ElevenLabs over the last 24 hours, which makes me think it's them. Notably, I believe they launched 2.0 of their conversational AI today.
I have no idea how anyone can go through that many tokens and maintain coherent code. Really, I think I'm missing something; I would love to see a video of this being done live. My own experience (since 2022) is having to keep a very close eye on everything that's happening: refactoring manually, going between models, reformulating the prompt, etc.
Having a design doc, implementation and testing plan, strict linter, and strict compiler helps keep the robots on the rails IMHO.
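As an illustration of the kind of guardrails the parent means, here is a minimal sketch for a Python project, assuming ruff as the strict linter and mypy as the strict type checker (both are illustrative tool choices, not necessarily what the commenter uses):

```toml
# pyproject.toml (illustrative sketch): fail fast when generated
# code drifts from the plan, instead of catching it in review.
[tool.ruff.lint]
# Pycodestyle errors, pyflakes, and bugbear checks.
select = ["E", "F", "B"]

[tool.mypy]
# Turn on the full strict bundle: no untyped defs, no implicit
# Any, warnings on unused ignores, etc.
strict = true
```

The point is that an agent's edits hit a hard, mechanical gate on every run, so "going sideways" surfaces as a lint or type error rather than a silent regression.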
But even then, I never let it git add or git commit, and about half the time I run it in "ask me before you do any edits" mode, re-guiding it in real time as I see things going sideways.
They're dead to me until they fix their over-aggressive auto-ban. Having done nothing more than travel frequently, rarely use a VPN, and only use it for coding, I was caught up in a random, inexplicable auto-ban. Zero customer service. An appeal process that leads to a black hole. Whatever their technical advances, their user experience when something goes awry is terrible.
Yeah, but it's XML, not Pydantic, which means it doesn't play well with failovers to other providers. It would be tolerable if Anthropic didn't have such abysmal API uptime, but at this point there's no way I'll use them for my SaaS.
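The portability point can be sketched: a JSON tool call validates the same way no matter which provider emitted it, whereas XML-tagged output needs a provider-specific parser. A minimal stdlib-only illustration (a dataclass stands in for a Pydantic model; the payload and field names are invented for the example):

```python
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Provider-agnostic shape for a tool invocation (illustrative)."""
    name: str
    arguments: dict


def parse_tool_call(raw: str) -> ToolCall:
    # Validate a JSON tool call. This works identically whether the
    # JSON came from the primary provider or a failover; XML-tagged
    # output would need a per-provider parser at this point instead.
    data = json.loads(raw)
    return ToolCall(name=data["name"], arguments=data["arguments"])


# Hypothetical payload, purely for illustration.
raw = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
call = parse_tool_call(raw)
```

Swapping providers then only changes where `raw` comes from, not the validation layer.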
I really want to like Claude, but I hit their limit WAY too early when I PAID for it, 9 months ago, WAY before I hit any kind of limit on gippity. (gippity = GPT, gimminy = Gemini.)
> According to the report, Anthropic was holding talks with Amazon, the company’s major investor and partner, and voice-focused AI startup ElevenLabs, to possibly drive future voice features for Claude.
> It’s unclear which of those partnerships, if any, came to fruition.
Here's an easy way to confirm that: check Anthropic's "Trust Center" and review any recent updates. https://trust.anthropic.com/updates
Sure enough, on May 29th they have a subprocessor change:
> As of May 29th, 2025, we have added ElevenLabs, which supports text to speech functionality in Claude for Work mobile apps.
I wonder what they're using for speech-to-text?
Now just wait until people address a single other person with "youse", and then have to make up "yous'all" to address groups.
(Evolution of language is fascinating. I'm just pretending to be upset.)
"Thou" was second-person singular. "Y'all" is second-person plural.
I know it's a massive challenge and might take years to get right, but the endless copy and paste is wearing me down.
Their Max 20x plan is double the cost (~$6/day) for quadruple the quota.
Keep in mind that Opus chows through quota at 5x+ the rate of Sonnet.
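Putting those two comments together with made-up round numbers (the quota units and exact multipliers are assumptions for illustration, not published figures): quadruple quota at a 5x burn rate can still mean fewer Opus messages on the upgraded plan than Sonnet messages on the base plan.

```python
# Illustrative arithmetic only; real quota units are not published.
base_quota = 100             # assumed units on the base plan
max_quota = 4 * base_quota   # "quadruple the quota" on Max 20x
sonnet_cost = 1              # assumed units per Sonnet message
opus_cost = 5 * sonnet_cost  # "Opus ... at 5x+ the rate of Sonnet"

sonnet_msgs_on_base = base_quota // sonnet_cost  # 100 messages
opus_msgs_on_max = max_quota // opus_cost        # 80 messages
```

So under these assumed numbers, heavy Opus use eats the 4x quota bump and then some.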
https://docs.anthropic.com/en/docs/test-and-evaluate/strengt...
Just like "World Wide Web" and "www".