Show HN: openai-realtime-embedded-SDK - Build AI assistants on microcontrollers

(github.com)

51 points | by Sean-Der 3 days ago

5 comments

kaycebasques 19 hours ago
Took a bit of poking to figure out what the use case is. Doesn't seem to be mentioned in the README (usage section is empty) or the intro above. Looks like the main use case is speech-to-speech. Which makes sense since we're talking about embedded products, and text-to-speech (for example) wouldn't usually be relevant (because most embedded products don't have a keyboard interface). Congrats on the launch! Cool to see WebRTC applied to the embedded space. Streaming speech-to-speech with WebRTC could make a lot of sense.
- Sean-Der 19 hours ago
  Sorry I forgot to put use cases in! Here are the ones I am excited about.
  * Making a toy. I have had a lot of fun putting a silly/sarcastic voice in toys. My 4 year old thinks it is VERY funny.
  * Smart Speaker/Assistant. I want to put one in each room. If I am in the kitchen it has a prompt to assist with recipes.
  I have A LOT more in the future I want to do. The microcontrollers I was using can't do video yet BUT ESP32 does have newer ones that can. When I pull that I can do smart cameras, then it gets really fun :)
  - kaycebasques 18 hours ago
    "Use case" perhaps wasn't the right word for me to use. Maybe "applications" would have been a better word. What this enables is speech-to-speech applications in embedded devices. (From my quick scan) it doesn't seem to do anything around other ML applications that OpenAI could potentially be involved in, such as speech-to-text, text-to-speech, or computer vision.
    But yeah, once I figured out that this enables streaming speech-to-speech applications on embedded devices, then it's easy to think up use cases.
    - swatcoder 16 hours ago
      It doesn't help that this was posted to HN with the "Usages" section of the README left blank. That alone would probably have addressed your question. The submission is just a little premature.
      Beyond that, while it does seem like its primary vision is speech-to-speech interfaces, it could easily be stretched to do things like send a templatized text prompt constructed from toggle states, sensor readings, etc. and (optimistically) ask for a structured response that could control lights or servos or whatever.
      Generally, this looks like a very early stage in a hobby project (the code practices fall short of my expectations for good embedded work, being presented as a library would be better than as an application, the README needs lots of work, etc), but something more sophisticated isn't too far out of reach.
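      A minimal sketch of the templated-prompt idea above, in Python for readability rather than the project's own code. Everything here is an assumption for illustration: the names `PROMPT_TEMPLATE`, `build_prompt`, and `parse_command`, the JSON shape, and the light/servo fields are hypothetical and not part of the SDK. It only shows building the prompt text and validating a reply; sending it to a model is left out.

```python
import json

# Hypothetical template: the fields and JSON shape are illustrative assumptions.
PROMPT_TEMPLATE = (
    "Sensor readings: {readings}. Switch states: {switches}. "
    'Reply with JSON only, shaped like {{"light": "on" or "off", "servo_angle": 0-180}}.'
)


def build_prompt(readings, switches):
    """Fill the template from sensor readings and toggle states."""
    r = ", ".join(f"{name}={value}" for name, value in sorted(readings.items()))
    s = ", ".join(
        f"{name}={'on' if state else 'off'}" for name, state in sorted(switches.items())
    )
    return PROMPT_TEMPLATE.format(readings=r, switches=s)


def parse_command(reply):
    """Validate a model reply; return None when it is not usable as a command."""
    try:
        data = json.loads(reply)
    except (ValueError, TypeError):
        return None  # not JSON at all, e.g. a chatty free-text answer
    if not isinstance(data, dict):
        return None
    light = data.get("light")
    angle = data.get("servo_angle")
    if light not in ("on", "off"):
        return None
    if isinstance(angle, bool) or not isinstance(angle, (int, float)):
        return None
    # Clamp to the servo's assumed 0-180 degree range before acting on it.
    return {"light": light, "servo_angle": max(0, min(180, int(angle)))}
```

      The "optimistically" part is why `parse_command` rejects anything it cannot validate instead of passing it straight to the hardware.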
      - Sean-Der 14 hours ago
        I will work on making it better! This was announced Tuesday [0], and I still need to give it lots of love.
        Even though the README isn’t completely done, give it a chance. I bet you can have fun with it :)
        [0] https://youtu.be/14leJ1fg4Pw?t=625&si=aqHm1UAdDEz91TnD
jonathan-adly 18 hours ago
Here is a nice use-case. Put this in a pharmacy - have people hit a button, and ask questions about over-the-counter medications.
Really - any physical place where people are easily overwhelmed, having something like that would be really nice.
With some work - you can probably even run RAG on the questions and answer esoteric things like where the food court is in an airport or where the ATM is in a hotel.
- swatcoder 16 hours ago
  > Put this in a pharmacy - have people hit a button, and ask questions about over-the-counter medications.
  Even if you trust OpenAI's models more than your trained, certified, and insured pharmacist -- the pharmacists, their regulators, and their insurers sure won't!
  They've got a century of sunk costs to consider (and maybe even some valid concern over the answers a model might give on their behalf...)
  Don't be expecting anything like that in a traditional regulated medical setting any time soon.
  - dymk 16 hours ago
    The last few doctor's appointments I’ve had, the clinician used a service to record and summarize the visit. It was using some sort of STT and LLM to do so. It’s already in medical settings.
    - swatcoder 16 hours ago
      Transcription and summary is a vastly different thing than providing medical advice to patients.
- pixelsort 18 hours ago
  Thanks for digging that out. Yes, that makes sense to me as someone who made a fully local speech-2-speech prototype with Electron, including VAD and AEC. It was responsive but taxing. I had to use a mix of specialty models over onnx/wasm in the renderer and llama.cpp in the main process. One day, a multimodal model will just do it all.
roland35 13 hours ago
Favorited and starred! I wonder if the real power of this could be in integrating large low-cost sensor networks? I think with things like video and audio it might make more sense to bump up to a single-board Linux computer - but maybe the AI could help parse or create notifications based on sensor readings, and push back events to the real world (lights, solenoids, etc.)
I think it would help to either have a FreeRTOS example, or if you want to go real crazy create a Zephyr integration! It would be a lot of fun to work on an AI and microcontroller combination - what a cool niche!
- Sean-Der 13 hours ago
  I’m very curious about what an LLM could deduce if you sent in lots of sensor data.
  I love my Airthings. I don’t know if it’s actionable, but it would be cool to see what conclusions would come up from sending CO2 and radon readings in. Could make understanding your home a lot easier.
johanam 19 hours ago
Love this! Excited to give it a try.
- Sean-Der 19 hours ago
  Thank you! If you run into problems, shoot me a message. I really want to make this easy enough for everyone to build with it.
  I have talked with incredibly creative developers that are hampered by domain knowledge requirements. I hope to see an explosion of cool projects if we get this right :)