ChatGPT Image and Voice Capabilities are Coming: Artificial Intelligence Trends

Until now, the primary way to prompt ChatGPT has been to type in commands. But ChatGPT image and voice capabilities are coming in as little as two weeks!

According to The Verge (You can now prompt ChatGPT with pictures and voice commands, written by David Pierce and available here), OpenAI is rolling out a new version of the service that allows you to prompt the AI bot not just by typing sentences into a text box but by either speaking aloud or just uploading a picture. The new features are rolling out to those who pay for ChatGPT in the next two weeks, and everyone else will get it “soon after,” according to OpenAI.

The voice chat part is pretty familiar: you tap a button and speak your question, ChatGPT converts it to text and feeds it to the large language model, gets an answer back, converts that back to speech, and speaks the answer out loud. It should feel just like talking to Alexa or Google Assistant, only — OpenAI hopes — the answers will be better thanks to the improved underlying tech. It appears most virtual assistants are being rebuilt to rely on LLMs — OpenAI is just ahead of the game.
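For readers who like to see the moving parts, the voice loop described above can be sketched in a few lines of Python. This is only an illustration of the three stages (speech to text, language model, text to speech), not OpenAI's actual implementation. The class name VoiceAssistant and the three fake_ functions are hypothetical stand-ins invented for this sketch; a real system would plug a speech recognition model, an LLM, and a speech synthesis model into the same three slots.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAssistant:
    """Hypothetical wrapper that chains the three stages of a voice chat."""
    transcribe: Callable[[bytes], str]   # speech -> text
    generate: Callable[[str], str]       # text prompt -> text answer
    synthesize: Callable[[str], bytes]   # text answer -> speech

    def respond(self, audio_in: bytes) -> bytes:
        question = self.transcribe(audio_in)   # 1. convert speech to text
        answer = self.generate(question)       # 2. ask the language model
        return self.synthesize(answer)         # 3. convert the answer to speech


# Stand-in stages (assumptions for illustration only): "audio" here is just
# UTF-8 bytes, so the example runs without any models or network access.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(prompt: str) -> str:
    return f"You asked: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


assistant = VoiceAssistant(fake_transcribe, fake_generate, fake_synthesize)
reply = assistant.respond(b"What is eDiscovery?")
print(reply.decode("utf-8"))  # prints "You asked: What is eDiscovery?"
```

Because each stage is just a function passed in, any one of them could in principle be swapped out (a better recognizer, a different voice) without touching the other two, which fits the article's point that assistants can be rebuilt around an LLM.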

OpenAI’s excellent Whisper model does a lot of the speech-to-text work, and the company is rolling out a new text-to-speech model it says can generate “human-like audio from just text and a few seconds of sample speech.” You’ll be able to choose ChatGPT’s voice from five options, but OpenAI seems to think the model has vastly more potential than that. OpenAI is working with Spotify to translate podcasts into other languages, for instance, all while retaining the sound of the podcaster’s voice. There are lots of interesting uses for synthetic voices, and OpenAI could be a big part of that industry.

But the fact that you can build a capable synthetic voice with just a few seconds of audio also opens the door for all kinds of problematic use cases. “These capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud,” the company says in a blog post announcing the new features. OpenAI says the model isn’t available for broad use for precisely that reason; it’s going to be much more controlled and restrained to specific use cases and partnerships.

The image search, meanwhile, is a bit like Google Lens. You snap a photo of whatever you’re interested in, and ChatGPT will try to suss out what you’re asking about and respond accordingly. You can also use the app’s drawing tool to help make your query clear or speak or type questions to go along with the image. This is where ChatGPT’s back-and-forth nature is helpful; rather than doing a search, getting the wrong answer, and then doing another search, you can prompt the bot and refine the answer as you go. (This is a lot like what Google is doing with multimodal search, too.)

Obviously, image search has its potential issues. One is what could happen when you prompt a chatbot about a person. OpenAI says it has deliberately limited ChatGPT’s “ability to analyze and make direct statements about people” both for accuracy and privacy reasons. That means one of the most sci-fi visions for AI — the ability to look at someone and say, “Who is that?” — isn’t coming anytime soon. Which is probably a good thing.

The ChatGPT image and voice capabilities illustrate just how rapidly OpenAI is enhancing its product, which means more capabilities for us! Hopefully, they will bring more good than bad, but we’ll see once they’re out! I plan to test them when they’re rolled out to paid users and let you know my thoughts.

So, what do you think? Are you excited about the upcoming ChatGPT image and voice capabilities? Please share any comments you might have, or let me know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
