ChatGPT has been around for a while now. Since it opened to the public, this generative AI chatbot has been controversial, especially in the field of education. Now, OpenAI has introduced new, more advanced features for ChatGPT Plus users on both Android and iOS.
OpenAI announced the new features on X/Twitter on 25 September 2023, and the topic quickly blew up worldwide, trending across various countries and social media platforms. But what do the new features actually do? You can have a look at a summary of the image feature in this X/Twitter video:
OpenAI is introducing multimodal input, so ChatGPT can now speak, hear, and see. This makes the user experience far smoother. Below are some of the ways you can use it:
Photo Query: Using Images in Your Prompts
As shown in the video, you take a photo of an object you need help with and then type in your question. ChatGPT will list the steps you need to take to solve your issue. In the video, the user snapped a photo of his bike and asked ChatGPT to help him lower the bike's seat, then used the same method to ask follow-up questions for clarification. I was pleasantly surprised that the chatbot answered even the simplest questions. It feels like having a very patient teacher sitting next to you, explaining things and untangling your confusion. I love it! This image input feature will be available on both the mobile apps and desktop.
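If you'd rather script this than tap through the app, the same image-plus-text pattern is available through OpenAI's API. Here's a rough sketch using the Python SDK; the model name, file name, and question are just placeholders for whatever vision-capable model and photo you have access to:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Encode a local photo (the bike seat, say) as a base64 data URL
with open("bike_seat.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: any vision-capable model your account can use
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "How do I lower the seat on this bike?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```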
According to arstechnica.com, OpenAI has not disclosed the technical details of how GPT-4 or its multimodal capabilities function internally. However, drawing on existing AI research, including research from Microsoft, an OpenAI partner, it is generally understood that multimodal AI models typically convert both text and images into a shared encoding space.
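OpenAI hasn't said how GPT-4 does this internally, but the "shared encoding space" idea is exactly what open models like CLIP (also from OpenAI) implement: text and images are projected into the same vector space so they can be compared directly. Here's a minimal sketch with the open-source CLIP checkpoint via Hugging Face's transformers library; it illustrates the general idea, not GPT-4's actual architecture:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("bike_seat.jpg")  # placeholder photo
texts = ["a bicycle seat", "a kitchen sink", "a laptop keyboard"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# Both embeddings live in the same vector space, so cosine similarity is meaningful
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
print((image_emb @ text_emb.T).squeeze())  # highest score should be "a bicycle seat"
```

The similarity scores tell you which caption best matches the photo; GPT-4's vision system presumably goes much further than this, but the shared-space comparison is the common starting point.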
Voice Conversations With ChatGPT
Similar to Siri on iOS and Google Assistant on Android, this new feature lets you speak with ChatGPT. How does it work? Tap the headphone icon, say your prompt out loud, and you will get a spoken response back. Unlike the photo query, this feature is only available in the mobile apps.
As noted, this feature is fairly similar to what we already have and are accustomed to on our phones. However, axios.com suggested it could be a big deal if it were built into a smart speaker or a car system.
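The app handles all of this behind a single tap. If you wanted to wire up a similar round trip yourself, a rough do-it-yourself approximation through OpenAI's API (not the app's actual internals) could pair Whisper transcription with a chat completion, something like this (file and model names are just examples):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# 1. Transcribe a recorded prompt (a saved file here stands in for the microphone)
with open("spoken_prompt.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Send the transcribed text to the chat model and print the reply
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
print(reply.choices[0].message.content)
```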
You can see how this feature works here:
Text-to-Speech Model
Have you ever missed having someone read you a bedtime story? If your answer is yes, then this new development is something you will appreciate. According to the OpenAI website, the feature is "capable of generating human-like audio from just text and a few seconds of sample speech". To be clear, the few seconds refers to the voice sample needed to build a voice, not the length of audio it can produce, so a bedtime story is not off the table. Either way, it's a good development.
OpenAI says it worked with professional voice actors to create each of the five voices: Juniper, Sky, Cove, Ember, and Breeze. Whisper, OpenAI's open-source speech recognition system, handles the other direction by transcribing your spoken words into text. The new voices can be used to read out stories, poems, recipes, speeches, and explanations.
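The five named voices live inside the ChatGPT app rather than the public API, but OpenAI's API does expose a text-to-speech endpoint with its own preset voices. A minimal sketch, assuming you have API access (the voice, text, and output file are just examples):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # an API preset voice; the app's named voices aren't selectable here
    input="Once upon a time, there was a very patient chatbot who loved bedtime stories.",
)
speech.stream_to_file("bedtime_story.mp3")  # saves the generated audio locally
```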
I think it would be cool if this text-to-speech model were hooked up to read out longer texts, since I learn best by listening. For now, I guess I'll stick to podcasts.
AI-Translated Podcasts
As a tech noob, this is the feature that intrigues me the most. On Monday, Spotify announced a pilot of its 'Voice Translation' feature, which translates podcasters' episodes into other languages while preserving each speaker's real voice and style. Awesome, right?
CNBC.com reported that the company worked with big-name podcasters such as Dax Shepard and Monica Padman (Armchair Expert), Steven Bartlett (The Diary of a CEO), and a few more. A selection of their past and upcoming episodes is being translated into Spanish, French, and German, and the translated episodes will become available to both free users and paid subscribers over the coming days and weeks.
I initially wondered why only these three languages are available; from what I can find, they are among the most widely spoken languages on Spotify's platform. To try out the AI voice translation feature, you can click here.
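Spotify hasn't published how Voice Translation works under the hood (it is reportedly built on OpenAI's voice technology), but conceptually it chains speech recognition, translation, and voice synthesis in the original speaker's voice. Here's a toy sketch of just the first two steps using OpenAI's API; this is emphatically not Spotify's pipeline, and the file name and target language are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# 1. Transcribe a clip of the original episode
with open("episode_clip.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Translate the transcript into one of the launch languages
translation = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "Translate this podcast transcript into Spanish, preserving the speaker's tone.",
        },
        {"role": "user", "content": transcript.text},
    ],
)
print(translation.choices[0].message.content)
```

The real feature goes one step further and re-voices the translated script so it still sounds like the original host, which is the part this sketch leaves out.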
Conclusion
The ChatGPT features mentioned above will roll out to Plus and Enterprise users over the next two weeks. Once they do, we will be able to see how OpenAI is changing the artificial intelligence game. Some of the features might sound a bit 'meh', but at least it's a start.