{"id":169352,"date":"2024-05-15T02:36:32","date_gmt":"2024-05-15T06:36:32","guid":{"rendered":"https:\/\/www.immortalitymedicine.tv\/gpt-4o-delivers-human-like-ai-interaction-with-text-audio-and-vision-integration-ai-news\/"},"modified":"2024-08-18T12:53:40","modified_gmt":"2024-08-18T16:53:40","slug":"gpt-4o-delivers-human-like-ai-interaction-with-text-audio-and-vision-integration-ai-news","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/ai\/gpt-4o-delivers-human-like-ai-interaction-with-text-audio-and-vision-integration-ai-news.php","title":{"rendered":"GPT-4o delivers human-like AI interaction with text, audio, and vision integration &#8211; AI News"},"content":{"rendered":"<p><p>    OpenAI has launched its new    flagship model, GPT-4o, which seamlessly integrates text,    audio, and visual inputs and outputs, promising to enhance the    naturalness of machine interactions.  <\/p>\n<p>    GPT-4o, where the o stands for omni, is designed to cater    to a broader spectrum of input and output modalities. It    accepts as input any combination of text, audio, and image and    generates any combination of text, audio, and image outputs,    OpenAI announced.  <\/p>\n<p>    Users can expect a response time as quick as 232 milliseconds,    mirroring human conversational speed, with an impressive    average response time of 320 milliseconds.  <\/p>\n<p>    The introduction of GPT-4o marks a leap from its predecessors    by processing all inputs and outputs through a single neural    network. This approach enables the model to retain critical    information and context that were previously lost in the    separate model pipeline used in earlier versions.  <\/p>\n<p>    Prior to GPT-4o, Voice Mode could handle audio interactions    with latencies of 2.8 seconds for GPT-3.5 and 5.4 seconds for    GPT-4. The previous setup involved three distinct models: one    for transcribing audio to text, another for textual responses,    and a third for converting text back to audio. This    segmentation led to loss of nuances such as tone, multiple    speakers, and background noise.  <\/p>\n<p>    As an integrated solution, GPT-4o boasts notable improvements    in vision and audio understanding. It can perform more complex    tasks such as harmonising songs, providing real-time    translations, and even generating outputs with expressive    elements like laughter and singing. Examples of its broad    capabilities include preparing for interviews, translating    languages on the fly, and generating customer service    responses.  <\/p>\n<p>    Nathaniel Whittemore, Founder and CEO of Superintelligent, commented: Product    announcements are going to inherently be more divisive than    technology announcements because its harder to tell if a    product is going to be truly different until you actually    interact with it. And especially when it comes to a different    mode of human-computer interaction, there is even more room for    diverse beliefs about how useful its going to be.  <\/p>\n<p>    That said, the fact that there wasnt a GPT-4.5 or GPT-5    announced is also distracting people from the technological    advancement that this is a natively multimodal model. Its not    a text model with a voice or image addition; it is a multimodal    token in, multimodal token out. This opens up a huge array of    use cases that are going to take some time to filter into the    consciousness.  
GPT-4o matches GPT-4 Turbo performance levels on English text and coding tasks but significantly outperforms it in non-English languages, making it a more inclusive and versatile model. It sets a new benchmark in reasoning with a high score of 88.7% on 0-shot CoT MMLU (general knowledge questions) and 87.2% on the 5-shot no-CoT MMLU.

The model also excels in audio and translation benchmarks, surpassing previous state-of-the-art models like Whisper-v3. In multilingual and vision evaluations, it demonstrates superior performance, enhancing OpenAI's multilingual, audio, and vision capabilities.

OpenAI has incorporated robust safety measures into GPT-4o by design, including techniques to filter training data and refine behaviour through post-training safeguards. The model has been assessed through a Preparedness Framework and complies with OpenAI's voluntary commitments. Evaluations in areas like cybersecurity, persuasion, and model autonomy indicate that GPT-4o does not exceed a "Medium" risk level in any category.

Further safety assessments involved extensive external red teaming with over 70 experts in domains including social psychology, bias, fairness, and misinformation. This comprehensive scrutiny aims to mitigate the risks introduced by GPT-4o's new modalities.

Starting today, GPT-4o's text and image capabilities are available in ChatGPT, including a free tier and extended features for Plus users. A new Voice Mode powered by GPT-4o will enter alpha testing within ChatGPT Plus in the coming weeks.

Developers can access GPT-4o through the API for text and vision tasks, benefiting from its doubled speed, halved price, and enhanced rate limits compared to GPT-4 Turbo (a minimal illustrative call is sketched at the end of this article).

OpenAI plans to expand GPT-4o's audio and video functionalities to a select group of trusted partners via the API, with a broader rollout expected in the near future. This phased release strategy aims to ensure thorough safety and usability testing before the full range of capabilities is made publicly available.

"It's hugely significant that they've made this model available for free to everyone, as well as making the API 50% cheaper. That is a massive increase in accessibility," explained Whittemore.

OpenAI invites community feedback to continuously refine GPT-4o, emphasising the importance of user input in identifying and closing the gaps where GPT-4 Turbo might still outperform it.

(Image Credit: OpenAI)

See also: OpenAI takes steps to boost AI-generated content transparency

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.
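As referenced above, developers reach GPT-4o for text and vision through the existing API. The following is a minimal sketch of such a call via the OpenAI Python SDK, under the assumptions that the prompt and image URL are placeholders; audio and video over the API are not shown because, per the article, they are initially limited to trusted partners.

```python
# Minimal, illustrative GPT-4o call for text + vision via the OpenAI Python SDK.
# The prompt and image URL are placeholders, not values from the article.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```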
Tags: ai, api, artificial intelligence, benchmarks, chatgpt, coding, developers, development, gpt-4o, Model, multimodal, openai, performance, programming

Original post: GPT-4o delivers human-like AI interaction with text, audio, and vision integration - AI News (https://www.artificialintelligence-news.com/2024/05/14/gpt-4o-human-like-ai-interaction-text-audio-vision-integration)