{"id":1123910,"date":"2024-04-12T05:52:51","date_gmt":"2024-04-12T09:52:51","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/googles-gemini-pro-1-5-can-now-hear-as-well-as-see-what-it-means-for-you-toms-guide\/"},"modified":"2024-04-12T05:52:51","modified_gmt":"2024-04-12T09:52:51","slug":"googles-gemini-pro-1-5-can-now-hear-as-well-as-see-what-it-means-for-you-toms-guide","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/google\/googles-gemini-pro-1-5-can-now-hear-as-well-as-see-what-it-means-for-you-toms-guide\/","title":{"rendered":"Google&#8217;s Gemini Pro 1.5 can now hear as well as see  what it means for you &#8211; Tom&#8217;s Guide"},"content":{"rendered":"<p><p>    Google has updated its incredibly powerful    Gemini Pro    1.5 artificial intelligence model to give it the ability to    hear the contents of an audio or video file for the first    time.  <\/p>\n<p>    The update was announced at     Google Next, with the search giant confirming the model can    listen to an updloaded clip and provide information without the    need for a written transcript.  <\/p>\n<p>    What this means is you could     give it a documentary or video presentation and ask it    questions about any moment, both audio and video, within the    clip.  <\/p>\n<p>    This is part of a wider push from     Google to create more multimodal models that can understand    a variety of input types beyond just text. The move is possible    due to the Gemini family of models being trained on audio,    video, text and code at the same time.  <\/p>\n<\/p>\n<p>    Google launched Gemini Pro 1.5 in February with a 1 million    token context window. This, combined with the multimodal    training data means it can process videos.  <\/p>\n<p>    The tech giant has now added sound to the options for input.    This means you can give it a podcast and have it listen through    for key moments or specific mentions. It can do the same for    audio attached to a video file, while also analysing the video    content.  <\/p>\n<p>        The update also means Gemini can now generate transcripts        for video clips regardless of how long they might run and        find a specific moment within the audio or video file.      <\/p>\n<p>    The new update is part of the middle-tier of the Gemini family,    which comes in three form factors  the tiny Nano for    on-device, Pro powering the free version of the Gemini chatbot    and Ultra powering Gemini Advanced.  <\/p>\n<p>            Upgrade your life with a daily dose of the biggest tech            news, lifestyle hacks and our curated analysis. Be the            first to know about cutting-edge gadgets and the            hottest deals.          <\/p>\n<p>    For some reason Google only released the 1.5 update to Gemini    Pro rather than Ultra, meaning their middle-tier model now out    performs the more advanced version. It isnt clear if there    will be a Gemini Ultra 1.5 or when it will be accessible if it    launches.  <\/p>\n<p>    The massive context window  starting at 250,000 (similar to    Claude 3 Opus) and rising to over a million for certain    approved users  means you also dont need to fine tune a model    on specific data. You can load that data in at the start of a    chat and just ask questions.  <\/p>\n<p>    The update also means Gemini can now generate transcripts for    video clips regardless of how long they might run and find a    specific moment within the audio or video file.  <\/p>\n<\/p>\n<p>    I imagine at some point Google will update its Gemini chatbot    to use the 1.5 models, possibly after the Google I\/O developer    conference next month. For now it is only available through the    Google Cloud developer dashboard VertexAI.  <\/p>\n<p>    While VertexAI is a powerful tool for interacting with a range    of models, building out AI applications and testing what is    possible it isn't widely accessible and mainly targeted    at developers, enterprise and researchers rather than    consumers.  <\/p>\n<p>    Using VertexAI you can insert any form of visual or audio media    such as a short film or someone giving a talk and add a text    prompt. This could be \"give me five bullet points summing up    the speech\" or \"how many times did they say Gemini\".  <\/p>\n<p>    Google's main audience for Gemini Pro 1.5 is enterprise with    partnerships already in the works with TBS, REplit and others    who are using it for metadata tagging and creating code.  <\/p>\n<p>    Google has also started using Gemini Pro 1.5 in its own    products including the Generative AI coding assistant Code    Assist to track changes across large-scale codebases.  <\/p>\n<p>    The changes to Gemini Pro 1.5 were announced at Google Next    along with a big update to the DeepMind AI image model Imagen 2    that powers the Gemini image-generation capabilities.  <\/p>\n<p>    This is getting inpainting and outptaining where users can    remove or add any element from a generated image. This is    similar to updates OpenAI made to its    DALL-E model recently.  <\/p>\n<p>    Google is also going to starts grounding its AI responses    across Gemini and other platforms with Google Search so they    always contain up to date information.  <\/p>\n<p><!-- Auto Generated --><\/p>\n<p>The rest is here:<\/p>\n<p><a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/www.tomsguide.com\/ai\/google-gemini\/googles-gemini-pro-15-can-now-hear-as-well-as-see-what-it-means-for-you\" title=\"Google's Gemini Pro 1.5 can now hear as well as see  what it means for you - Tom's Guide\">Google's Gemini Pro 1.5 can now hear as well as see  what it means for you - Tom's Guide<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> Google has updated its incredibly powerful Gemini Pro 1.5 artificial intelligence model to give it the ability to hear the contents of an audio or video file for the first time. The update was announced at Google Next, with the search giant confirming the model can listen to an updloaded clip and provide information without the need for a written transcript.  <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/google\/googles-gemini-pro-1-5-can-now-hear-as-well-as-see-what-it-means-for-you-toms-guide\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[345634],"tags":[],"class_list":["post-1123910","post","type-post","status-publish","format-standard","hentry","category-google"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1123910"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=1123910"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1123910\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=1123910"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=1123910"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=1123910"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}