{"id":1124728,"date":"2024-05-11T14:07:07","date_gmt":"2024-05-11T18:07:07","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/nvidias-dreureka-outperforms-humans-in-training-robotics-systems-venturebeat\/"},"modified":"2024-05-11T14:07:07","modified_gmt":"2024-05-11T18:07:07","slug":"nvidias-dreureka-outperforms-humans-in-training-robotics-systems-venturebeat","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/robotics\/nvidias-dreureka-outperforms-humans-in-training-robotics-systems-venturebeat\/","title":{"rendered":"Nvidia&#8217;s DrEureka outperforms humans in training robotics systems &#8211; VentureBeat"},"content":{"rendered":"<p>Join us in returning to NYC on June 5th to collaborate with executive leaders in exploring comprehensive methods for auditing AI models regarding bias, performance, and ethical compliance across diverse organizations. Find out how you can attend here.<\/p>\n<p>Large language models (LLMs) can accelerate the training of robotics systems in superhuman ways, according to a new study by scientists at Nvidia, the University of Pennsylvania and the University of Texas at Austin.<\/p>\n<p>The study introduces DrEureka (short for Domain Randomization Eureka), a technique that can automatically create reward functions and randomization distributions for robotics systems. DrEureka requires only a high-level description of the target task, and it is faster and more efficient than human-designed rewards at transferring learned policies from simulated environments to the real world.<\/p>\n<p>The implications could be significant for the fast-moving world of robotics, which has recently gotten a renewed boost from advances in language and vision models.
<\/p>\n<p>When designing robotics models for a new task, a policy is usually trained in a simulated environment and then deployed to the real world. The difference between the simulated and real-world environments, referred to as the sim-to-real gap, is one of the big challenges of any robotics system. Configuring and fine-tuning a policy for optimal performance usually requires a good deal of back and forth between simulation and the real world.<\/p>\n<p>Recent work has shown that LLMs can combine their vast world knowledge and reasoning capabilities with the physics engines of virtual simulators to learn complex low-level skills. For example, LLMs can be used to design reward functions, the components that steer a robotics reinforcement learning (RL) system toward the correct sequences of actions for the desired task.<\/p>\n<p>However, once a policy is learned in simulation, transferring it to the real world requires a lot of manual tweaking of the reward functions and simulation parameters.<\/p>\n<p>The goal of DrEureka is to use LLMs to automate the intensive human effort required in the sim-to-real transfer process.
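To make the idea of a reward function concrete, here is a minimal sketch of the kind of function an LLM might generate for a quadruped forward-locomotion task; the term names, weights, and signature are illustrative assumptions, not DrEureka's actual output.

```python
def locomotion_reward(forward_velocity: float,
                      energy_used: float,
                      upright: bool,
                      velocity_weight: float = 1.0,
                      energy_weight: float = 0.05,
                      fall_penalty: float = 10.0) -> float:
    """Reward forward progress, lightly penalize energy use, and
    heavily penalize falling over. The RL policy is trained to
    maximize this signal at every simulation step."""
    reward = velocity_weight * forward_velocity - energy_weight * energy_used
    if not upright:
        reward -= fall_penalty
    return reward
```

In practice a system like Eureka emits many such candidate functions, each with different terms and weightings, and lets simulation results decide which ones steer the policy best.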
<\/p>\n<p>DrEureka builds on Eureka, a technique introduced in October 2023. Eureka takes a robotic task description and uses an LLM to generate software implementations of a reward function that measures success at that task. These reward functions are then run in simulation, and the results are returned to the LLM, which reflects on the outcome and modifies the reward function accordingly. The advantage of this technique is that it can be run in parallel with hundreds of reward functions, all generated by the LLM, picking the best functions and continuing to improve them.<\/p>\n<p>While Eureka's reward functions are great for training RL policies in simulation, they do not account for the messiness of the real world and therefore require manual sim-to-real transfer. DrEureka addresses this shortcoming by automatically configuring domain randomization (DR) parameters.<\/p>\n<p>DR techniques randomize the physical parameters of the simulation environment so that the RL policy can generalize to the unpredictable perturbations it meets in the real world. One of the important challenges of DR is choosing the right parameters and ranges of perturbation. Adjusting these parameters requires commonsense physical reasoning and knowledge of the target robot.<\/p>\n<p>These characteristics of designing DR parameters make it an ideal problem for LLMs to tackle \"because of their strong grasp of physical knowledge and effectiveness in generating hypotheses, providing good initializations to complex search and black-box optimization problems in a zero-shot manner,\" the researchers wrote.<\/p>\n<p>DrEureka uses a multi-step process to break down the complexity of optimizing reward functions and domain randomization parameters at the same time. 
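The generate-evaluate-reflect loop described above can be sketched as follows. The LLM and simulator are replaced here with stand-in functions, and the candidate "reward functions" are reduced to weight vectors; this is a toy illustration of the loop's shape, not Eureka's implementation.

```python
import random

def mock_llm_generate(feedback, n):
    # Stand-in for the LLM: proposes n candidate reward-function
    # parameterizations (real Eureka emits executable code).
    return [{"w_vel": random.uniform(0.0, 2.0),
             "w_energy": random.uniform(0.0, 0.2)} for _ in range(n)]

def mock_simulate(candidate):
    # Stand-in for running RL training in simulation and scoring
    # the resulting policy with the candidate reward.
    return candidate["w_vel"] - candidate["w_energy"]

def eureka_loop(iterations=3, population=8):
    """Generate candidates in parallel, score them in simulation,
    and feed the best result back to the LLM for reflection."""
    feedback, best, best_score = None, None, float("-inf")
    for _ in range(iterations):
        candidates = mock_llm_generate(feedback, population)
        scored = sorted(((mock_simulate(c), c) for c in candidates),
                        key=lambda t: t[0], reverse=True)
        top_score, top = scored[0]
        if top_score > best_score:
            best_score, best = top_score, top
        # In Eureka this feedback is a textual reflection prompt.
        feedback = {"best": top, "score": top_score}
    return best, best_score
```

The key design point the article describes is the parallelism: because the LLM can emit hundreds of candidates at once, the loop explores the reward-design space far faster than a human iterating by hand.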
First, an LLM generates reward functions based on a task description and safety instructions about the robot and the environment. DrEureka uses these instructions to create an initial reward function and learn a policy, as in the original Eureka. The model then runs tests with the policy and reward function to determine the suitable range of physics parameters, such as friction and gravity.<\/p>\n<p>The LLM then uses this information to select the optimal domain randomization configurations. Finally, the policy is retrained with the DR configurations to become robust against the noisiness of the real world.<\/p>\n<p>The researchers described DrEureka as a \"language-model driven pipeline for sim-to-real transfer with minimal human intervention.\"<\/p>\n<p>The researchers evaluated DrEureka on quadruped and dexterous manipulator platforms, although the method is general and applicable to diverse robots and tasks. Their findings show that in quadruped locomotion, policies trained with DrEureka outperform classic human-designed systems by 34% in forward velocity and 20% in distance traveled across various real-world evaluation terrains. They also tested DrEureka on dexterous manipulation with robotic hands: given a fixed amount of time, the best policy trained by DrEureka performed 300% more cube rotations than human-developed policies.<\/p>\n<p>But the most interesting finding was the application of DrEureka to the novel task of having a robot dog balance and walk on a yoga ball. The LLM was able to design a reward function and DR configurations that allowed the trained policy to be transferred to the real world with no extra configuration and to perform well on diverse indoor and outdoor terrains with minimal safety support. 
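The domain randomization step can be sketched as below: once ranges for the physics parameters have been chosen, each training episode draws a fresh randomized configuration so the policy never overfits to one simulated physics. The parameter names and bounds here are hypothetical stand-ins for what the LLM might propose after probing the policy's tolerance.

```python
import random

# Hypothetical DR ranges, e.g. as an LLM might select after testing
# which physics values the initial policy still tolerates.
DR_RANGES = {
    "friction": (0.3, 1.2),
    "gravity": (8.5, 11.0),      # m/s^2
    "motor_strength": (0.8, 1.2) # multiplier on nominal torque
}

def sample_physics(ranges, rng=random):
    """Draw one randomized physics configuration for a training
    episode; retraining under many such draws is what makes the
    policy robust to real-world noise."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
```

Choosing these ranges well is exactly the part the article says demands commonsense physical reasoning: too narrow and the policy overfits the simulator, too wide and training fails to converge.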
<\/p>\n<p>Interestingly, the study found that the safety instructions included in the task description play an important role in ensuring that the LLM generates logical instructions that transfer to the real world.<\/p>\n<p>\"We believe that DrEureka demonstrates the potential of accelerating robot learning research by using foundation models to automate the difficult design aspects of low-level skill learning,\" the researchers wrote.<\/p>\n<p>Read the rest here: <\/p>\n<p><a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/venturebeat.com\/automation\/nvidias-dreureka-outperforms-humans-in-training-robotics-systems\" title=\"Nvidia's DrEureka outperforms humans in training robotics systems - VentureBeat\">Nvidia's DrEureka outperforms humans in training robotics systems - VentureBeat<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Large language models (LLMs) can accelerate the training of robotics systems in superhuman ways, according to a new study by scientists at Nvidia, the University of Pennsylvania and the University of Texas at Austin. 
<a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/robotics\/nvidias-dreureka-outperforms-humans-in-training-robotics-systems-venturebeat\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[187746],"tags":[],"class_list":["post-1124728","post","type-post","status-publish","format-standard","hentry","category-robotics"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1124728"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=1124728"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1124728\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=1124728"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=1124728"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=1124728"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}