DeepMind Shows AI Has Trouble Seeing Homer Simpson's Actions

The best artificial intelligence still has trouble visually recognizing many of Homer Simpson's favorite behaviors, such as drinking beer, eating chips, eating doughnuts, yawning, and the occasional face-plant. Those findings from DeepMind, the pioneering London-based AI lab, also suggest why DeepMind has created a huge new dataset of YouTube clips to help train AI to identify human actions in videos that go well beyond "Mmm, doughnuts" or "D'oh!"

The most popular AI used by Google, Facebook, Amazon, and other companies beyond Silicon Valley is based on deep learning algorithms that can learn to identify patterns in huge amounts of data. Over time, such algorithms can become much better at a wide variety of tasks, such as translating between English and Chinese for Google Translate or automatically recognizing the faces of friends in Facebook photos. But even the most finely tuned deep learning relies on having lots of quality data to learn from. To help improve AI's capability to recognize human actions in motion, DeepMind has unveiled its Kinetics dataset consisting of 300,000 video clips and 400 human action classes.

"AI systems are now very good at recognizing objects in images, but still have trouble making sense of videos," says a DeepMind spokesperson. "One of the main reasons for this is that the research community has so far lacked a large, high-quality video dataset."

DeepMind enlisted the help of online workers through Amazon's Mechanical Turk service to help correctly identify and label the actions in thousands of YouTube clips. Each of the 400 human action classes in the Kinetics dataset has at least 400 video clips, with each clip lasting around 10 seconds and taken from a separate YouTube video. More details can be found in a DeepMind paper on the arXiv preprint server.
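As a rough illustration of how a collection like this can be checked against those requirements, here is a minimal Python sketch that tallies clips and distinct source videos per action class. The CSV column names ("label", "youtube_id") and the file name are assumptions made for illustration, not the dataset's official schema.

```python
import csv
from collections import Counter, defaultdict

def summarize_annotations(path):
    """Count clips per action class and the distinct YouTube videos they come from."""
    clips_per_class = Counter()
    videos_per_class = defaultdict(set)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            clips_per_class[row["label"]] += 1
            # Track source videos so we can confirm clips come from
            # separate YouTube uploads, as the dataset requires.
            videos_per_class[row["label"]].add(row["youtube_id"])
    for label, n_clips in sorted(clips_per_class.items()):
        n_videos = len(videos_per_class[label])
        print(f"{label}: {n_clips} clips from {n_videos} distinct videos")

# summarize_annotations("kinetics_train.csv")  # hypothetical file name
```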

The new Kinetics dataset seems likely to represent a new benchmark for training datasets intended to improve AI computer vision for video. It has far more video clips and action classes than the HMDB-51 and UCF-101 datasets that previously formed the benchmarks for the research community. DeepMind also made a point of ensuring it had a diverse dataset, one that did not include multiple clips from the same YouTube videos.

Tech giants such as Google, a sister company to DeepMind under the umbrella Alphabet group, arguably have the best access to large amounts of video data that could prove helpful in training AI. Alphabet's ownership of YouTube, the incredibly popular online video-streaming service, does not hurt either. But other companies and independent research groups must rely on publicly available datasets to train their deep learning algorithms.

Early training and testing with the Kinetics dataset showed some intriguing results. For example, deep learning algorithms showed accuracies of 80 percent or greater in classifying actions such as "playing tennis," "crawling baby," "presenting weather forecast," "cutting watermelon," and "bowling." But the classification accuracy dropped to around 20 percent or less for the Homer Simpson actions, including "slapping" and "headbutting," and an assortment of other actions such as "making a cake," "tossing coin," and "fixing hair."
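For context, figures like these correspond to simple per-class accuracy: the share of test clips in a class that the model labels correctly. A minimal sketch with hypothetical toy labels:

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly classified clips for each true action class."""
    correct = Counter()
    total = Counter()
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if guess == truth:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}

# Toy example (labels are illustrative, not real predictions):
acc = per_class_accuracy(
    ["playing tennis", "playing tennis", "headbutting", "headbutting"],
    ["playing tennis", "playing tennis", "headbutting", "slapping"],
)
print(acc)  # {'playing tennis': 1.0, 'headbutting': 0.5}
```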

AI faces special challenges with classifying actions such as eating because it may not be able to accurately identify the specific food being consumed, especially if the hot dog or burger is already partially eaten or appears very small within the overall video. Dancing classes and actions focused on a specific part of the body can also prove tricky. Some actions also occur fairly quickly and are only visible for a small number of frames within a video clip, according to a DeepMind spokesperson.

DeepMind also wanted to see if the new Kinetics dataset has enough gender balance to allow for accurate AI training. Past cases have shown how imbalanced training datasets can lead to deep learning algorithms performing worse at recognizing the faces of certain ethnic groups. Researchers have also shown how such algorithms can pick up gender and racial biases from language.

A preliminary study showed that the new Kinetics dataset seems to be fairly balanced. DeepMind researchers found that no single gender dominated within 340 out of the 400 action classes, or else it was not possible to determine gender in those actions. Those action classes that did end up gender imbalanced included YouTube clips of actions such as "shaving beard" or "dunking basketball" (mostly male) and "filling eyebrows" or "cheerleading" (mostly female).
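A balance check like that can be approximated by tallying gender labels per action class and flagging classes where one gender dominates. The sketch below assumes hypothetical "male"/"female"/"unknown" annotations and an illustrative 70 percent threshold; it is not DeepMind's exact methodology.

```python
from collections import Counter, defaultdict

def imbalanced_classes(records, threshold=0.7):
    """Flag action classes where one gender exceeds `threshold` of labeled clips.

    `records` is an iterable of (action_class, gender_label) pairs, where
    gender_label is "male", "female", or "unknown" (hypothetical labels).
    """
    counts = defaultdict(Counter)
    for action, gender in records:
        counts[action][gender] += 1
    flagged = {}
    for action, genders in counts.items():
        known = genders["male"] + genders["female"]
        if known == 0:
            continue  # gender could not be determined for this class
        share = max(genders["male"], genders["female"]) / known
        if share > threshold:
            flagged[action] = share
    return flagged
```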

But even action classes that had gender imbalance did not show much evidence of classifier bias. This means that even the Kinetics action classes featuring mostly male participants, such as "playing poker" or "hammer throw," did not seem to bias AI to the point where the deep learning algorithms had trouble recognizing female participants performing the same actions.
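One straightforward way to probe for that kind of classifier bias is to compare accuracy on male- and female-labeled clips within the same action class. The sketch below is a hypothetical illustration of such a comparison, not DeepMind's evaluation code.

```python
def accuracy_by_gender(y_true, y_pred, genders, action):
    """Compare accuracy on male vs. female clips for one action class.

    Inputs are parallel lists of true labels, predicted labels, and
    (hypothetical) gender annotations, one entry per test clip.
    """
    stats = {"male": [0, 0], "female": [0, 0]}  # gender -> [correct, total]
    for truth, guess, gender in zip(y_true, y_pred, genders):
        if truth != action or gender not in stats:
            continue
        stats[gender][1] += 1
        if guess == truth:
            stats[gender][0] += 1
    # Return per-gender accuracy, or None if no clips of that gender were seen.
    return {g: (c / t if t else None) for g, (c, t) in stats.items()}
```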

DeepMind hopes that outside researchers can help suggest new human action classes for the Kinetics dataset. Any improvements may enable AI trained on Kinetics to better recognize both the most elegant of actions and the clumsier moments in videos that lead people to say "D'oh!" In turn, that could lead to new generations of computer software and robots with the capacity to recognize what all those crazy humans are doing on YouTube or in other video clips.

"Video understanding represents a significant challenge for the research community, and we are in the very early stages with this," according to the DeepMind spokesperson. "Any real-world applications are still a really long way off, but you can see potential in areas such as medicine, for example, aiding the diagnosis of heart problems in echocardiograms."
