The Curious Case of Data Annotation and AI

Data annotation takes time. And for in-house teams, labeling data can be the proverbial bottleneck, limiting a company's ability to quickly train and validate machine learning models.

By its very definition, artificial intelligence refers to computer systems that can learn, reason, and act for themselves, but where does this intelligence come from? For decades, the collaborative intelligence of humans and machines has produced some of the world's leading technologies. And while there's nothing glamorous about the data being used to train today's AI applications, the role of data annotation in AI is nonetheless fascinating.

See also: New Tool Offers Help with Data Annotation

Poorly Labeled Data Leads to Compromised AI

Imagine reviewing hours of video footage, sorting through thousands of driving scenes to label every vehicle that comes into frame, and you've got data annotation. Data annotation is the process of labeling images, video, audio, and other data sources so the data is recognizable to computer systems programmed for supervised learning. This is the intelligence behind AI algorithms.
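
To make the idea concrete, here is a minimal sketch of what one annotated driving frame can look like for supervised learning, loosely following the common COCO bounding-box style. The file name, image size, classes, and coordinates are hypothetical examples, not taken from any dataset mentioned in this article.

```python
# One annotated image: the frame is the model input, the labeled boxes
# are the supervision signal the model learns to reproduce.
annotation = {
    "image": "frame_000123.jpg",          # hypothetical frame from a driving video
    "width": 1920,
    "height": 1080,
    "objects": [
        {
            "label": "car",               # object class the model should learn
            "bbox": [412, 560, 180, 95],  # [x, y, width, height] in pixels
        },
        {
            "label": "pedestrian",
            "bbox": [1024, 488, 42, 130],
        },
    ],
}

# Print a quick summary of the labels attached to this frame.
for obj in annotation["objects"]:
    x, y, w, h = obj["bbox"]
    print(f'{obj["label"]}: top-left=({x}, {y}), size={w}x{h}')
```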

For companies using AI to solve world problems, improve operations, increase efficiencies, or otherwise gain a competitive edge, training an algorithm is more than just collecting annotated data; it means sourcing superior-quality training data and ensuring that data contributes to model validation, so applications can be brought to market quickly, safely, and ethically.

Data is the most crucial element of machine learning. Without data annotation, computers couldn't be trained to see, speak, or perform intelligent functions, yet obtaining datasets and labeling training data are among the top limitations to adopting AI, according to the McKinsey Global Institute. Another known limitation is data bias, which can creep in at any stage of the training data lifecycle but more often than not arises from poor-quality or inconsistent data labeling.

IDC reports that 50 percent of IT and data professionals surveyed cite data quality as a challenge in deploying AI workloads. But where does quality data come from?

Open-source datasets are one way to collect data for an ML model, but since many are curated for a specific use case, they may not be useful for highly specialized needs. Also, the amount of data needed to train your algorithm may vary based on the complexity of the problem you're trying to solve and the complexity of your model.

The Waymo Open Dataset is the largest, most diverse autonomous driving dataset to date, consisting of thousands of images labeled with millions of bounding boxes and object classes: 12 million 3D bounding box labels and 1.2 million 2D bounding box labels, to be exact. Still, Waymo plans to keep growing the dataset even further.

Why? Because current, accurate, and refreshed data is necessary to continuously train, validate, and maintain agile machine learning models. There are always edge cases, and for some use cases, even more data is needed. If the data is lacking in any way, those gaps compromise the intelligence of the algorithm in the form of bias, false positives, poor performance, and other issues.

Let's say you're searching for a new laptop. When you type your specifications into the search bar, the results that come up are the work of millions of labeled and indexed data points, from product SKUs to product photos.

If your search returns results for a lunchbox, a briefcase, or anything else mistaken for the signature clamshell of a laptop, you've got a problem. You can't find it, so you can't buy it, and that company just lost a sale.

This is why quality annotated data is so important. Poor-quality data correlates directly with biased and inaccurate models, and in some cases, improving data quality is as simple as making sure you have the right data in the first place.

Vulcan Inc. experienced the challenge of dataset diversity first-hand while working to develop AI-enabled products that could record and monitor African wildlife. While trying to detect cows in imagery, the team realized its model could not recognize cows in Africa based on a dataset of cows from Washington alone. To get the ML model operating at peak performance, the team needed to create a training dataset of its own.

Labeling Data Is Demanding for AI Teams

As you might expect, data annotation takes time. And for in-house teams, labeling data can be the proverbial bottleneck, limiting your ability to quickly train and validate machine learning models.

Labeling datasets is arguably one of the hardest parts of building AI. Cognilytica reports that 80 percent of AI project time is spent aggregating, cleaning, labeling, and augmenting data to be used in machine learning models. That's before any model development or AI training even begins.

And while labeling data is neither an engineering challenge nor a data science problem, data annotation can prove demanding for several reasons.

The first is the sheer amount of time it takes to prepare large volumes of raw data for labeling. It's no secret that human effort is required to create datasets, and sorting irrelevant data from the desired data is a task in and of itself.

Then there's the challenge of getting the clean data labeled efficiently and accurately. A short video could take several hours to annotate, depending on the number of object classes represented and how densely they must be labeled for the model to learn effectively.

An in-house team may not have enough dedicated personnel to process the data in a timely manner, leaving model development at a standstill until the task is complete. In some cases, the added pressure of keeping the AI pipeline moving can lead to incomplete or partially labeled data, or worse, blatant errors in the annotations.

Even in instances where existing personnel can serve as the in-house data annotation team, and they have the training and expertise to do it well, few companies have the technology infrastructure to support an AI pipeline from ingestion to algorithm securely and smoothly.

This is why organizations lacking the time for data annotation, annotation expertise, clear strategies for AI adoption, or technology infrastructure to support the training data lifecycle partner with trusted providers to build smarter AI.

To improve its retail item coverage from 91 to 98 percent, Walmart worked with a specialized data annotation partner to evaluate its data and ensure its accuracy for training Walmart systems. With more than 2.5 million items cataloged during the partnership, the Walmart team has been able to focus on model development rather than aggregating data.

How Data Annotation Providers Combine Humans and Tech

Data annotation providers have access to tools and techniques that can help expedite the annotation process and improve the accuracy of data labeling.

For starters, working day in and day out with training data means these companies see a range of scenarios where data annotation is seamless and where things could be improved. They can then pass these learnings on to their clients, helping to create effective training data strategies for AI development.

For organizations unsure of how to operationalize AI in their business, an annotation provider can serve as a trusted advisor to your machine learning team, asking the right questions at the right time, under the right circumstances.

A recent report found that for every dollar spent on third-party services, organizations spend five times more on internal data labeling. This may be due, in part, to the expense of assigning labeling tasks to data scientists and ML engineers. Still, there's also something to be said for the established platforms, workflows, and trained workforces that allow annotation service providers to work more efficiently.

Working with a trusted partner often means that the annotators assigned to your project receive training to understand the context of the data being labeled. It also means you have a dedicated technology platform for data labeling. Over time, your dedicated team of labelers can begin to specialize in your specific use case, and this expertise results in lower costs and better scalability of your AI programs.

Technology platforms that incorporate automation and reporting, such as automated QA, can also help improve labeling efficiency by helping to prevent logical inconsistencies, expedite training for data labelers, and ensure a consistent measure of annotation quality. This also helps reduce the amount of manual QA time required by clients as well as the annotation provider.
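
As a rough illustration of what an automated QA pass can look like, the sketch below checks annotations in the record format from the earlier example. The allowed label set and the specific rules (known class, non-empty box, box inside the image) are hypothetical; real annotation platforms run much broader rule sets.

```python
# Illustrative automated QA over one annotated image.
ALLOWED_LABELS = {"car", "pedestrian", "cyclist", "truck"}

def qa_check(annotation):
    """Return a list of human-readable QA errors for one annotated image."""
    errors = []
    img_w, img_h = annotation["width"], annotation["height"]
    for i, obj in enumerate(annotation["objects"]):
        x, y, w, h = obj["bbox"]
        if obj["label"] not in ALLOWED_LABELS:
            errors.append(f"object {i}: unknown label '{obj['label']}'")
        if w <= 0 or h <= 0:
            errors.append(f"object {i}: empty bounding box")
        if x < 0 or y < 0 or x + w > img_w or y + h > img_h:
            errors.append(f"object {i}: box extends outside the image")
    return errors

# Any image that fails these checks is routed back to a labeler instead of
# reaching the training set, which is what keeps manual QA time down.
```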

Few-click annotation is another example, using machine learning to increase accuracy and reduce labeling time. With few-click annotation, the time it would take a human to annotate several points can be reduced from two minutes to a few seconds. This combination of machine learning and a few clicks of human support produces a level of labeling precision previously not possible with human effort alone.
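
The sketch below shows the general idea, not any provider's actual tooling: two annotator clicks define a coarse rectangle, and an algorithm refines it into a tight object mask. Commercial few-click tools typically use learned models; OpenCV's classical GrabCut is used here only as a stand-in, and "scene.jpg" and the click coordinates are hypothetical.

```python
import cv2
import numpy as np

img = cv2.imread("scene.jpg")  # hypothetical input frame

# Two clicks from the annotator: roughly opposite corners of the object.
click_a, click_b = (410, 555), (600, 660)
x0, y0 = min(click_a[0], click_b[0]), min(click_a[1], click_b[1])
x1, y1 = max(click_a[0], click_b[0]), max(click_a[1], click_b[1])
rect = (x0, y0, x1 - x0, y1 - y0)  # (x, y, width, height)

# GrabCut refines the coarse rectangle into a per-pixel foreground mask.
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Definite and probable foreground pixels form the refined object mask,
# far tighter than the raw two-click rectangle.
object_mask = np.where(
    (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0
).astype("uint8")
cv2.imwrite("object_mask.png", object_mask)
```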

The human in the loop is not going away in the AI supply chain. However, more data annotation providers are also using pre- and post-processing technologies to support the humans training AI. In pre-processing, machine learning is used to convert raw data into clean datasets using a script. This does not replace or reduce data labeling, but it can help improve the quality of the annotations and the labeling process.
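
As a simplified, rule-based stand-in for the ML-driven pre-processing described above, the script below culls raw frames before they ever reach a labeler: it drops exact duplicates and frames too blurry to label reliably. The directory names and blur threshold are hypothetical, and a real pipeline would add further checks (resolution, exposure, near-duplicates).

```python
import hashlib
import shutil
from pathlib import Path

import cv2

RAW_DIR, CLEAN_DIR = Path("raw_frames"), Path("clean_frames")
BLUR_THRESHOLD = 100.0  # variance of the Laplacian below this is treated as blurry

CLEAN_DIR.mkdir(exist_ok=True)
seen_hashes = set()

for path in sorted(RAW_DIR.glob("*.jpg")):
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_hashes:        # exact duplicate frame
        continue
    img = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    if img is None:                  # unreadable or corrupt file
        continue
    if cv2.Laplacian(img, cv2.CV_64F).var() < BLUR_THRESHOLD:
        continue                     # too blurry to label reliably
    seen_hashes.add(digest)
    shutil.copy(path, CLEAN_DIR / path.name)  # keep only labelable frames
```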

There are no shortcuts to training AI, but a data annotation provider can help expedite the labeling process by leveraging in-house technology platforms and acting as an extension of your team to close the loop between data scientists and data labelers.
