Artificial intelligence is becoming more ubiquitous and necessary these days. From fraud prevention and real-time anomaly detection to customer churn prediction, enterprise customers are finding new applications of machine learning (ML) every day. What lies under the hood of ML, how does this technology make predictions, and which secret ingredient makes the AI magic work?
In the data science community, the focus is typically on algorithm selection and model training, and indeed those are important, but the most critical piece in the AI/ML workflow is not how we select or tune algorithms but what we input to AI/ML, i.e., feature engineering.
Feature engineering is the holy grail of data science and the most critical step that determines the quality of AI/ML outcomes. Irrespective of the algorithm used, feature engineering drives model performance, governs the ability of machine learning to generate meaningful insights, and ultimately helps solve business problems.
Feature engineering is the process of applying domain knowledge to extract analytical representations from raw data, making it ready for machine learning. It is the first step in developing a machine learning model for prediction.
Feature engineering involves the application of business knowledge, mathematics, and statistics to transform data into a format that can be directly consumed by machine learning models. It starts from many tables spread across disparate databases that are then joined, aggregated, and combined into a single flat table using statistical transformations and/or relational operations.
For example, predicting which customers are likely to churn in any given quarter means identifying the customers who have the highest probability of no longer doing business with the company. How do you go about making such a prediction? We make predictions about churn by looking at the underlying causes. The process is based on analyzing customer behavior and then creating hypotheses. For example, customer A contacted customer support five times in the last month, implying that customer A has complaints and is likely to churn. In another scenario, customer A's product usage might have dropped by 30% in the previous two months, again implying that customer A has a high probability of churning. Looking at historical behavior, extracting hypothesis patterns, and testing those hypotheses is the process of feature engineering.
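As a rough illustration, the two hypotheses about customer A can each be expressed as a numeric feature. The sketch below is a minimal example using only the Python standard library; the data, the dates, the 30-day window, and the function names `contacts_in_window` and `usage_change` are all assumptions made for illustration, not part of any particular product.

```python
from datetime import date, timedelta

# Hypothetical raw records for customer A; every value here is illustrative.
support_contacts = [
    date(2024, 3, 3), date(2024, 3, 9), date(2024, 3, 14),
    date(2024, 3, 21), date(2024, 3, 28), date(2024, 1, 15),
]
monthly_usage = {"2024-01": 200.0, "2024-03": 140.0}  # usage units per month


def contacts_in_window(contact_dates, as_of, days=30):
    """Count support contacts in the `days` days ending on `as_of` (inclusive)."""
    start = as_of - timedelta(days=days)
    return sum(1 for d in contact_dates if start < d <= as_of)


def usage_change(usage, earlier_month, later_month):
    """Relative change in usage between two months (-0.30 means a 30% drop)."""
    before, after = usage[earlier_month], usage[later_month]
    return (after - before) / before


as_of = date(2024, 3, 31)  # assumed prediction date
features = {
    "support_contacts_30d": contacts_in_window(support_contacts, as_of),
    "usage_change_2m": usage_change(monthly_usage, "2024-01", "2024-03"),
}
print(features)
```

Each entry in `features` encodes one hypothesis; whether it actually helps predict churn still has to be tested against historical outcomes.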
Feature engineering is about extracting the business hypothesis from historical data. A business problem that involves predictions such as customer churn is a classification problem.
There are several ML algorithms that you can use, such as classical logistic regression, decision trees, support vector machines, boosting, and neural networks. Although all these algorithms require a single flat matrix as input, raw business data is stored in disparate tables (e.g., transactional, temporal, geo-locational) with complex relationships.
We may join two tables first and then perform temporal aggregation on the joined table to extract temporal user behavior patterns. Practical feature engineering (FE) is far more complicated than simple transformation exercises such as one-hot encoding (transforming categorical values into binary indicators so that ML algorithms can use them). To implement FE, we write hundreds or even thousands of SQL-like queries and perform a lot of data manipulation as well as a multitude of statistical transformations.
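To make the join-then-aggregate pattern concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `transactions` tables, their columns, the 30-day window, and the cutoff date are hypothetical; a real project would run many such queries against its own schema. The last step one-hot encodes the categorical `plan` column to produce a flat table.

```python
import sqlite3

# Hypothetical schema and data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, plan TEXT);
CREATE TABLE transactions (customer_id INTEGER, tx_date TEXT, amount REAL);
INSERT INTO customers VALUES (1, 'basic'), (2, 'premium');
INSERT INTO transactions VALUES
    (1, '2024-01-10', 20.0), (1, '2024-03-05', 30.0), (1, '2024-03-20', 10.0),
    (2, '2024-02-01', 99.0);
""")

# Join the two tables, then aggregate each customer's transactions
# over the 30 days ending on the cutoff date (temporal aggregation).
query = """
SELECT c.customer_id,
       c.plan,
       COUNT(t.amount)            AS tx_count_30d,
       COALESCE(SUM(t.amount), 0) AS tx_amount_30d
FROM customers AS c
LEFT JOIN transactions AS t
       ON t.customer_id = c.customer_id
      AND t.tx_date >  date(:cutoff, '-30 days')
      AND t.tx_date <= :cutoff
GROUP BY c.customer_id, c.plan
ORDER BY c.customer_id
"""
rows = conn.execute(query, {"cutoff": "2024-03-31"}).fetchall()


def one_hot(value, categories, prefix):
    """Turn one categorical value into a dict of 0/1 indicator columns."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


plans = sorted({plan for _, plan, _, _ in rows})
flat_table = [
    {"customer_id": cid, "tx_count_30d": n, "tx_amount_30d": total,
     **one_hot(plan, plans, "plan")}
    for cid, plan, n, total in rows
]
for record in flat_table:
    print(record)
```

The `LEFT JOIN` keeps customers with no recent transactions in the result, so every customer still gets a row in the flat table.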
In the machine learning context, if we know the historical pattern, we can create a hypothesis. Based on the hypothesis, we can predict a likely outcome, such as which customers are likely to churn in a given time period. FE is all about finding the optimal combination of hypotheses.
Feature engineering is critical because if we provide wrong hypotheses as input, ML cannot make accurate predictions. The quality of the provided hypotheses is vital for the success of an ML model. Feature quality is critically important for both accuracy and interpretability.
Feature engineering is the most iterative, time-consuming, and resource-intensive part of the process, and it involves interdisciplinary expertise. It requires technical knowledge but, more importantly, domain knowledge.
The data science team builds features by working with domain experts, testing hypotheses, building and evaluating ML models, and repeating the process until the results become acceptable to the business. Because in-depth domain knowledge is required to generate high-quality features, feature engineering is widely considered a black art of experts and impossible to automate, even though a team often spends 80% of its effort on developing a high-quality feature table from raw business data.
Feature engineering automation has vast potential to change the traditional data science process. It lowers skill barriers well beyond what ML automation alone achieves, eliminates hundreds or even thousands of manually crafted SQL queries, and speeds up data science projects even without full domain knowledge. It also augments our data insights and surfaces unknown unknowns by exploring millions of feature hypotheses in just hours.
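One way to picture how large numbers of feature hypotheses arise is simple enumeration: every combination of source column, look-back window, and aggregation function yields one candidate feature. The sketch below is a toy illustration under that assumption and is not a description of how any specific AutoML product works; the event log, the column names, the windows, and `generate_features` are all hypothetical.

```python
from itertools import product
from statistics import mean

# Hypothetical event log: (customer_id, days_before_cutoff, column, value).
events = [
    (1, 3, "amount", 30.0), (1, 40, "amount", 20.0), (1, 10, "usage", 5.0),
    (2, 5, "amount", 99.0), (2, 70, "usage", 8.0),
]

columns = ["amount", "usage"]
windows = [30, 90]  # look-back windows in days
aggregations = {
    "count": len,
    "sum": sum,
    "mean": lambda values: mean(values) if values else 0.0,
}


def generate_features(events, customer_id):
    """Build one candidate feature per (column, window, aggregation) combination."""
    features = {}
    for column, window, (agg_name, agg) in product(
        columns, windows, aggregations.items()
    ):
        values = [
            value for cid, age, col, value in events
            if cid == customer_id and col == column and age <= window
        ]
        features[f"{column}_{agg_name}_{window}d"] = agg(values)
    return features


candidates = generate_features(events, customer_id=1)
print(len(candidates), "candidate features")
```

With 2 columns, 2 windows, and 3 aggregations this yields 12 candidates; the count grows multiplicatively as tables, columns, windows, and transformations are added, which is why manual exploration does not scale.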
Recently, ML automation (a.k.a. AutoML) has received considerable attention. AutoML is tackling critical challenges that organizations struggle with: the sheer length of AI and ML projects, which usually take months to complete, and the severe lack of qualified talent available to handle them.
While current AutoML products have undoubtedly made significant inroads in accelerating the AI and machine learning process, they fail to address the most significant step: preparing the input to machine learning from raw business data, in other words, feature engineering.
To create a genuine shift in how modern organizations leverage AI and machine learning, the full cycle of data science development must involve automation. If the problems at the heart of data science automation are due to a lack of data scientists, poor understanding of ML among business users, and difficulties in migrating to production environments, then these are the challenges that AutoML must also resolve.
AutoML 2.0, which automates data and feature engineering, is emerging to combine FE automation and ML automation into a single pipeline and one-stop shop. With AutoML 2.0, the full cycle from raw data through data and feature engineering to ML model development takes days, not months, and a team can deliver 10x more projects.
Feature engineering helps reveal the hidden patterns in the data and powers predictive analytics based on machine learning. Algorithms need high-quality input data containing relevant business hypotheses and historical patterns, and feature engineering provides this data. However, it is the most human-dependent and time-consuming part of the AI/ML workflow.
AutoML 2.0, which streamlines feature engineering automation and ML automation, is a new technology breakthrough that accelerates and simplifies AI/ML for enterprises. It enables more people, such as BI engineers or data engineers, to execute AI/ML projects and makes enterprise AI/ML more scalable and agile.
About the author: Ryohei Fujimaki, Ph.D., is the founder and CEO of dotData. Prior to founding dotData, he was the youngest research fellow ever in NEC Corporation's 119-year history, a title awarded to only six individuals among 1000+ researchers. During his tenure at NEC, Ryohei was heavily involved in developing many cutting-edge data science solutions with NEC's global business clients, and was instrumental in the successful delivery of several high-profile analytical solutions that are now widely used in industry. Ryohei received his Ph.D. degree from the University of Tokyo in the field of machine learning and artificial intelligence.
Read the original:
What is Feature Engineering and Why Does It Need To Be Automated? - Datanami