If you've ever called on Siri or Alexa for help, or generated a self-portrait in the style of a Renaissance painter, you have interacted with deep learning, a form of artificial intelligence that extracts patterns from mountains of data to make predictions. Though deep learning and AI have become household terms, the breakthroughs in statistics that have fueled this revolution are less known. In a recent paper, Andrew Gelman, a statistics professor at Columbia, and Aki Vehtari, a computer science professor at Finland's Aalto University, published a list of the most important statistical ideas in the last 50 years.
Below, Gelman and Vehtari break down the list for those who may have snoozed through Statistics 101. Each idea can be viewed as a stand-in for an entire subfield, they say, with a few caveats. Science is incremental, and by singling out these works, they do not mean to diminish the importance of similar, related work. They have also chosen to focus on methods in statistics and machine learning, rather than on equally important breakthroughs in statistical computing, computer science, and engineering, which have provided the tools and computing power for data analysis and visualization to become everyday practical tools. Finally, they have focused on methods, while recognizing that developments in theory and methods are often motivated by specific applications.
See something important that's missing? Tweet it at @columbiascience, and Gelman and Vehtari will consider adding it to the list.
The 10 articles and books below all were published in the last 50 years and are listed in chronological order.
1. Hirotugu Akaike (1973). Information Theory and an Extension of the Maximum Likelihood Principle. Proceedings of the Second International Symposium on Information Theory.
This is the paper that introduced the term AIC (originally "An Information Criterion," now known as the Akaike Information Criterion) for evaluating a model's fit based on its estimated predictive accuracy. AIC was instantly recognized as a useful tool, and this paper was one of several published in the mid-1970s placing statistical inference within a predictive framework. We now recognize predictive validation as a fundamental principle in statistics and machine learning. Akaike was an applied statistician who, in the 1960s, tried to measure the roughness of airport runways; his applied work fed into his later theory, in the same way that Benoit Mandelbrot's early papers on taxonomy and Pareto distributions led to his later work on the mathematics of fractals.
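AIC is computed as 2k - 2 log L, where k is the number of estimated parameters and L is the maximized likelihood; lower values indicate better estimated predictive accuracy. The sketch below is a minimal illustration, assuming normally distributed data and two hypothetical candidate models (one with the mean fixed at zero, one with the mean estimated). The function names and the data are made up for the example.

```python
import math

def gaussian_loglik(data, mu, sigma2):
    """Log-likelihood of i.i.d. Normal(mu, sigma2) observations."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 * (maximized log-likelihood)."""
    return 2 * k - 2 * loglik

def aic_fixed_mean(data, mu0=0.0):
    # Hypothetical model A: mean fixed at mu0, only the variance is estimated (k = 1).
    sigma2 = sum((x - mu0) ** 2 for x in data) / len(data)  # MLE of the variance
    return aic(gaussian_loglik(data, mu0, sigma2), k=1)

def aic_free_mean(data):
    # Hypothetical model B: mean and variance are both estimated (k = 2).
    n = len(data)
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n  # MLE of the variance
    return aic(gaussian_loglik(data, mu, sigma2), k=2)

data = [2.1, 1.9, 2.4, 1.7, 2.2, 2.0]  # made-up observations centered near 2
print("model A:", round(aic_fixed_mean(data), 2))
print("model B:", round(aic_free_mean(data), 2))
```

Model B pays a penalty of 2 for its extra parameter, but its much higher likelihood on these data more than makes up for it, so it has the lower AIC.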
2. John Tukey (1977). Exploratory Data Analysis.
This book has been hugely influential and is a fun read that can be digested in one sitting. Traditionally, data visualization and exploration were considered low-grade aspects of practical statistics; the glamour was in fitting models, proving theorems, and developing the theoretical properties of statistical procedures under various mathematical assumptions or constraints. Tukey flipped this notion on its head. He wrote about statistical tools not for confirming what we already knew (or thought we knew), and not for rejecting hypotheses that we never believed (or never should have believed), but for discovering new and unexpected insights from data. His work motivated advances in network analysis, software, and theoretical perspectives that integrate confirmation, criticism, and discovery.
3. Grace Wahba (1978). Improper Priors, Spline Smoothing and the Problem of Guarding Against Model Errors in Regression. Journal of the Royal Statistical Society.
Spline smoothing is an approach for fitting nonparametric curves. Another of Wahba's papers from this period is called "An automatic French curve," referring to a class of algorithms that can fit arbitrary smooth curves through data without overfitting to noise or outliers. The idea may seem obvious now, but it was a major step forward in an era when the starting points for curve fitting were polynomials, exponentials, and other fixed forms. In addition to the direct applicability of splines, this paper was important theoretically: it served as a foundation for later work in nonparametric Bayesian inference by unifying ideas of regularization of high-dimensional models.
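A smoothing spline chooses the curve f that minimizes the sum of squared residuals plus lambda times the integrated squared second derivative of f, so lambda trades fidelity to the data against roughness. The sketch below is not Wahba's spline method: it swaps in a simpler discrete analogue (a Whittaker-style smoother, whose penalty is the sum of squared second differences of the fitted values at equally spaced points) to show the same trade-off in pure Python. The function names, the data, and the value of lambda are illustrative assumptions.

```python
import math

def second_difference_penalty(n):
    """Return D^T D, where D is the (n-2) x n second-difference matrix (rows 1, -2, 1)."""
    P = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        row = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for i, ci in row.items():
            for j, cj in row.items():
                P[i][j] += ci * cj
    return P

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def whittaker_smooth(y, lam):
    """Minimize sum (y_i - z_i)^2 + lam * sum (second difference of z)^2.

    Setting the gradient to zero gives the linear system (I + lam * D^T D) z = y.
    """
    n = len(y)
    P = second_difference_penalty(n)
    A = [[(1.0 if i == j else 0.0) + lam * P[i][j] for j in range(n)] for i in range(n)]
    return solve(A, y)

def roughness(z):
    """Sum of squared second differences: the discrete stand-in for the integral of f''^2."""
    return sum((z[i] - 2 * z[i + 1] + z[i + 2]) ** 2 for i in range(len(z) - 2))

# Made-up data: a smooth sine wave plus an alternating "noise" pattern.
noisy = [math.sin(i / 3) + (0.3 if i % 2 else -0.3) for i in range(20)]
smooth = whittaker_smooth(noisy, lam=10.0)
print(round(roughness(noisy), 3), round(roughness(smooth), 3))
```

Setting lam to 0 returns the data unchanged, while very large values push the fit toward a straight line, because a line has zero second differences and so incurs no penalty.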
4. Bradley Efron (1979). Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics.
Bootstrapping is a method for performing statistical inference seemingly without assumptions. The data pull themselves up by their bootstraps, as it were. But you can't make inference without assumptions; what made the bootstrap so useful and influential is that the assumptions came implicitly with the computational procedure: the audaciously simple idea of resampling the data. Each time, you draw a new sample of the same size from the data, with replacement, and repeat the statistical procedure that was performed on the original data. As with many statistical methods of the past 50 years, this one became widely useful because of an explosion in computing power that allowed simulations to replace mathematical analysis.
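Here is a minimal sketch of the resampling idea, assuming a percentile interval (one of several ways to build a bootstrap interval) for the median of a small made-up sample. The function name and its defaults are illustrative, not taken from Efron's paper.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data).

    Each replicate resamples the observations with replacement (same size as the
    original sample) and recomputes the statistic on that resample.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = replicates[int((alpha / 2) * n_boot)]
    hi = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Made-up sample with a long right tail, where a textbook formula for the median's
# uncertainty is awkward to apply.
sample = [1.2, 0.8, 3.5, 2.2, 1.9, 7.4, 0.6, 2.8, 1.1, 4.0, 2.5, 1.7]
print(statistics.median(sample), bootstrap_percentile_ci(sample))
```

The same function works for any statistic passed as `stat` (for example `statistics.mean`), which is the practical appeal: the computation replaces a separate mathematical derivation for each statistic.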
5. Alan Gelfand and Adrian Smith (1990). Sampling-based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association.
Another way that fast computing has revolutionized statistics and machine learning is through open-ended Bayesian models. Traditional statistical models are static: fit distribution A to data of type B. But modern statistical modeling has a more Tinkertoy quality that lets you flexibly solve problems as they arise by calling on libraries of distributions and transformations. We just need computational tools to fit these snapped-together models. In their influential paper, Gelfand and Smith did not develop any new tools; they demonstrated how Gibbs sampling could be used to fit a large class of statistical models. In recent decades, the Gibbs sampler has been replaced by Hamiltonian Monte Carlo, particle filtering, variational Bayes, and more elaborate algorithms, but the general principle of modular model-building has remained.
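As a small illustration of the idea Gelfand and Smith popularized, here is a Gibbs sampler for a normal model with unknown mean mu and precision tau (inverse variance). The semi-conjugate priors (a normal prior on mu, a gamma prior on tau) and all default values are assumptions chosen so that each full conditional distribution has a standard form. The sampler alternates draws from those two conditionals, and the retained draws approximate the joint posterior, including the marginal density of mu.

```python
import math
import random
import statistics

def gibbs_normal(y, n_iter=6000, burn_in=1000, m0=0.0, p0=1e-4, a0=1e-3, b0=1e-3, seed=1):
    """Gibbs sampler for y_i ~ Normal(mu, 1/tau).

    Priors (assumed for illustration): mu ~ Normal(m0, 1/p0), tau ~ Gamma(a0, rate=b0).
    Returns the post-burn-in draws of (mu, tau).
    """
    rng = random.Random(seed)
    n, sum_y = len(y), sum(y)
    mu, tau = sum_y / n, 1.0  # starting values
    draws = []
    for it in range(n_iter):
        # Draw mu | tau, y: normal, combining prior precision p0 with data precision n * tau.
        prec = p0 + n * tau
        mu = rng.gauss((p0 * m0 + tau * sum_y) / prec, 1.0 / math.sqrt(prec))
        # Draw tau | mu, y: gamma; random.gammavariate takes shape and *scale* (= 1 / rate).
        rate = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
        tau = rng.gammavariate(a0 + n / 2, 1.0 / rate)
        if it >= burn_in:
            draws.append((mu, tau))
    return draws

# Made-up data: 50 draws from a normal distribution with mean 5 and standard deviation 2.
data_rng = random.Random(42)
y = [data_rng.gauss(5.0, 2.0) for _ in range(50)]
draws = gibbs_normal(y)
mu_draws = [m for m, _ in draws]
print("posterior mean of mu:", round(statistics.mean(mu_draws), 3))
print("sample mean:", round(statistics.mean(y), 3))
```

With priors this vague, the posterior mean of mu lands close to the sample mean; the point of the sampler is that the same alternating scheme still works when the model is snapped together from more pieces and no closed-form answer exists.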
6. Guido Imbens and Joshua Angrist (1994). Identification and Estimation of Local Average Treatment Effects. Econometrica.
Causal inference is central to any problem in which the question isn't just a description (How have things been?) or a prediction (What will happen next?), but a counterfactual (If we do X, what would happen to Y?). Causal methods have evolved with the rest of statistics and machine learning through exploration, modeling, and computation. But causal reasoning has the added challenge of asking about data that are impossible to measure (you can't both do X and not-X to the same person). As a result, a key idea in this field is identifying what questions can be reliably answered from a given experiment. Imbens and Angrist are economists who wrote an influential paper on what can be estimated when causal effects vary, and their ideas form the basis for much of the later work on this topic.
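In Imbens and Angrist's setting, a randomized instrument Z (say, an encouragement to take a treatment) shifts who actually takes the treatment D. Under their assumptions (Z is randomly assigned, Z affects the outcome Y only through D, and there are no "defiers" who take the treatment only when not encouraged), the ratio of the instrument's effect on Y to its effect on D estimates the average treatment effect among compliers, the people whose treatment status the instrument changes. The simulation below is a hypothetical illustration: the population shares and the type-specific effects are made up so that effects vary across people, and the ratio recovers the complier effect rather than an average over everyone.

```python
import random

def wald_estimate(z, d, y):
    """(E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0])."""
    def arm_mean(values, arm):
        chosen = [v for zi, v in zip(z, values) if zi == arm]
        return sum(chosen) / len(chosen)
    itt_y = arm_mean(y, 1) - arm_mean(y, 0)        # effect of encouragement on the outcome
    first_stage = arm_mean(d, 1) - arm_mean(d, 0)  # effect of encouragement on uptake
    return itt_y / first_stage

def simulate(n=100_000, seed=7):
    """Hypothetical population with effects that differ by compliance type."""
    rng = random.Random(seed)
    effects = {"complier": 2.0, "always_taker": 5.0, "never_taker": -1.0}
    z, d, y = [], [], []
    for _ in range(n):
        u = rng.random()
        kind = "complier" if u < 0.5 else ("always_taker" if u < 0.7 else "never_taker")
        zi = rng.randrange(2)  # randomized encouragement
        if kind == "always_taker":
            di = 1
        elif kind == "complier":
            di = zi
        else:
            di = 0
        yi = rng.gauss(0.0, 1.0) + effects[kind] * di  # Z affects Y only through D
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y

z, d, y = simulate()
print("Wald estimate of the complier effect:", round(wald_estimate(z, d, y), 2))
```

Because always-takers and never-takers do the same thing in both arms, their outcomes cancel in the numerator, leaving only the compliers' effect scaled by their share, which the denominator then divides out.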
7. Robert Tibshirani (1996). Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society.
In regression, or predicting an outcome variable from a set of inputs or features, the challenge lies in including lots of inputs along with their interactions; the resulting estimation problem becomes statistically unstable because of the many different ways of combining these inputs to get reasonable predictions. Classical least squares or maximum likelihood estimates will be noisy and might not perform well on future data, so various methods have been developed to constrain, or regularize, the fit to gain stability. In this paper, Tibshirani introduced the lasso, a computationally efficient and now widely used approach to regularization, which has become a template for data-based regularization in more complicated models.
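The lasso adds a penalty proportional to the sum of the absolute values of the coefficients to the least squares criterion, which both shrinks estimates and sets some of them exactly to zero. Below is a minimal sketch, assuming the common objective (1/2n) * ||y - X beta||^2 + lambda * ||beta||_1 and fitting it by cyclic coordinate descent with soft-thresholding. Coordinate descent is one popular solver, not necessarily the one used in the original paper, and the tiny data set is made up.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1 / (2n)) * ||y - X beta||^2 + lam * ||beta||_1 by cyclic coordinate descent.

    X is a list of n rows with p features each; every column is assumed to be nonzero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_scale = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = list(y)  # y - X beta, with beta starting at zero
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j's own contribution.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_scale[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

# Made-up data with two orthogonal features: a strong effect (3.0) and a weak one (0.1).
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
X = [[a, b] for a, b in zip(x1, x2)]
y = [3.0 * a + 0.1 * b for a, b in zip(x1, x2)]
print(lasso_coordinate_descent(X, y, lam=0.0))  # no penalty: ordinary least squares
print(lasso_coordinate_descent(X, y, lam=0.5))  # penalized fit
```

With lam set to 0.5, the strong coefficient is shrunk from 3.0 to 2.5 and the weak one is set exactly to zero, which is the combined shrinkage-and-selection behavior the paper's title refers to.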
8. Leland Wilkinson (1999). The Grammar of Graphics.
In this book, Wilkinson, a statistician who has worked on several influential commercial software projects including SPSS and Tableau, lays out a framework for statistical graphics that goes beyond the usual focus on pie charts versus histograms, how to draw a scatterplot, and data ink and chartjunk, to explore abstractly how data and visualizations relate. This work has influenced statistics through many pathways, most notably through ggplot2 and the tidyverse family of packages in the computing language R. It's an important step toward integrating exploratory data and model analysis into the data science workflow.
9. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio (2014). Generative Adversarial Nets. Proceedings of the International Conference on Neural Information Processing Systems.
One of machine learning's stunning achievements in recent years has been real-time decision making through prediction and inference feedback. Famous examples include self-driving cars and DeepMind's AlphaGo, which trained itself to become the best Go player on Earth. Generative adversarial networks, or GANs, are a conceptual advance that allow reinforcement learning problems to be solved automatically. They mark a step toward the longstanding goal of artificial general intelligence while also harnessing the power of parallel processing so that a program can train itself by playing millions of games against itself. At a conceptual level, GANs link prediction with generative models.
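For reference, the 2014 paper frames GAN training as a two-player minimax game between a generator G, which maps random noise z to synthetic samples, and a discriminator D, which outputs the probability that its input came from the real data rather than from G:

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is a predictor (a classifier of real versus generated samples), and the generator is a generative model trained to make that predictor fail, which is one way to read the statement that GANs link prediction with generative models.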
10. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton (2015). Deep Learning. Nature.
Deep learning is a class of artificial neural network models that can be used to make flexible nonlinear predictions using a large number of features. Its building blocks (logistic regression, multilevel structure, and Bayesian inference) are hardly new. What makes this line of research so influential is the recognition that these models can be tuned to solve a variety of prediction problems, from consumer behavior to image analysis. As with other developments in statistics and machine learning, the tuning process was made possible only with the advent of fast parallel computing and statistical algorithms to harness this power to fit large models in real time. Conceptually, we're still catching up with the power of these methods, which is why there's so much interest in interpretable machine learning.
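To illustrate the building-block point, here is a minimal sketch of a two-layer network in which every unit is a logistic regression (a weighted sum of inputs passed through the logistic function). The weights are hand-picked, hypothetical values rather than fitted ones; in practice, deep learning tunes them from data with gradient-based optimization. Stacking the units lets the network represent XOR, a nonlinear pattern that no single logistic regression on the two raw inputs can capture.

```python
import math

def logistic(t):
    """The logistic (sigmoid) function used in logistic regression."""
    return 1.0 / (1.0 + math.exp(-t))

def logistic_unit(inputs, weights, bias):
    """One logistic-regression 'neuron': weighted sum of inputs, then the logistic function."""
    return logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_network(x1, x2):
    # Hidden layer: two logistic units with hand-picked (hypothetical) weights.
    h_or = logistic_unit([x1, x2], [20.0, 20.0], -10.0)     # approximately x1 OR x2
    h_nand = logistic_unit([x1, x2], [-20.0, -20.0], 30.0)  # approximately NOT (x1 AND x2)
    # Output layer: a logistic regression on the hidden units, approximately AND.
    return logistic_unit([h_or, h_nand], [20.0, 20.0], -30.0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_network(a, b), 3))
```

Each layer on its own is an ordinary logistic regression; the nonlinearity of the overall prediction comes from feeding one layer's outputs into the next.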
Source: Top 10 Ideas in Statistics That Have Powered the AI Revolution - Columbia University