If you’re a business leader with access to a technology budget, a handful of phrases have become impossible to ignore over the past decade. You have no choice but to act like you understand what they mean.
Here’s a non-exhaustive list, in roughly the order in which they blew up:
- Big data;
- Predictive analytics;
- Data science;
- Machine learning;
- Deep learning;
- Artificial intelligence.
At every conference, you’ll find some industry leaders declaring “X is dead, Y is the future”, where “X” and “Y” are both items from the list above, and where, in reality, X is not dead and Y is not the future.
The newest terms thrown into the fray are deep learning and artificial intelligence, and they are bandied about with reckless abandon. What are they? What’s the difference between them? And what’s the difference between both of them and machine learning?
If these sound like questions you’re asking yourself, here’s what you need to know to understand what’s going on.
Big data, predictive analytics, data science, and machine learning
All of these terms appear to have peaked in popularity. They started to blow up in roughly 2006, 2008, 2010, and 2012, respectively, and have become so oversaturated in marketing material that people have started to move away from them. We’ll discuss each of them in more detail in future articles.
The hot terms these days are machine learning, deep learning, and artificial intelligence, and I’d venture to say that machine learning is on its way out. Nonetheless, keep reading the Voxpow blog to understand what each of them means.
Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex for traditional data-processing application software. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes, or columns) may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data sourcing. Big data was initially associated with three key concepts: volume, variety, and velocity. When we handle big data, we may not sample but simply observe and track what happens. Therefore, big data often means data sets whose size exceeds the capacity of traditional software to process within an acceptable time and at acceptable value.
Predictive analytics is an area of statistics that deals with extracting information from data and predicting trends and behavior patterns. Its extension, predictive web analytics, calculates statistical probabilities of future events online. Predictive analytics techniques include data modeling, machine learning, AI, deep learning algorithms, and data mining. Often, the unknown event of interest is in the future, but predictive analytics can be applied to any type of unknown, whether in the past, present, or future: for example, identifying suspects after a crime has been committed, or detecting credit card fraud after it has occurred. The core of predictive analytics relies on capturing relationships between explanatory variables and the predicted variables from past occurrences, and exploiting them to predict the unknown outcome. However, it is essential to note that the accuracy and usability of results will depend significantly on the level of data analysis and the quality of assumptions.
Predictive analytics is often defined as predicting at a more detailed level of granularity, i.e., generating predictive scores (probabilities) for each individual organizational element. This distinguishes it from forecasting. For example: "Predictive analytics—technology that learns from experience (data) to predict individuals' future behavior to drive better decisions." In future industrial systems, the value of predictive analytics will be to predict and prevent potential issues, achieving near-zero breakdown, and to be further integrated into prescriptive analytics for decision optimization.
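To make the core idea concrete—learning the relationship between explanatory variables and a predicted variable from past occurrences, then exploiting it on a new case—here is a minimal sketch using ordinary least squares on a single predictor. The numbers are invented purely for illustration:

```python
# Sketch of the predictive-analytics core idea: learn a relationship
# from past occurrences, then apply it to an unknown case.
# All data below is made up for illustration only.

# Past occurrences: (explanatory variable x, predicted variable y)
history = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]

n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n

# Ordinary least squares for one predictor: y ≈ slope * x + intercept
slope = (sum((x - mean_x) * (y - mean_y) for x, y in history)
         / sum((x - mean_x) ** 2 for x, _ in history))
intercept = mean_y - slope * mean_x

def predict(x):
    """Score a new case using the relationship learned from the past."""
    return slope * x + intercept

print(round(predict(5), 2))  # prediction for an unseen value of x
```

Real predictive models use many explanatory variables and far richer techniques (the data mining and machine learning methods mentioned above), but the pattern is the same: fit on the past, score the unknown.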
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning, and big data.
Data science is a "concept to unify statistics, data analysis, machine learning, domain knowledge, and their related methods" in order to "understand and analyze actual phenomena" with data. It draws techniques and theories from many fields within mathematics, statistics, computer science, domain knowledge, and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.
Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data," in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in various applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory, and application domains to machine learning. Data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics.
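The email-filtering example above is a good way to see what "learning from training data without being explicitly programmed" means: instead of hand-writing spam rules, the program builds a model from labelled messages. Here is a toy naive-Bayes-style word classifier; the messages and labels are invented for the example, and real filters are far more sophisticated:

```python
# Toy ML sketch: learn a spam/ham model from labelled "training data"
# rather than hand-coding filtering rules. Data is invented for the example.
from collections import Counter
import math

training_data = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

# "Training": build word counts per label from the sample data.
word_counts = {"spam": Counter(), "ham": Counter()}
for text, label in training_data:
    word_counts[label].update(text.split())

def classify(text):
    """Pick the label whose training-data words best explain the message."""
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        vocab = len(counts)
        # Log-probabilities with add-one smoothing to avoid zero counts.
        scores[label] = sum(
            math.log((counts[w] + 1) / (total + vocab))
            for w in text.split()
        )
    return max(scores, key=scores.get)

print(classify("win a free prize"))  # learned behavior, not a coded rule
print(classify("agenda for lunch"))
```

Notice that nothing in `classify` mentions spam-specific words; everything it "knows" came from the training data, which is exactly the distinction the definition draws.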