I was trying to find a good domain name for our upcoming business science website when something suddenly became clear to me. Many of us have long been confused about what data science means and how it differs from statistics, machine learning, data mining, or operations research. Adding to the confusion is the rise of the data scientist light - a new species of coders who call themselves data scientists after a few hours of Python/R training, a small project at best, and $200 spent on that training. The data scientist light is not a real data scientist, even though I believe that you can learn data science from scratch on the job, just as I did.
This introduction brings me to the ABCD's, and the arguments are further developed in my conclusion below. These four domains certainly overlap, but I believe that identifying them clarifies how the roles differ and how they collaborate.
- Analytics Science. Deals with modern statistical modeling, predictive modeling, model-free (data-driven) statistics, root cause analysis, defining and selecting metrics, and traditional techniques such as clustering, SVM, linear regression, K-NN (whether you call it machine learning, AI, statistics or data science). Analytics scientists are true geeks with generic knowledge applicable to many domains. They may work on small or big data. Analytics science, unlike data science, is well documented in college textbooks.
- Business Science. Deals with principles, both theoretical and applied, where domain expertise and a deep cross-department understanding of the business are critical. The purpose is to leverage analytics to deliver added value or increased profits. Business scientists might spend little time coding, unlike the three other categories. Examples of business science applications can be found in this article, and also in this article. It may overlap with BI, Six Sigma and operations research, so it can definitely involve a great deal of statistical modeling, modern or not.
- Computer Science. Deals with architecture (including real-time, distributed and cross-platform architectures such as IoT), algorithm design and refinement, platform design, Internet and communications protocols, data standards, systems engineering, software engineering and prototyping.
- Data Science. Deals with data identification, collection, cleaning, summarizing, and insights extraction - even dashboards and visualizations that help with the executive decision process. Also includes advanced algorithms for big data, sensor data (IoT), black-box analytics, batch-mode analytics, automation of analytics processes, APIs, and analytics-driven systems based on machine-to-machine communications (for instance, automated bidding or fraud monitoring). Typically, simple black-box, machine-controlled techniques are more difficult to design than complex man-controlled analyses, because they must be made very robust, as opposed to very accurate. Many data scientists also know quite a bit of analytics science, especially modern principles (typically not published in college textbooks) to deal with big, unstructured, fast-flowing data. While in some ways computer scientists make data alive, data scientists take it from there and make it intelligent.
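To illustrate the robustness-versus-accuracy trade-off mentioned under Data Science, here is a minimal sketch using only Python's standard library. The readings, the faulty value, and the helper name `location_estimates` are all made up for illustration; this is not a method taken from any of the systems described here. It simply compares how far the mean and the median move when one bad record slips into an automated batch.

```python
import statistics

def location_estimates(values):
    """Return (mean, median) for a batch of readings (hypothetical helper)."""
    return statistics.mean(values), statistics.median(values)

# Illustrative data: five plausible readings, then the same batch
# with one faulty value appended (e.g. a sensor glitch).
clean = [10.1, 9.9, 10.0, 10.2, 9.8]
corrupted = clean + [1000.0]

clean_mean, clean_median = location_estimates(clean)
bad_mean, bad_median = location_estimates(corrupted)

# How far each estimate moved because of the single bad record.
mean_shift = abs(bad_mean - clean_mean)
median_shift = abs(bad_median - clean_median)

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

On clean data the mean is often the more efficient estimate, but in a black-box system where nobody inspects each batch, the median's small shift is arguably the more valuable property.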
I finally decided to call myself a business scientist, as my experience is more and more aligned with this domain (being an entrepreneur), though, like many of us here, I have significant knowledge and expertise in all four domains, especially in data science and analytics science. My motivation for calling myself a business scientist is also partly to avoid being confused with a data scientist light. This erroneous label is sometimes applied to us (real) data scientists by a minority of vocal analytics scientists, and I believe that we need to dispel this myth. Part of the reason for it, I believe, is that math-free solutions that also trade accuracy for robustness (in order to fit in black-box systems or be usable by the layman) are not respected by some traditional statisticians, who erroneously believe that automation, or removing statistical jargon and mathematical background, is not possible. Maybe because it could jeopardize their jobs?
In the end, I want to make data science accessible to everyone, not just to an elite of initiated, change-averse professionals. It requires a new, unified, simple, efficient, math-free or math-light (but not data science light) approach to analytics problems and solutions, as well as algorithmic ingenuity. This is feasible, but more difficult than producing extremely complicated statistical models - which is what I was doing earlier in my career.
About the author: Vincent Granville worked for Visa, eBay, Microsoft, Wells Fargo, NBC, a few startups and various organizations to solve business problems, boost ROI, and develop ROI attribution models, creating new techniques and systems to leverage modern big data and deliver added value. Vincent owns several patents, has published in top scientific journals, raised VC funding, and founded a few startups. Vincent also manages his own self-funded research lab, focusing on simplifying, unifying, modernizing, automating, scaling, and dramatically optimizing statistical techniques. Vincent's focus is on producing robust, automatable tools, APIs and algorithms that can be used and understood by the layman, and at the same time adapted to modern big, fast-flowing, unstructured data. Vincent is a post-graduate from Cambridge University.