October 21, 2020
What is Data Science, and what does a Data Scientist do?
Data is growing faster than ever before and is now considered the lifeblood of almost all businesses. It has been said that more than 90% of the world’s data has been created in the last few years alone. This availability of vast amounts of data has led companies in almost every industry to focus on exploiting data for competitive advantage. This is where Data Science comes into play.
What Is Data Science?
The term ‘Data Science’ refers to a set of fundamental techniques that support and guide the extraction of insights and information from data. In other words, the goal of Data Science is to discover hidden patterns in raw data.
A trip into the history of Data Science
- 1962 – John W. Tukey wrote about the effect of modern-day electronic computing on data analysis as an empirical science in ‘The Future Of Data Analysis’
- 1974 – Peter Naur wrote ‘The Concise Survey Of Computer Methods’, using the term ‘Data Science’ repeatedly
- 1977 – The International Association for Statistical Computing (IASC) was formed
- 1989 – The Knowledge Discovery in Databases (KDD) workshop, which would mature into the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, was held for the first time
- 1999 – Jacob Zahavi showed the need for new tools to handle the vast amounts of data in businesses
- 2001 – SaaS was created and William S. Cleveland laid out plans for training data scientists to meet the needs of the future
- 2006 – Hadoop 0.1.0, an open-source framework for distributed storage and processing of large datasets, was released
- 2008 – ‘Data Scientist’ became a buzzword
- 2015 – Google’s speech recognition in Google Voice experienced a dramatic performance jump of 49% using Deep Learning techniques
Data Science shouldn’t be confused with Business Analytics. It’s true that both involve extracting insights from data to enhance business performance, but it is important to know how the two fields differ. Simply put, Data Science is the study of data using statistics, algorithms, and technology to find solutions and predict outcomes for a business problem, whereas Business Analytics is the study of business data to provide data-driven recommendations to the company. Let’s look further into the difference between the two.
|Data Science||Business Analytics|
|The science of data: study using statistics, algorithms & technology||The study of business data|
|Finds solutions and predicts outcomes for a business problem||Provides data-driven recommendations to the company|
|Deals with both structured & unstructured data||Deals mostly with structured data|
|Studies trends and various patterns in data||Studies various business problems|
Life cycle of Data Science
Every data-driven business problem is unique and may take several months to solve, but they all follow a similar workflow. From understanding the business problem to data mining, analysis, and presenting the results, each step should be given proper attention, time, and effort. The following reflects the stages of a Data Science life cycle. It is important to know that this process is never linear; it is an iterative process in which stages are repeated to get the best possible results for the client.
- Understanding the business problem – This is a vital step since understanding the problem very well will make the objectives of the analysis clear and will eventually lead to the final goal.
- Data mining – Here, relevant data is collected or scraped for analysis.
- Data cleaning and exploration – The data collected is usually messy, so it must be cleaned by treating missing values, removing irrelevant data, and so on. Before building the model, a proper descriptive analysis is carried out to explore and visualize the data.
- Model building and evaluation – Machine Learning (ML) models are trained and their performance is evaluated on unseen data.
- Model deployment – The final model is deployed in the desired format and channel.
Figure I: The life cycle of Data Science
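To make the cleaning, model building, and evaluation stages more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the ad-spend and sales records are made-up numbers, the model is a simple least-squares line, and the helper names (`clean`, `fit_line`, `mean_absolute_error`) are hypothetical. A real project would typically use dedicated libraries and far more data.

```python
from statistics import mean

# Hypothetical raw records (made-up numbers): monthly ad spend and sales.
raw_records = [
    {"ad_spend": 10.0, "sales": 24.0},
    {"ad_spend": 20.0, "sales": 46.0},
    {"ad_spend": None, "sales": 50.0},   # missing value
    {"ad_spend": 30.0, "sales": 65.0},
    {"ad_spend": 40.0, "sales": None},   # missing value
    {"ad_spend": 50.0, "sales": 105.0},
    {"ad_spend": 60.0, "sales": 125.0},
]


def clean(records):
    """Data cleaning: drop every record that contains a missing value."""
    return [r for r in records if None not in r.values()]


def fit_line(xs, ys):
    """Model building: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


def mean_absolute_error(model, xs, ys):
    """Model evaluation: average absolute gap between prediction and truth."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))


cleaned = clean(raw_records)

# Exploration: a quick descriptive statistic before modelling.
print("Average sales:", mean(r["sales"] for r in cleaned))

# Hold back the last two records as unseen data for evaluation.
train, test = cleaned[:3], cleaned[3:]

model = fit_line([r["ad_spend"] for r in train], [r["sales"] for r in train])
mae = mean_absolute_error(
    model, [r["ad_spend"] for r in test], [r["sales"] for r in test]
)
print("Model (slope, intercept):", model)
print("Mean absolute error on unseen data:", mae)
```

Dropping incomplete records is only one way to treat missing values; filling them with an average is another common choice, and which one fits depends on the business problem.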
Role of ML & AI in the field of Data Science
Obtaining solutions to business problems with a considerable level of accuracy is challenging. To lessen the burden, Machine Learning, which is a subset of Artificial Intelligence, is used. ML provides the ability to automatically learn and improve from data, identify patterns, and make decisions with minimal human intervention. When working with datasets, various Machine Learning algorithms are used to learn from the extracted data and then forecast future trends.
Using ML in Data Science makes it possible to obtain high-value predictions that can guide better decisions and smart actions in real time, without human intervention.
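As a small illustration of what "identifying patterns" and "making decisions" can look like, the sketch below groups customers by spend using k-means clustering. The choice of k-means is an assumption made for this example, since the article does not name a specific algorithm; the spend figures are made up, and the function names `k_means_1d` and `assign` are hypothetical.

```python
from statistics import mean


def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k clusters by repeatedly re-averaging the centres."""
    if k < 2:
        raise ValueError("this sketch expects at least two clusters")
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if empty).
        centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


def assign(value, centres):
    """Decision step: index of the learned centre closest to a new value."""
    return min(range(len(centres)), key=lambda i: abs(value - centres[i]))


# Hypothetical monthly spend per customer (made-up numbers).
monthly_spend = [12, 15, 14, 90, 95, 13, 88, 92]

centres, clusters = k_means_1d(monthly_spend, k=2)
print("Learned centres:", centres)
print("Clusters:", clusters)

# A new customer is placed in a group without anyone writing a rule by hand.
print("Group for a customer spending 80:", assign(80, centres))
```

No one tells the algorithm where the boundary between low and high spenders lies; it is learned from the data, which is the sense in which ML needs minimal human intervention.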
What does a Data Scientist do?
A Data Scientist is seen as someone who is curious, creative, and systematic at work, willing to experiment, and able to communicate ideas easily. These traits are an advantage, as Data Scientists work closely with business stakeholders to understand their expectations. They then look for ways to draw answers to business questions out of data.
A Data Scientist is not the same as a Business Analyst. They differ significantly. Business Analysts are responsible for understanding business requirements and developing actionable insights, whereas a Data Scientist is the person responsible for analyzing, preparing, formatting, and maintaining information. The following table summarizes the required set of skills for both roles.
|Skill||Data Scientist||Business Analyst|
|Machine Learning||Very Important||Not that Important|
|Data Wrangling||Very Important||Not that Important|
|Statistics||Very Important||Somewhat Important|
|Data Intuition||Very Important||Somewhat Important|
|Data Visualization & Communication||Very Important||Very Important|
|Programming Tools||Very Important||Very Important|
Figure II: Data Scientist Vs Business Analyst
As seen in the table above, the role of a Data Scientist requires a combination of technical, mathematical, and storytelling skills. Going one level deeper, a Data Scientist should be equipped with knowledge of:
- Statistics & Machine Learning
- Coding languages such as R or Python
- Databases such as MySQL
- Data Visualization techniques etc.
For a company planning to capitalize on big data, having a Data Scientist is crucial, since it is the Data Scientist who helps decision makers shift from ad hoc analysis to an ongoing conversation with data.
Big data shows no sign of slowing, and neither does the importance of the Data Scientist. An article by Forbes stated that by the year 2020, about 1.7 megabytes of new information would be created every second for every human being on the planet. This is a massive number! This very fact emphasizes the importance of knowing your data and what information can be extracted from it. After all, this is where our future lies.