Senuri Gunaratne
October 21, 2020

What is Data Science, and what does a Data Scientist do?

Data is growing faster than ever before and is now considered the lifeblood of almost every business. It has been said that more than 90% of the world’s data was created in the last few years alone. This availability of vast amounts of data has led companies in almost every industry to focus on exploiting data for competitive advantage. This is where Data Science comes into play.

What Is Data Science?   

The term ‘Data Science’ refers to a set of fundamental techniques that support and guide the extraction of insights and information from data. In other words, the goal of Data Science is to discover hidden patterns in raw data.

A trip into the history of Data Science  

  • 1962 – John W. Tukey wrote about the effect of modern-day electronic computing on data analysis as an empirical science in ‘The Future of Data Analysis’
  • 1974 – Peter Naur wrote ‘Concise Survey of Computer Methods’, using the term ‘Data Science’ repeatedly
  • 1977 – The International Association for Statistical Computing (IASC) was formed
  • 1989 – The first Knowledge Discovery in Databases (KDD) workshop was organized; it would later mature into the ACM SIGKDD Conference on Knowledge Discovery and Data Mining
  • 1999 – Jacob Zahavi pointed out the need for new tools to handle the vast amounts of data in businesses
  • 2001 – SaaS was created, and William S. Cleveland laid out plans for training data scientists to meet the needs of the future
  • 2006 – Hadoop 0.1.0, an open-source framework for distributed storage and processing of large datasets, was released
  • 2008 – ‘Data Scientist’ became a buzzword
  • 2015 – Google’s speech recognition service, Google Voice, experienced a dramatic performance jump of 49% using Deep Learning techniques

Data Science Vs Business Analytics

Data Science shouldn’t be confused with Business Analytics. Both involve extracting insights from data to enhance business performance, but it is important to know how the two fields differ. Simply put, Data Science is the study of data using statistics, algorithms, and technology to find solutions and predict outcomes for a business problem, whereas Business Analytics is the study of business data to provide data-driven recommendations to the company. Let’s look further into the difference between the two.

Data Science:
  • The study of data using statistics, algorithms & technology
  • Finds solutions and predicts outcomes for a business problem
  • Deals with both structured & unstructured data
  • Studies trends and various patterns in data

Business Analytics:
  • The study of business data
  • Provides data-driven recommendations to the company
  • Deals mostly with structured data
  • Studies various business problems

Life cycle of Data Science

Every data-driven business problem is unique and may take several months to solve, but they all follow a similar workflow. From understanding the business problem to data mining, analysis, and presenting the results, each step should be given proper attention, time, and effort. The following are the stages of a Data Science life cycle. It is important to know that this process is never linear; it is an iterative process that is repeated to get the best possible results for the client.

  1. Understanding the business problem – This is a vital step, since understanding the problem well makes the objectives of the analysis clear and eventually leads to the final goal.
  2. Data mining – Here, relevant data is collected or scraped for analysis.
  3. Data cleaning and exploration – The data collected is usually messy, so it needs to be cleaned: treating missing values, removing irrelevant data, etc. Before the model is built, a proper descriptive analysis is carried out to visualize the data.
  4. Model building and evaluation – Machine Learning (ML) models are trained, and their performance is evaluated on unseen data.
  5. Model deployment – The final model is deployed in the desired format and channel.

Figure 1: The life cycle of Data Science
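
To make steps 3 and 4 a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy records, the function names, the 25% hold-out split, and the choice of a simple straight-line model. A real project would normally use dedicated libraries and far more data.

```python
import random
import statistics

# Hypothetical raw data: (advertising spend, sales) pairs; None marks a missing value.
RAW_RECORDS = [
    (1.0, 2.1), (2.0, None), (3.0, 6.2), (None, 8.0),
    (5.0, 9.9), (6.0, 12.1), (7.0, 13.8), (8.0, 16.2),
]


def clean(records):
    """Data cleaning: drop every row that has a missing value."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def describe(values):
    """Exploration: a few descriptive statistics for one column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def split(records, test_fraction=0.25, seed=0):
    """Hold back part of the data so the model is scored on unseen rows."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(train):
    """Model building: least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(x for x, _ in train)
    mean_y = statistics.mean(y for _, y in train)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, test):
    """Model evaluation: average prediction error on the held-out rows."""
    slope, intercept = model
    return statistics.mean(abs(y - (slope * x + intercept)) for x, y in test)


records = clean(RAW_RECORDS)
summary = describe([y for _, y in records])
train, test = split(records)
model = fit_line(train)
error = mean_absolute_error(model, test)
print(summary, model, error)
```

Dropping incomplete rows is only one way to treat missing values; filling them in with a typical value, such as the median, is another common choice, and which one fits depends on the business problem from step 1.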

Role of ML & AI in the Field of Data Science

Obtaining solutions to business problems with a considerable level of accuracy is challenging. To lessen the burden, Machine Learning (ML), a subset of Artificial Intelligence (AI), is used. ML provides the ability to automatically learn and improve from data, identify patterns, and make decisions with minimal human intervention. When working with datasets, various ML algorithms are used to learn from the extracted data and then forecast future trends.

Using ML in Data Science makes it possible to obtain high-value predictions that can guide better decisions and smart actions in real time, without human intervention.
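
As a small illustration of what "learning from data" can look like, the sketch below uses k-nearest neighbours, one of the simplest ML algorithms, chosen here purely as an example. The customer data, the segment labels, and the function name `predict` are all hypothetical.

```python
from collections import Counter
import math

# Hypothetical labelled examples: (visits per month, spend per month) -> customer segment.
TRAINING_DATA = [
    ((2, 10.0), "occasional"),
    ((3, 15.0), "occasional"),
    ((1, 5.0), "occasional"),
    ((12, 80.0), "loyal"),
    ((15, 95.0), "loyal"),
    ((10, 70.0), "loyal"),
]


def predict(features, examples, k=3):
    """Label a new customer by majority vote among the k closest known customers."""
    # Sort the known customers by straight-line distance to the new one.
    nearest = sorted(examples, key=lambda example: math.dist(features, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict((11, 75.0), TRAINING_DATA))
print(predict((2, 12.0), TRAINING_DATA))
```

No rule such as "more than ten visits means loyal" is written anywhere in the code; the decision comes entirely from the examples, which is the sense in which the model learns patterns with minimal human intervention.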

What does a Data Scientist do? 

Data is like a massive wave starting to crest. If you want to catch it, you need people who can surf, and Data Scientists are the ones for the job. The Harvard Business Review called ‘Data Scientist’ the sexiest job of the 21st century, and the role is fast gaining prominence in organizations.

Who Is A Data Scientist?

A Data Scientist is typically curious, creative and systematic at work, willing to experiment, and able to communicate ideas easily. These traits are an advantage because Data Scientists work closely with business stakeholders to understand their expectations. They then look for ways to draw answers to business questions out of data.

Data Scientist Vs Business Analyst 

A Data Scientist is not the same as a Business Analyst; the two roles differ significantly. A Business Analyst is responsible for understanding business requirements and developing actionable insights, whereas a Data Scientist is responsible for analyzing, preparing, formatting and maintaining information. The following table summarizes the skills required for each role.

Skill                                 Data Scientist    Business Analyst
Machine Learning                      Very Important    Not that Important
Data Wrangling                        Very Important    Not that Important
Statistics                            Very Important    Somewhat Important
Data Intuition                        Very Important    Somewhat Important
Data Visualization & Communication    Very Important    Very Important
Programming Tools                     Very Important    Very Important

Figure 2: Data Scientist Vs Business Analyst

Skills & Tools Of A Data Scientist 

As seen in the table above, the role of a Data Scientist requires a combination of technical, mathematical, and storytelling skills. Going one level deeper, a Data Scientist should be equipped with knowledge of:

  • Statistics & Machine Learning 
  • Coding languages such as R or Python 
  • Databases such as MySQL 
  • Data Visualization techniques etc.  

For a company planning to capitalize on big data, having a Data Scientist is crucial, since it is the Data Scientist who helps decision makers shift from ad hoc analysis to an ongoing conversation with data.

Conclusion 

Big data shows no sign of slowing, and neither does the importance of the Data Scientist. An article by Forbes stated that by the year 2020, about 1.7 megabytes of new information would be created every second for every human being on the planet. This is a massive number! This very fact emphasizes the importance of knowing your data and what information can be extracted from it. After all, this is where our future lies.
