By Wiley

What is Data Science

The Data Scientist
The data scientist has a unique role in industry, government, and other organizations. That role is different from others such as statistician or business analyst.

Data Scientist Versus Statistician
Many statisticians think that data science is about analyzing data, but it is more than that. Data science also involves implementing algorithms that process data automatically to provide automated predictions and actions, such as the following:

  • Analyzing NASA pictures to find new planets or asteroids
  • Automated bidding systems
  • Automated piloting (planes and cars)
  • Book and friend recommendations on Amazon.com or Facebook
  • Client-customized pricing system (in real time) for all hotel rooms
  • Computational chemistry to simulate new molecules for cancer treatment
  • Early detection of an epidemic
  • Estimating (in real time) the value of all houses in the United States (Zillow.com)
  • High-frequency trading
  • Matching a Google Ad with a user and a web page to maximize chances of conversion
  • Returning highly relevant results to any Google search
  • Scoring all credit card transactions (fraud detection)
  • Tax fraud detection and detection of terrorism
  • Weather forecasts

All of these involve both statistical science and terabytes of data. Most people doing these types of projects do not call themselves statisticians. They call themselves data scientists.

Statisticians have been gathering data and performing linear regressions for several centuries. DAD performed by statisticians 300 years ago, 20 years ago, today, or in 2015 for that matter, has little to do with DAD performed by data scientists today. The key message here is that eventually, as more statisticians pick up on these new skills and more data scientists pick up on statistical science, the frontier between data scientist and statistician will blur. Indeed, I can see a new category of data scientist emerging: data scientists with strong statistical knowledge.

What also makes data scientists different from computer scientists is that they have a much stronger statistics background, especially in computational statistics, but sometimes also in experimental design, sampling, and Monte Carlo simulations.

Data Scientist Versus Business Analyst
Business analysts focus on database design (database modeling at a high level, including defining metrics, dashboard design, retrieving and producing executive reports, and designing alarm systems), ROI assessment on various business projects and expenditures, and budget issues. Some work on marketing or finance planning and optimization, and risk management. Many work on high-level project management, reporting directly to the company’s executives.

Some of these tasks are performed by data scientists as well, particularly in smaller companies: metric creation and definition, high-level database design (which data should be collected and how), or computational marketing, even growth hacking (a word recently coined to describe the art of growing Internet traffic exponentially fast, which can involve engineering and analytic skills).

There is also room for data scientists to help the business analyst, for instance by helping automate the production of reports and by making data extraction much faster. You can teach a business analyst FTP and fundamental UNIX commands: ls -l, rm -i, head, tail, cat, cp, mv, sort, grep, uniq -c, and the pipe and redirect operators (|, >). Then you write and install a piece of code on the database server (the server traditionally accessed by the business analyst via a browser or tools such as Toad or Brio) to retrieve data. Then, all the business analyst has to do is:

  1. Create an SQL query (even with visual tools) and save it as an SQL text file.
  2. Upload it to the server and run the program (for instance a Python script, which reads the SQL file and executes it, retrieves the data, and stores the results in a CSV file).
  3. Transfer the output (CSV file) to their own machine for further analysis.
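
The program in step 2 can be quite small. Here is a minimal sketch of what it might look like in Python. It is only an illustration: the function name run_query_to_csv and the file names are hypothetical, and SQLite (through Python's standard sqlite3 module) stands in for whatever database the company actually runs. On a production server you would swap in the connection library for that database.

```python
import csv
import sqlite3
import sys
from pathlib import Path


def run_query_to_csv(db_path, sql_path, csv_path):
    """Run the single SELECT statement saved in sql_path; store the rows as CSV.

    Assumption: SQLite is used here only as a stand-in database.
    Returns the number of data rows written (header excluded).
    """
    query = Path(sql_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(query)
        # cursor.description holds one tuple per column; item 0 is the name.
        header = [column[0] for column in cursor.description]
        row_count = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            # Stream rows one at a time so large extracts do not fill memory.
            for row in cursor:
                writer.writerow(row)
                row_count += 1
    finally:
        conn.close()
    return row_count


# Command-line use (hypothetical file names):
#   python run_query.py sales.db query.sql output.csv
if __name__ == "__main__" and len(sys.argv) == 4:
    written = run_query_to_csv(sys.argv[1], sys.argv[2], sys.argv[3])
    print(f"wrote {written} rows to {sys.argv[3]}")
```

Because the rows are streamed to the CSV file rather than loaded all at once, the same script handles small and large extracts. The business analyst only ever edits the SQL text file, never the script.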

Such collaboration is a win-win for the business analyst and the data scientist. In practice, it has helped business analysts extract data sets 100 times larger than what they are used to, and do so 10 times faster.

In summary, data scientists are not business analysts, but they can greatly help them, including by automating the business analyst’s tasks. Also, a data scientist might find it easier to get a job if he/she can bring the extra value and experience described here, especially in a company where there is a budget for one position only, and the employer is unsure whether to hire a business analyst (carrying out overall analytic and data tasks) or a data scientist (who is business savvy and can perform some of the tasks traditionally assigned to business analysts). In general, business analysts are hired first, and if data and algorithms become too complex, a data scientist is brought in. If you create your own startup, you need to wear both hats: data scientist and business analyst.

Vincent Granville
Developing Analytic Talent: Becoming a Data Scientist

ISBN: 978-1-118-81008-8
May 2014
