Data Science: A Simple Introduction – What You Need to Know

People often use the term “data science” to mean anything from general data analysis to machine learning or AI. But what really is data science? In this post, I will explain my view of data science as one of the first recipients of a data science Ph.D. and give a broad overview of the field. This blog focuses on STEM education and has more detailed posts on many of these topics if you would like to learn more.

So What Is It?

Data science is a three-part field, with elements from computer science, mathematics/statistics, and some domain of application. This Venn diagram from thedatascientist.com gives a great visual summary:

Venn diagram showing the overlap between computer science, math, and domain expertise, resulting in data science

The image shows these three parent fields and how they overlap to form data science. I also like that this image depicts how those fields overlap in pairs to form the subfields of machine learning, software, and data analysis. Though people often use the term “data science” synonymously with “machine learning” or “data analysis”, it is its own field that uses elements from these subfields.

What sets data science apart is that it is an applied field. This means that (most) data scientists spend more time using machine learning and statistical tools than developing them. A data scientist's domain knowledge often determines the kind of data they work with. Unlike a traditional statistician or data analyst, a data scientist is trained in the field where they apply their work. For example, data scientists who want to study public health data must learn public health methods. Domain knowledge is extremely important for data science, and I will discuss it more later in this post!

That doesn’t mean data scientists know everything statisticians or computer scientists do. (I will be the first to admit that I have no business building a PC!) In the next few sections, I will explain the portions of data science that come from each of these domains.

I want to preface this with the disclaimer that data science is a diverse field. Each data scientist has a different set of skills. These are my observations and generalizations.

Data Science vs Computer Science

Many data science programs have evolved out of existing computer science departments, which is one reason the two fields are so intertwined. Data scientists need a baseline knowledge of programming languages like Python or R to do their work. However, they don't necessarily need to be expert software developers or computer architects. Job postings for data scientists are more likely to ask for expertise in SQL, the standard language for querying relational databases, than in game animation with JavaScript.
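
To make the SQL point concrete, here is a minimal Python sketch using the standard-library sqlite3 module and a throwaway in-memory database. The visits table, its columns, and the numbers are made up for illustration; the point is that SQL does the grouping and summing, and Python just collects the results.

```python
import sqlite3

# Hypothetical example: a tiny in-memory table of clinic visits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (clinic TEXT, patients INTEGER)")
conn.executemany(
    "INSERT INTO visits (clinic, patients) VALUES (?, ?)",
    [("North", 12), ("North", 8), ("South", 15), ("South", 5), ("South", 10)],
)

# The aggregation happens inside the database; Python only fetches the rows.
rows = conn.execute(
    "SELECT clinic, SUM(patients) AS total FROM visits "
    "GROUP BY clinic ORDER BY clinic"
).fetchall()

for clinic, total in rows:
    print(f"{clinic}: {total} patients")

conn.close()
```

For this toy table the query returns one row per clinic: North with 20 patients and South with 30.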

Though working with databases is certainly part of the job, data scientists do more than manage and store data. They also manipulate and model it, often using machine learning algorithms. In the Venn diagram above, machine learning sits between math and computer science.
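
As a minimal sketch of what "modeling the data" can look like, the Python below fits a straight line to a handful of made-up points using ordinary least squares. Simple linear regression is one of the most basic models that machine learning builds on. The hours/scores data and the fit_line helper are hypothetical, and in practice a data scientist would often use a library such as scikit-learn rather than writing this by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel,
    # so plain sums of deviations are enough.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied vs. exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # the model's guess for 6 hours of study
print(f"score = {slope:.2f} * hours + {intercept:.2f}; "
      f"prediction for 6 hours: {predicted:.1f}")
```

Here the fitted slope is about 4.1 points per hour and the intercept about 47.7, so the model predicts roughly 72.3 for a student who studies 6 hours.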

Data Science vs Mathematics and Statistics

Machine learning algorithms are based on a lot of mathematics, but this is not the only math a data scientist needs to learn. Most data scientists use applied math in their day-to-day work. Data scientists tend to have a more thorough statistics education than the average computer scientist. They apply statistical tests to their data and make graphs to present their work. They also need to understand how their data was collected, whether their sample is large enough and unbiased, and what limitations the data has. All of these skills come from statistics.
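
Here is one example of a statistical test, sketched in Python with only the standard library: an exact permutation test for a difference in means between two small groups. The post does not name a particular test; this one is chosen because it reads easily without extra libraries. The treatment and control numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in group means.

    Tries every way of splitting the pooled data into groups of the
    original sizes and counts how often a split produces an absolute
    difference in means at least as large as the observed one.
    """
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # Small tolerance guards against floating-point rounding in the means.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total

# Hypothetical measurements for a treatment group and a control group.
treatment = [12, 15, 14, 16, 13]
control = [10, 11, 9, 12, 10]

observed, p_value = permutation_test(treatment, control)
print(f"observed difference = {observed:.2f}, p = {p_value:.3f}")
```

For these numbers the test reports a difference in means of 3.6 and a p-value of 4/252, about 0.016. It also illustrates the sample-size point: with five observations per group there are only 252 ways to split the data, and because swapping the two group labels gives the same absolute difference, the smallest p-value this test could ever return is 2/252, about 0.008.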

Data Science and Domain Knowledge

This brings us to the third and most general parent field: domain knowledge. A data scientist needs deep knowledge of the problem before working on a solution. Different fields have their own methods and tools that data scientists should understand before working on problems in those fields. They also have their own ways of collecting data. This is why I believe domain knowledge is just as critical for a data scientist as the other two parent fields.

Data scientists are not trained to simply come into a field, drop a solution, and walk away. Data scientists should work with researchers in other domains, community members, and other stakeholders to ensure results are correct, applicable, and useful for those affected by the work. They should also be able to communicate their results to these stakeholders in an understandable manner and be willing to adjust their models and solutions to fit the needs of stakeholders.

How Can I Learn More?

Many schools are developing data science programs at the undergraduate and graduate levels, and these programs are growing rapidly. (See this article from US News for more information.) There are even initiatives to develop a data science curriculum at the K-12 level, with most focusing on grades 7-12. Some curriculum developers like YouCubed and CourseKata provide their high school data science materials for free online. The material, though written for high schoolers, is relevant for anyone interested in an introduction to data science.

There are also many ways to get involved outside of a classroom.

First, there are a lot of websites that teach basic skills needed for data science. Coursera has a whole list of online programs teaching data science, including one from Johns Hopkins that I took back in 2016. Udemy has similar courses as well.

There are also sites like Codecademy or DataCamp that teach the basic coding skills that form the foundation of data science work.

Articles on websites like Towards Data Science and TheDataScientist can provide more information on data science as well as helpful tutorials.

Last but not least, if you enjoyed this post, please subscribe at this link to receive notifications when I post new articles. Also, please check out my YouTube channel for more data science content!