Descriptive and inferential statistics in data analysis

Statistics studies the behaviour of data. It lets us draw conclusions from data according to the variables being measured. When dealing with population studies, there are two approaches on which to base an analysis: descriptive and inferential statistics. What is the difference between them, and what role does each play in big data analysis?

Descriptive and inferential statistics: to describe or to infer

Statistical science is a tool used in many fields of study, and it is indispensable for drawing conclusions on a wide range of topics. The object of study may have to do with:

the behaviour of groups of people (as in the case of studies carried out in sociology)
the behaviour of data of a more scientific nature that does not derive from human attitudes.

Once we have identified the objective, we need to collect data and then decide which approach to use to analyse it: descriptive or inferential statistics. The former describes the data; the latter makes what are called inferences, seeking to go beyond description.

Features of descriptive and inferential statistics

Neither statistical method is more valid than the other; the choice depends on what you want to study and on the type of application you are investigating. Let us look at each concept in more detail.

Think of a census-type population list, taken at a specific time, that contains the personal data of the people living at each address in each street of each population centre. Summarising that complete list is descriptive statistics.

If a portion of the same census data is subsequently taken, and conclusions about the whole population are drawn from it using probabilistic calculations, this is inferential statistics.
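The census example can be sketched in a few lines of Python. This is a minimal illustration, not a real census: the `population_ages` list is synthetic data invented for the example, and the town size of 10,000 and the sample size of 200 are arbitrary assumptions.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical "census": the age of every person in a town of 10,000.
population_ages = [random.randint(0, 95) for _ in range(10_000)]

# Descriptive statistics: summarise the complete list. Nothing is estimated.
population_mean = statistics.fmean(population_ages)

# Inferential statistics: look at a portion only and estimate the same figure.
sample_ages = random.sample(population_ages, k=200)
sample_mean = statistics.fmean(sample_ages)

print(f"mean age, full census: {population_mean:.2f}")
print(f"mean age, estimated from 200 people: {sample_mean:.2f}")
print(f"estimation error: {abs(sample_mean - population_mean):.2f}")
```

The first figure is exact for this list. The second would come out differently if another 200 people were drawn, which is why inferential results are reported together with a margin of error.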

Descriptive statistics

Traditional statistics is descriptive statistics. Its approach is to analyse the chosen variables and then describe the data. It is therefore said to be based on precision: it summarises the data actually observed rather than estimating anything beyond them. This type of statistics aims to organise and classify the data obtained from, for example, a population group.

Within descriptive statistics, a number of technical concepts are used:

Dispersion
The values of a given variable lie at some distance from one another and from the average. This spread is called dispersion, and it is commonly measured by the range, the variance or the standard deviation.

Average
The average, or mean, indicates the central tendency of a variable. It is the result of dividing the sum of the values by the number of values.

Kurtosis
Kurtosis describes the shape of a variable's distribution curve. That is, it measures how much of the data lies far from the average, in the tails, as opposed to close to it.

Graphs
The data resulting from an analysis are usually presented in graphical form. There are various types of graphs: bar charts, pie charts, line graphs, frequency polygons, etc.

Asymmetry
The data of a variable are distributed around the average in a certain way, either evenly on both sides or with a longer tail on one side. The measure of this is called asymmetry, or skewness.
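The numerical measures above can be computed directly. The following is a minimal sketch using only the Python standard library. The `values` list is made-up data, the `central_moment` helper is a name introduced here for illustration, and the population (not sample) formulas are an assumption chosen to keep the arithmetic easy to follow.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations of one variable
n = len(values)

mean = statistics.fmean(values)      # sum of the values / number of values
std_dev = statistics.pstdev(values)  # dispersion: population standard deviation


def central_moment(power):
    """Average of the deviations from the mean, each raised to `power`."""
    return sum((x - mean) ** power for x in values) / n


# Asymmetry: positive when the longer tail is on the right of the mean.
skewness = central_moment(3) / std_dev ** 3

# Kurtosis, reported relative to a normal curve (whose kurtosis is 3).
excess_kurtosis = central_moment(4) / std_dev ** 4 - 3

print(f"mean={mean}, std_dev={std_dev}")
print(f"skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}")
```

For this data the mean is 5 and the standard deviation is 2. The skewness is positive (about 0.66) because the value 9 stretches the right-hand tail, and the slightly negative excess kurtosis indicates tails a little lighter than those of a normal curve.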

Inferential statistics

Inferential statistics observes a sample of data and draws conclusions that it extends to the whole population through inference. Because it rests on probabilistic calculation, this approach carries a certain margin of error.

The analyses performed by this type of statistics aim to predict the behaviour of certain information. This is where probability models, predictive models, and machine learning and artificial intelligence techniques come in.

Inferential statistics can be categorised into two main groups:

Hypothesis testing
The aim is to check whether the conclusions drawn from the portion of data studied are supported by the evidence, that is, whether what was observed in the sample could plausibly be due to chance alone.

Confidence intervals
These are ranges, computed from the sample, that express the margin of error of an estimate. They are usually a pair of numbers, or several pairs of numbers, between which a specific value is estimated to lie with a stated level of confidence, such as 95%.
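Both groups can be illustrated with a short sketch. It uses a normal approximation from the Python standard library, which is an assumption made for simplicity: with a sample as small as the one here, a t-distribution would normally be preferred. The `heights` data, the hypothesised mean of 170 cm and the function names `mean_confidence_interval` and `z_test_p_value` are all invented for this example.

```python
import math
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    centre = fmean(sample)
    return centre - z * standard_error, centre + z * standard_error


def z_test_p_value(sample, hypothesised_mean):
    """Two-sided p-value for H0: the population mean equals hypothesised_mean."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    z_stat = (fmean(sample) - hypothesised_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))


# Hypothetical sample: heights in cm of 12 people drawn from a larger group.
heights = [172, 168, 175, 181, 169, 177, 174, 170, 179, 173, 176, 171]

low, high = mean_confidence_interval(heights)
p_value = z_test_p_value(heights, 170)

print(f"95% confidence interval for the mean: {low:.1f} to {high:.1f} cm")
print(f"p-value for 'the true mean is 170 cm': {p_value:.4f}")
```

The two results are linked: under this approximation, a hypothesised mean that falls outside the 95% interval gives a p-value below 0.05, and one inside it gives a p-value above 0.05.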

Thus, descriptive and inferential statistics are different tools within this science of analysis. The former collects data in order to display them in summary form. The data described can be the entire population or a subset of it. The point is that the conclusions hold exactly for the group described, because they are based on all the data of that defined group; they say nothing about data outside it.

Inferential statistics, on the other hand, takes a sample and establishes the probability of a conclusion about the whole. Its conclusions are probabilistic in nature, and some error must be assumed.

Descriptive and inferential statistics are not opposing methods but different, complementary ways of approaching data. Traditionally, statistics has been identified with data collection. However, statistics has moved with the times, and today it involves computational tools and approaches that, in fact, lay the foundations for the development of artificial intelligence and machine learning.