Data Analysis and Interpretation
Data Analysis
Data analysis is the process of examining collected data to draw meaningful conclusions from it.
Descriptive Statistics
- These are used to describe a dataset.
- They can include measures of central tendency (mean, median, mode) that tell us about the typical response.
- It can also include measures of dispersion (range, standard deviation, variance) that tell us how spread out the data is.
- It can help with summarizing data, comparing data, and cleaning data.
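The measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `responses` list is made-up survey data used only for illustration, and the variance and standard deviation shown are the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical survey responses on a 1-10 scale (made up for illustration).
responses = [4, 7, 7, 5, 9, 6, 7, 3]

# Measures of central tendency: the "typical" response.
mean = statistics.mean(responses)
median = statistics.median(responses)
mode = statistics.mode(responses)

# Measures of dispersion: how spread out the responses are.
value_range = max(responses) - min(responses)
variance = statistics.variance(responses)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(responses)      # sample standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
```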
Inferential Statistics
- These help us generalize findings from a sample to a larger population. We can draw conclusions, make predictions, and conduct hypothesis testing.
- Examples of hypothesis tests are ANOVA (Analysis of Variance) tests, t tests, chi-squared tests, etc.
- Each test is typically suited for different research designs.
- For example, an ANOVA test produces a p value, which is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true.
- The p value is compared with a significance level, typically 0.05. If p is less than 0.05, the results would be unlikely under the null hypothesis, so we reject it.
- Likewise, if p is greater than 0.05, we fail to reject the null hypothesis.
- Think of statistics used to describe a photograph of basketball players.
- Descriptive statistics would measure the average height of the players in the image.
- On the other hand, inferential statistics would ask whether those heights can be generalized to players outside the photograph.
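To make the p value decision rule concrete, here is a small sketch. It uses an exact permutation test rather than an ANOVA or t test, simply because it needs only the standard library; the logic of comparing p with 0.05 is the same. The two groups of scores are made up for illustration.

```python
from itertools import combinations

# Hypothetical scores for two groups (made up for illustration).
group_a = [12, 15, 14, 16, 13]
group_b = [18, 17, 19, 16, 20]

def mean(values):
    return sum(values) / len(values)

observed_diff = abs(mean(group_a) - mean(group_b))

# Under the null hypothesis the group labels are interchangeable, so we
# enumerate every way of splitting the pooled scores into two groups of 5.
pooled = group_a + group_b
indices = range(len(pooled))
extreme = 0
total = 0
for chosen in combinations(indices, len(group_a)):
    chosen_set = set(chosen)
    relabeled_a = [pooled[i] for i in chosen_set]
    relabeled_b = [pooled[i] for i in indices if i not in chosen_set]
    # Count splits whose difference is at least as extreme as the observed one.
    if abs(mean(relabeled_a) - mean(relabeled_b)) >= observed_diff:
        extreme += 1
    total += 1

p_value = extreme / total
alpha = 0.05  # conventional significance level
if p_value < alpha:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"

print(f"p = {p_value:.4f}: {decision}")
```

Here the p value is the share of relabelings that produce a difference at least as large as the one observed, which matches the idea of "results at least as extreme, assuming the null hypothesis is true."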
Correlation Coefficients
- Used to quantify the direction and strength of the relationship between two variables.
- The most commonly used measure is Pearson's correlation coefficient, which ranges from perfectly negative (-1) to perfectly positive (+1), with 0 denoting no linear correlation.
- Correlation does not imply causation, but correlations can still be useful for predicting outcomes.
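As a sketch of how the coefficient is calculated, the function below implements the standard Pearson formula directly. The function name `pearson_r` and all of the data are hypothetical, chosen only to show a strong positive and a perfectly negative relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations from each mean.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

# Hypothetical data (made up for illustration).
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 71, 78, 90]
hours_gaming = [5, 4, 3, 2, 1]

print(pearson_r(hours_studied, exam_score))    # close to +1 (strong positive)
print(pearson_r(hours_studied, hours_gaming))  # perfectly negative
```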
Thematic Analysis
- A qualitative data analysis method where common themes are identified within answers.
- Typically, themes are identified and a coding system is created. Researchers then systematically go through the data and code it, and the data is reviewed and re-coded repeatedly until nothing is left to code.
- After this, themes are polished and presented.
- Descriptive statistics, inferential statistics, and correlation coefficients are all quantitative methods of data analysis, whereas thematic analysis is qualitative.
Methods of Presenting Data
Bar Graphs
- Typically used to compare quantities from different groups.
- The independent variable is along the X-axis (horizontal) and the dependent variable is along the Y-axis (vertical). Each bar is one category of the independent variable.
- However, they can be misleading if drawn poorly, for example with a truncated Y-axis that exaggerates small differences.
Box and Whisker Plots
- Display the distribution of a dataset.
- They typically mark the median, the first and third quartiles, and the minimum and maximum values.
- They are used to identify outliers and to understand the skewness of the data.
In a box and whisker plot, the box spans the interquartile range, while the whiskers extend to the minimum and maximum values.
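The values a box and whisker plot displays can be computed as follows. This is a sketch with made-up data. Two choices here are assumptions not stated in the notes: the "inclusive" quartile method, and the common 1.5 x IQR rule for flagging outliers (many plotting tools stop the whiskers at these fences rather than at the true minimum and maximum).

```python
import statistics

# Hypothetical dataset with one unusually large value (made up for illustration).
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Quartiles: q1 and q3 are the edges of the box, the median is the line inside it.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the length of the box

# Assumed convention: points beyond 1.5 * IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("min:", min(data), "Q1:", q1, "median:", median, "Q3:", q3, "max:", max(data))
print("IQR:", iqr, "outliers:", outliers)
```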
Distributions
- Display the shape of a dataset and help assess skewness.
- A normal distribution is a symmetric, bell-shaped curve in which values taper off equally on both sides of the mean.
IQ tests typically assume a normal distribution of results among the population, with the mean being 100.
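The IQ example can be explored with `statistics.NormalDist`. The mean of 100 comes from the notes; the standard deviation of 15 is an assumption added here for illustration, since the notes do not state one.

```python
from statistics import NormalDist

# Mean of 100 is from the notes; the standard deviation of 15 is assumed.
iq = NormalDist(mu=100, sigma=15)

# Share of the population within one standard deviation of the mean.
within_one_sd = iq.cdf(115) - iq.cdf(85)

# Symmetry: half of the scores fall below the mean.
below_mean = iq.cdf(100)

# Share scoring above 130 (two standard deviations above the mean).
above_130 = 1 - iq.cdf(130)

print(f"within one SD: {within_one_sd:.3f}")
print(f"below the mean: {below_mean:.3f}")
print(f"above 130: {above_130:.3f}")
```

Because the curve is symmetric, the share below 85 equals the share above 115, which is what "taper off equally on both sides" means in practice.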
Frequency Tables
- Count the frequency of each outcome in a dataset, typically presented as a table.
- Typically used for understanding the results of surveys, where the independent variable is categorical and the dependent variable is quantitative.
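A frequency table can be built in a few lines with `collections.Counter`. The survey answers below are made up for illustration.

```python
from collections import Counter

# Hypothetical survey answers (made up for illustration).
answers = ["Agree", "Disagree", "Agree", "Neutral", "Agree", "Disagree"]

# Count how often each category occurs.
frequency = Counter(answers)

# Print a simple two-column table, most frequent category first.
print(f"{'Response':<10}{'Count':>6}")
for category, count in frequency.most_common():
    print(f"{category:<10}{count:>6}")
```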
Histograms
- Show the distribution of a dataset through bars that touch each other.
- The X-axis is usually a continuous independent variable, and the Y-axis is the dependent variable of interest.
Do not confuse bar graphs with histograms. Bar graphs typically have categorical independent variables, whereas histograms display continuous numerical data, which is why the bars touch.
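The sketch below shows why histogram bars touch: continuous values are grouped into adjacent bins with no gaps between them. The reaction times, the starting value, and the bin width of 50 are all made up for illustration.

```python
# Hypothetical reaction times in milliseconds (made up for illustration).
reaction_times_ms = [212, 245, 251, 268, 274, 279, 283, 301, 315, 342]

start = 200      # lower edge of the first bin (assumed)
bin_width = 50   # each bin covers 50 ms (assumed)

# Assign each value to the bin whose lower edge is at or just below it.
counts = {}
for value in reaction_times_ms:
    lower = start + ((value - start) // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

# Text histogram: each bin ends exactly where the next begins.
for lower in sorted(counts):
    print(f"[{lower}, {lower + bin_width}): {'#' * counts[lower]}")
```

A bar graph of categories such as "Agree" and "Disagree" has no such shared edges, so its bars are drawn apart.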
Line Graphs
- Typically used to display changes over time.
- They are ideal for longitudinal studies.
- The X-axis is usually a continuous variable (such as time), and the Y-axis is the dependent variable of interest.
Scatterplots
- Used to display correlational data between two continuous variables.
- The general pattern of data points is analyzed, where there can be a positive, a negative, or no correlation.
- Correlation does not imply causation.
- For example, ice cream sales and crime both increase in the summer, so they are correlated.
- Instead of inferring that ice cream purchases cause crime, we can attribute both variables to warm weather.
- This is an example of the third variable problem.
- How are bar graphs and histograms different?
- Is thematic analysis qualitative or quantitative? Can you define it?


