Demystifying the Paired t-Test Formula: A Comprehensive Guide
A t-test, also known as Student’s t-test, serves as a statistical tool to evaluate the difference between the means of two groups. It focuses on a single numerical variable, rather than counts or relationships among multiple variables.
The Genesis of the t-Test
The t-test was devised by William Sealy Gosset, an English statistician employed at the Guinness Brewery in Dublin. In 1908, Gosset published his work on the t-test in the Biometrika journal.
When to Use a Paired t-Test
Opt for the paired t-test when you have two measurements on the same item, person, or thing, or when you have two items measured under the same condition. Dependent samples are essentially connected: they are tests on the same person or thing. For example, you might be measuring car safety performance in vehicle research and testing and subject the cars to the same series of crash tests. The paired sample t-test, sometimes called the dependent sample t-test, is a statistical procedure that determines whether the mean difference between two sets of observations is zero. In a paired sample t-test, each subject or entity is measured twice, resulting in pairs of observations. Common applications of the paired sample t-test include case-control studies and repeated-measures designs. For example, to evaluate the effectiveness of a company training program, you might measure the performance of a sample of employees before and after they complete the program and then analyze the paired differences.
In contrast, a “regular” two-sample t-test compares the means of two distinct samples. If you take a random sample from each group separately and the groups are measured under different conditions, your samples are independent and you should run an independent samples t-test (also called a between-samples or unpaired-samples t-test). For example, you might test two separate groups of customer service associates on a business-related test or assess students from two universities on their English skills. The null hypothesis for the independent samples t-test is μ1 = μ2; that is, the two population means are equal.
Core Assumptions of the Paired t-Test
Like many statistical procedures, the paired sample t-test has two competing hypotheses, the null hypothesis and the alternative hypothesis. The null hypothesis assumes that the true mean difference between the paired samples is zero. Under this model, all observable differences are explained by random variation. Conversely, the alternative hypothesis assumes that the true mean difference between the paired samples is not equal to zero. The alternative hypothesis can take one of several forms depending on the expected outcome. If the direction of the difference does not matter, use a two-tailed hypothesis. If a specific direction is expected, an upper-tailed or lower-tailed hypothesis can be used to increase the power of the test. The null hypothesis remains the same for each type of alternative hypothesis.
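Written out, the hypotheses described above might look like the following. The symbol \mu_d for the true mean of the paired differences is notation assumed here for illustration; it does not appear in the text above.

```latex
\begin{aligned}
H_0 &: \mu_d = 0 \\
H_1 &: \mu_d \neq 0 \quad \text{(two-tailed)} \\
H_1 &: \mu_d > 0 \quad \text{(upper-tailed)} \\
H_1 &: \mu_d < 0 \quad \text{(lower-tailed)}
\end{aligned}
```

Only one of the three alternatives is chosen for a given analysis, and the null hypothesis is the same in every case.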
As a parametric procedure (a procedure which estimates unknown parameters), the paired sample t-test makes several assumptions. Although t-tests are quite robust, it is good practice to evaluate the degree of deviation from these assumptions in order to assess the quality of the results. In a paired sample t-test, the observations are defined as the differences between two sets of values. Hence, each assumption refers to these differences, and not the original data values.
Data Type
As the paired sample t-test is based on the normal distribution, it requires the sample data to be numeric and continuous. Continuous data can take on any value within a range (income, height, weight, etc.). By contrast, discrete or categorical data can take on only a limited set of values (Low, Medium, High, etc.).
Independence
You usually cannot test the independence of observations, but you can reasonably assume it if the data collection process was random and without replacement.
Normality
To test the assumption of normality, a variety of methods are available, but the simplest is to inspect the data visually using a tool like a histogram. Real-world data are almost never perfectly normal, so consider this assumption reasonably met if the shape of the differences looks approximately symmetric and bell-shaped.
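As a rough illustration of the visual check, the sketch below prints a text histogram of the paired differences. The function name text_histogram, the bin width, and the sample differences are all hypothetical; in practice you would more likely use a plotting tool.

```python
from collections import Counter

def text_histogram(differences, bin_width=1.0):
    """Return one text line per bin so the shape can be inspected by eye."""
    # Assign each difference to the index of the bin it falls into.
    counts = Counter(int(d // bin_width) for d in differences)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        low = b * bin_width  # lower edge of this bin
        lines.append(f"{low:6.1f} | " + "#" * counts.get(b, 0))
    return lines

# Hypothetical paired differences, for illustration only.
diffs = [-1.5, -0.5, 0.2, 0.4, 0.7, 1.1, 1.3, 2.6]
for line in text_histogram(diffs):
    print(line)
```

A shape that rises to a single peak and falls away roughly evenly on both sides is what "approximately symmetric and bell-shaped" looks like in this form.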
Outliers
Outliers are rare values that appear far away from the majority of the data. They can bias the results and potentially lead to incorrect conclusions if not handled properly. One method for dealing with outliers is to simply remove them. However, removing data points can introduce other types of bias into the results, and potentially result in losing critical information. If outliers seem to have a lot of influence on the results, a nonparametric test such as the Wilcoxon Signed Rank Test may be appropriate to use instead.
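One way to screen the differences for outliers is the 1.5 × IQR rule that boxplots use. The text above does not name a specific rule, so this choice, the function name flag_outliers, and the sample data are assumptions made for illustration.

```python
import statistics

def flag_outliers(differences, k=1.5):
    """Return the differences lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(differences, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [d for d in differences if d < low or d > high]

# Hypothetical paired differences with one extreme value.
diffs = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
print(flag_outliers(diffs))
```

Flagging a value is only a prompt to investigate it; as noted above, removing it can introduce bias of its own.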
Calculating the Paired t-Test: A Step-by-Step Approach
Here’s how you can calculate a paired t-test by hand:
- Subtract each Y score from each X score to get the differences (D).
- Add up all of the values from Step 1 (ΣD), then set this number aside for a moment.
- Square the differences from Step 1.
- Add up all of the squared differences from Step 3 (ΣD²). If you’re unfamiliar with the Σ notation used in the t test, it basically means to “add everything up”.
- Calculate the t-score from these sums: t = (ΣD / N) / √[(ΣD² - (ΣD)² / N) / ((N - 1) × N)], where N is the number of pairs.
- Subtract 1 from the sample size to get the degrees of freedom. We have 11 items, so there are 11 - 1 = 10 degrees of freedom.
- Find the critical value in the t-table, using the degrees of freedom from Step 6 and your alpha level. If you don’t have a specified alpha level, use 0.05 (5%).
- Compare your t-table value from Step 7 (2.228) to your calculated t-value (-2.74). You can ignore the minus sign when comparing the two values, because the sign only indicates the direction of the difference; the two-tailed p-value is the same for both directions. The absolute value of the calculated t-value (2.74) is greater than the table value at an alpha level of .05, which means the p-value is less than the alpha level: p < .05. So we can reject the null hypothesis that there is no difference between means.
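The hand calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual sum-based formula t = (ΣD / N) / √[(ΣD² - (ΣD)² / N) / ((N - 1) × N)]; the function name paired_t_by_hand and the sample scores are hypothetical and are not the 11-item data set referred to above.

```python
import math

def paired_t_by_hand(x_scores, y_scores):
    """Follow the hand-calculation steps for a paired t-test."""
    if len(x_scores) != len(y_scores):
        raise ValueError("x_scores and y_scores must have the same length")
    n = len(x_scores)
    # Subtract each Y score from each X score.
    diffs = [x - y for x, y in zip(x_scores, y_scores)]
    # Add up the differences.
    sum_d = sum(diffs)
    # Square the differences and add them up.
    sum_d_sq = sum(d * d for d in diffs)
    # Mean difference divided by the standard error of the mean difference.
    t = (sum_d / n) / math.sqrt((sum_d_sq - sum_d ** 2 / n) / ((n - 1) * n))
    # Degrees of freedom: sample size minus 1.
    df = n - 1
    return t, df

# Hypothetical scores, for illustration only.
x = [3, 5, 7, 9]
y = [1, 4, 4, 7]
t_value, dof = paired_t_by_hand(x, y)
print(f"t = {t_value:.3f}, df = {dof}")

# The comparison from the final step, using the numbers quoted in the text.
print(abs(-2.74) > 2.228)
```

Taking the absolute value before comparing with the table value mirrors the advice to ignore the minus sign for a two-tailed test.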
Determining Statistical Significance
The procedure for a paired sample t-test involves four steps:
- Calculate the mean of the paired differences.
- Calculate the standard deviation of the paired differences.
- Calculate the test statistic t, which is the mean difference divided by its standard error (the standard deviation of the differences divided by the square root of n).
- Calculate the probability of observing the test statistic under the null hypothesis. This value is obtained by comparing t to a t-distribution with n - 1 degrees of freedom.
The p-value determines statistical significance. It gives the probability of observing results at least as extreme as the ones obtained if the null hypothesis were true. The lower the p-value, the lower the probability of obtaining a result like the one that was observed if the null hypothesis were true. Thus, a low p-value indicates decreased support for the null hypothesis. However, the possibility that the null hypothesis is true and that we simply obtained a very rare result can never be ruled out completely.
The cutoff value for determining statistical significance is ultimately decided on by the researcher, but usually a value of .05 or less is chosen. Practical significance, by contrast, depends on the subject matter. It is not uncommon, especially with large sample sizes, to observe a result that is statistically significant but not practically significant.
Use a paired t-test when each subject has a pair of measurements, such as a before and after score. A paired t-test determines whether the mean change for these pairs is significantly different from zero. Paired t-tests are also known as paired sample t-tests or dependent samples t-tests. These names reflect the fact that the two samples are paired or dependent because they contain the same subjects.
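To make the route from the paired data to a p-value concrete, here is a minimal standard-library sketch. It computes the mean and standard deviation of the differences, forms t as the mean difference over its standard error, and approximates the two-tailed p-value by numerically integrating the t density with Simpson's rule. The function names and sample data are hypothetical, and the integration is an approximation chosen to keep the example self-contained; a statistics package would normally supply the p-value.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Note: math.gamma overflows for very large df; fine for small samples.
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, intervals=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / intervals  # intervals must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area under the density between 0 and |t|
    return max(0.0, 1 - 2 * central_area)

def paired_t_test(first, second):
    """Return mean difference, SD of differences, t, and two-tailed p."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)      # mean of the paired differences
    sd_d = statistics.stdev(diffs)       # sample SD (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))   # mean difference over its standard error
    p = two_tailed_p(t, n - 1)           # compare t with n - 1 degrees of freedom
    return mean_d, sd_d, t, p

# Hypothetical scores, for illustration only.
mean_d, sd_d, t, p = paired_t_test([3, 5, 7, 9], [1, 4, 4, 7])
print(f"mean diff = {mean_d}, t = {t:.3f}, p = {p:.4f}")
```

The result is then compared with the chosen cutoff, for example checking whether p is below .05.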
Key Considerations for Accurate Paired t-Tests
Data Collection
Drawing a random sample from the population you are studying helps ensure that your data represent the population. Representative samples are vital when drawing inferences about the population. Paired sample t-tests use the same people or items in both groups. Dependent samples can increase the statistical power of your analyses. It’s important to distinguish between independent subjects when drawing a random sample and dependent samples when measuring. When choosing the subjects, selecting one must not affect the probability of choosing the others. However, after selecting your subjects, they will all be in both groups.
Data Type
T tests require continuous data. Continuous variables can take on any numeric value. Values can be meaningfully divided into smaller increments, including fractional and decimal values. Typically, you measure continuous variables on a scale. If you don’t have continuous data, you’ll need to use a different type of hypothesis test.
Normality of Data
All t-tests assume that your data follow the normal distribution. For a paired t test, the normality assumption applies to the distribution of paired differences rather than raw test scores. For a paired sample t test, if you have at least 20 subjects, your test results will be reliable even when your data are skewed.
Paired t-Tests versus Independent Samples t-Test
Here’s the deciding characteristic for when you should use paired t tests versus an independent samples t test. Does it make sense to assess the difference within a row? In other words, does each row correspond to one person or item? For our dataset, each row in the dataset contains the same subject in the two measurement columns. Consequently, it makes sense to find the difference between the pairs of values. Because we have paired samples, each difference in a row represents how much a subject’s score changed after the training program. Conversely, if each row had contained different subjects, it would not make sense to subtract them. The change between the pretest for one subject and the posttest for another does not provide meaningful information.
Interpreting Results
The output indicates that the mean for the Pretest is 97.06, and for the Posttest it is 107.83. The average difference between the paired pretest and posttest scores is -10.77. Because our p-value (0.002) for the paired sample t-test is less than the standard significance level of 0.05, we can reject the null hypothesis. The results are statistically significant. Our sample data support the notion that the average paired difference does not equal zero. The sample estimate of the difference (-10.77) is unlikely to equal the population difference. The negative values reflect the fact that the Pretest has a lower mean than the Posttest (i.e., Pretest - Posttest < 0).
Practical Application in Stata
The dependent variable (the variable of interest) needs a continuous scale (i.e., the data needs to be at either an interval or ratio measurement). The independent variable must have two groups that are related or have “matched pairs”. Matched pairs, as mentioned above, mean that the same participants are present in both groups. A paired t-test examines the within-group differences of a single group. This means that the “related groups” or “matched pairs” are the same participants who have been measured in each of the two groups. In this example, perceived social support is the outcome variable. The t-test was significant, showing a change in scores from before to after completing the social skills program. The means show that the level of perceived social support was higher following the completion of the program.
Assumptions When Using Stata
The paired t-test, also referred to as the paired-samples t-test or dependent t-test, is used to determine whether the mean of a dependent variable (e.g., weight, anxiety level, salary, reaction time, etc.) is the same in two related groups (e.g., two groups of participants that are measured at two different "time points" or who undergo two different "conditions"). For example, you could use a paired t-test to understand whether there was a difference in managers' salaries before and after undertaking a PhD (i.e., your dependent variable would be "salary", and your two related groups would be the two different "time points"; that is, salaries "before" and "after" undertaking the PhD). Alternately, you could use a paired t-test to understand whether there was a difference in smokers' daily cigarette consumption 6 weeks after wearing nicotine patches compared with wearing patches that did not contain nicotine, known as a "placebo" (i.e., your dependent variable would be "daily cigarette consumption", and your two related groups would be the two different "conditions" participants were exposed to; that is, cigarette consumption values after wearing "nicotine patches" (the treatment group) compared to after wearing the "placebo" (the control group)). However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for a paired t-test to give you a valid result. There are four "assumptions" that underpin the paired t-test. If any of these four assumptions are not met, you cannot analyse your data using a paired t-test because you will not get a valid result. Since assumptions #1 and #2 relate to your study design and choice of variables, they cannot be tested for using Stata.
- Assumption #1: Your dependent variable should be measured at the interval or ratio level (i.e., it is continuous). Examples of such dependent variables include height (measured in feet and inches), temperature (measured in °C), salary (measured in US dollars), revision time (measured in hours), intelligence (measured using IQ score), reaction time (measured in milliseconds), test performance (measured from 0 to 100), sales (measured in number of transactions per month), and so forth.
- Assumption #2: Your independent variable should consist of two categorical, "related groups" or "matched pairs". "Related groups" indicates that the same subjects are present in both groups. The reason that it is possible to have the same subjects in each group is because each subject has been measured on two occasions on the same dependent variable. For example, you might have measured 50 participants' typing speed using a keyboard (i.e., the dependent variable) before and after they underwent a touch-typing course designed to improve typing speed (i.e., the two "time points" where participants' typing speed was measured - "before" and "after" the touch-typing course - reflect the two "related groups" of the independent variable). Since the same participants were measured at these two time points, the groups are related. It is also common for related groups to reflect two different conditions that all participants undergo (i.e., these conditions are sometimes called interventions, treatments or trials). Fortunately, you can check assumptions #3 and #4 using Stata. When moving on to assumptions #3 and #4, we suggest testing them in this order because it represents an order where, if a violation of the assumption is not correctable, you will no longer be able to use a paired t-test. In fact, do not be surprised if your data fails one or more of these assumptions since this is fairly typical when working with real-world data rather than textbook examples, which often only show you how to carry out a paired t-test when everything goes well. However, don’t worry because even when your data fails certain assumptions, there is often a solution to overcome this (e.g., transforming your data or using another statistical test instead).
- Assumption #3: There should be no significant outliers in the differences between the two related groups. An outlier is simply a single data point within your data that does not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The problem with outliers is that they can have a negative effect on the paired t-test, distorting the differences between the two related groups (whether increasing or decreasing the scores on the dependent variable), which reduces the accuracy of your results. In addition, they can affect the statistical significance of the test.
- Assumption #4: The distribution of the differences in the dependent variable between the two related groups should be approximately normally distributed. We talk about the paired t-test only requiring approximately normal data because it is quite "robust" to violations of normality, meaning that the assumption can be a little violated and still provide valid results. In practice, checking for assumptions #3 and #4 will probably take up most of your time when carrying out a paired t-test. In the section, Procedure, we illustrate the Stata procedure required to perform a paired t-test assuming that no assumptions have been violated.
Example of Paired t-Test in Stata
A company researcher wants to test a new formula for a sports drink that has been designed to improve running performance. Instead of the regular "carbohydrate-only" drink that the company produces, this new sports drink contains a "carbohydrate-protein" formula. The researcher would like to know whether this new carbohydrate-protein sports drink leads to a difference in running performance compared to the carbohydrate-only sports drink. To carry out the experiment, the researcher recruited 20 middle distance runners. All of these participants performed two trials in which they had to run as far as possible for 2 hours on a treadmill. In one of the trials, all 20 participants drank from a bottle containing the carbohydrate-only formula. In the other trial, the same 20 participants drank from a bottle containing the carbohydrate-protein formula. Whilst all participants completed both trials, the order in which they underwent the trials differed, which is known as counterbalancing (i.e., 10 of the 20 participants completed the trial with the carbohydrate-only drink first, and then the trial with the carbohydrate-protein drink second, whilst the other 10 participants started with the carbohydrate-protein trial and then undertook the carbohydrate-only trial).
Note: In actual fact, these are not "variables", but rather the two "related groups" of the independent variable, Conditions. However, in Stata, these two related groups will be referred to as variables when you: (a) create them in the first instance; (b) view them in the Data Editor (Edit) spreadsheet; and (c) carry out the paired t-test using Stata's dialogue boxes, where Stata refers to them as the "First Variable" and "Second Variable". In this section, we show you how to analyse your data using a paired t-test in Stata when the four assumptions in the previous section, Assumptions, have not been violated. You can carry out a paired t-test using code or Stata's graphical user interface (GUI). After you have carried out your analysis, we show you how to interpret your results.
Explanation: You need to think carefully about the variables you select as the First variable: and Second variable:. If you have a study design where you are interested in the differences between two "conditions" (see the assumption on related groups if you are unsure what this means), there will often be a control group and an experimental group. In such a case, you will usually subtract the scores on the dependent variable for the control group from your experimental group (i.e., the experimental group minus the control group). The variable that represents the experimental group acts as the First variable: and the variable that represents the control group acts as the Second variable:. Alternately, if your two related groups are two "time points" (e.g., a pre-post study design), you will often subtract the scores on the dependent variable for the first time point from the second time point (e.g., the scores "before" an intervention has taken place from the scores "after" the intervention). In our example, you are interested in whether a new carbohydrate-protein drink (i.e., the experimental group) leads to a difference in performance compared to an existing carbohydrate-only drink (i.e., the control group, since this reflects the status-quo). This way, any positive differences reflect an improvement in the distance run by participants using the carbohydrate-protein drink (carb_protein) compared to the carbohydrate-only drink (carb), and vice-versa for negative differences.
- Note 1: By default, Stata uses 95% confidence intervals, which equates to declaring statistical significance at the p < .05 level. If you wish to change this you can enter any value from 10 to 99.99.
- Note 2: You need to be precise when entering the code into the box. The code is "case sensitive".
- Note 3: If you're still getting the error message in Note 2: above, it is worth checking the name you gave your two variables in the Data Editor when you set up your file (i.e., see the Data Editor screen above). In the box on the right-hand side of the Data Editor screen, it is the way that you spelt your variables in the section, not the section that you need to enter into the code (see below for our independent variable).
The three steps required to run a paired t-test in Stata 12 - known as a "Mean-comparison test, paired data" in Stata 12 - are shown below.
In the -Paired t-test- area, select carb_protein from within the First variable: drop-down box, and carb from within the Second variable: drop-down box. For this example, keep the default 95% confidence interval by keeping the 95 value in the Confidence level drop-down box.
Important: Whilst it does not matter which of your two variables you enter into the First variable: and Second variable: dialogue boxes, in order to interpret the Stata output in this guide, we suggest a particular order for selecting these variables, which we discuss in the explanation above. If you follow this explanation, it will be much easier to interpret your results. Click on the button.
Interpreting Stata Output
This output provides useful descriptive statistics for the two groups that you compared, including the mean and standard deviation, as well as actual results from the paired t-test. Looking at the Mean column, you can see that participants ran further in the carbohydrate-protein trial than in the carbohydrate-only trial. There is a mean difference between the two trials of 0.1355 km (Mean) with a standard deviation of 0.09539 km (Std. Dev.), a standard error of the mean of 0.02133 km (Std. Err.), and a 95% confidence interval of 0.09085 to 0.18015 km (95% Conf. interval). You are presented with an obtained t-value (t) of 6.3524, the degrees of freedom (degrees of freedom), which are 19, and the statistical significance (2-tailed p-value) of the paired t-test (Pr(|T| > |t|) under Ha: mean(diff) != 0), which is 0.0000 when rounded to four decimal places. As the p-value is less than 0.05 (i.e., p < .05), it can be concluded that there is a statistically significant difference between our two variable scores (carb and carb_protein).
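The figures in the output are tied together by simple arithmetic, which the sketch below reproduces as a sanity check. It uses only the values quoted above plus 2.093, the standard two-tailed 5% critical value for 19 degrees of freedom from a t-table; that table value and the variable names are assumptions introduced here, not part of the Stata output.

```python
import math

n = 20                # runners in the example
mean_diff = 0.1355    # km, mean difference from the output
sd_diff = 0.09539     # km, standard deviation of the differences
T_CRIT_DF19 = 2.093   # assumed: two-tailed 5% critical value for 19 df (t-table)

# Standard error of the mean difference.
se = sd_diff / math.sqrt(n)
# t statistic: mean difference divided by its standard error.
t_value = mean_diff / se
# 95% confidence interval for the mean difference.
ci_low = mean_diff - T_CRIT_DF19 * se
ci_high = mean_diff + T_CRIT_DF19 * se

print(f"SE = {se:.5f} km")
print(f"t  = {t_value:.2f} on {n - 1} degrees of freedom")
print(f"95% CI = {ci_low:.4f} to {ci_high:.4f} km")
```

Small differences in the last digits compared with the Stata output are expected, because the inputs here are already rounded.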
Note: We present the output from the paired t-test above. However, since you should have tested your data for the assumptions we explained earlier in the Assumptions section, you will also need to interpret the Stata output that was produced when you tested for them. This includes: (a) the boxplots you used to check if there were any significant outliers; and (b) the output Stata produces for your Shapiro-Wilk test of normality to determine normality.
Reporting the Results
When you report the output of your paired t-test, it is good practice to include: (a) an introduction to the analysis you carried out; (b) information about your sample, including how many participants there were in your sample; (c) the mean and standard deviation for your two related groups; and (d) the observed t-value, 95% confidence intervals, degrees of freedom, and significance level (or more specifically, the 2-tailed p-value).
For example: A paired t-test was run on a sample of 20 middle distance runners to determine whether there was a statistically significant mean difference between the distance run when participants imbibed a carbohydrate-protein drink compared to a carbohydrate-only drink.
In addition to reporting the results as above, a diagram can be used to visually present your results. For example, you could do this using a bar chart with error bars (e.g., where the error bars could be the standard deviation, standard error or 95% confidence intervals). This can make it easier for others to understand your results. Furthermore, you are increasingly expected to report "effect sizes" in addition to your paired t-test results. Effect sizes are important because whilst the paired t-test tells you whether differences between group means are "real" (i.e., different in the population), it does not tell you the "size" of the difference.
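The text above does not name a particular effect size. One commonly used choice for paired designs is Cohen's d computed on the differences, that is, the mean difference divided by the standard deviation of the differences. The sketch below applies that assumed measure to the figures from the sports drink example; the function name is hypothetical.

```python
def cohens_d_paired(mean_diff, sd_diff):
    """Cohen's d for paired data: mean difference over SD of the differences."""
    return mean_diff / sd_diff

# Values quoted in the Stata output for the sports drink example (km).
d = cohens_d_paired(0.1355, 0.09539)
print(f"d = {d:.2f}")
```

Other effect size definitions exist for paired data and can give different values, so it is worth stating which one you report.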

