7 Format data in tidyr

How to reshape data between long and wide formats

Authors

Shane McCarty, PhD

Gavin Rualo

Erica Sava

Published

10.20.2025

Abstract

This chapter teaches researchers to reshape data between long and wide formats using tidyr in R. Researchers learn when to use each format: long format for visualization and mixed models (one row per observation) versus wide format for within-person analyses (one row per participant). Through examples with longitudinal wellbeing and sleep data, the chapter demonstrates pivot_wider() for converting repeated measures to separate columns and pivot_longer() for transforming multiple columns into rows. Researchers master key parameters including names_from, values_from, names_glue, and names_pattern to create meaningful variable names during reshaping. The chapter is essential for teams collecting pre/post intervention data or daily diary responses. By the end, researchers will confidently restructure their datasets to match analytical and visualization requirements.

Keywords

tidyr, wide format, long format

📖 tidyr resources

For Teams 2 and 4

This is a required module for Teams 2 and 4 given their long data format.

7.1 Data Formats

7.1.1 Long Form

Imagine you collected data on WELLBEING and SLEEP at Day 1 and Day 7 of a 7-day study, with each participant identified by a PASSWORD (for example, abc123). The data are correct but awkward for analyzing within-person change because each participant, such as abc123, appears on two separate rows. For that kind of analysis, our preference is one row per participant.

*Example of long form data format where each case/row refers to a SURVEY RESPONSE submitted on Day 1 or 7.*

| `PASSWORD` | `DAY` | `WELLBEING` | `SLEEP` |
|---|---|---|---|
| abc123 | 1 | 3.2 | 42 |
| gobills7 | 1 | 7.8 | 83 |
| abc123 | 7 | 1.2 | 17 |
| gobills7 | 7 | 5.8 | 68 |
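
For readers who want to follow along, the table above can be entered by hand in R. This is a minimal sketch: the object name `long_data` is chosen to match the code later in this chapter, and counting rows per `PASSWORD` with `table()` is only one of several ways to confirm that the data are in long form.

```{r}
# Enter the long-form example by hand: one row per survey response
long_data <- data.frame(
  PASSWORD  = c("abc123", "gobills7", "abc123", "gobills7"),
  DAY       = c(1, 1, 7, 7),
  WELLBEING = c(3.2, 7.8, 1.2, 5.8),
  SLEEP     = c(42, 83, 17, 68)
)

# Count rows per participant; counts above 1 signal long form
rows_per_person <- table(long_data$PASSWORD)
rows_per_person
```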

7.1.2 Wide Form

Using the tidyr package, long form data can be converted to wide form using the DAY variable. In this case, each participant's Day 7 survey responses are moved out of their own row and into new Time 2 (T2) columns on the row that matches the participant's PASSWORD. The Day 1 responses become the Time 1 (T1) columns.

*Example of wide form data format where each case/row refers to a specific PARTICIPANT. Now there are only 2 rows x 5 columns; `DAY` is no longer a column because the time point is carried by the `_T1` and `_T2` suffixes.*

| `PASSWORD` | `WELLBEING_T1` | `WELLBEING_T2` | `SLEEP_T1` | `SLEEP_T2` |
|---|---|---|---|---|
| abc123 | 3.2 | 1.2 | 42 | 17 |
| gobills7 | 7.8 | 5.8 | 83 | 68 |
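
Wide form turns within-person change into a simple column subtraction. The sketch below re-enters only the wellbeing columns from the table above; the object name `wide_wellbeing` and the column name `WELLBEING_CHANGE` are illustrative choices, not part of the original data.

```{r}
# Wide-form wellbeing scores: one row per participant
wide_wellbeing <- data.frame(
  PASSWORD     = c("abc123", "gobills7"),
  WELLBEING_T1 = c(3.2, 7.8),
  WELLBEING_T2 = c(1.2, 5.8)
)

# Change from Time 1 to Time 2 (negative values mean wellbeing dropped)
wide_wellbeing$WELLBEING_CHANGE <-
  wide_wellbeing$WELLBEING_T2 - wide_wellbeing$WELLBEING_T1
wide_wellbeing
```

In long form the same calculation would require matching each participant's two rows first, which is why one row per participant is preferred here.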

7.1.3 R Code

Example 1

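The following code converts the long data above to wide format:

```{r}
#| label: long-to-wide
library(tidyr)

## Recode DAY (1 or 7) as a time point (1 or 2) so the new columns end in _T1 and _T2
long_data$TIME <- ifelse(long_data$DAY == 1, 1, 2)

## Convert to wide format: one row per PASSWORD
wide_data <- long_data %>%
  pivot_wider(
    id_cols = PASSWORD,                  # DAY is dropped from the result
    names_from = TIME,
    values_from = c(WELLBEING, SLEEP),
    names_glue = "{.value}_T{TIME}"
  )
```

Here, `names_from` points to the time variable derived from the `DAY` column, which is the column we eliminate as we transition to wide format. `values_from` lists the measures to spread across columns, and `names_glue` builds each new column name from the measure name (`.value`) and the time point. As seen in the wide format, wellbeing and sleep now each have two columns: one for Day 1 (T1) and one for Day 7 (T2).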
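
The reverse trip, from wide back to long, uses `pivot_longer()`. The sketch below re-enters the wide data by hand, using the values from the long-form table in Section 7.1.1. It relies on the special `".value"` entry in `names_to`, which tells tidyr that the first part of each column name (`WELLBEING` or `SLEEP`) should stay a column name while the second part becomes the values of a new column. The names `back_to_long` and `TIME` are illustrative choices.

```{r}
library(tidyr)

# Wide-form data: one row per participant
wide_data <- data.frame(
  PASSWORD     = c("abc123", "gobills7"),
  WELLBEING_T1 = c(3.2, 7.8),
  WELLBEING_T2 = c(1.2, 5.8),
  SLEEP_T1     = c(42, 83),
  SLEEP_T2     = c(17, 68)
)

# Split names such as "WELLBEING_T1" at "_T": the part before it stays a
# column name (".value") and the part after it is stored in TIME
back_to_long <- pivot_longer(
  wide_data,
  cols      = -PASSWORD,
  names_to  = c(".value", "TIME"),
  names_sep = "_T"
)
back_to_long
```

Note that `TIME` comes back as text ("1", "2") because it was extracted from column names; convert it if a numeric time variable is needed.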

Example 2

Another example can be found in the next chapter on visualizing pre and posttest scores.

Below is the example code:

```{r}
library(tidyr)

# Stack the six self-efficacy (SE) columns, splitting each name
# (e.g., "BSE_T1") into an SE type and a time point
longdata <- pivot_longer(oralhealthdata_prepost,
                        cols = c("BSE_T1", "BSE_T2", "CSE_T1", "CSE_T2", "ISE_T1", "ISE_T2"),
                        names_to = c("SE_type", "Time"),
                        names_pattern = "(.+)_(.+)",
                        values_to = "SE",
                        values_drop_na = TRUE)

# Convert T1/T2 to before/after
longdata$Time <- ifelse(longdata$Time == "T1", "before", "after")

# Pivot wider to get separate columns for each SE type
# (the remaining columns must uniquely identify each participant and time point)
longdata <- pivot_wider(longdata,
                       names_from = SE_type,
                       values_from = SE)

# Keep only the desired columns
longdata <- longdata[, c("Time", "BSE", "CSE", "ISE")]

# Order the time points so "before" is shown first
longdata$Time <- factor(longdata$Time, levels = c('before','after'))

# Pivot back to long format: one row per score
longdata_long <- pivot_longer(longdata, cols = c(BSE, CSE, ISE), 
               names_to = "SelfEfficacy", 
               values_to = "Score")

# Order the SE types; this is done once, after the final reshape,
# so the factor levels are not overwritten by a later pivot
longdata_long$SelfEfficacy <- factor(longdata_long$SelfEfficacy, levels = c("BSE", "CSE", "ISE"))

# Create a new variable combining 'SelfEfficacy' and 'Time'
longdata_long$SelfEfficacyTime <- factor(
  paste(longdata_long$SelfEfficacy, longdata_long$Time),
  levels = c("BSE before", "BSE after", "CSE before", "CSE after", "ISE before", "ISE after")
)
```

In this example, Erica converts the data from wide to long format to make Time 1 and Time 2 scores easier to compare. Long format is not limited to two time points; it applies equally to projects that collect data across multiple days.
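
As a sketch of the multi-day case, the example below reshapes a small, made-up daily diary dataset. The data frame `diary_wide`, its `MOOD_D1` to `MOOD_D3` columns, and the scores are all hypothetical; only the `pivot_longer()` arguments are actual tidyr features. `names_prefix` strips the shared prefix from the column names, and `names_transform` converts the remaining day labels to integers.

```{r}
library(tidyr)

# Hypothetical wide diary data: one mood score per day, one row per participant
diary_wide <- data.frame(
  PASSWORD = c("abc123", "gobills7"),
  MOOD_D1  = c(4, 6),
  MOOD_D2  = c(5, 6),
  MOOD_D3  = c(3, 7)
)

# One row per participant per day; DAY is stored as an integer (1, 2, 3)
diary_long <- pivot_longer(
  diary_wide,
  cols            = -PASSWORD,
  names_to        = "DAY",
  names_prefix    = "MOOD_D",
  names_transform = list(DAY = as.integer),
  values_to       = "MOOD"
)
diary_long
```

The same call works unchanged for 7 or more days, since `cols = -PASSWORD` selects every column except the participant identifier.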

7.2 Teams

Teams 1, 3, and 5 will export data in wide format and will not use tidyr. Teams 2 and 4 will need to use tidyr to convert their data from long form into wide form.

Team 2

The Day 1/7 Survey is in long format because each person takes the same survey on Day 1 and Day 7, resulting in two entries per person. The Daily Survey is in long format because each person takes the same survey on 6-7 days, resulting in multiple entries per person.
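
Because some participants complete 6 days and others 7, widening the Daily Survey leaves gaps. The sketch below uses made-up data (the `daily_long` object, the `MOOD` variable, and a shortened 3-day schedule are all hypothetical) to show that `pivot_wider()` fills a skipped day with `NA` rather than dropping the participant.

```{r}
library(tidyr)

# Hypothetical daily survey: gobills7 skipped Day 2
daily_long <- data.frame(
  PASSWORD = c("abc123", "abc123", "abc123", "gobills7", "gobills7"),
  DAY      = c(1, 2, 3, 1, 3),
  MOOD     = c(4, 5, 3, 6, 7)
)

# One column per day; names_prefix turns 1, 2, 3 into MOOD_D1, MOOD_D2, MOOD_D3
daily_wide <- pivot_wider(
  daily_long,
  names_from   = DAY,
  values_from  = MOOD,
  names_prefix = "MOOD_D"
)
daily_wide
```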

Team 4

The survey is in long format because participants complete it twice: once at the start of the intervention (pretest) and once at the end (posttest).

7.3 Resources

At present, the best resource is: Reshaping data between long and wide formats (Dai, 2020)