# Load packages (do this each session)
library(NHANES) # CDC health data
library(tidyverse) # Data wrangling (includes readr)
library(readxl) # Read Excel files
library(haven) # Read SPSS/SAS/Stata files5 Import Data Once
How to use readr, readxl, and haven
This chapter supports researchers in importing data from various sources and file formats commonly used in public health research. Researchers learn to load datasets from R packages (NHANES), import Excel spreadsheets, read csv files, and work with SPSS data files. The chapter demonstrates essential data inspection techniques including viewing data structure, examining headers, and understanding data summaries. Researchers also learn to handle missing data by recoding special values (-99, -50) as NA during import. By the end of this chapter, researchers will be equipped to import data from multiple sources and prepare it for analysis in their public health research projects.
data import, csv, excel, SPSS, NHANES, missing data
Section 5.3.3 was updated to handle missing data correctly. Responses (-99, and -50) will be changed to NA. Check View() to ensure there are no -99 and -50 values. You can run a histogram to check too!
5.1 Introduction to Data Import
The first step of the data science workflow is to import your data. Public health researchers work with data from many sources and in many formats. In FRI-PH, we will import data from a survey collected using the survey software Qualtrics.
Common data sources:
Survey platforms (Qualtrics, REDCap)
R packages (e.g.,
NHANES)Other statistical softwares (e.g., SPSS, SAS, Stata)
Spreadsheets (Excel, Google Sheets)
Databases (SQL)
5.2 Datasets
You will learn how to import different datasets:
datasets in R packages
NHANESandYRBSclass dataset “health beliefs (cohort 11) collected in the Research Methods I (ANTH206) course
team dataset “oralhealth”
5.2.1 File Formats
National Health and Nutrition Examination Survey dataset from the Centers for Disease Control and Prevention (CDC) using the R package
NHANESa
.xlsxfile format of the class dataset “health beliefs (cohort 11) collected in the Research Methods I (ANTH206) course03.15.2025.prolifichealthbeliefs.xlsxa
.csvfile format of the class dataset “health beliefs (cohort 11)” collected in the Research Methods I (ANTH206) course03.15.2025.prolifichealthbeliefs.csva
.savfile format from SPSS of the FRI team project on oral health10.31.2024.oralhealth.sav.
5.3 Import Examples
5.3.1 Import from NHANES package
install.packages("NHANES")5.3.2 Load libraries
library(tidyverse)
library(psych)
library(knitr)
library(tibble)
library(dplyr)
library(tidyr)
library(scales) # for number formatting like comma()
library(english) # to convert numbers to words
library(stringr) # for text functions like str_c()
library(NHANES)Column Names
Review the column names (aka the variables) in the dataset.
Data Structure
Now, you will view the structure of the NHANES data
Data Header
Review the first 10 responses for the first 6 columns
View Data
In Posit Cloud, you can use View() in your CONSOLE to view the dataframe.
```{r}
library(NHANES)
# View data
View(NHANESraw)
```5.3.3 Import Excel .xlsx file
Use the readxl package to import Excel files (.xlsx or .xls). The primary advantage of using Excel files is the ability to manually inspect your data using Excel or Google sheets before you import into R. Below are code chunks that can be copied into your report if you are using an excel file with the header in row 1.
Use the .clean data file: 10.20.2025.data.team1.clean.xlsx with a header in row 1 only. If you manually inspect your data and you see labels in row 2, you need to clean the dataset.
library(readxl)
# Import Excel file
safetydata <- read_excel(
"10.20.2025.data.team1.clean.xlsx",
col_names = TRUE)
safetydata[safetydata == -99] <- NA
safetydata[safetydata == -50] <- NA
##explanation: all -99 and -50 data will be treated as missing data# View first 10 rows
head(safetydata, 10)# A tibble: 10 × 88
StartDate EndDate Status IPAddress Progress
<dttm> <dttm> <dbl> <chr> <dbl>
1 2025-09-17 14:41:41 2025-09-17 14:56:14 0 149.125.90.78 100
2 2025-09-17 14:58:26 2025-09-17 15:09:09 0 149.125.71.249 100
3 2025-09-17 15:10:40 2025-09-17 15:52:03 0 149.125.5.92 100
4 2025-09-17 15:58:56 2025-09-17 16:12:55 0 174.254.231.34 100
5 2025-09-17 19:23:33 2025-09-17 19:37:52 0 149.125.118.117 100
6 2025-09-18 11:49:00 2025-09-18 11:56:18 0 149.125.188.203 100
7 2025-09-18 11:56:04 2025-09-18 13:06:41 0 69.207.149.41 100
8 2025-09-17 16:05:31 2025-09-18 13:37:48 0 174.197.199.232 100
9 2025-09-18 13:21:08 2025-09-18 13:40:41 0 149.125.48.210 100
10 2025-09-18 13:35:06 2025-09-18 13:53:33 0 149.125.63.60 100
# ℹ 83 more variables: `Duration (in seconds)` <dbl>, Finished <dbl>,
# RecordedDate <dttm>, ResponseId <chr>, RecipientLastName <lgl>,
# RecipientFirstName <lgl>, RecipientEmail <lgl>, ExternalReference <lgl>,
# LocationLatitude <dbl>, LocationLongitude <dbl>, DistributionChannel <chr>,
# UserLanguage <chr>, Q_RecaptchaScore <dbl>, CONSENT <dbl>, SAFE_QUAL <chr>,
# REDUCEVIOLENCE_QUAL <chr>, FIXED_CRIMES_QUAL <chr>,
# FIXED_CRIMINALS_QUAL <chr>, COMM_FEEL <dbl>, COMM_HELP <dbl>, …
# View column names, which are the VARIABLES
names(safetydata) [1] "StartDate" "EndDate"
[3] "Status" "IPAddress"
[5] "Progress" "Duration (in seconds)"
[7] "Finished" "RecordedDate"
[9] "ResponseId" "RecipientLastName"
[11] "RecipientFirstName" "RecipientEmail"
[13] "ExternalReference" "LocationLatitude"
[15] "LocationLongitude" "DistributionChannel"
[17] "UserLanguage" "Q_RecaptchaScore"
[19] "CONSENT" "SAFE_QUAL"
[21] "REDUCEVIOLENCE_QUAL" "FIXED_CRIMES_QUAL"
[23] "FIXED_CRIMINALS_QUAL" "COMM_FEEL"
[25] "COMM_HELP" "COMM_NEIGHBORS"
[27] "NOTCOMM_UNSAFE" "NOTCOMM_RELY"
[29] "NOTCOMM_DISTRUST" "EFFECT_CARE_COMM"
[31] "EFFECT_CARE_EDUCATION" "EFFECT_CARE_HVIP"
[33] "EFFECT_CARE_MHRESOURCES" "EFFECT_CARE_WELFARE"
[35] "EFFECT_FEAR_LEG" "EFFECT_FEAR_POLICE"
[37] "EFFECT_FEAR_DEPORT" "EFFECT_FEAR_ARMY"
[39] "FIXEDPERSON1_BASIC" "FIXEDPERSON2_DIFF"
[41] "FIXEDPERSON3_CHANGE_R" "FIXEDPERSON4_OLD"
[43] "FIXEDPERSON_ALWAYS_R" "FIXEDPERSON_CERTAIN"
[45] "FIXEDPERSON_MATTER_R" "FIXEDPERSON_ALL_R"
[47] "MOREFIREARMS_01" "GENDER"
[49] "RACIALIZED" "RACIALIZED_8_TEXT"
[51] "POLITICAL_BELIEFS" "POLITICAL_BELIEFS_8_TEXT"
[53] "SOCIALSTATUS" "AGE"
[55] "ZIP" "PREVENTABILITY1_CAN"
[57] "PREVENTABILITY2_HELP" "PREVENTABILITY3_TAUGHT"
[59] "PREVENTABILITY4_DOING" "PREVENTABILITY5_LEARN"
[61] "PREVENTABILITY6_UPSTAND" "PROMOTABILITY1_CARING"
[63] "PROMOTABILITY2_DO" "PROMOTABILITY3_TAUGHT"
[65] "PROMOTABILITY4_SAYING" "PROMOTABILITY5_LEARN"
[67] "PROMOTABILITY6_UPSTAND" "EFFECTIVE_SAFETY_1"
[69] "EFFECTIVE_SAFETY_2" "EFFECTIVE_SAFETY_3"
[71] "EFFECTIVE_SAFETY_4" "EFFECTIVE_SAFETY_5"
[73] "EFFECTIVE_SAFETY_6" "EFFECTIVE_SAFETY_7"
[75] "EFFECTIVE_SAFETY_8" "EFFECTIVE_PREVENT_1"
[77] "EFFECTIVE_PREVENT_2" "EFFECTIVE_PREVENT_3"
[79] "EFFECTIVE_PREVENT_4" "EFFECTIVE_PREVENT_5"
[81] "EFFECTIVE_PREVENT_6" "EFFECTIVE_PREVENT_7"
[83] "EFFECTIVE_PREVENT_8" "HEALTHSTATUS"
[85] "HOPE" "ANXIETY"
[87] "FINDSURVEY" "FINDSURVEY_5_TEXT"
# Get a summary of all variables
summary(safetydata) StartDate EndDate Status
Min. :2025-09-17 14:37:57.00 Min. :2025-09-17 14:38:13.00 Min. :0
1st Qu.:2025-09-18 13:41:55.00 1st Qu.:2025-09-18 13:54:21.00 1st Qu.:0
Median :2025-09-27 16:09:29.00 Median :2025-09-27 16:34:53.50 Median :0
Mean :2025-09-27 18:04:40.67 Mean :2025-09-27 19:15:15.82 Mean :0
3rd Qu.:2025-10-08 12:43:40.75 3rd Qu.:2025-10-08 12:44:09.00 3rd Qu.:0
Max. :2025-10-19 14:02:05.00 Max. :2025-10-19 14:21:53.00 Max. :0
IPAddress Progress Duration (in seconds) Finished
Length:180 Min. : 0.00 Min. : 7.0 Min. :0.0000
Class :character 1st Qu.: 5.00 1st Qu.: 21.0 1st Qu.:0.0000
Mode :character Median : 5.00 Median : 78.5 Median :0.0000
Mean : 42.48 Mean : 4234.6 Mean :0.3556
3rd Qu.:100.00 3rd Qu.: 857.0 3rd Qu.:1.0000
Max. :100.00 Max. :304917.0 Max. :1.0000
RecordedDate ResponseId RecipientLastName
Min. :2025-09-17 14:56:15.22 Length:180 Mode:logical
1st Qu.:2025-09-24 16:16:24.73 Class :character NA's:180
Median :2025-09-29 21:46:56.86 Mode :character
Mean :2025-10-02 07:31:26.51
3rd Qu.:2025-10-10 12:02:35.19
Max. :2025-10-19 16:55:15.32
RecipientFirstName RecipientEmail ExternalReference LocationLatitude
Mode:logical Mode:logical Mode:logical Min. :28.01
NA's:180 NA's:180 NA's:180 1st Qu.:40.83
Median :42.10
Mean :41.62
3rd Qu.:42.10
Max. :43.12
LocationLongitude DistributionChannel UserLanguage Q_RecaptchaScore
Min. :-82.49 Length:180 Length:180 Min. :0.7000
1st Qu.:-75.89 Class :character Class :character 1st Qu.:1.0000
Median :-75.64 Mode :character Mode :character Median :1.0000
Mean :-75.01 Mean :0.9922
3rd Qu.:-73.92 3rd Qu.:1.0000
Max. :-73.48 Max. :1.0000
CONSENT SAFE_QUAL REDUCEVIOLENCE_QUAL FIXED_CRIMES_QUAL
Min. :1.000 Length:180 Length:180 Length:180
1st Qu.:1.000 Class :character Class :character Class :character
Median :1.000 Mode :character Mode :character Mode :character
Mean :1.011
3rd Qu.:1.000
Max. :2.000
FIXED_CRIMINALS_QUAL COMM_FEEL COMM_HELP COMM_NEIGHBORS
Length:180 Min. :1.000 Min. :1.000 Min. :2.000
Class :character 1st Qu.:3.500 1st Qu.:4.000 1st Qu.:4.000
Mode :character Median :4.000 Median :4.000 Median :5.000
Mean :4.147 Mean :4.473 Mean :4.458
3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000
Max. :6.000 Max. :6.000 Max. :6.000
NA's :105 NA's :106 NA's :108
NOTCOMM_UNSAFE NOTCOMM_RELY NOTCOMM_DISTRUST EFFECT_CARE_COMM
Min. :1.000 Min. :1.000 Min. :1.000 Min. :-99.00
1st Qu.:1.250 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 4.00
Median :2.000 Median :3.000 Median :2.000 Median : 5.00
Mean :2.284 Mean :3.085 Mean :2.649 Mean : 3.37
3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.: 5.00
Max. :6.000 Max. :5.000 Max. :5.000 Max. : 6.00
NA's :106 NA's :109 NA's :106 NA's :107
EFFECT_CARE_EDUCATION EFFECT_CARE_HVIP EFFECT_CARE_MHRESOURCES
Min. :-99.000 Min. :-99.0000 Min. :-99.000
1st Qu.: 4.000 1st Qu.: 3.0000 1st Qu.: 5.000
Median : 5.000 Median : 4.0000 Median : 5.000
Mean : 3.427 Mean : 0.9559 Mean : 3.703
3rd Qu.: 6.000 3rd Qu.: 5.0000 3rd Qu.: 6.000
Max. : 6.000 Max. : 6.0000 Max. : 6.000
NA's :105 NA's :112 NA's :106
EFFECT_CARE_WELFARE EFFECT_FEAR_LEG EFFECT_FEAR_POLICE EFFECT_FEAR_DEPORT
Min. :-99.000 Min. :-99.000 Min. :1.00 Min. :1.000
1st Qu.: 4.000 1st Qu.: 3.250 1st Qu.:2.00 1st Qu.:1.000
Median : 5.000 Median : 4.000 Median :4.00 Median :1.000
Mean : 1.917 Mean : 1.432 Mean :3.32 Mean :1.795
3rd Qu.: 5.250 3rd Qu.: 5.000 3rd Qu.:4.00 3rd Qu.:2.000
Max. : 6.000 Max. : 6.000 Max. :6.00 Max. :6.000
NA's :108 NA's :106 NA's :105 NA's :107
EFFECT_FEAR_ARMY FIXEDPERSON1_BASIC FIXEDPERSON2_DIFF FIXEDPERSON3_CHANGE_R
Min. :1.000 Min. :1.000 Min. :2.000 Min. :1.000
1st Qu.:1.000 1st Qu.:4.000 1st Qu.:3.000 1st Qu.:2.000
Median :2.000 Median :5.000 Median :4.000 Median :3.000
Mean :2.097 Mean :4.536 Mean :4.029 Mean :2.833
3rd Qu.:3.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:3.000
Max. :6.000 Max. :6.000 Max. :6.000 Max. :6.000
NA's :108 NA's :111 NA's :111 NA's :114
FIXEDPERSON4_OLD FIXEDPERSON_ALWAYS_R FIXEDPERSON_CERTAIN FIXEDPERSON_MATTER_R
Min. :2.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:3.000 1st Qu.:2.000 1st Qu.:4.000 1st Qu.:2.000
Median :5.000 Median :3.000 Median :4.000 Median :3.000
Mean :4.319 Mean :2.794 Mean :4.329 Mean :2.985
3rd Qu.:5.000 3rd Qu.:3.000 3rd Qu.:5.000 3rd Qu.:4.000
Max. :6.000 Max. :6.000 Max. :6.000 Max. :6.000
NA's :111 NA's :112 NA's :110 NA's :114
FIXEDPERSON_ALL_R MOREFIREARMS_01 GENDER RACIALIZED
Min. :-99.000 Min. :0.00000 Min. :-99.000 Length:180
1st Qu.: 2.000 1st Qu.:0.00000 1st Qu.: 0.000 Class :character
Median : 3.000 Median :0.00000 Median : 0.000 Mode :character
Mean : 1.409 Mean :0.01429 Mean : -1.086
3rd Qu.: 3.750 3rd Qu.:0.00000 3rd Qu.: 1.000
Max. : 5.000 Max. :1.00000 Max. : 2.000
NA's :114 NA's :110 NA's :110
RACIALIZED_8_TEXT POLITICAL_BELIEFS POLITICAL_BELIEFS_8_TEXT SOCIALSTATUS
Mode:logical Min. :-99.000 Mode:logical Min. :2.000
NA's:180 1st Qu.: 2.000 NA's:180 1st Qu.:5.000
Median : 3.000 Median :6.000
Mean : -3.898 Mean :5.824
3rd Qu.: 4.000 3rd Qu.:7.000
Max. : 7.000 Max. :9.000
NA's :121 NA's :112
AGE ZIP PREVENTABILITY1_CAN PREVENTABILITY2_HELP
Length:180 Length:180 Min. :2.000 Min. :1.000
Class :character Class :character 1st Qu.:4.000 1st Qu.:5.000
Mode :character Mode :character Median :5.000 Median :5.000
Mean :4.767 Mean :5.098
3rd Qu.:5.000 3rd Qu.:6.000
Max. :6.000 Max. :6.000
NA's :120 NA's :119
PREVENTABILITY3_TAUGHT PREVENTABILITY4_DOING PREVENTABILITY5_LEARN
Min. :2.0 Min. :1.000 Min. :3.000
1st Qu.:5.0 1st Qu.:5.000 1st Qu.:5.000
Median :5.0 Median :5.000 Median :5.000
Mean :5.1 Mean :5.117 Mean :5.233
3rd Qu.:6.0 3rd Qu.:6.000 3rd Qu.:6.000
Max. :6.0 Max. :6.000 Max. :6.000
NA's :120 NA's :120 NA's :120
PREVENTABILITY6_UPSTAND PROMOTABILITY1_CARING PROMOTABILITY2_DO
Min. :3.000 Min. :2.000 Min. :4.000
1st Qu.:5.000 1st Qu.:5.000 1st Qu.:5.000
Median :5.000 Median :5.000 Median :5.000
Mean :5.117 Mean :5.233 Mean :5.217
3rd Qu.:6.000 3rd Qu.:6.000 3rd Qu.:6.000
Max. :6.000 Max. :6.000 Max. :6.000
NA's :120 NA's :120 NA's :120
PROMOTABILITY3_TAUGHT PROMOTABILITY4_SAYING PROMOTABILITY5_LEARN
Min. :1.000 Min. :1.00 Min. :3.000
1st Qu.:5.000 1st Qu.:5.00 1st Qu.:5.000
Median :5.000 Median :5.00 Median :5.000
Mean :5.197 Mean :5.18 Mean :5.148
3rd Qu.:6.000 3rd Qu.:6.00 3rd Qu.:6.000
Max. :6.000 Max. :6.00 Max. :6.000
NA's :119 NA's :119 NA's :119
PROMOTABILITY6_UPSTAND EFFECTIVE_SAFETY_1 EFFECTIVE_SAFETY_2
Min. :3.0 Min. :1.000 Min. :1.0
1st Qu.:5.0 1st Qu.:1.750 1st Qu.:2.0
Median :5.0 Median :4.000 Median :4.0
Mean :5.2 Mean :3.783 Mean :3.9
3rd Qu.:6.0 3rd Qu.:6.000 3rd Qu.:6.0
Max. :6.0 Max. :8.000 Max. :8.0
NA's :120 NA's :120 NA's :120
EFFECTIVE_SAFETY_3 EFFECTIVE_SAFETY_4 EFFECTIVE_SAFETY_5 EFFECTIVE_SAFETY_6
Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.0
1st Qu.:3.00 1st Qu.:5.000 1st Qu.:3.000 1st Qu.:2.0
Median :5.00 Median :7.000 Median :5.000 Median :3.0
Mean :4.35 Mean :5.767 Mean :4.583 Mean :3.1
3rd Qu.:6.00 3rd Qu.:7.000 3rd Qu.:6.000 3rd Qu.:4.0
Max. :8.00 Max. :8.000 Max. :7.000 Max. :8.0
NA's :120 NA's :120 NA's :120 NA's :120
EFFECTIVE_SAFETY_7 EFFECTIVE_SAFETY_8 EFFECTIVE_PREVENT_1 EFFECTIVE_PREVENT_2
Min. :1.000 Min. :3.000 Min. :1.000 Min. :1.00
1st Qu.:2.000 1st Qu.:8.000 1st Qu.:2.000 1st Qu.:2.00
Median :3.000 Median :8.000 Median :4.000 Median :4.00
Mean :2.883 Mean :7.633 Mean :4.121 Mean :3.69
3rd Qu.:4.000 3rd Qu.:8.000 3rd Qu.:6.000 3rd Qu.:6.00
Max. :6.000 Max. :8.000 Max. :8.000 Max. :8.00
NA's :120 NA's :120 NA's :122 NA's :122
EFFECTIVE_PREVENT_3 EFFECTIVE_PREVENT_4 EFFECTIVE_PREVENT_5
Min. :1.000 Min. :1.00 Min. :1.000
1st Qu.:2.000 1st Qu.:5.25 1st Qu.:4.000
Median :4.000 Median :7.00 Median :5.000
Mean :3.672 Mean :5.81 Mean :4.897
3rd Qu.:5.000 3rd Qu.:7.00 3rd Qu.:6.000
Max. :8.000 Max. :8.00 Max. :8.000
NA's :122 NA's :122 NA's :122
EFFECTIVE_PREVENT_6 EFFECTIVE_PREVENT_7 EFFECTIVE_PREVENT_8 HEALTHSTATUS
Min. :1.000 Min. :1 Min. :1.000 Min. :2.000
1st Qu.:2.000 1st Qu.:2 1st Qu.:7.250 1st Qu.:3.000
Median :3.000 Median :3 Median :8.000 Median :3.000
Mean :3.483 Mean :3 Mean :7.328 Mean :3.361
3rd Qu.:5.000 3rd Qu.:4 3rd Qu.:8.000 3rd Qu.:4.000
Max. :8.000 Max. :8 Max. :8.000 Max. :5.000
NA's :122 NA's :122 NA's :122 NA's :119
HOPE ANXIETY FINDSURVEY FINDSURVEY_5_TEXT
Min. :1.000 Min. :1.000 Min. :1.000 Length:180
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 Class :character
Median :3.000 Median :3.000 Median :3.000 Mode :character
Mean :3.131 Mean :3.131 Mean :2.935
3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :5.000 Max. :5.000 Max. :5.000
NA's :119 NA's :119 NA's :118