Reading tabular data is a fundamental operation in R, especially given that much of R's strength comes from its data manipulation and statistical modeling capabilities. In this tutorial, we'll focus on importing tabular data from different types of files.
read.table
This is a general function for reading delimited files.
data <- read.table("path/to/your/data.txt", header=TRUE, sep="\t")
Where:
header=TRUE indicates the first row contains column names.
sep="\t" specifies that the columns are tab-separated. Replace with your specific delimiter, e.g., sep="," for CSV.
read.csv
This is a specialized version of read.table for reading comma-separated files.
csv_data <- read.csv("path/to/your/data.csv", header=TRUE)
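To see how read.table and read.csv relate, the sketch below writes two tiny files to a temporary location and reads them back. The file contents and the variable names (tsv_path, csv_path, tab_data, csv_via_table) are made up for illustration; only base R is assumed.

```r
# Write a small tab-separated file to a temporary location (illustrative data)
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore", "1\tAda\t9.5", "2\tLinus\t8.0"), tsv_path)

# General-purpose reader: the delimiter is stated explicitly
tab_data <- read.table(tsv_path, header = TRUE, sep = "\t")
str(tab_data)

# The same rows as a comma-separated file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ada,9.5", "2,Linus,8.0"), csv_path)

# read.csv already defaults to header = TRUE and sep = ","
csv_data <- read.csv(csv_path)

# read.table can read the same file once the delimiter is given
csv_via_table <- read.table(csv_path, header = TRUE, sep = ",")

# Compare the two results
isTRUE(all.equal(csv_data, csv_via_table))
```

Because header = TRUE is the default for read.csv, passing it explicitly as in the call above is optional.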
readxl
install.packages("readxl")
library(readxl)
# Reading the first sheet
data <- read_excel("path/to/your/data.xlsx")
# Reading a specific sheet
data_sheet2 <- read_excel("path/to/your/data.xlsx", sheet = 2)
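When the layout of a workbook is unknown, it can help to list its sheets before reading one. The following is a minimal sketch that assumes readxl is installed and uses the example workbook bundled with the package (located with readxl_example); the variable names are illustrative, and the whole block is skipped if the package is missing.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook that ships with readxl
  xlsx_path <- readxl::readxl_example("datasets.xlsx")

  # List the sheet names before choosing one
  sheet_names <- readxl::excel_sheets(xlsx_path)
  print(sheet_names)

  # Read a sheet by name and stop after the first 5 data rows
  first_rows <- readxl::read_excel(xlsx_path, sheet = sheet_names[1], n_max = 5)
  print(first_rows)
}
```

Selecting a sheet by name rather than by position keeps the code working if sheets are reordered in the workbook.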
data.table Package
The fread function from the data.table package is versatile and often faster than base R functions, especially for large datasets.
install.packages("data.table")
library(data.table)
data <- fread("path/to/your/data.csv")
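For large files, reading only the columns and rows you need can save time and memory. The sketch below assumes data.table is installed and uses a small made-up file; select and nrows are fread arguments, while the file contents and variable names are illustrative.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # Small illustrative CSV written to a temporary file
  csv_path <- tempfile(fileext = ".csv")
  writeLines(c("id,name,score", "1,Ada,9.5", "2,Linus,8.0", "3,Grace,7.25"), csv_path)

  # fread detects the delimiter and column types on its own
  all_cols <- data.table::fread(csv_path)

  # select keeps only the named columns; nrows caps the number of rows read
  subset_cols <- data.table::fread(csv_path, select = c("id", "score"), nrows = 2)

  # fread returns a data.table; convert it if a plain data.frame is needed
  as_df <- as.data.frame(all_cols)
  print(class(as_df))
}
```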
readr Package
The readr package, part of the tidyverse, provides several functions to efficiently read tabular data.
install.packages("readr")
library(readr)
# For delimited files
data <- read_delim("path/to/your/data.txt", delim = "\t")
# Specifically for CSV
csv_data <- read_csv("path/to/your/data.csv")
# For fixed width files
fwf_data <- read_fwf("path/to/your/fwdata.txt", fwf_widths(c(5, 5, 2))) # specify widths
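The readr functions guess column types from the file contents. When the intended types are known, they can be declared up front with col_types. This is a minimal sketch that assumes readr is installed; the file contents and column names are invented for illustration.

```r
if (requireNamespace("readr", quietly = TRUE)) {
  # Illustrative file: an identifier with leading zeros and an ISO date column
  csv_path <- tempfile(fileext = ".csv")
  writeLines(c("id,name,joined", "001,Ada,2020-01-15", "002,Linus,2021-06-30"), csv_path)

  typed <- readr::read_csv(
    csv_path,
    col_types = readr::cols(
      id = readr::col_character(),                    # keep the leading zeros
      name = readr::col_character(),
      joined = readr::col_date(format = "%Y-%m-%d")   # parse as Date
    )
  )
  str(typed)
}
```

Declaring the types also makes the import fail loudly, through parsing warnings, when a file does not match expectations.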
After importing, it's a good practice to check the first few rows of your data and its structure.
head(data) # Check the first 6 rows
str(data) # Check structure
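A few other base R functions complement head and str when checking an import. The sketch below uses a small made-up file and an illustrative variable name (scores); it is one possible inspection routine, not a required one.

```r
# Illustrative CSV with one missing value
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,value", "1,a,10.5", "2,b,NA", "3,a,7.0"), csv_path)
scores <- read.csv(csv_path)

head(scores, n = 2)     # first 2 rows instead of the default 6
str(scores)             # column names, types, and a preview of values
dim(scores)             # number of rows and columns
sapply(scores, class)   # class of each column
summary(scores)         # per-column summaries, including NA counts for numeric columns
```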
Most of the importing functions allow you to specify how missing data (often represented as NA in R) is indicated in the file.
For read.csv and read.table:
data <- read.csv("path/to/data.csv", na.strings = c("NA", "na", "-"))
For readr functions:
data <- read_csv("path/to/data.csv", na = c("NA", "na", "-"))
This means whenever the functions encounter "NA", "na", or "-", they'll treat it as a missing value in R.
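The effect of na.strings is easiest to see by reading the same file twice. The sketch below uses a made-up file in which "-" and "na" stand for missing values; the variable names are illustrative and only base R is assumed.

```r
# Illustrative file where "-" and "na" mark missing entries
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height,city", "1,172,Oslo", "2,-,na", "3,NA,Lima"), csv_path)

# Default settings: only "NA" is treated as missing, so "-" stays in the data
default_read <- read.csv(csv_path)

# Custom settings: "NA", "na", and "-" all become NA
custom_read <- read.csv(csv_path, na.strings = c("NA", "na", "-"))

sapply(default_read, class)   # height is not numeric because of the "-" entry
sapply(custom_read, class)    # height can now be read as a number
colSums(is.na(custom_read))   # missing values per column
```

A stray marker such as "-" is a common reason a numeric column is imported as text, so checking column classes after import is a quick way to catch it.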
R offers several ways to read tabular data, each with its own advantages. The base functions read.table and read.csv need no extra packages, fread from data.table and the readr functions are often faster on large files, and readxl handles Excel workbooks. Familiarize yourself with the nuances of each method and choose the one that fits your workflow best.
R read.table function examples:
# Using read.table to read a tab-delimited file
table_data <- read.table("my_table_file.txt", header = TRUE, sep = "\t")
Importing CSV files into R:
# Reading a CSV file
csv_data <- read.csv("my_csv_file.csv")
Reading tab-separated files in R:
# Reading a tab-separated file
tsv_data <- read.delim("my_tsv_file.tsv")
R code for reading Excel files with tabular data:
# Install and load the readxl package
install.packages("readxl")
library(readxl)
# Reading an Excel file with readxl
excel_data <- read_excel("my_excel_file.xlsx", sheet = 1)
Using readr package for efficient tabular data reading in R:
# Install and load the readr package
install.packages("readr")
library(readr)
# Reading a CSV file with readr
readr_csv_data <- read_csv("my_csv_file.csv")
Importing tabular data from databases in R:
# Install and load DBI and RSQLite packages
install.packages(c("DBI", "RSQLite"))
library(DBI)
library(RSQLite)
# Connecting to a SQLite database
con <- dbConnect(RSQLite::SQLite(), "my_database.db")
# Reading data from a table
db_tab_data <- dbGetQuery(con, "SELECT * FROM my_table")
# Closing the connection when done
dbDisconnect(con)
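To try the database route without an existing file, an in-memory SQLite database works well. The sketch below assumes DBI and RSQLite are installed; the table contents, the score threshold, and the variable names are invented for illustration. It also shows dbReadTable for whole tables and a parameterized query for filtered reads.

```r
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # ":memory:" creates a temporary database that disappears on disconnect
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Create an illustrative table from a data frame
  DBI::dbWriteTable(con, "my_table", data.frame(id = 1:3, score = c(9.5, 8.0, 7.25)))

  # Read the whole table as a data frame
  whole_table <- DBI::dbReadTable(con, "my_table")

  # Read a filtered subset; the ? placeholder is filled from params
  high_scores <- DBI::dbGetQuery(
    con,
    "SELECT id, score FROM my_table WHERE score > ?",
    params = list(8)
  )
  print(high_scores)

  # Release the connection when finished
  DBI::dbDisconnect(con)
}
```

Passing values through params rather than pasting them into the SQL string avoids quoting mistakes and SQL injection.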
Handling missing values in tabular data in R:
# Handling missing values while reading a CSV file
csv_data_missing <- read.csv("my_csv_file_missing.csv", na.strings = c("", "NA"))