Data Frames in R

Data frames are a fundamental data structure in R for storing tabular data. They are similar to tables in databases or Excel spreadsheets. This tutorial will walk you through the basics of working with data frames in R.

1. Creating Data Frames

1.1. Using the data.frame Function

students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(20, 21, 22),
  Major = c("Biology", "Math", "History"),
  stringsAsFactors = FALSE  # Keeps strings as character vectors (the default since R 4.0.0)
)

1.2. From Vectors

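# Variable names avoid shadowing base functions such as names()
student_names <- c("Alice", "Bob", "Charlie")
student_ages <- c(20, 21, 22)
student_majors <- c("Biology", "Math", "History")

students <- data.frame(Name = student_names, Age = student_ages, Major = student_majors, stringsAsFactors = FALSE)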

2. Accessing Data in Data Frames

2.1. Accessing Columns

You can access columns using the $ operator or double square brackets.

students$Name
students[["Name"]]
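
Both forms above return the column as a plain vector. Single square brackets, by contrast, return a one-column data frame. The following is a minimal sketch of the difference; it rebuilds a small `students` data frame so that it runs on its own, and the variable names (`by_dollar`, `col`, and so on) are illustrative only.

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(20, 21, 22),
  stringsAsFactors = FALSE
)

by_dollar <- students$Name       # character vector
by_double <- students[["Name"]]  # character vector, same result
by_single <- students["Name"]    # data frame with a single column

# [[ ]] also accepts a column name stored in a variable, which $ does not
col <- "Age"
ages <- students[[col]]
```

The `[[ ]]` form is usually the safer choice inside functions, where the column name often arrives as a string.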

2.2. Accessing Rows

Use the row index to access specific rows.

students[1, ]  # First row
students[1:2, ]  # First two rows

2.3. Accessing Specific Cells

students[1, "Name"]  # Name of the first student

3. Modifying Data Frames

3.1. Adding New Columns

students$GPA <- c(3.5, 3.8, 3.6)  # Add a new GPA column

3.2. Modifying Rows or Columns

students$Age <- students$Age + 1  # Increment age by 1
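
The line above changes a whole column. Individual cells and rows can be changed with the same indexing syntax used for access. A short sketch, assuming the three-student data frame from section 1 (the replacement values are made up for illustration):

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(20, 21, 22),
  Major = c("Biology", "Math", "History"),
  stringsAsFactors = FALSE
)

# Change one cell: row 2, column "Major"
students[2, "Major"] <- "Physics"

# Change a value in every row that matches a condition
students$Age[students$Name == "Alice"] <- 25

# Replace a whole row; a list keeps each value's own type
students[3, ] <- list("Carol", 23, "Art")
```

Using `list()` rather than `c()` for the row avoids coercing the numeric age to a string.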

3.3. Removing Columns

students$GPA <- NULL  # Removes the GPA column

4. Useful Data Frame Functions

4.1. head and tail

head(students, n = 2)  # Displays the first two rows
tail(students, n = 2)  # Displays the last two rows

4.2. str

Displays the internal structure of an R object, which is particularly useful for data frames.

str(students)

4.3. summary

Provides summary statistics for each column in a data frame.

summary(students)

4.4. dim, nrow, and ncol

dim(students)  # Dimensions of the data frame (rows, columns)
nrow(students)  # Number of rows
ncol(students)  # Number of columns

4.5. rownames

rownames(students)  # Get or set row names
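
The same function sets row names when used on the left side of an assignment. A minimal sketch, with hypothetical labels `s1` to `s3`:

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(20, 21, 22),
  stringsAsFactors = FALSE
)

default_names <- rownames(students)        # "1" "2" "3"
rownames(students) <- c("s1", "s2", "s3")  # row names must be unique

# Rows can now be selected by name
bob_row <- students["s2", ]

# Assigning NULL restores the default numbering
rownames(students) <- NULL
```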

5. Subsetting Data Frames

older_students <- students[students$Age > 20, ]  # Students older than 20 (name avoids shadowing base subset())
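
Row conditions can be combined, and columns can be selected in the same step. The sketch below rebuilds the data frame from section 1 with its original ages; the result names are illustrative.

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(20, 21, 22),
  Major = c("Biology", "Math", "History"),
  stringsAsFactors = FALSE
)

# Combine conditions with & (and) or | (or)
older_non_math <- students[students$Age > 20 & students$Major != "Math", ]

# Filter rows and keep only selected columns
names_and_ages <- students[students$Age > 20, c("Name", "Age")]

# The base subset() function expresses the same filter more compactly
same_result <- subset(students, Age > 20, select = c(Name, Age))
```

`subset()` is convenient interactively; bracket indexing is generally preferred inside functions because it does not rely on non-standard evaluation.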

Conclusion

Data frames are central to data analysis in R. They provide a flexible, intuitive structure for storing and analyzing tabular data, and fluency with them makes everyday data work faster. For more advanced operations, consider the dplyr and tidyr packages.

  1. Creating and initializing data frames in R:

    # Creating a data frame
    my_data <- data.frame(
      ID = c(1, 2, 3, 4, 5),
      Name = c("John", "Alice", "Bob", "Eva", "Mike"),
      Age = c(25, 30, 22, 28, 35)
    )
    
  2. R code for subsetting and indexing data frames:

    # Subsetting rows based on condition
    subset_data <- my_data[my_data$Age > 25, ]
    
    # Indexing columns
    age_column <- my_data$Age
    
  3. Manipulating columns and rows in R data frames:

    # Adding a new column
    my_data$Salary <- c(50000, 60000, 45000, 55000, 70000)
    
    # Removing a column by name (stored in a new object so my_data keeps Age for the later examples)
    my_data_no_age <- my_data[, names(my_data) != "Age"]
    
  4. Aggregating and summarizing data frames in R:

    # Mean Salary for each distinct Age
    summarised_data <- aggregate(Salary ~ Age, data = my_data, FUN = mean)
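
Aggregation is easier to see when a group contains more than one row. The sketch below uses a hypothetical `staff` data frame with an illustrative `Dept` column, neither of which appears in the examples above.

```r
# Hypothetical data: Dept is an illustrative grouping column
staff <- data.frame(
  Dept = c("Sales", "Sales", "IT", "IT", "IT"),
  Salary = c(50000, 60000, 45000, 55000, 70000)
)

# Mean salary per department, returned as a data frame
mean_by_dept <- aggregate(Salary ~ Dept, data = staff, FUN = mean)

# The same computation with tapply(), returned as a named vector
mean_vec <- tapply(staff$Salary, staff$Dept, mean)
```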
    
  5. Joining and merging data frames in R:

    # Creating two data frames
    df1 <- data.frame(ID = 1:3, Value = c("A", "B", "C"))
    df2 <- data.frame(ID = 2:4, Score = c(10, 15, 20))
    
    # Inner join (inner_join() comes from the dplyr package)
    library(dplyr)
    inner_join(df1, df2, by = "ID")
    
    # Full outer join
    merge(df1, df2, by = "ID", all = TRUE)
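
Base R's `merge()` covers the other join types through its `all.x` and `all.y` arguments. A sketch using the same two data frames (the result names are illustrative):

```r
df1 <- data.frame(ID = 1:3, Value = c("A", "B", "C"))
df2 <- data.frame(ID = 2:4, Score = c(10, 15, 20))

# Inner join: only IDs present in both data frames
inner <- merge(df1, df2, by = "ID")

# Left join: every row of df1; Score is NA where df2 has no match
left <- merge(df1, df2, by = "ID", all.x = TRUE)

# Right join: every row of df2; Value is NA where df1 has no match
right <- merge(df1, df2, by = "ID", all.y = TRUE)
```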
    
  6. R data frame operations and functions:

    # Applying a function to each row (MARGIN = 1) to build a new column
    my_data$Bonus <- apply(my_data[, c("Salary", "Age")], 1, function(x) x[1] * 0.1 + x[2])
    
    # Row-wise operations
    row_sums <- apply(my_data[, c("Salary", "Age")], 1, sum)
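
Because `apply()` first converts the data frame to a matrix, simple arithmetic like this is often written with vectorized column operations instead. A sketch of the equivalent calculations on a small hypothetical `pay` data frame:

```r
pay <- data.frame(
  Salary = c(50000, 60000, 45000),
  Age = c(25, 30, 22)
)

# Vectorized arithmetic on whole columns; no apply() needed
pay$Bonus <- pay$Salary * 0.1 + pay$Age

# rowSums() is the vectorized counterpart of apply(..., 1, sum)
pay$Total <- rowSums(pay[, c("Salary", "Age")])

# sapply() applies a function to each column
col_means <- sapply(pay[, c("Salary", "Age")], mean)
```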
    
  7. Handling missing values in R data frames:

    # Creating a data frame with missing values
    my_data_missing <- data.frame(ID = 1:5, Name = c("John", NA, "Bob", "Eva", "Mike"), Age = c(25, 30, NA, 28, 35))
    
    # Removing rows with missing values
    my_data_clean <- na.omit(my_data_missing)
    
    # Imputing missing values with mean
    my_data_missing$Age[is.na(my_data_missing$Age)] <- mean(my_data_missing$Age, na.rm = TRUE)
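
Before removing or imputing values, it usually helps to see where the gaps are. A minimal sketch with a hypothetical `scores` data frame that mirrors the one above:

```r
scores <- data.frame(
  ID = 1:5,
  Name = c("John", NA, "Bob", "Eva", "Mike"),
  Age = c(25, 30, NA, 28, 35)
)

na_per_column <- colSums(is.na(scores))  # number of NAs in each column
complete_rows <- complete.cases(scores)  # TRUE for rows without any NA
n_complete <- sum(complete_rows)

# Keep rows where Age is present, even if Name is missing
age_known <- scores[!is.na(scores$Age), ]
```

Filtering on a single column keeps more data than `na.omit()`, which drops a row if any column is missing.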
    
  8. Reshaping and transforming data frames in R:

    # Reshaping from long to wide with tidyr (pivot_wider() supersedes spread())
    library(tidyr)
    reshaped_data <- pivot_wider(my_data, names_from = Age, values_from = Salary)
    
    # Transposing data frame
    transposed_data <- t(my_data)
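
One caveat with `t()`: it returns a matrix, and a matrix holds a single type. When a data frame mixes text and numbers, every value is therefore converted to a string. The sketch below shows the effect on a small hypothetical data frame and one way to avoid it.

```r
df <- data.frame(
  ID = 1:3,
  Name = c("a", "b", "c"),
  Score = c(10, 20, 30)
)

# Mixed column types: the result is a character matrix
m_all <- t(df)

# Numeric columns only: the result stays numeric
m_num <- t(df[, c("ID", "Score")])

# Column names of the data frame become row names of the matrix
num_rows <- rownames(m_num)
```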
    
  9. Time series analysis with data frames in R:

    # Packages such as zoo or xts provide time-related functions for time series data.
    # Creating a time series data frame
    time_series_data <- data.frame(Date = seq(as.Date("2022-01-01"), as.Date("2022-01-05"), by = "days"),
                                    Value = c(10, 15, 20, 25, 30))
    
    # Converting to time series object
    library(xts)
    time_series_xts <- xts(time_series_data$Value, order.by = as.Date(time_series_data$Date))
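
Simple date-based operations do not need an extra package. The sketch below works directly on a data frame with a `Date` column using base R only; `ts_df` and the derived column names are illustrative.

```r
ts_df <- data.frame(
  Date = seq(as.Date("2022-01-01"), as.Date("2022-01-05"), by = "day"),
  Value = c(10, 15, 20, 25, 30)
)

# Day-over-day change; the first row has no predecessor, so it gets NA
ts_df$Change <- c(NA, diff(ts_df$Value))

# Keep only the rows inside a date range
window_df <- ts_df[ts_df$Date >= as.Date("2022-01-02") &
                     ts_df$Date <= as.Date("2022-01-04"), ]

# Derive a year-month label from the Date column
ts_df$Month <- format(ts_df$Date, "%Y-%m")
```

For irregular series, rolling windows, or time-based merging, a dedicated class such as `xts` is likely the better fit.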