DataFrame Operations in R

Data frames are one of R's primary data structures and the standard way to store tabular data. Like a matrix, a data frame is two-dimensional, but each column can hold a different type (numeric, character, factor, and so on). This tutorial covers the basic operations you can perform on data frames in R:

1. Creating a Data Frame:

Using the data.frame() function:

df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 23),
  Score = c(85, 90, 82)
)
print(df)

2. Accessing Columns:

You can access columns in a data frame using the $ operator or the [[ operator:

print(df$Name)
print(df[["Age"]])
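Both operators above return a plain vector. Single-bracket indexing, by contrast, keeps the data frame structure. The following is a minimal sketch of that difference; it uses a hypothetical data frame named people (not part of the tutorial) so the running df is left untouched:

```r
# Hypothetical example data, separate from the tutorial's df
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 23)
)

age_vec  <- people[["Age"]]             # numeric vector, same as people$Age
age_df   <- people["Age"]               # one-column data frame
two_cols <- people[, c("Name", "Age")]  # select several columns by name

print(class(age_vec))  # "numeric"
print(class(age_df))   # "data.frame"
print(two_cols)
```

Use [[ or $ when you need the values themselves, and single brackets when the result should stay a data frame.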

3. Accessing Rows:

Use indexing:

# Access the first row
print(df[1, ])

# Access the first and third rows
print(df[c(1, 3), ])

4. Adding Columns:

df$City <- c("London", "Paris", "Berlin")
print(df)

5. Adding Rows:

Use the rbind() function:

new_row <- data.frame(Name = "David", Age = 28, Score = 88, City = "Madrid")
df <- rbind(df, new_row)
print(df)
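rbind() matches the columns of data frames by name, so the new row must have the same column names as the existing data frame. The sketch below illustrates this with a hypothetical data frame named roster; the names roster, ok_row, and bad_row are assumptions made for the example:

```r
# Hypothetical example data, separate from the tutorial's df
roster <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# Same column names in a different order: rbind() matches them by name
ok_row <- data.frame(Age = 28, Name = "David")
roster <- rbind(roster, ok_row)
print(roster)

# A differently named column (Years instead of Age) makes rbind() fail
bad_row <- data.frame(Name = "Eve", Years = 31)
result <- tryCatch(
  rbind(roster, bad_row),
  error = function(e) "rbind failed: column names do not match"
)
print(result)
```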

6. Deleting Columns:

df$City <- NULL  # remove the City column
print(df)

7. Deleting Rows:

df <- df[-2, ]  # remove the second row
print(df)

8. Filtering Rows:

Filter rows based on some condition:

filtered_df <- df[df$Age > 24, ]
print(filtered_df)
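Conditions can be combined with & (and) and | (or). One thing to watch for is missing values: when the condition evaluates to NA, logical indexing returns a row filled with NA rather than dropping it. The sketch below, built on a hypothetical data frame named people, shows two common ways around this:

```r
# Hypothetical example data with one missing Age
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, NA, 28),
  Score = c(85, 90, 82, 88)
)

# Logical indexing keeps an all-NA row where the condition is NA
by_index <- people[people$Age > 24, ]
print(by_index)

# which() keeps only positions where the condition is TRUE
by_which <- people[which(people$Age > 24 & people$Score >= 88), ]

# subset() also drops rows where the condition is NA
by_subset <- subset(people, Age > 24 & Score >= 88)
print(by_subset)
```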

9. Ordering Rows:

Use the order() function:

sorted_df <- df[order(df$Age), ]  # sort by Age in ascending order
print(sorted_df)
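order() can also sort in descending order and accept several keys to break ties. A short sketch, again using a hypothetical people data frame rather than the tutorial's df:

```r
# Hypothetical example data with a tie on Age
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 25, 28),
  Score = c(85, 90, 82, 88)
)

# Descending by Age
by_age_desc <- people[order(people$Age, decreasing = TRUE), ]
print(by_age_desc)

# Ascending by Age, ties broken by descending Score
# (negating a numeric key reverses its sort direction)
by_age_score <- people[order(people$Age, -people$Score), ]
print(by_age_score)
```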

10. Summary Statistics:

To get a summary of every column (minimum, quartiles, mean, and maximum for numeric columns; length and class for character columns):

summary(df)
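Two related tools are worth knowing: str() prints each column's type with its first few values, and summary() can be applied to a single column. A minimal sketch, assuming a hypothetical people data frame:

```r
# Hypothetical example data, separate from the tutorial's df
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 23),
  Score = c(85, 90, 82)
)

# Compact overview: number of rows and columns, column types, first values
str(people)

# Summary of one numeric column; individual entries can be read by name
age_summary <- summary(people$Age)
print(age_summary)
print(age_summary[["Median"]])
```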

11. Number of Rows and Columns:

num_rows <- nrow(df)
num_cols <- ncol(df)
print(paste("Number of rows:", num_rows))
print(paste("Number of columns:", num_cols))

12. Column Names and Data Types:

print(colnames(df))
print(sapply(df, class))

13. Applying Functions:

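You can use the apply() function to apply a function over rows (margin 1) or columns (margin 2). Note that apply() converts the data frame to a matrix first, so select the numeric columns beforehand; drop = FALSE keeps the selection a data frame even if only one column is numeric. For example, to get the mean of each numeric column:

print(apply(df[, sapply(df, is.numeric), drop = FALSE], 2, mean))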
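For column-wise work, sapply() is a common alternative because it treats the data frame as a list of columns and avoids the matrix conversion. The sketch below compares column and row means on a hypothetical scores data frame (the column names Math and Art are assumptions made for the example):

```r
# Hypothetical example data, separate from the tutorial's df
scores <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Math = c(80, 90, 70),
  Art = c(90, 70, 80)
)

# Keep only the numeric columns
numeric_cols <- scores[, sapply(scores, is.numeric), drop = FALSE]

# One mean per column
col_means <- sapply(numeric_cols, mean)
print(col_means)

# One mean per row (margin 1)
row_means <- apply(numeric_cols, 1, mean)
print(row_means)
```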

14. Merging Data Frames:

Join two data frames by a common column using the merge() function:

df2 <- data.frame(Name = c("Alice", "Charlie", "David"), Grade = c("A", "B", "C"))
merged_df <- merge(df, df2, by = "Name")
print(merged_df)
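By default merge() performs an inner join, keeping only rows whose key appears in both data frames. The all, all.x, and all.y arguments switch to outer joins. A minimal sketch with hypothetical students and grades data frames:

```r
# Hypothetical example data, separate from the tutorial's df and df2
students <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30, 23))
grades <- data.frame(Name = c("Alice", "Charlie", "David"), Grade = c("A", "B", "C"))

# Inner join: only names present in both data frames
inner <- merge(students, grades, by = "Name")

# Left join: keep every student, Grade is NA where there is no match
left <- merge(students, grades, by = "Name", all.x = TRUE)

# Full outer join: keep all rows from both sides
full <- merge(students, grades, by = "Name", all = TRUE)

print(inner)
print(left)
print(full)
```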

Conclusion:

These are just some of the basic operations you can perform on data frames in R. Packages such as dplyr and tidyr build on them with more advanced tools for data frame manipulation. The snippets below sketch several of these operations in both base R and the tidyverse; original_data, df1, and df2 are placeholders for your own data frames.

  1. Subset, filter, and select in R DataFrame:

    # Subset DataFrame based on a condition
    subset_data <- original_data[original_data$Age > 25, ]
    
    # Filter DataFrame using dplyr
    library(dplyr)
    filtered_data <- original_data %>%
      filter(Age > 25) %>%
      select(Name, Age)
    
  2. Joining DataFrames in R:

    # Join DataFrames using merge
    merged_data <- merge(df1, df2, by = "common_column")
    
    # Join DataFrames using dplyr
    library(dplyr)
    joined_data <- inner_join(df1, df2, by = "common_column")
    
  3. Grouping and aggregation in R DataFrame:

    # Group and aggregate using base R
    grouped_data <- aggregate(Score ~ Group, data = original_data, mean)
    
    # Group and aggregate using dplyr
    library(dplyr)
    grouped_data_dplyr <- original_data %>%
      group_by(Group) %>%
      summarise(mean_score = mean(Score))
    
  4. Sorting and ordering DataFrame in R:

    # Sort DataFrame based on a column using base R
    sorted_data <- original_data[order(original_data$Age), ]
    
    # Sort DataFrame using dplyr
    library(dplyr)
    sorted_data_dplyr <- original_data %>%
      arrange(Age)
    
  5. Reshaping DataFrames in R:

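    # Reshape DataFrame from long to wide format using tidyr
    # (pivot_wider() supersedes the older spread() function)
    library(tidyr)
    reshaped_data <- pivot_wider(original_data, names_from = Type, values_from = Value)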
    
  6. Handling missing values in R DataFrame:

    # Remove rows with missing values using base R
    cleaned_data <- original_data[complete.cases(original_data), ]
    
    # Remove rows with missing values using tidyr
    # (drop_na() is provided by tidyr, not dplyr)
    library(dplyr)
    library(tidyr)
    cleaned_data_tidyr <- original_data %>%
      drop_na()
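The snippets above refer to data frames that are never defined. To try the base R versions end to end, the sketch below supplies a small, made-up original_data whose column names (Name, Age, Group, Score) are assumptions chosen to match the snippets:

```r
# Hypothetical data; column names mirror the snippets above
original_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, 30, 23, 28, NA),
  Group = c("A", "B", "A", "B", "A"),
  Score = c(85, 90, 82, 88, 79)
)

# Subset: which() skips the row whose Age is NA
subset_data <- original_data[which(original_data$Age > 25), ]
print(subset_data)

# Group and aggregate: mean Score per Group
grouped_data <- aggregate(Score ~ Group, data = original_data, FUN = mean)
print(grouped_data)

# Sort by Age; rows with a missing Age are placed last by default
sorted_data <- original_data[order(original_data$Age), ]
print(sorted_data)

# Missing values: keep only rows with no NA in any column
cleaned_data <- original_data[complete.cases(original_data), ]
print(cleaned_data)
```

Note that the formula interface of aggregate() only drops rows with missing values in the variables named in the formula, so the row with a missing Age still contributes to its group's mean Score.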