Descriptive Analysis in R

Descriptive analysis summarizes the main features of a dataset using summary statistics and plots. This tutorial covers basic descriptive analysis techniques in R.

1. Setup:

You don't need any special packages for basic descriptive analysis, just base R.

2. Sample Data:

For demonstration purposes, we'll use the built-in mtcars dataset:

data(mtcars)
head(mtcars)
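
Before computing any statistics, it can help to confirm the shape and column types of the data. The following is a minimal sketch using base R inspection functions; the variable names dims and col_types are chosen here for illustration only.

```r
# Inspect the structure of the built-in mtcars dataset
data(mtcars)

dims <- dim(mtcars)                  # number of rows and columns
col_types <- sapply(mtcars, class)   # class of each column

print(dims)
print(col_types)
str(mtcars)                          # compact overview of every column
```

Every column in mtcars is stored as numeric, including count-like variables such as cyl and gear, which is why the later examples pass them through table() before treating them as categories.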

3. Descriptive Statistics:

a. Measures of Central Tendency:

These measures provide a central value for the data distribution.

  • Mean:
mean(mtcars$mpg)
  • Median:
median(mtcars$mpg)
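  • Mode (R's built-in mode() returns an object's storage type, not the most frequent value, so we'll define our own function):
# Return the most frequent value in v (the first one encountered if several are tied)
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}
getmode(mtcars$gear)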
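
A dataset can have more than one mode when several values share the highest frequency. As a rough sketch, the hypothetical helper get_modes below (not part of base R or of the original tutorial) returns every tied value rather than only one.

```r
# Hypothetical helper: return every value tied for the highest frequency
get_modes <- function(v) {
  counts <- table(v)
  modes <- names(counts)[counts == max(counts)]
  # table() stores values as character names; convert back for numeric input
  if (is.numeric(v)) as.numeric(modes) else modes
}

get_modes(mtcars$gear)        # a single most frequent value
get_modes(c(1, 2, 2, 3, 3))   # two values tied for the mode
```

Note that table() drops NA values by default, so this sketch ignores missing data.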

b. Measures of Dispersion:

These measures describe the spread or variability of the data.

  • Variance:
var(mtcars$mpg)
  • Standard Deviation:
sd(mtcars$mpg)
  • Range:
range(mtcars$mpg)
  • Interquartile Range (IQR):
IQR(mtcars$mpg)
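
These measures are closely related to one another. The sketch below, with variable names chosen only for illustration, shows how the standard deviation, the width of the range, and the IQR can each be derived from other base R functions.

```r
x <- mtcars$mpg

# Standard deviation is the square root of the (sample) variance
sd_from_var <- sqrt(var(x))

# range() returns c(min, max); diff() turns that into a single width
range_width <- diff(range(x))

# IQR is the distance between the 75th and 25th percentiles
q <- quantile(x, probs = c(0.25, 0.75))
iqr_from_quantiles <- unname(q[2] - q[1])

print(c(sd = sd_from_var, width = range_width, iqr = iqr_from_quantiles))
```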

c. Summary of a dataset:

The summary() function reports the minimum, first quartile, median, mean, third quartile, and maximum of each numeric variable:

summary(mtcars)
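
summary() does not report dispersion measures such as the standard deviation. One possible way to build a custom summary table is to apply several functions to each column with sapply(); the choice of columns and statistics below is arbitrary and only meant as an illustration.

```r
# Pick a few numeric columns (an arbitrary choice for illustration)
cols <- mtcars[, c("mpg", "hp", "wt")]

# Apply several statistics to each column; the result is a matrix
# with one row per statistic and one column per variable
stats_table <- sapply(cols, function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), iqr = IQR(x))
})

print(round(stats_table, 2))
```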

4. Visual Descriptive Analysis:

a. Histogram:

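Shows the distribution of a numerical variable by counting how many observations fall into each bin. When breaks is a single number, R treats it as a suggestion and picks rounded bin edges close to that count.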

hist(mtcars$mpg, main="Histogram of MPG", xlab="MPG", border="blue", col="lightgreen", breaks=10)
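
The counts behind a histogram can also be inspected directly. This sketch assumes you only want the numbers, so it calls hist() with plot = FALSE and then rebuilds the same bins as a labelled frequency table with cut().

```r
# Compute the histogram bins without drawing the plot
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

# Bin edges and the number of observations falling in each bin
print(h$breaks)
print(h$counts)

# Same information as a labelled frequency table
freq_table <- table(cut(mtcars$mpg, breaks = h$breaks, include.lowest = TRUE))
print(freq_table)
```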

b. Boxplot:

Displays the median, quartiles, and potential outliers of a numerical variable.

boxplot(mtcars$mpg, main="Boxplot of MPG", ylab="MPG", col="lightblue")
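
To see the numbers a box plot is drawn from, boxplot.stats() (from the grDevices package, which is loaded by default) returns them without plotting. A short sketch:

```r
# Numbers that boxplot() uses to draw the box and whiskers
bp <- boxplot.stats(mtcars$mpg)

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$stats)

# out: observations beyond the whiskers (by default, more than
# 1.5 times the box length away from the box)
print(bp$out)
```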

c. Bar Plot:

Useful for categorical data: table() counts the observations in each category, and barplot() draws one bar per count.

barplot(table(mtcars$cyl), main="Bar Plot of Cylinder Counts", xlab="Number of Cylinders", ylab="Frequency", col="lightpink", border="red")
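
The counts shown in the bar plot can be printed as a frequency table, and prop.table() converts them to proportions. A minimal sketch, with variable names chosen for illustration:

```r
# Absolute counts of each cylinder category
cyl_counts <- table(mtcars$cyl)

# Relative frequencies (proportions that sum to 1)
cyl_props <- prop.table(cyl_counts)

print(cyl_counts)
print(round(cyl_props, 3))
```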

d. Scatter Plot:

Shows the relationship between two numerical variables.

plot(mtcars$hp, mtcars$mpg, main="Scatterplot of HP vs. MPG", xlab="Horsepower", ylab="MPG", pch=19, col="blue")

5. Correlation:

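To measure how strongly two variables are linearly related, use cor(), which computes the Pearson correlation coefficient by default (a value between -1 and 1):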

cor(mtcars$hp, mtcars$mpg)
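
cor() also accepts a data frame, in which case it returns a matrix of pairwise correlations, and its method argument selects alternatives to Pearson. The column selection below is arbitrary and serves only as an illustration.

```r
# Pairwise Pearson correlations for a few columns (arbitrary selection)
vars <- mtcars[, c("mpg", "hp", "wt")]
cor_matrix <- cor(vars)
print(round(cor_matrix, 2))

# Rank-based alternative that measures monotonic rather than linear association
spearman_hp_mpg <- cor(mtcars$hp, mtcars$mpg, method = "spearman")
print(spearman_hp_mpg)
```

A negative coefficient here indicates that cars with more horsepower tend to have lower fuel efficiency.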

Conclusion:

Descriptive analysis is the first step in understanding your data before moving on to more complex analyses or hypothesis testing. R provides a variety of functions and visualization tools to conduct a thorough descriptive analysis of your dataset. Always ensure that your interpretations align with the type of data and the nature of your research question.

  1. Summary statistics in R:

    • Generate summary statistics for a numeric vector.
    # Summary statistics in R
    data <- c(23, 45, 12, 67, 89, 34, 56, 78)
    
    summary_stats <- summary(data)
    print(summary_stats)
    
  2. Descriptive analytics in R:

    • Perform basic descriptive analytics on a dataset.
    # Descriptive analytics in R
    data <- data.frame(
      Age = c(25, 30, 22, 35, 28),
      Salary = c(50000, 60000, 45000, 70000, 55000)
    )
    
    summary_stats <- summary(data)
    print(summary_stats)
    
  3. R summary() function examples:

    • Use the summary() function to get a summary of a dataset.
    # Using summary() function in R
    data <- data.frame(
      Height = c(160, 170, 155, 180, 165),
      Weight = c(55, 70, 50, 85, 62)
    )
    
    data_summary <- summary(data)
    print(data_summary)
    
  4. Measures of central tendency in R:

    • Calculate measures of central tendency (mean, median, mode).
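    # Measures of central tendency in R
    data <- c(23, 45, 12, 67, 89, 34, 56, 78)
    
    mean_value <- mean(data)
    median_value <- median(data)
    # Take the most frequent value (a name of the table), not its count.
    # Every value here occurs once, so which.max() returns the first one.
    freq <- table(data)
    mode_value <- as.numeric(names(freq)[which.max(freq)])
    
    print(paste("Mean:", mean_value))
    print(paste("Median:", median_value))
    print(paste("Mode:", mode_value))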
    
  5. R descriptive analysis of data frames:

    • Perform descriptive analysis on a data frame.
    # Descriptive analysis of data frames in R
    data <- data.frame(
      Age = c(25, 30, 22, 35, 28),
      Salary = c(50000, 60000, 45000, 70000, 55000)
    )
    
    summary_stats <- summary(data)
    print(summary_stats)
    
  6. Box plots and whisker plots in R:

    • Create box plots for visualizing the distribution of a dataset.
    # Box plots and whisker plots in R
    set.seed(42)  # make the simulated values reproducible
    data <- data.frame(
      Group = rep(c("A", "B", "C"), each = 50),
      Values = c(rnorm(50), rnorm(50, mean = 2), rnorm(50, mean = 4))
    )
    
    # One box per group, drawn from the Values column
    boxplot(Values ~ Group, data = data, main = "Box Plot")
    
  7. Histograms and frequency distributions in R:

    • Generate histograms and frequency distributions.
    # Histograms and frequency distributions in R
    data <- c(23, 45, 12, 67, 89, 34, 56, 78, 67, 45, 56)
    
    hist(data, main = "Histogram", xlab = "Values", col = "lightblue", border = "black")
    
  8. Descriptive analysis of categorical data in R:

    • Analyze and summarize categorical data.
    # Descriptive analysis of categorical data in R
    data <- c("Category1", "Category2", "Category1", "Category3", "Category2")
    
    table_summary <- table(data)
    print(table_summary)
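
Categorical summaries extend naturally to two variables. As a sketch, the example below cross-tabulates two count-like columns of the built-in mtcars dataset; treating cyl and gear as categories is an assumption made for illustration.

```r
# Two-way frequency table: cylinders (rows) by gears (columns)
cross_tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(cross_tab)

# Add row and column totals labelled "Sum"
with_totals <- addmargins(cross_tab)
print(with_totals)
```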