Descriptive Analysis in R

Descriptive analysis helps in summarizing and understanding the main features of a dataset, often through visual means or summary statistics. In this tutorial, we'll explore some basic descriptive analysis techniques in R.

1. Setup:

Basic descriptive analysis needs no extra packages; base R is enough.

2. Sample Data:

For demonstration purposes, we'll use the built-in mtcars dataset:

data(mtcars)
head(mtcars)

3. Descriptive Statistics:

a. Measures of Central Tendency:

These measures provide a central value for the data distribution.

Mean:

mean(mtcars$mpg)

Median:

median(mtcars$mpg)

Mode (R has no built-in function for the statistical mode; the base mode() function returns an object's storage type instead, so we'll define our own):

getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}
getmode(mtcars$gear)
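
The helper above returns only the first value that reaches the highest count, so ties go unnoticed. As a minimal sketch, a variant that returns every tied value could look like the following; get_all_modes is a hypothetical name used for illustration, not a base R function.

```r
# Hypothetical helper (not part of base R): return every value tied for the highest count.
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))  # occurrences of each unique value
  uniqv[counts == max(counts)]
}

get_all_modes(c(1, 2, 2, 3, 3))  # 2 and 3 share the highest count
get_all_modes(mtcars$gear)
```

If the result has more than one element, the data is multimodal and reporting a single mode would be misleading.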

b. Measures of Dispersion:

These measures describe the spread or variability of the data.

Variance:

var(mtcars$mpg)

Standard Deviation:

sd(mtcars$mpg)

Range (range() returns the minimum and maximum rather than their difference):

range(mtcars$mpg)

Interquartile Range (IQR):

IQR(mtcars$mpg)
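
The dispersion measures above are closely related to one another. The sketch below checks those relationships on the same mpg column; the variable names x and q are only illustrative.

```r
x <- mtcars$mpg

# The standard deviation is the square root of the variance.
all.equal(sd(x), sqrt(var(x)))

# range() returns c(min, max); diff() turns that pair into a single width.
diff(range(x))

# IQR() is the distance between the 3rd and 1st quartiles.
q <- quantile(x, probs = c(0.25, 0.75))
unname(q[2] - q[1])

# Coefficient of variation: spread relative to the mean (unitless).
sd(x) / mean(x)
```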

c. Summary of a dataset:

The summary() function reports the minimum, first quartile, median, mean, third quartile, and maximum of each numeric column:

summary(mtcars)
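
summary() does not report the standard deviation, and it treats the dataset as a single group. One possible way to fill both gaps with base R is sketched below; the object names col_means, col_sds, and mpg_by_cyl are illustrative.

```r
# Apply one statistic to every column; sapply() returns a named numeric vector.
col_means <- sapply(mtcars, mean)
col_sds <- sapply(mtcars, sd)
print(round(cbind(mean = col_means, sd = col_sds), 2))

# Summarize mpg separately within each cylinder group.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)
```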

4. Visual Descriptive Analysis:

a. Histogram:

Shows the distribution of a numerical variable.

hist(mtcars$mpg, main="Histogram of MPG", xlab="MPG", border="blue", col="lightgreen", breaks=10)
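
The bin counts behind a histogram can also be inspected without drawing anything. The sketch below assumes the same breaks = 10 setting as above; note that hist() treats a single number passed as breaks as a suggestion, so the number of bins it actually uses may differ.

```r
# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

h$breaks   # bin edges actually chosen by hist()
h$counts   # number of observations in each bin

# Every observation falls in exactly one bin.
sum(h$counts) == length(mtcars$mpg)
```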

b. Boxplot:

Displays the median, quartiles, and any outliers of a numerical variable.

boxplot(mtcars$mpg, main="Boxplot of MPG", ylab="MPG", col="lightblue")
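
The numbers a box plot draws can be retrieved directly, which helps when you need to report them or list the outliers. A minimal sketch using boxplot.stats() from the grDevices package, which is attached by default; the variable name stats is illustrative.

```r
# boxplot.stats() computes the values that boxplot() draws.
stats <- boxplot.stats(mtcars$mpg)

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
stats$stats

# Points beyond the whiskers (more than 1.5 times the box length from the box).
stats$out
```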

c. Bar Plot:

Useful for categorical data.

barplot(table(mtcars$cyl), main="Bar Plot of Cylinder Counts", xlab="Number of Cylinders", ylab="Frequency", col="lightpink", border="red")

d. Scatter Plot:

Shows the relationship between two numerical variables.

plot(mtcars$hp, mtcars$mpg, main="Scatterplot of HP vs. MPG", xlab="Horsepower", ylab="MPG", pch=19, col="blue")

5. Correlation:

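To measure the strength and direction of the linear relationship between two variables (cor() computes Pearson's correlation coefficient by default):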

cor(mtcars$hp, mtcars$mpg)
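
cor() also accepts a data frame and a method argument. The sketch below builds a correlation matrix for three columns and computes a rank-based alternative; the choice of mpg, hp, and wt is an assumption made for illustration.

```r
# Pearson correlation matrix for a few numeric columns.
vars <- mtcars[, c("mpg", "hp", "wt")]
cor_matrix <- cor(vars)
print(round(cor_matrix, 2))

# Spearman's rank correlation measures monotonic (not only linear)
# association and is less sensitive to outliers.
rho <- cor(mtcars$hp, mtcars$mpg, method = "spearman")
print(rho)
```

A coefficient near -1 or 1 indicates a strong relationship and a value near 0 a weak one; the sign gives the direction.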

Conclusion:

Descriptive analysis is the first step in understanding your data before moving on to more complex analyses or hypothesis testing. R provides a variety of functions and visualization tools to conduct a thorough descriptive analysis of your dataset. Always ensure that your interpretations align with the type of data and the nature of your research question.

Summary statistics in R:

Generate summary statistics for a numeric vector.

# Summary statistics in R
data <- c(23, 45, 12, 67, 89, 34, 56, 78)

summary_stats <- summary(data)
print(summary_stats)

Descriptive analytics in R:

Perform basic descriptive analytics on a dataset.

# Descriptive analytics in R
data <- data.frame(
  Age = c(25, 30, 22, 35, 28),
  Salary = c(50000, 60000, 45000, 70000, 55000)
)

summary_stats <- summary(data)
print(summary_stats)

R summary() function examples:

Use the summary() function to get a summary of a dataset.

# Using summary() function in R
data <- data.frame(
  Height = c(160, 170, 155, 180, 165),
  Weight = c(55, 70, 50, 85, 62)
)

data_summary <- summary(data)
print(data_summary)

Measures of central tendency in R:

Calculate measures of central tendency (mean, median, mode).

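# Measures of central tendency in R
data <- c(23, 45, 12, 67, 89, 34, 56, 78)

mean_value <- mean(data)
median_value <- median(data)
# table() counts each value; the name of the largest count is the mode.
# Every value here occurs once, so which.max() simply returns the first (smallest) value.
mode_value <- as.numeric(names(which.max(table(data))))

print(paste("Mean:", mean_value))
print(paste("Median:", median_value))
print(paste("Mode:", mode_value))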
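
The mean and the median react very differently to extreme values, which matters when choosing which one to report. The sketch below appends one artificial outlier (1000, an assumption made purely for illustration) to the same vector and compares the results, including a trimmed mean.

```r
x <- c(23, 45, 12, 67, 89, 34, 56, 78)
x_out <- c(x, 1000)  # same data plus one artificial extreme value

mean(x)      # 50.5
median(x)    # 50.5

mean(x_out)    # 156: pulled far upward by the single outlier
median(x_out)  # 56: barely moves

# A trimmed mean drops a fraction of observations from each end before averaging.
# With 9 values and trim = 0.2, one value is removed from each end.
mean(x_out, trim = 0.2)  # 56
```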

Box-and-whisker plots in R:

Create box plots to compare the distribution of a variable across groups.

# Box-and-whisker plots in R
set.seed(42)  # make the random sample reproducible
data <- data.frame(
  Group = rep(c("A", "B", "C"), each = 50),
  Values = c(rnorm(50), rnorm(50, mean = 2), rnorm(50, mean = 4))
)

boxplot(Values ~ Group, data = data, main = "Box Plot")

Histograms and frequency distributions in R:

Generate a histogram, which displays the frequency distribution of a numeric vector.

# Histograms and frequency distributions in R
data <- c(23, 45, 12, 67, 89, 34, 56, 78, 67, 45, 56)

hist(data, main = "Histogram", xlab = "Values", col = "lightblue", border = "black")
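
The histogram shows the frequency distribution graphically; the same distribution can be tabulated with cut() and table(). This is a minimal sketch in which the bin edges (five bins of width 20) are an assumption chosen for this example.

```r
data <- c(23, 45, 12, 67, 89, 34, 56, 78, 67, 45, 56)

# Assign each value to an interval; by default intervals are closed on the right.
bins <- cut(data, breaks = seq(0, 100, by = 20))

# Absolute frequencies per bin.
freq <- table(bins)
print(freq)

# Relative and cumulative frequencies.
print(round(prop.table(freq), 2))
print(cumsum(freq))
```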

Descriptive analysis of categorical data in R:

Analyze and summarize categorical data.

# Descriptive analysis of categorical data in R
data <- c("Category1", "Category2", "Category1", "Category3", "Category2")

table_summary <- table(data)
print(table_summary)
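
Counts are often easier to interpret as proportions, and two categorical variables can be summarized together in a cross-tabulation. A minimal sketch using only base R; the variable names counts, proportions, and cross_tab are illustrative.

```r
data <- c("Category1", "Category2", "Category1", "Category3", "Category2")

counts <- table(data)

# Share of each category (the proportions sum to 1).
proportions <- prop.table(counts)
print(round(proportions, 2))

# Categories ordered from most to least frequent.
print(sort(counts, decreasing = TRUE))

# Cross-tabulate two categorical columns of mtcars.
cross_tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(cross_tab)
```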