Normal Distribution in R

In this tutorial, we'll explore the normal distribution, its properties, and how to work with it in R.

1. Introduction:

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is symmetric about its mean. It is fully determined by two parameters: the mean (μ) and the standard deviation (σ). It is widely used in statistics and the natural sciences, partly because the central limit theorem makes sums and averages of many independent effects approximately normal.
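
The density of a normal distribution with mean μ and standard deviation σ is f(x) = 1 / (σ √(2π)) · exp(−(x − μ)² / (2σ²)). As a minimal sketch, the formula can be written out by hand and compared with R's built-in dnorm(); the helper name normal_pdf and the chosen parameter values are illustrative assumptions, not part of R.

```r
# Normal density written out from its formula (normal_pdf is a hypothetical helper)
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(-1, 0, 2.5)                          # arbitrary evaluation points
manual  <- normal_pdf(x, mu = 1, sigma = 2) # hand-written formula
builtin <- dnorm(x, mean = 1, sd = 2)       # R's built-in density

# TRUE when both agree up to floating-point tolerance
print(all.equal(manual, builtin))
```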

2. Generating Random Numbers from a Normal Distribution:

You can generate random numbers from a normal distribution using the rnorm() function.

set.seed(123)  # For reproducibility
random_numbers <- rnorm(n=1000, mean=0, sd=1)
hist(random_numbers, main="Histogram of Randomly Generated Numbers", xlab="Value", breaks=50)
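
As a rough visual check, the histogram can be drawn on the density scale and overlaid with the theoretical curve. The sketch below assumes an illustrative mean of 10 and standard deviation of 2; the names mu, sigma, and samples are chosen here for the example.

```r
set.seed(123)  # For reproducibility
mu <- 10       # assumed mean, chosen for illustration
sigma <- 2     # assumed standard deviation, chosen for illustration
samples <- rnorm(n = 1000, mean = mu, sd = sigma)

# freq = FALSE puts the histogram on the density scale (bar areas sum to 1)
h <- hist(samples, breaks = 50, freq = FALSE,
          main = "Histogram with Theoretical Density", xlab = "Value")

# Overlay the theoretical normal density on the same axes
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)
```

If the sample really comes from the stated distribution, the bars should follow the red curve closely, with some random scatter.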

3. Density, Distribution, and Quantile Functions:

  • Density: The dnorm() function gives the height of the probability density function for the normal distribution.

    x <- seq(-4, 4, by=0.1)
    y <- dnorm(x, mean=0, sd=1)
    plot(x, y, type="l", main="Density of Standard Normal Distribution", ylab="Density", xlab="Value")
    
  • Distribution Function: The pnorm() function evaluates the cumulative distribution function, that is, the probability that a normally distributed random variable is less than or equal to x.

    prob <- pnorm(1, mean=0, sd=1)
    print(prob)  # Probability that X < 1 for standard normal distribution
    
  • Quantile Function: The qnorm() function evaluates the quantile function, which is the inverse of the cumulative distribution function: given a probability p, it returns the value x for which P(X ≤ x) = p.

    quantile_val <- qnorm(0.95, mean=0, sd=1)
    print(quantile_val)  # Returns the 95th percentile of standard normal distribution
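
These functions combine naturally. The sketch below, which uses only the standard normal distribution and illustrative variable names, computes an interval probability as a difference of two pnorm() values, an upper-tail probability via the lower.tail argument, and checks that qnorm() undoes pnorm().

```r
# P(-1 < X < 1) for a standard normal variable
p_within_1sd <- pnorm(1) - pnorm(-1)
print(p_within_1sd)  # about 0.6827

# Upper-tail probability P(X > 1.96)
p_upper <- pnorm(1.96, lower.tail = FALSE)
print(p_upper)  # about 0.025

# qnorm() is the inverse of pnorm(): the round trip recovers the probabilities
p <- c(0.025, 0.5, 0.975)
round_trip <- pnorm(qnorm(p))
print(all.equal(round_trip, p))
```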
    

4. Checking Normality:

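The shapiro.test() function runs the Shapiro-Wilk test of the null hypothesis that the data are drawn from a normal distribution. A small p-value is evidence against normality; a large p-value does not prove that the data are normal.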

test_result <- shapiro.test(random_numbers)
print(test_result)
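
The object returned by shapiro.test() stores the p-value in its p.value element, so the decision can be made in code. The sketch below compares a normal sample with an exponential (clearly non-normal) sample; the 0.05 threshold and the sample sizes are assumptions made for illustration. Note that shapiro.test() accepts between 3 and 5000 observations.

```r
set.seed(42)  # For reproducibility
normal_sample <- rnorm(200)  # drawn from a normal distribution
skewed_sample <- rexp(200)   # drawn from an exponential distribution

alpha <- 0.05  # assumed significance level

p_normal <- shapiro.test(normal_sample)$p.value
p_skewed <- shapiro.test(skewed_sample)$p.value

# Reject normality when the p-value falls below the chosen threshold
cat("Normal sample: p =", p_normal, "reject normality:", p_normal < alpha, "\n")
cat("Skewed sample: p =", p_skewed, "reject normality:", p_skewed < alpha, "\n")
```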

5. Transforming Data to be Normally Distributed:

Sometimes, data may need to be transformed to approximate a normal distribution. Common transformations include the log, square root, and Box-Cox transformations.

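For instance, the logarithm of log-normally distributed data is exactly normal:

set.seed(123)  # For reproducibility
non_normal_data <- rlnorm(1000)  # Right-skewed, log-normal sample
transformed_data <- log(non_normal_data)
hist(transformed_data, main="Histogram of Log-transformed Data", xlab="Value", breaks=50)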
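
The Box-Cox family generalizes the log transformation: it maps x to (x^λ − 1) / λ, with log(x) as the limiting case at λ = 0. The following is a hand-written sketch that picks λ by maximizing the profile log-likelihood under a normal model; the helper names box_cox and profile_loglik and the search interval are assumptions for illustration. In practice, packages such as MASS provide a ready-made boxcox() function.

```r
set.seed(123)
skewed <- rlnorm(1000)  # positive, right-skewed data (illustrative)

# Box-Cox transform; lambda = 0 is treated as the log transform
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood of lambda, up to an additive constant
profile_loglik <- function(lambda, x) {
  y <- box_cox(x, lambda)
  n <- length(x)
  sigma2_hat <- mean((y - mean(y))^2)  # maximum-likelihood variance estimate
  -n / 2 * log(sigma2_hat) + (lambda - 1) * sum(log(x))
}

# Search an assumed range of lambda values for the maximizer
fit <- optimize(profile_loglik, interval = c(-2, 2), x = skewed, maximum = TRUE)
lambda_hat <- fit$maximum
transformed <- box_cox(skewed, lambda_hat)

print(lambda_hat)  # expected to be near 0 for log-normal data
```

An estimate of λ close to 0 suggests that a plain log transform is adequate, while values near 0.5 or 1 point to a square root transform or no transform at all.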

6. Working with Non-Standard Normal Distributions:

The functions mentioned above (rnorm(), dnorm(), pnorm(), qnorm()) all accept mean and sd parameters to work with non-standard normal distributions. The standard normal distribution has mean 0 and standard deviation 1.
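
Any normal distribution can be reduced to the standard one by the z-score z = (x − μ) / σ. The sketch below assumes an illustrative distribution with mean 100 and standard deviation 15 and shows that passing mean and sd directly gives the same results as standardizing by hand.

```r
mu <- 100    # assumed mean, chosen for illustration
sigma <- 15  # assumed standard deviation, chosen for illustration

# P(X <= 130) computed directly with the mean and sd arguments
p_direct <- pnorm(130, mean = mu, sd = sigma)

# The same probability via the z-score and the standard normal distribution
z <- (130 - mu) / sigma
p_standard <- pnorm(z)
print(c(p_direct, p_standard))  # both about 0.9772

# Quantiles convert back the other way: x = mu + sigma * z
q90_direct <- qnorm(0.90, mean = mu, sd = sigma)
q90_from_z <- mu + sigma * qnorm(0.90)
print(all.equal(q90_direct, q90_from_z))
```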

Conclusion:

Understanding the normal distribution and knowing how to work with it is fundamental in statistics and many applications of data science. R provides a comprehensive set of functions for working with normal distributions, making it easy to generate, analyze, and visualize normally distributed data.

  1. Generating random numbers from a normal distribution in R:

    • Overview: rnorm() draws pseudo-random values from a normal distribution with the given mean and standard deviation; set.seed() makes the draws reproducible.

    • Code:

      # Generating random numbers from a normal distribution
      set.seed(123)
      random_numbers <- rnorm(1000, mean = 0, sd = 1)
      
      # Printing the first few random numbers
      print("First few random numbers:")
      print(head(random_numbers))
      
  2. R code for plotting normal distribution curve:

    • Overview: Evaluating dnorm() on a fine grid of x values and joining the points with a line traces the bell-shaped density curve.

    • Code:

      # Plotting the normal distribution curve
      x <- seq(-3, 3, by = 0.01)
      y <- dnorm(x, mean = 0, sd = 1)
      
      plot(x, y, type = "l", col = "blue", lwd = 2, main = "Normal Distribution Curve", xlab = "x", ylab = "Density")
      
  3. Calculating probabilities for normal distribution in R:

    • Overview: pnorm(q) returns the probability P(X ≤ q). Here it gives the probability that a standard normal value is at most 1.96, which is about 0.975.

    • Code:

      # Calculating probabilities for normal distribution
      probability <- pnorm(1.96, mean = 0, sd = 1)
      
      # Printing the probability
      print(paste("Probability:", probability))
      
  4. R dnorm function usage for normal distribution:

    • Overview: dnorm() evaluates the probability density function (PDF) of the normal distribution at a point. The result is a density, not a probability; for the standard normal distribution the density at 0 is about 0.399.

    • Code:

      # Using dnorm function for normal distribution
      density <- dnorm(0, mean = 0, sd = 1)
      
      # Printing the density
      print(paste("Density at 0:", density))
      
  5. Fitting normal distribution to data in R:

    • Overview: Fit a normal distribution to data by maximum likelihood with fitdist(), which is provided by the fitdistrplus package rather than base R.

    • Code:

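      # Fitting a normal distribution to data
      # fitdist() comes from the fitdistrplus package: install.packages("fitdistrplus")
      library(fitdistrplus)
      set.seed(123)  # For reproducibility
      data <- rnorm(1000, mean = 2, sd = 1)
      fit_params <- fitdist(data, "norm")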
      
      # Printing the fitted parameters
      print("Fitted Parameters:")
      print(fit_params)
      
  6. Statistical tests for normality in R:

    • Overview: shapiro.test() runs the Shapiro-Wilk test, whose null hypothesis is that the data come from a normal distribution.

    • Code:

      # Performing a normality test
      data <- rnorm(1000, mean = 0, sd = 1)
      normality_test <- shapiro.test(data)
      
      # Printing the test results
      print("Normality Test Results:")
      print(normality_test)
      
  7. R mean and standard deviation for normal distribution:

    • Overview: Estimate the mean and standard deviation of a normally distributed sample with mean() and sd(). The code reuses random_numbers from item 1.

    • Code:

      # Calculating mean and standard deviation for normal distribution
      mean_value <- mean(random_numbers)
      sd_value <- sd(random_numbers)
      
      # Printing the results
      print(paste("Mean:", mean_value))
      print(paste("Standard Deviation:", sd_value))
      
  8. Normal Q-Q plot in R programming:

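    • Overview: A normal Q-Q plot compares the sample quantiles with theoretical normal quantiles; points that lie close to the reference line suggest approximate normality. The code reuses random_numbers from item 1.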

    • Code:

      # Creating a Normal Q-Q plot
      qqnorm(random_numbers)
      qqline(random_numbers, col = "red")
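
To see what a Q-Q plot is made of, the points can be computed by hand: the sorted sample is paired with normal quantiles at evenly spread probabilities from ppoints(). The sketch below uses its own illustrative sample and variable names, and summarizes the straightness of the plot with a correlation, which is an informal check rather than a formal test.

```r
set.seed(123)
sample_data <- rnorm(1000)  # illustrative sample

n <- length(sample_data)
theoretical <- qnorm(ppoints(n))  # theoretical standard normal quantiles
observed <- sort(sample_data)     # sample quantiles

# A correlation close to 1 means the Q-Q points lie nearly on a straight line
qq_cor <- cor(theoretical, observed)
print(qq_cor)

# qqnorm() computes the same theoretical quantiles; plot.it = FALSE skips drawing
qq <- qqnorm(sample_data, plot.it = FALSE)
print(all.equal(sort(qq$x), theoretical))
```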