View basic statistical details in Pandas

Pandas can summarize the basic statistics of a DataFrame in a single call. Here's how to do it.

1. Setup:

First, make sure you've got pandas installed:

pip install pandas

2. Import Necessary Libraries:

import pandas as pd

3. Create a Sample DataFrame:

data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 8, 7, 6, 5]
}
df = pd.DataFrame(data)
print(df)

4. Use describe() to View Basic Statistical Details:

stats = df.describe()
print(stats)

Here's a breakdown of what describe() returns:

  • count: Number of non-null entries.
  • mean: Mean of the values.
  • std: Sample standard deviation (normalized by N - 1).
  • min: Minimum value.
  • 25%: 25th percentile.
  • 50%: Median or 50th percentile.
  • 75%: 75th percentile.
  • max: Maximum value.
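As a small sketch of how these rows can be used, the result of describe() is itself a DataFrame, so each statistic can be looked up by its row label. The variable names below (mean_a, median_b, custom) are illustrative only, and the percentiles argument is shown as one way to ask for cut points other than the default quartiles.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 8, 7, 6, 5],
})

stats = df.describe()

# Each statistic is a row of the result and can be read by its label.
mean_a = stats.loc['mean', 'A']     # same value as df['A'].mean()
median_b = stats.loc['50%', 'B']    # same value as df['B'].median()

# Ask for the 10th and 90th percentiles instead of the default quartiles.
custom = df.describe(percentiles=[0.1, 0.9])
print(custom.index.tolist())
```

The row labels of custom include '10%' and '90%' alongside count, mean, std, min and max.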

By default, describe() summarizes only the numeric columns of a DataFrame that mixes numeric and non-numeric columns.
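To illustrate the default behavior, here is a minimal sketch using a hypothetical mixed-type DataFrame (the frame and its 'label' column are made up for this example). The text column simply does not appear in the result.

```python
import pandas as pd

# Hypothetical frame: two numeric columns and one text column.
mixed = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [0.5, 1.5, 2.5],
    'label': ['x', 'y', 'z'],
})

summary = mixed.describe()

# Only the numeric columns are summarized; 'label' is left out.
print(summary.columns.tolist())
```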

5. Including Object Columns:

If your DataFrame has columns of type object (such as strings) and you want statistics for those as well, pass include='all':

data['D'] = ['apple', 'banana', 'cherry', 'apple', 'cherry']
df = pd.DataFrame(data)

stats = df.describe(include='all')
print(stats)

With include='all', statistics that do not apply to a column's type are shown as NaN. For object-type columns, here's what you get:

  • count: Number of non-null entries.
  • unique: Number of distinct values.
  • top: Most frequent value. If several values tie for the highest count, one of them is reported.
  • freq: Frequency of the top category.
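The four statistics above can be cross-checked against value_counts(), which lists the frequency of every distinct value. This is a sketch on a standalone Series holding the same fruit names as column D. The names fruit, desc and counts are illustrative.

```python
import pandas as pd

fruit = pd.Series(['apple', 'banana', 'cherry', 'apple', 'cherry'], name='D')

desc = fruit.describe()        # count, unique, top, freq
counts = fruit.value_counts()  # frequency of each distinct value

print(desc['unique'])   # number of distinct values
print(desc['freq'])     # how often the reported top value occurs
print(counts)
```

In this data 'apple' and 'cherry' both occur twice, so value_counts() is the more informative view when you need to see ties.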

6. Use info() to View General Information:

The info() method prints a concise summary of your DataFrame, including the non-null counts and data types. It returns None, so call it directly rather than wrapping it in print():

df.info()
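Because info() writes its summary to standard output and returns None, keeping the text for later use (for a log file, say) needs a writable buffer. The sketch below assumes you want the summary as a string and uses the buf parameter with io.StringIO. The names buffer and info_text are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'D': ['apple', 'banana', 'cherry']})

result = df.info()        # prints the summary; the return value is None

buffer = io.StringIO()
df.info(buf=buffer)       # write the summary into the buffer instead of stdout
info_text = buffer.getvalue()

print(len(info_text) > 0)
```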

7. Other Useful Statistical Methods:

Apart from describe(), there are individual methods to fetch specific statistics:

  • sum(): Get the sum of values.
  • mean(): Compute the arithmetic mean.
  • median(): Compute the median.
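  • mode(): Compute the mode(s). Every value tied for most frequent is returned.
  • std(): Compute the sample standard deviation (ddof=1 by default).
  • var(): Compute the sample variance (ddof=1 by default).
  • skew(): Compute the skewness.
  • kurt(): Compute the excess kurtosis (Fisher's definition, where a normal distribution gives 0).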

For example:

print("Mean of column A:", df['A'].mean())
print("Standard Deviation of column B:", df['B'].std())
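Several of these methods can also be applied in one call with agg(), which takes a list of method names. The following is a sketch of that approach, not the only way to combine them. The second part shows why mode() returns a Series rather than a single number, using a made-up Series named tied.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 8, 7, 6, 5],
})

# One row per statistic, one column per DataFrame column.
table = df.agg(['sum', 'mean', 'median', 'std', 'var'])
print(table)

# mode() returns a Series because several values can tie for most frequent.
tied = pd.Series([1, 1, 2, 2, 3])
print(tied.mode().tolist())   # [1, 2]
```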

Summary:

With describe(), info(), and the individual statistical methods, you can get a statistical overview of a dataset in just a few lines of code.

  1. Basic statistics in Pandas DataFrame:

    • Use the describe() method to get basic statistics for numerical columns.
    import pandas as pd
    
    data = {'Age': [25, 30, 28, 35, 32],
            'Salary': [50000, 60000, 55000, 70000, 65000]}
    
    df = pd.DataFrame(data)
    
    basic_stats = df.describe()
    
  2. Summary statistics in Pandas:

    • Get summary statistics using the describe() method.
    summary_stats = df.describe()
    
  3. Viewing mean and median in Pandas DataFrame:

    • Access mean and median directly using mean() and median() methods.
    mean_age = df['Age'].mean()
    median_salary = df['Salary'].median()
    
  4. Pandas DataFrame statistics overview:

    • Use the info() method to print an overview of the DataFrame. It returns None, so there is nothing to assign.
    df.info()
    
  5. Exploring data with Pandas describe:

    • Explore data distribution using the describe() method.
    data_distribution = df.describe()
    
  6. Statistical summary of Pandas DataFrame:

    • Get a statistical summary of every column, numeric or not, by passing include='all'.
    stats_summary = df.describe(include='all')
    
  7. Calculate variance and standard deviation in Pandas:

    • Calculate variance and standard deviation for numerical columns.
    variance_salary = df['Salary'].var()
    std_dev_age = df['Age'].std()
    
  8. Overview of basic stats functions in Pandas:

    • Use individual statistical functions such as mean(), median(), var(), and std().
    mean_age = df['Age'].mean()
    median_salary = df['Salary'].median()
    variance_salary = df['Salary'].var()
    std_dev_age = df['Age'].std()
    
  9. Pandas DataFrame info and statistics:

    • Print an overview of the DataFrame with info(), then compute summary statistics with describe().
    df.info()
    stats_summary = df.describe()
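One detail worth knowing about the variance and standard deviation calls above: pandas computes the sample versions by default (ddof=1, dividing by N - 1). If you treat the rows as the whole population, ddof=0 divides by N instead. The sketch below reuses the Age and Salary data. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 28, 35, 32],
                   'Salary': [50000, 60000, 55000, 70000, 65000]})

# Default: sample variance, divides the squared deviations by N - 1.
sample_var = df['Salary'].var()

# Population variance, divides by N.
population_var = df['Salary'].var(ddof=0)

# The same ddof parameter applies to std().
sample_std_age = df['Age'].std()
population_std_age = df['Age'].std(ddof=0)

print(sample_var, population_var)
```

With five rows the two variances differ by a factor of 5/4, which matters on small datasets and fades as N grows.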