View basic statistical details in Pandas

One of the most useful features of pandas is its ability to quickly provide basic statistical details of a DataFrame. Here's how you can achieve that.

1. Setup:

First, make sure you've got pandas installed:

pip install pandas

2. Import Necessary Libraries:

import pandas as pd

3. Create a Sample DataFrame:

data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 8, 7, 6, 5]
}
df = pd.DataFrame(data)
print(df)

4. Use `describe()` to View Basic Statistical Details:

stats = df.describe()
print(stats)

Here's a breakdown of what describe() returns:

count: Number of non-null entries.
mean: Mean of the values.
std: Sample standard deviation (computed with n - 1 in the denominator).
min: Minimum value.
25%: 25th percentile.
50%: Median or 50th percentile.
75%: 75th percentile.
max: Maximum value.

By default, describe() summarizes only the numeric columns of a mixed-type DataFrame; non-numeric columns are left out.
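Since the result of describe() is itself a DataFrame, individual statistics can be looked up by row label and column name. The sketch below rebuilds the sample data from step 3 and reads a few values; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 8, 7, 6, 5],
})

stats = df.describe()

# Rows are the statistic names, columns are the original column names.
mean_a = stats.loc['mean', 'A']
median_b = stats.loc['50%', 'B']
count_c = stats.loc['count', 'C']

print(mean_a, median_b, count_c)
```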

5. Including Object Columns:

If you have columns of type object (like strings) and you want to see statistics on those as well, pass include='all':

data['D'] = ['apple', 'banana', 'cherry', 'apple', 'cherry']
df = pd.DataFrame(data)

stats = df.describe(include='all')
print(stats)

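With include='all', the result contains the rows for both kinds of columns, and any statistic that does not apply to a column is shown as NaN. For object-type columns, here's what you get: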

count: Number of non-null entries.
unique: Number of distinct values.
top: Most frequent value (if several values tie, one of them is reported).
freq: Frequency of the top category.
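To look at the non-numeric summary on its own, describe() can also be called on a single column. A minimal sketch, reusing the fruit values from this step (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'D': ['apple', 'banana', 'cherry', 'apple', 'cherry'],
})

# Describing a string column yields count, unique, top and freq.
fruit_stats = df['D'].describe()
print(fruit_stats)

# 'apple' and 'cherry' both occur twice, so either may be reported as top.
print(fruit_stats['top'], fruit_stats['freq'])
```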

6. Use `info()` to View General Information:

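The info() method prints a concise summary of your DataFrame, including the index range, column names, non-null counts, data types, and memory usage. It writes the summary directly and returns None, so there is no need to wrap it in print():

df.info()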
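Because info() writes its report rather than returning it, the text has to be captured explicitly if it is needed as a string. One way to do that, sketched here with the standard-library io.StringIO and the buf parameter of info():

```python
import io
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'D': ['apple', 'banana', 'cherry', 'apple', 'cherry'],
})

# info() prints to standard output and returns None.
result = df.info()

# Passing a buffer redirects the report so it can be kept as a string.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```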

7. Other Useful Statistical Methods:

Apart from describe(), there are individual methods to fetch specific statistics:

sum(): Get the sum of values.
mean(): Compute the arithmetic mean.
median(): Compute the median.
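mode(): Compute the mode; the result can hold several values when there is a tie.
std(): Compute the sample standard deviation.
var(): Compute the sample variance.
skew(): Compute the skewness.
kurt(): Compute the excess kurtosis (0 for a normal distribution).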

For example:

print("Mean of column A:", df['A'].mean())
print("Standard Deviation of column B:", df['B'].std())
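Several of these statistics can also be requested in one call with agg(), which accepts a list of method names. A short sketch on the sample data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 8, 7, 6, 5],
})

# One row per statistic, one column per DataFrame column.
summary = df.agg(['sum', 'mean', 'median', 'var'])
print(summary)

# mode() returns a Series because several values can tie for most frequent.
modes_a = df['A'].mode()
print(modes_a.tolist())
```

Every value in column A occurs exactly once, so mode() returns all five of them.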

Summary:

Pandas makes it convenient to obtain a statistical overview of your data. With describe(), info(), and the individual statistical methods, you can get a useful first picture of your dataset in just a few lines of code.

Basic statistics in Pandas DataFrame:

Use the describe() method to get basic statistics for numerical columns.

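import pandas as pd

# Small sample dataset with two numeric columns
data = {'Age': [25, 30, 28, 35, 32],
        'Salary': [50000, 60000, 55000, 70000, 65000]}

df = pd.DataFrame(data)

# count, mean, std, min, quartiles, and max for each numeric column
basic_stats = df.describe()
print(basic_stats)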

Summary statistics in Pandas:
- Get summary statistics using the describe() method.
```
summary_stats = df.describe()
```
Viewing mean and median in Pandas DataFrame:
- Access mean and median directly using mean() and median() methods.
```
mean_age = df['Age'].mean()
median_salary = df['Salary'].median()
```
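The same methods can be called on the whole DataFrame, in which case they return one value per column. A small sketch with the Age and Salary data from above:

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 28, 35, 32],
    'Salary': [50000, 60000, 55000, 70000, 65000],
})

# Each call returns a Series indexed by column name.
column_means = df.mean()
column_medians = df.median()

print(column_means)
print(column_medians)
```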
Pandas DataFrame statistics overview:
- Use the info() method to get an overview of the DataFrame.
```
df.info()  # prints the overview; the return value is None
```
Exploring data with Pandas describe:
- Explore data distribution using the describe() method.
```
data_distribution = df.describe()
```
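If the default quartiles are too coarse, describe() accepts a percentiles argument. The sketch below asks for the 10th and 90th percentiles; the chosen cut points are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 28, 35, 32],
    'Salary': [50000, 60000, 55000, 70000, 65000],
})

# Percentiles are given as fractions between 0 and 1.
tails = df.describe(percentiles=[0.1, 0.9])
print(tails)
```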
Statistical summary of Pandas DataFrame:
- Get a statistical summary of the DataFrame.
```
stats_summary = df.describe(include='all')
```
Calculate variance and standard deviation in Pandas:
- Calculate variance and standard deviation for numerical columns.
```
variance_salary = df['Salary'].var()
std_dev_age = df['Age'].std()
```
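Both var() and std() divide by n - 1 by default (the sample estimate). Passing ddof=0 gives the population version instead. A short sketch comparing the two on the Age values:

```python
import pandas as pd

ages = pd.Series([25, 30, 28, 35, 32], name='Age')

sample_var = ages.var()            # divides by n - 1 (ddof=1, the default)
population_var = ages.var(ddof=0)  # divides by n

print(sample_var, population_var)
print(ages.std(), ages.std(ddof=0))
```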
Overview of basic stats functions in Pandas:
- Use various statistical functions like mean(), median(), var(), std(), etc.
```
mean_age = df['Age'].mean()
median_salary = df['Salary'].median()
variance_salary = df['Salary'].var()
std_dev_age = df['Age'].std()
```
Pandas DataFrame info and statistics:
- Get an overview of DataFrame information and summary statistics.
```
df.info()  # prints the overview; the return value is None
stats_summary = df.describe()
```