Analyzing the selling price of used cars using Python's pandas library can yield valuable insights. Let's walk through a tutorial that demonstrates some basic analysis.
Start by importing the necessary libraries and loading the dataset.
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
# Here, let's assume our dataset is named 'used_cars.csv'
df = pd.read_csv('used_cars.csv')
Before diving into analysis, it's always a good idea to explore the dataset.
# Check the first few rows to understand the data
print(df.head())

# Basic statistics of numerical features
print(df.describe())

# Check for missing values
print(df.isnull().sum())
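Raw counts from `isnull().sum()` are easier to judge as a share of all rows. The sketch below uses a small made-up frame (the column names and values are illustrative, not taken from `used_cars.csv`) to show the missing fraction per column and how to drop only the rows that lack the target price:

```python
import numpy as np
import pandas as pd

# Small illustrative frame; columns and values are hypothetical
df = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, np.nan, 2.85, 7.25],
    "Year": [2014, 2013, 2017, np.nan, 2015],
    "Brand": ["A", None, "A", None, "C"],
})

# Fraction of missing values per column, largest first
missing_share = df.isnull().mean().sort_values(ascending=False)
print(missing_share)

# Keep only rows where the target (selling price) is present
df_clean = df.dropna(subset=["Selling_Price"])
print(len(df_clean))
```

Dropping only on the target column keeps rows that are missing less important fields, which `dropna()` without `subset` would discard.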
Visualizing the distribution of the selling price helps you understand how prices are spread.
plt.hist(df['Selling_Price'], bins=50, color='blue', edgecolor='black')
plt.title('Distribution of Selling Price')
plt.xlabel('Selling Price')
plt.ylabel('Number of Cars')
plt.grid(True)
plt.show()
If the dataset contains a column for the car brand or make, you can analyze the average selling price by brand.
brand_mean_prices = df.groupby('Brand')['Selling_Price'].mean().sort_values()

# Plotting
brand_mean_prices.plot(kind='barh', figsize=(10, 7), color='skyblue')
plt.title('Average Selling Price by Brand')
plt.xlabel('Average Selling Price')
plt.ylabel('Brand')
plt.grid(True)
plt.show()
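An average can be dominated by a brand that has only one or two listings. A minimal sketch on hypothetical data that reports the median and the number of listings alongside the mean, then keeps only brands above an arbitrary minimum count:

```python
import pandas as pd

# Hypothetical listings for illustration only
df = pd.DataFrame({
    "Brand": ["A", "A", "A", "B", "B", "C"],
    "Selling_Price": [3.0, 4.0, 5.0, 6.0, 8.0, 30.0],
})

# Mean, median and number of listings per brand, sorted by mean
brand_stats = (
    df.groupby("Brand")["Selling_Price"]
    .agg(["mean", "median", "count"])
    .sort_values("mean")
)
print(brand_stats)

# Only keep brands with enough listings for a stable average
min_listings = 2  # assumed threshold; tune it for your data
reliable = brand_stats[brand_stats["count"] >= min_listings]
print(reliable)
```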
Assuming there's a column named 'Year' indicating the manufacturing year of the car:
# Creating a new column for car age
df['Car_Age'] = 2023 - df['Year']  # Assuming current year is 2023

plt.scatter(df['Car_Age'], df['Selling_Price'], alpha=0.5)
plt.title('Selling Price vs Car Age')
plt.xlabel('Car Age (in years)')
plt.ylabel('Selling Price')
plt.grid(True)
plt.show()
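The scatter plot shows the trend visually; a single number such as the Pearson correlation between age and price summarizes it. A sketch on made-up data, with the reference year kept in one variable so it is easy to update:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Year": [2010, 2012, 2015, 2018, 2020],
    "Selling_Price": [2.0, 3.0, 4.5, 6.0, 7.5],
})

# Keep the reference year in one place; pd.Timestamp.now().year
# would give the current calendar year instead
reference_year = 2023
df["Car_Age"] = reference_year - df["Year"]

# Pearson correlation between age and price (negative: older cars sell for less)
age_price_corr = df["Car_Age"].corr(df["Selling_Price"])
print(round(age_price_corr, 3))
```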
If the dataset has a 'Transmission' column (Manual or Automatic):
transmission_prices = df.groupby('Transmission')['Selling_Price'].mean()

# Plotting
transmission_prices.plot(kind='bar', color=['lightgreen', 'lightcoral'])
plt.title('Average Selling Price by Transmission Type')
plt.xlabel('Transmission Type')
plt.ylabel('Average Selling Price')
plt.xticks(rotation=0)
plt.grid(axis='y')
plt.show()
These are just starting points. Depending on the columns available in your dataset, you can dig deeper into other factors that influence the selling price, such as mileage, fuel type, engine size, and horsepower.
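For example, mileage can be studied by grouping cars into mileage bands and comparing the median price in each band. This sketch assumes a hypothetical `Kms_Driven` column, made-up values, and arbitrary band edges:

```python
import pandas as pd

# Hypothetical listings; 'Kms_Driven' is an assumed column name
df = pd.DataFrame({
    "Kms_Driven": [5_000, 20_000, 45_000, 60_000, 90_000, 120_000, 150_000],
    "Selling_Price": [8.0, 7.0, 6.0, 5.5, 4.0, 3.0, 2.5],
})

# Bucket mileage into ranges (right edge inclusive)
bins = [0, 50_000, 100_000, 200_000]
labels = ["0-50k", "50k-100k", "100k-200k"]
df["Mileage_Band"] = pd.cut(df["Kms_Driven"], bins=bins, labels=labels)

# Median price per mileage band; observed=True skips empty categories
band_prices = df.groupby("Mileage_Band", observed=True)["Selling_Price"].median()
print(band_prices)
```

The median is used here because a few very expensive cars can pull a band's mean upward.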
Exploratory data analysis of used car selling prices in Python:
import pandas as pd
import matplotlib.pyplot as plt

# Load the used car dataset
df = pd.read_csv('used_cars.csv')

# Display basic statistics of numerical columns
print(df.describe())

# Visualize the distribution of car prices
df['Price'].hist()
plt.show()
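Car prices are often right-skewed, so a histogram can be dominated by a long tail of expensive cars. A common remedy is to look at `np.log1p` of the price instead. In this illustrative sketch the prices are chosen (as `2**k - 1`) so that their logarithms are evenly spaced, which makes the effect on `skew()` easy to see:

```python
import numpy as np
import pandas as pd

# Hypothetical prices, each roughly double the previous one,
# giving a long right tail
prices = pd.Series([2**k - 1 for k in range(1, 9)], dtype=float, name="Price")

# Positive skew indicates a long right tail
print(round(prices.skew(), 2))

# log1p(2**k - 1) = k * log(2), so the transformed values are evenly spaced
log_prices = np.log1p(prices)
print(round(log_prices.skew(), 2))
```

On real data the transformed skew will not be exactly zero, but it is usually much closer to a symmetric shape; `np.log1p(df['Price']).hist()` plots it.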
Data analysis of second-hand car prices using Pandas:
import pandas as pd

# Load the used car dataset
df = pd.read_csv('used_cars.csv')

# Display information about the dataset
# (info() prints its report directly and returns None, so no print() is needed)
df.info()

# Analyze the distribution of car brands
brand_counts = df['Brand'].value_counts()
print(brand_counts)
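`value_counts(normalize=True)` returns each brand's share of listings instead of raw counts, and rare brands can be merged into a single "Other" group before plotting or modelling. A sketch with a hypothetical brand column and an arbitrary cutoff:

```python
import pandas as pd

# Hypothetical brand column for illustration
brands = pd.Series(["A", "B", "A", "C", "A", "B", "A", "C", "D", "A"], name="Brand")

# Absolute counts and share of listings per brand
brand_counts = brands.value_counts()
brand_share = brands.value_counts(normalize=True)
print(brand_counts)
print(brand_share.round(2))

# Merge rare brands (fewer than 2 listings here, an arbitrary cutoff) into "Other"
rare = brand_counts[brand_counts < 2].index
brands_grouped = brands.where(~brands.isin(rare), "Other")
print(brands_grouped.value_counts())
```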
Predictive modeling of used car prices with Python and Pandas:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the used car dataset
df = pd.read_csv('used_cars.csv')

# Select features and target variable
# (assumes the dataset has 'Mileage', 'Age' and 'Price' columns)
X = df[['Mileage', 'Age']]
y = df['Price']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Evaluate the model performance
mse = mean_squared_error(y_test, predictions)
print(f'Test MSE: {mse:.2f}')
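An MSE value is hard to interpret on its own. One sanity check is to compare it with a naive baseline that always predicts the mean training price. The sketch below uses synthetic data and NumPy's least-squares solver instead of scikit-learn so it runs without extra dependencies; `LinearRegression` fits the same ordinary least squares model:

```python
import numpy as np
import pandas as pd

# Synthetic data generated for illustration: price falls with mileage and age
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Mileage": rng.uniform(5_000, 150_000, n),
    "Age": rng.integers(1, 15, n),
})
df["Price"] = 20 - 0.00005 * df["Mileage"] - 0.8 * df["Age"] + rng.normal(0, 1, n)

# Simple 80/20 split by shuffling row positions
idx = rng.permutation(n)
train, test = df.iloc[idx[:160]], df.iloc[idx[160:]]
y_test = test["Price"].to_numpy()

# Baseline: always predict the mean training price
baseline_pred = np.full(len(test), train["Price"].mean())
baseline_mse = np.mean((y_test - baseline_pred) ** 2)

# Ordinary least squares with an intercept column of ones
X_train = np.column_stack([np.ones(len(train)), train[["Mileage", "Age"]].to_numpy()])
X_test = np.column_stack([np.ones(len(test)), test[["Mileage", "Age"]].to_numpy()])
coef, *_ = np.linalg.lstsq(X_train, train["Price"].to_numpy(), rcond=None)
model_mse = np.mean((y_test - X_test @ coef) ** 2)

print(f"Baseline MSE: {baseline_mse:.2f}, linear model MSE: {model_mse:.2f}")
```

If a model barely beats the mean-price baseline, the chosen features carry little information about price.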
Descriptive statistics for used car sales data in Python:
import pandas as pd

# Load the used car dataset
df = pd.read_csv('used_cars.csv')

# Display summary statistics for numerical columns
summary_stats = df.describe()
print(summary_stats)

# Display unique values in categorical columns
unique_brands = df['Brand'].unique()
print(unique_brands)
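`df['Brand'].unique()` covers one column; `describe()` with `exclude=['number']` summarizes every non-numeric column at once. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Brand": ["A", "B", "A", "C", "A"],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol", "CNG"],
    "Price": [3.5, 6.0, 4.0, 5.5, 3.0],
})

# For text columns describe() reports count, number of unique values,
# the most frequent value (top) and how often it occurs (freq)
cat_summary = df.describe(exclude=["number"])
print(cat_summary)

# nunique() gives the number of distinct values per column
print(df.nunique())
```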
Correlation analysis of factors affecting used car prices in Pandas:
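import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the used car dataset
df = pd.read_csv('used_cars.csv')

# Calculate the correlation matrix over numeric columns only
# (from pandas 2.0, text columns such as 'Brand' make corr() raise an error)
correlation_matrix = df.corr(numeric_only=True)

# Visualize the correlation matrix
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()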
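A full heatmap can be hard to read when there are many columns. Ranking each feature by the absolute value of its correlation with the target is a compact alternative. The data below is hypothetical and includes a text column, which `numeric_only=True` skips:

```python
import pandas as pd

# Hypothetical data; 'Brand' is a text column
df = pd.DataFrame({
    "Price": [8.0, 6.5, 5.0, 4.0, 2.5],
    "Age": [1, 3, 5, 7, 9],
    "Power": [120, 110, 100, 115, 90],
    "Brand": ["A", "B", "A", "C", "B"],
})

# numeric_only=True leaves out the text column
corr = df.corr(numeric_only=True)

# Correlation of each feature with the target, strongest (in absolute value) first
price_corr = corr["Price"].drop("Price").sort_values(key=abs, ascending=False)
print(price_corr)
```

Correlation only captures linear relationships, so a weak value does not rule out a non-linear effect.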
Feature engineering for predicting used car prices in Python:
import pandas as pd

# Load the used car dataset
df = pd.read_csv('used_cars.csv')

# Create a new feature 'Car Age' from the 'Year' column
df['Car Age'] = 2023 - df['Year']

# Create dummy variables for the 'Fuel Type' column
df = pd.get_dummies(df, columns=['Fuel Type'], drop_first=True)
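To see what `get_dummies(..., drop_first=True)` produces, here is a sketch on a three-row hypothetical frame. The dropped category (the first one in sorted order) becomes the reference level that the remaining dummy columns are compared against:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Year": [2015, 2018, 2012],
    "Fuel Type": ["Petrol", "Diesel", "CNG"],
})

df["Car Age"] = 2023 - df["Year"]

# drop_first=True drops 'CNG' (first in sorted order); a CNG car is the row
# where all remaining dummy columns are 0. dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["Fuel Type"], drop_first=True, dtype=int)
print(encoded)
```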
Time-series analysis of historical used car prices using Pandas:
import pandas as pd

# Load the used car dataset with a datetime column
df = pd.read_csv('used_cars_time_series.csv', parse_dates=['Sale Date'])

# Set the 'Sale Date' column as the index
df.set_index('Sale Date', inplace=True)

# Resample the data to monthly frequency and calculate average prices
# ('ME' means month end; pandas versions before 2.2 use 'M' instead)
monthly_avg_prices = df['Price'].resample('ME').mean()
print(monthly_avg_prices)
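If you would rather keep 'Sale Date' as an ordinary column, `groupby` with `pd.Grouper` produces monthly buckets without calling `set_index`. A sketch on made-up sales records; `freq='MS'` labels each month by its first day:

```python
import pandas as pd

# Hypothetical sales records for illustration
df = pd.DataFrame({
    "Sale Date": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-02-11", "2023-04-02", "2023-04-28"]
    ),
    "Price": [5.0, 7.0, 6.0, 4.0, 8.0],
})

# Monthly average price and number of sales, keyed by month start
monthly = (
    df.groupby(pd.Grouper(key="Sale Date", freq="MS"))["Price"]
    .agg(["mean", "count"])
)
print(monthly)
```

The `count` column shows how many sales each monthly average is based on, which matters when some months have very few sales.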
Machine learning regression models for predicting car prices:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Load the used car dataset
df = pd.read_csv('used_cars.csv')

# Select features and target variable
# (assumes the dataset has 'Mileage', 'Age', 'Power' and 'Price' columns)
X = df[['Mileage', 'Age', 'Power']]
y = df['Price']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest regressor model (random_state makes results reproducible)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Evaluate the model performance
mse = mean_squared_error(y_test, predictions)
print(f'Test MSE: {mse:.2f}')
Cleaning and preprocessing used car dataset with Pandas:
import pandas as pd

# Load the used car dataset
df = pd.read_csv('used_cars_dirty.csv')

# Drop rows with missing values
df.dropna(inplace=True)

# Convert the 'Mileage' column to numeric by stripping the unit and thousands separators
df['Mileage'] = pd.to_numeric(
    df['Mileage'].str.replace(' km', '', regex=False).str.replace(',', '', regex=False)
)

# Remove duplicates
df.drop_duplicates(inplace=True)
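Real mileage columns often contain entries that cannot be converted at all, such as "unknown", and `pd.to_numeric` raises an error on them by default. Passing `errors='coerce'` turns them into `NaN` so they can be counted or dropped. A sketch on hypothetical strings:

```python
import pandas as pd

# Hypothetical messy mileage strings for illustration
raw = pd.Series(["45,000 km", "12,500 km", "unknown", "80000 km", None], name="Mileage")

# Strip the unit and thousands separators, then convert;
# errors='coerce' turns anything unparseable into NaN instead of raising
cleaned = pd.to_numeric(
    raw.str.replace(" km", "", regex=False).str.replace(",", "", regex=False),
    errors="coerce",
)
print(cleaned)
print(cleaned.isna().sum(), "values could not be parsed")
```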
Grouping and aggregating data for summarizing car sales statistics:
import pandas as pd

# Load the used car dataset
df = pd.read_csv('used_cars.csv')

# Group by 'Brand' and calculate average prices
avg_prices_by_brand = df.groupby('Brand')['Price'].mean()

# Group by 'Brand' and 'Fuel Type' and calculate total sales count
sales_count_by_brand_fuel = df.groupby(['Brand', 'Fuel Type'])['Price'].count()
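Named aggregation computes several statistics with readable column names in a single `agg` call, and `unstack` reshapes the two-level brand/fuel grouping into a table. A sketch on made-up listings:

```python
import pandas as pd

# Hypothetical listings for illustration
df = pd.DataFrame({
    "Brand": ["A", "A", "B", "B", "B"],
    "Fuel Type": ["Petrol", "Diesel", "Petrol", "Petrol", "Diesel"],
    "Price": [4.0, 6.0, 5.0, 7.0, 9.0],
})

# Named aggregation: output column name = (input column, function)
summary = df.groupby("Brand").agg(
    avg_price=("Price", "mean"),
    listings=("Price", "size"),
)
print(summary)

# unstack() turns the (Brand, Fuel Type) index into a Brand x Fuel Type table;
# fill_value=0 marks combinations with no listings
counts = df.groupby(["Brand", "Fuel Type"]).size().unstack(fill_value=0)
print(counts)
```

Note that `size` counts all rows in a group, while `count` (used in the snippet above) skips rows where `Price` is missing.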
Geographic analysis of used car prices using Python and Pandas:
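import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Load the used car dataset with latitude and longitude columns
df = pd.read_csv('used_cars_geo.csv')

# Create a GeoDataFrame with longitude/latitude (WGS84) point geometries
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']), crs='EPSG:4326'
)

# Plot a map of used car prices, coloring each point by 'Price'.
# gpd.datasets was removed in GeoPandas 1.0, so the world map is read from a
# locally downloaded Natural Earth countries file (file name is an assumption)
world = gpd.read_file('ne_110m_admin_0_countries.zip')
ax = world.plot(figsize=(10, 6), color='lightgrey', edgecolor='white')
gdf.plot(ax=ax, column='Price', cmap='viridis', markersize=10, legend=True)
plt.show()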
Python Pandas code examples for analyzing and visualizing car sales data:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the used car dataset
df = pd.read_csv('used_cars.csv')

# Visualize the distribution of car prices
plt.figure(figsize=(10, 6))
sns.histplot(df['Price'], bins=30, kde=True)
plt.title('Distribution of Car Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

# Scatter plot of Mileage vs. Price
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Mileage', y='Price', data=df)
plt.title('Scatter Plot of Mileage vs. Price')
plt.xlabel('Mileage')
plt.ylabel('Price')
plt.show()