Pandas Tutorial
Text replacement is a common task when working with data, especially during the data cleaning process. This tutorial will guide you on how to replace text values in a pandas DataFrame or Series.
Make sure you have pandas installed:
pip install pandas
import pandas as pd
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Cherry', 'Apple', 'Grape', 'Banana']
})
print(df)
If you want to replace "Apple" with "Mango":
df['Fruit'] = df['Fruit'].replace('Apple', 'Mango')
print(df)
You can replace multiple values by passing a dictionary of the old value as the key and the new value as its corresponding value:
replace_values = {'Banana': 'Blueberry', 'Cherry': 'Strawberry'}
df['Fruit'] = df['Fruit'].replace(replace_values)
print(df)
The replace() method supports regular expressions, which is handy when you need to replace text based on a pattern rather than exact matches.
To replace any fruit name ending with 'e' with 'Fruit':
df['Fruit'] = df['Fruit'].replace(r'.*e$', 'Fruit', regex=True)
print(df)
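The difference between exact and pattern-based matching can be seen in a small sketch. It uses a hypothetical Series named fruits (not part of the tutorial data) and shows that, without regex=True, replace() only matches whole values, while with regex=True every match inside a string is substituted.

```python
import pandas as pd

# Hypothetical sample data, chosen so that 'an' appears both as a whole
# value and as part of longer strings.
fruits = pd.Series(['Banana', 'Mango', 'an'])

# Exact matching: only the value that equals 'an' is replaced.
exact = fruits.replace('an', 'AN')

# Pattern matching: each occurrence of 'an' inside a string is replaced.
partial = fruits.replace('an', 'AN', regex=True)

print(exact.tolist())    # ['Banana', 'Mango', 'AN']
print(partial.tolist())  # ['BANANa', 'MANgo', 'AN']
```

Anchoring the pattern with ^ and $ is one way to keep a regular expression from touching partial matches.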
If you want to replace values across the entire DataFrame and not just a specific column, you can do so by omitting the column reference:
df = df.replace('Grape', 'Pineapple')
print(df)
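Since the tutorial's DataFrame has only one column, the effect of a DataFrame-wide replacement is easier to see on a frame with two. The following is a minimal sketch using a hypothetical DataFrame named stock; it also shows a nested dictionary as one way to limit the replacement to a single column.

```python
import pandas as pd

# Hypothetical two-column frame (not from the tutorial) in which 'Grape'
# appears in both columns.
stock = pd.DataFrame({
    'Fruit': ['Apple', 'Grape', 'Banana'],
    'Juice': ['Grape', 'Orange', 'Grape'],
})

# Calling replace() on the DataFrame checks every column.
everywhere = stock.replace('Grape', 'Pineapple')

# A nested dictionary {column: {old: new}} restricts the replacement
# to the named column.
fruit_only = stock.replace({'Fruit': {'Grape': 'Pineapple'}})

print(everywhere)
print(fruit_only)
```

Both calls return a new DataFrame and leave stock unchanged.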
By default, the replacement is case sensitive, and replace() has no option to ignore case. For a case-insensitive replacement, one approach is to lowercase each value inside a lambda function before testing it:
df['Fruit'] = df['Fruit'].apply(lambda x: 'Mango' if 'apple' in x.lower() else x)
print(df)
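This example replaces every value that contains "apple", in any letter case, with "Mango". Because it is a substring test, a value such as "Pineapple" is replaced as well.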
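An alternative that matches whole values only is a regular expression with the inline (?i) flag passed to replace(). The sketch below assumes a hypothetical Series named fruits and is meant as an illustration rather than the only way to do this.

```python
import pandas as pd

# Hypothetical sample data with several spellings of "apple".
fruits = pd.Series(['Apple', 'APPLE', 'apple', 'Pineapple', 'Banana'])

# (?i) turns on case-insensitive matching; ^ and $ anchor the pattern so
# that only values equal to "apple" match, not "Pineapple".
result = fruits.replace(r'(?i)^apple$', 'Mango', regex=True)

print(result.tolist())
# ['Mango', 'Mango', 'Mango', 'Pineapple', 'Banana']
```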
Pandas provides versatile tools, such as the replace() method, for text replacement in a DataFrame or Series. It's efficient for both exact matches and pattern-based replacements using regular expressions. This functionality is incredibly beneficial when cleaning and preprocessing textual data.
Replace text values in Pandas DataFrame:
Use the .replace() method in Pandas to replace specified text values in a DataFrame.
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': ['apple', 'orange', 'banana'],
                   'B': ['red', 'yellow', 'yellow']})

# Replace 'yellow' in column 'B' with 'green'
df.replace({'B': {'yellow': 'green'}}, inplace=True)
Using the replace() method in Pandas:
The replace() method in Pandas is a versatile function that allows you to replace specified values in a DataFrame.
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': ['apple', 'orange', 'banana'],
                   'B': ['red', 'yellow', 'yellow']})

# Replace 'yellow' in column 'B' with 'green'
df.replace({'B': {'yellow': 'green'}}, inplace=True)
Replace specific strings in Pandas Series:
Use the replace() method on a specific column (Series) to replace specified strings.
import pandas as pd

# Sample Series
data = pd.Series(['apple', 'orange', 'banana'])

# Replace 'orange' with 'grape'
data.replace('orange', 'grape', inplace=True)
Conditional text replacement in Pandas DataFrame:
Use the numpy.where() function to replace values that meet a condition.
import pandas as pd
import numpy as np

# Sample DataFrame
df = pd.DataFrame({'A': [10, 20, 30], 'B': [25, 35, 15]})

# Replace values in column 'A' greater than 20 with 999
df['A'] = np.where(df['A'] > 20, 999, df['A'])
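The same numpy.where() call also works when the replacement is text. As a rough sketch, assuming a hypothetical DataFrame with a Fruit column and a Stock column, a label can be swapped in wherever a condition on another column holds:

```python
import numpy as np
import pandas as pd

# Hypothetical data: relabel a fruit as 'sold out' when its stock is zero.
df = pd.DataFrame({
    'Fruit': ['apple', 'orange', 'banana'],
    'Stock': [5, 0, 12],
})

# np.where(condition, value_if_true, value_if_false) is evaluated row by row.
df['Fruit'] = np.where(df['Stock'] == 0, 'sold out', df['Fruit'])

print(df['Fruit'].tolist())  # ['apple', 'sold out', 'banana']
```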
Replace NaN values with a string in Pandas:
import pandas as pd

# Sample DataFrame with missing values
df = pd.DataFrame({'A': ['apple', 'orange', pd.NA],
                   'B': ['red', pd.NA, 'yellow']})

# Replace missing values with 'unknown'; fillna() is the idiomatic call for this
df = df.fillna('unknown')
Regex-based text replacement in Pandas:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': ['apple', 'orange', 'banana']})

# Replace words starting with 'a' or 'o' with 'fruit'
df.replace(to_replace=r'^[ao].*', value='fruit', regex=True, inplace=True)
Replace multiple values in Pandas DataFrame:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': ['apple', 'orange', 'banana'],
                   'B': ['red', 'yellow', 'yellow']})

# Replace 'yellow' with 'green' and 'red' with 'blue' in column 'B'
df.replace({'B': {'yellow': 'green', 'red': 'blue'}}, inplace=True)
Replace values with another column in Pandas:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': ['apple', 'orange', 'banana'],
                   'B': ['red', 'yellow', 'green']})

# Overwrite column 'B' with the values of column 'A' mapped through a dictionary
df['B'] = df['A'].replace({'apple': 'red', 'orange': 'yellow', 'banana': 'green'})
Case-insensitive text replacement in Pandas:
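DataFrame.replace() does not accept a case parameter. Use the case parameter of the Series.str.replace() method instead.
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': ['apple', 'Orange', 'banana']})

# Replace 'orange' with 'grape' (case-insensitive).
# ^ and $ anchor the pattern so only whole values are matched.
df['A'] = df['A'].str.replace(r'^orange$', 'grape', case=False, regex=True)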