Renaming and Removing Columns in pandas DataFrames

Renaming columns

Four methods for renaming columns in a pandas DataFrame:

1. Rename specific columns

foo.rename(columns={}, inplace=True)

Pass a dict that maps each old column name to its new name; columns not listed in the dict keep their names. For example:

foo.rename(columns={'old_col1': 'new_col1',
                    'old_col2': 'new_col2'},
           inplace=True)
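As a minimal sketch of the same call without inplace=True (the DataFrame contents and the 'not_a_column' key are made up for illustration), rename returns a new DataFrame and leaves the original untouched. Keys that match no column are ignored by default rather than raising an error:

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
foo = pd.DataFrame({'old_col1': [1, 2], 'old_col2': [3, 4]})

# Without inplace=True, rename returns a new DataFrame; foo is unchanged.
renamed = foo.rename(columns={'old_col1': 'new_col1'})

# A key that matches no column is silently ignored (errors='ignore' is the default).
unchanged = foo.rename(columns={'not_a_column': 'x'})

print(list(foo.columns))        # ['old_col1', 'old_col2']
print(list(renamed.columns))    # ['new_col1', 'old_col2']
print(list(unchanged.columns))  # ['old_col1', 'old_col2']
```

Assigning the result to a new name, as here, is a common alternative to inplace=True when you want to keep the original around.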

2. Rename all the column names

Set .columns to a list containing a new name for every column, in the existing column order. For example:

foo.columns = ['new_col1', 'new_col2']
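One thing to watch with this approach is that the list must have exactly one entry per column. A small sketch, using a made-up two-column frame, of what happens when the lengths do not match:

```python
import pandas as pd

# Hypothetical frame with two columns.
foo = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Two columns, two names: this works.
foo.columns = ['new_col1', 'new_col2']
cols_after_rename = list(foo.columns)

# Two columns, one name: pandas raises ValueError (length mismatch).
try:
    foo.columns = ['only_one_name']
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(cols_after_rename)  # ['new_col1', 'new_col2']
print(mismatch_raised)    # True
```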

3. Rename the columns when reading in a file

For example:

import pandas as pd

foo_new_names = ['new_col1', 'new_col2']
foo = pd.read_csv('data_file.csv',  # file path must be a string
                  names=foo_new_names,
                  header=0)

Note that in addition to setting the names parameter to a list of the new column names, you must also set header=0 to tell pandas that row 0 of the file is a header row whose names you are replacing. If you pass names without header=0, pandas treats row 0 as data, so the old column names end up as the first row of the DataFrame.
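To see the effect of header=0, here is a small sketch that reads the same CSV text twice. The CSV content is invented for illustration, and io.StringIO stands in for a file on disk:

```python
import io
import pandas as pd

# Made-up CSV text with a header row, standing in for a real file.
csv_text = "old col1,old col2\n1,2\n3,4\n"
new_names = ['new_col1', 'new_col2']

# With header=0, the header row is consumed and replaced by new_names.
foo = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

# Without header=0, passing names makes pandas treat row 0 as data.
no_header = pd.read_csv(io.StringIO(csv_text), names=new_names)

print(len(foo))              # 2 data rows
print(len(no_header))        # 3 rows: the old header is now a data row
print(no_header.iloc[0, 0])  # old col1
```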

4. Replace spaces in column names with underscores

foo.columns = foo.columns.str.replace(' ', '_')
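A short sketch of this on a made-up frame. Passing regex=False is an optional addition here: it makes explicit that the pattern is a literal space rather than a regular expression:

```python
import pandas as pd

# Hypothetical frame whose column names contain spaces.
foo = pd.DataFrame({'first name': ['Ann'], 'last name': ['Lee']})

# Replace every literal space in each column name with an underscore.
foo.columns = foo.columns.str.replace(' ', '_', regex=False)

print(list(foo.columns))  # ['first_name', 'last_name']
```

Names without spaces make attribute-style access such as foo.first_name possible.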

Removing columns (and rows)

Here is the general syntax:

foo.drop(col_str_or_list_of_strs,
         axis=1,  # 0 axis is rows, 1 axis is cols
         inplace=True)

To drop a single column:

foo.drop('col_name',
         axis=1,
         inplace=True)

To drop multiple columns:

foo.drop(['col1', 'col2'],
         axis=1,
         inplace=True)
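As an alternative sketch, drop also accepts a columns keyword, which avoids having to remember which axis number means columns. The frame below is made up for illustration:

```python
import pandas as pd

# Hypothetical three-column frame.
foo = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]})

# columns=... is equivalent to passing the labels with axis=1.
trimmed = foo.drop(columns=['col1', 'col2'])

# errors='ignore' skips labels that do not exist instead of raising KeyError.
safe = foo.drop(columns=['col1', 'no_such_col'], errors='ignore')

print(list(trimmed.columns))  # ['col3']
print(list(safe.columns))     # ['col2', 'col3']
```

Both calls return new DataFrames; foo keeps all three columns unless inplace=True is passed.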

To drop a single row (the row with index label 0, which is data row 0 under the default index):

foo.drop(0,  # index label identifying the row
         axis=0,
         inplace=True)

To drop multiple rows (the rows with index labels 1 and 2, which are data rows 1 and 2 under the default index):

foo.drop([1, 2],  # list of index labels identifying the rows
         axis=0,
         inplace=True)
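Because drop matches rows by index label rather than by position, the two differ whenever the index is not the default 0, 1, 2, and so on. A minimal sketch with a made-up frame whose index labels are 5, 6, and 7:

```python
import pandas as pd

# Hypothetical frame with a non-default integer index.
foo = pd.DataFrame({'col1': [10, 20, 30]}, index=[5, 6, 7])

# Drop by label: removes the row whose index label is 6.
without_6 = foo.drop(6, axis=0)

# Drop by position: look up the label at position 0 first, then drop it.
without_first = foo.drop(foo.index[0], axis=0)

# There is no row labeled 0 here, so drop(0) raises KeyError.
try:
    foo.drop(0, axis=0)
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

print(list(without_6.index))      # [5, 7]
print(list(without_first.index))  # [6, 7]
print(missing_label_raised)       # True
```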