pandas string methods

This blog post is based on lesson 12 ("How do I use string methods in pandas?") from Data School's pandas video series.

pandas has many string methods available on a Series via .str.some_method()

For example:

  • df.series_name.str.upper() changes all the strings in the Series called series_name (in the DataFrame called df) to uppercase
  • df.series_name.str.title() changes the strings to title case (the first character of each word is capitalized)
  • String methods on a Series return a Series. In the case of

df.series_name.str.contains('bar')

the .contains() method returns a Series of True and False values: True where the string in series_name contains bar, and False where it does not.

  • You can use the True/False Series returned by .contains() to filter a DataFrame. For example:

df[df.series_name.str.contains('bar')]

will return a new DataFrame filtered to only those rows in which the series_name Series (aka the column called series_name) contains the string bar.
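
To make the methods above concrete, here is a minimal sketch on a small made-up DataFrame. The data is hypothetical; only the names df and series_name come from the examples above. The na=False argument is my addition: if the column has missing values, .contains() returns missing values for those rows, and the result can't be used directly as a filter.

```python
import pandas as pd

# Hypothetical data for illustration; one value is missing on purpose.
df = pd.DataFrame({"series_name": ["foo bar", "baz qux", None, "bar none"]})

# Each .str method returns a new Series; df itself is not modified.
upper = df.series_name.str.upper()
title = df.series_name.str.title()

# na=False treats missing values as "does not contain", so the mask
# holds only True/False and is safe to use for filtering.
mask = df.series_name.str.contains("bar", na=False)
filtered = df[mask]

print(upper.tolist()[:2])             # ['FOO BAR', 'BAZ QUX']
print(title.tolist()[:2])             # ['Foo Bar', 'Baz Qux']
print(filtered.series_name.tolist())  # ['foo bar', 'bar none']
```

Note that .contains() treats its pattern as a regular expression by default, which makes no difference for a plain word like bar.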

You can see all of the str methods available in the pandas API reference.

String methods can be chained together

For example:

df.series_name.str.replace('[', '').str.replace(']', '')

will operate on the Series called series_name in the DataFrame called df. The first .replace() method will replace [ with nothing and the second .replace() method will replace ] with nothing, allowing you to remove the brackets from the strings in the Series.
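
Here is a small runnable sketch of that chain, using made-up data. I've added regex=False to both calls as a precaution: it makes pandas treat [ and ] as literal characters no matter which pandas version you're running (a lone [ is not a valid regular expression).

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"series_name": ["[apple]", "[banana, cherry]", "plain"]})

# Each .str.replace() returns a Series, so the next .str call can be
# chained directly onto it.
cleaned = (
    df.series_name
    .str.replace("[", "", regex=False)  # remove opening brackets
    .str.replace("]", "", regex=False)  # remove closing brackets
)

print(cleaned.tolist())  # ['apple', 'banana, cherry', 'plain']
```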

Many pandas string methods accept regular expressions

The two chained .replace() methods in the previous example can be replaced with a single regex .replace(), like this:

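df.series_name.str.replace(r'[\[\]]', '', regex=True)

Here, the .replace() method takes the regex pattern

r'[\[\]]'

and replaces every match with nothing. The regex=True argument is needed in pandas 2.0 and later, where .str.replace() treats the pattern as a literal string by default. The r prefix makes the pattern a raw string, so Python passes the backslashes through to the regex engine unchanged. That regular expression can be deconstructed as follows: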

  • the outer brackets [ and ] define a character class, meaning that any single character listed inside them counts as a match and will be replaced
  • inside the outer brackets is \[\], which represents the two literal characters [ and ] to be replaced. Since brackets have a special meaning in regular expressions, each one is escaped with a backslash \ so that it is treated as an ordinary character.
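
As a quick check of the breakdown above, this sketch runs the regex version next to the chained version on made-up data and confirms they give the same result. The data and variable names (pattern, regex_cleaned, chained) are my own for illustration.

```python
import re

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"series_name": ["[apple]", "[banana, cherry]", "plain"]})

# Character class matching either a literal [ or a literal ].
pattern = r"[\[\]]"

# The same pattern behaves identically in Python's built-in re module.
print(re.sub(pattern, "", "[apple]"))  # apple

# One regex replace...
regex_cleaned = df.series_name.str.replace(pattern, "", regex=True)

# ...versus two chained literal replaces.
chained = (
    df.series_name
    .str.replace("[", "", regex=False)
    .str.replace("]", "", regex=False)
)

print(regex_cleaned.tolist())         # ['apple', 'banana, cherry', 'plain']
print(regex_cleaned.equals(chained))  # True
```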

You can see working code for all of the above examples in my Jupyter notebook.