10 Pandas methods that helped me replace Microsoft Excel with Python

How you can use these pandas methods to transition from Microsoft Excel to Python, saving you serious time and sanity.

Kdvxnxbbx
4 min read, Nov 30, 2020

Photo by Pascal Müller on Unsplash

For this article, I will assume you’re working in Anaconda, VS Code or a similar environment, and that you have already imported pandas into your script with the alias pd (import pandas as pd).


1. Import an Excel file

This reads your Excel file into a pandas dataframe (Python’s equivalent of the tabular structure you’re used to). You’ll want to reuse this dataframe, so we’ll save it to the variable df.

df = pd.read_excel(some_file_path)
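Excel workbooks often contain several sheets, and read_excel only loads the first one unless told otherwise. The sketch below is self-contained: it builds a small made-up two-sheet workbook in memory (io.BytesIO stands in for a real file path) and reads it back using the sheet_name parameter. It assumes an Excel engine such as openpyxl is installed, which pandas needs for .xlsx files.

```python
import io

import pandas as pd

# Build a made-up two-sheet workbook in memory so the example needs no file on disk.
# Writing/reading .xlsx requires an engine such as openpyxl to be installed.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame({"region": ["North", "South"], "sales": [120, 95]}).to_excel(
        writer, sheet_name="2020", index=False
    )
    pd.DataFrame({"region": ["North", "South"], "sales": [130, 101]}).to_excel(
        writer, sheet_name="2021", index=False
    )

# By default read_excel loads only the first sheet.
buffer.seek(0)
df = pd.read_excel(buffer)

# sheet_name selects one sheet by name; sheet_name=None loads every sheet
# into a dict mapping sheet name -> DataFrame.
buffer.seek(0)
df_2021 = pd.read_excel(buffer, sheet_name="2021")
buffer.seek(0)
all_sheets = pd.read_excel(buffer, sheet_name=None)

print(list(all_sheets))  # ['2020', '2021']
```

With a real workbook you would pass the file path instead of the buffer and drop the seek(0) calls.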

2. View your new dataframe

It might seem a bit strange at first that you’re not viewing every single row of data, as you would in an Excel file. Here’s how to view the first five records. You can pass a different number inside the parentheses; the default is five.

df.head()
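As a quick sketch of the variations, here is head with an explicit number, plus tail, which shows the last rows instead. The eight-row dataframe is made up purely for illustration.

```python
import pandas as pd

# Made-up eight-row dataframe for illustration.
df = pd.DataFrame({"item": list("abcdefgh"), "qty": range(8)})

first_five = df.head()     # first 5 rows (the default)
first_three = df.head(3)   # pass n to choose how many rows
last_two = df.tail(2)      # tail() is the mirror image: the last n rows

print(first_three)
```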

3. Count rows

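It’s usually helpful to know how many rows you’re working with. The simplest way to get this is len(df) (or df.shape[0]). Be aware that the count method, df.count(), returns the number of non-null values in each column rather than the number of rows, which makes it handy for spotting missing data.

len(df)
df.count()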
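To see the difference between counting rows and counting non-null values, here is a minimal sketch on a small made-up dataframe with one missing score.

```python
import numpy as np
import pandas as pd

# Made-up data: one missing value in the "score" column.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "score": [90, np.nan, 75]})

n_rows = len(df)            # 3: every row, whether or not it has missing values
n_rows_alt = df.shape[0]    # same number; shape is (rows, columns)
non_null = df.count()       # per column: name -> 3, score -> 2

print(n_rows, non_null.to_dict())
```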

4. Basic descriptive statistics

With one line of code you get summary statistics for every numeric column in your dataframe: the count, mean, standard deviation, min, max and quartiles. Hopefully you’re starting to be sold on pandas already…

df.describe()
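A short sketch of what describe returns, using made-up price/units/store data. By default only numeric columns are summarised; include="all" adds text columns as well (with statistics such as the number of unique values).

```python
import pandas as pd

# Made-up data with two numeric columns and one text column.
df = pd.DataFrame({
    "price": [2.0, 4.0, 6.0, 8.0],
    "units": [1, 3, 5, 7],
    "store": list("wxyz"),
})

summary = df.describe()               # numeric columns only (the default)
full = df.describe(include="all")     # also summarises the text column

print(summary)
```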

5. Replace null values

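This one is fairly self-explanatory: replace all null values (these appear as NaN, short for "not a number", within a dataframe) with a zero. Like most pandas methods, fillna returns a new dataframe rather than changing df itself, so assign the result (df = df.fillna(0)) if you want to keep it.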

df.fillna(0)

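You can take this one step further by forward filling or backward filling, which replaces each null with the nearest valid value above or below it in the same column.

df.ffill()
df.bfill()

Note: older code often writes these as df.fillna(method='ffill') and df.fillna(method='bfill'), and df.replace() accepted the same method argument, but that argument is deprecated from pandas 2.1 onwards in favour of df.ffill() and df.bfill().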
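Here is a small sketch comparing the filling options on made-up numeric data, including a dictionary to use a different fill value per column and the dedicated ffill() and bfill() methods, which recent pandas versions recommend over the method= argument. Note that the original df is left unchanged.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in both columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, 2.0, np.nan, 5.0],
})

zeros = df.fillna(0)                         # every NaN becomes 0
per_column = df.fillna({"a": 0, "b": -1})    # a different fill value per column
forward = df.ffill()    # copy the last valid value downwards
backward = df.bfill()   # copy the next valid value upwards

# None of these modify df; assign the result (df = df.ffill()) to keep it.
print(forward)
```

Forward filling cannot fill a NaN in the very first row (there is nothing above it), and backward filling likewise leaves a NaN in the last row.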

6. Filtering

Filter your dataframe down to the rows where a particular column matches a given value or string. Once you’re more familiar, the pandas documentation explains how to filter on multiple conditions.

df[df['column_name'] == 0]
df[df['column_name'] == "hello"]
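For the multiple-condition case, a common pattern is to wrap each comparison in parentheses and combine them with & (and) or | (or); Python’s own and/or keywords don’t work element-wise on columns. A sketch on made-up city/sales data:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "city": ["Leeds", "York", "Leeds", "Hull"],
    "sales": [120, 80, 45, 200],
})

# Each condition needs its own parentheses because & and | bind tightly.
leeds_big = df[(df["city"] == "Leeds") & (df["sales"] > 100)]
york_or_hull = df[(df["city"] == "York") | (df["city"] == "Hull")]

# isin is a shorter way to match a column against a list of values.
york_or_hull_alt = df[df["city"].isin(["York", "Hull"])]

print(leeds_big)
```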

7. Drop duplicates

For when you don’t want any repeating rows within your data.

df.drop_duplicates()
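drop_duplicates also takes subset, to consider only certain columns when deciding what counts as a repeat, and keep, to choose which copy survives ("first" by default). A sketch with made-up customer/order data:

```python
import pandas as pd

# Made-up data: the first two rows are exact duplicates.
df = pd.DataFrame({
    "customer": ["A", "A", "B", "B"],
    "order": [1, 1, 2, 3],
})

no_exact_dupes = df.drop_duplicates()   # removes the repeated ("A", 1) row

# Keep one row per customer, preferring the last occurrence.
one_per_customer = df.drop_duplicates(subset="customer", keep="last")

print(one_per_customer)
```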

8. Vlookup/joins

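Out with the VLOOKUP and in with a join… To join your main dataframe df with another dataframe, which we’ll call lookup_dataframe, on a column 'column_name' that appears in both, the most direct method is merge. Passing how='left' keeps every row of df, which mirrors VLOOKUP (merge itself defaults to how='inner'); when you’re more confident, the pandas documentation covers the other join types. The join method does default to how='left', but its on parameter matches df’s column against lookup_dataframe’s index, so the lookup table must first be indexed by that column.

df.merge(lookup_dataframe, on='column_name', how='left')
df.join(lookup_dataframe.set_index('column_name'), on='column_name')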
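Here is a minimal sketch of a VLOOKUP-style left join with made-up product data (product_id and name are hypothetical column names). Keys with no match in the lookup table come back as NaN, much like #N/A in Excel, and validate="many_to_one" makes pandas raise an error if the lookup table unexpectedly contains duplicate keys.

```python
import pandas as pd

# Made-up main table: product 3 has no entry in the lookup table.
df = pd.DataFrame({"product_id": [1, 2, 3, 2], "qty": [5, 1, 7, 2]})
lookup_dataframe = pd.DataFrame({"product_id": [1, 2], "name": ["Pen", "Pad"]})

# how="left" keeps every row of df; unmatched keys get NaN in "name".
result = df.merge(lookup_dataframe, on="product_id", how="left", validate="many_to_one")

# The equivalent with join: index the lookup table by the key first.
joined = df.join(lookup_dataframe.set_index("product_id"), on="product_id")

print(result)
```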

9. Pivot/groupby

Another big part of what Excel is used for is pivot tables. With pandas they’re very simple to recreate using groupby; here are a couple of examples.

df.groupby(['column1', 'column2']).sum()
df.groupby(['column1', 'column2']).count()
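If you want something closer to Excel’s pivot table layout, with one field down the rows and another across the columns, pandas also offers pd.pivot_table. Here is a sketch on made-up region/product/sales data; the column names are purely illustrative.

```python
import pandas as pd

# Made-up sales data.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["Pen", "Pad", "Pen", "Pen", "Pad"],
    "sales": [10, 20, 5, 15, 8],
})

pivot = pd.pivot_table(
    df,
    values="sales",      # the field being summarised
    index="region",      # like Excel's "Rows" area
    columns="product",   # like Excel's "Columns" area
    aggfunc="sum",       # like "Sum of sales" in the "Values" area
    fill_value=0,        # show 0 instead of NaN for empty combinations
)

print(pivot)
```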

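Or, if you want to aggregate each column with a different method, you can do that as well! You’ll have noticed here the use of brackets [] and braces {}: these create a list and a dictionary respectively, and the official Python tutorial covers both. Each dictionary key is a column to aggregate and each value is the function to apply. Use columns other than the ones you grouped by, since the grouping columns become the row labels of the result.

df.groupby(['column1', 'column2']).agg({'column3': 'sum', 'column4': 'count'})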
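An alternative to the dictionary form is named aggregation, which also lets you name the output columns. In this sketch column3 and column4 are hypothetical extra columns with made-up values, and as_index=False keeps the group keys as ordinary columns (closer to a flat Excel table).

```python
import pandas as pd

# Made-up data; column3/column4 are hypothetical columns to aggregate.
df = pd.DataFrame({
    "column1": ["a", "a", "b"],
    "column2": ["x", "x", "y"],
    "column3": [1, 2, 5],
    "column4": [10.0, None, 30.0],
})

summary = df.groupby(["column1", "column2"], as_index=False).agg(
    total=("column3", "sum"),       # output column name = (input column, function)
    n_values=("column4", "count"),  # count ignores missing values
)

print(summary)
```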

10. Export to Excel

Chances are you’ll need to share your work back in Excel. This is what your colleagues will expect to receive, or what you’ll be comfortable using for data visualisations. More from me about using the Python packages Seaborn and Matplotlib for data viz coming soon!

df.to_excel(file_path, index=False)

Here I’ve specified not to include the dataframe’s index in the output Excel file; you can always set index=True (the default) if you want it.
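To see what the index setting changes, here is a round-trip sketch: it writes a made-up dataframe to in-memory Excel files (io.BytesIO stands in for file_path) with and without the index, then reads both back. It assumes an Excel engine such as openpyxl is installed.

```python
import io

import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"region": ["North", "South"], "sales": [120, 95]})

with_index = io.BytesIO()
df.to_excel(with_index)                  # default index=True writes an extra first column
without_index = io.BytesIO()
df.to_excel(without_index, index=False)  # only the data columns

# Rewind the in-memory files before reading them back.
with_index.seek(0)
without_index.seek(0)
cols_with_index = pd.read_excel(with_index).columns.tolist()
cols_without_index = pd.read_excel(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'region', 'sales']
print(cols_without_index)  # ['region', 'sales']
```

The unnamed first column is the index that Excel users usually don’t want, which is why index=False is a sensible choice for sharing.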

These are the top 10 pandas methods that I began using when I started the switch from Excel to Python. Good luck!
