What is a Pandas Dataframe




Python Excel Tool of Choice

Behold, the pandas dataframe. A dataframe is a lot like an Excel spreadsheet. As in a spreadsheet, a dataframe has columns, each with a heading and a set of values beneath it. Much like cell formats in Excel, each column is assigned something called a datatype. The datatype determines what you can do with the values in that column. Let's break down each part of this to get a better understanding.


Pandas Dataframes are Composed of Columns


These columns are called series. Each series within a dataframe has an assigned datatype. For example, a column of numbers may be assigned the datatype float (numbers with decimal places). Combining these columns creates a dataframe. Note that a dataframe as a whole can include many different datatypes. It is only each individual column that is limited to a single datatype.
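As a minimal sketch of the idea above, the snippet below builds two series and combines them into a dataframe. The column names and values are made up for illustration, and this is only one of several ways to create a dataframe.

```python
import pandas as pd

# Two hypothetical columns; the names and values are invented for this example.
products = pd.Series(["apple", "banana", "cherry"], name="product")
prices = pd.Series([1.25, 0.50, 3.00], name="price")

# Combining the series side by side creates a dataframe.
df = pd.concat([products, prices], axis=1)

# Each column of the dataframe is itself a series...
price_column = df["price"]
print(type(price_column))

# ...and each column reports exactly one datatype.
print(df.dtypes)
```

The price column should report float64. The product column reports a text-style datatype, which many pandas versions display as object.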


A dataframe is aligned along something called an index. The index can be used to select rows of your dataframe. By default, the index simply numbers the rows by position, starting at 0, but you can set your own index if you want. A common choice is to index by dates or times.
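Here is a small sketch of both kinds of index. The sales figures and column names are invented for the example. It first selects a row using the default position-based index, then sets the date column as the index and selects a row by date.

```python
import pandas as pd

# Hypothetical daily sales figures, invented for this example.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "sales": [100.0, 150.0, 125.0],
})

# By default the index is just the row position: 0, 1, 2.
print(df.index.tolist())
first_row = df.loc[0]
print(first_row["sales"])

# Setting the date column as the index lets us select rows by date instead.
by_date = df.set_index("date")
jan_second = by_date.loc[pd.Timestamp("2021-01-02")]
print(jan_second["sales"])
```

Note that set_index returns a new dataframe here, so the original df keeps its default numbered index.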



Dataframes are Useful for Speed


A common and valid question is why we should use a dataframe at all when we could use built-in python types like lists or dictionaries. For one, dataframes display spreadsheet-style data much more cleanly, so they provide a more comfortable experience for most users. Beyond that, the real power of the dataframe lies in its columns.


Since each column contains only one datatype, pandas can skip some of the checks python would normally perform and execute our code much faster. It's not a big deal to fully understand this right now, but consider a python list. It can hold any type of value, so python has to check at every position in the list whether the operation you're asking for is valid for that value. In computing terms this is slow (though usually still really fast in practice). Pandas columns solve this problem. Since there's only one type, the check happens once for the entire column, and the action is then performed extremely fast.
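To make this more concrete, here is a rough sketch that doubles the same numbers once as a python list and once as a pandas series, then times both. The data is invented, and the timings will vary from machine to machine, so treat the printed numbers as a rough indication only.

```python
import timeit

import pandas as pd

# Invented data: the same numbers as a plain list and as a pandas series.
numbers_list = [float(i) for i in range(100_000)]
numbers_series = pd.Series(numbers_list)

# With a list, python handles the values one at a time in a loop.
doubled_list = [value * 2 for value in numbers_list]

# With a series, the whole column is processed in a single operation.
doubled_series = numbers_series * 2

# Both approaches give the same values.
print(doubled_series.tolist() == doubled_list)

# Rough timing of each approach; exact numbers depend on your machine.
list_time = timeit.timeit(lambda: [value * 2 for value in numbers_list], number=5)
series_time = timeit.timeit(lambda: numbers_series * 2, number=5)
print(f"list: {list_time:.4f}s, series: {series_time:.4f}s")
```

On most machines the series version should come out noticeably faster, and the gap tends to grow as the column gets longer.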


Python Readability in Dataframes


Few people work entirely by themselves, so it's important that your work can be handed over to others and easily understood. The pandas dataframe provides this experience. Even for people who are not python users themselves, a dataframe should feel very familiar, just like any other spreadsheet. Python emphasizes this sort of readability throughout the language, and the dataframe is a great showcase of it.


Take Some Time to Understand:

  • What is a pandas series?
  • Why do actions on a pandas series execute faster than on a python list?
  • What is a dataframe index?
  • In what ways is a dataframe similar to a spreadsheet?

A good understanding of these questions should help in the coming tutorials.






By: Derrick Sherrill

