Pandas DataFrame to Python List – and Vice Versa – Finxter



In this article, you will learn how to convert a Pandas DataFrame into a list and vice versa.

This is useful whenever you want to focus on a specific column of a DataFrame and work with it as a simple list. The reverse direction is useful too: converting a list into a Pandas DataFrame lets you use the many functions dedicated to DataFrames and makes it easy to access and edit the content.

Long story short

Converting a DataFrame into a list is quick and easy.

  • Use Pandas indexing to select the columns or the DataFrame subset you are interested in, then apply .values.tolist() to convert the selected elements into a list.
  • The opposite operation, converting a list into a DataFrame, is done with the Pandas constructor pd.DataFrame().

Syntax: .tolist()
Details: When applied to a DataFrame, first use the attribute .values (an attribute, so no parentheses) to obtain the elements of that DataFrame as a NumPy array, then call .tolist() on the array.
Return Value: A list containing the values of the selected DataFrame or DataFrame portion

In the rest of the article, we’ll go over the following code snippet, which shows different ways to convert a DataFrame to a list and back:

import pandas as pd


url = "my_table.csv"
doc = pd.read_csv(url, sep=',')
df = pd.DataFrame(doc)

# convert a column of the DF into a list
new_list = df['State'].values.tolist()

# convert multiple columns of the DF into a list
new_list = df.loc[:, ['Year', 'State']].values.tolist()

# convert a row of DF into a list
new_list = df.loc[3].values.tolist()

# convert a list into a DF
new_df = pd.DataFrame(new_list)

[Figure: the first few rows of the sample data]

Import and Read Data as a Pandas DataFrame

We start our script by importing Pandas, which allows using DataFrames and performing numerous operations with them.

After that, we read the data from a .csv file using the Pandas function pd.read_csv(), whose only mandatory parameter is the path of the .csv file.

We also specify the separator used in the file by adding the optional parameter sep=','. The comma is also the default separator, so this only makes the choice explicit.

Note that pd.read_csv() already returns a Pandas DataFrame. The script nevertheless passes the result to pd.DataFrame() and assigns it to the variable df; this extra step is not strictly necessary.

import pandas as pd

url = r"path of the .csv file"
doc = pd.read_csv(url, sep=',')
df = pd.DataFrame(doc)
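
For readers who do not have the .csv file at hand, the loading step can be sketched with a small in-memory table. Only the column names Year and State are taken from the article; the Value column and every data row below are invented for illustration and are not the real tobacco data set.

```python
import io

import pandas as pd

# Hypothetical stand-in for the .csv file: only "Year" and "State" are
# column names from the article, everything else is made-up sample data.
csv_text = """Year,State,Value
1996,Alabama,10.5
1996,Alaska,12.0
1997,Alabama,9.8
1997,Alaska,11.4
"""

# read_csv accepts a path or a file-like object and returns a DataFrame
df = pd.read_csv(io.StringIO(csv_text), sep=",")

print(type(df))   # the result is already a DataFrame
print(df.head())  # first rows of the table
```

Swapping io.StringIO(csv_text) for a real file path should give the same kind of object.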

What’s a DataFrame?

A DataFrame is a table-like data structure that can hold heterogeneous data. It consists of multiple rows and columns, each of which is labeled.

The rows and columns hence identify a table, whose cells can be used to store data.

Compared to lists and arrays, DataFrames are more versatile when it comes to modifying the data stored in their cells, because you can point to specific columns and rows just by referring to their labels. For a detailed description of DataFrames, please refer to the official pandas documentation.
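
The label-based access described above can be sketched as follows. The data is invented; only the column names Year and State follow the article.

```python
import pandas as pd

# Invented sample data for illustration
df = pd.DataFrame({"Year": [1996, 1997], "State": ["Alabama", "Alaska"]})

# Read one cell by its row label and column label
state = df.loc[1, "State"]

# Overwrite one cell in place, again addressed by its labels
df.loc[0, "Year"] = 2000

print(state)
print(df)
```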

The imported .csv file contains information on tobacco consumption in different states and is organized into seven columns, which hold data such as the state and the year.

In particular, the first column refers to the year and the second one to the state in which the data were collected. You can list all the headings of a DataFrame with the attribute .columns (an attribute, so it is used without parentheses), which returns the names of all the headers in the DataFrame. To display the headers of our .csv file, we access this attribute in our script and print its value.

print(df.columns)

The command prints an Index object listing the names of the headers. It shows that our DataFrame has seven columns, beginning with “Year” and “State”, each of which contains specific information.
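
Since this article is about lists, it may help to see that the column headers themselves can be turned into a plain list in the same way. This is a minimal sketch on an invented two-column DataFrame, not the seven-column data set from the article.

```python
import pandas as pd

# Invented sample data; the real file has seven columns
df = pd.DataFrame({"Year": [1996, 1997], "State": ["Alabama", "Alaska"]})

print(df.columns)              # an Index object holding the column labels
headers = df.columns.tolist()  # the same labels as a plain Python list
print(headers)
```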

Converting a Single Column of the DataFrame into a List

Let’s now suppose we want a list containing all the elements stored under the header “State”. To do that, we proceed as follows:

# convert a column of the data frame into a list
new_list = df['State'].values.tolist()

As you can see from the script, we combined two different steps to achieve the objective: the attribute .values and the method .tolist().

The first one returns the column with the header “State” as a one-dimensional NumPy array with n elements, one per row; the method .tolist() is then used to convert the array into a list.

The procedure works regardless of the type of data contained in the DataFrame; whether the column holds strings or floats, the steps are the same.
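
The single-column conversion can be tried on a small invented DataFrame. The sketch also shows two equivalent routes that the article does not use: calling .tolist() directly on the column (a Series has its own tolist() method) and going through .to_numpy(), which the pandas documentation recommends over .values.

```python
import pandas as pd

# Invented sample data for illustration
df = pd.DataFrame({
    "Year": [1996, 1997, 1998],
    "State": ["Alabama", "Alaska", "Arizona"],
})

via_values = df["State"].values.tolist()     # route used in the article
via_series = df["State"].tolist()            # Series.tolist() directly
via_numpy = df["State"].to_numpy().tolist()  # via to_numpy()

# Numeric columns come back as native Python numbers
years = df["Year"].tolist()
print(via_values)
print(type(years[0]))
```

All three routes produce the same list here, so the choice is mostly a matter of style.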

Converting Multiple Columns of the DataFrame into a List

It is also possible to create multidimensional lists by converting multiple columns of our initial DataFrame.

This can be easily achieved by indexing in the correct way the subset of data we are interested in.

Suppose now we are interested in converting into a list the data contained in the columns “Year” and “State”.

We can use the Pandas indexer .loc[] to access a subset of the DataFrame; after that, we follow the same procedure as before, i.e., .values.tolist().

The result is a list of n inner lists, one per row of the DataFrame, each holding the “Year” and “State” values of that row.

The following lines show the procedure.

# convert multiple columns of the data frame into a list
new_list = df.loc[:, ['Year', 'State']].values.tolist()

As you can see, to select all the elements of the columns “Year” and “State” we wrote .loc[:, ['Year', 'State']]: the colon selects all the rows, and the list names the columns identified by the headers “Year” and “State”. A list of labels is the conventional way to request several columns at once.

If you are interested in more details about the DataFrame.loc[] indexer, please refer to the pandas documentation.
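
The multi-column conversion can be sketched on invented data as follows. The Value column is hypothetical and is there only to show that unselected columns are left out. The second variant, based on itertuples(), is not part of the article; it is included for cases where a list of tuples is more convenient than a list of lists.

```python
import pandas as pd

# Invented sample data; "Value" is a hypothetical third column
df = pd.DataFrame({
    "Year": [1996, 1997],
    "State": ["Alabama", "Alaska"],
    "Value": [10.5, 12.0],
})

# One inner list per row, in the column order requested
pairs = df.loc[:, ["Year", "State"]].values.tolist()

# The same rows as plain tuples instead of lists
pair_tuples = list(df[["Year", "State"]].itertuples(index=False, name=None))

print(pairs)
print(pair_tuples)
```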

Converting a DataFrame Row into a List

So far we have seen how to convert single and/or multiple columns of a DataFrame into a list; however, you might be wondering whether it is possible to do the same with the rows of a DataFrame.

The answer is of course yes, and it turns out to be quite simple!

If you remember the previous section, when we used .loc[], we selected all the rows and the two columns of interest. If we are now interested in just a single row of the DataFrame, it is sufficient to pass the label of that row to .loc[] and then apply .values.tolist() again. Keep in mind that .loc[] selects by index label, not by position; with the default integer index created by read_csv(), the two coincide. The following code lines show the procedure:

# convert a row of the data frame into a list
new_list = df.loc[3].values.tolist()

The result is a list containing all the elements of the row labeled 3, which is the fourth row when the default index (starting at 0) is in use; each element of the list corresponds to a single cell of the row.
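
The difference between selecting a row by label and by position only becomes visible when the index is not the default 0, 1, 2, and so on. The sketch below uses invented data and hypothetical index labels 10, 20, 30 to make that visible; .iloc[], which selects by position, is not used in the article.

```python
import pandas as pd

# Invented sample data with custom row labels that differ from positions
df = pd.DataFrame(
    {"Year": [1996, 1997, 1998], "State": ["Alabama", "Alaska", "Arizona"]},
    index=[10, 20, 30],
)

by_label = df.loc[20].values.tolist()     # row whose index label is 20
by_position = df.iloc[1].values.tolist()  # second row, whatever its label

print(by_label)
print(by_position)
```

With this index, df.loc[1] would raise a KeyError because no row carries the label 1.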

Convert a List into a DataFrame

Let’s suppose we are now interested in the opposite task, i.e., converting a list into a DataFrame.

In this case, too, the solution is very simple: it is sufficient to call the Pandas constructor pd.DataFrame() with the list as input parameter.

To illustrate the procedure, let’s convert the list obtained in the previous section back into a DataFrame called “new_df”.

# convert list into data frame
new_df = pd.DataFrame(new_list)

Keep in mind that this new DataFrame will not have the headers of the original one, since we built it just from the elements of an independent list; its columns receive default integer labels instead. Note also that a flat list, such as the row converted above, becomes a single column with one row per element, whereas a list of lists becomes one row per inner list.
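
If the headers matter, they can be supplied again through the columns parameter of pd.DataFrame(). The following is a minimal sketch on an invented nested list; the header names are the two known from the article.

```python
import pandas as pd

# Invented nested list, one inner list per row
rows = [[1996, "Alabama"], [1997, "Alaska"]]

no_headers = pd.DataFrame(rows)                               # columns labeled 0 and 1
with_headers = pd.DataFrame(rows, columns=["Year", "State"])  # headers restored

# A flat list becomes a single column with one row per element
flat = pd.DataFrame(["Alabama", "Alaska"])

print(no_headers)
print(with_headers)
print(flat.shape)
```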

Conclusion

In this article, we saw different options for converting entire DataFrames, or parts of them, into lists and vice versa.

Depending on what your script does, either a DataFrame or a list can be the more convenient structure to work with. As you saw, the conversion is easy and takes just a couple of lines of code.


