Pandas Read and Write Excel Files – Finxter


Over your career as a Data Scientist, there may be instances where you will work with data to/from a DataFrame to an Excel file.  This article shows you how to manipulate this data using the above functions.

This article covers the commonly used parameters for each function listed above. For a complete list of all parameters and their use, click here.

Note: This article assumes that Excel resides on the computer.

Getting Started

Before any data manipulation can occur, pandas will require installation. This library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.

We assume the first line of each code snippet contains the following import statement:

import pandas as pd

Read Excel File

Function Outline

pandas.read_excel(io, sheet_name=0, header=0, names=None, 
                  index_col=None, usecols=None, squeeze=False, 
                  dtype=None, engine=None, converters=None, 
                  true_values=None, false_values=None, skiprows=None, 
                  nrows=None, na_values=None, keep_default_na=True, 
                  na_filter=True, verbose=False, parse_dates=False, 
                  date_parser=None, thousands=None, comment=None, 
                  skipfooter=0, convert_float=None, 
                  mangle_dupe_cols=True, storage_options=None)

The read_excel() function reads an Excel file into a pandas DataFrame.

For this example, we will save a list of the Titanic’s 2nd Class Passengers to Excel. To follow along, perform the following steps.

  • Navigate to https://titanicfacts.net/titanic-passenger-list/#2ndclass.
  • Highlight the 2nd Class Passenger list (including the header row).
  • Press CTRL+C to copy the highlighted area to the system Clipboard.
  • Open Excel.
  • Click inside the A1 cell.
  • Press CTRL+V to paste the contents of the system Clipboard to the Excel Worksheet.
  • Save the Excel file as titanic.xlsx and place this file in the current working directory.

Below are the first few lines of the file.

To read this file into a DataFrame, run the following code.

df = pd.read_excel('titanic.xlsx')
print(df.head(4))
  • Line [2] reads in the titanic.xlsx file created earlier.
  • Line [3] outputs five rows: one header row and four data rows to the terminal.

Output

  Surname First Names Age      Boarded Survivor (S) or Victim (†)
0 Abelson          Mr Samuel  30 Cherbourg                         
1 Abelson          Mrs Hannah  28 Cherbourg                          S
2 Aldworth  Mr Augustus Henry  34 Southampton                         
3 Andrew    Mr Frank Thomas  25 Southampton                         

Notes

  • To hide the index column (far left column above), set the index_col parameter as follows:
df = pd.read_excel('titanic.xlsx', index_col=0)
Surname First Names Age      Boarded Survivor (S) or Victim (†)
Abelson          Mr Samuel  30 Cherbourg                         
Abelson          Mrs Hannah  28 Cherbourg                          S
Aldworth  Mr Augustus Henry  34 Southampton                         
Andrew    Mr Frank Thomas  25 Southampton                         
  • To change the header row a name to a number, set the header parameter as follows:
df = pd.read_excel('titanic.xlsx', header=None)
  0 1 2 3 4
0 Abelson          Mr Samuel  30 Cherbourg                         
1 Abelson          Mrs Hannah  28 Cherbourg                          S
2 Aldworth  Mr Augustus Henry  34 Southampton                         
3 Andrew    Mr Frank Thomas  25 Southampton                         

For a list of available parameters for this function, click here.

DataFrame to Excel

Function Outline

DataFrame.to_excel(excel_writer, sheet_name="Sheet1", na_rep='', 
                   float_format=None, columns=None, header=True, 
                   index=True, index_label=None, startrow=0, 
                   startcol=0, engine=None, merge_cells=True, 
                   encoding=None, inf_rep='inf', verbose=True, 
                   freeze_panes=None, storage_options=None)

The to_excel() function writes a DataFrame to an Excel Worksheet.

For this example, a new Excel file emerges from the original file with some differences.

df = pd.read_excel('titanic.xlsx')
cols = ['Surname', 'First Names', 'Age']
df.to_excel('titanic1.xlsx', sheet_name="Sheet1", columns=cols, freeze_panes=(1,4))
  • Line [2] reads in the titanic.xlsx file created earlier.
  • Line [3] creates a list with the columns we want in to the new Excel file.
  • Line [4] saves a new Excel file with the following options:
    • Indicates the Worksheet to read in (optional).
    • Sets the columns to show (see column list on Line [3]).
    • Freezes the first row, four columns (pane).

Output:

Below are the first few lines of the file. Try scrolling to ensure the freeze pane option worked.

For a list of available parameters for this function, click here.

Excel Styler and Excel Writer

Function Outline

Styler.to_excel(excel_writer, sheet_name="Sheet1", na_rep='', 
                float_format=None, columns=None, header=True, 
                index=True, index_label=None, startrow=0, startcol=0, 
                engine=None, merge_cells=True, encoding=None, 
                inf_rep='inf', verbose=True, freeze_panes=None)

This function writes styles to an Excel Worksheet.

class pandas.ExcelWriter(path, engine=None, date_format=None, 
                         datetime_format=None, mode="w", 
                         storage_options=None, if_sheet_exists=None, 
                         engine_kwargs=None, **kwargs)

This function is a class for writing a DataFrame to an Excel Worksheet.

For this example, we will be creating a dictionary of employees of a small start-up IT company. Run the following code to create a new Excel file, salaries.xlsx.

import pandas as pd
df = pd.DataFrame({'Fname':     ['Anne', 'Mark', 'Paul', 'Ben', 'Micah'],
                 		'Salaries':   [98273,  84900,  120876, 52759, 99767],
                   		'Profit':       [.15,    .23,     .8 ,    .7,    .3]})

xw = pd.ExcelWriter("salaries.xlsx", engine="xlsxwriter")
df.to_excel(xw, sheet_name="Sheet1")

workbook  = xw.book
worksheet = xw.sheets['Sheet1']

sal = workbook.add_format({'num_format': '$#,##0.00'})
pro = workbook.add_format({'num_format': '0%'})

worksheet.set_column('B:B', 15)
worksheet.set_column('C:C', 25, sal)
worksheet.set_column('D:D', 26, pro)
xw.save()
  • Line [2] creates a new DataFrame: a dictionary of lists.
  • Line [3-4] sets up the Excel file, the engine to use, and the Worksheet.
  • Line [5-6] assigns the Workbook and Worksheet.
  • Line [7-8] sets up the formatting for the Salaries and Profit fields.
  • Line [9-11] sets the width of each column and the styles if needed.
  • Line [12] saves the new Excel file.

Output:

For a list of available parameters for these functions, click here.



Source link

Latest articles

Related articles

Leave a reply

Please enter your comment!
Please enter your name here