Python Pandas Input/Output – Flat File – Finxter

0
52
Python Pandas Input/Output – Flat File – Finxter


Over your career as a Pythonista, there may be instances where you will work with Flat Files.  This file type is an ASCII character-based file, usually with commas (,) separating the fields. Other common field separators are the:

  • Semi-colon (;)
  • Tab character (t)
  • Colon (:) and so on.

This article covers the commonly used parameters for each function listed above. For a complete list of all parameters and their use, click here.

Install Required Library

Before any data manipulation can occur, a new library will require installation. The pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

Read CSV File

For Section 2 and Section 3 of this article, the flat file below is used.  Copy these lines and save them to a file called classics.txt. Place this file in the current working directory.

💡 Note: The field separator character in this file is a semi-colon (;).

UPC;Title;Price;Inventory
abbb492978ff656d;The Secret Garden;15.08;274
93379e3a2072a01b;The Metamorphosis;28.59;31
2798974abc8a58a8;Candide;58.63;11
2e69730561ed70ad;Emma;32.93;97
39592d9d72e717c4;Of Mice and Men;47.11;18

With the classics.txt file saved to the current working directory, the code below reads in the flat file and sends the contents to a DataFrame. The sep parameter must exist in this instance. By default, the comma (,) separator is assumed.

import pandas as pd

df = pd.read_csv('classics.txt', sep=';', encoding='utf-8')
print(df)
  • Line [3] reads in the text file and parses the fields using the semi-colon (;) separator. Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring. The data is then saved to a DataFrame (df).
  • Line [4] outputs the DataFrame to the terminal window.

💡 Note: A UnicodeEncodeError occurs when a flat file contains ‘special’ characters, such as characters outside the ASCII range. Click here for to view a  chart of these characters.

Output:

  UPC Title Price Inventory
0 abbb492978ff656d  The Secret Garden  15.08 274
1 93379e3a2072a01b  The Metamorphosis  28.59 31
2 2798974abc8a58a8            Candide  58.63 11
3 2e69730561ed70ad               Emma  32.93 97
4 39592d9d72e717c4    Of Mice and Men  47.11 18

DataFrame to CSV

Expanding on the code above, let’s add an additional line to save the DataFrame (df) to a CSV file.

import pandas as pd

df.to_csv('classics.csv', index=False, encoding='utf-8')
print(df)
  • Line [3] passes index=False to remove the left-hand column numbers (see above). Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring.
  • Line [4] outputs the DataFrame to the terminal window.

Output:

UPC Title Price Inventory
abbb492978ff656d  The Secret Garden  15.08 274
93379e3a2072a01b  The Metamorphosis  28.59 31
2798974abc8a58a8            Candide  58.63 11
2e69730561ed70ad               Emma  32.93 97
39592d9d72e717c4    Of Mice and Men  47.11 18

Read Table

For this example, create a new text file called fiction.txt.

Use the following data for this file. Save and place this file in the current working directory.

💡 Note: The separator here is the whitespace parameter. Set your file up in the same format as below.

💡 Note: The drawback is if any data in any column contains a space, for example, ‘Grey Life’, an error will occur.

import pandas as pd

df = pd.read_table('fiction.txt',  delim_whitespace=True, index_col=0, encoding='utf-8')
print(df)
  • Line [3] reads in the text file, sets the field separator (delimiter) to whitespace, and sets the index to column 0. Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring.
  • Line [4] outputs the DataFrame to the terminal.

Output:

  Title  Price Inventory
UPC  
3c456328b04a8ee8       Grey 48.49 23
bade9943ee01b63f      Paris 17.28 4
9546d537fbf99eb6   Dreaming 20.55 13
a40723994f715420    Houdini 30.25 7
41fc5dce044f16f5  Girl-Blue 46.83 34

To save this table as a DataFrame, run the code below.

df.to_csv('fiction.csv', index=True, encoding='utf-8')

Read FWF

FWF stands for Fixed Width Fields. The read_fwf() function reads a table of fixed-width formatted lines into a DataFrame.

For this example, create a new text file called authors.txt.

Use the following data for this file. Place this file in the current working directory.

import pandas as pd

fwidths = [
    9,  # Title
    19, # Author
    6  # Price
    ]

df = pd.read_fwf('authors.txt', widths=fwidths)
print(df)
  • Line [3] sets the width for each column in the authors.txt file.
  • Line [4] reads in authors.txt and sets the widths of each column to their corresponding item in the widths list.
  • Line [5] outputs the DataFrame to the terminal.

Output:

  Title Author Price
0 Grey    Steve Smith  20.88
1 Paris  Audrey Cohill  23.67
2 Dreaming    Alex Balfour  10.99
3 Houdini  Paula Greaves  25.66



Source link

Leave a reply

Please enter your comment!
Please enter your name here