Hands-On Exploratory Data Analysis with Python

Data cleansing

Let's create a CSV file that contains only the required fields, using the following steps:

1. Import the csv package:

import csv

2. Create a CSV file with only the required attributes:

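# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('mailbox.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    # Header row: one column per attribute we keep
    writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])

    # mbox is the mailbox object loaded earlier
    for message in mbox:
        writer.writerow([
            message['subject'],
            message['from'],
            message['date'],
            message['to'],
            message['X-Gmail-Labels'],
            message['X-GM-THRID']
        ])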

The preceding code writes a CSV file named mailbox.csv. From here on, we can load this CSV file instead of the mbox file; because it contains only the selected fields, it is smaller than the original dataset.
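To try the export without a real Gmail archive, the following is a minimal, self-contained sketch. It assumes nothing beyond the standard library: it builds a one-message mbox file in a temporary directory, writes the same six columns to a CSV file, and reads the rows back with csv.DictReader. The names mbox_to_csv, load_rows, and sample.mbox, along with the sample message and its addresses, are hypothetical and introduced only for illustration.

```python
import csv
import mailbox
import os
import tempfile
from email.message import EmailMessage

FIELDS = ['subject', 'from', 'date', 'to', 'label', 'thread']


def mbox_to_csv(mbox_path, csv_path):
    """Write the selected headers of every message in an mbox file to a CSV file."""
    mbox = mailbox.mbox(mbox_path)
    try:
        with open(csv_path, 'w', newline='', encoding='utf-8') as outputfile:
            writer = csv.writer(outputfile)
            writer.writerow(FIELDS)
            for message in mbox:
                # A missing header yields None, which csv writes as an empty field
                writer.writerow([
                    message['subject'],
                    message['from'],
                    message['date'],
                    message['to'],
                    message['X-Gmail-Labels'],
                    message['X-GM-THRID'],
                ])
    finally:
        mbox.close()


def load_rows(csv_path):
    """Read the CSV back as a list of dicts keyed by the header row."""
    with open(csv_path, newline='', encoding='utf-8') as inputfile:
        return list(csv.DictReader(inputfile))


# Build a tiny sample mbox (hypothetical data) so the sketch runs on its own
workdir = tempfile.mkdtemp()
mbox_path = os.path.join(workdir, 'sample.mbox')
csv_path = os.path.join(workdir, 'mailbox.csv')

sample = mailbox.mbox(mbox_path)
msg = EmailMessage()
msg['Subject'] = 'Hello'
msg['From'] = 'alice@example.com'
msg['To'] = 'bob@example.com'
msg['Date'] = 'Mon, 01 Jan 2018 10:00:00 +0000'
msg['X-Gmail-Labels'] = 'Inbox'
msg['X-GM-THRID'] = '123'
msg.set_content('Body text')
sample.add(msg)
sample.flush()
sample.close()

mbox_to_csv(mbox_path, csv_path)
rows = load_rows(csv_path)
print(rows[0]['subject'], rows[0]['label'])
```

Reading the file back with csv.DictReader is a quick way to confirm that the header row and the field order match what the later analysis expects.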