
Data cleansing
Let's create a CSV file that contains only the required fields, using the following steps:
1. Import the csv package:
import csv
2. Create a CSV file with only the required attributes:
with open('mailbox.csv', 'w', newline='') as outputfile:
    # newline='' lets the csv module control line endings itself
    writer = csv.writer(outputfile)
    # Header row
    writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])
    # One row per message, keeping only the required headers
    for message in mbox:
        writer.writerow([
            message['subject'],
            message['from'],
            message['date'],
            message['to'],
            message['X-Gmail-Labels'],
            message['X-GM-THRID']
        ])
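For reference, the same export can be wrapped in a self-contained function. This is only a sketch: it assumes that mbox was opened earlier with the standard library's mailbox.mbox class, and the function name export_mbox_to_csv and its two path parameters are hypothetical names introduced here for illustration.

```python
import csv
import mailbox


def export_mbox_to_csv(mbox_path, csv_path):
    """Write selected headers of every message in an mbox file to a CSV file.

    Hypothetical helper; assumes the mbox file is read with mailbox.mbox.
    Returns the number of messages written.
    """
    # create=False raises an error instead of silently creating an empty mbox
    mbox = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        with open(csv_path, 'w', newline='') as outputfile:
            writer = csv.writer(outputfile)
            writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])
            for message in mbox:
                # A missing header yields None, which csv writes as an empty field
                writer.writerow([
                    message['subject'],
                    message['from'],
                    message['date'],
                    message['to'],
                    message['X-Gmail-Labels'],
                    message['X-GM-THRID'],
                ])
                count += 1
    finally:
        mbox.close()
    return count
```

Returning the message count gives a quick sanity check that the export covered the whole mailbox.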
Running this step produces a CSV file named mailbox.csv. From here on, we can load this CSV file instead of the mbox file; because it holds only the selected fields, it is smaller than the original dataset.
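As a minimal sketch of reading the exported file back, the standard library's csv.DictReader can be used. The helper name load_mailbox_csv is hypothetical and not part of the original steps; the chapter may well use a different loader later on.

```python
import csv


def load_mailbox_csv(csv_path):
    """Return the rows of the exported CSV as a list of dicts keyed by column name.

    Hypothetical helper for illustration only.
    """
    with open(csv_path, newline='') as inputfile:
        # DictReader uses the first row (the header) as the dictionary keys
        return list(csv.DictReader(inputfile))
```

For example, rows = load_mailbox_csv('mailbox.csv') gives one dictionary per message, so rows[0]['subject'] is the subject of the first message. All values come back as strings, including the thread identifier.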