[Crossposted from Riparian Data]
Straightforward title, straightforward goal, ugly and roundabout (but free!) method of achieving it.
For some time now, I’ve had this goal: download my gmail data, analyze it, and visualize it.
The last time I tried this, I glossed over the whole getting your gmail data into Excel part. This is because I wasn’t able to do all of it myself–Jim had to take my ugly mbox data and make it Excel-readable.
But now, thanks to the basic python skills acquired in my data science class, I can do everything myself! Kinda. The code in part 3 will probably make a real programmer scream, but for the most part, it works–though it’s not fond of commas in subject lines. And if you, like me, are not a programmer–don’t worry! You can still run the code, using my trusty copy/paste/pray methodology.
Alors, here goes:
Step 1: From Gmail to Apple Mail
You have Apple mail, right? You can also do this with Outlook, and probably other desktop clients.
1) In your Gmail settings, go to the “Forwarding and POP/IMAP tab” and make sure POP is enabled.
2) Now, add your Gmail account to your desktop client o’choice. If it’s already there, add it again–you’re going to be removing this one.
Important: Do not check the “remove copy from server after retrieving a message” box!
Step 2: From Apple Mail to mbox
This part is easy. Just select your mailbox in the desktop client, and go to Mailbox->Export Mailbox, and choose a destination folder.
Step 3: From mbox to csv
If you try to save your pristine mbox file as a csv, you will get a one column csv. Don’t do that. Instead, use these python scripts (also up on github).
The first script opens a blank csv file, and fills it with the subject, from, and date lines for each message in your mbox. I called it mbox_parser.py.
import mailbox import csv writer = csv.writer(open("clean_mail.csv", "wb")) for message in mailbox.mbox('your_mbox_name'): writer.writerow([message['subject'], message['from'], message['date']])
If you don’t know what python is, you can still run this script. Here’s how:
1) copy the above code to a plain text file, and save it as mbox_parser.py. Save it to the same folder you saved your mbox file to.
2) open your terminal (spotlight–>terminal)
3) type cd Users/your_account_name/directory_where_you_saved_your_mbox,
4) type python mbox_parser.py
5) Voila! In your directory, you should see a new file, cleaner.csv.
You’ll notice that the ‘date’ column is a long, jam-packed date string. It’ll be much easier to extract insight if I split this puppy up so that the day of the week is separated from the calendar date. Unfortunately, this is not totally a walk in the park. More like a walk on the wild side, heheh.
6) Save the following script as split_date.py and run python split_date.py clean_mail.csv dated_mail.csv in your terminal:
#!/usr/bin/env python import sys import csv infile = open(sys.argv[1]) outfile = open(sys.argv[2], 'w') writer = csv.writer(outfile) for line in infile: splits = line.split(",") newline = [] if len(splits) == 4: for i in range(0,3): newline.append(splits[i]) else: newline.append("None") print newline writer.writerow(newline) infile.close() outfile.close()
7) Lastly, let’s add a header row to your csv:
file_read = csv.DictReader(open('dated_mail.csv', 'rb'), ['subject', 'from', 'day', 'date']) file_write = csv.DictWriter(open('final_mail.csv', 'wb'), ['subject', 'from', 'day','date']) file_write.writeheader() file_write.writerows(file_read)
Share the post "Carpe Datum: How to Export Your GMail to Excel"