Removing duplicate mails
To sync my mail between computers I use offlineimap on a secure filesystem. Today I mistakenly ran offlineimap before mounting the secure filesystem, which caused it to duplicate all emails. Not wanting to do any manual work to fix this, I wrote a small Python 3 program that repaired the damage.
import glob
import mailbox
import os.path


def dedupe(maildir):
    '''Removes duplicate messages from the given maildir'''
    box = mailbox.Maildir(maildir, create=False)
    box.lock()
    try:
        seen = set()    # Set of Message IDs we have seen
        delete = set()  # Set of message keys to delete

        # Search for messages to delete
        for key, message in box.iteritems():
            mid = message['Message-Id']
            # Messages without a Message ID cannot be compared;
            # keep them rather than treating them all as duplicates
            if mid is None:
                continue
            # If we have seen this Message ID before,
            # remember it for deletion
            if mid in seen:
                delete.add(key)
            seen.add(mid)

        # Delete the messages
        for key in delete:
            box.remove(key)
    finally:
        # close() flushes pending changes and releases the lock
        box.close()


# Iterate over all subdirectories of the current directory as maildirs
for path in glob.glob('*'):
    if not os.path.isdir(path):
        continue
    dedupe(path)
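Since the program deletes files, it may be worth looking at what it would remove before running it. Here is a minimal sketch of a read-only variant, using the same Message ID comparison. The name find_duplicates is my own hypothetical helper, not part of the program above, and the sketch assumes that a matching Message-Id header really does mean the messages are identical.

```python
import mailbox


def find_duplicates(maildir):
    '''Return the keys of messages whose Message-Id was already seen.

    Nothing is deleted; this only reports what a dedupe run would remove.
    '''
    box = mailbox.Maildir(maildir, create=False)
    seen = set()      # Message IDs encountered so far
    duplicates = []   # Keys of messages that repeat an earlier Message ID
    try:
        for key, message in box.iteritems():
            mid = message['Message-Id']
            if mid is None:
                continue  # no ID to compare, so never report as duplicate
            if mid in seen:
                duplicates.append(key)
            seen.add(mid)
    finally:
        box.close()
    return duplicates
```

Calling find_duplicates on one maildir (for example a folder named INBOX, which is only an illustrative name) and printing the returned keys shows which messages would go, without touching anything on disk.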
I just put this here so I wouldn’t lose it. Perhaps you will find some use for it too.