Removing duplicate mails

To sync my mail between computers I use offlineimap on a secure filesystem. Today I mistakenly ran offlineimap before mounting the secure filesystem, which caused it to duplicate all emails. Not wanting to do any manual work to fix this, I wrote a small Python 3 program that repaired the damage.

import mailbox
import glob
import os.path

def dedupe(maildir):
    '''Removes duplicate messages (same Message-Id) from the given maildir'''

    box = mailbox.Maildir(maildir, create=False)
    box.lock()

    seen = set()   # Set of Message IDs we have seen
    delete = set() # Set of message keys to delete

    # Search for messages to delete
    for (key, message) in box.iteritems():
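        mid = message['Message-Id']

        # Messages without a Message-Id cannot be compared; keep them all
        # instead of treating every one after the first as a duplicate.
        if mid is None:
            continue
        mid = mid.strip()

        # If we have seen this Message ID before,
        # remember it for deletion
        if mid in seen:
            delete.add(key)

        seen.add(mid)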

    # Delete the messages
    for key in delete:
        box.remove(key)

    box.close()

# Treat every subdirectory of the current directory as a maildir
for path in glob.glob('*'):
    if not os.path.isdir(path):
        continue

    dedupe(path)
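
Before letting a script like this delete anything from a real mailbox, it can be reassuring to try the idea on a throwaway Maildir first. The sketch below is not part of the original script: it assumes a hypothetical dedupe_box() helper with a dry_run flag, plus a hypothetical raw() helper that builds minimal test messages. It creates a temporary Maildir with one known duplicate and two messages without a Message-Id, reports what would be removed, and then removes it. Messages lacking a Message-Id are always kept.

```python
import mailbox
import os
import tempfile


def dedupe_box(box, dry_run=False):
    """Remove (or, with dry_run=True, only report) duplicate messages.

    Hypothetical helper. Returns the keys of the duplicates. Messages that
    lack a Message-Id header are always kept: there is nothing to compare.
    """
    seen = set()
    duplicates = []
    for key, message in box.iteritems():
        mid = message['Message-Id']
        if mid is None:
            continue  # no Message-Id: never treat as a duplicate
        mid = mid.strip()
        if mid in seen:
            duplicates.append(key)
        else:
            seen.add(mid)

    # Delete only after iterating, so we never modify the box mid-loop
    if not dry_run:
        for key in duplicates:
            box.remove(key)
    return duplicates


def raw(mid, subject):
    """Hypothetical test helper: a minimal message as a string.

    Passing mid=None omits the Message-Id header entirely.
    """
    header = 'Message-Id: %s\n' % mid if mid else ''
    return header + 'Subject: %s\n\nbody\n' % subject


with tempfile.TemporaryDirectory() as tmp:
    box = mailbox.Maildir(os.path.join(tmp, 'INBOX'), create=True)
    for mid, subject in [('<a@example.com>', 'a'),
                         ('<a@example.com>', 'a copy'),
                         ('<b@example.com>', 'b'),
                         (None, 'no id 1'),
                         (None, 'no id 2')]:
        box.add(raw(mid, subject))

    # First only report, then actually delete
    would_remove = dedupe_box(box, dry_run=True)
    count_after_dry_run = len(box)
    removed = dedupe_box(box)
    remaining = len(box)
    remaining_ids = sorted(m['Message-Id'] for m in box if m['Message-Id'])
    box.close()

print(len(would_remove), count_after_dry_run, len(removed), remaining)
```

The order in which a Maildir yields its messages is not guaranteed, so which copy of a duplicate survives is arbitrary. For true duplicates, as produced by an accidental second sync, that should not matter.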

I just put this here so I wouldn’t lose it. Perhaps you’ll find a use for it too.

dr. Sybren A. Stüvel
Open Source software developer, photographer, drummer, and electronics tinkerer
