Removing duplicate mails


2009-09-08 11:53 - Removing duplicate mails

To sync my mail between computers I use offlineimap on a secure filesystem. Today I mistakenly ran offlineimap before mounting the secure filesystem, which caused it to duplicate all emails. Not wanting to do any manual work to fix this, I wrote a small Python 3 program that repaired the damage:

import mailbox
import glob
import os.path

def dedupe(maildir):
    '''Removes duplicates from the given dir'''

    box = mailbox.Maildir(maildir, create=False)
    box.lock()

    # Set of Message IDs we have seen
    seen = set()

    # Set of message keys to delete
    delete = set()

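    # Search for messages to delete
    for (key, message) in box.iteritems():
        mid = message['Message-Id']

        # Messages without a Message-Id header cannot be compared,
        # so never treat them as duplicates of each other
        if mid is None:
            continue

        # If we have seen this Message ID before,
        # remember it for deletion
        if mid in seen:
            delete.add(key)

        seen.add(mid)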

    # Delete the messages
    for key in delete:
        box.remove(key)

    box.close()

# Iterate over all subdirectories as maildirs
for path in glob.glob('*'):
    if not os.path.isdir(path):
        continue

    dedupe(path)
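
Since the script deletes mail, a dry run on a throwaway maildir may be worth doing first. The following is only a sketch under a few assumptions: `find_duplicates` and `make_message` are hypothetical helper names that are not part of the script above, the example.org Message IDs are made up, and the test maildir is created in a temporary directory rather than in a real mail store. It applies the same Message-Id comparison but only reports the keys that would be removed.

```python
import mailbox
import os.path
import tempfile
from email.message import EmailMessage


def find_duplicates(maildir):
    '''Return the keys of messages whose Message-Id was seen before.

    Hypothetical dry-run helper: nothing is deleted.
    '''
    box = mailbox.Maildir(maildir, create=False)
    seen = set()
    duplicates = []

    for key, message in box.iteritems():
        mid = message['Message-Id']

        # Skip messages that carry no Message-Id header
        if mid is None:
            continue

        if mid in seen:
            duplicates.append(key)

        seen.add(mid)

    box.close()
    return duplicates


def make_message(mid, subject):
    '''Build a small test message (hypothetical helper).'''
    msg = EmailMessage()
    msg['Message-Id'] = mid
    msg['Subject'] = subject
    msg.set_content('test body')
    return msg


# Build a throwaway maildir holding one duplicated message
with tempfile.TemporaryDirectory() as tmp:
    # The maildir path must not exist yet, so that Maildir
    # creates the tmp/new/cur subdirectories itself
    path = os.path.join(tmp, 'inbox')

    box = mailbox.Maildir(path, create=True)
    box.add(make_message('<a@example.org>', 'first'))
    box.add(make_message('<a@example.org>', 'first again'))
    box.add(make_message('<b@example.org>', 'second'))
    box.close()

    found = find_duplicates(path)
    print(len(found), 'duplicate(s) would be removed')
```

Once the reported keys look right, the same list could be passed to `box.remove()` one key at a time, as the script above does.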

I just put this here so I wouldn't lose it. Perhaps you'll find some use for it too.
