Progmatism.com

Thunderjudge

Manually creating mail filters is so Web 1.0

Thunderjudge is an extension for Mozilla Thunderbird that acts as a generalized statistical filter. It works just like the built-in junk filter: you tell it that a few emails belong to a certain category, then it looks for clues and automatically decides how future emails should be categorized. You may use any number of different categories, and can easily add new categories, merge categories, and rename categories.
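
To make the idea concrete, here is a minimal sketch of a statistical filter with any number of categories. It is a plain naive Bayes classifier written in Python, not Thunderjudge's actual engine. The CategoryFilter class, its method names, and the tokenizer are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class CategoryFilter:
    """Toy multi-category naive Bayes classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # category -> token -> count
        self.message_counts = Counter()           # category -> trained messages

    def train(self, category, text):
        """Record one message as an example of `category`."""
        self.token_counts[category].update(tokenize(text))
        self.message_counts[category] += 1

    def merge(self, source, target):
        """Fold the training data of `source` into `target`."""
        if source == target:
            return
        self.token_counts[target].update(self.token_counts.pop(source))
        self.message_counts[target] += self.message_counts.pop(source)

    def classify(self, text):
        """Return the most probable category, or None if nothing is trained."""
        if not self.message_counts:
            return None
        vocabulary = {t for counts in self.token_counts.values() for t in counts}
        total_messages = sum(self.message_counts.values())
        best_category, best_score = None, -math.inf
        for category, counts in self.token_counts.items():
            # Log prior: how common this category is in the training set.
            score = math.log(self.message_counts[category] / total_messages)
            denominator = sum(counts.values()) + len(vocabulary) + 1
            for token in tokenize(text):
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

In this sketch, renaming a category is just a merge into a name that doesn't exist yet, for example merge("family", "personal").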

Well, that's the idea, at least. As of now, Thunderjudge is being released as a 0.5 Alpha, and development is being put on hold. It's just too unstable, and I can't do anything about it.

Here's a list of problems:

  • The filtering is just too slow. I'll probably be able to fix this by moving the engine to XPCOM, which I'm currently sinking my teeth into.
  • The problem with folders not updating has been fixed. But, and this is much, much worse, arbitrary message corruption sometimes happens. For example, when multiple messages arrive and get moved to a folder, message A might end up with its own body text but message B's headers. I have no clue what is causing this (it looks like a race condition in Thunderbird's message database), and I have neither the time nor the desire to find out.
  • Similarly, messages will sometimes just disappear while being moved. This occurs much less frequently, but it is completely silent and leaves no evidence that it has happened.
  • The file format used for training data is horrible (space-separated values). Either XPCOM or MozStorage could fix this.
  • It has not been tested on IMAP. I have no clue what might happen with it.
  • Due to limitations of the Spam filter, Thunderjudge can only run after you've downloaded an entire batch of mail. If you cancel a download before it finishes, Thunderjudge will not try to classify any mail (and neither will the Spam filter). I can't do anything about this except hope that the Thunderbird team eventually makes the Spam filter work like the built-in rule-based filter and fleshes out the filterPlugin interface.
  • Sometimes, moving mail into a newly created folder doesn't fire mail movement events (so the classifier is not trained with this mail). There's not much I can do about this, but I have a workaround in mind that I can implement once I fix the file format (which I can do once I switch to XPCOM!)
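
On the training-data file format: MozStorage is built on SQLite, so the kind of fix I have in mind can be sketched with Python's sqlite3 module as a stand-in. This is only a rough sketch under assumptions. The token_counts table, its columns, and the helper functions are hypothetical and are not part of Thunderjudge.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a training-data store with a hypothetical schema."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS token_counts (
               category    TEXT NOT NULL,
               token       TEXT NOT NULL,
               occurrences INTEGER NOT NULL,
               PRIMARY KEY (category, token)
           )"""
    )
    return db


def add_tokens(db, category, tokens):
    """Increment the stored count for each token, all in one transaction."""
    with db:  # commits on success, rolls back if an exception is raised
        for token in tokens:
            updated = db.execute(
                "UPDATE token_counts SET occurrences = occurrences + 1 "
                "WHERE category = ? AND token = ?",
                (category, token),
            ).rowcount
            if updated == 0:
                # First time this token is seen in this category.
                db.execute(
                    "INSERT INTO token_counts (category, token, occurrences) "
                    "VALUES (?, ?, 1)",
                    (category, token),
                )


def counts_for(db, category):
    """Return {token: occurrences} for one category."""
    rows = db.execute(
        "SELECT token, occurrences FROM token_counts WHERE category = ?",
        (category,),
    )
    return dict(rows)
```

Compared with a space-separated file, a table like this can hold tokens and category names that contain spaces without any escaping, and a batch of updates either lands completely or not at all.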

For now, I'm providing this so that others can see the idea and, if they like, try it out. If you choose to do so, I'd recommend running Thunderbird in a separate profile (using MOZ_NO_REMOTE so you can run a second process alongside your main one), and POP all of your mail to each profile. Do not use this extension on your primary mail account unless you're backing up your mail. Hopefully, at some point in the future, the circumstances will change and I'll be able to complete this project, but for now there's just too much standing in the way of getting it to work well enough.

In case that happens, I also have a few small features I'd like to add:

  • Merge categories. Should be pretty easy actually.
  • A general category manager. Just needs a bit of UI work.
  • The classification algorithm is still simple (only one round of the tournament), but I need to make it more efficient before changing this.
  • Make TJ into an official filterPlugin object. For this I believe there's nothing I can do. I'll have to wait until the actual interface is provided by the core.
  • There's a bug where deleting a category doesn't remove it from the "Catagorize" context menu item until restart. This is an easy fix.

Usage

Actually using Thunderjudge is easy. Just drag a message into the folder corresponding to the category it should be classified as. Alternatively, right-click a message and choose "Classify as" to pick an existing category, or even to file it under a brand-new category. When new mail arrives, it will automatically be classified based on the existing training data and placed into the correct category folder. If for some reason a message is not classified automatically, you may right-click on it and choose "Auto classify".

Version 0.5 Alpha Release

Screenshots