Make GMail’s Spam Filter Work For You
If you run a *nix box and know a little bit about Procmail and SpamAssassin, this might be of interest to you.
Like practically everyone on the planet, I get a lot of spam. I’ve owned my personal domain for about 10 years, so the spammers have had plenty of time to put me on just about every spam list in existence.
SpamAssassin catches about 99% of my spam, but since I get anywhere from 5,000 to 7,500 spams a day, about 50 to 75 still get through to my personal account. The first thing I do every morning is sift through all that spam so I can read the one or two legitimate emails. It is a less-than-optimal start to my day.
I’ve had a GMail account almost since they launched, and it too gets a lot of spam. But the froody thing about GMail is that it catches close to 100% of the spam I get, and I can’t remember the last time it had a false positive. To put this in perspective, GMail catches over 500 spams a day for me, and misses about 1 a month. It’s the best spam filter I’ve seen yet, and it’s free. Wouldn’t it be nice if I could use GMail’s anti-spam system for my personal email?
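Those figures are easier to compare as miss rates. Here is a quick back-of-the-envelope check in Python; it only uses the numbers quoted above, plus one assumption of mine, a 30-day month for the GMail figures.

```python
# Rough miss-rate comparison using the figures quoted in the post.
# Assumption (not from the post): a 30-day month for the GMail numbers.

def miss_rate(missed: float, total: float) -> float:
    """Fraction of spam that got past the filter."""
    return missed / total

# SpamAssassin: 50-75 misses out of 5,000-7,500 spams per day.
sa_low = miss_rate(50, 5_000)
sa_high = miss_rate(75, 7_500)

# GMail: about 1 miss per month at roughly 500 spams per day.
gmail = miss_rate(1, 500 * 30)

print(f"SpamAssassin misses {sa_low:.2%} to {sa_high:.2%}")
print(f"GMail misses about {gmail:.4%}")
print(f"GMail lets through roughly {sa_low / gmail:.0f}x fewer spams")
```

Under that assumption GMail lets through roughly one spam in 15,000 versus SpamAssassin’s one in 100. The two accounts see different spam streams, so treat this as a rough comparison only.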
God, I love Unix.
So I came up with a way to combine Procmail with GMail’s ability to filter email, which led to another really cool discovery that I’ll also share. Here’s how I did it.
Assume your email address is you@example.com, and your GMail account is you@gmail.com.
1: You have 2 options here.
a) If you want a dedicated GMail account to do nothing but filter, get a GMail invite and create a new GMail account, go into Settings, select the “Forwarding and POP/IMAP” tab, and under Forwarding select “Forward a copy of incoming mail to _you@example.com_ and [keep GMail’s copy in the inbox.]” Why would you want a dedicated account? All will be revealed…
b) If you want to use an existing GMail account, and still use that account for non-filtering purposes, create a filter in Settings|Filters that forwards all email addressed to you@example.com back to you@example.com (this will make sense very shortly). Have the filter apply a label of your choosing, skip the inbox, and archive the message automatically. This keeps your GMail inbox tidy.
2: Add the following recipe to ~/.procmailrc on your *nix account:
# Forward all mail to GMail unless GMail has already forwarded it back
:0
* !^X-Forwarded-For: you@gmail\.com.*
! you@gmail.com
This recipe forwards every incoming message to your GMail account unless it already carries the ‘X-Forwarded-For: you@gmail.com you@example.com’ header, which GMail adds when it forwards a message back to you. That check is what stops mail from bouncing between the two accounts forever. The “.*” at the end of the X-Forwarded-For line matches whatever destination address follows, which is handy if you get email addressed to multiple domains or accounts.
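For readers who think better in a general-purpose language, here is a rough Python sketch of the decision the recipe makes. It is an illustration, not a replacement for Procmail, and it assumes the placeholder addresses above and that GMail’s forwarded copy carries the X-Forwarded-For header described here. The route() function is a hypothetical name; like Procmail’s default header matching, the check is case-insensitive.

```python
# Python sketch of the decision the procmail recipe above makes.
# Assumptions: placeholder addresses from this post, and GMail's forwarded
# copy carries "X-Forwarded-For: you@gmail.com you@example.com".
import re
from email import message_from_string
from email.message import Message

GMAIL_ADDR = "you@gmail.com"  # placeholder from the post

# Same idea as the procmail condition: the header value starts with the
# GMail address (matched case-insensitively, as procmail does by default).
ALREADY_FILTERED = re.compile(r"^" + re.escape(GMAIL_ADDR), re.IGNORECASE)

def route(msg: Message) -> str:
    """Return 'forward' to send the message to GMail, or 'deliver' to keep it."""
    for value in msg.get_all("X-Forwarded-For", []):
        if ALREADY_FILTERED.match(value.strip()):
            return "deliver"  # GMail has already seen and forwarded it
    return "forward"          # fresh mail: send it through GMail first

# Usage: a fresh message versus one that has come back from GMail.
fresh = message_from_string("From: a@b.org\nTo: you@example.com\n\nhi\n")
back = message_from_string(
    "X-Forwarded-For: you@gmail.com you@example.com\n"
    "To: you@example.com\n\nhi\n"
)
print(route(fresh))  # forward
print(route(back))   # deliver
```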
Here’s what makes this system work: any spam that reaches your GMail account is trapped by GMail’s spam filter and never forwarded back to you@example.com. No action is necessary on your part.
So essentially your flow of email is now:
Sender -> you@example.com ->
you@gmail.com (spam filter CRUSH!!!!) ->
you@example.com (ahhhh…pure refreshing de-spammed email)
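The round trip above can be modelled in a few lines to show why it terminates. This is a toy Python model, not real mail handling: the spam flag stands in for GMail’s classifier, and the names Mail, gmail() and example_com() are hypothetical.

```python
# Toy model of the round trip: example.com -> GMail -> example.com.
# Assumptions: a boolean spam flag stands in for GMail's real filter, and
# messages are tiny dataclasses rather than real email.
from dataclasses import dataclass, field
from typing import Optional

GMAIL = "you@gmail.com"

@dataclass
class Mail:
    subject: str
    spam: bool
    headers: dict = field(default_factory=dict)

def gmail(mail: Mail) -> Optional[Mail]:
    """Stand-in for GMail: trap spam, otherwise forward back with the header."""
    if mail.spam:
        return None  # stays in GMail's spam folder, never forwarded
    mail.headers["X-Forwarded-For"] = f"{GMAIL} you@example.com"
    return mail

def example_com(mail: Mail, inbox: list, hops: list) -> None:
    """Stand-in for the procmail recipe on the *nix box."""
    hops.append("example.com")
    if mail.headers.get("X-Forwarded-For", "").startswith(GMAIL):
        inbox.append(mail.subject)  # already filtered: deliver locally
        return
    hops.append("gmail")
    returned = gmail(mail)
    if returned is not None:
        example_com(returned, inbox, hops)

# Usage: one legitimate message and one spam.
inbox = []
for m in [Mail("lunch?", spam=False), Mail("cheap pills", spam=True)]:
    hops = []
    example_com(m, inbox, hops)
    print(m.subject, "->", " -> ".join(hops))
print(inbox)
```

Each message visits GMail at most once: a legitimate one comes back tagged and is delivered, while spam simply never returns.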
Here’s the extra special bonus: because GMail keeps its copy, you now have an off-site backup of all of your personal email in your GMail account. Now we dyed-in-the-wool mutt users have an “undelete” function too, and when our workstation takes a whoopsie, our email is still backed up. This is why I chose a dedicated GMail account that does nothing but filter for me. I can use all of the disk space GMail gives my account to back up my personal email, and it’s all right there in one place. (NOTE: Google makes absolutely no data integrity promises, and neither do I. If your email is important enough that you need real backups, then you should find an alternate and reliable backup solution.)
I implemented this system last night, and for the first time since I can’t even remember when, I woke up to ZERO spam. I haven’t cried as much since the last time I watched Old Yeller. The original one, not the super crappy remake. But I digress.
One thing that’s occurred to me, but which I haven’t implemented yet, is that I could configure mutt to automatically Bcc me on all outgoing email. That would send all outgoing mail through GMail for archiving purposes. A simple procmail recipe could then either delete all Bcc’d email or file it in another folder, so I wouldn’t see the dupes in my inbox.
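The filing step might look something like this, sketched in Python rather than procmail. It assumes your own outgoing copies are recognisable by a From: address of you@example.com (which would also catch any other mail you send yourself); folder_for() and the “sent-copies” folder name are hypothetical.

```python
# Sketch of "file my own Bcc'd copies elsewhere", in Python rather than
# procmail. Assumptions: outgoing mail has From: you@example.com, and
# "sent-copies" / "inbox" are hypothetical folder names.
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

ME = "you@example.com"  # placeholder from the post

def folder_for(msg: Message) -> str:
    """Pick a folder: my own Bcc'd copies go to 'sent-copies', the rest to 'inbox'."""
    _, sender = parseaddr(msg.get("From", ""))
    return "sent-copies" if sender.lower() == ME else "inbox"

# Usage: one of my own outgoing copies and one normal incoming message.
outgoing = message_from_string(
    "From: Me <you@example.com>\nTo: pal@example.org\n\nhi\n"
)
incoming = message_from_string(
    "From: pal@example.org\nTo: you@example.com\n\nhey\n"
)
print(folder_for(outgoing))  # sent-copies
print(folder_for(incoming))  # inbox
```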
EDIT: Actually, getting a copy of all outgoing email into your GMail account is even simpler than this. Just set the Bcc field in your mailer to your GMail address. The copy is automatically appended to the relevant conversation, and you won’t have to muck around with Procmail. In mutt, use the “my_hdr” command in your .muttrc (for example, “my_hdr Bcc: you@gmail.com”).
Enjoy, and feel free to share with anyone you think might benefit. And if you come up with alternate/better ways to implement, please share.
January 2nd, 2008 at 2:46 pm
Any reason you don’t just use Google Apps, where you can have your own domain for free and reap all the benefits of a web-based client?
January 2nd, 2008 at 3:25 pm
GMail is the best web-based mail client out there, IMHO. However, as good as it is, I’m not enamored of web-based applications as they stand now. Part of it is that they’re still somewhat slow, and part of it is that I am an old fart and still cling to my command-line ways. I’ve used pine, and more recently mutt, for close to 20 years. mutt does what I need it to do, and because it’s completely text-based and written in C, it’s as fast as you can type. On the occasion that I do need to see an email written in HTML, I can just bounce it to my GMail account and view it there. Actually, now that I’m filtering through GMail, I just need to look at GMail’s copy. So I really have the best of both worlds now.
But even if you’re not a mutt/pine/elm user, this method has other applications. Some businesses block common web-based email sites such as AOL, GMail, and Yahoo on the (mostly misguided) notion that checking your personal email lowers productivity or exposes your work computer to viruses. This method would work perfectly well with a free web-based client such as Squirrelmail on a remote *nix box. Of course, the repercussions of doing an end-run around corporate security are left as an exercise for the reader, but it is possible.
Basically, this method allows me to keep working the way I have always worked, without having to switch to something else just because of a single problem. So far it seems to have worked out pretty well.
January 2nd, 2008 at 4:03 pm
I was surprised to read your opinion about the near 100% success rate of GMail at catching spam, and lack of false positives.
My experience has been diametrically opposite: while I have been lucky enough to receive absolutely no spam, GMail has marked a significant number of my legitimate e-mails as spam.
When I first realised what was happening I searched in vain for an option to disable spam filtering. I then wrote to the GMail team, indicating my frustration.
The problem is that I use GMail as an account, but check mail with a desktop client (Thunderbird), so I’ll either have to occasionally check to see if any mail has been filtered via the web interface, or change e-mail service.
January 2nd, 2008 at 6:12 pm
i’m with Joe. quite a few false positives. and no way to create a “whitelist” that i can find other than adding contacts to your gmail contact list…
been doing this for a while and I just go once a week and scan my spam folder on gmail (via the web interface). mark stuff as “not spam” and it shows up on my side the next time i pop.
the main problem is lack of a white list! i haven’t looked into it yet, but i’m hoping to be able to programmatically add people to my gmail contact list, that might help…
January 3rd, 2008 at 12:56 am
I’ve been using gmail as a spam filter for several years now, in nearly exactly the way you describe, except I use their webmail interface all the time. Not because I love it, but because I love being able to have identical mailboxes on my laptop and my desktop. Now that gmail supports IMAP, I may go to Thunderbird, but so far I have no reason to.
As for false positives, yes, there were some but I’ve trained gmail out of that. Every now and then I find one and mark it as not-spam and gmail seems to learn from that.
Another hidden gem of gmail is the ability to configure it to send mail as your own domain (after you prove it’s your email address by receiving a verification email).
I have been recommending that all my users get a gmail acct and pop their mail through gmail. Not only do they get the extra spam filtering, they get a web mail interface they can use on the road, and their huge mailboxes don’t accumulate on my server.
My setup is sendmail with a bunch of RBLs + some milters (including a spamassassin milter) then gmail. I still see the odd spam in gmail, for me it’s about one a day.
One problem I have with this setup is that, with people sending larger and larger attachments in mail these days, my gmail box fills up slightly faster than their generous, ever-increasing space allotment grows.