unixronin | Dec. 14th, 2006

"So," I thought yesterday, "it's really time I updated Dspam, I should do it today, it'll only take me an hour or two." I was running v3.2.0. The current version is v3.6.8.

It didn't take "an hour or two". It took twelve. About eight of those were spent just getting dspam itself working. I tried, tried and tried again to make it work in the recommended mode, with dspam running as a re-injecting content-filtering daemon. Postfix receives mail on port 25, does all of its normal filtering, relays it to dspam via a Unix-domain socket, dspam does its thing, reinjects it to Postfix on a different port, Postfix delivers it. But try as I might, it wouldn't work. The dspam daemon wouldn't accept connections to its socket. Or it'd segfault for no apparent reason. Or it'd accept a message, then pass it on unfiltered because it couldn't classify it either as spam or as not spam. (I'm not certain whether it thought the message was in a superimposed quantum state of being both spam and nonspam simultaneously, or whether it thought it was in some kind of existential Zen state in which it was neither spam nor non-spam, but something purer, above such worldly distinctions. Either way, it wasn't dealing with it well.) Recompiling and reconfiguring with maximal possible debugging enabled (and I do mean all possible debugging output) provided precisely no useful information about the problem whatsoever. Neither did the advice I tried at one point of turning on the virtual-users feature. (I did discover that the notifications feature is horribly broken; if you enable it, 'make install' never bothers to install the notification templates, and then when dspam tries to open one and fails because it isn't there, it dies.)

Finally I said "Fuck this shit" and set it up as an inline content filter the way I had 3.2.0 running. Postfix receives mail, does its filtering, pipes it to a script which pipes it through dspam and then to Postfix's sendmail local-delivery agent for delivery if it's non-local, or directly to sendmail for delivery if it's local and can safely be assumed to be non-spam. Set up like that, it actually worked.
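For anyone trying to picture that wrapper script, here is a minimal Python sketch of the routing it describes. It is not the actual script: the function names are made up for illustration, and the real dspam and sendmail command lines (which this post doesn't show) are deliberately left as parameters. The stand-in commands at the bottom exist only so the sketch runs anywhere.

```python
import subprocess
import sys
from typing import Sequence


def run_stage(cmd: Sequence[str], message: bytes) -> bytes:
    """Feed `message` to `cmd` on stdin and return whatever it writes to stdout."""
    result = subprocess.run(cmd, input=message, stdout=subprocess.PIPE, check=True)
    return result.stdout


def filter_and_deliver(message: bytes, is_local: bool,
                       filter_cmd: Sequence[str],
                       deliver_cmd: Sequence[str]) -> bytes:
    """Local mail skips the spam filter; everything else goes through it first.

    filter_cmd and deliver_cmd are hypothetical placeholders for the real
    dspam and sendmail invocations, which are not shown in the post.
    """
    if not is_local:
        message = run_stage(filter_cmd, message)
    return run_stage(deliver_cmd, message)


# Stand-ins for demonstration only: a fake "filter" that prepends a header
# and a fake "delivery agent" that echoes its input back.
FAKE_FILTER = [sys.executable, "-c",
               "import sys; sys.stdout.write('X-Filtered: yes\\n' + sys.stdin.read())"]
FAKE_DELIVER = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read())"]
```

With the stand-ins, `filter_and_deliver(msg, False, FAKE_FILTER, FAKE_DELIVER)` returns the message with the extra header, while passing `True` returns it untouched. A real delivery agent would normally write nothing to stdout; the return value is there only to make the flow visible.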

Then I spent another four hours debugging and cleaning up code in the web UI (which is the easiest way to monitor the state of your spam quarantine, fix false positives, check performance stats, and relearn misidentifications) to make IT work; I remember being appalled last time by how buggy the web code was, and this time around is no different. The major problem this time around was that the UI kept failing because it "was unable to determine my identity". Which is to say, it previously knew my identity, but had forgotten it. ...Well, OK, it hadn't really forgotten my identity so much as actively thrown it away. It took me quite a bit of time and frustration to figure out exactly where that was happening.

A few tweaks to the quarantine code ... as shipped, there's a "Delete everything checked" and a "Deliver everything checked, they're not spam" button right next to each other, both above and below the quarantine list, as well as "Delete all" buttons. The "delete checked" and "delete all" carry a popup "Are you sure?" onclick() for confirmation. The "Deliver checked" doesn't. This is bad design. I modified this the same way I modified the previous version: "Deliver checked" and "Delete all" at the top, "Delete checked" and "Delete all" at the bottom, confirmation warning on the top buttons, and by personal preference, none on the bottom buttons. (I could conceivably, through massive clumsiness, click the wrong top button. I'm not going to accidentally click one of the buttons below the quarantine list when I meant to click one of the buttons above.) And again, by personal preference, a 60-second META HTTP-REFRESH on the Performance page (which also means the header tab of the Quarantine page gets updated every 60 seconds, so I have an up-to-date-within-60-seconds count of messages in my quarantine visible from the Performance page at all times).

An hour or so spent retraining the new version with a 6,000-message ham/spam corpus, and thus far it's managed 93.75% accuracy on the first 26 messages delivered after checkpointing my stats post-training. Complete retraining from scratch was required because the data storage format has changed. Let's see how this goes.

So far, the major improvement I see in the dspam CGI is the history screen, which now shows up to eight pages of recently-received messages, 100 messages per page, and on which you can now mark all incorrectly-identified messages (either identified as innocent or as spam) and click a single button to retrain them all correctly. That's a nice feature. The auto-whitelist algorithm has been improved, too; a single misclassification will no longer kick a sender out of auto-whitelist status. This should make the whitelist work a lot better.

In a few days, the top end of the corpus should have pushed down off the recent-history list, and I'll be able to see better how well the new whitelisting works and how well the new version is doing at filtering. Anyway, for now it's all back up and working.

But would I have updated it if I'd known in advance what a ghastly, awful battle it was going to be? Or would I have tried a different spam-filtering solution (maybe CRM114)? I don't know. I switched to Dspam from SpamAssassin because SpamAssassin just wasn't working for me; I had to be constantly tweaking the rulesets by hand to exceed even 90% accuracy, while Dspam -- left all to itself -- exceeds 99% accuracy in a pretty short time.

(Update: In the time since I quoted that performance figure above, three more messages have come in, all correctly identified, and overall accuracy has climbed to 94.444%.)

(Update 2: By 1300E it's at 96.491% on 55 messages since stats reset.)
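As a rough sketch of the arithmetic behind figures like these, assuming accuracy is nothing more than correct / total (dspam's own statistics page may well compute it differently), with a hypothetical helper name:

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Plain accuracy: correctly classified messages as a percentage of all messages."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return round(100.0 * correct / total, 3)


# Illustrative counts only, not taken from dspam's actual counters.
example = accuracy_percent(55, 57)
```

Under that assumption, 55 correct out of 57 comes to 96.491%, which lines up with the last figure if "55 messages" means the correctly handled ones; that reading is a guess, not something the stats page confirmed.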

So sometime in the last few days, a "pre-approved" platinum MasterCard application showed up in the mail. I just today got around to looking at it.

For the most part, the terms really aren't bad at all. Adjustable rate, of course, 6.74% spread above prime rate for purchases (currently 14.99%), 0% introductory rate on purchases and balance transfers for the first 12 billing cycles, no annual fee, 2% minimum finance charge in any billing cycle in which a finance charge is due.
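To make the adjustable-rate part concrete: a variable APR like this is just the index (here, the prime rate) plus the fixed spread. A small sketch of that arithmetic, using only the two numbers from the disclosure; the function name and the one-point rate rise are hypothetical, for illustration only.

```python
from decimal import Decimal


def purchase_apr(prime_rate: Decimal, spread: Decimal) -> Decimal:
    """Variable APR = prime rate + fixed spread, both in percentage points."""
    return prime_rate + spread


SPREAD = Decimal("6.74")        # spread above prime, from the disclosure
CURRENT_APR = Decimal("14.99")  # current purchase APR, from the disclosure

# Working backwards, the prime rate implied by those two figures.
implied_prime = CURRENT_APR - SPREAD

# Hypothetical: what the APR would become if prime rose by one point.
apr_if_prime_rises = purchase_apr(implied_prime + Decimal("1.00"), SPREAD)
```

The implied prime rate works out to 8.25%, and every point prime moves carries straight through to the APR, quite apart from whatever the issuer decides to change on its own.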

There's this one half-paragraph in the disclosures that bothers me, though:

"We have the right to change your APRs, fees and other terms at any time, for any reason including, but not limited to, any change in your credit history, credit obligations, account performance, use of your credit line with us or any creditor, or our financial return."

(Emphasis mine)

Now, correct me if I'm wrong, but that basically translates to "Once you sign the agreement, you're boned. We can do whatever we want, whenever we want, for whatever reason we feel like, including plain naked greed or you, like, actually using your card, and we can require you to supply the anal lube."

Most credit card solicitations we've received in the last few years I've thrown straight in the trash after a cursory glance at the rate disclosures. Is the above as egregious as it seems to me, or is that simply de rigueur for all credit solicitations these days?

Habemus plus vis computatoris quam Deus

Further ramblings of a Unix ronin

Ungah

Whereas the party of the first part is, like, totally boned
