Saturday, August 11, 2007

Cheep V1agra 4 SAle!!1!!

Well, this blog is back, after a brief, um, let's say "technical outage".

No, that's too charitable. Let's say "poor-policy-induced outage". Or perhaps "ridiculously-brain-dead-and-just-plain-wacko-policy-induced outage".

This article will discuss what happened, and why Blogger's policy (and therefore Google's policy, since they own Blogger) is so terrible.

I've got an article coming up on the new iMacs and related announcements, as well as an article from an HCI (Human-Computer Interaction) perspective on Copying and Pasting on the iPhone. The latter has been 90% finished for weeks (almost a month, actually, now that I look at the date), and I really hope to have it finished and up soon. Really I do.


So what happened to this blog?

I got an email on Tuesday from someone saying they couldn't reach my blog. More specifically, when they tried, they got a "404 Not Found" error, which is a webserver's way of telling you that the webpage you tried to access doesn't exist.

Well, since my blog should indeed exist, I went and took a look at it to see what was up. And, sure enough, I got a 404 error.

Hmm. Well, I hadn't changed anything recently (in fact, I hadn't even posted recently), so I knew it wasn't something that I had done. And, even if I had made any recent changes, I shouldn't have been able to muck things up in such a way that I would end up with a 404 error [1]. So, the only logical explanation was that something was wrong on Blogger's end of things.

I take a look at the Blogger blog, the Blogger status page, and the Blogger known issues page, and there are no listed outages.

Next, I decide to try to log in and see if things look normal from the Blogger Dashboard (the web interface that you use to manage your blogs).

I'm greeted with this:

Restore access to this blog

Restore access to this blog? That can't be good.

Clicking on the little question mark takes us to a help page with the following:
Why is my blog disabled?
If your blog is disabled, it will be listed on your Dashboard, but you will not be able to click on it to access it. If this is the case, there will be a grace period during which you can request that it be reviewed and recovered. The disabling is a result of our automated classification system marking it as spam. Because this system is automated there will necessarily be some false positives, though we're continually working on improving our algorithms to avoid these. If your blog is not a spam blog, then it was one of the false positives, and we apologize.
I see.

Clicking on the provided link takes us to a page with the following:
Blogger's spam-prevention robots have detected that your blog has characteristics of a spam blog. (What's a spam blog?) Since you're an actual person reading this, your blog is probably not a spam blog. Automated spam detection is inherently fuzzy, and we sincerely apologize for this false positive.
You won't be able to access your blog until one of our humans reviews it and verifies that it is not a spam blog. Please fill out the form below to get a review. We'll take a look at your blog and restore it in less than a business day.
If we don't hear from you within the next 20 days, your blog will be permanently deleted.

I submitted an appeal Wednesday at 7:48 PM, and the blog was reactivated Thursday at 6:29 PM. Which isn't that bad a turnaround, I suppose.


A brief aside: What's a Spam Blog?

Basically, it's a fake blog whose sole purpose in life is to promote the shady websites associated with spammers and scammers. They either copy text from other blogs (or some site like Wikipedia), or simply make up random gibberish. They then link to their own shady site, with the goal of increasing the number of pages that link to it. The motivation is that Google (and other search engines) assign greater relevance to sites that are linked to with greater frequency, since the (usually valid) assumption is that a site with more links to it is more popular, and therefore more significant.
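To make the incentive concrete, here's a toy sketch of ranking by inbound links. It's a deliberately naive count, not Google's actual algorithm (which is far more sophisticated), and the function name and the example domains are all made up for illustration.

```python
from collections import Counter

def inbound_link_counts(links):
    """Count how many distinct pages link to each target site.

    `links` is an iterable of (source_page, target_site) pairs.
    This is a naive stand-in for link-based relevance, not a real
    search engine's ranking function.
    """
    # set() drops duplicate (source, target) pairs, so one page linking
    # to the same site many times only counts once.
    return Counter(target for _source, target in set(links))

# Two genuine blogs link to an honest site (hypothetical domains).
honest_links = [
    ("blog-a.example", "honest-site.example"),
    ("blog-b.example", "honest-site.example"),
]

# A spammer creates 50 throwaway blogs that all link to one shady site.
spam_links = [
    (f"spam-blog-{i}.example", "shady-site.example") for i in range(50)
]

counts = inbound_link_counts(honest_links + spam_links)
top_site = counts.most_common(1)[0][0]
```

Under this naive measure the shady site wins by a wide margin, which is exactly why spam blogs are worth a spammer's time and why blogging services try to detect them.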
Blogger's help page on Spam Blogs defines them as follows:
As with many powerful tools, blogging services can be both used and abused. The ease of creating and updating webpages with Blogger has made it particularly prone to a form of behavior known as link spamming. Blogs engaged in this behavior are called spam blogs, and can be recognized by their irrelevant, repetitive, or nonsensical text, along with a large number of links, usually all pointing to a single site.
OK, OK, I get it. Sure, my writing is irrelevant and/or nonsensical. But repetitive? I didn't think that it was repetitive. I mean, do I really repeat things over and over? Am I really redundantly repetitive, with continued superfluous uses of identical phrases?
Bart: You're right, Mom. I shouldn't let this bother me. I'm in television now. It's my job to be repetitive. My job. My job. Repetitiveness is my job. I am going to go out there tonight and give the best performance of my life.
Marge: The best performance of your life?
Bart: The best performance of my life!
Episode 1F11, "Bart Gets Famous"

Anyway, the short version is that Google's anti-spam analyzer flagged my blog as spam. In and of itself, this is actually kinda funny [2], and I understand that these things happen. Algorithms make mistakes, particularly when it comes to doing something as subtle and subjective as analyzing the motivation behind a blog. It's the same as the way that everyone's had at least one real email message get marked as spam [3].

But, what's not funny, and what really gets me, is the way in which this was handled by Google.

They never emailed me to inform me that my blog was flagged as spam.

Basically, if Google decides that your blog is spam, they disable it without telling you. You're guilty until proven innocent, since you have to petition to get it reinstated. And, if you don't file a petition within 20 days, they delete your blog. However, the only way you know that you need to file this petition is if you log in and look at your blog. They don't email you. They don't even post anything on the blog itself (they just give the aforementioned generic 404 error). They just disable it.

The obvious solution is to change the notification system so that if a blog gets flagged as spam, an email goes out to the blog's owner. This email tells you that you've been flagged, and gives you a link to appeal the decision. If you don't appeal within a few days (or, say, a week), then they can feel free to disable the blog, and the "20 days to deletion" policy can kick in. If you do appeal, your blog gets "recertified" without ever having been taken down.
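Here's a rough sketch of that proposed flow as a tiny state machine. To be clear, this is my suggestion written out as code, not how Blogger actually works: the names (`Blog`, `flag_as_spam`, `appeal`, `tick`) are invented, the one-week appeal window is just the guess from the paragraph above, and a real system would put a human review between the appeal and the reinstatement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Callable, Optional

class Status(Enum):
    ACTIVE = "active"
    FLAGGED = "flagged"    # owner has been emailed; blog is still online
    DISABLED = "disabled"  # blog is offline; deletion clock is running
    DELETED = "deleted"

APPEAL_WINDOW = timedelta(days=7)     # assumption: "a few days (or, say, a week)"
DELETION_WINDOW = timedelta(days=20)  # the 20 days from Blogger's notice

@dataclass
class Blog:
    owner_email: str
    status: Status = Status.ACTIVE
    flagged_at: Optional[datetime] = None
    disabled_at: Optional[datetime] = None

def flag_as_spam(blog: Blog, now: datetime,
                 send_email: Callable[[str, str], None]) -> None:
    """Flag the blog but leave it online, and tell the owner right away."""
    blog.status = Status.FLAGGED
    blog.flagged_at = now
    send_email(blog.owner_email,
               "Your blog was flagged as spam. Use the appeal link to contest this.")

def appeal(blog: Blog) -> None:
    """Owner appeals; the review is assumed to succeed in this sketch."""
    if blog.status in (Status.FLAGGED, Status.DISABLED):
        blog.status = Status.ACTIVE
        blog.flagged_at = None
        blog.disabled_at = None

def tick(blog: Blog, now: datetime) -> None:
    """Periodic check that advances the blog through the time-based steps."""
    if blog.status is Status.FLAGGED and now - blog.flagged_at >= APPEAL_WINDOW:
        blog.status = Status.DISABLED
        blog.disabled_at = now
    elif blog.status is Status.DISABLED and now - blog.disabled_at >= DELETION_WINDOW:
        blog.status = Status.DELETED
```

The point of the extra FLAGGED state is that the owner hears about the problem while the blog is still up, so a false positive never has to result in a 404.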

I've sent an email to Blogger saying as much, and I'll let you know if I get any response. I somehow doubt I will.

I'm also thinking of moving off of Blogger to another blogging platform. I had chosen Blogger solely due to its connection to Google (a company that I respect), but this experience has left me a bit soured on the platform. If anyone has any thoughts about alternatives, I'd appreciate hearing them.


Apparently I'm not alone: Google mistakenly deleted one of its own corporate blogs under what appear to be identical circumstances.
Readers of Google Inc.'s Custom Search Blog were handed a bit of a surprise Tuesday when the Web site was temporarily removed from the blogosphere and hijacked by someone unaffiliated with the company.
The problem? Google had mistakenly identified its own blog as a spammer's site and handed it over to another person.
"Blogger's spam classifier misidentified the Custom Search Blog as spam," [a Google spokesperson] said via e-mail on Wednesday. Typically Google notifies blog owners when it has spotted content associated with spam on their Web sites to give them a chance to clear up any misunderstandings.
However, that didn't work out in this case. "The Custom Search Blog bloggers overlooked their notification, and after a period of time passed, the blog was disabled."
When blogs are disabled like this, their URL becomes available to the general public. That's when Srikanth [the person who hijacked it] swooped in and wrote the joke post.
Here's another article about it, which includes a different official Google statement:
Whoops! We accidentally classified ourselves as spam, and our ever-perceptive Blogger settings caught us. The Custom Search Blog has since been restored, and we’re taking steps to ensure this doesn’t happen with other Google blogs in the future. Other Blogger users can make sure this doesn’t happen to them by reporting any problems to the Blogger support team via the Blogger Help Center at We can then investigate.
I honestly don't know what they mean here. "Other Blogger users can make sure this doesn’t happen to them by reporting any problems to the Blogger support team...". Was I supposed to send a message in advance asking them to not randomly delete my blog?

Furthermore, it appears that a flagged blog gets opened up for anyone to hijack (by allowing them to register a new blog with the now vacant name). This doesn't make any sense, and in fact makes matters even worse. I could (maybe) understand if that happened after the 20 days have elapsed, but in this case that didn't seem to happen -- the blog was hijacked immediately after being flagged as spam [4]. Now, I didn't try to hijack my own blog by starting a new one with the same name from a different account (it hadn't occurred to me to try), but I would be curious if that would have worked.
And lastly, as I already mentioned, I didn't receive any notification by email (or by any other means). I checked my spam folders to be sure (although I wouldn't have thought that Gmail would classify an urgent Blogger message as spam), and there was nothing. This flatly contradicts the official statements in the aforementioned articles. So, either the spokesperson is bullshitting, or there was some bug that impacted a bunch [5] of blogs. I'm guessing it's the latter, and a bug in Google's anti-spam system caused a bunch of non-spam blogs to get flagged and deleted immediately.

The fact that my blog's deletion seems to stem from a bug, rather than from a brain-dead policy, makes things a bit better, but I'm still not too impressed by the situation. I suppose that I can't complain too heavily about a free service not working correctly (that whole "gift horse" thing), but still, I remain extraordinarily unimpressed.


Cheep V1agra 4 Sale!!1!! LOw lOW pr1ces, but only 4 a limmited t1ime!!1! Apt now! Confident@lity insured!!

  1. Sure, I could do stuff like screw up the formatting, or delete articles, but nothing short of actually deleting the blog should have yielded this kind of error message. 
  2. Jonathan's writing is indistinguishable from spammer gibberish! [insert Nelson-esque ha ha]
  3. And the same way that someone I know with the last name of "Dick" kept ending up with his emails marked as porn spam. 
  4. I can't imagine that the blog would have been giving a 404 error for almost three weeks without anyone noticing. 
  5. Where "bunch" >= 2.