Some years ago I vaguely remember reading an article by some member or other of the Blognoscenti, making the stunningly obvious observation that web-pages can be indexed by URL, or that they might be retrieved by embedding some utterly unique word which appears only on that page (and its replicas) and can be indexed by Google.
For some this was a stunning revelation; for me it was an underwhelming trick, since I’ve been using much the same technique to retrieve webpages for years: my memory – although flaky and lacking the ability to recall context well – has a remarkable aptitude for quotation. Thus being able to remember the phrases “Westie”, “Odin”, “Horror-Horn”, and “Cross-Dress” is all that is necessary to retrieve one of my all-time-favourite USENET postings.[1]
However: RunningLongWordsTogetherToCreateAToken is the very meat of Wikidom, and has thus become rather popular; this I support, though I am beginning to wonder whether the Blogspammers, noticing trends towards banning comment submissions containing URLs and the like, aren’t trying to sidestep this threat to their revenue by adopting Wiki-like tactics.
I can’t say for certain that he was describing an example of this, but Geoff recently posted:
This blog has been under blogspam attack for the last couple of days, and I haven’t been able to fix it. It seems from searching around that I’m not the only victim (which is good). Curiously the attacks seem to be purely disruptive: the comments being injected don’t include commercial messages, or p0rn, or URLs to be pagerank-promoted. All the same, the cost/load/admin effort involved is significant.
…and I’ve likewise been noticing an upswing – for three or four months – in commentspam which is apparently clueless, in that no Referer: header is set, no HTML/URLs are sent, and no fields in the POST submission exist other than the small amount of ASCII text which comprises the comment body. In other words, I’m not dropping anything on the floor which might contain a more subtle method of PageRank boostage.
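To make that pattern concrete, here is a minimal Python sketch of a check for such "bare" submissions. Everything in it is an assumption made for illustration: the function name `looks_bare`, the plain-dict `headers` and `fields` arguments, the `comment` field name, and the crude link regex. It does not describe how any particular spamtrap works.

```python
import re

# Crude test for anything URL- or HTML-like in the comment text (illustrative only).
LINKISH = re.compile(r"https?://|www\.|<[a-zA-Z/]", re.IGNORECASE)

def looks_bare(headers, fields):
    """Return True for a POST with no Referer, no links/markup, and only a comment body.

    `headers` and `fields` are hypothetical plain dicts of strings.
    """
    has_referer = any(name.lower() == "referer" for name in headers)
    # Names of the form fields that actually carry some text.
    non_empty = {name for name, value in fields.items() if value.strip()}
    only_comment = non_empty == {"comment"}
    body = fields.get("comment", "")
    return (not has_referer) and only_comment and not LINKISH.search(body)
```

A check like this only flags the pattern; whether a flagged comment is really spam still needs a human or a second heuristic, since a genuine commenter behind a privacy proxy can look just as bare.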
Then, this evening whilst watching the rerun of ROME on BBC1, my spamtrap caught this blogspam:
name: Jonn
title: AirToolsStore
comment: Hi! How to me to adjust a background of page?
…and that’s all; it looks pointless until you feed “AirToolsStore” into Google, and see how many instances of this very same blogspam have been accruing in the wild for a month or more.
Then it struck me that a WikiWord like AirToolsStore – or any other long keyword – is really a rather effective means of PageRank hacking: it slips under the radar just like any other keyword-oriented meme or brand (AYBABTU and Rocketboom, to name but two). It’ll bypass 95% of all filters, and it’s discreet and unobtrusive to the point that many people will assume that it’s the result of human cluelessness and leave it up, rather than actively purging it.
Except it’s not. Not clueless, nor aimless. The above was posted by a robot, using out of date / week-old page information that gave it away. It was premeditated.
So: here’s my theory: in the face of URL filtering and referrer-blockage, the comment spammers have realised that the one remaining hard-to-filter aspect of blogspace is the text of a comment itself, and are trying to turn the very mechanism of Google back on itself by means of this “keyword spamming” – perhaps “WikiSpamming”? – trusting their efforts to create a meme or brand that Google will eventually tie back to the website which paid for the advertising in the first place.
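If that theory holds, one cheap countermeasure would be to look for run-together WikiWords in the submitted fields. The following is a rough Python sketch, not a tested filter: the `WIKIWORD` pattern and the `wikiwords` helper are hypothetical names, and the regex will also catch legitimate words such as "JavaScript" or "McDonald", so it is a signal for review rather than grounds for automatic deletion.

```python
import re

# Two or more capitalised runs joined together, e.g. "AirToolsStore".
# All-caps tokens (AYBABTU) and single-capital words (Rocketboom) do not match.
WIKIWORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")

def wikiwords(fields):
    """Return every WikiWord-style token found in the values of a dict of form fields."""
    found = []
    for value in fields.values():
        found.extend(WIKIWORD.findall(value))
    return found

# The blogspam quoted earlier in this post, as a dict of its three fields.
spam = {
    "name": "Jonn",
    "title": "AirToolsStore",
    "comment": "Hi! How to me to adjust a background of page?",
}
print(wikiwords(spam))  # prints ['AirToolsStore']
```

Counting how often the same token turns up across otherwise unrelated comments would probably be a stronger signal than any single match.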
The worrying thing is that it doesn’t work terribly well, and relies on a bulk spamming shotgun approach to work at all. If you Google AirToolsStore the #1 link is not for the shop, but instead the top hits are the most popular blogs and pages which have been hit. That’s one level of indirection away. That’s almost failure. To get it to go higher will (presumably) require more “votes” and thus more spammage.
<…time passes, a little more research ensues…>
Sheesh: they’ve even hit a Wiki at GATech and seem to have left an e-mail alerter for changes though I am disinclined to dig into the validity of that e-mail address; it may just be bogus.
Whatever, to get the word onto a Wiki strikes me as much more likely to be a manual effort, whereas the spam I got today was definitely automated.
Something’s definitely up. Some spammer is being creative.
WikiSpam / KeywordSpam, here we come.
—
[1] Retrieval of which is left as an exercise for the interested reader.