https://twitter.com/search/realtime?q=email+at+%22dot+com%22
Anybody want to whack a parser on top of this?
Solution: get a better spam filter. Evolve.
by Alec Muffett
…and for the Brits:
https://twitter.com/search?q=email%20at%20%22dot%20co%20dot%20uk%22
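For anyone squinting at the percent-encoding, here is a minimal Python sketch that decodes the phrase each of those URLs searches for. The helper name `search_phrase` is made up for illustration; the standard-library calls are `urllib.parse.urlsplit` and `urllib.parse.parse_qs`.

```python
from urllib.parse import parse_qs, urlsplit


def search_phrase(url):
    """Return the decoded value of the 'q' parameter of a search URL."""
    query = urlsplit(url).query
    # parse_qs turns both '+' and '%20' into spaces and '%22' into a quote
    return parse_qs(query)["q"][0]


if __name__ == "__main__":
    for url in (
        "https://twitter.com/search/realtime?q=email+at+%22dot+com%22",
        "https://twitter.com/search?q=email%20at%20%22dot%20co%20dot%20uk%22",
    ):
        print(search_phrase(url))
```

Both searches are for the word "email", the word "at", and then a quoted phrase ("dot com" or "dot co dot uk").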
Weirdly, I think some of the “at foo dot com” has been preserved not only by people wanting to slightly obfuscate against spam-spiders, but also as a hedge against “helpful” systems that automatically obfuscate anything that matches an email regexp in order to… obfuscate against spam-spiders.
(e.g. I’ve seen pages on the net that render as “drop me an email at username@[redacted]”.)
True – I have seen that, too; but I think it’s rarer.
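A rough guess at what such a “helpful” system does, as a Python sketch. The pattern `EMAIL_RE` and the function `redact_domains` are hypothetical, and the regexp is a loose “looks like an email” match rather than anything a real site is known to use.

```python
import re

# Hypothetical, deliberately loose pattern: local part, "@", dotted domain.
EMAIL_RE = re.compile(r"([\w.+-]+)@[\w-]+(?:\.[\w-]+)+")


def redact_domains(text):
    """Keep the local part and replace the domain with a placeholder."""
    return EMAIL_RE.sub(r"\1@[redacted]", text)


if __name__ == "__main__":
    print(redact_domains("drop me an email at username@example.com"))
    print(redact_domains("drop me an email at username at example dot com"))
```

The first line comes out with the domain replaced; the spelled-out second line passes through untouched, which is exactly why people keep writing it that way.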
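while (<>) {
    chomp;
    $input = $_;                  # keep the raw line (unused below)
    tr!A-Z[](){}<>!a-z !;         # lowercase; all brackets become spaces
    s!\@! at !go;                 # "@" becomes the word "at"
    s!\b! !go;                    # pad every word boundary so tokens split cleanly
    s!\.! dot !go;                # "." becomes the word "dot"
    # reverse the word order so the match can anchor on the TLD
    @rwords = reverse(split(" "));
    if ("@rwords" =~
        m!((com|uk)\s(dot\s[-\w]+\s)+at\s([-\w]+\sdot\s)*[-\w]+)!oi) {
        print "email:\t";
        # put the words back in order, swapping in the punctuation
        foreach $word (reverse(split(" ", $1))) {
            if ($word eq 'dot') { print '.' }
            elsif ($word eq 'at') { print '@' }
            else { print $word; }
        }
        print "\n";
    }
}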
$ perl text-mail-parse.pl
alec [dot] muffett{@}GMAIL dot Com
email: alec.muffett@gmail.com
firstname DOT lastname {AT} sitename dot co [dot] UK
email: firstname.lastname@sitename.co.uk
No, it’s not perfect, and yes, it could be improved, but I’ll bet it matches 80% of the Twitter search content correctly…
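For the Perl-averse, the same steps can be sketched in Python. This is a port for illustration, not the script above: the names `PATTERN` and `extract_email` are made up, and it inherits the original's limitations (only `com` and `uk` endings, and hyphenated names get split at the word-boundary step).

```python
import re

# Same shape as the Perl regexp, applied to the words in reverse order:
# TLD, one or more "dot <label>", "at", then the local part.
PATTERN = re.compile(
    r"(?:com|uk)\s(?:dot\s[-\w]+\s)+at\s(?:[-\w]+\sdot\s)*[-\w]+",
    re.IGNORECASE,
)


def extract_email(line):
    """Return the de-obfuscated address found in line, or None."""
    text = line.lower()
    text = re.sub(r"[\[\](){}<>]", " ", text)  # brackets become spaces
    text = text.replace("@", " at ")           # "@" becomes the word "at"
    text = re.sub(r"\b", " ", text)            # pad every word boundary
    text = text.replace(".", " dot ")          # "." becomes the word "dot"
    reversed_words = " ".join(reversed(text.split()))
    match = PATTERN.search(reversed_words)
    if match is None:
        return None
    # Restore the original order and swap the words for punctuation.
    swap = {"dot": ".", "at": "@"}
    return "".join(swap.get(w, w) for w in reversed(match.group(0).split()))


if __name__ == "__main__":
    print(extract_email("alec [dot] muffett{@}GMAIL dot Com"))
    print(extract_email("first dog last cat domain dog com"))
```

The first call prints `alec.muffett@gmail.com`; the second prints `None`, since nothing in that line reads as “at” or “dot”.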
@fanf – aka: dot at dotat dot at – is excused. 🙂
first dog last cat domain dog com