We understand copium, but to properly describe the stupidity I read in an ArsTechnica post I sought a more refined term, so I asked some LLMs: “…is there a word to describe the state of self-delusion that one is being subversive and challenging authority through disruptive means?”
ChatGPT suggested a neologism:
Kakistodicy [is] a blend I created based on kakistocracy (rule by the worst) and theodicy (justification of a system despite its flaws)
…which is not quite there, but (with a little poking) Claude did a better job:
“Puerversive” could work – combining “puerile” (childishly foolish and immature) with “subversive.” This would capture both the genuine attempt at subversion and its childish, ineffective nature. This construction follows similar patterns to words like “perversive” or “subversive” while incorporating the Latin “puer” (meaning child/boyish), suggesting behavior that’s rebellious in an immature, naive way rather than being purely fake or performative. It would describe someone who is genuinely trying to be subversive, but in ways that are foolishly inadequate or betray a childish understanding of power structures – like vandalizing a bathroom stall and genuinely believing it’s a blow against the system.
And so we come to ArsTechnica‘s reportage of Nepenthes:
Last summer, Anthropic inspired backlash when its ClaudeBot AI crawler was accused of hammering websites a million or more times a day. […] Building on an anti-spam cybersecurity tactic known as tarpitting, [“Aaron”, a pseudonym] created Nepenthes, malicious software named after a carnivorous plant that will “eat just about anything that finds its way inside.”
Aaron clearly warns users that Nepenthes is aggressive malware. It’s not to be deployed by site owners uncomfortable with trapping AI crawlers and sending them down an “infinite maze” of static files with no exit links, where they “get stuck” and “thrash around” for months, he tells users. Once trapped, the crawlers can be fed gibberish data, aka Markov babble, which is designed to poison AI models. That’s likely an appealing bonus feature for any site owners who, like Aaron, are fed up with paying for AI scraping and just want to watch AI burn.
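For scale of the threat: “Markov babble” is nothing exotic, just a random walk over word-successor statistics, producible in a dozen lines of awk. A toy sketch follows, assuming a seed file corpus.txt; this is illustrative only, not Aaron’s actual code:

```
# Toy Markov babble generator (illustrative only, not Nepenthes' code).
# Builds a word-bigram successor table from corpus.txt (assumed to exist)
# and emits ~200 words of plausible-looking gibberish by random walk.
awk '
  {
    for (i = 1; i <= NF; i++) {
      words[++n] = $i                               # pool of all words, for restarts
      if (i < NF) succ[$i] = succ[$i] " " $(i + 1)  # successor list per word
    }
  }
  END {
    srand()
    w = words[int(rand() * n) + 1]                  # random starting word
    for (out = 0; out < 200; out++) {
      printf "%s ", w
      m = split(succ[w], cand, " ")                 # candidate successors
      if (m == 0) w = words[int(rand() * n) + 1]    # dead end: jump anywhere
      else        w = cand[int(rand() * m) + 1]     # else pick a random successor
    }
    print ""
  }
' corpus.txt
```

Each run emits text that is locally plausible and globally meaningless, which is the whole trick.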
Let’s call this what it is: a column-filler article reporting a puerile hack that purports to stick it to the man but is in actuality a one-off speedbump with negligible impact, unworthy of the attention. Sometimes entire universities engage in the same game (Glaze, Nightshade, Fawkes), but the impact is about the same.
As for Nepenthes, it turns out that all you have to do is ask an AI for the general principles of how best to configure curl (see below); everything else you will need to avoid Nepenthes is on the curl manual page, and having implemented it once for your scraper, you are done forever.
DeepSeek
consider: you are a malware analyst scraping a potentially hostile website with curl on the command line; what options and features of curl do you enable in order to ensure that your scrape completes in a reasonable amount of time, avoids traps and ratholes and excessively large files and content that is being trickled out slowly so that individual fetches may take artificially long times to complete, if ever? think like an infosec redteamer who is willing to sacrifice thorough coverage of the website scrape in favour of getting a substantive sample of many URLs in a reasonable time
To effectively use curl for scraping a potentially hostile website while avoiding traps and ensuring efficiency, consider the following structured approach:
curl Command Setup
- Timeouts and Limits:
  - Connection Timeout: Set with `--connect-timeout 5` to limit the initial connection time to 5 seconds.
  - DNS Timeout: Use `--dns-timeout 5` to cap DNS resolution at 5 seconds.
  - Maximum Time: Apply `--max-time 10` to ensure the entire operation doesn’t exceed 10 seconds.
- Data Management:
  - Max Filesize: Limit downloads to 1MB using `--max-filesize 1M` to avoid large files.
  - Speed Limit: Abort slow transfers with `--speed-limit 100 --speed-time 20`, stopping if under 100 bytes/s for 20 seconds.
- Redirection and HTTP/2:
  - Redirect Limit: Prevent endless loops with `-L --max-redirs 5`, allowing up to 5 redirects.
  - HTTP/2: Enable for efficiency with `--http2`.
- User-Agent and Output:
  - User-Agent: Spoof a browser with `-A "Chrome/XX.X.XXXX.XX"` to avoid detection.
  - Silent Output: Use `-s` to suppress progress meter clutter.
  - Output File: Save results with `-o output.html`.
- Security Considerations:
  - HTTPS Verification: Skip with `-k` if necessary, though use cautiously.
Final Command
curl -L --max-redirs 5 --max-time 10 --max-filesize 1M --speed-limit 100 --speed-time 20 --connect-timeout 5 --dns-timeout 5 --http2 -A "Chrome/XX.X.XXXX.XX" -s -k -o output.html http://example.com
Optional Enhancements
- Proxy Usage: Add `-x PROXY:PORT` for anonymity.
- Resuming Downloads: Use `-C -` if partial downloads are useful.
- HTTP Version: Force with `--http1.1` if server issues arise.
This setup balances thoroughness with efficiency, ensuring a swift and safe scraping process.
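Which supports the “done forever” point above: wrap the flags once and reuse. A sketch under my own assumptions (the function name, output naming, and urls.txt are mine, not DeepSeek’s; I have dropped `-k`, since disabling TLS verification is rarely wise, and `--dns-timeout`, since `--connect-timeout` already bounds the name-resolution phase):

```
#!/usr/bin/env bash
# safe_fetch: curl with the anti-tarpit flags baked in; one URL per call.
# Flag values mirror the DeepSeek answer above; tune to taste.
safe_fetch() {
  curl -L --max-redirs 5 \
       --connect-timeout 5 --max-time 10 \
       --max-filesize 1M \
       --speed-limit 100 --speed-time 20 \
       --http2 -s -A "Mozilla/5.0" \
       -o "$(printf '%s' "$1" | tr -c 'A-Za-z0-9' '_').html" \
       "$1" \
    || printf 'skipped (curl exit %s): %s\n' "$?" "$1" >&2   # 28 = timeout, 63 = too large
}

# Worst case is bounded: --max-time guarantees no fetch runs past 10 seconds,
# so N URLs complete in at most ~10*N seconds even against a tarpit.
while IFS= read -r url; do safe_fetch "$url"; done < urls.txt
```

To watch the anti-trickle defense fire, point it at httpbin’s deliberately slow /drip endpoint (e.g. https://httpbin.org/drip?duration=30&numbytes=100); curl should bail out with exit code 28 when `--max-time` expires instead of waiting out the dribble.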