BREAKING: food bloggers are adding nightshade and other poisons to online recipe pages to defeat “cooked.wiki” and other advert/revenue-stripping robots

The poisons are being added to recipe ingredients in very small, white-on-white fonts that are invisible to the human eye but are picked up by recipe-scraping robots, which include them in the resulting output, putting users at risk of killing themselves and their loved ones.

This is, of course, satire; but it’s really weird to me when skimming Mastodon, Threads and Twitter to see near-sequential posts which one moment are lauding cooked.wiki for stripping all the revenue-bearing crap out of online recipes:

This is great, just add “cooked.wiki/” to the beginning of the URL for a recipe and it removes all the stuff you don’t care about.

cite: various

…and the next are praising the anti-corporatist subversion of Nightshade to supposedly mess with AI-generated models, vigilante-stylee:

But whereas the Chicago team designed Glaze to be a defensive tool — and still recommends artists use it in addition to Nightshade to prevent an artist’s style from being imitated by AI models — Nightshade is designed to be “an offensive tool.” An AI model that ended up training on many images altered or “shaded” with Nightshade would likely erroneously categorize objects going forward for all users of that model, even in images that had not been shaded with Nightshade.

https://venturebeat.com/ai/nightshade-the-free-tool-that-poisons-ai-models-is-now-available-for-artists-to-use/

It’s weird to me that people who support the stripping of content revenue and control from food bloggers may also be the ones condemning the training of models on content which is otherwise entirely open to human “eyeball scraping” and “human learning.”

The impact of cooked.wiki on recipe pages is huge. Take, for instance, this (delicious) Whole Orange Cake from The Modern Nonna (TMN).

I’m presuming that cooked.wiki caches/deduplicates results locally for reasons of efficiency and popularity-metrics. Certainly they plan their offering to save those recipes permanently, thereby depriving the author of any future revenue from browser hits.

Speaking as an amateur cook, I must admit that I found TMN’s page to be practically unusable on mobile, which is why I loaded it through cooked.wiki: I literally could not find the ingredients section under all the adverts. TMN heavily promotes the recipe on sites like TikTok, but (wisely?) they refrain from sharing the recipe on that site, instead driving viewers to TMN’s advertising-bearing website (“recipe on www.themodernnonna (link in bio)”) to get the information… thereby making advertising revenue. Doing this is not illegitimate, even though it’s a huge pain to someone trying to load their blender whilst a toddler is demanding attention.

So: scraping food-blogger websites for greater accessibility and to build a permanent third-party cache of recipes is good, but scraping author/artist websites to build a third-party model of art is bad?

Or vice-versa? Or is it more complex than that?

Perhaps people need to think about what they really believe.

Postscript

Some commentators are submitting that “recipes have no copyright so this is all okay” – I am very aware that multiple judgements over time have declared that recipes are not copyrightable because they are a “process” or “a matter of fact”, and this is fine, but it does not impact the thrust of what I am saying.

The relevant point is: recipes may not be copyrightable but the manner in which they are expressed is very copyrightable, and what is being scraped here is the copyrighted expression of the recipe from a food blogger’s blog.

It may even be considered an artistic expression with much historical context at extraordinary length from the life story of the food blogger concerned… which is why we need the wiki tools in the first place.

Comments

11 responses to “BREAKING: food bloggers are adding nightshade and other poisons to online recipe pages to defeat “cooked.wiki” and other advert/revenue-stripping robots”

  1. @alecm but when browsing for an omelette recipe I do like to read all the pages about the author’s first encounter with eggs, their complicated relationship with their mother over eggs, the complete history of eggs, the foreign holiday with an egg, other uses for eggs, egg benefits, eggs with benefits, egg romance stories, professional photos of eggs … and right at the bottom tucked away in a small font whose colour matches the page background … the recipe

    1. Not to mention advertisements for different kinds of eggs that you could buy from Temu

  2. But does that make it right to strip their content and copy it to another site? Essentially to build a third party database of the valuable bits of the content… A bit like training an artificial intelligence large language model to replicate a painter’s style.

    1. Tony

      It’s an omelette recipe. They didn’t invent the omelette, they probably copied it from another site in the first place. The bit that is ‘their content’ is what is being stripped out.

      1. Tony, I think you need to go reread the upstream content

  3. @alecm I think I had reached paragraph 6 when I realized this isn’t about tomatoes?

  4. @alecm something I find funny: some of the recipes extracted by cooked.wiki appear to be AI-generated. In particular, any of the recipes saved from “foodreli” are weirdly different than what’s on foodreli now. The ingredients lists are exactly the same, but the instructions are very different. Example: cooked.wiki/saved/c5880088-e78a-4e79-b12f-0cff929f5efa

    I suspect foodreli noticed cooked.wiki and changed their pages so that the recipes are split into two pages, and re-generated them

  5. Does Cooked hamper food bloggers’ revenue? Yes.
    But.
    A half-baked (and maybe overthought) thought of mine (which might be as rambling as the average food blog) is:
    One reason Cooked hits differently than ChatGPT or Midjourney for the average net-dweller is they strip out the opposite content.
    Where the LLMs are built to copy/amalgamate the style of [something], the Cooked wiki is built to find the (for lack of better words) facts of [something].

    Apart from cooking, the food bloggers put a lot of effort into their personal style, both in writing the instructions’ pre-amble and the food photos, but not in the recipes themselves (yes, the average foodies will often come up with somewhat personalized recipes, though maybe not really original or unique enough to be copyrightable.)
    Just like artists put a lot of effort in their personal style of painting or photography, but not in the items they portray (yes, the average artist will often come up with somewhat personalized concepts, though maybe not really original or unique enough to be copyrightable – there are a million “woman sitting and smiling” artworks, but only one or two Mona Lisa).

    Basically, ChatGPT would stripmine a food blog to reproduce the style of the pre-amble when prompted by a user, Midjourney would reproduce the style of the photos when so prompted by a user, while Cooked mines it to show the prompt – any artistic flourish will have to be provided by the user.

    And that’s the thing, I think: people don’t see it as taking someone’s artistic work, but just showing what they used for inspiration and the process used.
    The same group simply don’t like the artistic style of food blogging and would, e.g., rather paint the evening view from van Gogh’s bedroom window themselves than study The Starry Night (or have an “AI” recreate it).

  6. André

    Okay, seems the comments don’t allow for line breaks, sorry about the wall of text.

  7. slackline

    I can see the parallels but there are also differences.

    Unlike with “AI” using data to train models, with cooked.wiki…

    There are no new recipes being created and palmed off as new and unique content.
    The source of the content is clearly cited and available. Conscientious readers might click through to help turn the economic wheels.

    In a sense it’s akin to using an ad-blocker which cleans up pages and makes them considerably more readable (I use multiple layers of ad-blockers on my router and browser plugins, couldn’t see a single ad when I viewed the original Whole Orange Cake recipe).

    1. In a sense you are making my argument for me: what an LLM does is statistically record the “gist” or sense of what the content pertaining to [topic] might look like, so it is the person who queries the database who extracts a fuzzy paragraph or two which is based upon that gist. The process of machine learning – the specific aspect which is under judicial review here – is one of gist extraction, not of rendering. Nothing new is being created, other than something which is arguably a derivative understanding of the topic. The cooked wiki renders a web page which it has derived from the content. The LLM renders a stochastic utterance. One can argue about where the value proposition stops for the end user, but both of them are glorified scrapers.
