Hacker News | lumirth's comments

There was an image of a chart [1] in the email I received announcing this. It is perhaps one of the worst charts I've seen in a while.

[1]: https://service.campaigndelivery.cn/resources/templateImages...


Thank you for this. I can’t emphasize enough how much of a difference this can make.

When I was in high school, I identified the exact model of door knob on my childhood bedroom door and bought one that looked exactly the same, except with a lock. It was the kind of door knob you could unlock with any long, thin piece of metal. The next couple of years were a slow war as I stole every screwdriver, skewer, and dart in the house. Eventually, my parents got used to not being able to barge in. When I moved out, I left a pile of screwdrivers and darts on the kitchen table. They hadn't barged into my room in months. Unsurprisingly, those last couple of months were also the safest and happiest I had ever felt there.

The moral of the story: kids have a natural inclination toward privacy, and will chafe at the lack of it. Trust is difficult to gain, easy to lose, and the trust of your kids is worth more than words can say.


I completely agree with you. Privacy was a hot topic when I was young, because my parents examined my school bag from time to time.

Tangential to the post’s content (and the app is cool, by the way), but is anybody ever bothered by app/website landing pages that look like they’re AI-generated? I’m not even sure most of them are, but current AI models are so effective at creating those icon + feature grids that now every time I see one, my eyebrow raises so high it hits the ceiling.

In this case you're spot on: the landing page is AI generated, simply because I've never enjoyed doing front-end web development myself. Plus, since this app is deeply rooted in Apple's platforms (essentially requiring an Apple Watch to be effective), the majority of users come directly from App Store search rather than through the landing page.

Weird little critique, on the front page of the website you have the following text:

> Claude Code for navigating codebases and getting up to speed fast. It's not magic - it's just the pragmatic choice right now.

This text, with all due respect, sounds so obviously AI-written that it’s painful. The “it’s not [thing] — it’s [other thing]” is a huge AI smell. If you’re talking about the pragmatic choice and “getting up to speed”, it would ring less hollow if the text on your website wasn’t written (or didn’t sound like it was written) by AI. If I’m going to your website, it’s because I want to hear from you, not Gemini, Claude, or ChatGPT.

That said, the blog post itself is an interesting reflection. Though, again, I’d appreciate more of the text being a reflection on your part and less of it just being a paste of the AI’s response.


Yes, it was written with AI; I'll have to refine that. I appreciate the critique, and I'll keep working on my blog writing skills.

This sort of implicitly assumes that ad blockers are particularly common. Most normies I know aren’t using one, and are surprised by how pleasant and functional the web is when they’re at my pi-hole protected apartment. Anecdata, obviously, but am I wrong in my assessment?

Forgive me if this comment is silly: have you thought much about how this might be abused? Are you worried about what legal responsibility you may have, or who might abuse this? I’ve gotten the impression that running such a service is something of a landmine.

Hey, thanks for your comment and it's not silly at all. I've thought about this, but didn't see it as too much of a risk. It's just a small utility tool, not part of a big corporation or anything.

Like any file sharing tool, it can be abused, but to me personally it has value for sending ordinary files over. The primary goal is privacy :)

Hope this answers it for you


obligatory reminder that mcdonald's was, in fact, found to be at fault in that case. they were genuinely serving coffee at third-degree-burn temperatures. the person who spilled the coffee was an elderly woman who spent 8 days in the hospital and needed skin grafts. the case has since been used as an example of frivolous litigation, which i think is a little perverse, given the details.

link: https://en.wikipedia.org/wiki/Liebeck_v._McDonald's_Restaura...


Is the problem with scraping the bandwidth usage or the stealing of the content? The point here doesn’t seem to be obfuscation from direct LLM inference (I mean, I use Shottr on my MacBook to immediately OCR my screenshots) but rather stopping you from ending up in the dataset.

Is there a reason you believe getting filtered out is only a “maybe?” Not getting filtered out would seem to me to imply that LLM training can naturally extract meaning from obfuscated tokens. If that’s the case, LLMs are more impressive than I thought.


The comet browser is different from scraping, though, no? Not that I’d ever use this, but the goal doesn’t seem to be “no AI can ever touch this” but rather “large scale training-data scrapers find useless garbage.”

I'd say it's a good PoC.

They want many users, so they're OK with running OCR for many users. And since they're sending the accessed content through their APIs, they might as well send a copy of it to training.

In conclusion, it seems that mass OCR usage is well within the scope of what the AI companies will do.


That’s the correct text of the article, as far as I can tell. Though not the entirety of it. The author goes on to say that ChatGPT wasn’t able to parse out the underlying text.

Part of the reason it might be useful is not because “no AI can ever read it” (because I’m sure a pentesting-focused Claude Code could get past almost any similar obfuscation), but rather that the completely automated and dumb scrapers stealing your content for the training of the AI models can’t read it. For many systems, that’s more than enough.

That said, I recently completely tore apart my website and rebuilt it from the ground up because I wasn’t happy with how inaccessible it was. For many like me, sacrificing accessibility is not just a bad look, but plainly unacceptable.


I didn't use Claude Code. I just pasted it directly into the web interface and said "I can't read this, can you help?" and then I excerpted the result so you sighted folks didn't have to reread, you could just verify the content matched.

So basically this person has put up a big "fuck you" sign to people like me, while at the same time not protecting their content from actual AI (if this technique actually caught on, it would be trivial to reverse in a data ingestion pipeline).
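To make "trivial" concrete: assuming the obfuscation is a one-to-one letter substitution (a made-up stand-in here; in practice a scraper would recover the real mapping from the font file's glyph table), undoing it is a single `str.translate` call. A toy sketch:

```python
import random
import string

# Stand-in for a custom font's glyph remapping: a seeded random
# permutation of the lowercase alphabet. The real mapping would be
# read out of the font file rather than generated like this.
rng = random.Random(42)
shuffled = list(string.ascii_lowercase)
rng.shuffle(shuffled)

# Forward table (what the site would apply) and its inverse
# (what a data ingestion pipeline would apply to undo it).
encode = str.maketrans(string.ascii_lowercase, "".join(shuffled))
decode = str.maketrans("".join(shuffled), string.ascii_lowercase)

obfuscated = "the quick brown fox".translate(encode)
recovered = obfuscated.translate(decode)
print(recovered)  # -> the quick brown fox
```

Once the inverse table exists, decoding costs essentially nothing per page, which is why this only deters scrapers as long as nobody bothers to add that one step.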


But it's "made with ♥" (the footer says so).

(He's broken mainstream browsers, too - ctrl+f doesn't work in the page.)

GPT 5.2 extracted the correct text, but it definitely struggled - 3m36s, and it had to write a script to do it, and it messed up some of the formatting. It actually found this thread, but rejected that as a solution in the CoT: "The search result gives a decoded excerpt, which seems correct, but I’d rather decode it myself using a font mapping."

I doubt it would be economical to decode unless significant numbers of people were doing this, but it is possible.


This is the point I was making downthread: no scraper will use 3m36s of frontier LLM time to get <100 KB of data. This is why his method would technically achieve what he asked for. Someone alluded to this further down the thread, but I wonder if one-to-one letter substitution specifically would still expose some extractable information to the LLM, even without decoding.

Yes, it's worse for screenreaders, I listed that next to other drawbacks which I acknowledged. I don't intend to apply this method anywhere else due to these drawbacks, because accessibility matters.

It's a proof of concept, and maybe a starting point for somebody else who wants to tackle this problem.

Can LLMs detect and decode the text? Yes, but I'd wager that data cleaning doesn't go so far as to decode the text after scraping.


I didn’t think you did use Claude Code! I was just saying that with AI agents these days, even more thoroughly obfuscated text can probably be de-obfuscated without much effort.

I suppose I don’t know data ingestion that well. Is de-obfuscating really something they do? If I was maintaining such a pipeline and found the associated garbage data, I doubt I’d bother adding a step for the edge case of getting the right caesar cipher to make text coherent. Unless I was fine-tuning a model for a particular topic and a critical resource/expert obfuscated their content, I’d probably just drop it and move on.

That said, after watching my father struggle deeply with the complex computer usage his job requires when he developed cataracts, I don't see any such method as tenable. The proverbial "fuck you" to the disabled folks who interact with one's content is deeply unacceptable. Accessible web content should be mandatory in the same way ramps and handicap parking are, if not more so. For that matter, it shouldn't take seeing a loved one slowly and painfully lose their able body to give a shit about accessibility. Point being, you're right to be pissed, and I'm glad this post got a direct response from somebody with personal experience of needing accessible content so quickly after it went up.

