Hacker News | shawnz's comments

I don't agree that this is end to end encrypted. For example, a compromise of the TEE would mean your data is exposed. In a truly end to end encrypted system, I wouldn't expect a server side compromise to be able to expose my data.

This is similar to the weaselly language Google is now using with the Magic Cue feature ever since Android 16 QPR 1. When it launched, it was local only -- now it's local and in the cloud "with attestation". I don't like this trend and I don't think I'll be using such products.


I agree it is more like e2teee, but I think there is really no alternative beyond TEE + anonymization. Privacy people want it run locally, but that is 5 to 10 years away (or never; if the current economics works, there is no need to reverse the trend).

There's FHE, but that's probably an even more difficult technical challenge than doing everything locally

FHE would be ideal. Relevant conversation from 6 months ago:

https://news.ycombinator.com/item?id=44601023


> ... 5 to 10 years away (or never, if the current economics works...

I think that in 5 to 10 years, PCs that can run SoTA multi-modal LLMs (cf. the Mac Pro) will cost as much as cars do, and I reckon folks will buy them.


ISTM that most people would rather give away their privacy than pay even a single cent for most things.

If (big if) you trust the execution environment, which is apparently auditable, and if (big if) you trust that the TEE Merkle hash used to sign the response is computed based on the TEE as claimed (and not by a malicious actor spoofing a TEE that lives within an evil environment), and also if you trust the inference engine (vLLM / SGLang, what have you), then I guess you can be confident the system is private.
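
Purely to illustrate the shape of that trust chain (everything below is a hypothetical stand-in; the names, the checks, and the expected measurement are made up, not any particular service's actual protocol):

    # Hypothetical sketch of the client-side checks implied by those "ifs".
    from dataclasses import dataclass

    @dataclass
    class Attestation:
        measurement: str      # hash of the software the enclave claims to run
        vendor_signed: bool   # stand-in for verifying the hardware vendor's signature

    # hash of the audited inference stack you expect (made-up placeholder)
    EXPECTED_MEASUREMENT = "hash-of-audited-tee-image"

    def trust_response(att: Attestation) -> bool:
        if not att.vendor_signed:                     # big if #1: a real TEE, not a spoof
            return False
        if att.measurement != EXPECTED_MEASUREMENT:   # big if #2: the audited software
            return False
        return True   # remaining if: you still have to trust that audited build itself

    print(trust_response(Attestation("hash-of-audited-tee-image", True)))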

Lots of ifs there, though. I do trust Moxie in terms of execution; he doesn't seem like the type of person to take half measures.


"Server-side" is a bit of a misnomer here.

Sure, for e.g. E2E email, the expectation is that all the computation occurs on the client, and the server is a dumb store of opaque encrypted stuff.

In a traditional E2E chat app, on the other hand, you've still got a backend service acting as a dumb pipe that shouldn't have the keys to decrypt traffic flowing through it; but you've also got multiple clients: not just your own that share your keybag, but the clients of other users you're communicating with. "E2E", in the context of a chat app, means "messages are encrypted within your client; messages can then only be decrypted within the destination client(s) [i.e. the client(s) of the user(s) in the message thread with you.]"

"E2E AI chat" would be E2E chat, with an LLM. The LLM is the other user in the chat thread with you; and this other user has its own distinct set of devices that it must interact through (because those devices are within the security boundary of its inference infrastructure.) So messages must decrypt on the LLM's side for it to read and reply to, just as they must decrypt on another human user's side for them to read and reply to. The LLM isn't the backend here; the chat servers acting as a "pipe" are the backend, while the LLM is on the same level of the network diagram as the user is.

Let's consider the trivial version of an "E2E AI chat" design, where you physically control and possess the inference infrastructure. The LLM infra is e.g. your home workstation with some beefy GPUs in it. In this version, you can just run Signal on the same workstation, and connect it to the locally-running inference model as an MCP server. Then all your other devices gain the ability to "E2E AI chat" with the agent that resides in your workstation.

The design question, being addressed by Moxie here, is what happens in the non-trivial case, when you aren't in physical possession of any inference infrastructure.

Which is obviously the applicable case to solve for most people, 100% of the time, since most people don't own and won't ever own fancy GPU workstations.

But, perhaps more interesting for us tech-heads that do consider buying such hardware, and would like to solve problems by designing architectures that make use of it... the same design question still pertains, at least somewhat, even when you do "own" the infra; just as long as you aren't in 100% continuous physical possession of it.

You would still want attestation (and whatever else is required here) even for an agent installed on your home workstation, so long as you're planning to ever communicate with it through your little chat gateway when you're not at home. (Which, I mean... why else would you bother with setting up an "E2E AI chat" in the first place, if not to be able to do that?)

Consider: your local flavor of state spooks could wait for you to leave your house; slip in and install a rootkit that directly reads from the inference backend's memory; and then disappear into the night before you get home. And, no matter how highly you presume your abilities to detect that your home has been intruded into / your computer has been modified / etc once you have physical access to those things again... you'd still want to be able to detect a compromise of your machine even before you get home, so that you'll know to avoid speaking to your agent (and thereby the nearby wiretap van) until then.


Just like your mobile device is one end of the end-to-end encryption, the TEE is the other end. If properly implemented, the TEE would measure all software and ensure that there are no side channels that the sensitive data could be read from.

By that logic SSL/TLS is also end-to-end encryption, except it isn't

When the server is the final recipient of a message sent over TLS, then yes, that is end-to-end encryption (for instance if a load balancer is not decrypting traffic in the middle). If the message's final recipient is a third party, then you are correct, an additional layer of encryption would be necessary. The TEE is the execution environment that needs access to the decrypted data to process the AI operations, therefore it is one end of the end-to-end encryption.

This interpretation basically waters down the meaning of end-to-end encryption to the point of uselessness. You may as well just say "encryption".

E2EE is usually applied in contexts where the message's final recipient is NOT the server on the other end of a TLS connection, so yes, this scenario is a stretch. The point is that in the context of an AI chat app, you have to decide on the boundary that you draw around the server components that are processing the request and necessarily need access to decrypted data, and call that one "end" of the connection.

No need to make up hypotheticals. The server isn't the final destination for your LLM requests. The reply needs to come back to you.

If Bob and Alice are in an E2EE chat Bob and Alice are the ends. Even if Bob asks Alice a question and she replies back to Bob, Alice is still an end.

Similarly with AI. The AI is one of the ends of the conversation.


Another fun application of combining LLMs with arithmetic coding is steganography. Here's a project I worked on a while back which effectively uses the opposite technique of what's being done here, to construct a steganographic transformation: https://github.com/shawnz/textcoder
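
To give a rough sense of the embedding direction (this is just a toy sketch, not the actual textcoder code: the real thing uses Llama's conditional next-token distributions, while here a made-up fixed distribution stands in so the snippet is self-contained):

    # Hide a secret bitstring by "arithmetic-decoding" it against a
    # next-token distribution: at each step, emit the token whose
    # probability sub-interval of [lo, hi) contains the secret fraction,
    # then zoom into that sub-interval. The receiver recovers the bits by
    # re-running the same model and arithmetic-encoding the observed choices.

    def bits_to_fraction(bits):
        # interpret the bit list as a binary fraction in [0, 1)
        return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))

    def embed(bits, next_token_probs, n_tokens):
        x = bits_to_fraction(bits)
        lo, hi = 0.0, 1.0
        out = []
        for _ in range(n_tokens):
            dist = next_token_probs(out)  # list of (token, prob) summing to 1
            cum = 0.0
            for i, (tok, p) in enumerate(dist):
                upper = lo + (hi - lo) * (cum + p)
                if x < upper or i == len(dist) - 1:
                    out.append(tok)
                    lo, hi = lo + (hi - lo) * cum, upper
                    break
                cum += p
        return out

    # toy stand-in for an LLM's conditional distribution
    def toy_probs(prefix):
        return [("the ", 0.4), ("a ", 0.3), ("cat ", 0.2), ("sat ", 0.1)]

    print("".join(embed([1, 0, 1, 1, 0, 1], toy_probs, 8)))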

Cool! It creates very plausible encodings.

> The Llama tokenizer used in this project sometimes permits multiple possible tokenizations for a given string.

Not having tokens be a prefix code is thoroughly unfortunate. Do the Llama team consider it a bug? I don't see how to rectify the situation without a full retrain, sadly.


I can't imagine they consider it a bug; it is a common and beneficial property of essentially every LLM today. You want to be able to represent common words with single tokens for efficiency, but at the same time you still need to be able to represent prefixes of those words in the cases where they occur separately.

I find this surprising, but I suppose it must be more efficient overall.

Presumably parsing text into tokens is done in some deterministic way. If it is done by greedily taking the longest-matching prefix that is a token, then when generating text it should be possible to "enrich" tokens that are prefixes of other tokens with additional constraints to force a unique parse: E.g., if "e" is a token but "en" is too, then after generating "e" you must never generate a token that begins with "n". A text generated this way can be deterministically parsed by the greedy parser.
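
As a rough sketch of how that constraint could be computed (assuming a greedy longest-match tokenizer; the vocabulary below is a made-up toy, not any real model's):

    # After emitting token `tok`, ban any next token whose first character
    # could let the greedy parser merge past `tok` into a longer vocab
    # token (e.g. after "e", ban tokens starting with "n" if "en" exists).

    def banned_first_chars(vocab, tok):
        banned = set()
        for other in vocab:
            if len(other) > len(tok) and other.startswith(tok):
                banned.add(other[len(tok)])
        return banned

    def allowed_next_tokens(vocab, prev_tok):
        banned = banned_first_chars(vocab, prev_tok)
        return [t for t in vocab if t[0] not in banned]

    vocab = ["e", "en", "n", "d", "o", "p", "pen", "open"]
    print(allowed_next_tokens(vocab, "e"))  # excludes "n"
    print(allowed_next_tokens(vocab, "o"))  # excludes "p" and "pen"

If you only ever sample from the allowed set at each step, the greedy longest-match parser is guaranteed to recover exactly the token sequence you emitted.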

Alternatively, it would suffice to restrict to a subset of tokens that are a prefix code. This would be simpler, but with lower coding efficiency.


Regarding the first part: that's an interesting idea, although I worry it would bias the outputs in an unrealistic way. Then again, maybe it would only impact scenarios that would have otherwise been unparsable anyway?

Regarding the second part: you'd effectively just be limiting yourself to single-character tokens in that case, which would drastically impact the LLM's output quality.


The first approach would only affect outputs that would have been otherwise unparseable.

The second approach works with any subset of tokens that form a prefix code -- you effectively set the probability of all tokens outside this subset to zero (and rescale the remaining probabilities if necessary). In practice you would want to choose a large subset, which means you almost certainly want to avoid choosing any single-character tokens, since they can't coexist with tokens beginning with that character. (Choosing a largest-possible such subset sounds like an interesting subproblem to me.)
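
A minimal sketch of that masking step (the token set and probabilities are toy placeholders):

    # Zero out the probability of every token outside a chosen prefix-code
    # subset, then rescale so the remaining probabilities sum to 1.

    def is_prefix_code(tokens):
        return not any(a != b and b.startswith(a) for a in tokens for b in tokens)

    def mask_to_subset(dist, subset):
        assert is_prefix_code(subset)
        kept = {t: p for t, p in dist.items() if t in subset}
        total = sum(kept.values())
        return {t: p / total for t, p in kept.items()}

    dist = {"e": 0.1, "en": 0.3, "end": 0.4, "nd": 0.1, "d": 0.1}
    print(mask_to_subset(dist, {"en", "d"}))   # {"en": 0.75, "d": 0.25}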


I don't think I see the vision here. If you want to maximize the number of tokens representable as a prefix code while still being able to output any sequence of characters, how could you possibly pick anything other than the one-character-long tokens?

Are you saying you'd intentionally make some output sequences impossible on the basis they're not likely enough to be worth violating the prefix code for? Surely there's enough common short words like "a", "the", etc that that would be impractical?

And even excluding the cases that are trivially impossible due to having short words as a prefix, surely even the longer words share prefixes commonly enough that you'd never get tokens longer than, say, two characters in the best case? Like, so many words start with "st" or "wh" or "re" or whatever, how could you possibly have a prefix code that captures all of them, or even the most common ones, without it being uselessly short?


> Surely there's enough common short words like "a", "the", etc that that would be impractical?

Tokens don't have to correspond to words. The 2-character tokens "a " and " a" will cover all practical uses of the lowercase word "a". Yes, this does make some strings unrepresentable, such as the single-character string "a", but provided you have tokens "ab", "ba", "ac", "ca", etc., all other strings can be represented. In practice you won't have all such tokens, but this doesn't materially worsen the output provided the substrings that you cannot represent are all low-probability.
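
To make that concrete with a toy alphabet (an illustration only, not a real tokenizer):

    # With a vocabulary of all 2-character tokens over an alphabet, every
    # even-length string is representable and the parse is unique; odd-length
    # strings (like the bare "a") are not representable.

    import itertools

    alphabet = "abct "
    vocab = {a + b for a, b in itertools.product(alphabet, repeat=2)}

    def tokenize(s, vocab):
        if s == "":
            return []
        if s[:2] in vocab:
            rest = tokenize(s[2:], vocab)
            if rest is not None:
                return [s[:2]] + rest
        return None   # unrepresentable

    print(tokenize("a cat ", vocab))  # ['a ', 'ca', 't ']
    print(tokenize("a", vocab))       # None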


Ah yeah, factoring in the whitespace might make this a bit more practical

I think it's plausible that different languages would prefer different tokenizations. For example, in Spanish the plural of carro is carros, while in Italian it's carri. Maybe the LLM would prefer carr+o in Italian and a single token in Spanish.

Certainly! What surprised me was that apparently LLMs are deliberately designed to enable multiple ways of encoding the same string as tokens. I just assumed this would lead to inefficiency, since it would cause training to not know whether it should favour outputting, say, se|same or ses|ame after "open", and thus throw some weight on each. But provided there's a deterministic rule, like "always choose the longest matching token", this uncertainty goes away.

LLMs are probabilistic black boxes; trying to inject determinism into their natural language processing (as opposed to e.g. forcing a grammar on the output) may very well screw them over completely.

I don't think it is intending to frame the move as clueless, but rather short-sighted. It could very well be a good move for them in the short term.

One huge benefit of Tahoe for me is that you can now hide any menubar icon, even if they don't explicitly support hiding. It's a small thing but that alone makes the upgrade worth it for me

I used to think this, until I tried it. Now I see that it effectively removes all the tedium while still letting you have whatever level of creative control you want over the output.

Just imagine that instead of having to work off of an amorphous draft in your head, it really creates the draft right in front of you in actual code. You can still shape and craft and refine it just the same, but now you have tons more working memory free to use for the actually meaningful parts of the problem.

And, you're way less burdened by analysis paralysis. Instead of running in circles thinking about how you want to implement something, you can just try it both ways. There's no sunk cost of picking the wrong approach because it's practically instantaneous.


I’m getting the impression that developers vary substantially in what they consider tedium, or meaningful.

Sure, and that goes even for myself. For example, on some projects maybe I'll be more interested in exploring a particular architectural choice than actually focusing on the details of the feature. It ultimately doesn't matter; the point is that you can choose where to spend your attention, instead of being forced to always go through all the motions even for things that are just irrelevant boilerplate.

Shockingly, software developers are people, and are as varied as people are elsewhere. Particularly since it became (relatively) mainstream.

Keep reading:

> Pieper emphasized that current over-the-counter NAD+-precursors have been shown in animal models to raise cellular NAD+ to dangerously high levels that promote cancer. The pharmacological approach in this study, however, uses a pharmacologic agent (P7C3-A20) that enables cells to maintain their proper balance of NAD+ under conditions of otherwise overwhelming stress, without elevating NAD+ to supraphysiologic levels.


Follow the citation: https://www.nist.gov/pml/time-and-frequency-division/how-utc...

> ... in English the abbreviation for coordinated universal time would be CUT, while in French the abbreviation for "temps universel coordonné" would be TUC. To avoid appearing to favor any particular language, the abbreviation UTC was selected.


Here are some of the things that make Firefox the best browser for me:

- An extension system more powerful than Chrome's, which supports, for example, rich adblockers that can block ads on YouTube. And it works on mobile, too

- Many sophisticated productivity, privacy, and tab management features such as vertical tabs, tab groups, container tabs, split tabs, etc. And now it also has easy-to-use profiles and PWA support just like Chrome

- A sync system which is ALWAYS end-to-end encrypted, and doesn't leak your browsing data or saved credentials if you configure it wrong, like Google's does, and it of course works on mobile too

- And yes, LLM-assisted summarization, translation, tab grouping, etc., most of which works entirely offline with local LLMs and no cloud interaction, although there are some cloud-enabled features as well


When/where was the PWA support added? I tried to test that this week and their docs say to use a third-party extension.


They're calling it taskbar tabs and it's behind a feature flag in nightly currently: https://windowsreport.com/firefox-is-bringing-web-apps-to-wi...


Thanks


My favourite feature is userChrome. The default chrome sucks in both Chrome and Firefox, but at least Firefox allows me to customize it to my liking without forking the entire browser.

On the flip side, changing keybinds in Firefox requires forking, but the defaults aren't too bad.


It's not necessarily performative research just because a pop science author wrote a catchy, exaggerated headline about it


I think finding an upper bound is basically just as difficult as finding the actual value itself, since both would require proving that all of the programs which run longer than that will run forever. That's why we can say BB(x) grows faster than any computable function. Being able to compute BB(x) algorithmically or any faster growing function would let you solve the halting problem
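
For anyone who wants that reduction spelled out (the standard argument, sketched briefly):

    Suppose $f$ is computable and $f(n) \ge BB(n)$ for all $n$. Given any
    $n$-state machine $M$, simulate it for $f(n)$ steps. Since every halting
    $n$-state machine halts within $BB(n) \le f(n)$ steps,
    \[
      M \text{ halts} \iff M \text{ halts within } f(n) \text{ steps},
    \]
    so the halting problem would be decidable. Hence no computable $f$ can
    satisfy $f(n) \ge BB(n)$ everywhere, i.e. $BB$ eventually outgrows every
    computable function.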


Sure, but I only asked about the single case x=6.


If you want an unproven-but-almost-certainly-correct upper bound on BB(6), consider BB(12).


Not sure if this is a joke, but actually that is guaranteed to be true. It is proven that for all n: BB(n+1) >= BB(n) + 3. But it is not proven that BB(n+1) >= BB(n) + 4, haha.


The point stands: the hard part is proving that all the programs with longer runtime than your upper bound will never terminate, and once you've solved that, getting the exact value is just a little extra work


For arbitrary n, that proof is arbitrarily hard, even undecidable for large enough n. Again though, for the specific case n=6, that difficulty has not yet been demonstrated, especially if you're willing to accept probabilistic arguments instead of rigorous proofs. n-by-n checkers is PSPACE-complete, but the specific 8x8 case that people actually play has been solved using computers.

