
Text is linear, whereas an image is parallel. I mean that when people read, they often don't scan text from left to right (or in another direction, depending on the language), but rather take in the text all at once or non-linearly: first locking on to keywords, then reading adjacent words to get the meaning, often even skipping some filler sentences unconsciously.

Sequential reading of text is very inefficient.



LLMs don't "read" text sequentially, right?


The causal masking means future tokens don’t affect previous tokens’ embeddings as they evolve throughout the model, but all tokens are processed in parallel… so, yes and no. See this previous HN post (https://news.ycombinator.com/item?id=45644328) about how bidirectional encoders are similar to diffusion’s non-linear way of generating text. Vision transformers use bidirectional encoding because of the non-causal nature of image pixels.


Didn’t Anthropic show that the models engage in a form of planning, predicting possible future tokens that then affect the prediction of the next token? https://transformer-circuits.pub/2025/attribution-graphs/bio...


Sure, an LLM can start "preparing" for token N+4 at token N. But that doesn't change the fact that token N can't "see" token N+1.

Causality is enforced in LLMs - past tokens can affect future tokens, but not the other way around.
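
To make that concrete, here is a minimal sketch of causally masked self-attention in plain Python. It is a toy, not how any real model is implemented: there are no learned weights (queries, keys and values are just the input vectors), and the names `softmax` and `causal_attention` are made up for illustration. The point it shows is that each position's output depends only on positions at or before it, even though every position could be computed independently (and so in parallel).

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(x):
    """Toy single-head self-attention with a causal mask.

    x is a list of equal-length vectors, one per token. Assumption for
    this sketch: queries, keys and values are the inputs themselves.
    """
    d = len(x[0])
    out = []
    # Each iteration is independent of the others, so a real model
    # computes all positions at once as one masked matrix operation.
    for i in range(len(x)):
        # Causal mask: position i only scores positions 0..i.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * x[j][k] for j, w in enumerate(weights))
            for k in range(d)
        ])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
changed = tokens[:2] + [[5.0, -3.0]]  # only the last token differs

before = causal_attention(tokens)
after = causal_attention(changed)

# Earlier positions are unaffected by the change to the later token.
print(before[:2] == after[:2])
# The changed position itself does produce a different output.
print(before[2] != after[2])
```

Dropping the `range(i + 1)` restriction and scoring against every position would give the bidirectional variant mentioned upthread, where changing the last token would change every output.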


If the attention is masked, then yes they do.


I absolutely don’t “read the text all at once” and do read “left to right”. Could be why I usually find that my reading speed is slower than most. Although I’ve never really had a hard time with comprehension or remembering details.


I remember doing speed reading courses back when I was young and a big part of it was learning to read a paragraph diagonally.

It's much, much faster. At first there's a loss of understanding, of course, but once you've practiced enough you will be much faster.


Sure, but when people listen to speech it is literally one word at a time. So while there might be some benefit to being able to read non-linearly, it's probably not a bottleneck.


I think you’re making a lot of assumptions about how people read.


He isn't, plenty of studies have been done on the topic. Eyes dart around a lot when reading.


People do skip words or scan for key phrases, but reading still happens in sequence. The brain depends on word order and syntax to make sense of text, so you cannot truly read it all at once. Skimming just means you sample parts of a linear structure, not that reading itself is non-linear. Eye-tracking studies confirm this sequential processing (check out the Rayner study in Psychological Bulletin if you are interested).


Thanks for the reference!

Reading is def not 100% linear, as I find myself skipping ahead to see who is talking or what type of sentence I am reading (question, exclamation, statement).

There is an interesting discussion down thread about ADHD and sequential reading. As someone who has ADHD I may be biased by how my brain works. I definitely don't read strictly linearly, there is a lot of jumping around and assembling of text.


> Reading is def not 100% linear, as I find myself skipping ahead to see who is talking or what type of sentence I am reading (question, exclamation, statement).

My initial reaction was to say speak for yourself about what reading is or isn’t, and that text is written linearly, but the more I think about it, the more I think you have a very good point. I think I read mostly linearly and don’t often look ahead for punctuation. But sentence punctuation changes both the meaning and presumed tone of the words that preceded it, and it’s useful to know that while reading the words. Same goes for something like “, Barry said.” So meaning in written text is definitely not 100% linear, and that justifies reading in non-linear ways. This, I’m sure, is one reason that Spanish has the pre-sentence question mark “¿”. And I think there are some authors who try to put who’s talking in front most of the time, though I can’t name any off the top of my head.


You may very well skip ahead for context, etc, and that is fine, but that doesn't mean you are actually reading out of order. It's one thing to get distracted or interested in other parts of a sentence or paragraph and jump around. But ultimately, if you are actually gathering the meaning that was written, you have to consume the words linearly at some point. Perhaps with ADHD you just have to endure some distractions on the way to doing so.


That's not exactly correct. You can totally read whole sentences or paragraphs at once without having to piece individual words together.

I can give you an analogy that should hopefully help. If you look at a house, you don't look at the doors, windows, facade, roof individually, then ponder how they are related together to come to a conclusion that it is a house. You immediately know. This is similar with reading. It might require practice though (and a lot of reading!).


Your comparison makes no sense to me. Looking at an object and understanding what it is is completely different from processing a sequential series of symbols that are designed to have meaning due to a linear order.


What people do you know that do this? I absolutely read in a linear fashion unless I'm deliberately skimming something to get the gist of it. Who can read the text "all at once"?!


I do this. I'm autistic and have ADHD so I'm not representative of the normal person. However, I don't think this is entirely uncommon.

The relevant technical term is "saccade"

> ADHD: Studies have shown a consistent reduction in ability to suppress unwanted saccades, suggesting an impaired functioning of areas like the dorsolateral prefrontal cortex.

> Autism: An elevated number of antisaccade errors has been consistently reported, which may be due to disturbances in frontal cortical areas.

https://eyewiki.org/Saccade

Also see https://en.wikipedia.org/wiki/Eye_movement_in_reading


I do this too. I suspect it may involve a subtly different mechanism from the saccade itself though? If the saccade is the behavior, and per the eyewiki link skimming is a voluntary type of saccade, there’s still the question of what leads me to use that behavior when I read (and others to read more linearly). Although you could certainly watch my eyes “saccade” around as I move nonlinearly through a passage, I’m not sure it’s out of a lack of control.

Rather, I feel like I absorb written meaning in units closer to paragraphs than to words or sentences. I’d describe my rapid up-and-down, back-and-forth eye motions as something closer to going back to soak up more, if that makes sense. To reinterpret it in the context of what came after it. The analogy that comes to mind is to a Progressive JPEG getting crisper as more loads.

That eyewiki entry was really cool. Among the unexpectedly interesting bits:

> The initiation of a saccade takes about 200 milliseconds[4]. Saccades are said to be ballistic because the movements are predetermined at initiation, and the saccade generating system cannot respond to subsequent changes in the position of the target after saccade initiation[4].


If you're an adult you probably have compensated for the saccades and developed a strategy that doesn't force you to read linearly. This is much of what "speed reading" courses try to do intentionally.


I also ping-pong around the page (ADHD'er). At times I read a sentence or two in linear fashion, then start jumping around, or move to the end and read backwards, or any mix of these, depending.


I don't know how common it is, but I tend to read novels in a buttered heterogeneous multithreading mode: image, logical and emotional readings each go at their own pace, rather than a single OCR engine feeding them all with 1D text.

Is that crazy? I'm not buying that it is.


That description feels relatable to me. Maybe buffered more than buttered, in my case ;)

It seems to me that would be a tick in the “pro” column for this idea of using pixels (or contours, a la JPEG) as the models’ fundamental stimulus to train against (as opposed to textual tokens). Isn’t there a comparison to be drawn between the “threads” you describe here and the multi-headed attention mechanisms (or whatever it is) that LLMs use to weigh associations at various distances between tokens?


Don't know, probably? I'm a linear reader


some of us with ADHD just kind of read all the words at once



