Hacker News | elinear's comments

Benchmarks pasted here, with top scores highlighted. Overall Qwen Max is pretty competitive with the others here.

  Capability                            Benchmark           GPT-5.2-Thinking   Claude-Opus-4.5   Gemini 3 Pro   DeepSeek V3.2   Qwen3-Max-Thinking
  Knowledge                             MMLUPro             87.4               89.5              *89.8*         85.0            85.7            
  Knowledge                             MMLURedux           95.0               95.6              *95.9*         94.5            92.8            
  Knowledge                             CEval               90.5               92.2              93.4           92.9            *93.7*      
  STEM                                  GPQA                *92.4*             87.0              91.9           82.4            87.4           
  STEM                                  HLE                 35.5               30.8              *37.5*         25.1            30.2           
  Reasoning                             LiveCodeBench v6    87.7               84.8              *90.7*         80.8            85.9           
  Reasoning                             HMMT Feb 25         *99.4*             -                 97.5           92.5            98.0            
  Reasoning                             HMMT Nov 25         -                  -                 93.3           90.2            *94.7*      
  Reasoning                             IMOAnswerBench      *86.3*             84.0              83.3           78.3            83.9           
  Agentic Coding                        SWE Verified        80.0               *80.9*            76.2           73.1            75.3           
  Agentic Search                        HLE (w/ tools)      45.5               43.2              45.8           40.8            *49.8*     
  Instruction Following & Alignment     IFBench             *75.4*             58.0              70.4           60.7            70.9           
  Instruction Following & Alignment     MultiChallenge      57.9               54.2              *64.2*         47.3            63.3           
  Instruction Following & Alignment     ArenaHard v2        80.6               76.7              81.7           66.5            *90.2*      
  Tool Use                              Tau² Bench          80.9               *85.7*            85.4           80.3            82.1           
  Tool Use                              BFCLV4              63.1               *77.5*            72.5           61.2            67.7            
  Tool Use                              Vita Bench          38.2               *56.3*            51.6           44.1            40.9           
  Tool Use                              Deep Planning       *44.6*             33.9              23.3           21.6            28.7           
  Long Context                          AALCR               72.7               *74.0*            70.7           65.0            68.7

At current prices, if you do not already have 32GB RAM, it will cost over $300 for DDR5, and over $200 for DDR4...

yep, the pricing is getting ridiculous... on the other hand, if you are using that 32GB to make money then it's an investment that's worth it imo

Both versions are grammatically correct. "Lagged behind" is common for everyday speech, while using "lagged" as a direct verb is a standard, formal way to describe data gaps in business or news. So yes, the headline uses just "lagged" to save space.


A particularly egregious offender is Kalshi ads. They regularly play for a minute, sometimes up to two minutes before they can be closed.

I would not be surprised if the incentives are in place for ad networks to push for longer ads and for advertisers to create longer ads.



MKBHD in his review mentioned a very odd instance of overheating, seemingly random and unrelated to any stressful workload. Anecdotal evidence points to iOS bugs.

https://youtu.be/cBpGq-vDr2Y?si=B0Tv1iRcd6Iran6B&t=669


Modern-day California, and by extension Silicon Valley, would not look the way it does today without the California Gold Rush [1], as horrible as its externalities were.

[1] https://en.wikipedia.org/wiki/California_Gold_Rush#Longer-te...


Great prosperity often comes at, or with, a great cost. California today doesn't make a great picture when you look at the homelessness and inequality, and the exorbitant cost of living anywhere near where the action is.


I have seen this article [1] being thrown around among my conservative friends, and while I do not have the statistics background to understand the detailed analysis, it seems to suggest some strange behavior around the reporting of mail-in votes. Not exactly evidence, but something that may have warranted investigation at the time.

[1] https://votepatternanalysis.substack.com/p/voting-anomalies-...

edit: I'm looking for folks to give their take on this analysis since I'm nowhere near qualified. A good summary is the final two sections of the article.


This type of stuff was posted on HN and pretty well debunked / rejected by most commenters: https://news.ycombinator.com/item?id=25031344

My comment and some discussion on it: https://news.ycombinator.com/item?id=25031443


The vote spikes are simple - early voting ballots getting reported [1]. Biden encouraged his supporters to vote early. Trump did the opposite. Accordingly, the early/absentee votes are ~90% for Biden. Due to the way they get counted they come in larger lumps than day-of vote counting. As far as I can tell the rest of the post is statistical gish-gallop with some graphs and equations to make it all look more convincing.

I also want to say that the source of the reported votes isn't a mystery, and the author could easily have found out that they were early votes if they had wanted to. Either they didn't check, or they didn't want to inform people of those very relevant facts.

[1]: https://www.reuters.com/article/uk-factcheck-wi-pa-mi-vote-s...


My thought is that such a body would act like existing organizations that, for example, certify medical professionals, engineers, accountants, or even colleges. For online news it may look like a journalism credential, requiring sites to follow basic and sound journalistic ethics when publishing. By these standards it should be easy for the large papers (NYT, WSJ, etc.) as well as your local news and TV to receive accreditation for their sites. See https://en.m.wikipedia.org/wiki/Accreditation


Can't say about Ori or a lot of the distributed file systems being showcased, but Panzura's global file system addresses this exact use case. But since they make enterprise products that run on dedicated hardware, they might be a bit pricey.

