| | The State of LLMs 2025: Progress, Problems, and Predictions (sebastianraschka.com) |
| 1 point by nsainsbury 6 days ago | past | discuss |
|
| | The State of LLMs 2025: Progress, Problems, and Predictions (sebastianraschka.com) |
| 3 points by ModelForge 10 days ago | past | discuss |
|
| | The State of LLMs 2025: Progress, Problems, and Predictions (sebastianraschka.com) |
| 4 points by ibobev 11 days ago | past | discuss |
|
| | The State of LLMs 2025: Progress, Problems, and Predictions (sebastianraschka.com) |
| 9 points by vismit2000 11 days ago | past | discuss |
|
| | New LLM Pre-Training and Post-Training Paradigms (sebastianraschka.com) |
| 2 points by lr0 13 days ago | past | 1 comment |
|
| | Understanding Encoder and Decoder LLMs (sebastianraschka.com) |
| 1 point by jeffjeffbear 23 days ago | past |
|
| | A Technical Tour of the DeepSeek Models from V3 to v3.2 (sebastianraschka.com) |
| 23 points by ibobev 37 days ago | past | 1 comment |
|
| | A Technical Tour of the DeepSeek Models from V3 to v3.2 (sebastianraschka.com) |
| 5 points by mzl 38 days ago | past | 1 comment |
|
| | Recommendations for Getting the Most Out of a Technical Book (sebastianraschka.com) |
| 2 points by naves 38 days ago | past |
|
| | A Technical Tour of the DeepSeek Models from V3 to v3.2 (sebastianraschka.com) |
| 8 points by giuliomagnifico 38 days ago | past |
|
| | Getting the Most Out of a Technical Book (sebastianraschka.com) |
| 4 points by quietlearning 58 days ago | past |
|
| | Beyond Standard LLMs (sebastianraschka.com) |
| 1 point by vismit2000 63 days ago | past |
|
| | Beyond Standard LLMs (sebastianraschka.com) |
| 1 point by ibobev 66 days ago | past |
|
| | A Researcher's Field Guide to Non-Standard LLM Architectures (sebastianraschka.com) |
| 2 points by ModelForge 67 days ago | past |
|
| | Understanding the 4 Main Approaches to LLM Evaluation (From Scratch) (sebastianraschka.com) |
| 1 point by ibobev 87 days ago | past |
|
| | Popular Attention Alternatives: GQA, MLA, SWA (sebastianraschka.com) |
| 4 points by ModelForge 87 days ago | past |
|
| | Multi-Head Latent Attention (sebastianraschka.com) |
| 4 points by ModelForge 89 days ago | past |
|
| | Understanding the 4 Main Approaches to LLM Evaluation (From Scratch) (sebastianraschka.com) |
| 2 points by ibobev 3 months ago | past |
|
| | LLM Evaluation from Scratch: Multiple Choice, Verifiers, Leaderboards, LLM Judge (sebastianraschka.com) |
| 4 points by ModelForge 3 months ago | past |
|
| | Understanding and Implementing Qwen3 from Scratch (sebastianraschka.com) |
| 1 point by ibobev 3 months ago | past |
|
| | GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2 (sebastianraschka.com) |
| 490 points by ModelForge 5 months ago | past | 97 comments |
|
| | From GPT-2 to GPT-OSS: Analyzing the Architectural Advances (sebastianraschka.com) |
| 3 points by mdp2021 5 months ago | past |
|
| | PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs (sebastianraschka.com) |
| 1 point by Anon84 5 months ago | past |
|
| | PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs (sebastianraschka.com) |
| 4 points by mariuz 5 months ago | past |
|
| | LLM architecture comparison (sebastianraschka.com) |
| 418 points by mdp2021 5 months ago | past | 24 comments |
|
| | The Big LLM Architecture Comparison (sebastianraschka.com) |
| 3 points by Quizzical4230 5 months ago | past |
|
| | Comprehensive ML/AI questions and answers for interview prep (sebastianraschka.com) |
| 2 points by yaiml 6 months ago | past |
|
| | PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs (sebastianraschka.com) |
| 4 points by sbbq 6 months ago | past |
|
| | Intermediate ML and AI questions and answers for interview prep (sebastianraschka.com) |
| 3 points by sbbq 6 months ago | past |
|
| | Understanding and Coding the KV Cache in LLMs from Scratch (sebastianraschka.com) |
| 6 points by sbbq 6 months ago | past |
|