Agentic pelican on a bicycle (robert-glaser.de)

Robert Glaser ran an experiment that used an agentic generate-evaluate-improve loop to produce an SVG image of a pelican riding a bicycle. The models were given access to Chrome DevTools to convert the SVG to JPG and used their vision capabilities to correct their own output. The tested models (Claude Opus, Sonnet, Haiku, GPT-5 Medium, GPT-5-Codex, and Gemini 2.5 Pro) ran 4-6 iterations each, deciding for themselves when to stop. The experiment builds on Simon Willison's benchmark, which even AI labs use when marketing new models.

Claude Opus added a chain and spokes, improving mechanical plausibility; Sonnet made subtle refinements to curves and shadows; Haiku persistently corrected proportions over 6 iterations. GPT-5 Medium and Codex improved gradually, while Gemini 2.5 Pro produced stable results. The key finding: models can evaluate and correct their own work without detailed instructions while preserving the spirit of the original absurd prompt. Using a single renderer for every model made the comparison objective.
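The generate-evaluate-improve loop described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiment: `agentic_loop`, `Critique`, and the three `stub_*` callables are hypothetical names, and the stubs stand in for the real model call, the Chrome DevTools SVG-to-JPG rendering step, and the vision-based self-evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Critique:
    done: bool      # True when the evaluator is satisfied and wants to stop
    feedback: str   # what to change in the next iteration


def agentic_loop(
    prompt: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    render: Callable[[str], bytes],
    evaluate: Callable[[str, bytes], Critique],
    max_iterations: int = 6,
) -> List[str]:
    """Run generate -> render -> evaluate until the evaluator stops or the cap is hit."""
    history: List[str] = []
    previous: Optional[str] = None
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        svg = generate(prompt, previous, feedback)   # draft or revise the SVG
        image = render(svg)                          # rasterize so it can be "seen"
        critique = evaluate(prompt, image)           # self-assessment of the render
        history.append(svg)
        if critique.done:                            # the model decides when to stop
            break
        previous, feedback = svg, critique.feedback
    return history


# --- Hypothetical stand-ins so the sketch runs without any model or browser ---
WANTED = ["wheels", "chain", "spokes"]


def stub_generate(prompt: str, previous: Optional[str], feedback: Optional[str]) -> str:
    if previous is None:
        return "<svg><!-- wheels --></svg>"
    # Pretend to "improve" the drawing by adding the requested detail.
    return previous.replace("</svg>", f"<!-- {feedback} --></svg>")


def stub_render(svg: str) -> bytes:
    # A real setup would rasterize the SVG; here the bytes are just the markup.
    return svg.encode("utf-8")


def stub_evaluate(prompt: str, image: bytes) -> Critique:
    text = image.decode("utf-8")
    missing = [detail for detail in WANTED if detail not in text]
    if not missing:
        return Critique(done=True, feedback="")
    return Critique(done=False, feedback=missing[0])


if __name__ == "__main__":
    drafts = agentic_loop(
        "Generate an SVG of a pelican riding a bicycle",
        stub_generate, stub_render, stub_evaluate,
    )
    for number, draft in enumerate(drafts, start=1):
        print(number, draft)
```

The iteration cap is an assumption added for safety; in the experiment the models chose their own stopping point, which the `done` flag represents here.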

by todsacerdoti • November 11, 2025 at 19:40 • 85 points

Mechanism	KV parameters	KV memory	Quality	Note
MHA	h·d·d	O(hd)	high	baseline
MQA	d·d	O(d)	↓	fast
GQA	g·d·d	O(gd)	≈ MHA	trade-off
MLA	d_lat·d	O(d_lat)	≈ MHA	state of the art

Kernel	TFLOPS	%SOL
F.sdpa (Flash)	186.73	89.13
F.sdpa (CuDNN)	203.61	97.19
flash-attn	190.58	90.97
v1 (basic)	142.87	68.20
v2 (swizzle)	181.11	86.45
v3 (2-stage)	189.84	90.62
v4 (ldmatrix.x4)	194.33	92.76
v5 (pipe)	197.74	94.39

Method	Time	Token accuracy	Tasks solved
Autoregressive	1×	94%	21%
Diffusion, 10 steps	0.6×	95%	19%
Diffusion, 30 steps	1.3×	94%	21%

Spatial intelligence is AI’s next frontier (drfeifei.substack.com)

The Principles of Diffusion Models (arxiv.org)

Grok 4 Fast now has 2M context window (docs.x.ai) 💬 Long discussion

Study identifies weaknesses in how AI systems are evaluated (oii.ox.ac.uk) 🔥 Hot 💬 Long discussion

Leaving Meta and PyTorch (soumith.ch) 🔥 Hot 💬 Long discussion

The Learning Loop and LLMs (martinfowler.com)

LLMs encode how difficult problems are (arxiv.org)

Mathematical exploration and discovery at scale (terrytao.wordpress.com) 🔥 Hot

Show HN: I scraped 3B Goodreads reviews to train a better recommendation model (book.sv) 🔥 Hot 💬 Long discussion

The shadows lurking in the equations (gods.art) 🔥 Hot

Launch HN: Plexe (YC X25) – Build production-grade ML models from prompts (plexe.ai)

Lessons from interviews on deploying AI Agents in production (mmc.vc)

AI's Dial-Up Era (wreflection.com) 🔥 Hot 💬 Long discussion

The Case That A.I. Is Thinking (newyorker.com) 💬 Long discussion

Tongyi DeepResearch – open-source 30B MoE Model that rivals OpenAI DeepResearch (tongyi-agent.github.io) 🔥 Hot

Helion: A high-level DSL for performant and portable ML kernels (pytorch.org)

Backpropagation is a leaky abstraction (2016) (karpathy.medium.com) 🔥 Hot

Learning from failure to tackle hard problems (blog.ml.cmu.edu)

The Smol Training Playbook: The Secrets to Building World-Class LLMs (huggingface.co)

Signs of introspection in large language models (anthropic.com)

Developers are choosing older AI models (augmentcode.com)

ICE Will Use AI to Surveil Social Media (jacobin.com) 💬 Long discussion

A definition of AGI (arxiv.org) 🔥 Hot 💬 Long discussion

Feed the bots (maurycyz.com) 🔥 Hot 💬 Long discussion

Pico-Banana-400k (github.com) 🔥 Hot

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference (arxiv.org)

A bug that taught me more about PyTorch than years of using it (elanapearl.github.io) 🔥 Hot

Antislop: A framework for eliminating repetitive patterns in language models (arxiv.org)

Karpathy on DeepSeek-OCR paper: Are pixels better inputs to LLMs than text? (twitter.com) 🔥 Hot

LLMs can get "brain rot" (llm-brain-rot.github.io) 🔥 Hot 💬 Long discussion

Should LLMs just treat text content as an image? (seangoedecke.com)

The case for the return of fine-tuning (welovesota.com)

Downloadable movie posters from the 40s, 50s, 60s, and 70s (hrc.contentdm.oclc.org) 🔥 Hot

Most users cannot identify AI bias, even in training data (psu.edu)

Andrej Karpathy – It will take a decade to work through the issues with agents (dwarkesh.com) 🔥 Hot 💬 Long discussion

Claude Code vs. Codex: I built a sentiment dashboard from Reddit comments (aiengineering.report)

Benjie's Humanoid Olympic Games (generalrobots.substack.com)

Coral NPU: A full-stack platform for Edge AI (research.google)

A Gemma model helped discover a new potential cancer therapy pathway (blog.google)

Claude Haiku 4.5 (anthropic.com) 🔥 Hot 💬 Long discussion

Nvidia DGX Spark: great hardware, early days for the ecosystem (simonwillison.net)

Beliefs that are true for regular software but false when applied to AI (boydkane.com) 🔥 Hot 💬 Long discussion

How AI hears accents: An audible visualization of accent clusters (accent-explorer.boldvoice.com)

If you'd built a "tool" that stupid, why would you advertise the fact? (svpow.com)

PlayStation 3 Architecture (2021) (copetti.org)

NanoChat – The best ChatGPT that $100 can buy (github.com) 🔥 Hot 💬 Long discussion

Who invented deep residual learning? (people.idsia.ch)

AMD and Sony's PS6 chipset aims to rethink the current graphics pipeline (arstechnica.com) 🔥 Hot 💬 Long discussion

Show HN: I invented a new generative model and got accepted to ICLR (discrete-distribution-networks.github.io) 🔥 Hot

Reasoning LLMs are wandering solution explorers (arxiv.org)

A small number of samples can poison LLMs of any size (anthropic.com) 🔥 Hot 💬 Long discussion

Figure 03, our 3rd generation humanoid robot (figure.ai) 🔥 Hot 💬 Long discussion

Why do LLMs freak out over the seahorse emoji? (vgel.me) 🔥 Hot 💬 Long discussion

Rule-Based Expert Systems: The Mycin Experiments (1984) (shortliffe.net)

What GPT-OSS leaks about OpenAI's training data (fi-le.net) 🔥 Hot

NIST's DeepSeek "evaluation" is a hit piece (erichartford.com)

The deadline isn't when AI outsmarts us – it's when we stop using our own minds (theargumentmag.com) 🔥 Hot 💬 Long discussion

How to inject knowledge efficiently? Knowledge infusion scaling law for LLMs (arxiv.org)

Paged Out Issue #7 [pdf] (pagedout.institute) 🔥 Hot

New antibiotic targets IBD and AI predicted how it would work (healthsci.mcmaster.ca)

How does gradient descent work? (centralflows.github.io) 🔥 Hot

Microsoft CTO says he wants to swap most AMD and Nvidia GPUs for homemade chips (cnbc.com)

Who needs Git when you have 1M context windows? (alexmolas.com) 💬 Long discussion

What makes 5% of AI agents work in production? (motivenotes.ai)

The G in GPU is for Graphics damnit (ut21.github.io)

DARPA project for automated translation from C to Rust (2024) (darpa.mil)

Evaluating the impact of AI on the labor market: Current state of affairs (budgetlab.yale.edu)

Announcing Tinker (thinkingmachines.ai)

OpenTSLM: Language models that understand time series (opentslm.com) 🔥 Hot

Ask HN: Who wants to be hired? (October 2025) 💬 Long discussion

Building the heap: racking 30 petabytes of hard drives for pretraining (si.inc) 🔥 Hot 💬 Long discussion

High-resolution efficient image generation from WiFi Mapping (arxiv.org)

Introduction to Multi-Armed Bandits (2019) (arxiv.org)

Making sure AI serves people and knowledge stays human (diff.wikimedia.org)