05.05.2025 20:04
Here is a quick and dirty test I ran: generating a math book entirely with AI. Initially, I wanted to post it as a blog post, so I insisted that it not use LaTeX. The formatting of the first units differs from that of the later units and could be improved significantly. However, those changes are relatively easy to implement:
1. Download the document as .docx
2. Upload the file
3. Prompt Gemini 2.5 Pro: "Generate an improved version of Unit # with Superior Mathematical Typesetting."
Modular Arithmetic - From Basics to Advanced Concepts https://docs.google.com/document/d/10TIxITHGAR5yskFPdzC1CKQmyogMhGGOZbyyBBgLM40/edit?usp=sharing
Unit 0: Foundations – The Division Algorithm
Unit 1: Introduction to Congruence Modulo n
Unit 2: Properties of Congruence Relations
Unit 3: Modular Arithmetic Operations
Unit 4: Multiplicative Inverses and Cancellation
Unit 5: Solving Linear Congruences
Unit 6: Solving Systems of Congruences – The Chinese Remainder Theorem (CRT)
Unit 7: Powers and Primes – Fermat’s Little Theorem (FLT)
Unit 8: Generalizing Fermat – Euler’s Totient Function and Euler’s Theorem
Unit 9: A Glimpse of Advanced Modular Arithmetic
Unit 10: Modular Arithmetic Meets Abstract Algebra – Rings, Groups, and Fields
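As a taste of the machinery covered in Units 4-5, here is a minimal illustrative sketch (my own, not taken from the book) of computing multiplicative inverses with the extended Euclidean algorithm and using them to solve a linear congruence; all function names are invented for the example.

```python
# Illustrative only: the core ideas of Units 4-5 (inverses, linear congruences).

def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def mod_inverse(a, n):
    """Multiplicative inverse of a modulo n; exists iff gcd(a, n) == 1."""
    g, x, _ = extended_gcd(a % n, n)
    if g != 1:
        raise ValueError(f"{a} has no inverse mod {n}")
    return x % n

def solve_linear_congruence(a, b, n):
    """Smallest non-negative x with a*x ≡ b (mod n); solvable iff gcd(a,n) | b."""
    g, _, _ = extended_gcd(a % n, n)
    if b % g != 0:
        raise ValueError("no solution")
    # Divide through by g, then multiply by the inverse in the reduced modulus.
    return (mod_inverse(a // g, n // g) * (b // g)) % (n // g)

print(mod_inverse(3, 7))                 # 5, since 3*5 = 15 ≡ 1 (mod 7)
print(solve_linear_congruence(4, 2, 6))  # 2, since 4*2 = 8 ≡ 2 (mod 6)
```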
04.05.2025 12:20
The Ukraine War and the Kill Market
Read more: https://www.politico.eu/article/ukraines-army-have-video-game-like-digital-weapons-store-deadly-realistic/
The [Ukrainian] program […] rewards soldiers with points if they upload videos proving their drones have hit Russian targets. It will soon be integrated with a new online marketplace called Brave 1 Market, which will allow troops to convert those points into new equipment for their units.
[...]
The program assigns points for each type of kill: 20 points for damaging and 40 for destroying a tank; up to 50 points for destroying a mobile rocket system, depending on the caliber; and six points for killing an enemy soldier.
[...]
Units will soon be able to use the special digital points they’ve been getting since last year by trading them in for new weapons. A Vampire drone, for example, costs 43 points. The drone, nicknamed Baba Yaga, or witch, is a large multi-rotor drone able to carry a 15-kilogram warhead. The Ukrainian government will pay for the drones that are ordered and will deliver them to the front-line unit within a week.
[...]
The scheme is aimed at directing more equipment to the most effective units. It will also help to bypass bureaucratic procurement procedures and buy weapons directly from manufacturers.
[...]
The ability to get points for killing enemy troops is also spurring competition among units; so far about 90 percent of the army's drone units have scored points. In fact, they are logging so many hits that the government has had to revamp the logistics of drone deliveries to get more of them to points-heavy units. “They started killing so quickly that Ukraine does not have time to deliver new drones,” Fedorov said.
30.04.2025 17:25
Links for 2025-04-30 [Part 2]
AI
14. How people use LLMs https://www.lesswrong.com/posts/FXnvdeprjBujt2Ssr/how-people-use-llms
15. NotebookLM Audio Overviews are now available in over 50 languages https://blog.google/technology/google-labs/notebooklm-audio-overviews-50-languages/
16. o3 Beats a Master-Level Geoguessr Player—Even with Fake EXIF Data https://sampatt.com/blog/2025-04-28-can-o3-beat-a-geoguessr-master
17. Mark Zuckerberg predicts that within the next 12 to 18 months most of AI development code will be written by AI. He said 'We're trying to build a coding agent and an AI research agent that advances Llama research specifically.' https://www.dwarkesh.com/p/mark-zuckerberg-2
18. Former Google CEO Schmidt: Why U.S. Needs to Win Race for Superintelligent AI https://www.youtube.com/watch?v=5l8eDLunQFU
19. “At McKinsey, consultants are using an in-house generative AI chatbot called Lilli. It synthesizes the firm's entire body of intellectual property, which spans 100 years and over 100,000 documents and interviews, the firm told BI…Over 70% of the firm's 45,000 employees now use the tool.” https://www.businessinsider.com/consulting-ai-mckinsey-bcg-deloitte-pwc-kpmg-chatbots-ai-tools-2025-4 [no paywall: https://archive.is/RuXpi]
20. GPT-4o Is An Absurd Sycophant https://www.lesswrong.com/posts/zi6SsECs5CCEyhAop/gpt-4o-is-an-absurd-sycophant
21. “Sycophancy in GPT-4o: What happened and what we’re doing about it” https://openai.com/index/sycophancy-in-gpt-4o/
22. Our Reality: A Simulation Run by a Paperclip Maximizer https://www.lesswrong.com/posts/HxLYnGYspLoeLLrE6/our-reality-a-simulation-run-by-a-paperclip-maximizer-1
China AI
A meeting of China’s Communist leadership underscored its intense focus on developing homegrown artificial intelligence, analysts said.
1. “April Politburo Study Session on AI is bad news for Nvidia” https://sinocism.com/p/april-politburo-study-session-on
2. Politburo holds a second AI study session after seven years https://triviumchina.com/2025/04/28/politburo-holds-a-second-ai-study-session-after-seven-years/
3. Former ASML head scientist Lin Nan drives China’s latest EUV breakthrough https://www.msn.com/en-xl/news/other/former-asml-head-scientist-lin-nan-drives-china-s-latest-euv-breakthrough/ar-AA1DNjSr
Miscellaneous
1. Tracking single neurons in the human brain reveals new insight into language and other human-specific functions https://www.thetransmitter.org/human-neurotechnology/tracking-single-neurons-in-the-human-brain-reveals-new-insight-into-language-and-other-human-specific-functions/
2. French defense company Turgis Gaillard is set to unveil a major new weapons system at the Paris Air Show in June 2025. According to exclusive information from Challenges magazine, the group will present "Foudre" (Lightning), a prototype multiple rocket launcher designed to compete with the renowned American Himars. Developed in secret for two years with 100% self-financing. https://www.challenges.fr/entreprise/defense/un-engin-100-francais-concurrent-du-himars-foudre-le-lance-roquettes-que-personne-nattendait_603520
29.04.2025 07:47
Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users
Claude 3.5 Sonnet (new) aka Sonnet 3.6 (released 2024-10-22), with a small scaffold, is superhuman at persuasion (98%ile among human experts; 3-4x more persuasive than the median human expert).
Take this with a grain of salt, as the study hasn’t been peer-reviewed. However, they pre-registered it five months ago.
If these results are replicated, then in a few more years, AI progress will allow wealthy individuals and nation states to run massive and successful superhuman influence operations, changing the behavior of large groups of people.
Paper: https://drive.google.com/file/d/1Eo4SHrKGPErTzL1t_QmQhfZGU27jKBjx/view
Preregistration: https://osf.io/atcvn/?view_only=dcf58026c0374c1885368c23763a2bad
Press: https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/
24.04.2025 12:53
Gemini 2.5 Pro really is on a different level. Especially if you use it properly. I look forward to Google's next model.
https://x.com/aryehazan/status/1915308472377237554
23.04.2025 15:18
A Practical Guide to Superhuman Mathematical AI
(written by o3)
1. Data Preparation
1.1. Download and clean formal libraries (Lean mathlib, Isabelle AFP, Coq, Metamath).
1.2. Crawl/tokenize informal sources (arXiv LaTeX, MathOverflow).
1.3. Seed with a small verified synthetic proof–theorem set.
2. Pre-train FM-Math
2.1. Choose a 10–100 B-param transformer with syntax-aware tokenization.
2.2. Train on formal + informal + seed data.
3. Graph-Based Encoder & Retrieval
3.1. Represent proof state as a graph (goals, hypotheses, lemma nodes).
3.2. Wrap FM-Math in a Graph Transformer over that graph.
3.3. Index lemma embeddings in a vector DB for k-NN retrieval.
4. RL Environment with Proof Assistants
4.1. Embed Lean/Isabelle as step(tactic) → (new state or error).
4.2. Reward +1 for completed proofs, shaped bonuses for closed subgoals, penalties for failures.
5. Dual-Head Agent
5.1. Conjecturer: mines gaps & analogies in the library graph.
5.2. Prover: outputs tactic/lemma distributions given current graph-state.
6. Search Orchestrator
6.1. Async MCTS (AlphaZero style), policy/value from dual-head.
6.2. Prune branches on verifier failure; share nodes across trees (SKEST).
7. Synthetic-Data Factory
7.1. Forward enumeration: random axiom subsets → depth-d consequences.
7.2. Mutation: weaken/strengthen or permute existing theorems.
7.3. Batch-run LeanNavigator agents to generate millions of proofs.
7.4. Filter via “TP-as-a-Judge” (batch verification).
7.5. Add only verified items back to replay buffers.
8. RL + Self-Play Loop
8.1. Initialize replay buffer with human & synthetic proofs.
8.2. Alternate:
(a) Off-policy RL (PPO) on buffer.
(b) Self-play jobs generating new proofs via the orchestrator.
8.3. Include an intrinsic-curiosity bonus for novel proof-state embeddings.
9. Phase 0: Instrumentation (2024–mid-25)
9.1. Validate FM-Math on FrontierMath, miniF2F, ProofNet.
9.2. Ensure end-to-end FM → graph → prover → verifier at scale.
9.3. Publish baselines.
10. Phase 1: Neuro-Symbolic MVP (mid-25–26)
10.1. ≥50 % FrontierMath; ≥80 % miniF2F.
10.2. Autoformalize first-year undergrad algebra texts at ≥70 %.
10.3. Ablate graph vs. flat encoding and RL+MCTS vs. pure policy.
11. Phase 2: Scaling & Data Gen (26–28)
11.1. Run LeanNavigator++ at 1 K GPUs → ≥1 B tokens verified.
11.2. Autonomously prove ≥40 % of theorems in a graduate corpus.
11.3. Introduce hierarchical proof planning: outline → tactics.
12. Phase 3: Open-Problem Sprint (28–31)
12.1. Focus AI+human teams on one Clay-level conjecture.
12.2. Track progress via formal benchmarks & peer review.
12.3. Human-gate any frontier theorem before release.
13. Phase 4: Generalist Mathematician (2031+)
13.1. ≥95 % FrontierMath; full IMO gold.
13.2. Publish referee-verified, AI-only theorems.
13.3. Enable on-demand axiom switching or proposal.
14. Safety & Alignment
14.1. Independent board for dual-use risk (cryptography, bio-math).
14.2. Log every search trace; generate explainable NL+diagrams.
14.3. Human-gate sensitive conjectures.
15. Scale Resources & Teams
15.1. Expand from 1–2 K to 10–30 K GPUs as self-play scales.
15.2. Grow formal corpora from <2 TB to 20–50 TB with autoformalized arXiv.
15.3. Recruit 100+ AI researchers, formalizers, and mathematicians.
16. Success Criteria
16.1. Continuous metrics: FrontierMath, miniF2F, ProofNet.
16.2. Breakthrough: AI contributes to open problems; AI-only published key results.
16.3. Safety alarm: dual-use thresholds require human oversight.
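The RL environment in section 4 can be sketched as a gym-style loop. This is a toy illustration under my own assumptions: an in-memory "prover" stands in for a real Lean/Isabelle backend, and all names (`ToyProofEnv`, the tactic format, the reward values beyond the +1 from 4.2) are invented for the example.

```python
# Toy sketch of step(tactic) -> (new state, reward, done) with shaped rewards.
from dataclasses import dataclass, field

@dataclass
class ProofState:
    goals: list = field(default_factory=list)  # open subgoals

class ToyProofEnv:
    """Gym-style wrapper standing in for a Lean/Isabelle backend."""

    def __init__(self, goals):
        self.state = ProofState(goals=list(goals))

    def step(self, tactic):
        # A real backend would execute the tactic in the proof assistant and
        # parse the result; here a tactic is simply (kind, goal).
        kind, goal = tactic
        if goal not in self.state.goals:
            return self.state, -0.1, False      # penalty for a failed step
        if kind == "close":
            self.state.goals.remove(goal)
            if not self.state.goals:
                return self.state, 1.0, True    # +1 for a completed proof
            return self.state, 0.2, False       # shaped bonus: subgoal closed
        return self.state, 0.0, False           # neutral transformation

env = ToyProofEnv(goals=["g1", "g2"])
_, r1, done1 = env.step(("close", "g1"))  # shaped bonus, proof not finished
_, r2, done2 = env.step(("close", "g2"))  # final goal closed: +1, done
```

A search orchestrator (section 6) would call `step` inside MCTS rollouts and back up these rewards through the tree.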
05.05.2025 19:13
Links for 2025-05-05
AI
1. On the generalization of language models from in-context learning and finetuning: a controlled study https://arxiv.org/abs/2505.00661
2. Novel AI model inspired by neural dynamics from the brain https://news.mit.edu/2025/novel-ai-model-inspired-neural-dynamics-from-brain-0502
3. Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers https://physics.allen-zhu.com/part-4-architecture-design/part-4-1
4. Nikolay Savinov predicts that the industry is going to achieve near perfect retrieval across 1-2M context length 'quite soon', and that soon afterwards a 10M token context window will become the norm. https://www.youtube.com/watch?v=NHMJ9mqKeMQ
5. What's going on with AI progress and trends? (As of 5/2025) https://www.lesswrong.com/posts/v7LtZx6Qk5e9s7zj3/what-s-going-on-with-ai-progress-and-trends-as-of-5-2025
6. Waymo robotaxis are safer than human drivers https://growsf.org/news/2025-05-02-waymo-safety/
7. Around 60% of students reported using AI themselves, while they estimated that nearly 90% of their peers use AI. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910
8. “I Recorded Everything I Said for Three Months. AI Has Replaced My Memory.” https://www.wsj.com/tech/personal-tech/ai-personal-assistant-wearable-tech-impressions-28156b57 [no paywall: https://archive.is/xac6J]
9. This Chart Might Keep You From Worrying About AI’s Energy Use https://spectrum.ieee.org/ai-energy
10. Will nuclear energy power the AI boom? https://thebaffler.com/latest/project-ludicrous-northwood [no paywall: https://archive.is/m8KCB]
11. Jensen: "First thing to understand: 50% of the world's AI researchers are Chinese." https://www.youtube.com/live/E2o9O0EVouA?si=RZdkLpin-k5C8kGZ&t=594
12. What if AI just keeps getting smarter? https://www.lesswrong.com/posts/MCaqKAfSn345MCz7o/ra-x-controlai-video-what-if-ai-just-keeps-getting-smarter
13. Where’s my ten minute AGI? – if AIs are actually able to perform most tasks on 1-hour task horizons, why don’t we see more real-world task automation? https://epochai.substack.com/p/wheres-my-ten-minute-agi
14. T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT https://arxiv.org/abs/2505.00703
15. Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think https://arxiv.org/abs/2504.20708
Miscellaneous
1. Novel High Resolution 3D Printing Method for Metals and Ceramics https://www.youtube.com/watch?v=kLgPW2672s4
2. Non-linear Ethnic Niches: The emerging Western caste system https://substack.com/home/post/p-162313414
3. Mathematician solves algebra’s oldest problem using intriguing new number sequences: “This is a dramatic revision of a basic chapter in algebra.” https://www.unsw.edu.au/newsroom/news/2025/05/mathematician-solves-algebras-oldest-problem-using-intriguing-new-number-sequences
03.05.2025 12:44
I didn't know how important Zeiss was: ASML’s most advanced steppers literally can’t function without the atom-perfect optics from Carl Zeiss SMT—a German optics & optoelectronics powerhouse that builds the entire “imaging engine” inside every machine.
P.S. Dutch-based ASML (Advanced Semiconductor Materials Lithography), headquartered in Veldhoven, Netherlands, is the world’s only supplier of extreme-ultraviolet (EUV) scanners—the literal heart of every cutting-edge chip fab.
30.04.2025 17:24
Links for 2025-04-30 [Part 1]
AI
1. DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition — “Our future work will focus on scaling this paradigm to an AlphaProof-like system with the ultimate aim of tackling IMO-level mathematical problems that represent the frontier of automated theorem proving challenges.” https://github.com/deepseek-ai/DeepSeek-Prover-V2
2. Automated Proof Engineering (APE): Towards File-level Automated Proof Engineering of Formal Math Libraries. APE-Bench I shifts evaluation from “Can the model prove lemma X?” to “Can it behave like a competent maintainer of a giant formal library?” [PDF] https://xinhuajian.wordpress.com/wp-content/uploads/2025/04/ape_bench_i-2.pdf
3. Reinforcement Learning for Reasoning in Large Language Models with One Training Example — 36.0% -> 73.6% on MATH500 by performing RLVR on a single example. Applying entropy loss alone, without any outcome reward, improves perf by 27.4%. https://arxiv.org/abs/2504.20571
4. Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory —state-of-the-art (SOTA) performance—26% more accurate than OpenAI Memory. https://arxiv.org/abs/2504.19413
5. ReasonIR: Training Retrievers for Reasoning Tasks https://arxiv.org/abs/2504.20595
6. SAS-Prompt: Large Language Models as Numerical Optimizers for Robot Self-Improvement https://sites.google.com/asu.edu/sas-llm/
7. Instant Policy: In-Context Imitation Learning via Graph Diffusion — The robot learns several novel tasks instantly, after just ONE demonstration each. https://www.robot-learning.uk/instant-policy
8. Hugging Face releases a 3D-printed robotic arm starting at $100 https://techcrunch.com/2025/04/28/hugging-face-releases-a-3d-printed-robotic-arm-starting-at-100/
9. SplitReason: Learning To Offload Reasoning https://arxiv.org/abs/2504.16379
10. Scaling Laws For Scalable Oversight https://arxiv.org/abs/2504.18530
11. MAGI: Multi-Agent Guided Interview for Psychiatric Assessment https://arxiv.org/abs/2504.18260
12. Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations — Instead of hoping a model explains things right once, they validate explanations by recursively asking about their own outputs - until contradictions are exposed or resolved. [Published in 2022] https://arxiv.org/abs/2205.11822
13. CoRT (Chain of Recursive Thoughts) https://github.com/PhialsBasement/Chain-of-Recursive-Thoughts
27.04.2025 14:21
Links for 2025-04-27 [Part 2]
Miscellaneous
1. Scott Aaronson | How Much Math Is Knowable? https://www.youtube.com/watch?v=VplMHWSZf5c
2. Why Rome Actually Fell: Plagues, Slavery, & Ice Age – Kyle Harper https://www.youtube.com/watch?v=QFzgSmN8Ng8
3. So Long, And No Thanks for the Externalities: The Rational Rejection of Security Advice by Users (published in 2009) [PDF] https://gwern.net/doc/cs/security/2009-herley.pdf
4. “Scratching is often pleasurable, which suggests that, in order to have evolved, this behavior must provide some kind of benefit”—study finds evidence that scratching can also provide a defence against bacterial skin infections https://www.popsci.com/health/is-scratching-rashes-bad/
Politics
1. South Korea faces a demographic change unlike anything we have seen before. Birth rates dropped, the population ages rapidly, and cities are beginning to empty. This video isn't about numbers. It's about a disappearing future. https://www.youtube.com/watch?v=Ufmu1WD2TSk
2. Hungary’s monthly fertility rate fell to 1.25 in March 2025, 0.1 lower than a year earlier and more than 0.2 lower than two years ago. https://x.com/TothGCsaba/status/1916037291497209914
3. British scientists will soon get a green light to begin research into solar geoengineering, dimming the sun to slow global warming. https://www.thetimes.com/uk/environment/article/uk-experiments-dim-sun-global-warming-fss9l5cw5 [no paywall: https://archive.is/lCw2G]
23.04.202519:26
Links for 2025-04-23
AI
1. OpenAI’s o3 now outperforms 94% of expert virologists. https://www.ai-frontiers.org/articles/ais-are-disseminating-expert-level-virology-skills
2. TTRL: Test-Time Reinforcement Learning — A novel method for training LLMs using RL on *unlabeled* data by utilizing the priors in the pre-trained models. https://github.com/PRIME-RL/TTRL
3. Learning Adaptive Parallel Reasoning with Language Models https://arxiv.org/abs/2504.15466
4. Dynamic Early Exit in Reasoning Models https://arxiv.org/abs/2504.15895
5. RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning https://ragen-ai.github.io/
6. Anthropic believes that fully AI employees will soon start working in companies https://www.axios.com/2025/04/22/ai-anthropic-virtual-employees-security
7. Analyzing o3 and o4-mini with ARC-AGI. o4-mini: o1-pro-level performance, 220× cheaper https://arcprize.org/blog/analyzing-o3-with-arc-agi
8. A father’s quest for diagnosis inspired a disruptive AI solution https://news.microsoft.com/source/emea/features/a-fathers-quest-for-diagnosis-inspired-a-disruptive-ai-solution/
9. “Our results suggest that models are rapidly improving, and the best frontier models are held back by only a few key subcapabilities.” [PDF] https://cdn.prod.website-files.com/663bd486c5e4c81588db7a1d/6807879ce7b1b5f5163f4a32_RepliBenchPaper.pdf
10. 2 Big Questions for AI Progress in 2025-2026 https://helentoner.substack.com/p/2-big-questions-for-ai-progress-in
11. All AI datacenters are vulnerable to Chinese espionage, new report says https://time.com/7279123/ai-datacenter-superintelligence-china-trump-report/
12. AI supercomputers double in performance every 9 months, cost billions of dollars, and require as much power as mid-sized cities. Companies now own 80% of all AI supercomputers, while governments’ share has declined. https://epoch.ai/blog/trends-in-ai-supercomputers
13. Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games https://www.lesswrong.com/posts/M6dXdCbdoLSpHt8v3/corrupted-by-reasoning-reasoning-language-models-become-free
14. LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities https://arxiv.org/abs/2504.16078
Miscellaneous
1. Interpreting the retinal neural code for natural scenes: From computations to neurons https://www.cell.com/neuron/fulltext/S0896-6273(23)00467-1
2. Towards circuit mechanisms of the creative process: Describing the functions, mechanisms and neural correlates of creativity https://osf.io/preprints/psyarxiv/mpbgy_v1
3. How prediction markets create harmful outcomes: a case study https://bobjacobs.substack.com/p/how-prediction-markets-can-create
23.04.202512:55
Nvidia open-sourced Describe Anything! It can generate detailed descriptions for user-specified regions in images and videos, marked by points, boxes, scribbles, or masks.
Project page: https://describe-anything.github.io/
05.05.202515:25
Here's an interesting quote from George Simion, Romania's far-right presidential candidate:
He wants a "strong Romanian army inside NATO." However, he no longer wants to support Ukraine because "the war is not going anywhere."
These are inconsistent positions.
If Russia poses the greatest threat to your country, you should be glad that the war isn't going anywhere, e.g., to Romania, until you can build a strong army. And even once you have one, it is far better to keep your biggest enemy's resources tied down in another, non-allied country than to have to fight that enemy yourself.
Source of the quote: https://nickthorpe.substack.com/p/i-am-young-and-restless
Russia is the biggest danger to Romania, Poland, and the Baltic states.
02.05.202516:50
Links for 2025-05-02 [Part 2]
AI
19. Waymo, Toyota strike partnership to bring self-driving tech to personal vehicles https://www.cnbc.com/2025/04/29/waymo-toyota-partner-to-bring-self-driving-tech-to-personal-vehicles-.html
20. “An employee at Elon Musk’s artificial intelligence company xAI leaked a private key on GitHub that for the past two months could have allowed anyone to query private xAI large language models (LLMs) which appear to have been custom made for working with internal data from Musk’s companies, including SpaceX, Tesla and Twitter/X” https://krebsonsecurity.com/2025/05/xai-dev-leaks-api-key-for-private-spacex-tesla-llms/
21. Why the AI Revolution Won’t Look Like You Expect—And Why That’s More Dangerous https://www.youtube.com/watch?v=NMwjqqtU5Dw
22. OpenAI: "We’ve spent the last few days doing a deep dive on what went wrong with last week’s GPT-4o update in ChatGPT. Expanding on what we missed with sycophancy and the changes we’re going to make in the future" https://openai.com/index/expanding-on-sycophancy/
Compute
1. Scott Aaronson: “Grant Sanderson, of 3blue1brown, has put up a phenomenal YouTube video explaining Grover’s algorithm, and dispelling the fundamental misconception about quantum computing, that QC works simply by “trying all the possibilities in parallel.” Let me not futz around: this video explains, in 36 minutes, what I’ve tried to explain over and over on this blog for 20 years … and it does it better. It’s a masterpiece.” https://www.youtube.com/watch?v=RQWpF2Gb-gU
2. Penn engineers have developed the first photonic chip that reshapes how light behaves to carry out the nonlinear mathematics at the heart of modern AI while reducing energy use. https://penntoday.upenn.edu/news/penn-engineers-first-train-ai-lightspeed
3. An Interview with Dan Kim and Hassan Khan About CHIPS https://stratechery.com/2025/an-interview-with-dan-kim-and-hassan-khan-about-chips/
4. Short video of America’s largest data center. https://www.youtube.com/watch?v=fUiI03X6DQc
29.04.202519:41
Dynamism v1 (DYNA-1) Model: A Breakthrough in Performance and Production-Ready Embodied AI
The first robot foundation model built for round-the-clock, high-throughput dexterous autonomy.
Here is a time-lapse video of our model autonomously folding 850+ napkins over a span of 24 hours, with:
• 99.4% success rate — zero human intervention
• 60% of human throughput speed
• 4.3/5 quality ratings (set by the client)
Read more: https://www.dyna.co/research
27.04.202514:21
Links for 2025-04-27 [Part 1]
AI
1. That METR Study Doesn’t Say "AGI in 5 Years" https://amistrongeryet.substack.com/p/measuring-ai-progress
2. The case for multi-decade AI timelines https://epochai.substack.com/p/the-case-for-multi-decade-ai-timelines
3. Dario Amodei reiterates and emphasizes that he believes we will probably have a version of AGI in 2026 or 2027. https://www.darioamodei.com/post/the-urgency-of-interpretability
4. New CNAS Report on the World-Altering Stakes of U.S.-China AI Competition https://www.cnas.org/press/press-release/new-cnas-report-on-the-world-altering-stakes-of-u-s-china-ai-competition
5. China’s Huawei Develops New AI Chip, Seeking to Match Nvidia https://www.wsj.com/tech/chinas-huawei-develops-new-ai-chip-seeking-to-match-nvidia-8166f606 [no paywall: https://archive.is/DUb3O]
6. Robotic system zeroes in on objects most relevant for helping humans https://news.mit.edu/2025/robotic-system-zeroes-objects-most-relevant-helping-humans-0424
7. I-Con: A Unifying Framework for Representation Learning https://arxiv.org/abs/2504.16929
8. o3 Is a Lying Liar https://www.lesswrong.com/posts/KgPkoopnmmaaGt3ka/o3-is-a-lying-liar
9. The o3 Era Begins https://www.lesswrong.com/posts/7x9MZCmoFA2FtBtmG/ai-113-the-o3-era-begins
10. Personal evaluation of LLMs, through chess: “Claude 3.7 Sonnet and o3 were the only models able to complete a game without hallucination.” https://www.lesswrong.com/posts/gNFixvxw7JxzvMjCJ/personal-evaluation-of-llms-through-chess
11. Reinforcement learning tests chips for errors that only show up in huge data centers https://spectrum.ieee.org/data-centers
12. Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning https://arxiv.org/abs/2504.17192
13. From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs https://arxiv.org/abs/2504.15965
14. ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting https://arxiv.org/abs/2504.15921
15. OpenAI boosts ChatGPT with sourced news from 160+ outlets, including The Washington Post. https://openai.com/global-affairs/the-washington-post-partners-with-openai/
16. OpenAI Alums, Nobel Laureates Urge Regulators to Save Company's Nonprofit Structure https://www.lesswrong.com/posts/rN8tHAJnRYgN7hfoF/openai-alums-nobel-laureates-urge-regulators-to-save-company
17. ZAPBench, a whole-brain activity dataset and benchmark with single cell resolution on the larval zebrafish to enable the development and comparison of more accurate brain activity models. https://research.google/blog/improving-brain-models-with-zapbench/
18. Who's Working On It? AI-Controlled Experiments https://sarahconstantin.substack.com/p/whos-working-on-it-ai-controlled
19. Argonne National Lab has an AI-based tool that can help design and operate nuclear reactors—at a time when AI itself is feeding a power frenzy. https://www.wsj.com/articles/nuclear-power-is-back-and-this-time-ai-can-help-manage-the-reactors-5ce03ae7 [no paywall: https://archive.is/XSZcH]
20. As AI models become more complex and more capable, is it possible that they’ll have experiences of their own? It’s an open question. Anthropic recently started a research program to investigate it. https://www.anthropic.com/research/exploring-model-welfare
21. Consciousness, Reasoning and the Philosophy of AI with Murray Shanahan https://www.youtube.com/watch?v=v1Py_hWcmkU
22. Intrinsically aligned superintelligence via “wise world models” using 4 axiomatic principles: mindfulness | emptiness | non-duality | boundless care https://arxiv.org/abs/2504.15125
23. “By default, AI will replace all of us in the economy, creating a new social contract where powerful actors won't have to care about regular people.” https://www.lesswrong.com/posts/LCFgLY3EWb3Gqqxyi/the-intelligence-curse-an-essay-series


23.04.202516:34
"You Don't Have the Cards."


23.04.202512:51
Dia: SOTA open-weights TTS model for ultra-realistic dialogue.
GitHub: https://github.com/nari-labs/dia/
Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc.


05.05.202509:48
The Ultimate LLM Meta-Leaderboard averaged across the 28 best benchmarks
Gemini 2.5 Pro > o3 > Sonnet 3.7 Thinking
Compiled by https://x.com/scaling01/status/1919217718420508782
01.05.202514:28
Phi-4-Reasoning-Plus: Small Model, Big Reasoning Power
Highlights:
- Punches far above its size: Phi-4-Reasoning-Plus is a 14-billion-parameter open-weights model that outperforms or matches much larger open models (DeepSeek-R1-Distill-70B, QwQ-32B) and several closed models (o1-mini, Claude-Sonnet-3.7) on AIME 2025, HMMT, OmniMath, GPQA, and LiveCodeBench; it approaches the 671B-parameter DeepSeek-R1 on math.
- Open and Accessible: Released under a permissive MIT license, allowing broad commercial and research use.
- Laptop-friendly, transparent outputs: Runs on a beefy laptop GPU; separates chain-of-thought in <think>…</think> tags from the final answer for cleaner inspection and evaluation.
- Better Than the Teacher: In some cases, like the AIME 2025 benchmark (with parallel compute) and OmniMath, the model surpassed the performance of its teacher model (o3-mini), indicating a capacity for self-improvement beyond initial training signals.
- Reasoning as a transferable meta-skill: Despite zero explicit training, it solves TSP, 3-SAT, maze routing and calendar-planning tasks, backing the claim that “reasoning generalises.”
- Data-centric recipe: Supervised fine-tuned on ~1.4M carefully filtered reasoning demos (≈16B tokens) generated by o3-mini, then polished with just 6k math problems via GRPO RL to "lock in" its reasoning style.
- RL that actually helps: A single short RL run raises AIME/HMMT accuracy by ≈10 pp and extends explanations by ≈50%, showing RL can sharpen thought without huge compute.
Read more:
Paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2025/04/phi_4_reasoning.pdf
Microsoft press release: https://azure.microsoft.com/en-us/blog/one-year-of-phi-small-language-models-making-big-leaps-in-ai/
Press: https://venturebeat.com/ai/microsoft-launches-phi-4-reasoning-plus-a-small-powerful-open-weights-reasoning-model/
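One practical upside of the `<think>…</think>` convention mentioned in the highlights is that the chain-of-thought can be stripped programmatically before showing or scoring the final answer. A minimal sketch (the `split_reasoning` helper and the sample string are illustrative, not part of the Phi-4 release):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Separate the <think>...</think> chain-of-thought from the final answer."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return "", output.strip()
    thought = match.group(1).strip()
    answer = output[match.end():].strip()
    return thought, answer

# Hypothetical model output, for illustration only:
raw = "<think>14 is even, so 14 mod 2 = 0.</think>14 is even."
thought, answer = split_reasoning(raw)
# thought -> "14 is even, so 14 mod 2 = 0."
# answer  -> "14 is even."
```

Keeping the split in one place makes it easy to log the reasoning trace for inspection while evaluating only the answer.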
29.04.202507:56
Watch Gemini 2.5 Pro implement a landmark Google DeepMind research paper. 🕹
It codes the reinforcement learning algorithm, visualizes the training live and even debugs errors.


24.04.202519:13
Four-star General Cavoli says Ukraine mastered Patriots fast—and the U.S. is now learning from them.


23.04.202515:53
Of course, these models still perform poorly in research-level mathematics, and such results are the exception. But that will soon change. These are just the first sparks (move 37) of the realm beyond.
https://x.com/bremen79/status/1914738549276205241


23.04.202512:01
Stock market return in India 🇮🇳 vs in China 🇨🇳. From this graph, you'd think that India is the most promising emerging industrial power while China is an agricultural basket case.