Data & AI · Quickie · 15 min
An LLM Walks into General Relativity
This talk presents an experiment where an LLM generates a full presentation on General Relativity. The output is fluent but scientifically flawed, revealing how AI excels in structure yet fails in physics reasoning. Using this case, the talk explores validation methods to ensure reliability in AI‑generated technical content.
Tasos Nikolaou, Up Hellas
When and where
Friday, April 24, 12:45-13:00
MC 2
Large Language Models are increasingly used to generate technical content: documentation, reports, and even conference presentations. The results are often fluent, confident, and well-structured, which makes their mistakes harder to spot.
In this talk, we run a simple experiment: an LLM is asked to generate an entire presentation on General Relativity, covering gravitational time dilation, gravitational waves, and black holes, using real scientific sources. The output looks convincing. It has equations, misconceptions, and citations. And yet, several explanations are subtly but fundamentally wrong.
General Relativity is an unforgiving domain. Concepts that sound intuitive, like “light slows down in gravity”, “gravitational waves are ripples in space”, “black holes suck everything in”, fail as soon as you frame them in terms of measurements, observables, and invariants. This makes physics an ideal stress test for AI-generated explanations.
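Claims like these can often be tested against a computable invariant rather than debated in words. A minimal Python sketch (the constants and the GPS orbital radius are illustrative values chosen here, not taken from the talk): "light slows down in gravity" is not an invariant statement, but the proper-time ratio of a static clock in the Schwarzschild metric is.

```python
import math

# Physical constants (standard CODATA-style values, used as illustrative inputs)
G = 6.67430e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8       # speed of light, m/s
M_EARTH = 5.9722e24    # Earth mass, kg

def schwarzschild_time_factor(r_m: float, mass_kg: float = M_EARTH) -> float:
    """Ratio dtau/dt for a static clock at radius r_m in the Schwarzschild metric.

    A value below 1 means the clock ticks slower than a far-away reference
    clock. Unlike "light slows down", this ratio is a measurable invariant.
    """
    rs = 2 * G * mass_kg / C**2   # Schwarzschild radius
    if r_m <= rs:
        raise ValueError("radius must lie outside the horizon")
    return math.sqrt(1 - rs / r_m)

# Clock on Earth's surface vs. one at roughly GPS orbital radius (~26,571 km)
surface = schwarzschild_time_factor(6.371e6)
gps = schwarzschild_time_factor(2.6571e7)
print(f"surface factor: {surface:.12f}")
print(f"GPS factor:     {gps:.12f}")
# The higher clock runs gravitationally faster: a precise, checkable
# statement, as opposed to the intuitive-sounding phrasings above.
```

A deterministic check like this is exactly the kind of validation an AI-generated slide cannot fake its way past.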
Using the generated slides as a case study, we show:
- where LLMs consistently succeed (structure, narrative, pedagogy),
- where they fail (measurement-based reasoning and physical constraints),
- and how to design agent pipelines that combine AI generation with deterministic validation and human review.
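One deterministic stage of such a pipeline can be sketched as a lint pass over generated slide text. Everything below (the rule names, the patterns, the `Finding` type) is hypothetical scaffolding for illustration; a real validator would encode expert-curated physics checks rather than two regexes.

```python
import re
from dataclasses import dataclass

@dataclass
class Finding:
    rule: str
    excerpt: str

# Hypothetical rule set: phrases flagging known GR misconceptions.
# In a production pipeline these rules would be curated by a domain expert.
MISCONCEPTION_PATTERNS = {
    "light-slows": re.compile(r"light\s+slows\s+down", re.IGNORECASE),
    "bh-suction": re.compile(r"black holes?\s+suck", re.IGNORECASE),
}

def validate_slide(text: str) -> list[Finding]:
    """Deterministic lint pass over generated slide text.

    Returns findings for a human reviewer; it never rewrites content itself.
    """
    findings = []
    for rule, pattern in MISCONCEPTION_PATTERNS.items():
        match = pattern.search(text)
        if match:
            findings.append(Finding(rule=rule, excerpt=match.group(0)))
    return findings

slide = "Near a massive body, light slows down and black holes suck in everything."
issues = validate_slide(slide)
for f in issues:
    print(f"[{f.rule}] flagged: {f.excerpt!r}")
```

The design choice matters: the deterministic stage only flags and routes to human review, so the pipeline's reliability does not depend on a second LLM judging the first.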