The White Marble Lie

Everything You Think Ancient Rome Looked Like Is Wrong – And AI Makes It Worse

An inaccurate portrayal of ancient Rome with modern lighting and other modern anachronisms

By L. M. Hawkes · HawkesAdventures.com

In the first article in this series, I introduced the Katana Problem – AI’s tendency to arm Roman legionaries with weapons from feudal Japan – and argued that AI historical failures are not random. They are patterned, predictable, and systematic.

This article is the full taxonomy.

Across 1,106 AI-generated images audited for historical accuracy, I documented nine distinct failure categories. Some are immediately jarring. Some are subtle enough to slip past casual inspection. All of them are consistent enough to constitute a pattern, and all of them are actively noticed and documented by the Roman history community.

Understanding them matters whether you’re generating images, consuming them, or building anything that depends on historical credibility.

Failure 1: The White Marble Default

This is the single most-cited error in the Roman accuracy community, and it is baked so deeply into AI training data that it requires aggressive, specific prompting to suppress.

AI generates Rome as though every surface is white marble. Temples, forums, baths, private houses – all rendered in gleaming, pristine white stone. It looks authoritative. It looks classical. It is historically wrong.

The actual primary Roman building materials were brick, concrete (opus caementicium), and tufa. Marble facing was applied to specific, high-status surfaces – not plastered universally across an entire civilization. The all-white-marble aesthetic comes from how Roman ruins look today, after seventeen centuries of weathering have stripped away the color, the painted plaster, the decorative cladding, and the surface treatments that covered the stone underneath.

That is what ruins look like. It is not what inhabited Roman buildings looked like.

Real Roman urban environments were brick-red and earth-toned, smoke-darkened near cooking fires and lamps, worn and patched and layered with prior construction. Prompting specifically for brick and stone construction – and explicitly excluding “white marble” as a descriptor – produces dramatically more accurate and, frankly, more interesting results.
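
As a minimal sketch of that approach, assuming a tool that accepts a separate exclusion list (Midjourney's `--no` parameter and the negative prompt field in Stable Diffusion interfaces are two examples), the material terms can be kept in one place and reused. The function name and term lists below are illustrative, not part of any tool's API.

```python
# Hypothetical helper: the term lists are illustrative examples drawn from
# the materials named above, not an exhaustive or authoritative vocabulary.
ROMAN_MATERIALS = ["fired brick", "Roman concrete (opus caementicium)", "tufa blocks"]
EXCLUDED_TERMS = ["white marble", "gleaming white stone"]


def build_prompt(subject: str, include: list[str], exclude: list[str]) -> dict[str, str]:
    """Return the positive prompt and the exclusion list as separate strings."""
    positive = ", ".join([subject, *include])
    negative = ", ".join(exclude)
    return {"prompt": positive, "negative": negative}


result = build_prompt("street in Imperial Rome", ROMAN_MATERIALS, EXCLUDED_TERMS)
print(result["prompt"])
print(result["negative"])
```

Keeping the exclusions separate from the positive text matters here: writing "no white marble" inside the main prompt still puts the phrase "white marble" in front of the model.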

Failure 2: Architecture That Belongs to a Different Century

The pointed Gothic arch is arguably the most frequent architectural anachronism in AI-generated Roman imagery, and it is persistent enough to appear even when prompts explicitly specify Roman construction.

The Roman arch is round. The Gothic arch is pointed. They are separated by roughly 800 years of architectural history. AI conflates them constantly, most likely because the training data contains far more Gothic and medieval European architecture than Roman.

But the Gothic arch is only the most visible symptom of a broader architectural contamination. Also documented in community analysis:

Background buildings matching the Monument to Victor Emmanuel II – inaugurated in 1911
Colonnades and piazzas copying the layout of St. Peter’s Square, which is Renaissance and Baroque, not Roman
The Baths of Caracalla depicted in Baroque interior style, a movement that began nearly 1,400 years after the baths were built
Medieval fortification elements – battlements, drawbridge-style gates, crenellated walls – appearing in Imperial-era settings
Renaissance facade treatments on buildings that should predate the Renaissance by fifteen centuries
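
The size of these gaps is simple subtraction. The sketch below assumes approximate dates (the baths completed around AD 216, the Baroque dated from about 1600, the pointed arch in use at Saint-Denis by about 1140); the dictionary contents and function name are illustrative only.

```python
# Approximate dates, stated as assumptions for illustration. All values are AD.
SCENE_DATES = {"Baths of Caracalla": 216}      # completed around AD 216
ELEMENT_DATES = {
    "Gothic pointed arch": 1140,               # mid-12th century France
    "Baroque interior": 1600,                  # commonly dated from about 1600
    "Monument to Victor Emmanuel II": 1911,    # inaugurated in 1911
}


def anachronism_gap(scene_year: int, element_year: int) -> int:
    """Years by which an element postdates the scene (0 if it does not)."""
    return max(0, element_year - scene_year)


scene_year = SCENE_DATES["Baths of Caracalla"]
for element, year in ELEMENT_DATES.items():
    print(f"{element}: {anachronism_gap(scene_year, year)} years too late")
```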

A University of Bordeaux art history professor formally documented several of these failures in widely shared AI Rome videos in December 2025. This is not academic nitpicking. It is a documented, public, growing body of evidence that AI-generated historical imagery systematically misrepresents its subject.

Failure 3: The Victorian Interior Problem

This one surprised me more than the others, because it is not an obvious failure category until you have seen enough examples to recognize the pattern.

Roman interior scenes frequently generate with Victorian gas-bracket lanterns – the kind with glass panels and decorative metalwork that belong in a 19th-century London townhouse, not a 1st-century Roman domus. Alongside the lanterns come visible pipe runs and conduit along walls, and lighting with the quality and directionality of electric fixtures in what should be torch-lit or oil-lamp-lit spaces.

The model has a “dramatic interior” template that pulls from 19th century references rather than ancient ones. Victorian interior design generated an enormous volume of illustration and photography that dominates the training data’s representation of “atmospheric, candlelit interiors.” The model reaches for what it knows.

The correct Roman lighting source is an oil lamp – a simple ceramic or bronze vessel with a wick, producing warm, directional, flickering light. Torches. Braziers. Nothing with glass panels. Nothing with pipe runs. Prompting for these specifically, with explicit exclusions of glass-enclosed fixtures and any visible conduit, produces dramatically better results.

Failure 4: The MMA Fighter Problem

AI renders gladiators as lean, muscular modern athletes. This is historically wrong in a way that is genuinely interesting once you understand the reason.

Historical gladiators deliberately carried significant body fat. This was not poor conditioning – it was tactical. A layer of subcutaneous fat over vital organs meant that sword wounds were more survivable. A cut that might kill a lean fighter would wound but not kill a heavier one. Gladiators were valuable investments for their lanistas. Keeping them alive was good economics.

The historically accurate gladiator looks more like a heavyweight wrestler than an MMA fighter. AI defaults to the lean, defined athletic physique because that is what “warrior” and “fighter” look like in the training data – in action films, in video games, in fantasy illustration. The historical reality runs counter to the aesthetic default.

Prompting explicitly for period-accurate body composition – substantial build, visible body fat over muscle – produces more historically accurate results and, frankly, more unusual and distinctive imagery than the default athletic figure.

Failure 5: The Equipment Mixing Problem

Roman gladiator types were distinct, formally categorized, and immediately recognizable to anyone familiar with the institution. A Retiarius fought with a net and trident and wore minimal armor. A Secutor wore a smooth, close-fitting helmet specifically designed to give the net nothing to catch. A Murmillo carried a large rectangular scutum and wore a helmet with a fish-shaped crest.

AI mixes these freely and without apparent awareness that the categories exist.

A figure carrying a trident and net wearing a Secutor helmet. A Murmillo shield on the wrong body type. Helmet, weapon, and shield combinations that never existed in documented Roman practice. The enthusiast community on r/ancientrome and dedicated gladiator history forums detects these immediately – they are the visual equivalent of putting cavalry insignia on an infantry soldier. The errors are specific, the community is knowledgeable, and the failure to get this right reads as a lack of seriousness about the subject.

Specifying the gladiator type explicitly and naming the correct equipment set in the prompt reduces this significantly.
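
One way to make this repeatable is a small lookup table of equipment sets. The sketch below is limited to the items named in this article, so the sets are deliberately incomplete; the item labels and function names are assumptions chosen for illustration.

```python
# Equipment sets limited to the items named in this article; a real
# reference table would be more complete. Item labels are illustrative.
EQUIPMENT = {
    "retiarius": {"net", "trident", "minimal armor"},
    "secutor": {"smooth close-fitting helmet"},
    "murmillo": {"large rectangular scutum", "fish-crest helmet"},
}


def mixed_items(gladiator_type: str, items: set[str]) -> dict[str, str]:
    """Map each item that belongs to a different type to that type's name."""
    own = EQUIPMENT[gladiator_type]
    conflicts = {}
    for item in items - own:
        for other, kit in EQUIPMENT.items():
            if other != gladiator_type and item in kit:
                conflicts[item] = other
    return conflicts


def type_prompt(gladiator_type: str) -> str:
    """Name the type and spell out its kit, leaving less room to improvise."""
    kit = ", ".join(sorted(EQUIPMENT[gladiator_type]))
    return f"{gladiator_type} gladiator, equipped with {kit}"


print(mixed_items("retiarius", {"net", "trident", "smooth close-fitting helmet"}))
print(type_prompt("murmillo"))
```

The first call flags the trident-and-net figure in a Secutor helmet described above; the second produces a prompt fragment that names both the type and its kit.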

Failure 6: Wrong Materials Throughout

AI defaults to materials that look impressive rather than materials that are historically correct.

Beyond the white marble problem, documented failures include:

Metal gates and grating that look too manufactured – machined-looking components where everything would have been hand-forged
Window glass that is flat and transparent – glass panes of that quality did not exist in the Roman period; Roman glass was thick, cloudy, imperfect, and used sparingly
Fabric that looks machine-woven rather than hand-loomed – togas and tunics with the wrong weave texture and wrong drape
Gilding in inappropriate contexts – decoration suited to dry environments appearing in scenes that would have been humid
Chrome and polished-metal finishes that could not have been produced with Roman metallurgical technology

The underlying failure is the same as the white marble problem: AI reaches for the most visually impressive version of a material rather than the historically accurate one.

Failure 7: Civilization Drift

A subtler failure, and arguably the most corrosive to historical credibility: AI’s training data contains far more imagery of ancient civilizations in general than of ancient Rome specifically.

The result is images that are tonally “ancient” but not identifiably Roman. Greek architectural orders appear where Roman construction would be used. Egyptian visual vocabulary bleeds into crowd scenes. The image looks and feels historical, but it could be Rome, Athens, Persia, or a fantasy hybrid of all of them simultaneously.

One highly upvoted comment in the AI art community put it plainly: the model shows what people think Rome looked like, not what it actually looked like. The training data reflects centuries of accumulated popular imagination about antiquity – not Roman-specific visual evidence.

The fix is asserting Roman specificity explicitly and repeatedly in the prompt – specifying construction materials, architectural forms, and social context that are Roman and not merely ancient. Vague period references produce generic ancient imagery. Specific Roman references produce Roman imagery.

Failure 8: The Flattened Social Hierarchy

Roman visual culture was intensely hierarchical. A senator, an equestrian, a freedman, a slave, and a street merchant looked visibly different from each other – in garment quality, cleanliness, ornamentation, and bearing. The social structure of Roman life was legible on the body.

AI collapses this into one undifferentiated “Roman citizen” aesthetic. Everyone wears roughly similar clothing at roughly similar levels of cleanliness. The enormous diversity of Roman society – freedmen, slaves, provincial subjects, equestrians, senators, soldiers, merchants – is flattened into a single visual register.

A related failure: AI will sometimes dress a figure in fine embroidered garments and jewelry while simultaneously rendering them dirty and disheveled. These status signals are contradictory. A figure in senatorial dress should look like they have servants. Specifying role, social status, and the garment quality appropriate to that status – separately and explicitly – produces more coherent results.
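
Keeping role, garment, and condition as separate fields makes the contradiction easy to catch before a prompt is sent. This is a rough sketch: the role and term lists are assumptions for illustration, not a sourced costume guide, and the check is a plain keyword match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FigureSpec:
    role: str        # e.g. "senator", "street merchant"
    garment: str     # the garment and its quality
    condition: str   # cleanliness and grooming


# Illustrative lists only; extend them to suit the catalog being built.
HIGH_STATUS_ROLES = {"senator", "equestrian"}
NEGLECT_TERMS = {"dirty", "disheveled", "ragged"}


def status_conflicts(spec: FigureSpec) -> list[str]:
    """List neglect terms that contradict a high-status role."""
    if spec.role not in HIGH_STATUS_ROLES:
        return []
    words = set(spec.condition.lower().replace(",", " ").split())
    return sorted(words & NEGLECT_TERMS)


def describe(spec: FigureSpec) -> str:
    """Join the three fields into one prompt fragment."""
    return f"{spec.role}, wearing {spec.garment}, {spec.condition}"


bad = FigureSpec("senator", "fine wool toga", "dirty, disheveled")
good = FigureSpec("senator", "fine wool toga", "clean and well groomed")
print(status_conflicts(bad))
print(describe(good))
```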

Failure 9: Too Clean, Too New

This is the failure that cuts across every other category. AI generates Rome as if it were freshly built.

Pristine stone with no weathering. Armor with no wear. Textiles with no staining or repair. Environments that look like museum reconstructions rather than inhabited places.

Real ancient cities were worn, smoke-darkened, patched, and layered with construction from multiple eras. Surfaces that had been in use for decades or centuries showed it. The lived-in quality of a real Roman street or interior – the accumulated grime near cooking fires, the scuffed leather, the patina on bronze, the weathered wood – is almost entirely absent from default AI output.

Prompting specifically for aged, worn, authentic surfaces – and explicitly rejecting “pristine,” “new,” and “spotless” as descriptors – produces imagery that reads as genuinely inhabited rather than staged.

What This Adds Up To

Nine failure categories. All of them documented in active community discussion. All of them systematic. All of them correctable.

The community of people who notice these failures is not small. It is active on Reddit, in history forums, in tabletop RPG design communities, and in academic circles. Its frustration with AI historical imagery that looks right but isn’t points to a live, growing, monetizable gap.

Closing that gap requires more than better prompting. It requires a structured workflow – a correction loop that turns documented failures into prompt intelligence and applies that intelligence systematically across a large catalog.
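
Purely to illustrate the general idea of such a loop, and not the author's actual pipeline, the sketch below maps observed failures to prompt corrections and applies them across a small catalog. The catalog keys, scene names, and wording are all hypothetical.

```python
# Hypothetical correction catalog: each documented failure maps to terms to
# add and terms to exclude. Entries paraphrase fixes suggested in this article.
CORRECTIONS = {
    "white_marble": {"add": ["brick and concrete construction"],
                     "exclude": ["white marble"]},
    "victorian_interior": {"add": ["oil lamps, braziers"],
                           "exclude": ["glass-paneled lanterns", "visible pipes"]},
    "too_clean": {"add": ["weathered, smoke-darkened surfaces"],
                  "exclude": ["pristine", "spotless"]},
}


def apply_corrections(base_prompt: str, failures: list[str]) -> tuple[str, list[str]]:
    """Fold the corrections for each observed failure into one prompt."""
    additions, exclusions = [], []
    for name in failures:
        fix = CORRECTIONS[name]
        additions.extend(fix["add"])
        exclusions.extend(fix["exclude"])
    return ", ".join([base_prompt, *additions]), exclusions


# A reviewer logs which failures appeared in the last render of each scene.
audit_log = {
    "forum at midday": ["white_marble", "too_clean"],
    "domus interior": ["victorian_interior"],
}
revised = {scene: apply_corrections(scene, found) for scene, found in audit_log.items()}
for scene, (prompt, exclusions) in revised.items():
    print(scene, "->", prompt, "| exclude:", exclusions)
```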

That workflow is the subject of the next article.

L. M. Hawkes writes cinematic, historically grounded interactive gamebooks drawing from the warrior traditions of Rome, Greece, Japan, the Viking Age, and the great battles of antiquity. The Vault of Ages Art Pack Configurator – a curated catalogue of historically accurate cinematic illustration – is available at HawkesAdventures.com under personal and commercial licenses.

This is Part 2 of a 6-part series.

Previously, Part 1: AI Keeps Putting Katanas in Ancient Rome

Coming next week, Part 3: The Three-Stage Correction Pipeline – the exact workflow I built to fix every failure category in this article.
