Why Synthetic Data Fails in the Real World

Lately I’ve been hearing a familiar refrain: “Well, we can always fill the gaps with synthetic data.” It sounds neat – fast, scalable, controlled. But sometimes, when that synthetic data meets the real world, things start to break.

Let me give you an example.

A research team once built an AI model to recognise Lego bricks. Instead of photographing thousands of real ones (slow, messy, unpredictable), they used computer-generated images – flawless lighting, perfect edges, every brick pristine.

The model learned beautifully… until it saw an actual photo of a real Lego brick taken by a human. Suddenly, nothing worked. Shadows, smudges, reflections, weird angles – all the things synthetic data had never seen – broke the model’s logic completely.
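The failure mode here is a textbook domain gap: the training and test distributions don't match. Here's a toy sketch in miniature (not the team's actual pipeline – the single "feature", the values, and the lighting offset are all invented for illustration): a simple threshold rule fitted to pristine synthetic inputs falls apart the moment real-world noise shifts the data.

```python
from statistics import mean

# Invented toy feature (e.g. average edge sharpness of the image).
# Synthetic renders are pristine and tightly clustered per class.
clean_brick_a = [0.0, 0.1, -0.1]  # class 0 renders
clean_brick_b = [1.0, 0.9, 1.1]   # class 1 renders

# "Training": a nearest-centroid rule learns the midpoint threshold.
threshold = (mean(clean_brick_a) + mean(clean_brick_b)) / 2  # ~0.5

def classify(x):
    return int(x > threshold)

# Clean synthetic test set: perfect accuracy.
clean_test = [(0.05, 0), (0.95, 1)]
print(all(classify(x) == y for x, y in clean_test))  # True

# "Real" photos: shadows and glare shift the feature distribution.
# A fixed +0.6 lighting offset pushes class-0 bricks over the line.
real_test = [(0.05 + 0.6, 0), (0.95 + 0.6, 1)]
print(sum(classify(x) == y for x, y in real_test) / len(real_test))  # 0.5
```

The model hasn't "forgotten" anything – the decision rule is unchanged. The world simply moved out from under it.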

That’s the catch. Synthetic data often skips over the awkward stuff: quirks, inconsistencies, formatting chaos, human shorthand. But those “imperfections” are the signal, especially in sectors like emergency response, policing or finance.

Try training a police AI on synthetic phrases when your real-world data looks more like this: “STATE 6 on scene – suspect unclear”. These aren’t errors. They’re meaning, compressed.
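You can see the mismatch in miniature with a vocabulary check (the synthetic phrases below are invented for illustration; real operational data would differ): most of a radio-style message simply never appears in tidy, grammatical synthetic text.

```python
# Invented synthetic training phrases: grammatical, fully spelled out.
synthetic_phrases = [
    "officers have arrived at the scene",
    "the suspect has not yet been identified",
    "unit six is responding to the incident",
]

# Vocabulary a model would learn from the synthetic text.
vocab = {word for phrase in synthetic_phrases for word in phrase.split()}

# A real-world style message: codes, shorthand, no grammar.
real_message = "STATE 6 on scene - suspect unclear"
tokens = real_message.lower().split()

# Tokens the synthetic vocabulary has never seen.
oov = [t for t in tokens if t not in vocab]
print(oov)                       # ['state', '6', 'on', '-', 'unclear']
print(len(oov) / len(tokens))    # 5 of 7 tokens are out-of-vocabulary
```

Even "6" is unseen, because the synthetic text politely spelled out "six". The shorthand isn't noise to be cleaned away; it's the content.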

Now (to be clear) synthetic data has its place. If you’re working in structured environments or building models that rely on clean numeric input, it can be incredibly powerful. It helps scale. It helps fill in blanks.
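For clean numeric inputs, "filling in blanks" really can be that simple. A minimal sketch of one common approach – fitting a distribution to real records and sampling synthetic rows from it (the column name and values are invented; real pipelines use far more careful generators):

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the sketch is reproducible

# Invented real measurements for a numeric column (e.g. sensor readings).
real_values = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]

# Fit a simple Gaussian to the real data...
mu, sigma = mean(real_values), stdev(real_values)

# ...and draw synthetic rows to scale the dataset up.
synthetic_values = [random.gauss(mu, sigma) for _ in range(100)]

print(len(synthetic_values))  # 100 new rows
```

In a structured, numeric setting like this, the synthetic rows behave like the real ones by construction. The trouble starts when the thing you're generating is free text, context, and human shorthand.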

But if your AI needs to operate in the messy, context-heavy world of how people actually write, report, and communicate? Then you need real data – with all the unpredictability that comes with it.

Because the problem isn’t that synthetic data is bad. It’s that the real world is more complicated, uneven, and unpredictable than we give it credit for – and that is a real challenge to generate artificially.

About the Author: Elliott Young

As Chief Technology Officer at Dell Technologies EMEA, he supports CXOs in both business and IT roles in leveraging GenAI for real-world results. Together with his team at Dell, he empowers organizations of all sizes, from large enterprises to mid-sized businesses, not only to optimize and create competitive advantage with GenAI but also to design entirely new business models. That is where the real revolution happens.

He leads the development of cutting-edge AI and multicloud strategies, helping industries from finance and manufacturing to public safety turn complexity into clarity. A growing area of his focus is AI strategy in policing, where he works closely with senior leaders to enable safer communities through practical, responsible adoption of GenAI — from freeing up officers’ time to improving intelligence review and incident response.

He has been a pilot, a solutions architect and a consultant. Flying a helicopter 50 feet above the ground while herding kangaroos in the Australian outback taught him how to stay calm and in control, a skill that serves him well whether managing a £50k IT project or a £1bn transformation program. He can also be found presenting virtually at various industry events throughout the year.