June 2, 2026—Pimcore—2 min read

Updating millions of objects a day in Pimcore? Sure.

By Miro Kodet

How to manage massive data flow within a single Pimcore instance

Running a large project is not easy. And running a large project reliably is even harder. A customer came to us for a PIM implementation serving all product data for their healthcare project. Multiple channels, lots of variants, hundreds of thousands of child objects containing additional data. One beefy server with scaling possibilities, no public cloud.

The basic architecture took us a while to implement, as we built it alongside various frontends and additional subprojects, but it eventually reached a stable state. The next step was to keep data flowing in and out without any glitches. We started small - one object type, one source of external data and a pretty low cadence. Even here we hit the limitations of the Pimcore ORM pretty fast. Updating half a million objects in one step is simply not a good idea. A full sync was taking 19 hours and we needed it under 30 minutes. Something had to change.
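To put numbers on that gap, here is a back-of-the-envelope calculation. It assumes the full sync covers roughly the half a million objects mentioned above, which is our reading of the figures rather than a measured value.

```python
# Rough throughput math for the sync window.
# ASSUMPTION: a full sync touches about 500,000 objects.
OBJECTS = 500_000


def required_rate(objects: int, window_seconds: float) -> float:
    """Objects per second needed to finish `objects` within the window."""
    return objects / window_seconds


current_rate = required_rate(OBJECTS, 19 * 3600)  # the 19-hour sync
target_rate = required_rate(OBJECTS, 30 * 60)     # the 30-minute goal
speedup = target_rate / current_rate

print(f"current: {current_rate:.1f} objects/s")
print(f"target:  {target_rate:.1f} objects/s")
print(f"speedup: {speedup:.0f}x")
```

Under that assumption the sync has to get about 38 times faster, which is not something you win by tuning a single loop.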

Run commands?

Well, we gave it a try - running multiple instances of a CLI command in parallel to the rescue. Nope. How should a poor developer keep data consistent without additional headaches? Sure, you could keep track of the objects handled in the current loop. You could also implement some fancy locking to keep the data on track. We knew there had to be a better way.

Directly in the database?

That would be pretty straightforward and fast in principle. But are you sure you are able to update all relations as well? Are you sure you know all the tiny details and operations the ORM is silently doing under the hood? That's not the way to go. Not in this case.

Queue!

That's the hit. We leveraged a fast message queue (RabbitMQ in our case) and a handful of consumers: 20 of them, managed by Supervisor, which keeps them alive and restarts them on failure. Each object gets its update in its own message, the ACK makes sure a message is not consumed multiple times, and you can easily run consumers in parallel. Alright, problem solved.
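The consumer pattern can be sketched in a few lines. This is a minimal in-process illustration, not our production code: Python's `queue.Queue` and threads stand in for RabbitMQ and the Supervisor-managed processes, and `run_consumers`, `apply_update` and the message shape are hypothetical names made up for the example.

```python
import queue
import threading


def run_consumers(messages, handle, workers=20):
    """Hand each message to exactly one worker; return (acked, rejected)."""
    pending = queue.Queue()
    for message in messages:
        pending.put(message)

    acked, rejected = [], []
    lock = threading.Lock()

    def consume():
        while True:
            try:
                # A message taken here is never seen by another worker.
                message = pending.get_nowait()
            except queue.Empty:
                return
            try:
                handle(message)
            except Exception:
                # Stand-in for a NACK: a real broker could redeliver it.
                with lock:
                    rejected.append(message)
            else:
                # Stand-in for the ACK: the message is done for good.
                with lock:
                    acked.append(message)

    threads = [threading.Thread(target=consume) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return acked, rejected


def apply_update(message):
    """Hypothetical handler; the real one would save the object via the ORM."""
    if message["price"] < 0:
        raise ValueError(f"invalid price for object {message['object_id']}")


# One message per object update; object 13 carries a broken payload.
updates = [{"object_id": i, "price": -1 if i == 13 else 10} for i in range(100)]
acked, rejected = run_consumers(updates, apply_update)
print(len(acked), "acked,", len(rejected), "rejected")
```

The point of the sketch is the shape: one message per object, one worker per message, and an explicit acknowledge step, so a failing update affects only its own message.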

Well, not quite. To keep it running - and update millions of entries a day - you have to feed the beast somehow. A scheduled command crawls the data sources and pushes messages into the queue. Sounds simple, but getting the cadence right took a few iterations: push too fast and you overwhelm the consumers, too slow and you're back to missing your sync window. 52 GB RAM, 16 cores and some grey hair on top. Scaling alongside high throughput is something you have to take into account.
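One way to think about the cadence is a high-watermark rule: on every run, the feeder publishes only as much as the queue has room for. The sketch below is a simplified simulation of that idea, not the scheduling logic we actually run. The function names and all numbers (watermark, batch size, drain rate) are made up for illustration.

```python
def next_batch_size(queue_depth: int, high_watermark: int, max_batch: int) -> int:
    """How many messages the feeder may publish on this tick."""
    headroom = max(0, high_watermark - queue_depth)
    return min(max_batch, headroom)


def simulate(total: int, drain_per_tick: int, high_watermark: int, max_batch: int):
    """Feed `total` messages while consumers drain a fixed amount per tick.

    Returns (peak queue depth, number of ticks until the queue is empty).
    """
    if drain_per_tick <= 0:
        raise ValueError("consumers must drain at least one message per tick")
    depth = published = peak = ticks = 0
    while published < total or depth > 0:
        batch = min(next_batch_size(depth, high_watermark, max_batch),
                    total - published)
        published += batch
        depth += batch
        peak = max(peak, depth)
        depth -= min(depth, drain_per_tick)  # consumers do their share
        ticks += 1
    return peak, ticks


# Hypothetical numbers: 10,000 updates, consumers handle 200 per tick.
peak, ticks = simulate(total=10_000, drain_per_tick=200,
                       high_watermark=1_000, max_batch=500)
print(f"peak depth {peak}, finished after {ticks} ticks")
```

The queue never grows past the watermark, and the consumers are never starved as long as the batch size covers what they drain between two feeder runs.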

We also considered a database queue (Symfony Messenger would play nicely with our setup) or Redis for this purpose. Both would work in a similar way, with their own pros and cons. In the end, our aim to detach the queue from the system, the ease of administration and the overall good DX outweighed the other solutions. But it's definitely not a silver bullet: always pick the stack according to your current project and its needs.

What's the recipe then?

Spread your load into multiple async processes. If you cannot do that natively in your current language or stack, bypass it with a technology that can. In our case the external queue was the best-fitting solution — today it chews through ~2 million updates a day without breaking a sweat.
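As a sanity check on that figure, here is what it means per consumer. The calculation assumes the load is spread evenly over 24 hours and across the 20 consumers mentioned earlier. Real traffic is burstier, so treat it as an average, not a guarantee.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_message_budget(updates_per_day: int, consumers: int) -> float:
    """Average seconds one consumer may spend on a single update."""
    return consumers * SECONDS_PER_DAY / updates_per_day


UPDATES_PER_DAY = 2_000_000  # figure from the article
CONSUMERS = 20               # figure from the article

overall_rate = UPDATES_PER_DAY / SECONDS_PER_DAY
budget = per_message_budget(UPDATES_PER_DAY, CONSUMERS)

print(f"overall: {overall_rate:.1f} updates/s")
print(f"budget:  {budget * 1000:.0f} ms per update and consumer")
```

On average that leaves each consumer a bit under a second per object, which is a comfortable budget for a full ORM save.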