A few days ago, I needed to update millions of records in a MySQL database using Doctrine 2. After a couple of searches I landed on this Doctrine documentation, which suggests combining query iteration with a batching strategy.
For instance:
```php
$batchSize = 1000;
$numberOfProcessedUpdates = 0;
$query = $entityManager->createQuery('select u from App\Model\User u');
$iterableResult = $query->iterate();

foreach ($iterableResult as $row) {
    $user = $row[0];
    $user->updateSomethingImportant();

    $numberOfProcessedUpdates++;
    if (($numberOfProcessedUpdates % $batchSize) === 0) {
        $entityManager->flush(); // Executes all queued updates.
        $entityManager->clear(); // Detaches all objects from Doctrine!
    }
}
$entityManager->flush(); // Flush any remaining updates.
```
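As an aside: on newer Doctrine ORM releases (2.8+), `Query::iterate()` is deprecated in favor of `Query::toIterable()`, which yields entities directly instead of one-element rows. The same loop, sketched against that API:

```php
foreach ($query->toIterable() as $user) {
    $user->updateSomethingImportant();

    $numberOfProcessedUpdates++;
    if (($numberOfProcessedUpdates % $batchSize) === 0) {
        $entityManager->flush();
        $entityManager->clear();
    }
}
$entityManager->flush();
```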
This did not work well for my use case: it was already slow at 500k records, and at a million records the script exhausted the allowed memory.
| Records | Time (s) | Memory (MB) |
|---|---|---|
| 250k | 69.39 | 4.52 |
| 500k | 185.15 | 6.26 |
| 1M | Allowed memory size exhausted | |
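For context on how numbers like these can be gathered: PHP exposes wall-clock time via `microtime(true)` and peak memory via `memory_get_peak_usage()`. A minimal sketch of such a harness (assuming a hypothetical `runTheUpdate()` wrapper around the snippet above):

```php
$startTime = microtime(true);

runTheUpdate($entityManager); // Hypothetical wrapper around the batching snippet.

printf(
    "Time (s): %.2f | Memory (MB): %.2f\n",
    microtime(true) - $startTime,
    memory_get_peak_usage(true) / 1024 / 1024
);
```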
I made a small modification to the snippet, combining iteration, batching, and pagination:
```php
$batchSize = 1000;
$numberOfRecordsPerPage = 5000;

// Count the total number of records to update.
$totalRecords = $entityManager->createQueryBuilder()
    ->select('count(u.id)')
    ->from('App\Model\User', 'u')
    ->getQuery()
    ->getSingleScalarResult();

$totalRecordsProcessed = 0;

while ($totalRecordsProcessed < $totalRecords) {
    $query = $entityManager->createQuery('select u from App\Model\User u')
        ->setMaxResults($numberOfRecordsPerPage) // Maximum records to fetch per page.
        ->setFirstResult($totalRecordsProcessed); // Offset of the current page.

    foreach ($query->iterate() as $row) {
        $user = $row[0];
        $user->updateSomethingImportant();

        $totalRecordsProcessed++;
        if (($totalRecordsProcessed % $batchSize) === 0) {
            $entityManager->flush(); // Executes all queued updates.
            $entityManager->clear(); // Detaches all objects from Doctrine!
        }
    }
}
$entityManager->flush(); // Flush any remaining updates.
```
| Records | Time (s) | Memory (MB) |
|---|---|---|
| 500k | 7.36 | 200.13 |
| 2M | 9.93 | 1042.76 |
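One caveat with the pagination approach: `setFirstResult()` and `setMaxResults()` are only reliable when the underlying query has a stable order, so it is safer to add an explicit `ORDER BY` on a unique column. A sketch of the paged query with ordering, assuming the same `App\Model\User` entity:

```php
$query = $entityManager->createQueryBuilder()
    ->select('u')
    ->from('App\Model\User', 'u')
    ->orderBy('u.id', 'ASC') // Stable ordering keeps the page offsets deterministic.
    ->getQuery()
    ->setMaxResults($numberOfRecordsPerPage)
    ->setFirstResult($totalRecordsProcessed);
```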