Iterating large data in doctrine 2

@samueljabiodun · April 19, 2019

A few days ago, I wanted to update millions of records in a MySQL database using Doctrine 2. I did a couple of searches and landed on this doctrine documentation. It suggests combining query iteration with batching strategy to achieve this.

For instance:

$batchSize = 1000;
$numberOfProcessedUpdate = 0;

$query = $entityManager->createQuery('select u from App\Model\User u');

$iterableResult = $query->iterate();

foreach ($iterableResult as $row) {
    $user = $row[0];
    $user->updateSomethingImportant();
    if (($numberOfProcessedUpdate % $batchSize) === 0) {
        $entityManager->flush(); // Executes all updates.
        $entityManager->clear(); // Detaches all objects from Doctrine!
    }
    $numberOfProcessedUpdate++;
}

$entityManager->flush();


This did not work well for my use case as it seems to be ineffective for records above a million.

Records Time (s) Memory (mb)
500k 185.1500730514526 6.255645751953125
250k 69.39160299301147 4.524032592773438
1 M   Allowed Memory size exhausted


I made a little modification to the code snippet. I combined iteration, batching and pagination strategy.

$batchSize = 1000;
$numberOfRecordsPerPage = 5000;

$totalRecords = $queryBuilder->select('count(u.id)')
            ->from('App\Model\User u')
            ->getQuery()
            ->getSingleScalarResult();   //Get total records to update

$totalRecordsProcessed = 0;

        while (true) {
            $query = $entityManager->createQuery('select u from App\Model\User u')
                ->setMaxResults($numberOfRecordsPerPage) //Maximum records to fetch at a time
                ->setFirstResult($totalRecordsProcessed);
          
             $iterableResult = $query->iterate();
            if ($iterableResult->next() === false) {
                break;
            }
                      
            foreach ($iterableResult as $row) {
                $user = $row[0];
                $user->updateSomethingImportant();
              
                 if (($totalProcessed % $batchSize ) === 0) {
                    $entityManager->flush();
                    $entityManager->clear();
                }
                $totalProcessed++;
            }
        }

	$entityManager->flush();


Number of records to update Time (s) Memory (mb)
2M 9.931747436523438 1042.758535861969
500k 7.36187744140625 200.1340479850769

Twitter, Facebook