Chunking the Large Data Volume into Smaller Sets

Using a date range to group data into sets can be easy and clean, but the total number of records that fall within a given range can exceed the Force.com query optimizer's selectivity threshold. When that happens, the underlying query must perform a full object scan, often leading to timeouts. (Refer to the Database Query & Search Optimization Cheat Sheet for the selectivity thresholds.) If you are certain that filtering by the date range alone will always be selective, you can use that field to chunk your data. Otherwise, chunking by a unique, sequential ID is more reliable, and you can implement it by following these steps.
  1. Create an auto-number field, or use an existing one. Alternatively, you can use any other number field whose values are unique, as long as they are sequential.
    Note: If you are adding a new auto-number field, make sure to enable the “Generate Auto Number for existing records…” option.
  2. Create a formula field that converts the auto-number field's text value into a numeric value; you cannot use an index with comparison operators such as “<=” (less than or equal to) or “>” (greater than) on the text-based auto-number field. In this example, we’ll name this field “ExportID.” (A sample formula appears below.)
  3. Place a custom index on the formula field by contacting salesforce.com Customer Support.
    Hint: Depending on the characteristics and distribution of your data, adding another selective filter, such as “CreatedDate,” and placing a compound (or two-column) index might help group the data sets in a more meaningful, manageable way.
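As a sketch of step 2, the formula body can be a single VALUE() call that parses the auto-number text into a number. This assumes the auto-number display format contains only digits (e.g., {0000000}), and the field name “ExportID_Auto__c” is only a placeholder for your own auto-number field:
VALUE(ExportID_Auto__c)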
  4. Run pre-queries to determine the boundaries. For example:
  • To find the lowest boundary:
SELECT ExportID__c FROM YourObject__c ORDER BY ExportID__c ASC NULLS LAST LIMIT 1
  • To find the highest boundary:
SELECT ExportID__c FROM YourObject__c ORDER BY ExportID__c DESC NULLS LAST LIMIT 1
Hint: Although you should have no records with the “ExportID__c” field empty (i.e., null), check for them before running your queries so that no records are missed during the extraction.
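One way to check, following the style of the pre-queries above, is a count query; if it returns anything other than zero, investigate those rows before starting the extraction:
SELECT COUNT() FROM YourObject__c WHERE ExportID__c = null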
  5. Add a range filter to your extraction (i.e., a WHERE clause in your SOQL) to limit the number of targeted rows so that it stays below the selectivity threshold. For example:
SELECT Field1__c, Field2__c, [Field3__c …]
FROM YourObject__c
WHERE ExportID__c > 1000000
AND ExportID__c <= 1200000
When submitting subsequent requests to retrieve the data, simply increment the “ExportID__c” range until you have gone through all of your data; a simple driver loop for this process is sketched below. Things get a little more complicated if you mass delete records often. While the approach described above still works, the gaps created by deleting records reduce the number of records included in each range, making it less optimal from a performance perspective. In other words, you could have a “chunk” that returns only a few or even no rows.
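To make the loop concrete, here is a minimal sketch in Python, assuming the third-party simple_salesforce client. The credentials, the chunk size, and the “YourObject__c” and field names are all placeholders, and the retry and error handling a real extraction needs is omitted:

from simple_salesforce import Salesforce

# Connect with your own credentials (placeholders here).
sf = Salesforce(username='user@example.com', password='password',
                security_token='token')

CHUNK_SIZE = 200000  # keep each range below the selectivity threshold

def boundary(direction):
    # Pre-query for the lowest ('ASC') or highest ('DESC') ExportID__c.
    soql = ('SELECT ExportID__c FROM YourObject__c '
            'ORDER BY ExportID__c ' + direction + ' NULLS LAST LIMIT 1')
    return int(sf.query(soql)['records'][0]['ExportID__c'])

low, high = boundary('ASC'), boundary('DESC')

start = low - 1  # the WHERE clause below is exclusive on the lower bound
while start < high:
    end = start + CHUNK_SIZE
    soql = ('SELECT Field1__c, Field2__c FROM YourObject__c '
            'WHERE ExportID__c > %d AND ExportID__c <= %d' % (start, end))
    # A chunk may return few or even no rows if records were mass deleted.
    for record in sf.query_all(soql)['records']:
        pass  # write the record to your extract here
    start = end

The same range filter works unchanged if you submit each chunk as a Bulk API batch instead of a REST query, which is generally the better fit for true large-data-volume extractions.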

Ref: https://developer.salesforce.com/blogs/engineering/2013/06/extracting-large-data-volume-ldv-in-force-com.html
