Incremental Update - Power BI
Updated by Manu da Silva
Creating Parameters
In this task, use the Power Query Editor to create the parameters RangeStart and RangeEnd with default values. Default values apply only when filtering the data to be loaded into the model in Power BI Desktop. The entered values should include only a small amount of the most recent data from your data source. When published to the service, these values are overridden by the incremental update policy.
- In Power BI Desktop, click on Transform Data to open the Power Query Editor.
- Click on Manage Parameters > New Parameter.
- In Manage Parameters > Name, type RangeStart (case-sensitive), in Type, select Date/Time and in Current Value, enter an initial date/time value.
- Create a second parameter called RangeEnd. In Type, select Date/Time and in Current Value, enter an end date/time value.
Now that you have the RangeStart and RangeEnd parameters, filter the data to be loaded into the model based on these parameters.
Before starting the next step, it is necessary to convert the column, in which you want to perform the incremental update, to the Date/Time type. Generally, they will be in text format. To do this, go to the column in Power Query, click on the text symbol on the left in the column > Date/Time.
Filter Data
With the RangeStart and RangeEnd parameters defined, apply a filter based on the conditions in the RangeStart and RangeEnd parameters.
- In the Power Query Editor, select the date column, already converted to Date/Time, you wish to filter and click on the filter icon > Date/Time Filters > Custom Filter.
- In Filter Rows, to specify the first condition, select is after or is after or equal to, select Parameter and select RangeStart.
- To specify the second condition, if you selected is after in the first condition, select is before or equal to, or if you selected is after or equal to in the first condition, select is before for the second condition and select Parameter and select RangeEnd. For example:
In the Power Query Editor, click on Close & Apply. Power Query will load the data based on the filters defined in the RangeStart and RangeEnd parameters and any other filters you have set.
Power Query loads only the data specified between the RangeStart and RangeEnd parameters. Depending on the amount of data in this period, loading should occur quickly. If it seems slow and with many processes, the query is likely not folding.
Defining Policy
After defining the RangeStart and RangeEnd parameters and filtering the data based on these parameters, you set an incremental update policy. The policy is applied only after the model is published to the service and a manual or scheduled update operation is executed.
- In Data View > Fields > open the context menu of the table and click on Incremental Refresh.
- In Incremental Refresh and Real-time Data > Select table, verify or select the table. By default, the Select table drop-down list is standardized to the table selected in the data view.
- Specify the necessary settings:
In Set import and refresh windows > Refresh this table incrementally, click on the slider to Enable. If the slider is disabled, it means that the Power Query expression for the table does not include a filter based on the RangeStart and RangeEnd parameters.
In File data starting from:, specify the historical storage period you want to include in the dataset. All rows with dates in this period will be loaded into the dataset in the service unless other filters apply.
In Incremental data refresh starting from:, specify the update period. All rows with dates in this period will be updated in the dataset whenever a manual or scheduled update operation is executed.
- Specify optional settings:
In Choose optional settings, select Get the latest data in real-time with DirectQuery (Premium only) to include the latest data changes that occurred in the data source after the last update period. This setting causes the incremental update policy to add a DirectQuery partition to the table.
Select Refresh only complete days to update only whole days. If the update operation detects that a day was not completed, the rows from that entire day will not be updated. This option is automatically enabled if you select Get the latest data in real-time with DirectQuery (Premium only).
Select Detect data changes to specify a date/time column used to identify and update only the days when data has changed. A date/time column must exist, usually for audit purposes, in the data source. This should not be the same column used to partition the data with the RangeStart and RangeEnd parameters. The maximum value of this column is evaluated for each of the periods in the incremental range. If it has not changed since the last update, the current period is not updated. For datasets published in Premium capacities, you can also specify a custom query.
Depending on your settings, your policy should look like this:
Review your settings and click Apply to complete the update policy. Source data is not loaded with this step.
Save and Publish to Service
When your RangeStart and RangeEnd parameters, filtering, and update policy settings are completed, save your model and publish it to the service. If your dataset becomes large, make sure to enable the large dataset storage format before invoking the first update in the service.
Update Dataset
In the service, update the dataset. The first update will load new and updated data within the update period, as well as historical data from the entire storage period. Depending on the amount of data, this may take a long time. Subsequent updates, manual or scheduled, are usually much faster because the incremental update policy is applied and only the data from the specified period in the update policy configuration is updated.