Compute year wise weighted average on a large dataset

{2 Comments}

Assume a dataset with a Key Performance Indicator (KPI) [appearing in one column] data for years ranging from 1985 to 2010 for 114 countries.  This dataset has 170,000 rows of data and one row below the last row for every country, there is a total of the KPI column.  So, if there are 25 rows for India, then in the 26th row, there will be the total appearing for numbers in the KPI column.  The same is occurring for other countries as well.

There is another dataset (in another worksheet of the same workbook) which has an index value for the same countries and same date range as the first dataset.  The second dataset is relatively smaller (with only 1315 rows) because the index value is not available for all years of each country.

The objective is to determine the year wise (for all years from 1985 to 2010) weighted average of KPI.  An illustration of the weighted average computation has been shown in range F5:H10 of the "Index" worksheet of the workbook link shared below.

Solving this problem using Pivot Table, filters, formulas will slow down processing speed due to sheer size of data.  I have solved this problem using the Power Pivot tool (for Excel 2010 and higher versions).

You may refer to my solution in this workbook.

RSS 2 Comments…

 Share your views
  1. Hi Ashish,
    Could you please help in calculating weighted average in the grand total cell in power pivot. The situation is when we have many entries for each district of a state and then we need to get a weighted average at the state level by assigning the weights for each district.

    • Hi,

      In this workbook, the setup is as follows:

      1. Base data sheet which has Cidade (Municipal) wise Area under cultivation for 3 years and crop yield for the same period
      2. On another worksheet (Municipal weights), there are municipal wise weights
      3. Your expected result (derived by using SUMIF,SUMPRODUCT and AVEERAGEIF functions) is shown on the "Expected results" worksheet

      To get the same result with the PowerPivot, try this solution

      1. Take both "Base data" and "Municipal weights" worksheets to the PowerPivot window. In the PowerPivot window, I renamed the Base data tab to Harvest_data and municipal weights to municipal_weights
      2. Establish a relationship between the Cidade column on the harvest_data tab (Data Table) and Municipal column on the Municipal_weights tab (Related Look table)
      3. In the PowerPivot window, go to Home > Pivot Table > Pivot Table and drag Cidade from the harvest_data table to the Row labels
      4. Go to the PowerPivot button in the Ribbon and click on New Measure
      5. In the Window, select the harvest_table, write this measure and name it as Area in 2012

      =if(HASONEVALUE(harvest_data[Cidade]),SUM(harvest_data[Area_total_12]),SUMX(SUMMARIZE(harvest_data,[Cidade],"Weights",sum(municipal_weights[Area]),"Area",sum(harvest_data[Area_total_12])),[Weights]*[Area])/SUM(municipal_weights[Area]))

      6. In the Window, select the harvest_table, write this measure and name it as Area in 2013

      =if(HASONEVALUE(harvest_data[Cidade]),SUM(harvest_data[Area_total_13]),SUMX(SUMMARIZE(harvest_data,[Cidade],"Weights",sum(municipal_weights[Area]),"Area",sum(harvest_data[Area_total_13])),[Weights]*[Area])/SUM(municipal_weights[Area]))

      7. In the Window, select the harvest_table, write this measure and name it as Yield in 2012

      =if(HASONEVALUE(harvest_data[Cidade]),average(harvest_data[Yld - 1213]),SUMX(SUMMARIZE(harvest_data,[Cidade],"Weights",sum(municipal_weights[Area]),"Yield",average(harvest_data[Yld - 1213])),[Weights]*[Yield])/SUM(municipal_weights[Area]))

      You will notice that the figures in B104:D104 of the "PowerPivot solution" worksheet tally with the figures in D5:F5 of the "Expected result" worksheet.

      Hope this helps.

Leave a Comment

Your email address will not be published.

*

*