Analyse membership changes from year to year

{1 Comment}

Assume a simple 4 column dataset as shown below.  This data shows which ID had which type of subscription in which year.  So ID A, which started as a "Free" subscriber in 2018 switched to a "Premium" subscriber in 2019 and then churned out in 2020.  Likewise, ID D which started as a "Pro" subscriber in 2018, churned out in 2019 but returned as a "Free" subscriber in 2020.
The objective is to study how subscribers switched from one subscription type to another across year.  So the expected result should look like this


I have solved this question using the PowerPivot.  You may download my MS Excel workbook from here.

Segregating data appearing in a single column into multiple columns where there is blank row between records

{0 Comments}

Here is a sample dataset downloaded from an external application.  As you can observe, there are anomalies in the dataset:

  1. All the data appears in a single column when Infact it should be segregated into different columns; and
  2. There is no separating one record from another.  All we know is that each record ends with a number.  So:
    1. Record1 starts at row 2 and ends at row 5
    2. Record2 starts at row 6 and ends at row 9
    3. Record3 starts at 14 and ends at row 17
  3. Not all records have 4 rows.  As an exception, the highlighted range below has 5 rows (row 82 to 86).  The text entry "Part paid" appears in row 85.  In no other record is there the payment status before the number.


The objective is to clean the dataset and transform it into a 5 column one as shown below:
I have solved this question using the Query Editor (available in both MS Excel and PowerBI Desktop).  You may download my solution workbook from here.

Show text entries in the value area section of a Pivot Table after meeting certain conditions

{0 Comments}

In the value area section of a normal Pivot Table one can only show the result of aggregation functions such as SUM(), COUNT(), AVERAGE() etc.  Even if one drags a text field to the value area section of a Pivot Table, one cannot show those text fields because they automatically get counted.

Consider the following dataset.  The important columns to consider here are COD (Column C), Level (Column E) and Date (column G).


For a COD, there can be a number of rows (COD 31512268 has 3 rows).  For this COD, there is just one level (E) for the same date/time.
It is also possible that for a particular COD, there can be different Levels (COD 31512259 has 4 rows).  For this COD, there are 2 levels (E and D) for the same data/time.

To further complicate the issue, there can be some cases where for the same date/time, a COD may have different levels.  COD 11058698 has 2 different levels (K and M) for the same date/time.
The expected result is to show a Pivot Table with COD's in the row labels and the Level(s) as on the farthest date/time of each COD.  If a particular COD has 2 levels as on the farthest date/time, then they should be shown in the value area section of the Pivot Table (separated by commas).  So the expected result should look like this.  Notice that COD 11058698 has 2 levels as on the farthest date/time (K and M) and COD 11058700 has 3 levels as on the farthest date/time (Blank, M and 1M).
I have solved this question in MS Excel and PowerBI Desktop with the help of the DAX formulas.  You may download my Excel solution workbook from here and PowerBI Desktop file from here.

Count tasks by status

{0 Comments}

Assume a simple 3 column dataset as shown below - the date of each task and the status of that task.
The objective is to get the status wise count of tasks by the last time stamp.  So for the Status "To-do", the count should be 2 - Task ABC and DEF.  Only these two tasks on their last time stamp have the status as "To-do".  Tasks CED and ADR should not be counted because their last time stamp had a status other than "To-do".  So the final expected result in MS Excel is:

Since the original data is being fetched from an external data source, no additional tables or columns can be created from/in the source data table.

The final result in PBI Desktop is this
You may download my PowerPivot solution workbook from here and PBI Desktop solution file from here.

Segment towns according to volume contribution and market share with a slicer

{0 Comments}

This post is an extension to the one I posted here - Segment towns according to volume contribution and market share. Here's a simple dataset of Shampoo sales in the state of Rajasthan, India.

For a chosen segment, one may want to segment the 4 towns based on the following conditions:
Based on the two screenshots shared above, the desired result is shown in the screenshot below:
The difference between this solution at the previous one (the link of which I have shared above) is that in this one we want to drag the Classification (range E16:E17) to either the row/column/report filter section of the Pivot Table use it as a slicer.  The current limitation with measures that one writes in PowerPivot's is that measures cannot be used in either row/column/report filter section or as a slicer of/in a Pivot Table.  So in the previous solution, I had written a measure to return the result as Headroom, Stronghold, Emerging or small in only the value area section of the Pivot Table.  One could not drag that measure into the row labels of a Pivot Table.  In this solution, one can drag the Town classification to the row/column/report filter section or even to the slicer (see images below)
You may download my solution workbook from here.

Segment towns according to volume contribution and market share

{0 Comments}

Here's a simple dataset of Shampoo sales in the state of Rajasthan, India.
For a chosen segment, one may want to segment the 4 towns based on the following conditions:
Based on the two screenshots shared above, the desired result is shown in the screenshot below:
The desired result is shown in range E16:E19 and the explanation of the classification is shown in range F16:F19.

The final result obtained by using the PowerPivot is shown in the screenshot below:
You may download my solution workbook from here.

Tabulating data from multiple unstructured Excel files

{0 Comments}

Many a times data downloaded from Applications/ERP's are not in a filter/Pivot ready format.  In such cases, a lot of time has to first be invested in getting that data in proper order before even beginning to analyse that data.  What makes this situation worse is that data is downloaded every month in that unstructured manner and the effort which was spent on "cleaning up" the data in the previous month has to be replicated this month as well.

Here is one such dataset.  Some easy to spot problems here are:

  1. All data is in a single column instead of being in multiple columns
  2. Data which should ideally appear in a single row is appearing in multiple rows.  See data in rows 26 to 32.  Information from rows 27 to 32 should appear in row 26
  3. Rows 1 to 6 are not required at all

Data should appear in the following format to make it "filter/Pivot ready".  Furthermore, when multiple such files (as the one shown above) are placed in a certain folder, one should simply be able to open the output file (shown below) and go to Data > Refresh All.

I  have been able to solve this problem using Data > Get & Transform.  Following these steps to get my solution up and working on your system:

  1. Download the input file from here
  2. Download the output file from here
  3. Create a folder on your desktop and place the input file in this folder.  Store as many input files in this folder.  Change data in those input files, if you so wish.
  4. Save the output file anywhere outside the folder that you created in step 3 above
  5. Ensure that all input workbooks are closed
  6. Open the output file and go to Data > Queries and Connections.  Right click on the Result Query > Edit
  7. Double click on Source and select the folder that you created in 3 above
  8. Click on Close and Load
  9. Anytime there is any change in the source folder files, just go to the Output file and click on Data > Refresh All

Calculate rolling sum for the past week by ignoring blank cells

{0 Comments}

Assume a simple dataset as shown in the image below (the input data is in columns A and B only.  The desired outcome is in columns C and D).

The objective is to calculate the 7 days rolling sum and average (as shown in columns C and D) ignoring blank cells.  So in cell C8, the rolling sum is the summation of values from range B2:B8.  In cell C9, it is from B3:B9.  However, in cell C10, it will be from range B3:B9 (not from range B4:B10).  Likewise, in cell C11, the rolling sum will be from range B4:B11.  So the range to be considered for calculating the rolling sum has to roll back automatically until it picks up 7 numeric cells - the blanks have to be ignored.  The rolling average is a simple division - Rolling sum/7.

I have solved this question with Excel formulas here.  This time however, I am sharing a solution by using the DAX formula language available in the PowerPivot and PowerBI Desktop.  You may download my PowerBI Desktop file from here.  The same solution can also be obtained in MS Excel using the PowerPivot as well.

Append data from multiple worksheets of multiple workbooks where each worksheet has a different heading

{0 Comments}

In a folder there are multiple workbooks with an unknown number of worksheets in each workbook.  Each worksheet has data for one year and has 13 columns - the first is for the Product and the other 12 are for each month of the year.  So sheet1 of Book1 has Product in column1, 1 Jan 2015 in column2, 1 Feb 2015 in column3 and so on until 1 Dec 2015 in column13.  Likewise, sheet2 of Book1 has Product in column1, 1 Jan 2016 in column2, 1 Feb 2016 in column3 and so on until 1 Dec 2016 in column13.  The worksheet which has data for the current year will have fewer columns since data for all months is not available.  The number of rows on each worksheet vary depending upon the number of products sold in that year (as can be seen in the two images below).

The objective is to append data from all worksheet of all workbooks to create a simple 3 column Table - Date in column1, Product in column2 and Value in column3.

The solution should be dynamic enough to accommodate the following:

  1. More files added to/deleted from the folder
  2. More worksheets added to/deleted from the existing files
  3. Data added by rows/columns to the existing worksheets
  4. Heading changes in the existing worksheets i.e. in sheet1 of the DATA1 workbook (first image above), one may want to change the year appearing in the first row from 2015 to 2013

This question would have been an easy one to solve had the headings on all worksheets of all workbooks been the same such as Month1, Month2, Month3, Month4 etc.  In that scenario, the technique shown in this YouTube vide would have worked very well.  However, since the headers on each worksheet of each workbook are different, this technique yields an unexpected result (shown below) - Please note that I have not used the Table.PromoteHeaders([Data]) command as described in the video because that would take away the row with dates i.e. row1, row5, row11, row16.  If I do not run that step, I get the output as shown in the image below.  The dates appear as part of the Data rather than being identified as headers.

I have solved this question using the Power Query Editor.  You may download my source data from here and solution workbook from here.

  1. Download the source data workbooks and save them in a folder.
  2. Download the solution workbook and save it anywhere else.
  3. Open the solution workbook and go to Data > Queries and Connections.
  4. In the pane that opens up, right click on the Query > Edit.
  5. In the Applied steps window, double click on Source and specify the path where you have saved the source data workbooks (step1 above).
  6. Click on OK > Close and Apply.

Summarise data by most recent status

{0 Comments}

Here's a simple 3 column dataset showing Date, ID and Status - the status of each ID by Date.

So, the narrative for ID A is:

  1. It was "New" on Jan 1
  2. It remained "New" until Jan 14
  3. On Jan 15, the status changed to "Open"
  4. It remained "Open" till Jan 31 and the status changed to "Closed" on Feb 1
  5. It remained "Closed" till March 31 and the status changed to "Stop" on April 1
  6. It has remained in 'Stop" status till Today

Note that for the month of March, there is no record for ID A but the status of it has to be treated as Closed (refer point 5 above).

The objective is to count the number of ID's by status and month.  The expected result is:

Please note that the trick part here is to get the result as 2 in cell D6 (Status closed for March). I have solved this problem using Power Query and PowerPivot.  Since these two Business Intelligence (BI) tools are available in PowerBI desktop (PBI) as well, you may download a folder with both files (the MS Excel workbook and PBI file) from here.