Author: Ashish Mathur

Summarise data by most recent status

Here's a simple 3 column dataset showing Date, ID and Status - the status of each ID by Date.

So, the narrative for ID A is:

  1. It was "New" on Jan 1
  2. It remained "New" until Jan 14
  3. On Jan 15, the status changed to "Open"
  4. It remained "Open" till Jan 31 and the status changed to "Closed" on Feb 1
  5. It remained "Closed" till March 31 and the status changed to "Stop" on April 1
  6. It has remained in "Stop" status till today

Note that there is no record for ID A in March, but its status for that month has to be treated as Closed (refer point 5 above).

The objective is to count the number of IDs by status and month.  The expected result is:

Please note that the tricky part here is to get the result as 2 in cell D6 (Status Closed for March).  I have solved this problem using Power Query and PowerPivot.  Since these two Business Intelligence (BI) tools are available in PowerBI desktop (PBI) as well, you may download a folder with both files (the MS Excel workbook and PBI file) from here.
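The carry-forward logic can also be sketched outside Excel.  The Python snippet below is only an illustration, not the Power Query and PowerPivot solution in the download.  It assumes that the status of an ID in a month is its most recent status on or before the month end.  The year (2019) and all records for ID B are hypothetical; only the records for ID A follow the narrative above.

```python
from collections import Counter
from datetime import date

# (date, id, status) records. ID A follows the narrative above;
# the year and the records for ID B are hypothetical sample data.
records = [
    (date(2019, 1, 1), "A", "New"),
    (date(2019, 1, 15), "A", "Open"),
    (date(2019, 2, 1), "A", "Closed"),
    (date(2019, 4, 1), "A", "Stop"),
    (date(2019, 1, 10), "B", "New"),
    (date(2019, 3, 5), "B", "Closed"),
]


def status_counts_by_month(records, year, months):
    """Count IDs by their most recent status as of each month end."""
    result = {}
    for month in months:
        # First day of the following month: only earlier records count.
        cutoff = date(year + (month == 12), month % 12 + 1, 1)
        latest = {}
        for day, id_, status in sorted(records):
            if day < cutoff:
                latest[id_] = status  # later records overwrite earlier ones
        result[month] = Counter(latest.values())
    return result


counts = status_counts_by_month(records, 2019, range(1, 5))
# ID A has no March record, yet it still counts as "Closed" in March.
print(counts[3]["Closed"])
```

Because each month looks back over every earlier record, a month without any record for an ID simply inherits the last known status.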

Segment customers into dynamic buckets

Consider a 4 column table - Respondent ID, Device ID, App Name and Category.  This dataset shows which apps are installed on which device by which user, and which category each app falls into.  It is a small dataset with only 4 columns and 2,000 rows.

The question on this dataset is - "I would like to segment the total user base by Categories into the following 9 buckets:

  1. Those who only have 1 app installed; and
  2. Those who have 2 apps installed; and
  3. Those who have 3 apps installed; and
  4. Those who have 4 apps installed; and
  5. Those who have 5 apps installed; and
  6. Those who have 6 apps installed; and
  7. Those who have 7 apps installed; and
  8. Those who have 8 - 10 apps installed; and
  9. Those who have 10+ apps installed"

The expected result is a Pivot Table with buckets in the column labels, Categories in the row labels and the number of people in the value area (as shown below).

Here's how one can interpret the Pivot Table shown above:

  1. Cell B50 - There are 75 people who only have 1 "Tool" app installed
  2. Cell J44 - There is just 1 person who has 10+ Photography apps installed.
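The bucketing itself can be sketched in a few lines of Python.  This is only an illustration with hypothetical rows and app names, not the Power Query and PowerPivot solution.  Since the buckets "8 - 10" and "10+" overlap at exactly 10 apps, the sketch assumes that 10 apps belongs to "8 - 10" and that "10+" means more than 10.

```python
from collections import Counter, defaultdict


def bucket(n_apps):
    """Map an app count to a bucket label.

    Assumption: exactly 10 apps falls in "8 - 10", so "10+" means more than 10.
    """
    if n_apps <= 7:
        return str(n_apps)
    return "8 - 10" if n_apps <= 10 else "10+"


# Hypothetical rows: (respondent_id, device_id, app_name, category).
rows = [
    ("R1", "D1", "Torch", "Tool"),
    ("R2", "D2", "Torch", "Tool"),
    ("R2", "D2", "Calculator", "Tool"),
    ("R1", "D1", "Camera360", "Photography"),
]

# Distinct apps per (category, respondent).
apps = defaultdict(set)
for respondent, _device, app, category in rows:
    apps[(category, respondent)].add(app)

# Number of respondents per (category, bucket): the value area of the pivot.
pivot = Counter(
    (category, bucket(len(names))) for (category, _), names in apps.items()
)
print(pivot[("Tool", "1")], pivot[("Tool", "2")])
```

The buckets are "dynamic" in the sense that a respondent can land in a different bucket for each category, because the app count is taken per category rather than overall.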

I have solved this problem using Power Query and PowerPivot.  Since these two Business Intelligence (BI) tools are available in PowerBI desktop (PBI) as well, you may download a folder with both files (the MS Excel workbook and PBI file) from here.

Compute hours spent on projects given resource allocation

In the dataset below, column A has the Employee Name, columns B and C are the assignment start and end dates, column D is the location and columns E to J are the Month-Year columns.  So each row represents data for an employee on a particular project.  The numbers in range E2:J8 represent how much of that employee's time is allocated to the particular project: a value of 1 means that the employee is dedicated solely to that project, 1.4 means that the employee will be spending extra hours on that project and 0.1 indicates that the employee will be working on multiple other projects.

The objective is to create another column (column K in the second screenshot) which will show the number of hours the employee will spend on the project.  The number of hours will be computed as number of working days in a month (treat Saturday and Sunday as weekends) * time allocation to that project (the numbers in range E2:J8) * 8.5 hours per day for an Offshore project and 8 hours per day for other projects.

The raw data sheet looks like this

The expected result is

The figure in cell K3 has been computed as:

  • The number of working days between November 11, 2018 and November 30, 2018 is 15.  So 15 * 1 = 15
  • The number of working days between December 1, 2018 and December 12, 2018 is 8.  So 8 * 0.5 = 4
  • Total effective working days are 15 + 4 = 19
  • Since it is an Offshore project, the hours per day would be 8.5.  Therefore total effective hours: 19 * 8.5 = 161.5
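The arithmetic above can be checked with a short Python sketch.  It is not one of the three workbook solutions.  It assumes the assignment runs from November 11, 2018 to December 12, 2018 with allocations of 1 for November and 0.5 for December, as implied by the bullets; the function and parameter names are made up for illustration.

```python
from datetime import date, timedelta


def working_days(start, end):
    """Count Monday-Friday days from start to end, both inclusive."""
    days = (end - start).days + 1
    return sum((start + timedelta(d)).weekday() < 5 for d in range(days))


def project_hours(start, end, allocation, offshore):
    """allocation maps (year, month) to the share of time on the project."""
    effective_days = 0.0
    for (year, month), share in allocation.items():
        month_start = date(year, month, 1)
        # Last day of the month: first day of the next month minus one day.
        month_end = date(year + month // 12, month % 12 + 1, 1) - timedelta(1)
        # Clip the month to the assignment period.
        lo, hi = max(start, month_start), min(end, month_end)
        if lo <= hi:
            effective_days += working_days(lo, hi) * share
    return effective_days * (8.5 if offshore else 8.0)


hours = project_hours(
    date(2018, 11, 11),
    date(2018, 12, 12),
    {(2018, 11): 1.0, (2018, 12): 0.5},
    offshore=True,
)
print(hours)  # 161.5, matching cell K3
```

Clipping each month to the assignment dates is what keeps the partial months (November from the 11th, December up to the 12th) from being counted in full.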

I have solved this problem using 3 methods:

  1. Excel formulas - Refer worksheet named "Formula output"
  2. Power Query and PowerPivot - Refer worksheet named "Power Pivot output"
  3. Power Query only - Refer worksheet named "Power Query output"

You may download my solution workbook from here.

Customer analysis by Country and time period

Here is a Sales dataset of 8 columns and 29 rows.  It basically details the revenue earned and cash collected by service type, Customer, Country and Period.  For a selected Country and time period, there could be customers availing of both services or of any 1 service.


There are 2 broad questions that one may want to get answers to:

  1. Determine the number of customers who availed of a certain number of services
  2. Determine customers with whom business was forged for the first time and those who churned out

For a chosen country and Year/Month, the first question stated above further sub-divides into:

  1. How many customers availed of both services - Consultancy and Implementation
  2. How many customers availed of only one of the two services

So if a user selects the Country as India and Year/Month as January 2015, then the Customers who availed of both services would be 1, 3 and 4.  Note that Customer 2 should not be considered (even though he/she availed of both services) because the revenue earned from one of the services (Implementation) was nil.  For the same selection (India and January 2015), the only Customer who availed of just 1 service would be Customer 2 - this customer availed of only the Consultancy service (Revenue was earned from this Customer only for this service).  After applying a filter on the source dataset, the rows for India and January 2015 are:

The expected result is shown below in PowerBI desktop software.  If you are not concerned with who those customers are (you just want the count), then you may simply remove the Customer Name field from the visual.
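The rule for the first question (a service counts only if it earned revenue) can be sketched in Python as follows.  The rows and revenue figures are hypothetical and the snippet is not the PowerPivot solution; it only illustrates why Customer 2 falls into the "one service" group.

```python
from collections import defaultdict

# Hypothetical rows: (country, period, customer, service, revenue).
rows = [
    ("India", "2015-01", "Customer 1", "Consultancy", 100),
    ("India", "2015-01", "Customer 1", "Implementation", 50),
    ("India", "2015-01", "Customer 2", "Consultancy", 80),
    ("India", "2015-01", "Customer 2", "Implementation", 0),
    ("USA", "2015-01", "Customer 5", "Consultancy", 70),
]


def split_by_service_count(rows, country, period):
    """Return (customers with both services, customers with exactly one)."""
    services = defaultdict(set)
    for c, p, customer, service, revenue in rows:
        # A service counts only if it actually earned revenue.
        if c == country and p == period and revenue > 0:
            services[customer].add(service)
    both = sorted(k for k, v in services.items() if len(v) == 2)
    one = sorted(k for k, v in services.items() if len(v) == 1)
    return both, one


both, one = split_by_service_count(rows, "India", "2015-01")
print(both, one)
```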

The second question is to determine the number of new and lost customers.  If a customer was not in the database in any prior month, the customer is identified as new.  To clarify, a customer who availed of the Consultancy service in a prior month and availed of the Implementation service for the first time in the current month would not be counted as a new customer.  If a customer ceases to generate revenue in any month, the customer would be counted as lost (churned) in that month.  So when USA is selected in the Country slicer and Year/Month is February 2015, the expected result is:

I have solved this question with the help of PowerPivot.  You may download my PowerBI desktop solution file from here and source Excel workbook from here.  This problem can also be solved in MS Excel using PowerPivot.
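The new and lost customer rules from the second question can be sketched in Python as well.  The snippet assumes the data has already been filtered to one country and reduced to the set of months in which each customer generated revenue; it also treats "in the database" as "generated revenue", which is a simplifying assumption.  The customer names and months are hypothetical.

```python
# Hypothetical data for one country:
# customer -> months ("YYYY-MM") in which revenue was earned.
active = {
    "Customer 6": {"2015-01"},
    "Customer 7": {"2015-01", "2015-02"},
    "Customer 8": {"2015-02"},
}


def new_and_lost(active, month, previous_month):
    """Return (new customers, lost customers) for the given month."""
    # New: the given month is the customer's earliest active month.
    new = sorted(
        c for c, months in active.items()
        if month in months and min(months) == month
    )
    # Lost: active in the previous month but not in the given month.
    lost = sorted(
        c for c, months in active.items()
        if previous_month in months and month not in months
    )
    return new, lost


new, lost = new_and_lost(active, "2015-02", "2015-01")
print(new, lost)
```

Months are kept as "YYYY-MM" strings so that `min` picks the earliest month by plain string comparison.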

Compute Relative Size Factor per vendor

Relative size factor (RSF) is a test to identify anomalies where the largest amount for subsets in a given key is outside the norm for those subsets. This test compares the top two amounts for each subset and calculates the RSF for each. In order to identify potential fraudulent activities in invoice payment data, one utilizes the largest and the second-largest amounts to calculate a ratio based on purchases that are grouped by vendors.  You may read more on this topic here.

Here is a 3 column dataset.  The first column is Vendor Number, the second is Invoice number and last is invoice amount.  There can be multiple invoices per vendor.  The objective is to determine the highest invoice value for a vendor and divide that by the second highest invoice value for that same vendor to get a ratio.  The same needs to be done for all vendors.  An interesting case in the dataset below is Vendor_No V4439 - there are 2 instances of highest value for this vendor i.e. 25,378.30 and another 2 instances of second highest value i.e. 24,068.25.  The RSF for this case will be 25,378.30/24,068.25.  If there is no instance of second highest value for a vendor, then the result should be 0.
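The calculation described above can be sketched in Python as follows.  This is an illustration rather than the PowerPivot solution: it collapses tied amounts by taking distinct values per vendor and returns 0 when there is no second highest value.  The invoice numbers and vendor V0001 are hypothetical; the V4439 amounts are the ones quoted above.

```python
from collections import defaultdict


def relative_size_factors(invoices):
    """invoices: iterable of (vendor, invoice_no, amount) tuples."""
    amounts = defaultdict(set)  # distinct amounts per vendor, so ties collapse
    for vendor, _invoice_no, amount in invoices:
        amounts[vendor].add(amount)
    rsf = {}
    for vendor, values in amounts.items():
        top = sorted(values, reverse=True)[:2]
        # No second highest value means the RSF is reported as 0.
        rsf[vendor] = top[0] / top[1] if len(top) == 2 else 0
    return rsf


invoices = [
    ("V4439", "INV-1", 25378.30), ("V4439", "INV-2", 25378.30),
    ("V4439", "INV-3", 24068.25), ("V4439", "INV-4", 24068.25),
    ("V0001", "INV-5", 500.00),  # hypothetical vendor with a single invoice
]
rsf = relative_size_factors(invoices)
print(round(rsf["V4439"], 4), rsf["V0001"])
```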

The expected result is:

I have solved this question with the help of PowerPivot.  You may download my solution workbook from here.

Analyse free flowing text data or user entered remarks from multiple perspectives

Here is a 2 column dataset - UserID in column A and Remarks in Column B.  This dataset basically tabulates the remarks/comments shared by different users.  Entries in the Remarks column are basically free flowing text entries which have the following inconsistencies/nuances:

  1. Users reported multiple errors which are separated by comma, Alt+Enter (same line within the cell) and numbered bullets
  2. Users committed spelling mistakes (see arrows in Table1)
  3. A user ID may be repeated in column A

Given this dataset, one may want to "hunt" for specific "keyword Groups" (column E above) in each user remark cell and get meaningful insights.  Some questions which one would like to have answers to are:

  1. How many users reported each type of keyword Group - "How many users used the Unresponsive keyword?".  See Pivot Table 1 below.
  2. Which are the keyword Groups that each user reported - "Which are the different keyword groups reported by UserID A004?".  See Pivot Table 2 below.
  3. How many users reported each of the different keyword Groups - "How many users reported all 3 problems of Slow, Unresponsive and Crash?".  See Pivot Table 3 below.
  4. How many users who used one keyword group also used another keyword group - "How many users who reported Crash also reported Unresponsive?".  See Pivot Table 4 below.

This was quite a formidable challenge to solve because of spelling mistakes and multiple keywords reported in each cell.  I have solved this problem with the help of Power Query and PowerPivot.  You may download my workbook from here.
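One way to picture the matching step is the Python sketch below.  It is not the Power Query and PowerPivot approach used in the workbook; it swaps in fuzzy string matching from the standard `difflib` module to tolerate spelling mistakes.  The keyword list, the remarks, the user IDs and the 0.8 similarity cutoff are all hypothetical choices.

```python
import re
from difflib import get_close_matches

# Hypothetical keyword -> keyword group mapping, for illustration only.
KEYWORD_GROUPS = {"slow": "Slow", "crash": "Crash", "unresponsive": "Unresponsive"}


def groups_in_remark(remark, cutoff=0.8):
    """Return the keyword groups mentioned in one free-text remark."""
    found = set()
    # Pulling out alphabetic words ignores commas, line breaks and bullets.
    for word in re.findall(r"[a-z]+", remark.lower()):
        match = get_close_matches(word, list(KEYWORD_GROUPS), n=1, cutoff=cutoff)
        if match:
            found.add(KEYWORD_GROUPS[match[0]])
    return found


# Hypothetical remarks with misspellings, bullets and a repeated user ID.
remarks = [
    ("A001", "1. App is sloow\n2. Unresponsve screen"),
    ("A002", "Crashs on login, slow"),
    ("A001", "crash"),
]

# Union the groups per user, since a user ID may repeat.
by_user = {}
for user, remark in remarks:
    by_user.setdefault(user, set()).update(groups_in_remark(remark))

crash_and_unresponsive = sum(
    {"Crash", "Unresponsive"} <= groups for groups in by_user.values()
)
print(by_user, crash_and_unresponsive)
```

Once each user is reduced to a set of keyword groups, the four questions above become simple set operations (membership, union and subset tests).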

Determine the top selling location for each product

Visualise a 3 column dataset as shown below - Location, Product and Sales.  Each location can have multiple products (Location A has Banana, Apple and Carrot) and each product can be sold in multiple locations (Banana is sold in locations A, B and F).

The objective is to determine the location with highest sales for each product.  So for Banana, maximum sale value is 25 and location of maximum sales value is B.  Likewise for Orange, maximum sales value is 49 and location of maximum sales value is A.  The expected result is:

I have 4 solutions to this problem:

  1. Advanced Filters - This is a static solution.  For any changes in the source data range, one will have to re-enter the 3 inputs in the Advanced Filter window
  2. Formulas - This is a semi-dynamic solution.  To make it fully dynamic, one will have to write an array formula to first extract all unique product names in a column.  The array formula to extract product names in a column can be obtained from here.
  3. Power Query - This is a dynamic solution.  For any changes in the source data sheet, one just has to go to Data > Refresh All
  4. PowerPivot - This is a dynamic solution.  For any changes in the source data sheet, one just has to go to Data > Refresh All
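Apart from the four approaches listed above, the underlying logic can be sketched in Python.  In the sample rows below only the Banana and Orange maxima (25 at location B and 49 at location A) come from the text; every other figure is hypothetical.

```python
# (location, product, sales). Only the Banana and Orange maxima come from
# the text; all other figures are hypothetical.
rows = [
    ("A", "Banana", 10), ("B", "Banana", 25), ("F", "Banana", 7),
    ("A", "Orange", 49), ("C", "Orange", 12),
]


def top_location_per_product(rows):
    """Return product -> (location, sales) for the highest sales value."""
    best = {}
    for location, product, sales in rows:
        # On ties the first location seen is kept (an arbitrary choice).
        if product not in best or sales > best[product][1]:
            best[product] = (location, sales)
    return best


best = top_location_per_product(rows)
print(best["Banana"], best["Orange"])
```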

You may download my solution workbook from here.

Remove duplicates from each cell of a dataset

Here's a dataset with 4 columns and 2 rows.  In column A, there is Patient ID and in the other columns are observations recorded by 3 Doctors for each patient.  As can be seen, in each cell there are multiple observations which are separated by Alt+Enter (another line in the same cell) and/or a comma.  To make things more complicated, there are duplicate text entries appearing in each cell.  In cell B2, Creative appears twice and in cell D2, Fatigue, headaches and stress each appear twice.  Please also note that the entries appear in mixed case in each cell i.e. in cell D2, the first occurrence of stress is in proper case and the second instance is in lower case.

The objective is to remove duplicates from each cell (disregarding case sensitivity) and standardize the multiple entries with an Alt+Enter in each cell.  The expected result is shown in the green table below

I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016 and higher versions.  You may download my solution workbook from here.
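For readers who want to see the logic outside Power Query, here is a minimal Python sketch of the same idea: split each cell on commas and line breaks, drop case-insensitive duplicates while keeping the first occurrence, and join the result with line breaks.  The sample cell is hypothetical and only loosely modelled on cell D2.

```python
import re


def dedupe_cell(cell):
    """Remove case-insensitive duplicates, keeping first occurrences."""
    seen = set()
    unique = []
    for item in re.split(r"[,\n]", cell):
        item = item.strip()
        key = item.casefold()  # compare entries without regard to case
        if item and key not in seen:
            seen.add(key)
            unique.append(item)
    # A line break plays the role of Alt+Enter inside the cell.
    return "\n".join(unique)


cell = "Fatigue, Headaches\nStress, fatigue\nheadaches, stress"
print(dedupe_cell(cell))
```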

Flex a Pivot Table to show data for x months ended a certain user defined month

In this simple 3 column dataset shown below, one can see the month wise demand and energy charge for 2 years - 2017 and 2018.

The objective is to compute the month wise demand charge for x months ended a certain user defined Year and Month.  So, if a user selects the Year as 2018, Month as June and Duration as 9, then the Pivot Table should show month wise demand charge for the 9 months ended June 2018 i.e. from October 2017 to June 2018.  Likewise, if a user selects Year as 2018, Month as May and Duration as 3, then the Pivot Table should show month wise demand charge for the 3 months ended May 2018 i.e. March 2018 to May 2018.
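The window of months implied by the three selections can be sketched in Python as below.  This is only an illustration of the date arithmetic, not the workbook solution, and the demand charge figures are hypothetical.

```python
def months_ended(year, month, duration):
    """Return the (year, month) pairs for the `duration` months ending at year/month."""
    end = year * 12 + (month - 1)  # zero-based running month index
    window = range(end - duration + 1, end + 1)
    return [(i // 12, i % 12 + 1) for i in window]


window = months_ended(2018, 6, 9)
print(window[0], window[-1])  # (2017, 10) (2018, 6)

# Hypothetical (year, month) -> demand charge figures, filtered to the window.
demand = {(2017, 9): 120.0, (2017, 10): 135.0, (2018, 6): 150.0, (2018, 7): 160.0}
selected = {k: v for k, v in demand.items() if k in window}
print(selected)
```

Converting each year and month to a single running month index avoids special handling when the window crosses a year boundary.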

You may download my solution workbook from here.

Merge 2 work schedules

In a certain Organisation, assume that there are two work schedules - one that runs from Friday to Thursday and another that runs from Sunday to Saturday.  In the image below, Work schedule 1 is in range A2:C4 and Work schedule 2 is in range E2:G4.  In each dataset, the start date is the first column i.e. the week starts from there.  So from 28-Dec-2018 to 29-Dec-2018, only workers 1 and 2 would work but from 30-Dec-2018 to 03-Jan-2019, all 4 workers would work.  The objective is to create a combined work schedule as shown in range I2:N7.
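The merging idea can be sketched in Python as follows.  This is not the Power Query solution from the workbook.  It assumes each schedule row holds a week start date plus the workers for that week, and that a row stays in force until the next row of the same schedule starts.  The first row of each schedule follows the text above; the later rows and their worker names are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Each schedule: list of (week start date, workers), sorted by start date.
# The first rows follow the text; the later rows are hypothetical.
schedule_1 = [(date(2018, 12, 28), ["Worker 1", "Worker 2"]),
              (date(2019, 1, 4), ["Worker 5", "Worker 6"])]
schedule_2 = [(date(2018, 12, 30), ["Worker 3", "Worker 4"]),
              (date(2019, 1, 6), ["Worker 7", "Worker 8"])]


def active(schedule, day):
    """Workers of the latest row that starts on or before `day`."""
    starts = [start for start, _ in schedule]
    i = bisect_right(starts, day)
    return schedule[i - 1][1] if i else []


def merge(*schedules):
    """One row per distinct start date, listing everyone working from then on."""
    boundaries = sorted({start for s in schedules for start, _ in s})
    return [
        (day, [w for s in schedules for w in active(s, day)])
        for day in boundaries
    ]


merged = merge(schedule_1, schedule_2)
for start, workers in merged:
    print(start, workers)
```

Each merged row runs until the day before the next row's start date, which is how the 28-Dec-2018 to 29-Dec-2018 period ends up with only workers 1 and 2.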

The solution is dynamic for new rows that will be added to the source data Tables.  I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016 and higher versions.

You may download my solution workbook from here.