Customer analysis by Country and time period

{2 Comments}

Here is a Sales dataset of 8 columns and 29 rows.  It basically details the revenue earned and cash collected by service type, Customer, Country and Period.  For a selected Country and time period, there could be customers availing of both services or of any 1 service.


There are 2 broad questions that one may want to get answers to:

  1. Determine the number of customers who availed of a certain number of services
  2. Determine customers with whom business was forged for the first time and those who churned out

For a chosen country and Year/Month, the first question stated above further sub-divides into:

  1. How may customers availed of both services - Consultancy and Implementation
  2. How may customers availed of only one of the two services

So if a user selects the Country as India and Year/Month as January 2015, then Customers who availed of both services would be 1,3 and 4.  Note that Customer 2 should not be considered (even though he/she availed of both services) because the revenue earned from one of the services (Implementation) was nil.  For the same selection (India and January 2015), the Customers who availed of only 1 service would be Customer 2 - this customer availed of only the Consultancy service (Revenue was earned from this Customer only for this service).  After applying a filter on the source dataset, the rows for India and January 2015 are:

The expected result is shown below in PowerBI desktop software.  If you are not concerned with who those customers are (you just want the count), then you may simply remove the Customer Name field from the visual.

The second question is to determine the number of new and lost customers.  If a customer was not in the database in any prior month, the customer is identified as new.  To clarify, a customer who availed of the Consultancy service in a prior month also availed of the Implementation service for the first time in the current month would not be counted as a new customer.  If a customer ceases to generate revenue in any month, the customer would be counted as lost (churned) in that month.  So when USA is selected in the Country slicer and Year/Month is February 2015, the expected result is:

I have solved this question with the help of the PowerPivot.  You may download my PowerBI desktop solution file from here and source Excel workbook from here.  This problem can also be solved in MS Excel using the PowerPivot.

Compute Relative Size Factor per vendor

{0 Comments}

Relative size factor (RSF) is a test to identify anomalies where the largest amount for subsets in a given key is outside the norm for those subsets. This test compares the top two amounts for each subset and calculates the RSF for each. In order to identify potential fraudulent activities in invoice payment data, one utilizes the largest and the second-largest amounts to calculate a ratio based on purchases that are grouped by vendors.  You may read more on this topic here.

Here is a 3 column dataset.  The first column is Vendor Number, the second is Invoice number and last is invoice amount.  There can be multiple invoices per vendor.  The objective is to determine the highest invoice value for a vendor and divide that by the second highest invoice value for that same vendor to get a ratio.  The same needs to be done for all vendors.  An interesting case in the dataset below is Vendor_No V4439 - there are 2 instances of highest value for this vendor i.e. 25,378.30 and another 2 instances of second highest value i.e. 24,068.25.  The RSF for this case will be 25,378.30/24,068.25.  If there is no instance of second highest value for a vendor, then the result should be 0.

The expected result is:

I have solved this question with the help of the PowerPivot.  You may download my solution workbook from here.

Analyse free flowing text data or user entered remarks from multiple perspectives

{0 Comments}

Here is a 2 column dataset - UserID in column A and Remarks in Column B.  This dataset basically tabulates the remarks/comments shared by different users.  Entries in the Remarks column are basically free flowing text entries which have the following inconsistencies/nuances:

  1. Users reported multiple errors which are separated by comma, Alt+Enter (same line within the cell) and numbered bullets
  2. Users committed spelling mistakes (see arrows in Table1)
  3. A user ID may be repeated in column A

Given this dataset, one may want to "hunt" for specific "keyword Groups" (column E above) in each user remark cell and get meaningful insights.  Some questions which one would like to have answers to are:

  1. How may users reported each type of keyword Group - "How may users used the Unresponsive keyword?".  See Pivot Table1 below.
  2. Which are the keyword Groups that each user reported - "Which are the different keyword groups reported by UserID A004?".  See Pivot Table2 below.
  3. How many users reported each of the different keyword Groups - "How many users reported all 3 problems of Slow, unresponsiveness and crash".  See Pivot Table 3 below.
  4. How may users who used this keyword group also used this keyword group - "How many users who reported Crash also reported Unresponsive?".  See Pivot Table 4 below.

This was quite a formidable challenge to solve because of spelling mistakes and multiple keywords reported in each cell.  I have solved this problem with the help of Power Query and PowerPivot.  You may download my workbook from here.

Determine the top selling location for each product

{0 Comments}

Visualise a 3 column dataset as shown below - Location, Product and Sales.  Each location can have multiple products (Product A has Banana, Apple and Carrot) and each product can be sold in multiple locations (Banana is sold in locations A, B and F).

The objective is to determine the location with highest sales for each product.  So for Banana, maximum sale value is 25 and location of maximum sales value is B.  Likewise for Orange, maximum sales value is 49 and location of maximum sales value is A.  The expected result is:

I have 4 solutions to this problem:

  1. Advanced Filters - This is a static solution.  For any changes in the source data range, one will have to re-enter the 3 inputs in the Advanced Filter window
  2. Formulas - This is a semi-dynamic solution.  To make it fully dynamic, one will have to write an array formula to first extract all unique product names in a column.  The array formula to extract product names in a column can be obtained from here.
  3. Power Query - This is a dynamic solution.  For any changes in the source data sheet, one just has to go to Data > Refresh All
  4. PowerPivot - This is a dynamic solution.  For any changes in the source data sheet, one just has to go to Data > Refresh All

You may download my solution workbook from here.

Remove duplicates from each cell of a dataset

{0 Comments}

Here's a dataset with 4 columns and 2 rows.  In column A, there is Patient ID and in the other columns are observations recorded by 3 Doctors for each patient.  As can be seen, in each cell there are multiple observations which are either separated by Alt+Enter (another line in the same cell) and/or a comma.  To make things more complicated, there are duplicate text entries appearing in each cell.  In cell B2, Creative appears twice and in cell D2, Fatigue, headaches and stress each appear twice.  Please also note that the entries appear in mixed case in each cell i.e. in cell D2, the first occurrence of stress is in proper case and the second instance is in small case.

The objective is to remove duplicates from each cell (disregarding case sensitivity) and standardize the multiple entries with an Alt+Enter in each cell.  The expected result is shown in the green table below

I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016 and higher versions.  You may download my solution workbook from here.

Flex a Pivot Table to show data for x months ended a certain user defined month

{2 Comments}

In this simple 3 column dataset shown below, one can see the month wise demand and energy charge for 2 years - 2017 and 2018.

The objective is to compute the month wise demand charge for x months ended a certain user defined Year and Month.  So, if a user selects the Year as 2018, Month as June and Duration as 9, then the Pivot Table should show month wise demand charge for the 9 months ended June 2018 i.e. from October 2017 to June 2018.  Likewise, if a user selects Year as 2018, Month as May and Duration as 3, then the Pivot Table show should month wise demand charge for the 3 months ended May 2018 i.e. March 2018 to May 2018.

You may download my solution workbook from here.

Merge 2 work schedules

{0 Comments}

In a certain Organisation, assume that there are two work schedules - one that runs from Friday to Thursday and another that runs from Sunday to Saturday.  In the image below, Work schedule 1 is in range A2:C4 and Work schedule 2 is in range E2:G4.  In each dataset, the start date is the first column i.e. the week starts from there.  So from 28-Dec-2018 to 29-Dec-2018, only workers 1 and 2 would work but from 30-Dec-2018 to 03-Jan-2019, all 4 workers would work.  The objective is to create a combined work schedule as shown in range I2:N7.

The solution is dynamic for new rows that will be added to the source data Tables.  I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016 and higher versions.

You may download my solution workbook from here.

Show Project wise status in a Pivot Table

{0 Comments}

Visualise a simple 6 column Table as shown below - Project Name and the finish date for each of the 5 stages that the projects go through.  Each project goes through 5 stages - Requirement (Req), Development (Dev), UAT, Implement and Warranty.

The objective is to report on the status of each project at the end of each month based on which stage is/was completed in that month.  So, if a given project's requirements are completed in January and development completes some time in March, the one would expect the output of the report to show the project's status in January and February as "Req" and in March as "Dev" respectively.  February should also show "Req" because the next stage was completed only in March (although it may have started in January).  If multiple stages complete in one month, then the report should display only the most recently completed stage.  So, if Project A completed both Requirements and Development stages in January, the report should show only "Dev" as the stage completed in January.

For the data shared above, the expected result is:

You may download my solution workbook from here.

Rearrange a multi heading dataset into a single heading one which is Pivot ready

{0 Comments}

Any well arranged dataset should be "Pivot Table" ready with the following 3 important properties:

  1. There should be no merged and centered cells; and
  2. Every column should have a unique heading; and
  3. Every column should have only 1 heading

Here's one dataset which violates all rules mentioned above.

  1. Headings in row 1 are merged; and
  2. The headings in row 2 are not unique
  3. Every column has headings in row 1 and row 2.

To be Pivot Table friendly, this dataset will have to be restructured into a 4 column one - Week, Equipment, Jobsite and Widgets as shown below:

I have achieved the desired result by using Data > Get & Transform (also known as Power Query in earlier versions of MS Excel).  The solution is dynamic for new rows and columns added to the data on the Input worksheet - one simple has to go to Data > Refresh All.  You may download my solution workbook from here.

In this workbook, there is another example of how one can transform a multi heading dataset into a Pivot Table ready dataset.  The primary difference between this and the previous dataset is that there are 2 descriptive columns to the left (as against only one in the previous example).

Rearrange travel data to clearly show travel from and travel to locations

{0 Comments}

Consider a 2 column dataset as shown below

User Location
AAA Tokyo
AAA Osaka
AAA Nagoya
AAA Hakone
AAA Kyoto
BBB Sapporo
BBB Nara
CCC Tokyo
CCC Hakone
CCC Osaka
DDD Osaka
DDD Tokyo

Customer AAA travelled from Tokyo to Osaka, Osaka to Nagoya, Nagoya to Hakone and Hakone to Kyoto.  All locations appear in a single column.  To analyse customer travel information very clearly, one may want to rearrange the dataset as follows:

User From To
AAA TOKYO OSAKA
AAA OSAKA NAGOYA
AAA NAGOYA HAKONE
AAA HAKONE KYOTO
BBB SAPPORO NARA
CCC TOKYO HAKONE
CCC HAKONE OSAKA
DDD OSAKA TOKYO

I have solved this problem using Power Query. You may download my solution workbook from here.