Tags: SUMMARIZE

Analyse membership changes from year to year

{2 Comments}

Assume a simple 4 column dataset as shown below.  This data shows which ID had which type of subscription in which year.  So ID A, which started as a "Free" subscriber in 2018 switched to a "Premium" subscriber in 2019 and then churned out in 2020.  Likewise, ID D which started as a "Pro" subscriber in 2018, churned out in 2019 but returned as a "Free" subscriber in 2020.
The objective is to study how subscribers switched from one subscription type to another across year.  So the expected result should look like this


I have solved this question using the PowerPivot.  You may download my MS Excel workbook from here.

Count tasks by status

{0 Comments}

Assume a simple 3 column dataset as shown below - the date of each task and the status of that task.
The objective is to get the status wise count of tasks by the last time stamp.  So for the Status "To-do", the count should be 2 - Task ABC and DEF.  Only these two tasks on their last time stamp have the status as "To-do".  Tasks CED and ADR should not be counted because their last time stamp had a status other than "To-do".  So the final expected result in MS Excel is:

Since the original data is being fetched from an external data source, no additional tables or columns can be created from/in the source data table.

The final result in PBI Desktop is this
You may download my PowerPivot solution workbook from here and PBI Desktop solution file from here.

Customer analysis by Country and time period

{2 Comments}

Here is a Sales dataset of 8 columns and 29 rows.  It basically details the revenue earned and cash collected by service type, Customer, Country and Period.  For a selected Country and time period, there could be customers availing of both services or of any 1 service.


There are 2 broad questions that one may want to get answers to:

  1. Determine the number of customers who availed of a certain number of services
  2. Determine customers with whom business was forged for the first time and those who churned out

For a chosen country and Year/Month, the first question stated above further sub-divides into:

  1. How may customers availed of both services - Consultancy and Implementation
  2. How may customers availed of only one of the two services

So if a user selects the Country as India and Year/Month as January 2015, then Customers who availed of both services would be 1,3 and 4.  Note that Customer 2 should not be considered (even though he/she availed of both services) because the revenue earned from one of the services (Implementation) was nil.  For the same selection (India and January 2015), the Customers who availed of only 1 service would be Customer 2 - this customer availed of only the Consultancy service (Revenue was earned from this Customer only for this service).  After applying a filter on the source dataset, the rows for India and January 2015 are:

The expected result is shown below in PowerBI desktop software.  If you are not concerned with who those customers are (you just want the count), then you may simply remove the Customer Name field from the visual.

The second question is to determine the number of new and lost customers.  If a customer was not in the database in any prior month, the customer is identified as new.  To clarify, a customer who availed of the Consultancy service in a prior month also availed of the Implementation service for the first time in the current month would not be counted as a new customer.  If a customer ceases to generate revenue in any month, the customer would be counted as lost (churned) in that month.  So when USA is selected in the Country slicer and Year/Month is February 2015, the expected result is:

I have solved this question with the help of the PowerPivot.  You may download my PowerBI desktop solution file from here and source Excel workbook from here.  This problem can also be solved in MS Excel using the PowerPivot.

Compute Relative Size Factor per vendor

{2 Comments}

Relative size factor (RSF) is a test to identify anomalies where the largest amount for subsets in a given key is outside the norm for those subsets. This test compares the top two amounts for each subset and calculates the RSF for each. In order to identify potential fraudulent activities in invoice payment data, one utilizes the largest and the second-largest amounts to calculate a ratio based on purchases that are grouped by vendors.  You may read more on this topic here.

Here is a 3 column dataset.  The first column is Vendor Number, the second is Invoice number and last is invoice amount.  There can be multiple invoices per vendor.  The objective is to determine the highest invoice value for a vendor and divide that by the second highest invoice value for that same vendor to get a ratio.  The same needs to be done for all vendors.  An interesting case in the dataset below is Vendor_No V4439 - there are 2 instances of highest value for this vendor i.e. 25,378.30 and another 2 instances of second highest value i.e. 24,068.25.  The RSF for this case will be 25,378.30/24,068.25.  If there is no instance of second highest value for a vendor, then the result should be 0.

The expected result is:

I have solved this question with the help of the PowerPivot.  You may download my solution workbook from here.

Analyse free flowing text data or user entered remarks from multiple perspectives

{0 Comments}

Here is a 2 column dataset - UserID in column A and Remarks in Column B.  This dataset basically tabulates the remarks/comments shared by different users.  Entries in the Remarks column are basically free flowing text entries which have the following inconsistencies/nuances:

  1. Users reported multiple errors which are separated by comma, Alt+Enter (same line within the cell) and numbered bullets
  2. Users committed spelling mistakes (see arrows in Table1)
  3. A user ID may be repeated in column A

Given this dataset, one may want to "hunt" for specific "keyword Groups" (column E above) in each user remark cell and get meaningful insights.  Some questions which one would like to have answers to are:

  1. How may users reported each type of keyword Group - "How may users used the Unresponsive keyword?".  See Pivot Table1 below.
  2. Which are the keyword Groups that each user reported - "Which are the different keyword groups reported by UserID A004?".  See Pivot Table2 below.
  3. How many users reported each of the different keyword Groups - "How many users reported all 3 problems of Slow, unresponsiveness and crash".  See Pivot Table 3 below.
  4. How may users who used this keyword group also used this keyword group - "How many users who reported Crash also reported Unresponsive?".  See Pivot Table 4 below.

This was quite a formidable challenge to solve because of spelling mistakes and multiple keywords reported in each cell.  I have solved this problem with the help of Power Query and PowerPivot.  You may download my workbook from here.

Show Project wise status in a Pivot Table

{0 Comments}

Visualise a simple 6 column Table as shown below - Project Name and the finish date for each of the 5 stages that the projects go through.  Each project goes through 5 stages - Requirement (Req), Development (Dev), UAT, Implement and Warranty.

The objective is to report on the status of each project at the end of each month based on which stage is/was completed in that month.  So, if a given project's requirements are completed in January and development completes some time in March, the one would expect the output of the report to show the project's status in January and February as "Req" and in March as "Dev" respectively.  February should also show "Req" because the next stage was completed only in March (although it may have started in January).  If multiple stages complete in one month, then the report should display only the most recently completed stage.  So, if Project A completed both Requirements and Development stages in January, the report should show only "Dev" as the stage completed in January.

For the data shared above, the expected result is:

You may download my solution workbook from here.

Identify Customers that Organisations can upsell or cross sell their products to

{0 Comments}

Here's a simple Sales data of a retail Store which sells Apple Products.  Since a customer can transact multiple times, there will be repetitions in the Cust ID column.  While Cust ID 123 and 782 purchased multiple products from the same Store in one transaction, Cust ID 53 purchased multiple products from different stores (Store ID 165 and 45) [and therefore the Order ID's are also different (Order ID 2 and 6)].

On this small sample, one may want to identify "Cross and up selling opportunities" i.e. one may want to know which are the Customers that can be approached for selling more products to.  So for e.g. one may want to know which Customers have bought only one product so far.  A case in point being the Apple Watch - Customer ID 2442 and 428 bought only this product.  The other Customers who bought the Apple Watch also bought atleast one more product.  Therefore, Cust ID 2442 and 428 could be approached for buying other products as well.

Solving this via conventional Excel formulas and Pivot Tables would prove to be a formidable challenge.  I have solved this problem using a Data visualisation software from Microsoft called PowerBI desktop (it can be downloaded free from the Microsoft website).  This problem can also be solved in MS Excel using Power Query and Power Pivot.

There are 3 sections in the image below - Table at the top (First Table), slicers at the right and another Table at the bottom (Second Table).

Interpretation of First Table

  1. 1 appearing at the intersection of APPLE TV (row labels) and APPLE TV (column labels) represents that there is 1 customer who bought the APPLE TV
    1. 1 appeaing at the intersection of APPLE TV (row labels) and MACBOOK AIR (column labels) represents that the 1 customer who bought the APPLE TV also bought the MACBOOK AIR
  2. 4 appearing at the intersection of APPLE WATCH (row labels) and APPLE WATCH (column labels) represents that there are 4 customers who bought the APPLE WATCH
    1. 1 appearing in other columns of the same row represents other products which those customers bought
    2. When one right click's on APPLE WATCH and selects "Drill down", one will be able to see the Customers who bought the other products as well.
      1. Customer 53 bought the APPLE WATCH, AIRPORT and IPHONE 8S.  Customer 123 bought APPLE WATCH, IPHONE X and IPOD
      2. Customers 2442 and 428 did not buy any other product

Interpretation of Second Table

This table shows a list of Customers (and their transaction details) who bought only and only that one product selected by the user in the filter section (see the red oval selection in the image).  So these two customers could be approached for selling more products to.

You may download my PBI desktop file from here.

Determine the lowest bidding vendor(s) for each product in a Pivot Table

{0 Comments}

Imagine a dataset like this.  This dataset shows vendors that submitted proposals for supplying various parts to a Company.  There is one column for each of the twelve months.

untitled

Via a simple Pivot Table, one can determine the lowest bidding vendor per product (part) for any chosen month.  However, one may also want to know the names of those vendors for each product (as seen in column G below).  Notice, that Vendor 2 and Vendor 3 submitted the lowest bid for Product 1 and therefore both names should appear in the result.

untitled

I have solved this problem using PowerPivot and Power Query a.k.a. Data > Get & Transform in Excel 2016.  You may download my solution workbook from here.

Show sales only for corresponding months in prior years

{2 Comments}

Refer to this simple Sales dataset

untitled

The objective is to create a simple matrix with months in the row labels, years in the column labels and sales figures in the value area section.  The twist in the question is that for years prior to the current year (2018 in this dataset), sales should only appear till the month for which there is data for the current year.  For e.g., for 2018, data is only till Month 4 and therefore for prior years as well, data should only appear till Month 4.  As and when Sales data gets added below row 17, data for prior years should also go up to that month.

The expected result is

untitled1

You may download my PBI file from here. The same solution can be obtained in Excel as well (using Power Query and PowerPivot).

Compute transaction fee based on a tiered pricing model

{0 Comments}

Consider a simple dataset as shown below:
untitled
For each tier, the tier rate is incrementally applied to the volume within the tier volume range.  Given the following transaction volumes, one may want to compute the transaction fee
untitled1
The expected result is shown below
untitled2

As one can observe, for a transaction value of 400,000, the fee has been computed as 3% on the first 100,000 and 2.5% on the next 300,000. You may download my solution workbook from here. In the file, I have shared 2 solutions - a conventional formula based one and a PowerPivot solution.

I have also solved a similar question here.