Category: POWERPIVOT

Identify Customers that Organisations can upsell or cross sell their products to

{0 Comments}

Here's a simple Sales data of a retail Store which sells Apple Products.  Since a customer can transact multiple times, there will be repetitions in the Cust ID column.  While Cust ID 123 and 782 purchased multiple products from the same Store in one transaction, Cust ID 53 purchased multiple products from different stores (Store ID 165 and 45) [and therefore the Order ID's are also different (Order ID 2 and 6)].

On this small sample, one may want to identify "Cross and up selling opportunities" i.e. one may want to know which are the Customers that can be approached for selling more products to.  So for e.g. one may want to know which Customers have bought only one product so far.  A case in point being the Apple Watch - Customer ID 2442 and 428 bought only this product.  The other Customers who bought the Apple Watch also bought atleast one more product.  Therefore, Cust ID 2442 and 428 could be approached for buying other products as well.

Solving this via conventional Excel formulas and Pivot Tables would prove to be a formidable challenge.  I have solved this problem using a Data visualisation software from Microsoft called PowerBI desktop (it can be downloaded free from the Microsoft website).  This problem can also be solved in MS Excel using Power Query and Power Pivot.

There are 3 sections in the image below - Table at the top (First Table), slicers at the right and another Table at the bottom (Second Table).

Interpretation of First Table

  1. 1 appearing at the intersection of APPLE TV (row labels) and APPLE TV (column labels) represents that there is 1 customer who bought the APPLE TV
    1. 1 appeaing at the intersection of APPLE TV (row labels) and MACBOOK AIR (column labels) represents that the 1 customer who bought the APPLE TV also bought the MACBOOK AIR
  2. 4 appearing at the intersection of APPLE WATCH (row labels) and APPLE WATCH (column labels) represents that there are 4 customers who bought the APPLE WATCH
    1. 1 appearing in other columns of the same row represents other products which those customers bought
    2. When one right click's on APPLE WATCH and selects "Drill down", one will be able to see the Customers who bought the other products as well.
      1. Customer 53 bought the APPLE WATCH, AIRPORT and IPHONE 8S.  Customer 123 bought APPLE WATCH, IPHONE X and IPOD
      2. Customers 2442 and 428 did not buy any other product

Interpretation of Second Table

This table shows a list of Customers (and their transaction details) who bought only and only that one product selected by the user in the filter section (see the red oval selection in the image).  So these two customers could be approached for selling more products to.

You may download my PBI desktop file from here.

Determine the most recent status after satisfying certain conditions

{0 Comments}

Assume a three column dataset with Patient ID, Smoking Status and Review Date

PatientID SmokingStatus ReviewDate
P1 10-03-2018
P1 9 09-03-2018
P1 1 08-03-2018
P1 4 07-03-2018
P2 9 10-03-2018
P2 9 09-03-2018
P2 9 08-03-2018
P2 9 07-03-2018
P3 2 10-03-2018
P3 09-03-2018
P3 9 08-03-2018
P4 9 10-03-2018
P4 1 09-03-2018
P4 4 08-03-2018

The objective is the create another 3 column dataset with the following conditions:

  1. If the patient's latest smoking status is other than Blank or 9, then consider that as the smoking status of the patient; and
  2. If the patient's latest smoking status is blank or 9, then consider the previous smoking status that is not blank or 9; and
  3. If the patient's smoking status is blank or 9 on all dates, then consider the smoking status as 9

The expected result is:

PatientID Last date when the smoking status was other than 9 or Blank Smoking status on that date
P1 08-Mar-18 1
P2 10-Mar-18 9
P3 10-Mar-18 2
P4 09-Mar-18 1

I have solved this question using 3 methods - PowerPivot, Advanced Filters and formulas.  You may download my solution workbook from here.

Perform an aggregation on Top x items after satisfying certain conditions

{0 Comments}

Visualise a 5 column dataset as show below.  This is a very small sample of the actual dataset.  It shows the date on which supplies were received for each item from Vendors and whether those supplies had errors in them.  Finally those identified errors have been bucketed into relevant categories.  The Item ID# is a code assigned by the Client Organisation.  The format of the ID# is X [Last two digits of year of receipt] [Four digit Number].  For items received from a certain vendor, the last four digits of any item will always be in ascending order of the Date received.  In other words. for items received from a certain vendor, the last 4 digits of an item received on 13 January 2017 will always be greater than the last 4 digits of an item received on 11 January 2017 from the same vendor.  There will never be repetitions in the Item ID# column.

Vendor Name Item ID# Date Received Error with Item? Error Category
California X170016 1-16-2017 No
California X170014 1-13-2017 Yes Labeling Error
California X170015 1-13-2017 Yes Packaging Error
California X170008 1-9-2017 Yes Quality Issue
California X170003 1-2-2017 No
California X160645 12-26-2016 Yes Packaging Error
California X160646 12-26-2016 No
California X160644 12-25-2016 Yes Labeling Error
California X160638 12-20-2016 Yes Quality Issue
California X160633 12-15-2016 No
California X160626 12-8-2016 No
California X160625 12-7-2016 Yes Packaging Error
California X160624 12-5-2016 Yes Labeling Error
California X160618 11-23-2016 Yes Quality Issue
California X160613 11-13-2016 No
California X160606 10-30-2016 No

The objective is to compute the error rate by vendor and Error category for the 10 most recent transaction dates with that specific vendor.  So, for vendor Name "California" and  Error category as "Packing Error", this ratio should be computed as = Number of packing Errors on 10 most recent dates/10.

Here is a simple snapshot of the Data for California.  I have filtered the dataset where Vendor Name is California and then sorted the Date received column in descending order.  Please note that when i filter the dataset on California, a lot more rows are returned.  I am only showing the Top 10 rows here because that is what is important for solving this question.

Vendor Name Item ID# Date Received Error with Item? Error Category
California X170016 1-16-2017 No
California X170014 1-13-2017 Yes Labeling Error
California X170015 1-13-2017 Yes Packaging Error
California X170008 1-9-2017 Yes Quality Issue
California X170003 1-2-2017 No
California X160645 12-26-2016 Yes Packaging Error
California X160646 12-26-2016 No
California X160644 12-25-2016 Yes Labeling Error
California X160638 12-20-2016 Yes Quality Issue
California X160633 12-15-2016 No

The expected result is:

Row Labels Labeling Error Packaging Error Quality Issue Factory Error
Boise 30.00%
California 20.00% 20.00% 20.00%
Detroit 70.00% 30.00%
Ekalaka 20.00% 20.00%

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Compute the average of values against the 5 most recent dates of each Category

{0 Comments}

Here is a simple 3 column dataset showing Categories, Date and Value

Catagorie Date Value
Fish 08-12-2015 6
Crab 05-12-2015 7
Crab 04-12-2015 6
Bird 27-11-2015 4
Snow 25-11-2015 10
Cat 21-11-2015 7
Dog 12-11-2015 5
Dog 28-10-2015 5
Fish 12-10-2015 3
Bird 11-10-2015 9
Dog 22-09-2015 9
Crab 17-08-2015 8
Cat 11-08-2015 1
Fish 20-07-2015 5
Crab 03-07-2015 2
Fish 02-06-2015 8
Fish 26-05-2015 9
Dog 14-05-2015 4
Snow 07-05-2015 7
Bird 03-05-2015 9
Cat 20-04-2015 10
Cat 15-04-2015 2
Snow 13-04-2015 3
Crab 29-03-2015 5
Crab 23-03-2015 2
Bird 14-03-2015 5
Cat 14-03-2015 1
Dog 26-02-2015 9
Fish 16-02-2015 4
Fish 08-02-2015 6
Bird 18-01-2015 1
Snow 06-01-2015 10

The objective is to compute category wise average of values against the 5 most recent dates.  So the expected result is:

Categorie Category wise average values on 5 most recent dates
Bird 5.60
Cat 4.20
Crab 5.60
Dog 6.40
Fish 6.20
Snow 7.50

So for the Category of Fish (dates are sorted in descending order), the average should be 6.2 (average of 6,3,5,8,9)

Catagorie Date Value
Fish 08-12-2015 6
Fish 12-10-2015 3
Fish 20-07-2015 5
Fish 02-06-2015 8
Fish 26-05-2015 9
Fish 16-02-2015 4
Fish 08-02-2015 6

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Prepare an invigilation schedule for each teacher by different time periods

{3 Comments}

Imagine a multi column exam invigilation schedule with the following information

  1. S. No.
  2. Name of staff
  3. Designation of staff member
  4. Two columns for each day on which there is an exam - one for Morning and another for Afternoon
  5. A * under each column if that particular staff member has to be an invigilator during that time period
S. No. Name of staff Designation 09/11/2017-Morning 09/11/2017-Afternoon 10/11/2017-Morning 10/11/2017-Afternoon
1 Tom Lecturer * * *
2 Sam Lecturer * *

This dataset stretches into many more columns.  So if the exams last for 15 days, there will be 30 columns.  The objective is to condense the column expanding dataset into a 5 column one - S. No., Name of Staff, Designation, Morning and Evening.  Under the Morning and Evening columns, the different dates have to be separated by commas in that one single cell.  The expected result is:

Name of staff S. No. Designation Morning Afternoon
Tom 1 Lecturer 09/11/2017, 10/11/2017, 11/11/2017, 13/11/2017, 14/11/2017, 16/11/2017, 17/11/2017 10/11/2017, 13/11/2017, 14/11/2017, 15/11/2017, 16/11/2017, 17/11/2017
Sam 2 Lecturer 09/11/2017, 13/11/2017, 15/11/2017, 16/11/2017 09/11/2017

I can think of 3 ways to solve this problem. Here's a brief about each of those methods:

  1. Using Power Query and Excel's functions - Using Power Query, one can first unpivot the data and then use the TEXTJOIN() function.  This would be an array formula.
  2. Using Power Query only - Using Power Query, one can first unpivot the data and then modify the code to concatenate all dates in a single cell for that invigilator.  One can then Pivot the data in Power Query itself
  3. Using Power Query and PowerPivot - Using Power Query, one can first unpivot the data and then use the CONCATENATEX() function of the PowerPivot.

As you can observe, the common thread in all solutions is that one has to first unpivot the dataset.

You may refer to all my 3 solutions in this workbook.

Split total patient hospitalisation days into multiple months

{0 Comments}

Here's a dataset with 3 columns - Patient Name, Date of admission and Duration (days).

Patient Date of admission Duration (day)
A 10-10-2017 25
B 20-10-2017 6
C 23-10-2017 12
D 29-10-2017 9

The objective is to split the hospitalization per patient into different months to determine each month's revenue accrual.  The expected result is

Length of stay Month
Patient October November Grand Total
A 21 4 25
B 6 6
C 8 4 12
D 2 7 9
Grand Total 37 15 52

I have solved this problem with the help of Power Query and PowerPivot.  You may download my workbook here.

Determine the total number of projects by Status

{0 Comments}

Here's a simple 3 column table showing Date, Project name (Cat.) and Status of the project.  Each project can have multiple status entries on different dates.  So as you can observe, project "alpha_9383993" was In Progress on Oct 2, 2017, remained so on October 5, 2017 but was completed on October 6, 2017.

Date Cat. Status
02-Oct-17 alpha_9383993 In Progress
03-Oct-17 Pulse_9387388 In Progress
04-Oct-17 Pulse_9387388 Rework
05-Oct-17 alpha_9383993 In Progress
06-Oct-17 alpha_9383993 Completed
07-Oct-17 Pulse_9387388 Completed
08-Oct-17 Oppo_tes_9383 In Progress
09-Oct-17 Oppo_Max_8977 Rework

The objective is to determine the count of projects by Status as per the most recent status of every project.  So the expected result is:

Row Labels measure 2
Completed 2
In Progress 1
Rework 1

The result for In Progress should be one because there is only one such project - Oppo_tes_9383.  Project alpha_9383993 should not be counted because it was completed on October 6, 2017.  Likewise the result for Rework should be one because there is only one such project - Oppo_Max_8977.  Project Pulse_9387388 should not be counted because it was completed on October 7,2017.

I have solved this problem with the PowerPivot.  You may download my solution workbook from here.

In a Pivot Table, compute highest revenue earned on any day from each customer and the date thereof

{0 Comments}

Here's a simple dataset showing the Date of sale, Customer Name and Sales amount.

Date Customer Name Sales amount
12-03-2017 A 1
12-03-2017 A 2
12-03-2017 A 3
12-03-2017 B 4
12-03-2017 B 5
12-03-2017 B 6
12-03-2017 B 7
12-03-2017 B 8
13-03-2017 A 1
13-03-2017 A 1

The objective is to determine Customer wise:

  1. Highest revenue; and
  2. Date on which that highest revenue was earned

The expected result is

Row Labels Highest revenue Date of highest revenue
A 6 12-Mar-2017
B 30 12-Mar-2017
Grand Total 36

So, for A, revenue earned on March 12, 2017 is 6 and on March 13, 2017 is 2.  Since the higher of the two is 6, that is the result which should appear in the Pivot Table.

I have solved this problem with the PowerPivot.  You may download my solution workbook from here.

In a Pivot Table, show the most frequently appearing text entry by a certain parameter

{0 Comments}

Here's a simple two column dataset

Comment Identifier Intervals
A 3pm-6pm
A 9pm-12pm
S 3pm-6pm
S 3pm-6pm
S 9pm-12pm
A 9pm-12pm
S 9pm-12pm
D 3pm-6pm
A 9pm-12pm
A 9pm-12pm
A 9pm-12pm
A 3pm-6pm
A 3pm-6pm

For identifiers listed in column A, there are time intervals in column B. Note that for a certain identifier, a time interval can appear multiple times. The objective is two-fold:

  1. For each identifier, show the time interval which appears most frequently; and
  2. For each identifier, compute the count of the time interval which appears most frequently

The expected result is:

untitled

As you can observe, the result for S is two time periods - 3pm-6pm and 9pm-12pm.  This is because each of them appears twice.

You may download my solution from here.

Show months with no data which fall within a certain date range of a Pivot Table

{0 Comments}

Here's a simple 4 column dataset

Bacterin Donor# Recovery Agency Date Donor Received DONOR STATUS
B050001 1 09-06-2005 00:00 ACCEPT
B050002 3 09-06-2005 00:00 ACCEPT
B050003 1 09-06-2005 00:00 ACCEPT
B050004 1 09-06-2005 00:00 ACCEPT
B050005 1 09-06-2005 00:00 ACCEPT
B050006 1 09-06-2005 00:00 ACCEPT
B050007 1 09-06-2005 00:00 ACCEPT
B050008 4 09-06-2005 00:00 ACCEPT

The objective is to determine "The number of Bacterin Donor #'s which fall within January 1, 2016 and September 30, 2017 for Agency number 18 split by Donor Status". To answer this question. one can build a Pivot Table by dragging Recovery agency and Date to the Row labels, Donor Status to the column labels and Bacteria Donor # to the value area section. One can then group the dates by years and months and apply a criteria of 18 on the Recovery agency.  One can then apply a Between filter on dates.

untitled

As you can see in the image above August 2016 and May 2017 are missing because there is no data for that period. However one would like to show those 2 months as well. To do so, one can go to Pivot Table Options > Display and click on "Show items with no data on rows". However, as you can see below, that box is greyed out.

untitled1

The only other option left is to go to Field Settings > Layout & Print and check the box for "Show items with no data"

untitled3

On doing so, the problem is that all dates across all months and years show in the Pivot Table despite the Between Date filter staying intact:

untitled4

So the question is, how can one show even those months (within a chosen date range) in which there is no data.  In other words, one has to think of a way to activate the check box of "Show items with no data on rows" available under Pivot Table Options > Display.

untitled5

I have solved this problem with the help of the PowerPivot.  You may download my solution workbook from here.