# Category: POWERPIVOT

Here's a simple Sales data of a retail Store which sells Apple Products.  Since a customer can transact multiple times, there will be repetitions in the Cust ID column.  While Cust ID 123 and 782 purchased multiple products from the same Store in one transaction, Cust ID 53 purchased multiple products from different stores (Store ID 165 and 45) [and therefore the Order ID's are also different (Order ID 2 and 6)].

On this small sample, one may want to identify "Cross and up selling opportunities" i.e. one may want to know which are the Customers that can be approached for selling more products to.  So for e.g. one may want to know which Customers have bought only one product so far.  A case in point being the Apple Watch - Customer ID 2442 and 428 bought only this product.  The other Customers who bought the Apple Watch also bought atleast one more product.  Therefore, Cust ID 2442 and 428 could be approached for buying other products as well.

Solving this via conventional Excel formulas and Pivot Tables would prove to be a formidable challenge.  I have solved this problem using a Data visualisation software from Microsoft called PowerBI desktop (it can be downloaded free from the Microsoft website).  This problem can also be solved in MS Excel using Power Query and Power Pivot.

There are 3 sections in the image below - Table at the top (First Table), slicers at the right and another Table at the bottom (Second Table).

Interpretation of First Table

1. 1 appearing at the intersection of APPLE TV (row labels) and APPLE TV (column labels) represents that there is 1 customer who bought the APPLE TV
1. 1 appeaing at the intersection of APPLE TV (row labels) and MACBOOK AIR (column labels) represents that the 1 customer who bought the APPLE TV also bought the MACBOOK AIR
2. 4 appearing at the intersection of APPLE WATCH (row labels) and APPLE WATCH (column labels) represents that there are 4 customers who bought the APPLE WATCH
1. 1 appearing in other columns of the same row represents other products which those customers bought
2. When one right click's on APPLE WATCH and selects "Drill down", one will be able to see the Customers who bought the other products as well.
1. Customer 53 bought the APPLE WATCH, AIRPORT and IPHONE 8S.  Customer 123 bought APPLE WATCH, IPHONE X and IPOD
2. Customers 2442 and 428 did not buy any other product

Interpretation of Second Table

This table shows a list of Customers (and their transaction details) who bought only and only that one product selected by the user in the filter section (see the red oval selection in the image).  So these two customers could be approached for selling more products to.

Assume a three column dataset with Patient ID, Smoking Status and Review Date

 PatientID SmokingStatus ReviewDate P1 10-03-2018 P1 9 09-03-2018 P1 1 08-03-2018 P1 4 07-03-2018 P2 9 10-03-2018 P2 9 09-03-2018 P2 9 08-03-2018 P2 9 07-03-2018 P3 2 10-03-2018 P3 09-03-2018 P3 9 08-03-2018 P4 9 10-03-2018 P4 1 09-03-2018 P4 4 08-03-2018

The objective is the create another 3 column dataset with the following conditions:

1. If the patient's latest smoking status is other than Blank or 9, then consider that as the smoking status of the patient; and
2. If the patient's latest smoking status is blank or 9, then consider the previous smoking status that is not blank or 9; and
3. If the patient's smoking status is blank or 9 on all dates, then consider the smoking status as 9

The expected result is:

 PatientID Last date when the smoking status was other than 9 or Blank Smoking status on that date P1 08-Mar-18 1 P2 10-Mar-18 9 P3 10-Mar-18 2 P4 09-Mar-18 1

I have solved this question using 3 methods - PowerPivot, Advanced Filters and formulas.  You may download my solution workbook from here.

Visualise a 5 column dataset as show below.  This is a very small sample of the actual dataset.  It shows the date on which supplies were received for each item from Vendors and whether those supplies had errors in them.  Finally those identified errors have been bucketed into relevant categories.  The Item ID# is a code assigned by the Client Organisation.  The format of the ID# is X [Last two digits of year of receipt] [Four digit Number].  For items received from a certain vendor, the last four digits of any item will always be in ascending order of the Date received.  In other words. for items received from a certain vendor, the last 4 digits of an item received on 13 January 2017 will always be greater than the last 4 digits of an item received on 11 January 2017 from the same vendor.  There will never be repetitions in the Item ID# column.

 Vendor Name Item ID# Date Received Error with Item? Error Category California X170016 1-16-2017 No California X170014 1-13-2017 Yes Labeling Error California X170015 1-13-2017 Yes Packaging Error California X170008 1-9-2017 Yes Quality Issue California X170003 1-2-2017 No California X160645 12-26-2016 Yes Packaging Error California X160646 12-26-2016 No California X160644 12-25-2016 Yes Labeling Error California X160638 12-20-2016 Yes Quality Issue California X160633 12-15-2016 No California X160626 12-8-2016 No California X160625 12-7-2016 Yes Packaging Error California X160624 12-5-2016 Yes Labeling Error California X160618 11-23-2016 Yes Quality Issue California X160613 11-13-2016 No California X160606 10-30-2016 No

The objective is to compute the error rate by vendor and Error category for the 10 most recent transaction dates with that specific vendor.  So, for vendor Name "California" and  Error category as "Packing Error", this ratio should be computed as = Number of packing Errors on 10 most recent dates/10.

Here is a simple snapshot of the Data for California.  I have filtered the dataset where Vendor Name is California and then sorted the Date received column in descending order.  Please note that when i filter the dataset on California, a lot more rows are returned.  I am only showing the Top 10 rows here because that is what is important for solving this question.

 Vendor Name Item ID# Date Received Error with Item? Error Category California X170016 1-16-2017 No California X170014 1-13-2017 Yes Labeling Error California X170015 1-13-2017 Yes Packaging Error California X170008 1-9-2017 Yes Quality Issue California X170003 1-2-2017 No California X160645 12-26-2016 Yes Packaging Error California X160646 12-26-2016 No California X160644 12-25-2016 Yes Labeling Error California X160638 12-20-2016 Yes Quality Issue California X160633 12-15-2016 No

The expected result is:

 Row Labels Labeling Error Packaging Error Quality Issue Factory Error Boise 30.00% California 20.00% 20.00% 20.00% Detroit 70.00% 30.00% Ekalaka 20.00% 20.00%

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Here is a simple 3 column dataset showing Categories, Date and Value

 Catagorie Date Value Fish 08-12-2015 6 Crab 05-12-2015 7 Crab 04-12-2015 6 Bird 27-11-2015 4 Snow 25-11-2015 10 Cat 21-11-2015 7 Dog 12-11-2015 5 Dog 28-10-2015 5 Fish 12-10-2015 3 Bird 11-10-2015 9 Dog 22-09-2015 9 Crab 17-08-2015 8 Cat 11-08-2015 1 Fish 20-07-2015 5 Crab 03-07-2015 2 Fish 02-06-2015 8 Fish 26-05-2015 9 Dog 14-05-2015 4 Snow 07-05-2015 7 Bird 03-05-2015 9 Cat 20-04-2015 10 Cat 15-04-2015 2 Snow 13-04-2015 3 Crab 29-03-2015 5 Crab 23-03-2015 2 Bird 14-03-2015 5 Cat 14-03-2015 1 Dog 26-02-2015 9 Fish 16-02-2015 4 Fish 08-02-2015 6 Bird 18-01-2015 1 Snow 06-01-2015 10

The objective is to compute category wise average of values against the 5 most recent dates.  So the expected result is:

 Categorie Category wise average values on 5 most recent dates Bird 5.60 Cat 4.20 Crab 5.60 Dog 6.40 Fish 6.20 Snow 7.50

So for the Category of Fish (dates are sorted in descending order), the average should be 6.2 (average of 6,3,5,8,9)

 Catagorie Date Value Fish 08-12-2015 6 Fish 12-10-2015 3 Fish 20-07-2015 5 Fish 02-06-2015 8 Fish 26-05-2015 9 Fish 16-02-2015 4 Fish 08-02-2015 6

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Imagine a multi column exam invigilation schedule with the following information

1. S. No.
2. Name of staff
3. Designation of staff member
4. Two columns for each day on which there is an exam - one for Morning and another for Afternoon
5. A * under each column if that particular staff member has to be an invigilator during that time period
 S. No. Name of staff Designation 09/11/2017-Morning 09/11/2017-Afternoon 10/11/2017-Morning 10/11/2017-Afternoon 1 Tom Lecturer * * * 2 Sam Lecturer * *

This dataset stretches into many more columns.  So if the exams last for 15 days, there will be 30 columns.  The objective is to condense the column expanding dataset into a 5 column one - S. No., Name of Staff, Designation, Morning and Evening.  Under the Morning and Evening columns, the different dates have to be separated by commas in that one single cell.  The expected result is:

 Name of staff S. No. Designation Morning Afternoon Tom 1 Lecturer 09/11/2017, 10/11/2017, 11/11/2017, 13/11/2017, 14/11/2017, 16/11/2017, 17/11/2017 10/11/2017, 13/11/2017, 14/11/2017, 15/11/2017, 16/11/2017, 17/11/2017 Sam 2 Lecturer 09/11/2017, 13/11/2017, 15/11/2017, 16/11/2017 09/11/2017

I can think of 3 ways to solve this problem. Here's a brief about each of those methods:

1. Using Power Query and Excel's functions - Using Power Query, one can first unpivot the data and then use the TEXTJOIN() function.  This would be an array formula.
2. Using Power Query only - Using Power Query, one can first unpivot the data and then modify the code to concatenate all dates in a single cell for that invigilator.  One can then Pivot the data in Power Query itself
3. Using Power Query and PowerPivot - Using Power Query, one can first unpivot the data and then use the CONCATENATEX() function of the PowerPivot.

As you can observe, the common thread in all solutions is that one has to first unpivot the dataset.

You may refer to all my 3 solutions in this workbook.

Here's a dataset with 3 columns - Patient Name, Date of admission and Duration (days).

 Patient Date of admission Duration (day) A 10-10-2017 25 B 20-10-2017 6 C 23-10-2017 12 D 29-10-2017 9

The objective is to split the hospitalization per patient into different months to determine each month's revenue accrual.  The expected result is

 Length of stay Month Patient October November Grand Total A 21 4 25 B 6 6 C 8 4 12 D 2 7 9 Grand Total 37 15 52

I have solved this problem with the help of Power Query and PowerPivot.  You may download my workbook here.

Here's a simple 3 column table showing Date, Project name (Cat.) and Status of the project.  Each project can have multiple status entries on different dates.  So as you can observe, project "alpha_9383993" was In Progress on Oct 2, 2017, remained so on October 5, 2017 but was completed on October 6, 2017.

 Date Cat. Status 02-Oct-17 alpha_9383993 In Progress 03-Oct-17 Pulse_9387388 In Progress 04-Oct-17 Pulse_9387388 Rework 05-Oct-17 alpha_9383993 In Progress 06-Oct-17 alpha_9383993 Completed 07-Oct-17 Pulse_9387388 Completed 08-Oct-17 Oppo_tes_9383 In Progress 09-Oct-17 Oppo_Max_8977 Rework

The objective is to determine the count of projects by Status as per the most recent status of every project.  So the expected result is:

 Row Labels measure 2 Completed 2 In Progress 1 Rework 1

The result for In Progress should be one because there is only one such project - Oppo_tes_9383.  Project alpha_9383993 should not be counted because it was completed on October 6, 2017.  Likewise the result for Rework should be one because there is only one such project - Oppo_Max_8977.  Project Pulse_9387388 should not be counted because it was completed on October 7,2017.

I have solved this problem with the PowerPivot.  You may download my solution workbook from here.

Here's a simple dataset showing the Date of sale, Customer Name and Sales amount.

 Date Customer Name Sales amount 12-03-2017 A 1 12-03-2017 A 2 12-03-2017 A 3 12-03-2017 B 4 12-03-2017 B 5 12-03-2017 B 6 12-03-2017 B 7 12-03-2017 B 8 13-03-2017 A 1 13-03-2017 A 1

The objective is to determine Customer wise:

1. Highest revenue; and
2. Date on which that highest revenue was earned

The expected result is

 Row Labels Highest revenue Date of highest revenue A 6 12-Mar-2017 B 30 12-Mar-2017 Grand Total 36

So, for A, revenue earned on March 12, 2017 is 6 and on March 13, 2017 is 2.  Since the higher of the two is 6, that is the result which should appear in the Pivot Table.

I have solved this problem with the PowerPivot.  You may download my solution workbook from here.

Here's a simple two column dataset

 Comment Identifier Intervals A 3pm-6pm A 9pm-12pm S 3pm-6pm S 3pm-6pm S 9pm-12pm A 9pm-12pm S 9pm-12pm D 3pm-6pm A 9pm-12pm A 9pm-12pm A 9pm-12pm A 3pm-6pm A 3pm-6pm

For identifiers listed in column A, there are time intervals in column B. Note that for a certain identifier, a time interval can appear multiple times. The objective is two-fold:

1. For each identifier, show the time interval which appears most frequently; and
2. For each identifier, compute the count of the time interval which appears most frequently

The expected result is:

As you can observe, the result for S is two time periods - 3pm-6pm and 9pm-12pm.  This is because each of them appears twice.

Here's a simple 4 column dataset

 Bacterin Donor# Recovery Agency Date Donor Received DONOR STATUS B050001 1 09-06-2005 00:00 ACCEPT B050002 3 09-06-2005 00:00 ACCEPT B050003 1 09-06-2005 00:00 ACCEPT B050004 1 09-06-2005 00:00 ACCEPT B050005 1 09-06-2005 00:00 ACCEPT B050006 1 09-06-2005 00:00 ACCEPT B050007 1 09-06-2005 00:00 ACCEPT B050008 4 09-06-2005 00:00 ACCEPT

The objective is to determine "The number of Bacterin Donor #'s which fall within January 1, 2016 and September 30, 2017 for Agency number 18 split by Donor Status". To answer this question. one can build a Pivot Table by dragging Recovery agency and Date to the Row labels, Donor Status to the column labels and Bacteria Donor # to the value area section. One can then group the dates by years and months and apply a criteria of 18 on the Recovery agency.  One can then apply a Between filter on dates.

As you can see in the image above August 2016 and May 2017 are missing because there is no data for that period. However one would like to show those 2 months as well. To do so, one can go to Pivot Table Options > Display and click on "Show items with no data on rows". However, as you can see below, that box is greyed out.

The only other option left is to go to Field Settings > Layout & Print and check the box for "Show items with no data"

On doing so, the problem is that all dates across all months and years show in the Pivot Table despite the Between Date filter staying intact:

So the question is, how can one show even those months (within a chosen date range) in which there is no data.  In other words, one has to think of a way to activate the check box of "Show items with no data on rows" available under Pivot Table Options > Display.

I have solved this problem with the help of the PowerPivot.  You may download my solution workbook from here.