Show sales only for corresponding months in prior years

{0 Comments}

Refer to this simple Sales dataset

untitled

The objective is to create a simple matrix with months in the row labels, years in the column labels and sales figures in the value area section.  The twist in the question is that for years prior to the current year (2018 in this dataset), sales should only appear till the month for which there is data for the current year.  For e.g., for 2018, data is only till Month 4 and therefore for prior years as well, data should only appear till Month 4.  As and when Sales data gets added below row 17, data for prior years should also go up to that month.

The expected result is

untitled1

You may download my PBI file from here. The same solution can be obtained in Excel as well (using Power Query and PowerPivot).

Filtering on 2 date fields within one Table

{0 Comments}

This table contains a list of all the inspections created and completed within different time periods.

untitled

The objective is to create two Tables from this single table - one showing the Accounts created within the chosen time period and another showing the those that were closed within the same time period.  Here are screenshots of the expected results.

untitled1 untitled2 untitled3 untitled4

You may download my PowerBI desktop solution workbook from here.  The same solution can be obtained in Excel as well (using Power Query and PowerPivot).

Compute transaction fee based on a tiered pricing model

{0 Comments}

Consider a simple dataset as shown below:
untitled
For each tier, the tier rate is incrementally applied to the volume within the tier volume range.  Given the following transaction volumes, one may want to compute the transaction fee
untitled1
The expected result is shown below
untitled2

As one can observe, for a transaction value of 400,000, the fee has been computed as 3% on the first 100,000 and 2.5% on the next 300,000. You may download my solution workbook from here. In the file, I have shared 2 solutions - a conventional formula based one and a PowerPivot solution.

I have also solved a similar question here.

Determine the most recent status after satisfying certain conditions

{0 Comments}

Assume a three column dataset with Patient ID, Smoking Status and Review Date

PatientID SmokingStatus ReviewDate
P1 10-03-2018
P1 9 09-03-2018
P1 1 08-03-2018
P1 4 07-03-2018
P2 9 10-03-2018
P2 9 09-03-2018
P2 9 08-03-2018
P2 9 07-03-2018
P3 2 10-03-2018
P3 09-03-2018
P3 9 08-03-2018
P4 9 10-03-2018
P4 1 09-03-2018
P4 4 08-03-2018

The objective is the create another 3 column dataset with the following conditions:

  1. If the patient's latest smoking status is other than Blank or 9, then consider that as the smoking status of the patient; and
  2. If the patient's latest smoking status is blank or 9, then consider the previous smoking status that is not blank or 9; and
  3. If the patient's smoking status is blank or 9 on all dates, then consider the smoking status as 9

The expected result is:

PatientID Last date when the smoking status was other than 9 or Blank Smoking status on that date
P1 08-Mar-18 1
P2 10-Mar-18 9
P3 10-Mar-18 2
P4 09-Mar-18 1

I have solved this question using 3 methods - PowerPivot, Advanced Filters and formulas.  You may download my solution workbook from here.

Perform an aggregation on Top x items after satisfying certain conditions

{0 Comments}

Visualise a 5 column dataset as show below.  This is a very small sample of the actual dataset.  It shows the date on which supplies were received for each item from Vendors and whether those supplies had errors in them.  Finally those identified errors have been bucketed into relevant categories.  The Item ID# is a code assigned by the Client Organisation.  The format of the ID# is X [Last two digits of year of receipt] [Four digit Number].  For items received from a certain vendor, the last four digits of any item will always be in ascending order of the Date received.  In other words. for items received from a certain vendor, the last 4 digits of an item received on 13 January 2017 will always be greater than the last 4 digits of an item received on 11 January 2017 from the same vendor.  There will never be repetitions in the Item ID# column.

Vendor Name Item ID# Date Received Error with Item? Error Category
California X170016 1-16-2017 No
California X170014 1-13-2017 Yes Labeling Error
California X170015 1-13-2017 Yes Packaging Error
California X170008 1-9-2017 Yes Quality Issue
California X170003 1-2-2017 No
California X160645 12-26-2016 Yes Packaging Error
California X160646 12-26-2016 No
California X160644 12-25-2016 Yes Labeling Error
California X160638 12-20-2016 Yes Quality Issue
California X160633 12-15-2016 No
California X160626 12-8-2016 No
California X160625 12-7-2016 Yes Packaging Error
California X160624 12-5-2016 Yes Labeling Error
California X160618 11-23-2016 Yes Quality Issue
California X160613 11-13-2016 No
California X160606 10-30-2016 No

The objective is to compute the error rate by vendor and Error category for the 10 most recent transaction dates with that specific vendor.  So, for vendor Name "California" and  Error category as "Packing Error", this ratio should be computed as = Number of packing Errors on 10 most recent dates/10.

Here is a simple snapshot of the Data for California.  I have filtered the dataset where Vendor Name is California and then sorted the Date received column in descending order.  Please note that when i filter the dataset on California, a lot more rows are returned.  I am only showing the Top 10 rows here because that is what is important for solving this question.

Vendor Name Item ID# Date Received Error with Item? Error Category
California X170016 1-16-2017 No
California X170014 1-13-2017 Yes Labeling Error
California X170015 1-13-2017 Yes Packaging Error
California X170008 1-9-2017 Yes Quality Issue
California X170003 1-2-2017 No
California X160645 12-26-2016 Yes Packaging Error
California X160646 12-26-2016 No
California X160644 12-25-2016 Yes Labeling Error
California X160638 12-20-2016 Yes Quality Issue
California X160633 12-15-2016 No

The expected result is:

Row Labels Labeling Error Packaging Error Quality Issue Factory Error
Boise 30.00%
California 20.00% 20.00% 20.00%
Detroit 70.00% 30.00%
Ekalaka 20.00% 20.00%

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Compute the average of values against the 5 most recent dates of each Category

{0 Comments}

Here is a simple 3 column dataset showing Categories, Date and Value

Catagorie Date Value
Fish 08-12-2015 6
Crab 05-12-2015 7
Crab 04-12-2015 6
Bird 27-11-2015 4
Snow 25-11-2015 10
Cat 21-11-2015 7
Dog 12-11-2015 5
Dog 28-10-2015 5
Fish 12-10-2015 3
Bird 11-10-2015 9
Dog 22-09-2015 9
Crab 17-08-2015 8
Cat 11-08-2015 1
Fish 20-07-2015 5
Crab 03-07-2015 2
Fish 02-06-2015 8
Fish 26-05-2015 9
Dog 14-05-2015 4
Snow 07-05-2015 7
Bird 03-05-2015 9
Cat 20-04-2015 10
Cat 15-04-2015 2
Snow 13-04-2015 3
Crab 29-03-2015 5
Crab 23-03-2015 2
Bird 14-03-2015 5
Cat 14-03-2015 1
Dog 26-02-2015 9
Fish 16-02-2015 4
Fish 08-02-2015 6
Bird 18-01-2015 1
Snow 06-01-2015 10

The objective is to compute category wise average of values against the 5 most recent dates.  So the expected result is:

Categorie Category wise average values on 5 most recent dates
Bird 5.60
Cat 4.20
Crab 5.60
Dog 6.40
Fish 6.20
Snow 7.50

So for the Category of Fish (dates are sorted in descending order), the average should be 6.2 (average of 6,3,5,8,9)

Catagorie Date Value
Fish 08-12-2015 6
Fish 12-10-2015 3
Fish 20-07-2015 5
Fish 02-06-2015 8
Fish 26-05-2015 9
Fish 16-02-2015 4
Fish 08-02-2015 6

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Combine unique entries from a range of cells after satisfying a condition

{0 Comments}

Here is a simple three column dataset showing Item ID, Catalogue Name and Catalogue Year

Item ID Catalogue Name Catalogue Year
A Spring 2012
A Spring 2016
A Spring 2017
B Fall 2014
B Fall 2014
B Fall 2016

The second and third last rows are clear duplicate rows. The task is to create a fourth column (titled as All Year) in this dataset with concatenated entries from the Catalogue Year column for each Item ID. So the expected result is:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2016
B Fall 2014 2014, 2016
B Fall 2016 2014, 2016

Notice that for Item ID B, the result in every cell of the fourth column is 2014, 2016 and not 2014,2014,2016.  So while joining (concatenating in technical MS Excel terminology) entries from the "Catalogues Year" column in the "All Year" column, only unique entries for that Item ID have to be considered.

Excel 2016 has a TEXTJOIN() function which gets use close to the final result but does not take into account the unique entries only (it takes all entries into account all entries).  On using the TEXTJOIN() array formula in a spare column, here is the result i got:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2014, 2016
B Fall 2014 2014, 2014, 2016
B Fall 2016 2014, 2014, 2016

Notice that for Item ID B, the result is 2014, 2014, 2017 and not 2014, 2017.  So the TEXTJOIN() function does not work.

I have solved this problem using Power Query (Get and Transform in Excel 2016).  You may download my solution workbook from here.

Restructure the layout of datasets

{0 Comments}

I'd like to discuss 2 cases of restructuring layouts of datasets:

Case 1

Consider the 7 columns dataset below.  For every Doc Number, there are 6 columns - 3 columns for the name of the employee and another 3 columns for the Percent of each employee.

Doc Number Emp 1 Emp 1 % Emp 2 Emp 2 % Emp 3 Emp 3 %
111111 John 5% Eric 6% Jane 7%
222222 Mary 10% Joe 4% Jill 25%

The objective is to restructure the data layout to a 3 column one  - Doc Number, Employee Name and Employee Percent.  Such a revised data layout would enable filtering, Pivoting, charting etc.  Furthermore, the solution should be dynamic for new columns added.  The expected data layout is:

Doc Number Employee Percent
111111 John 5%
111111 Eric 6%
111111 Jane 7%
222222 Mary 10%
222222 Joe 4%
222222 Jill 25%

I have solved this problem using Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Case 2

Consider the 7 column dataset below (there are actually 13 columns.  For want of space only 7 have been shown).  For every Phone Number, there are up to 12 columns - 6 columns for the date on which the call was placed and the other 6 for the remark jotted down by the service agent based on the interaction with the customer.  The software from where the data is downloaded is capable of downloading only 6 columns per Phone Number.  If there were more than 6 calls placed on the same customer, then multiple rows will appear for that same number.  A case in point being Phone number 561 below - you will see that this number appears multiple times in the table below.

Phone Date 1 Remark 1 Date 2 Remark 2 Date 3 Remark 3
561 12/03/17 Interested 17/03/17 Not Interested 22/03/17 Not Reachable
562 13-03-17 Not Interested 18-03-17 Call Busy 23-03-17 Not Reachable
561 14-03-17 Not Interested 19-03-17 Not Reachable 24-03-17 Call Dropped
564 15-03-17 Call Busy 20-03-17 Not Interested 25-03-17 Call Later
562 16-03-17 Call Busy 21-03-17 Call Busy 26-03-17 Call Busy
561 17-03-17 Interested 22-03-17 Call Later 27-03-17 Call Dropped
563 18-03-17 Call Later 23-03-17 Interested 28-03-17 Call Busy
568 19-03-17 Call Dropped 24-03-17 Call Busy 29-03-17 Call Busy
561 20-03-17 Call Busy 25-03-17 Not Reachable 30-03-17 Not Interested
565 21-03-17 Call Dropped 26-03-17 Call Later 31-03-17 Call Later
562 22-03-17 Interested
574 23-03-17 Not Interested 28-03-17 Call Dropped 02-04-17 Not Interested
563 24-03-17 Interested 29-03-17 Not Reachable 03-04-17 Call Later
572 25-03-17 Call Dropped 30-03-17 Call Busy 04-04-17 Interested
573 26-03-17 Call Dropped 31-03-17 Not Interested 05-04-17 Not Reachable

The objective here is to rearrange the layout so that only one row exists per Phone number.  The result should be dynamic for new Phone numbers added and the same number being repeated in many more rows.  The expected data layout is (for want of space only limited columns have been shown here):

Phone Custom.1 Custom.2 Custom.3 Custom.4 Custom.5 Custom.6
561 3/12/2017 Interested 3/17/2017 Not Interested 3/22/2017 Not Reachable
562 3/13/2017 Not Interested 3/18/2017 Call Busy 3/23/2017 Not Reachable
564 3/15/2017 Call Busy 3/20/2017 Not Interested 3/25/2017 Call Later
563 3/18/2017 Call Later 3/23/2017 Interested 3/28/2017 Call Busy
568 3/19/2017 Call Dropped 3/24/2017 Call Busy 3/29/2017 Call Busy
565 3/21/2017 Call Dropped 3/26/2017 Call Later 3/31/2017 Call Later
574 3/23/2017 Not Interested 3/28/2017 Call Dropped 4/2/2017 Not Interested
572 3/25/2017 Call Dropped 3/30/2017 Call Busy 4/4/2017 Interested
573 3/26/2017 Call Dropped 3/31/2017 Not Interested 4/5/2017 Not Reachable

I have solved this problem using formulas and Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Prepare an invigilation schedule for each teacher by different time periods

{3 Comments}

Imagine a multi column exam invigilation schedule with the following information

  1. S. No.
  2. Name of staff
  3. Designation of staff member
  4. Two columns for each day on which there is an exam - one for Morning and another for Afternoon
  5. A * under each column if that particular staff member has to be an invigilator during that time period
S. No. Name of staff Designation 09/11/2017-Morning 09/11/2017-Afternoon 10/11/2017-Morning 10/11/2017-Afternoon
1 Tom Lecturer * * *
2 Sam Lecturer * *

This dataset stretches into many more columns.  So if the exams last for 15 days, there will be 30 columns.  The objective is to condense the column expanding dataset into a 5 column one - S. No., Name of Staff, Designation, Morning and Evening.  Under the Morning and Evening columns, the different dates have to be separated by commas in that one single cell.  The expected result is:

Name of staff S. No. Designation Morning Afternoon
Tom 1 Lecturer 09/11/2017, 10/11/2017, 11/11/2017, 13/11/2017, 14/11/2017, 16/11/2017, 17/11/2017 10/11/2017, 13/11/2017, 14/11/2017, 15/11/2017, 16/11/2017, 17/11/2017
Sam 2 Lecturer 09/11/2017, 13/11/2017, 15/11/2017, 16/11/2017 09/11/2017

I can think of 3 ways to solve this problem. Here's a brief about each of those methods:

  1. Using Power Query and Excel's functions - Using Power Query, one can first unpivot the data and then use the TEXTJOIN() function.  This would be an array formula.
  2. Using Power Query only - Using Power Query, one can first unpivot the data and then modify the code to concatenate all dates in a single cell for that invigilator.  One can then Pivot the data in Power Query itself
  3. Using Power Query and PowerPivot - Using Power Query, one can first unpivot the data and then use the CONCATENATEX() function of the PowerPivot.

As you can observe, the common thread in all solutions is that one has to first unpivot the dataset.

You may refer to all my 3 solutions in this workbook.

Split total patient hospitalisation days into multiple months

{0 Comments}

Here's a dataset with 3 columns - Patient Name, Date of admission and Duration (days).

Patient Date of admission Duration (day)
A 10-10-2017 25
B 20-10-2017 6
C 23-10-2017 12
D 29-10-2017 9

The objective is to split the hospitalization per patient into different months to determine each month's revenue accrual.  The expected result is

Length of stay Month
Patient October November Grand Total
A 21 4 25
B 6 6
C 8 4 12
D 2 7 9
Grand Total 37 15 52

I have solved this problem with the help of Power Query and PowerPivot.  You may download my workbook here.