Restructure the layout of datasets

{0 Comments}

I'd like to discuss 2 cases of restructuring layouts of datasets:

Case 1

Consider the 7 columns dataset below.  For every Doc Number, there are 6 columns - 3 columns for the name of the employee and another 3 columns for the Percent of each employee.

Doc Number Emp 1 Emp 1 % Emp 2 Emp 2 % Emp 3 Emp 3 %
111111 John 5% Eric 6% Jane 7%
222222 Mary 10% Joe 4% Jill 25%

The objective is to restructure the data layout to a 3 column one  - Doc Number, Employee Name and Employee Percent.  Such a revised data layout would enable filtering, Pivoting, charting etc.  Furthermore, the solution should be dynamic for new columns added.  The expected data layout is:

Doc Number Employee Percent
111111 John 5%
111111 Eric 6%
111111 Jane 7%
222222 Mary 10%
222222 Joe 4%
222222 Jill 25%

I have solved this problem using Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Case 2

Consider the 7 column dataset below (there are actually 13 columns.  For want of space only 7 have been shown).  For every Phone Number, there are up to 12 columns - 6 columns for the date on which the call was placed and the other 6 for the remark jotted down by the service agent based on the interaction with the customer.  The software from where the data is downloaded is capable of downloading only 6 columns per Phone Number.  If there were more than 6 calls placed on the same customer, then multiple rows will appear for that same number.  A case in point being Phone number 561 below - you will see that this number appears multiple times in the table below.

Phone Date 1 Remark 1 Date 2 Remark 2 Date 3 Remark 3
561 12/03/17 Interested 17/03/17 Not Interested 22/03/17 Not Reachable
562 13-03-17 Not Interested 18-03-17 Call Busy 23-03-17 Not Reachable
561 14-03-17 Not Interested 19-03-17 Not Reachable 24-03-17 Call Dropped
564 15-03-17 Call Busy 20-03-17 Not Interested 25-03-17 Call Later
562 16-03-17 Call Busy 21-03-17 Call Busy 26-03-17 Call Busy
561 17-03-17 Interested 22-03-17 Call Later 27-03-17 Call Dropped
563 18-03-17 Call Later 23-03-17 Interested 28-03-17 Call Busy
568 19-03-17 Call Dropped 24-03-17 Call Busy 29-03-17 Call Busy
561 20-03-17 Call Busy 25-03-17 Not Reachable 30-03-17 Not Interested
565 21-03-17 Call Dropped 26-03-17 Call Later 31-03-17 Call Later
562 22-03-17 Interested
574 23-03-17 Not Interested 28-03-17 Call Dropped 02-04-17 Not Interested
563 24-03-17 Interested 29-03-17 Not Reachable 03-04-17 Call Later
572 25-03-17 Call Dropped 30-03-17 Call Busy 04-04-17 Interested
573 26-03-17 Call Dropped 31-03-17 Not Interested 05-04-17 Not Reachable

The objective here is to rearrange the layout so that only one row exists per Phone number.  The result should be dynamic for new Phone numbers added and the same number being repeated in many more rows.  The expected data layout is (for want of space only limited columns have been shown here):

Phone Custom.1 Custom.2 Custom.3 Custom.4 Custom.5 Custom.6
561 3/12/2017 Interested 3/17/2017 Not Interested 3/22/2017 Not Reachable
562 3/13/2017 Not Interested 3/18/2017 Call Busy 3/23/2017 Not Reachable
564 3/15/2017 Call Busy 3/20/2017 Not Interested 3/25/2017 Call Later
563 3/18/2017 Call Later 3/23/2017 Interested 3/28/2017 Call Busy
568 3/19/2017 Call Dropped 3/24/2017 Call Busy 3/29/2017 Call Busy
565 3/21/2017 Call Dropped 3/26/2017 Call Later 3/31/2017 Call Later
574 3/23/2017 Not Interested 3/28/2017 Call Dropped 4/2/2017 Not Interested
572 3/25/2017 Call Dropped 3/30/2017 Call Busy 4/4/2017 Interested
573 3/26/2017 Call Dropped 3/31/2017 Not Interested 4/5/2017 Not Reachable

I have solved this problem using formulas and Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Prepare an invigilation schedule for each teacher by different time periods

{3 Comments}

Imagine a multi column exam invigilation schedule with the following information

  1. S. No.
  2. Name of staff
  3. Designation of staff member
  4. Two columns for each day on which there is an exam - one for Morning and another for Afternoon
  5. A * under each column if that particular staff member has to be an invigilator during that time period
S. No. Name of staff Designation 09/11/2017-Morning 09/11/2017-Afternoon 10/11/2017-Morning 10/11/2017-Afternoon
1 Tom Lecturer * * *
2 Sam Lecturer * *

This dataset stretches into many more columns.  So if the exams last for 15 days, there will be 30 columns.  The objective is to condense the column expanding dataset into a 5 column one - S. No., Name of Staff, Designation, Morning and Evening.  Under the Morning and Evening columns, the different dates have to be separated by commas in that one single cell.  The expected result is:

Name of staff S. No. Designation Morning Afternoon
Tom 1 Lecturer 09/11/2017, 10/11/2017, 11/11/2017, 13/11/2017, 14/11/2017, 16/11/2017, 17/11/2017 10/11/2017, 13/11/2017, 14/11/2017, 15/11/2017, 16/11/2017, 17/11/2017
Sam 2 Lecturer 09/11/2017, 13/11/2017, 15/11/2017, 16/11/2017 09/11/2017

I can think of 3 ways to solve this problem. Here's a brief about each of those methods:

  1. Using Power Query and Excel's functions - Using Power Query, one can first unpivot the data and then use the TEXTJOIN() function.  This would be an array formula.
  2. Using Power Query only - Using Power Query, one can first unpivot the data and then modify the code to concatenate all dates in a single cell for that invigilator.  One can then Pivot the data in Power Query itself
  3. Using Power Query and PowerPivot - Using Power Query, one can first unpivot the data and then use the CONCATENATEX() function of the PowerPivot.

As you can observe, the common thread in all solutions is that one has to first unpivot the dataset.

You may refer to all my 3 solutions in this workbook.

Split total patient hospitalisation days into multiple months

{0 Comments}

Here's a dataset with 3 columns - Patient Name, Date of admission and Duration (days).

Patient Date of admission Duration (day)
A 10-10-2017 25
B 20-10-2017 6
C 23-10-2017 12
D 29-10-2017 9

The objective is to split the hospitalization per patient into different months to determine each month's revenue accrual.  The expected result is

Length of stay Month
Patient October November Grand Total
A 21 4 25
B 6 6
C 8 4 12
D 2 7 9
Grand Total 37 15 52

I have solved this problem with the help of Power Query and PowerPivot.  You may download my workbook here.

Determine the total number of projects by Status

{0 Comments}

Here's a simple 3 column table showing Date, Project name (Cat.) and Status of the project.  Each project can have multiple status entries on different dates.  So as you can observe, project "alpha_9383993" was In Progress on Oct 2, 2017, remained so on October 5, 2017 but was completed on October 6, 2017.

Date Cat. Status
02-Oct-17 alpha_9383993 In Progress
03-Oct-17 Pulse_9387388 In Progress
04-Oct-17 Pulse_9387388 Rework
05-Oct-17 alpha_9383993 In Progress
06-Oct-17 alpha_9383993 Completed
07-Oct-17 Pulse_9387388 Completed
08-Oct-17 Oppo_tes_9383 In Progress
09-Oct-17 Oppo_Max_8977 Rework

The objective is to determine the count of projects by Status as per the most recent status of every project.  So the expected result is:

Row Labels measure 2
Completed 2
In Progress 1
Rework 1

The result for In Progress should be one because there is only one such project - Oppo_tes_9383.  Project alpha_9383993 should not be counted because it was completed on October 6, 2017.  Likewise the result for Rework should be one because there is only one such project - Oppo_Max_8977.  Project Pulse_9387388 should not be counted because it was completed on October 7,2017.

I have solved this problem with the PowerPivot.  You may download my solution workbook from here.

In a Pivot Table, compute highest revenue earned on any day from each customer and the date thereof

{0 Comments}

Here's a simple dataset showing the Date of sale, Customer Name and Sales amount.

Date Customer Name Sales amount
12-03-2017 A 1
12-03-2017 A 2
12-03-2017 A 3
12-03-2017 B 4
12-03-2017 B 5
12-03-2017 B 6
12-03-2017 B 7
12-03-2017 B 8
13-03-2017 A 1
13-03-2017 A 1

The objective is to determine Customer wise:

  1. Highest revenue; and
  2. Date on which that highest revenue was earned

The expected result is

Row Labels Highest revenue Date of highest revenue
A 6 12-Mar-2017
B 30 12-Mar-2017
Grand Total 36

So, for A, revenue earned on March 12, 2017 is 6 and on March 13, 2017 is 2.  Since the higher of the two is 6, that is the result which should appear in the Pivot Table.

I have solved this problem with the PowerPivot.  You may download my solution workbook from here.

In a Pivot Table, show the most frequently appearing text entry by a certain parameter

{0 Comments}

Here's a simple two column dataset

Comment Identifier Intervals
A 3pm-6pm
A 9pm-12pm
S 3pm-6pm
S 3pm-6pm
S 9pm-12pm
A 9pm-12pm
S 9pm-12pm
D 3pm-6pm
A 9pm-12pm
A 9pm-12pm
A 9pm-12pm
A 3pm-6pm
A 3pm-6pm

For identifiers listed in column A, there are time intervals in column B. Note that for a certain identifier, a time interval can appear multiple times. The objective is two-fold:

  1. For each identifier, show the time interval which appears most frequently; and
  2. For each identifier, compute the count of the time interval which appears most frequently

The expected result is:

untitled

As you can observe, the result for S is two time periods - 3pm-6pm and 9pm-12pm.  This is because each of them appears twice.

You may download my solution from here.

Filter a column of a Pivot Table on a certain condition but also show other items from that column

{0 Comments}

The title sounds confusing!!!!.  Please bear with me and read on.  Here's a simple dataset

Client ID Client Name Resource Project ID Billable amount
1 Alpha David 1000 10
1 Alpha Henry 1001 20
1 Alpha Rakesh 1002 30
1 Alpha Alice 1003 40
2 Beta Alice 1000 50
2 Beta Alicia 1002 60
2 Beta Patrick 1003 70
2 Beta Mukesh 1004 80
2 Beta Suresh 1006 90
2 Beta Ajay 1005 100
3 Gamma Rama 1004 110
3 Gamma Sakshi 1006 120
4 Theta Prabhu 1005 130
5 Epsilon Alice 1000 140
5 Epsilon Alicia 1001 150
5 Epsilon Prabhu 1002 160
5 Epsilon Sakshi 1003 170
5 Epsilon Raghav 1008 180
5 Epsilon David 1010 190
5 Epsilon Henry 1012 200

Here's a Pivot Table built from the dataset above.

untitled

The question is "Is there a way to show only those rows of data which have Alice but also show others who worked with Alice".  While the first part of the question can be answered easily by filtering the Resource column on Alice, the second part (italicized for your reference) of the question is the real challenge.  When one filters the Resource column on Alice, the result is as seen below:

untitled1

This view does not show me who else worked with Alice.  The result I am expecting to see is:

untitled2

This problem can be resolved with the help of the Query Editor (Power Query).  The basic idea is to create another column in the original dataset where we create a string of all resources for every row.  So for example, in every row of Client ID1, the sixth column should show David,Henry,Rakesh.Alice and so on.  Once this is done, one can simply take this column to the Report filter section of the Pivot Table (see last image above) and filter on Alice.

You may refer to my solution in this workbook.

Show months with no data which fall within a certain date range of a Pivot Table

{0 Comments}

Here's a simple 4 column dataset

Bacterin Donor# Recovery Agency Date Donor Received DONOR STATUS
B050001 1 09-06-2005 00:00 ACCEPT
B050002 3 09-06-2005 00:00 ACCEPT
B050003 1 09-06-2005 00:00 ACCEPT
B050004 1 09-06-2005 00:00 ACCEPT
B050005 1 09-06-2005 00:00 ACCEPT
B050006 1 09-06-2005 00:00 ACCEPT
B050007 1 09-06-2005 00:00 ACCEPT
B050008 4 09-06-2005 00:00 ACCEPT

The objective is to determine "The number of Bacterin Donor #'s which fall within January 1, 2016 and September 30, 2017 for Agency number 18 split by Donor Status". To answer this question. one can build a Pivot Table by dragging Recovery agency and Date to the Row labels, Donor Status to the column labels and Bacteria Donor # to the value area section. One can then group the dates by years and months and apply a criteria of 18 on the Recovery agency.  One can then apply a Between filter on dates.

untitled

As you can see in the image above August 2016 and May 2017 are missing because there is no data for that period. However one would like to show those 2 months as well. To do so, one can go to Pivot Table Options > Display and click on "Show items with no data on rows". However, as you can see below, that box is greyed out.

untitled1

The only other option left is to go to Field Settings > Layout & Print and check the box for "Show items with no data"

untitled3

On doing so, the problem is that all dates across all months and years show in the Pivot Table despite the Between Date filter staying intact:

untitled4

So the question is, how can one show even those months (within a chosen date range) in which there is no data.  In other words, one has to think of a way to activate the check box of "Show items with no data on rows" available under Pivot Table Options > Display.

untitled5

I have solved this problem with the help of the PowerPivot.  You may download my solution workbook from here.

Distribute projected revenue annually

{0 Comments}

Here is a dataset showing Project wise forecast of open opportunities.

  1. Topic is the Project Name
  2. Est. Close Date is the date by when the opportunity would be closed i.e. the project would be won from that Client
  3. Duration is the time (in months) for which the project would run
  4. Amount is the total amount that would be billed for that project

Clients are invoiced annually only. So in the example below:

  1. Project ABC is for US$1 million with a duration of 24 months and is expected to be closed in Oct. 2017.  We need to model the data to show the billing every 12 months.  So for ABC US$500K would be billed in Oct-2017 and another US$500K in Oct-2018.
  2. Project GEF is for US$2 million with a duration of 18 months and is expected to be closed in Feb. 2018. We need to model the data to show US$1.3 million in Feb-2018 and another US$666K in Feb-2019.  The monthly billing is US$2 million divided by 18 and then multiplied by 12 - this amounts to US$1.3 million.
Topic Est. Close Date Duration (Months) Amount
ABC 01-10-2017 24 1,000,000
GEF 01-02-2018 18 2,000,000
XYZ 01-03-2018 30 1,000,000

The expected result should look like this:

Row Labels Oct-17 Feb-18 Mar-18 Oct-18 Feb-19 Mar-19 Mar-20 Total
ABC 500,000 500,000 1,000,000
GEF 1,333,333 666,667 2,000,000
XYZ 400,000 400,000 200,000 1,000,000
Grand Total 500,000 1,333,333 400,000 500,000 666,667 400,000 200,000 4,000,000

I have solved this problem using Power Query and PowerPivot. You may download my solution workbook from here.

Alter the behaviour of a filter/slicer from OR to AND

{3 Comments}

Here is a simple two column dataset which shows EmpID in column A and DateWorked in column B.  My objective is to filter the table to show those employees who worked ONLY on August 17 and August 18.  I'd like to exclude employees who:

  1. Did not work on both these days; and
  2. Worked on both these dates but also worked on other dates

A simple filter on the DatesWorked column with the criteria of August 17 and August 18, would return 7 employees.  This is clearly wrong because this will show employees who:

  1. Worked on any of the two chosen dates; and
  2. Worked on dates other than the two chosen dates

From Table 1, one can clearly observe that the correct result should be EmpID E.  Only EmpID E worked on August 17 and 18.  While others also worked on both these dates, they worked on other dates as well and should therefore be excluded.

The problem outlined above holds true for selections made via a slicer in a Pivot Table as well.  So this is a generic problem with a filter/slicer - conditions specified therein behave like OR conditions, not like AND conditions.

Base dataset

EmpID DateWorked
A 14-08-2017
A 15-08-2017
A 16-08-2017
A 17-08-2017
A 18-08-2017
B 15-08-2017
B 16-08-2017
B 18-08-2017
C 14-08-2017
C 15-08-2017
C 16-08-2017
C 17-08-2017
C 18-08-2017
D 14-08-2017
D 15-08-2017
D 16-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 16-08-2017
F 17-08-2017
G 18-08-2017

Filtered dataset (after specifying the criteria as 17 Aug and 18 Aug)

EmpID DateWorked
A 17-08-2017
A 18-08-2017
B 18-08-2017
C 17-08-2017
C 18-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 17-08-2017
G 18-08-2017

So the question is "How does one change the behavior of a filter/slicer from OR to AND?". To make things worse, the data is arranged as follows:

EmpID DateWorked
A 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
B 15/8/17;16/8/17;18/8/17
C 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
D 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
E 17/8/17;18/8/17
F 16/8/17;17/8/17
G 18-08-17

As you can see, there are multiple dates in a single cell separated by a ;.

You may refer to my Power Query and PowerPivot solution here.