Category: POWER QUERY

Rearrange travel data to clearly show travel from and travel to locations

Consider a 2 column dataset as shown below

User Location
AAA Tokyo
AAA Osaka
AAA Nagoya
AAA Hakone
AAA Kyoto
BBB Sapporo
BBB Nara
CCC Tokyo
CCC Hakone
CCC Osaka
DDD Osaka
DDD Tokyo

User AAA travelled from Tokyo to Osaka, Osaka to Nagoya, Nagoya to Hakone and Hakone to Kyoto, but all locations appear in a single column.  To analyse each user's travel clearly, one may want to rearrange the dataset as follows:

User From To
AAA TOKYO OSAKA
AAA OSAKA NAGOYA
AAA NAGOYA HAKONE
AAA HAKONE KYOTO
BBB SAPPORO NARA
CCC TOKYO HAKONE
CCC HAKONE OSAKA
DDD OSAKA TOKYO
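
The workbook itself is not reproduced here.  Purely as an illustration of the logic (in Python, not the Power Query steps used in the workbook), each location can be paired with the one that follows it for the same user.  The function name to_legs is hypothetical, and the sketch assumes that each user's rows are contiguous and already in travel order, as in the sample.  It keeps the original letter case rather than upper-casing the locations.

```python
from itertools import groupby

rows = [
    ("AAA", "Tokyo"), ("AAA", "Osaka"), ("AAA", "Nagoya"),
    ("AAA", "Hakone"), ("AAA", "Kyoto"),
    ("BBB", "Sapporo"), ("BBB", "Nara"),
    ("CCC", "Tokyo"), ("CCC", "Hakone"), ("CCC", "Osaka"),
    ("DDD", "Osaka"), ("DDD", "Tokyo"),
]


def to_legs(rows):
    """Turn (user, location) rows into (user, from, to) legs."""
    legs = []
    # groupby only merges adjacent rows, so each user's rows must be contiguous.
    for user, group in groupby(rows, key=lambda r: r[0]):
        stops = [location for _, location in group]
        # Pair every stop with the stop that follows it.
        for origin, destination in zip(stops, stops[1:]):
            legs.append((user, origin, destination))
    return legs


legs = to_legs(rows)
```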

I have solved this problem using Power Query. You may download my solution workbook from here.

Sort, comma separated entries appearing in a cell, in ascending order

Assume a single column dataset as shown below.  As you can observe, there are multiple entries appearing in the same cell, separated by commas.

[Image: single column dataset with comma separated entries in each cell]

The objective is to sort, in ascending order, the entries in each cell.  The expected result is shown below.

[Image: the same dataset with the entries in each cell sorted in ascending order]
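
As a rough sketch of the per-cell logic (in Python, not the Power Query steps from the workbook): split the cell on commas, trim each entry, sort and join the entries back.  The function name sort_entries and the sample cell are made up for illustration.  Entries are sorted as text and case-insensitively, so numeric entries would need a different sort key.

```python
def sort_entries(cell, sep=","):
    """Sort the separated entries of one cell in ascending (text) order."""
    # Trim whitespace around each entry and drop empty pieces.
    parts = [p.strip() for p in cell.split(sep) if p.strip()]
    # casefold makes the comparison case-insensitive.
    return ", ".join(sorted(parts, key=str.casefold))


# Hypothetical cell content, not taken from the workbook.
sorted_cell = sort_entries("pear, apple,Banana")
```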

I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016.  You may download my solution workbook from here.

Search for multiple phrases within a cell and extract all those phrases in another column

Assume a simple two column dataset as shown below.  In column A are some text strings.  In each cell are multiple phrases separated by commas.  In column C are some phrases - one in each cell.  The objective is to search for all phrases in each cell and extract the phrases found in another column of the same row.

[Image: text strings in column A and search phrases in column C]

The expected result is:

[Image: the same dataset with the phrases found in each row listed in another column]
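
The matching step can be sketched as follows (in Python, not the Power Query code from the workbook).  The function name find_phrases and the sample strings are hypothetical.  The sketch assumes a case-insensitive "contains" match, which may differ from the matching rule used in the workbook.

```python
def find_phrases(cell, phrases):
    """Return the phrases that occur in the cell, joined by commas."""
    text = cell.casefold()
    # Keep the phrases in the order of the phrase list.
    found = [p for p in phrases if p.casefold() in text]
    return ", ".join(found)


# Hypothetical data, not taken from the workbook.
phrases = ["green pear", "kiwi", "Red Apple"]
result = find_phrases("Red apple, green pear, ripe mango", phrases)
```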

I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016.  You may download my solution workbook from here.

Combine unique entries from a range of cells after satisfying a condition

Here is a simple three column dataset showing Item ID, Catalogue Name and Catalogue Year

Item ID Catalogue Name Catalogue Year
A Spring 2012
A Spring 2016
A Spring 2017
B Fall 2014
B Fall 2014
B Fall 2016

The fourth and fifth rows (Item ID B, Fall, 2014) are clear duplicates. The task is to create a fourth column (titled All Year) in this dataset with concatenated entries from the Catalogue Year column for each Item ID. So the expected result is:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2016
B Fall 2014 2014, 2016
B Fall 2016 2014, 2016

Notice that for Item ID B, the result in every cell of the fourth column is 2014, 2016 and not 2014, 2014, 2016.  So while joining (concatenating in technical MS Excel terminology) entries from the "Catalogue Year" column in the "All Year" column, only unique entries for that Item ID have to be considered.

Excel 2016 has a TEXTJOIN() function which gets us close to the final result but does not restrict itself to unique entries (it takes all entries into account).  On using the TEXTJOIN() array formula in a spare column, here is the result I got:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2014, 2016
B Fall 2014 2014, 2014, 2016
B Fall 2016 2014, 2014, 2016

Notice that for Item ID B, the result is 2014, 2014, 2016 and not 2014, 2016.  So the TEXTJOIN() function on its own does not give the required result.
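
The missing step is removing duplicates before joining.  Here is a minimal sketch of that idea in Python (an illustration only, not the Power Query code from the workbook).  The function name add_all_year is hypothetical, and years are kept in first-seen order, which happens to be ascending in the sample.

```python
rows = [
    ("A", "Spring", 2012),
    ("A", "Spring", 2016),
    ("A", "Spring", 2017),
    ("B", "Fall", 2014),
    ("B", "Fall", 2014),
    ("B", "Fall", 2016),
]


def add_all_year(rows):
    """Append an 'All Year' value holding the unique years of each Item ID."""
    years_by_item = {}
    for item, _, year in rows:
        years = years_by_item.setdefault(item, [])
        if year not in years:  # skip years already recorded for this item
            years.append(year)
    return [
        (item, name, year, ", ".join(str(y) for y in years_by_item[item]))
        for item, name, year in rows
    ]


result = add_all_year(rows)
```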

I have solved this problem using Power Query (Get and Transform in Excel 2016).  You may download my solution workbook from here.

Restructure the layout of datasets

I'd like to discuss 2 cases of restructuring layouts of datasets:

Case 1

Consider the 7 column dataset below.  For every Doc Number, there are 6 further columns - 3 columns for the names of the employees and another 3 columns for the Percent of each employee.

Doc Number Emp 1 Emp 1 % Emp 2 Emp 2 % Emp 3 Emp 3 %
111111 John 5% Eric 6% Jane 7%
222222 Mary 10% Joe 4% Jill 25%

The objective is to restructure the data layout to a 3 column one - Doc Number, Employee Name and Employee Percent.  Such a revised data layout would enable filtering, pivoting, charting etc.  Furthermore, the solution should be dynamic, i.e. it should keep working when new employee columns are added.  The expected data layout is:

Doc Number Employee Percent
111111 John 5%
111111 Eric 6%
111111 Jane 7%
222222 Mary 10%
222222 Joe 4%
222222 Jill 25%
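
The restructuring amounts to walking across each row in (name, percent) pairs.  A minimal Python sketch of that idea follows (not the Power Query steps from the workbook).  The function name unpivot_pairs is hypothetical.  Because it steps through however many pairs a row contains, it also copes with additional Emp columns.

```python
rows = [
    ["111111", "John", "5%", "Eric", "6%", "Jane", "7%"],
    ["222222", "Mary", "10%", "Joe", "4%", "Jill", "25%"],
]


def unpivot_pairs(rows):
    """Turn one wide row per Doc Number into one row per employee."""
    out = []
    for row in rows:
        doc, rest = row[0], row[1:]
        # Step through the remaining cells two at a time: name, then percent.
        for i in range(0, len(rest) - 1, 2):
            name, percent = rest[i], rest[i + 1]
            if name:  # ignore empty employee slots
                out.append((doc, name, percent))
    return out


long_rows = unpivot_pairs(rows)
```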

I have solved this problem using Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Case 2

Consider the 7 column dataset below (there are actually 13 columns; for want of space only 7 have been shown).  For every Phone Number, there are up to 12 further columns - 6 columns for the dates on which calls were placed and the other 6 for the remarks jotted down by the service agent after each interaction with the customer.  The software from which the data is downloaded can export only 6 calls (date and remark pairs) per row.  If more than 6 calls were placed to the same customer, multiple rows will appear for that number.  A case in point is Phone number 561, which appears multiple times in the table below.

Phone Date 1 Remark 1 Date 2 Remark 2 Date 3 Remark 3
561 12/03/17 Interested 17/03/17 Not Interested 22/03/17 Not Reachable
562 13-03-17 Not Interested 18-03-17 Call Busy 23-03-17 Not Reachable
561 14-03-17 Not Interested 19-03-17 Not Reachable 24-03-17 Call Dropped
564 15-03-17 Call Busy 20-03-17 Not Interested 25-03-17 Call Later
562 16-03-17 Call Busy 21-03-17 Call Busy 26-03-17 Call Busy
561 17-03-17 Interested 22-03-17 Call Later 27-03-17 Call Dropped
563 18-03-17 Call Later 23-03-17 Interested 28-03-17 Call Busy
568 19-03-17 Call Dropped 24-03-17 Call Busy 29-03-17 Call Busy
561 20-03-17 Call Busy 25-03-17 Not Reachable 30-03-17 Not Interested
565 21-03-17 Call Dropped 26-03-17 Call Later 31-03-17 Call Later
562 22-03-17 Interested
574 23-03-17 Not Interested 28-03-17 Call Dropped 02-04-17 Not Interested
563 24-03-17 Interested 29-03-17 Not Reachable 03-04-17 Call Later
572 25-03-17 Call Dropped 30-03-17 Call Busy 04-04-17 Interested
573 26-03-17 Call Dropped 31-03-17 Not Interested 05-04-17 Not Reachable

The objective here is to rearrange the layout so that only one row exists per Phone number.  The result should be dynamic for new Phone numbers added and the same number being repeated in many more rows.  The expected data layout is (for want of space only limited columns have been shown here):

Phone Custom.1 Custom.2 Custom.3 Custom.4 Custom.5 Custom.6
561 3/12/2017 Interested 3/17/2017 Not Interested 3/22/2017 Not Reachable
562 3/13/2017 Not Interested 3/18/2017 Call Busy 3/23/2017 Not Reachable
564 3/15/2017 Call Busy 3/20/2017 Not Interested 3/25/2017 Call Later
563 3/18/2017 Call Later 3/23/2017 Interested 3/28/2017 Call Busy
568 3/19/2017 Call Dropped 3/24/2017 Call Busy 3/29/2017 Call Busy
565 3/21/2017 Call Dropped 3/26/2017 Call Later 3/31/2017 Call Later
574 3/23/2017 Not Interested 3/28/2017 Call Dropped 4/2/2017 Not Interested
572 3/25/2017 Call Dropped 3/30/2017 Call Busy 4/4/2017 Interested
573 3/26/2017 Call Dropped 3/31/2017 Not Interested 4/5/2017 Not Reachable
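
The merging step can be sketched in Python as follows (an illustration of the idea only, not the formulas or Power Query steps from the workbook).  The function name merge_by_phone is hypothetical, only a few of the sample rows are used, and dates are kept as text rather than converted to real dates.

```python
rows = [
    ("561", "12/03/17", "Interested", "17/03/17", "Not Interested", "22/03/17", "Not Reachable"),
    ("562", "13-03-17", "Not Interested", "18-03-17", "Call Busy", "23-03-17", "Not Reachable"),
    ("561", "14-03-17", "Not Interested", "19-03-17", "Not Reachable", "24-03-17", "Call Dropped"),
    ("562", "22-03-17", "Interested"),
]


def merge_by_phone(rows):
    """Collect all (date, remark) pairs of a phone number into one flat row."""
    merged = {}
    for phone, *cells in rows:
        flat = merged.setdefault(phone, [])
        # Walk the cells in (date, remark) pairs and skip empty pairs.
        for i in range(0, len(cells) - 1, 2):
            date, remark = cells[i], cells[i + 1]
            if date:
                flat.extend([date, remark])
    return merged


merged = merge_by_phone(rows)
```

Each value in merged is one long row for that phone number; its cells correspond to the Custom.1, Custom.2, ... columns of the expected layout.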

I have solved this problem using formulas and Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Prepare an invigilation schedule for each teacher by different time periods

Imagine a multi column exam invigilation schedule with the following information

  1. S. No.
  2. Name of staff
  3. Designation of staff member
  4. Two columns for each day on which there is an exam - one for Morning and another for Afternoon
  5. A * under each column if that particular staff member has to be an invigilator during that time period

S. No. Name of staff Designation 09/11/2017-Morning 09/11/2017-Afternoon 10/11/2017-Morning 10/11/2017-Afternoon
1 Tom Lecturer * * *
2 Sam Lecturer * *

This dataset stretches into many more columns.  So if the exams last for 15 days, there will be 30 date columns.  The objective is to condense this ever-widening dataset into a 5 column one - S. No., Name of staff, Designation, Morning and Afternoon.  Under the Morning and Afternoon columns, the different dates have to be separated by commas in one single cell.  The expected result is:

Name of staff S. No. Designation Morning Afternoon
Tom 1 Lecturer 09/11/2017, 10/11/2017, 11/11/2017, 13/11/2017, 14/11/2017, 16/11/2017, 17/11/2017 10/11/2017, 13/11/2017, 14/11/2017, 15/11/2017, 16/11/2017, 17/11/2017
Sam 2 Lecturer 09/11/2017, 13/11/2017, 15/11/2017, 16/11/2017 09/11/2017

I can think of 3 ways to solve this problem. Here's a brief about each of those methods:

  1. Using Power Query and Excel's functions - Using Power Query, one can first unpivot the data and then use the TEXTJOIN() function, entered as an array formula.
  2. Using Power Query only - Using Power Query, one can first unpivot the data and then modify the code to concatenate all dates in a single cell for that invigilator.  One can then pivot the data in Power Query itself.
  3. Using Power Query and PowerPivot - Using Power Query, one can first unpivot the data and then use the CONCATENATEX() function in PowerPivot.

As you can observe, the common thread in all solutions is that one has to first unpivot the dataset.
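
To make the unpivot-then-concatenate idea concrete, here is a small Python sketch (an illustration only, not any of the three workbook solutions).  The function name condense is hypothetical.  The positions of the * marks in the two sample rows are an assumption inferred from the expected result, and only the four date columns shown above are used.

```python
header = [
    "S. No.", "Name of staff", "Designation",
    "09/11/2017-Morning", "09/11/2017-Afternoon",
    "10/11/2017-Morning", "10/11/2017-Afternoon",
]
# Star positions are assumed (inferred from the expected result).
rows = [
    ["1", "Tom", "Lecturer", "*", "", "*", "*"],
    ["2", "Sam", "Lecturer", "*", "*", "", ""],
]


def condense(header, rows):
    """Unpivot the date columns and join the dates per session."""
    out = []
    for row in rows:
        sessions = {"Morning": [], "Afternoon": []}
        # "Unpivot": look at every (column label, mark) pair of the row.
        for label, mark in zip(header[3:], row[3:]):
            if mark.strip() == "*":
                date, session = label.rsplit("-", 1)
                sessions[session].append(date)
        out.append(tuple(row[:3]) + (
            ", ".join(sessions["Morning"]),
            ", ".join(sessions["Afternoon"]),
        ))
    return out


schedule = condense(header, rows)
```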

You may refer to all 3 of my solutions in this workbook.

Filter a column of a Pivot Table on a certain condition but also show other items from that column

The title sounds confusing!  Please bear with me and read on.  Here's a simple dataset:

Client ID Client Name Resource Project ID Billable amount
1 Alpha David 1000 10
1 Alpha Henry 1001 20
1 Alpha Rakesh 1002 30
1 Alpha Alice 1003 40
2 Beta Alice 1000 50
2 Beta Alicia 1002 60
2 Beta Patrick 1003 70
2 Beta Mukesh 1004 80
2 Beta Suresh 1006 90
2 Beta Ajay 1005 100
3 Gamma Rama 1004 110
3 Gamma Sakshi 1006 120
4 Theta Prabhu 1005 130
5 Epsilon Alice 1000 140
5 Epsilon Alicia 1001 150
5 Epsilon Prabhu 1002 160
5 Epsilon Sakshi 1003 170
5 Epsilon Raghav 1008 180
5 Epsilon David 1010 190
5 Epsilon Henry 1012 200

Here's a Pivot Table built from the dataset above.

[Image: Pivot Table built from the dataset above]

The question is "Is there a way to show only those rows of data which have Alice but also show others who worked with Alice?".  While the first part of the question can be answered easily by filtering the Resource column on Alice, the second part (showing the others who worked with Alice) is the real challenge.  When one filters the Resource column on Alice, the result is as seen below:

[Image: Pivot Table filtered on Alice in the Resource column]

This view does not show me who else worked with Alice.  The result I am expecting to see is:

[Image: Pivot Table showing Alice together with the others who worked with her]

This problem can be resolved with the help of the Query Editor (Power Query).  The basic idea is to add another column to the original dataset which holds a string of all resources for that row's client.  So for example, in every row of Client ID 1, the sixth column should show David,Henry,Rakesh,Alice and so on.  Once this is done, one can simply take this column to the Report filter section of the Pivot Table (see last image above) and filter on Alice.
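
The helper column idea can be sketched in Python as follows (an illustration only, not the Power Query code from the workbook).  The function names add_team and worked_with are hypothetical, and only Clients 1, 3 and 4 are used, with the Project ID and Billable amount columns left out for brevity.  The check splits the string and looks for an exact name, because a plain "contains Alice" test would not distinguish between similar names.

```python
rows = [
    (1, "Alpha", "David"),
    (1, "Alpha", "Henry"),
    (1, "Alpha", "Rakesh"),
    (1, "Alpha", "Alice"),
    (3, "Gamma", "Rama"),
    (3, "Gamma", "Sakshi"),
    (4, "Theta", "Prabhu"),
]


def add_team(rows):
    """Append a column listing every resource of the row's client."""
    team = {}
    for client_id, _, resource in rows:
        team.setdefault(client_id, []).append(resource)
    return [
        (client_id, name, resource, ",".join(team[client_id]))
        for client_id, name, resource in rows
    ]


def worked_with(rows, person):
    """Keep every row whose client team includes the given person."""
    return [r for r in add_team(rows) if person in r[3].split(",")]


alice_rows = worked_with(rows, "Alice")
```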

You may refer to my solution in this workbook.

Distribute projected revenue annually

Here is a dataset showing Project wise forecast of open opportunities.

  1. Topic is the Project Name
  2. Est. Close Date is the date by when the opportunity would be closed i.e. the project would be won from that Client
  3. Duration is the time (in months) for which the project would run
  4. Amount is the total amount that would be billed for that project

Clients are invoiced annually only. So in the example below:

  1. Project ABC is for US$1 million with a duration of 24 months and is expected to be closed in Oct. 2017.  We need to model the data to show the billing every 12 months.  So for ABC US$500K would be billed in Oct-2017 and another US$500K in Oct-2018.
  2. Project GEF is for US$2 million with a duration of 18 months and is expected to be closed in Feb. 2018. We need to model the data to show about US$1.33 million in Feb-2018 and the remaining US$667K in Feb-2019.  The first year's billing is US$2 million divided by 18 (the monthly amount) and then multiplied by 12 - this amounts to about US$1.33 million.

Topic Est. Close Date Duration (Months) Amount
ABC 01-10-2017 24 1,000,000
GEF 01-02-2018 18 2,000,000
XYZ 01-03-2018 30 1,000,000

The expected result should look like this:

Row Labels Oct-17 Feb-18 Mar-18 Oct-18 Feb-19 Mar-19 Mar-20 Total
ABC 500,000 500,000 1,000,000
GEF 1,333,333 666,667 2,000,000
XYZ 400,000 400,000 200,000 1,000,000
Grand Total 500,000 1,333,333 400,000 500,000 666,667 400,000 200,000 4,000,000
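
The arithmetic behind this table can be sketched in Python (an illustration only, not the Power Query and PowerPivot model from the workbook).  The function name annual_billing is hypothetical.  The sketch assumes that each invoice covers up to 12 months at the monthly rate, that the first invoice falls in the month of the Est. Close Date, and that amounts are rounded to whole dollars.

```python
from datetime import date


def annual_billing(close, months, amount):
    """Split the amount into yearly invoices of up to 12 months each."""
    schedule = []
    remaining = months
    year_offset = 0
    while remaining > 0:
        billed_months = min(12, remaining)
        invoice_date = date(close.year + year_offset, close.month, 1)
        # Share of the total amount that falls into this invoice.
        schedule.append((invoice_date, round(amount * billed_months / months)))
        remaining -= billed_months
        year_offset += 1
    return schedule


abc = annual_billing(date(2017, 10, 1), 24, 1_000_000)
gef = annual_billing(date(2018, 2, 1), 18, 2_000_000)
xyz = annual_billing(date(2018, 3, 1), 30, 1_000_000)
```

For XYZ, the 30 months split into 12, 12 and 6 months, which gives the three invoices in Mar-18, Mar-19 and Mar-20 shown in the table.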

I have solved this problem using Power Query and PowerPivot. You may download my solution workbook from here.

Alter the behaviour of a filter/slicer from OR to AND

Here is a simple two column dataset which shows EmpID in column A and DateWorked in column B.  My objective is to filter the table to show those employees who worked ONLY on August 17 and August 18.  I'd like to exclude employees who:

  1. Did not work on both these days; and
  2. Worked on both these dates but also worked on other dates

A simple filter on the DateWorked column with the criteria of August 17 and August 18 would return 7 employees.  This is clearly wrong because it will show employees who:

  1. Worked on any of the two chosen dates; and
  2. Worked on dates other than the two chosen dates

From the base dataset below, one can clearly observe that the correct result should be EmpID E.  Only EmpID E worked on August 17 and 18 and on no other date.  While A, C and D also worked on both these dates, they worked on other dates as well and should therefore be excluded.

The problem outlined above holds true for selections made via a slicer in a Pivot Table as well.  So this is a generic problem with a filter/slicer - conditions specified therein behave like OR conditions, not like AND conditions.

Base dataset

EmpID DateWorked
A 14-08-2017
A 15-08-2017
A 16-08-2017
A 17-08-2017
A 18-08-2017
B 15-08-2017
B 16-08-2017
B 18-08-2017
C 14-08-2017
C 15-08-2017
C 16-08-2017
C 17-08-2017
C 18-08-2017
D 14-08-2017
D 15-08-2017
D 16-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 16-08-2017
F 17-08-2017
G 18-08-2017

Filtered dataset (after specifying the criteria as 17 Aug and 18 Aug)

EmpID DateWorked
A 17-08-2017
A 18-08-2017
B 18-08-2017
C 17-08-2017
C 18-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 17-08-2017
G 18-08-2017

So the question is "How does one change the behavior of a filter/slicer from OR to AND?". To make things worse, the data is arranged as follows:

EmpID DateWorked
A 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
B 15/8/17;16/8/17;18/8/17
C 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
D 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
E 17/8/17;18/8/17
F 16/8/17;17/8/17
G 18-08-17

As you can see, there are multiple dates in a single cell, separated by semicolons.
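
The AND behaviour boils down to comparing sets: an employee qualifies only if the set of dates worked is exactly the set of chosen dates.  Here is a minimal Python sketch of that test (an illustration only, not the Power Query and PowerPivot solution).  The function name worked_only_on is hypothetical, and dates are compared as text exactly as they appear in the cells, so a real solution would first convert them to proper dates.

```python
cells = [
    ("A", "14/8/17;15/8/17;16/8/17;17/8/17;18/8/17"),
    ("B", "15/8/17;16/8/17;18/8/17"),
    ("C", "14/8/17;15/8/17;16/8/17;17/8/17;18/8/17"),
    ("D", "14/8/17;15/8/17;16/8/17;17/8/17;18/8/17"),
    ("E", "17/8/17;18/8/17"),
    ("F", "16/8/17;17/8/17"),
    ("G", "18-08-17"),
]


def worked_only_on(cells, wanted):
    """Return the employees whose dates worked equal the wanted dates exactly."""
    wanted = set(wanted)
    result = []
    for emp, cell in cells:
        # Split the cell on semicolons and compare as a set of text values.
        dates = {d.strip() for d in cell.split(";") if d.strip()}
        if dates == wanted:
            result.append(emp)
    return result


matches = worked_only_on(cells, ["17/8/17", "18/8/17"])
```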

You may refer to my Power Query and PowerPivot solution here.

Fill out a matrix with a user defined value which has variable start and end points

Here is a sample dataset with Item and Number of buckets in the row labels and Year-Month in the column labels.  In the value area section are some numbers.

Item Number of Buckets 2016-10 2016-11 2016-12 2017-1 2017-2 2017-3 2017-4 2017-5
ABC 8 1500
PQR 12 40
RPS 4 100
CHA 11 30
MUM 12 90
CHE 2 24

The objective is to repeat the number in every row of the value area section so that it appears as many times as the number mentioned in the Number of Buckets column.  As a case in point, for RPS, 100 should appear 3 more times in that row (through 2017-7).  The result should look like this:

Item Number of Buckets 2016-10 2016-11 2016-12 2017-1 2017-2 2017-3 2017-4 2017-5
ABC 8 1500
PQR 12 40 40 40 40 40
RPS 4 100 100
CHA 11 30 30 30 30
MUM 12 90 90 90 90 90 90 90 90
CHE 2 24

For want of space I have deleted the columns from the right.
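
The fill logic for a single row can be sketched in Python as follows (an illustration only, not the Power Query steps from the workbook).  The function name fill_buckets is hypothetical.  The example uses the RPS row, placing its 100 under 2017-4 as implied by the description above, and assumes the 2017-6 and 2017-7 columns that are not shown in the tables.

```python
months = ["2016-10", "2016-11", "2016-12", "2017-1", "2017-2",
          "2017-3", "2017-4", "2017-5", "2017-6", "2017-7"]


def fill_buckets(values, buckets):
    """Repeat the first non-empty value so it fills `buckets` cells in total."""
    start = next((i for i, v in enumerate(values) if v is not None), None)
    if start is None:
        return list(values)  # nothing to repeat in an empty row
    filled = list(values)
    # Stop at the last available column if the buckets run past it.
    for i in range(start, min(start + buckets, len(values))):
        filled[i] = values[start]
    return filled


# RPS: 4 buckets, starting value 100 under 2017-4 (index 6).
rps = [None] * 6 + [100] + [None] * 3
rps_filled = fill_buckets(rps, 4)
```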

I have solved this problem using Power Query a.k.a. Get & Transform in Excel 2016 (available under Data).  You may download my workbook from here.