Category: POWER QUERY

Determine the top selling location for each product

Visualise a 3 column dataset as shown below - Location, Product and Sales. Each location can have multiple products (Location A has Banana, Apple and Carrot) and each product can be sold in multiple locations (Banana is sold in locations A, B and F).

The objective is to determine the location with the highest sales for each product. So for Banana, the maximum sales value is 25 and the location of that maximum is B. Likewise for Orange, the maximum sales value is 49 and the location of that maximum is A. The expected result is:

I have 4 solutions to this problem:

  1. Advanced Filters - This is a static solution. For any changes in the source data range, one will have to re-enter the 3 inputs in the Advanced Filter window.
  2. Formulas - This is a semi-dynamic solution. To make it fully dynamic, one will have to write an array formula to first extract all unique product names in a column. The array formula to extract product names in a column can be obtained from here.
  3. Power Query - This is a dynamic solution. For any changes in the source data sheet, one just has to go to Data > Refresh All.
  4. PowerPivot - This is a dynamic solution. For any changes in the source data sheet, one just has to go to Data > Refresh All.

You may download my solution workbook from here.
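As a language-neutral illustration of the "maximum per product" logic (a sketch, not the Power Query or PowerPivot steps in the workbook), the Python below keeps, for each product, the location with the largest sales value. Only Banana's 25 at location B and Orange's 49 at location A come from the post; every other row is hypothetical, and ties are assumed to go to the first location seen.

```python
from typing import Dict, List, Tuple

# Hypothetical rows (Location, Product, Sales). Only Banana's 25 at B and
# Orange's 49 at A come from the post; the other values are made up.
ROWS: List[Tuple[str, str, float]] = [
    ("A", "Banana", 12),
    ("B", "Banana", 25),
    ("F", "Banana", 18),
    ("A", "Orange", 49),
    ("C", "Orange", 30),
]


def top_location_per_product(
    rows: List[Tuple[str, str, float]],
) -> Dict[str, Tuple[str, float]]:
    """Return {product: (location, max_sales)}.

    Assumption: on a tie, the first location seen for that product is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for location, product, sales in rows:
        # Replace the stored winner only when this row beats it strictly.
        if product not in best or sales > best[product][1]:
            best[product] = (location, sales)
    return best


if __name__ == "__main__":
    for product, (location, sales) in top_location_per_product(ROWS).items():
        print(product, location, sales)
```

A single pass with a dictionary is enough here because only the running maximum per product has to be remembered.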

Remove duplicates from each cell of a dataset

Here's a dataset with 4 columns and 2 rows. Column A holds the Patient ID and the other 3 columns hold the observations recorded by 3 doctors for each patient. Each cell contains multiple observations separated by an Alt+Enter (a new line within the same cell), a comma, or both. To make things more complicated, some entries are repeated within a cell: in cell B2, Creative appears twice and in cell D2, Fatigue, headaches and stress each appear twice. Please also note that the entries appear in mixed case, i.e. in cell D2, the first occurrence of stress is in proper case and the second is in lower case.

The objective is to remove duplicates from each cell (ignoring case) and to separate the remaining entries in each cell uniformly with an Alt+Enter. The expected result is shown in the green table below.

I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016 and higher versions.  You may download my solution workbook from here.
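The per-cell clean-up can be sketched in Python as follows (an illustration, not the Power Query steps in the workbook). It splits a cell on line feeds (the character Alt+Enter inserts) and/or commas, drops duplicates case-insensitively, and rejoins the survivors with a line feed. Keeping the first spelling seen is an assumption, and the sample text is hypothetical, only loosely modelled on cell D2.

```python
import re

# Entries are separated by line breaks (Alt+Enter inserts a line feed),
# commas, or both; a run of separators is treated as a single break.
SEPARATORS = re.compile(r"[,\r\n]+")


def dedupe_cell(text: str) -> str:
    """Drop case-insensitive duplicates, keep the first spelling seen,
    and rejoin the surviving entries with a line break (Alt+Enter)."""
    seen = set()
    kept = []
    for part in SEPARATORS.split(text):
        item = part.strip()
        key = item.casefold()  # case-insensitive comparison key
        if item and key not in seen:
            seen.add(key)
            kept.append(item)
    return "\n".join(kept)


if __name__ == "__main__":
    # Hypothetical cell contents, loosely modelled on cell D2 in the post.
    sample = "Fatigue, Headaches\nStress, fatigue\nheadaches, stress"
    print(dedupe_cell(sample))
```

Applied to the sample, this returns Fatigue, Headaches and Stress on three separate lines.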

Merge 2 work schedules

In a certain organisation, assume that there are two work schedules - one that runs from Friday to Thursday and another that runs from Sunday to Saturday. In the image below, Work schedule 1 is in range A2:C4 and Work schedule 2 is in range E2:G4. In each dataset, the start date is in the first column, i.e. the week starts from there. So from 28-Dec-2018 to 29-Dec-2018, only workers 1 and 2 would work, but from 30-Dec-2018 to 03-Jan-2019, all 4 workers would work. The objective is to create a combined work schedule as shown in range I2:N7.

The solution is dynamic for new rows that will be added to the source data Tables.  I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016 and higher versions.

You may download my solution workbook from here.

Rearrange a multi heading dataset into a single heading one which is Pivot ready

Any well arranged dataset should be "Pivot Table" ready with the following 3 important properties:

  1. There should be no merged and centered cells; and
  2. Every column should have a unique heading; and
  3. Every column should have only 1 heading

Here's one dataset which violates all rules mentioned above.

  1. Headings in row 1 are merged; and
  2. The headings in row 2 are not unique; and
  3. Every column has headings in row 1 and row 2.

To be Pivot Table friendly, this dataset will have to be restructured into a 4 column one - Week, Equipment, Jobsite and Widgets as shown below:

I have achieved the desired result by using Data > Get & Transform (also known as Power Query in earlier versions of MS Excel). The solution is dynamic for new rows and columns added to the data on the Input worksheet - one simply has to go to Data > Refresh All. You may download my solution workbook from here.

In this workbook, there is another example of how one can transform a multi heading dataset into a Pivot Table ready dataset.  The primary difference between this and the previous dataset is that there are 2 descriptive columns to the left (as against only one in the previous example).
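The core of this reshaping can be sketched in Python (an illustration, not the Get & Transform steps in the workbook): carry each merged heading across the columns it spans, pair it with the second heading row, and emit one record per non-empty value cell. The assumptions are that the first column holds the Week, that the merged top row carries the Equipment and the second row the Jobsite, and that merged cells arrive as blanks after their first cell; all headings and figures in the example are hypothetical.

```python
from typing import Any, List, Optional, Tuple


def unpivot_two_row_header(
    top: List[Optional[str]],
    second: List[str],
    rows: List[List[Any]],
) -> List[Tuple[Any, Optional[str], str, Any]]:
    """Turn a two-row-heading table into (Week, Equipment, Jobsite, Widgets) rows.

    Assumptions: column 0 holds the Week; `top` is the merged heading row with
    None/"" in the cells a merge spans; `top` carries the Equipment and
    `second` carries the Jobsite.
    """
    # A merged heading is stored only in its first cell, so carry it rightwards.
    equipment: List[Optional[str]] = []
    current: Optional[str] = None
    for heading in top[1:]:
        if heading:
            current = heading
        equipment.append(current)

    out = []
    for row in rows:
        week = row[0]
        for equip, jobsite, value in zip(equipment, second[1:], row[1:]):
            if value not in (None, ""):  # skip empty cells, keep zeros
                out.append((week, equip, jobsite, value))
    return out


if __name__ == "__main__":
    # Hypothetical headings and figures for illustration only.
    top = ["", "Crane", "", "Loader", ""]
    second = ["Week", "Site 1", "Site 2", "Site 1", "Site 2"]
    rows = [[1, 5, 3, 2, 0], [2, 4, "", 6, 1]]
    for record in unpivot_two_row_header(top, second, rows):
        print(record)
```

Because the headings are read from the data rather than hard-coded, new Equipment or Jobsite columns are picked up automatically, which mirrors the dynamic behaviour described above.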

Rearrange travel data to clearly show travel from and travel to locations

Consider a 2 column dataset as shown below

User Location
AAA Tokyo
AAA Osaka
AAA Nagoya
AAA Hakone
AAA Kyoto
BBB Sapporo
BBB Nara
CCC Tokyo
CCC Hakone
CCC Osaka
DDD Osaka
DDD Tokyo

Customer AAA travelled from Tokyo to Osaka, Osaka to Nagoya, Nagoya to Hakone and Hakone to Kyoto, yet all these locations appear in a single column. To analyse each customer's travel clearly, one may want to rearrange the dataset as follows:

User From To
AAA TOKYO OSAKA
AAA OSAKA NAGOYA
AAA NAGOYA HAKONE
AAA HAKONE KYOTO
BBB SAPPORO NARA
CCC TOKYO HAKONE
CCC HAKONE OSAKA
DDD OSAKA TOKYO

I have solved this problem using Power Query. You may download my solution workbook from here.
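The from/to pairing can be sketched in Python using the User/Location rows above (an illustration, not the Power Query solution in the workbook). It assumes each user's rows are contiguous and listed in the order travelled, and it upper-cases locations only to match the expected output shown.

```python
from itertools import groupby
from typing import List, Tuple

# The User/Location rows from the post, already in travel order per user.
ROWS: List[Tuple[str, str]] = [
    ("AAA", "Tokyo"), ("AAA", "Osaka"), ("AAA", "Nagoya"),
    ("AAA", "Hakone"), ("AAA", "Kyoto"),
    ("BBB", "Sapporo"), ("BBB", "Nara"),
    ("CCC", "Tokyo"), ("CCC", "Hakone"), ("CCC", "Osaka"),
    ("DDD", "Osaka"), ("DDD", "Tokyo"),
]


def travel_legs(rows: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Pair each location with the next one for the same user.

    Assumes each user's rows are contiguous and in the order travelled.
    """
    legs = []
    # groupby only groups consecutive rows, hence the contiguity assumption.
    for user, group in groupby(rows, key=lambda row: row[0]):
        places = [location.upper() for _, location in group]
        # zip(places, places[1:]) yields consecutive (from, to) pairs.
        for origin, destination in zip(places, places[1:]):
            legs.append((user, origin, destination))
    return legs


if __name__ == "__main__":
    for user, origin, destination in travel_legs(ROWS):
        print(user, origin, destination)
```

A user with a single location produces no legs, since there is nothing to travel to.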

Sort comma separated entries appearing in a cell in ascending order

Assume a single column dataset as shown below. As you can observe, there are multiple entries appearing in the same cell, separated by commas.

The objective is to sort, in ascending order, the entries in each cell.  The expected result is shown below.

I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016. You may download my solution workbook from here.
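The split-sort-rejoin idea can be sketched in Python for a single cell (an illustration, not the Power Query steps in the workbook). The assumptions are a case-insensitive alphabetical order and ", " as the output separator; numbers are compared as text, so "10" sorts before "9".

```python
def sort_cell(text: str, sep: str = ",") -> str:
    """Split a cell on `sep`, trim spaces, drop blanks, sort the entries
    case-insensitively and rejoin them with ", "."""
    items = [part.strip() for part in text.split(sep)]
    items = [item for item in items if item]  # ignore empty entries
    return ", ".join(sorted(items, key=str.casefold))


if __name__ == "__main__":
    print(sort_cell("pear, Apple,banana"))
```

For example, "pear, Apple,banana" becomes "Apple, banana, pear".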

Search for multiple phrases within a cell and extract all those phrases in another column

Assume a simple two column dataset as shown below. Column A contains text strings, each cell holding multiple phrases separated by commas. Column C contains the phrases to search for, one in each cell. The objective is to search each cell in column A for all the phrases listed in column C and extract the phrases found into another column of the same row.

The expected result is:

I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016.  You may download my solution workbook from here.
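The search can be sketched in Python for a single cell (an illustration, not the Power Query solution in the workbook). The assumptions are case-insensitive substring matching and returning the found phrases in the order of the search list, joined by ", "; with substring matching, "pear" would also match inside "pearl". The sample values are hypothetical.

```python
from typing import List


def find_phrases(text: str, phrases: List[str]) -> str:
    """Return the phrases that occur anywhere in `text`, in list order,
    joined with ", ".

    Assumption: case-insensitive substring matching; blank phrases are ignored.
    """
    haystack = text.casefold()
    found = []
    for phrase in phrases:
        needle = phrase.strip()
        if needle and needle.casefold() in haystack:
            found.append(needle)
    return ", ".join(found)


if __name__ == "__main__":
    # Hypothetical column A cell and column C phrase list.
    print(find_phrases("red apple, green pear, ripe banana", ["pear", "plum", "Banana"]))
```

If whole-entry matching is wanted instead, the cell could first be split on commas and each trimmed entry compared for equality.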

Combine unique entries from a range of cells after satisfying a condition

Here is a simple three column dataset showing Item ID, Catalogue Name and Catalogue Year

Item ID Catalogue Name Catalogue Year
A Spring 2012
A Spring 2016
A Spring 2017
B Fall 2014
B Fall 2014
B Fall 2016

The second and third last rows are clear duplicates. The task is to create a fourth column (titled All Year) in this dataset with concatenated entries from the Catalogue Year column for each Item ID. So the expected result is:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2016
B Fall 2014 2014, 2016
B Fall 2016 2014, 2016

Notice that for Item ID B, the result in every cell of the fourth column is 2014, 2016 and not 2014, 2014, 2016. So while joining (concatenating, in MS Excel terminology) entries from the Catalogue Year column into the All Year column, only the unique entries for that Item ID have to be considered.

Excel 2016 has a TEXTJOIN() function which gets us close to the final result but does not restrict itself to unique entries (it joins every matching entry). On using the TEXTJOIN() array formula in a spare column, here is the result I got:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2014, 2016
B Fall 2014 2014, 2014, 2016
B Fall 2016 2014, 2014, 2016

Notice that for Item ID B, the result is 2014, 2014, 2016 and not 2014, 2016. So the TEXTJOIN() function on its own does not give the desired result.

I have solved this problem using Power Query (Get and Transform in Excel 2016).  You may download my solution workbook from here.
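The "unique entries only" requirement can be sketched in Python using the six rows from the post (an illustration, not the Power Query solution in the workbook). Years are collected per Item ID in order of first appearance, each year is kept once, and the joined string is written back to every row of that Item ID.

```python
from typing import Dict, List, Tuple

# The Item ID / Catalogue Name / Catalogue Year rows from the post.
ROWS: List[Tuple[str, str, int]] = [
    ("A", "Spring", 2012),
    ("A", "Spring", 2016),
    ("A", "Spring", 2017),
    ("B", "Fall", 2014),
    ("B", "Fall", 2014),
    ("B", "Fall", 2016),
]


def add_all_year(rows: List[Tuple[str, str, int]]) -> List[Tuple[str, str, int, str]]:
    """Append an All Year value: the unique years for that Item ID,
    in order of first appearance, joined with ", "."""
    years: Dict[str, List[int]] = {}
    for item_id, _name, year in rows:
        bucket = years.setdefault(item_id, [])
        if year not in bucket:  # keep each year only once per Item ID
            bucket.append(year)
    joined = {item_id: ", ".join(str(y) for y in ys) for item_id, ys in years.items()}
    return [(item_id, name, year, joined[item_id]) for item_id, name, year in rows]


if __name__ == "__main__":
    for row in add_all_year(ROWS):
        print(*row)
```

The de-duplication step (`if year not in bucket`) is exactly what a plain TEXTJOIN() over the Catalogue Year column lacks.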

Restructure the layout of datasets

I'd like to discuss 2 cases of restructuring layouts of datasets:

Case 1

Consider the 7 column dataset below. For every Doc Number, there are 6 columns - 3 columns for the employee names and another 3 columns for each employee's percentage.

Doc Number Emp 1 Emp 1 % Emp 2 Emp 2 % Emp 3 Emp 3 %
111111 John 5% Eric 6% Jane 7%
222222 Mary 10% Joe 4% Jill 25%

The objective is to restructure the data layout to a 3 column one  - Doc Number, Employee Name and Employee Percent.  Such a revised data layout would enable filtering, Pivoting, charting etc.  Furthermore, the solution should be dynamic for new columns added.  The expected data layout is:

Doc Number Employee Percent
111111 John 5%
111111 Eric 6%
111111 Jane 7%
222222 Mary 10%
222222 Joe 4%
222222 Jill 25%

I have solved this problem using Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.
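The Case 1 restructuring can be sketched in Python using the two rows from the post (an illustration, not the Power Query solution in the workbook). After the Doc Number, the remaining cells are walked two at a time as (name, percent) pairs, so additional Emp n / Emp n % column pairs are handled without changing the code. Skipping pairs with a blank name is an assumption.

```python
from typing import List, Tuple

# The Doc Number rows from the post (percentages kept as text).
ROWS: List[List[str]] = [
    ["111111", "John", "5%", "Eric", "6%", "Jane", "7%"],
    ["222222", "Mary", "10%", "Joe", "4%", "Jill", "25%"],
]


def unpivot_pairs(rows: List[List[str]]) -> List[Tuple[str, str, str]]:
    """Return (Doc Number, Employee, Percent) records from paired columns."""
    out = []
    for row in rows:
        doc, rest = row[0], row[1:]
        # rest[0::2] are the names, rest[1::2] the matching percentages.
        for name, pct in zip(rest[0::2], rest[1::2]):
            if name:  # assumption: a blank name means an unused pair
                out.append((doc, name, pct))
    return out


if __name__ == "__main__":
    for record in unpivot_pairs(ROWS):
        print(*record)
```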

Case 2

Consider the 7 column dataset below (there are actually 13 columns; for want of space only 7 have been shown). For every Phone Number, there are up to 12 columns - 6 columns for the dates on which calls were placed and the other 6 for the remarks jotted down by the service agent based on the interaction with the customer. The software from which the data is downloaded can only output 6 calls (6 Date/Remark pairs) per row. If more than 6 calls were placed to the same customer, multiple rows appear for that number. A case in point is Phone number 561 below, which appears multiple times in the table.

Phone Date 1 Remark 1 Date 2 Remark 2 Date 3 Remark 3
561 12/03/17 Interested 17/03/17 Not Interested 22/03/17 Not Reachable
562 13-03-17 Not Interested 18-03-17 Call Busy 23-03-17 Not Reachable
561 14-03-17 Not Interested 19-03-17 Not Reachable 24-03-17 Call Dropped
564 15-03-17 Call Busy 20-03-17 Not Interested 25-03-17 Call Later
562 16-03-17 Call Busy 21-03-17 Call Busy 26-03-17 Call Busy
561 17-03-17 Interested 22-03-17 Call Later 27-03-17 Call Dropped
563 18-03-17 Call Later 23-03-17 Interested 28-03-17 Call Busy
568 19-03-17 Call Dropped 24-03-17 Call Busy 29-03-17 Call Busy
561 20-03-17 Call Busy 25-03-17 Not Reachable 30-03-17 Not Interested
565 21-03-17 Call Dropped 26-03-17 Call Later 31-03-17 Call Later
562 22-03-17 Interested
574 23-03-17 Not Interested 28-03-17 Call Dropped 02-04-17 Not Interested
563 24-03-17 Interested 29-03-17 Not Reachable 03-04-17 Call Later
572 25-03-17 Call Dropped 30-03-17 Call Busy 04-04-17 Interested
573 26-03-17 Call Dropped 31-03-17 Not Interested 05-04-17 Not Reachable

The objective here is to rearrange the layout so that only one row exists per Phone number.  The result should be dynamic for new Phone numbers added and the same number being repeated in many more rows.  The expected data layout is (for want of space only limited columns have been shown here):

Phone Custom.1 Custom.2 Custom.3 Custom.4 Custom.5 Custom.6
561 3/12/2017 Interested 3/17/2017 Not Interested 3/22/2017 Not Reachable
562 3/13/2017 Not Interested 3/18/2017 Call Busy 3/23/2017 Not Reachable
564 3/15/2017 Call Busy 3/20/2017 Not Interested 3/25/2017 Call Later
563 3/18/2017 Call Later 3/23/2017 Interested 3/28/2017 Call Busy
568 3/19/2017 Call Dropped 3/24/2017 Call Busy 3/29/2017 Call Busy
565 3/21/2017 Call Dropped 3/26/2017 Call Later 3/31/2017 Call Later
574 3/23/2017 Not Interested 3/28/2017 Call Dropped 4/2/2017 Not Interested
572 3/25/2017 Call Dropped 3/30/2017 Call Busy 4/4/2017 Interested
573 3/26/2017 Call Dropped 3/31/2017 Not Interested 4/5/2017 Not Reachable

I have solved this problem using formulas and Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.
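The Case 2 consolidation can be sketched in Python (an illustration, not the formula and Power Query solution in the workbook). Every Date/Remark pair of a repeated phone number is appended to that number's single output row, in order of appearance, and empty slots are skipped. Because the source mixes 12/03/17 and 13-03-17 styles, the sketch assumes every date is day-month-two-digit-year and normalises both separators before parsing. The rows are taken from the post, trimmed to 3 pairs each.

```python
from datetime import date, datetime
from typing import Dict, List, Union

Cell = Union[str, date]

# A few rows from the post, trimmed to 3 Date/Remark pairs; "" marks an empty cell.
ROWS: List[List[str]] = [
    ["561", "12/03/17", "Interested", "17/03/17", "Not Interested", "22/03/17", "Not Reachable"],
    ["562", "13-03-17", "Not Interested", "18-03-17", "Call Busy", "23-03-17", "Not Reachable"],
    ["561", "14-03-17", "Not Interested", "19-03-17", "Not Reachable", "24-03-17", "Call Dropped"],
    ["562", "22-03-17", "Interested", "", "", "", ""],
]


def parse_dd_mm_yy(text: str) -> date:
    """Parse a day-month-year date written with either '/' or '-' (assumed format)."""
    return datetime.strptime(text.replace("/", "-"), "%d-%m-%y").date()


def one_row_per_phone(rows: List[List[str]]) -> List[List[Cell]]:
    """Append every (date, remark) pair of a repeated phone number to its one row."""
    merged: Dict[str, List[Cell]] = {}
    for row in rows:
        phone, cells = row[0], row[1:]
        pairs = merged.setdefault(phone, [])
        for raw_date, remark in zip(cells[0::2], cells[1::2]):
            if raw_date:  # skip unused Date/Remark slots
                pairs.extend([parse_dd_mm_yy(raw_date), remark])
    # Dicts keep insertion order, so phones appear in order of first occurrence.
    return [[phone, *pairs] for phone, pairs in merged.items()]


if __name__ == "__main__":
    for row in one_row_per_phone(ROWS):
        print(row)
```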

Prepare an invigilation schedule for each teacher by different time periods

Imagine a multi column exam invigilation schedule with the following information:

  1. S. No.
  2. Name of staff
  3. Designation of staff member
  4. Two columns for each day on which there is an exam - one for Morning and another for Afternoon
  5. A * under each column if that particular staff member has to be an invigilator during that time period

S. No. Name of staff Designation 09/11/2017-Morning 09/11/2017-Afternoon 10/11/2017-Morning 10/11/2017-Afternoon
1 Tom Lecturer * * *
2 Sam Lecturer * *

This dataset stretches into many more columns. So if the exams last for 15 days, there will be 30 date columns. The objective is to condense this column expanding dataset into a 5 column one - S. No., Name of Staff, Designation, Morning and Afternoon. Under the Morning and Afternoon columns, the different dates have to be separated by commas in that one single cell. The expected result is:

Name of staff S. No. Designation Morning Afternoon
Tom 1 Lecturer 09/11/2017, 10/11/2017, 11/11/2017, 13/11/2017, 14/11/2017, 16/11/2017, 17/11/2017 10/11/2017, 13/11/2017, 14/11/2017, 15/11/2017, 16/11/2017, 17/11/2017
Sam 2 Lecturer 09/11/2017, 13/11/2017, 15/11/2017, 16/11/2017 09/11/2017

I can think of 3 ways to solve this problem. Here's a brief about each of those methods:

  1. Using Power Query and Excel's functions - Using Power Query, one can first unpivot the data and then use the TEXTJOIN() function.  This would be an array formula.
  2. Using Power Query only - Using Power Query, one can first unpivot the data and then modify the code to concatenate all dates in a single cell for that invigilator.  One can then Pivot the data in Power Query itself
  3. Using Power Query and PowerPivot - Using Power Query, one can first unpivot the data and then use the CONCATENATEX() function of PowerPivot.

As you can observe, the common thread in all solutions is that one has to first unpivot the dataset.
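The unpivot-then-concatenate idea can be sketched in Python (an illustration, not any of the three workbook solutions). Each "date-session" column is read as one unpivoted record, records marked with a * are kept, and the dates are joined per session. The column headings follow the post, but the * placements are hypothetical, and headings are assumed to always end in "-Morning" or "-Afternoon".

```python
from typing import Any, Dict, List

HEADER = [
    "S. No.", "Name of staff", "Designation",
    "09/11/2017-Morning", "09/11/2017-Afternoon",
    "10/11/2017-Morning", "10/11/2017-Afternoon",
]
# Hypothetical "*" placements; the real schedule has many more date columns.
ROWS: List[List[Any]] = [
    [1, "Tom", "Lecturer", "*", "", "*", "*"],
    [2, "Sam", "Lecturer", "*", "*", "", ""],
]


def condense(header: List[str], rows: List[List[Any]]) -> List[List[Any]]:
    """Return [S. No., Name, Designation, Morning dates, Afternoon dates] per row."""
    out = []
    for row in rows:
        sessions: Dict[str, List[str]] = {"Morning": [], "Afternoon": []}
        # Unpivot: walk each date-session column and keep those marked "*".
        for column, mark in zip(header[3:], row[3:]):
            if str(mark).strip() == "*":
                day, session = column.rsplit("-", 1)
                sessions[session].append(day)
        # Concatenate: join the dates collected for each session.
        out.append(row[:3] + [", ".join(sessions["Morning"]), ", ".join(sessions["Afternoon"])])
    return out


if __name__ == "__main__":
    for row in condense(HEADER, ROWS):
        print(row)
```

Splitting the heading from the right with `rsplit("-", 1)` leaves the date intact even though it contains "/" characters.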

You may refer to all 3 of my solutions in this workbook.