Category: POWER QUERY

Return best possible fit, to manually entered dimensions, with the intent to minimise wastage

{5 Comments}

Refer to a simple 5 column representative inventory dataset of a Glass manufacturer:

Model Length (MM) Wide (MM) Thk (MM) CAT
HX9-G-ARD 1071 273 3.5 A
MYP-G-3RD 580 535 3.2 B
EPO-G-3RD 580 535 3.2 A
MYG-G-3R 966 350 3.2 A
MYN-G-3RD 649 530 3.2 A
GM SPIN-G-3FD 882 395 3.2 A
MY8-G-AR 880 400 3.5 B
GM2-G-AR 880 400 3.5 A

From this inventory data, one has to furnish customer orders based on specific dimensions demanded by them.  A typical Customer request would be to supply glass sheets as per the following dimensions

Length (MM) Wide (MM) Thk (MM) CAT
780 542 3.5 A

The firm may or may not have glass sheets of this specific size.  The objective is to identify glass sheets, from the inventory on hand, which match customer specifications.  If there is no exact match, then one must be able to obtain all inventory items which have the same Thk (MM) and CAT as the customer specified dimensions but the Length and thickness should be more than equal to the customer specified dimensions.  The length and width can then be trimmed to match the exact customer dimensions.  Furthermore, the result returned should:

  1. List only the Top 30 glass sheets available in inventory; and
  2. List those Top 30 glass sheets in ascending order of wastage (wastage caused when the glass sheet is trimmed to match the customer specified dimensions)

You may refer to my solution in this workbook.  I have shared two solutions - one using Excel formulas and the other using Power Query a.k.a. Get and Transform in Excel 2016.  Please read the Comments in cells F1, J9 and J16 of the "Solutions" worksheet.  The difference between the 2 solutions is:

  1. Formula driven solution - This is in range J10:AM14 of the Solutions worksheet.  This is a semi dynamic solution (as compared to the Power Query solution).  To get the models in ascending order of wastage, one will have to create an Area column in the base data and sort that column in ascending order.
  2. Power Query solution - This is in range J17:AM21 of the Solutions worksheet.  This is a dynamic solution.  Just change the customer specified dimensions in range G2:J2 of the Data and Query worksheet.  Thereafter just right click on any cell in the range below and select refresh.

Generate a list of assignees for different projects based on a competency matrix

{4 Comments}

Here is a small sample of a Project matrix which shows tasks to be accomplished for various projects.  There can only be upto 6 tasks per project.

Project Name Task1 Task2 Task3 Task4 Task5 Task6
Project1 Painting Chef Gardener
Project2 Tiling Digging Engineering
Project3 Mechanic Engineering

Here is a competency matrix showing the competencies of employees on different tasks.  1 indicates that the employee is competent to perform that task.

Task Tom Jane Mary Paddy Lynda
Painting 1 1 1 1 1
Tiling 1 1 1 1 1
Plastering 1 1 1 1 1
Digging 1 0 1 1 1
Mechanic 1 1 1 0 1
Detective 1 1 1 1 1
Engineering 1 1 0 1 1
Boxer 1 0 1 1 1
Chef 1 1 1 1 1
Gardener 1 1 0 1 1
Banker 1 1 1 1 0

From these two tables, one may want to generate another table showing which employees can be assigned to which project (only those employees should be assigned to a project who can complete all tasks).  So the ideal solution is to create another column (8th column) in the Project matrix table above which should have a drop down (Data > Data Validation) for every project showing which employees are competent for that project.

Here's an illustration:

Assuming that the Project matrix is in range A1:G4 (headers are in row 1)

  1. In cell H2 (for Project1), the drop down should show Jane, Lynda, Paddy and Tom.  Mary should not appear there because she cannot perform one of the 3 tasks required to complete the project i.e. Gardener.
  2. In cell H3 (for Project2), the drop down should show Lynda, Paddy and Tom.  Jane and Mary should not appear there because they cannot perform the Digging and Engineering tasks respectively.

The solution is dynamic for the following:

  1. Projects added to the Project matrix Table; and
  2. Tasks added (upto 6 only) or edited in the Project matric Table; and
  3. Employees added to the Competency matrix Table; and
  4. Tasks added to the Competency matrix Table

I have solved this problem by using:

  1. Power Query; and
  2. Formulas in Data > Data Validation.

You may download my solution workbook from here or here.

Sales data modelling and interactive visualisations of an E-Commerce Company

{9 Comments}

In this workbook, I have Sales data of an E-Commerce Company for 3 months.  The typical columns in the base data are:

1. Order Date/Time
2. City to which orders were shipped
3. Order Number
4. Payment Type i.e. Cash on delivery, Net Banking, EMI's
5. Order Status i.e. Delivered or cancelled
6. SKU's which the ordered items fall into
7. Products which the ordered SKU's fall into
8. Categories which the ordered products fall into

Given this simple tabular representation, one may want to analyse and visualize this dataset from multiple perspectives based on user selections, such as

"What was the revenue earned from the Top 5 products in the A100 category in April for orders shipped to New Delhi?"

In this query framed above, the end user should have the leeway to select any/all of the underlined facets.  So one can either choose revenue earned or Number of orders.  Likewise, one can either select Top 5 products or Top 15 products/Top 5 SKU's etc.

With relative ease, one should also be able to "Perform an affiliate analysis" showing which categories are ordered together (to study affiliations).  Please review this post for an independent discussion on "Affinity Analysis".

Furthermore, one should be able to perform a free form timeline search such as  - "I would like to study growth in Total revenue of March 2-8 2015 over Feb 1-4 2015"

You may download the workbook from the link shared above.

You may watch similar videos showcasing the capabilities of Business Intelligence in MS Excel:

1. Analyse Sales data of a Beverage Company
2.
Analyse Training data of a Company

Here's a video showing the capabilities of this Sales data model

You may also watch this short video to see how I visualized the revenue flow from Categories to Shipping cities during different Order periods using Custom visuals available in PowerBI desktop.

Please feel free to download the PowerBI desktop workbook of the video shown above from here.

For a detailed overview of Sankey diagrams (a Custom visual available in PowerBI desktop), you may refer to my Blog article here.

Another great Custom visual (Sand Dance) which allows data discovery has been shown at this link.  At that link, you will also be able to see how I queried the underlying dataset using "Natural Language".

Show multiple text entries in one cell of a Pivot Table

{0 Comments}

Imagine a three column dataset as shown below

ID Prize Year
A X 9
A Y 10
B X 9
B Y 9
B Z 10

As you can observe, there are duplicates that appear in each column.  The task it to convert this tabular data structure to a matrix like one with ID's appearing down one column, Years appearing in one row and at each intersecting cell, the prizes should appear.  Please note that there can be multiple prizes for each ID and year combination - ID is B and year is 9 in two rows but the prizes are different.

The expected solution is shown below

ID 9 10
A X Y
B X,Y Z

I have solved this problem by using Power Query.  You may download the workbook from here.

You may also watch a short video here

Transpose data column wise

{0 Comments}

Here's a simple two dataset:

Project ID Employee
A00 Ashish Mathur
A00 Henry
A00 Paul
A00 Sanjay Singh
L02 Geeta Puri
L02 Ajay Singh
L02 Robert
L02 Mike
L02 Ajay Kumar Venkatesh
L02 Ashish Mathur
L02 Geeta Kukreja
A00 Sameer Mehta
L02 Harish Rai

As one can observe, there are repetitions in the Project ID column.  One may want to transpose this data column wise like this

A00 L02
Ashish Mathur Geeta Puri
Henry Ajay Singh
Paul Robert
Sanjay Singh Mike
Sameer Mehta Ajay Kumar Venkatesh
Ashish Mathur
Geeta Kukreja
Harish Rai

In this workbook, I have solved the problem using Excel formulas and Power Query.

You may watch a short video of my solution here

Create a daily work schedule

{2 Comments}

Imagine a 4 column dataset as shown below:

Resource Name Start End Projects
Raghav 02-06-2015 05-06-2015 p4
Raghav 02-06-2015 07-06-2015 p5
Raghav 08-06-2015 12-06-2015 p1
Raghav 16-06-2015 19-06-2015 p2
Raghav 22-06-2015 26-06-2015 p3
Ashish 03-06-2015 09-06-2015 p6
Ashish 04-06-2015 07-06-2015 p7

One may want to transform this dataset to the below mentioned one.  There will obviously be a lot more columns to the right (I'm only showing a snip)

Employee Name 01-06-2015 02-06-2015 03-06-2015 04-06-2015 05-06-2015
Raghav p4 & P5 p4 & P5 p4 & P5 p4 & P5
Ashish p6 p6 & P7 p6 & P7

I have solved this problem with the help of Microsoft Power Query. You may download my solution workbook from here.

You may watch a short video of my solution at this link

Remove duplicates from rows

{0 Comments}

Assume a simple matrix like data structure as shown below.  As one can observe, for every unique ID, there are duplicate traits appearing.  The objective is to remove duplicate traits from each row.  The Expected solution is show in the second table.

Unique ID Trait1 Trait2 Trait3 Trait4 Trait5
a 1 1 2
b 1 2 1 3 3
c 2 2 1 3 4
Unique ID Trait 1 Trait 2 Trait 3 Trait 4
a 1 2
b 1 2 3
c 1 2 3 4

While there is a feature to remove duplicates from columns, there is no feature to do so from rows.  You may download my workbook from here.

You may also watch a short video of my solution here

Flip a string with conditions

{2 Comments}

Assume a column with some IP addresses.  Each cell has 4 portions separate by a dot.  The number of digits in each portion can either be 2 or 3.  The following tasks need to be performed on this dataset:

1. Drop the last portion of the string
2. Flip the string
3. Add the following text to the flipped string - "in-addr.arpa" (without double quotes)

The base data and expected solution is shown below

While this can be resolved by writing formulas, I have shown a Power Query solution in this workbook.

You may also watch a short video of my solution here

Consolidate multiple rows of data and remove blank rows

{2 Comments}

Imagine a dataset representing questions answered by various respondents in a survey.  The first column has the Respondent's Name and thereafter there is one column each for a question posed in the survey.  The data extracted from the system suffers from the following fallacies:

1. There are blank rows; and
2. Multiple questions answered by one respondent appear in multiple lines rather than in one line.

One may want to delete the blank rows and consolidate the dataset to show only one row per respondent.

Here's a snapshot of the source data and expected result

You may refer to my solution in this workbook.

Here's a short video of my solution

Merge data from multiple cells into a single cell

{10 Comments}

Assume a simple two column dataset with Name in column A and Department in column B.  There are repetitions in the Name column.  Here's a snapshot of the base data:

One may want to merge data from multiple cells into a single cell - basically a view where all unique names are listed in a column and all departments for each name appear in a single cell (separated by commas).  The expected result should look like this:

You may view my solution in this workbook.

You may also watch a short video of my solution here

To accomplish the reverse of this i.e. start at Table 2, process the dataset and arrive at Table1, you may refer to my solution at this link.