Category: POWER QUERY

Segregating data appearing in a single column into multiple columns where there is blank row between records

{0 Comments}

Here is a sample dataset downloaded from an external application.  As you can observe, there are anomalies in the dataset:

  1. All the data appears in a single column when Infact it should be segregated into different columns; and
  2. There is no separating one record from another.  All we know is that each record ends with a number.  So:
    1. Record1 starts at row 2 and ends at row 5
    2. Record2 starts at row 6 and ends at row 9
    3. Record3 starts at 14 and ends at row 17
  3. Not all records have 4 rows.  As an exception, the highlighted range below has 5 rows (row 82 to 86).  The text entry "Part paid" appears in row 85.  In no other record is there the payment status before the number.


The objective is to clean the dataset and transform it into a 5 column one as shown below:
I have solved this question using the Query Editor (available in both MS Excel and PowerBI Desktop).  You may download my solution workbook from here.

Tabulating data from multiple unstructured Excel files

{0 Comments}

Many a times data downloaded from Applications/ERP's are not in a filter/Pivot ready format.  In such cases, a lot of time has to first be invested in getting that data in proper order before even beginning to analyse that data.  What makes this situation worse is that data is downloaded every month in that unstructured manner and the effort which was spent on "cleaning up" the data in the previous month has to be replicated this month as well.

Here is one such dataset.  Some easy to spot problems here are:

  1. All data is in a single column instead of being in multiple columns
  2. Data which should ideally appear in a single row is appearing in multiple rows.  See data in rows 26 to 32.  Information from rows 27 to 32 should appear in row 26
  3. Rows 1 to 6 are not required at all

Data should appear in the following format to make it "filter/Pivot ready".  Furthermore, when multiple such files (as the one shown above) are placed in a certain folder, one should simply be able to open the output file (shown below) and go to Data > Refresh All.

I  have been able to solve this problem using Data > Get & Transform.  Following these steps to get my solution up and working on your system:

  1. Download the input file from here
  2. Download the output file from here
  3. Create a folder on your desktop and place the input file in this folder.  Store as many input files in this folder.  Change data in those input files, if you so wish.
  4. Save the output file anywhere outside the folder that you created in step 3 above
  5. Ensure that all input workbooks are closed
  6. Open the output file and go to Data > Queries and Connections.  Right click on the Result Query > Edit
  7. Double click on Source and select the folder that you created in 3 above
  8. Click on Close and Load
  9. Anytime there is any change in the source folder files, just go to the Output file and click on Data > Refresh All

Append data from multiple worksheets of multiple workbooks where each worksheet has a different heading

{0 Comments}

In a folder there are multiple workbooks with an unknown number of worksheets in each workbook.  Each worksheet has data for one year and has 13 columns - the first is for the Product and the other 12 are for each month of the year.  So sheet1 of Book1 has Product in column1, 1 Jan 2015 in column2, 1 Feb 2015 in column3 and so on until 1 Dec 2015 in column13.  Likewise, sheet2 of Book1 has Product in column1, 1 Jan 2016 in column2, 1 Feb 2016 in column3 and so on until 1 Dec 2016 in column13.  The worksheet which has data for the current year will have fewer columns since data for all months is not available.  The number of rows on each worksheet vary depending upon the number of products sold in that year (as can be seen in the two images below).

The objective is to append data from all worksheet of all workbooks to create a simple 3 column Table - Date in column1, Product in column2 and Value in column3.

The solution should be dynamic enough to accommodate the following:

  1. More files added to/deleted from the folder
  2. More worksheets added to/deleted from the existing files
  3. Data added by rows/columns to the existing worksheets
  4. Heading changes in the existing worksheets i.e. in sheet1 of the DATA1 workbook (first image above), one may want to change the year appearing in the first row from 2015 to 2013

This question would have been an easy one to solve had the headings on all worksheets of all workbooks been the same such as Month1, Month2, Month3, Month4 etc.  In that scenario, the technique shown in this YouTube vide would have worked very well.  However, since the headers on each worksheet of each workbook are different, this technique yields an unexpected result (shown below) - Please note that I have not used the Table.PromoteHeaders([Data]) command as described in the video because that would take away the row with dates i.e. row1, row5, row11, row16.  If I do not run that step, I get the output as shown in the image below.  The dates appear as part of the Data rather than being identified as headers.

I have solved this question using the Power Query Editor.  You may download my source data from here and solution workbook from here.

  1. Download the source data workbooks and save them in a folder.
  2. Download the solution workbook and save it anywhere else.
  3. Open the solution workbook and go to Data > Queries and Connections.
  4. In the pane that opens up, right click on the Query > Edit.
  5. In the Applied steps window, double click on Source and specify the path where you have saved the source data workbooks (step1 above).
  6. Click on OK > Close and Apply.

Determine the top selling location for each product

{0 Comments}

Visualise a 3 column dataset as shown below - Location, Product and Sales.  Each location can have multiple products (Product A has Banana, Apple and Carrot) and each product can be sold in multiple locations (Banana is sold in locations A, B and F).

The objective is to determine the location with highest sales for each product.  So for Banana, maximum sale value is 25 and location of maximum sales value is B.  Likewise for Orange, maximum sales value is 49 and location of maximum sales value is A.  The expected result is:

I have 4 solutions to this problem:

  1. Advanced Filters - This is a static solution.  For any changes in the source data range, one will have to re-enter the 3 inputs in the Advanced Filter window
  2. Formulas - This is a semi-dynamic solution.  To make it fully dynamic, one will have to write an array formula to first extract all unique product names in a column.  The array formula to extract product names in a column can be obtained from here.
  3. Power Query - This is a dynamic solution.  For any changes in the source data sheet, one just has to go to Data > Refresh All
  4. PowerPivot - This is a dynamic solution.  For any changes in the source data sheet, one just has to go to Data > Refresh All

You may download my solution workbook from here.

Remove duplicates from each cell of a dataset

{0 Comments}

Here's a dataset with 4 columns and 2 rows.  In column A, there is Patient ID and in the other columns are observations recorded by 3 Doctors for each patient.  As can be seen, in each cell there are multiple observations which are either separated by Alt+Enter (another line in the same cell) and/or a comma.  To make things more complicated, there are duplicate text entries appearing in each cell.  In cell B2, Creative appears twice and in cell D2, Fatigue, headaches and stress each appear twice.  Please also note that the entries appear in mixed case in each cell i.e. in cell D2, the first occurrence of stress is in proper case and the second instance is in small case.

The objective is to remove duplicates from each cell (disregarding case sensitivity) and standardize the multiple entries with an Alt+Enter in each cell.  The expected result is shown in the green table below

I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016 and higher versions.  You may download my solution workbook from here.

Merge 2 work schedules

{0 Comments}

In a certain Organisation, assume that there are two work schedules - one that runs from Friday to Thursday and another that runs from Sunday to Saturday.  In the image below, Work schedule 1 is in range A2:C4 and Work schedule 2 is in range E2:G4.  In each dataset, the start date is the first column i.e. the week starts from there.  So from 28-Dec-2018 to 29-Dec-2018, only workers 1 and 2 would work but from 30-Dec-2018 to 03-Jan-2019, all 4 workers would work.  The objective is to create a combined work schedule as shown in range I2:N7.

The solution is dynamic for new rows that will be added to the source data Tables.  I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016 and higher versions.

You may download my solution workbook from here.

Rearrange a multi heading dataset into a single heading one which is Pivot ready

{6 Comments}

Any well arranged dataset should be "Pivot Table" ready with the following 3 important properties:

  1. There should be no merged and centered cells; and
  2. Every column should have a unique heading; and
  3. Every column should have only 1 heading

Here's one dataset which violates all rules mentioned above.

  1. Headings in row 1 are merged; and
  2. The headings in row 2 are not unique
  3. Every column has headings in row 1 and row 2.

To be Pivot Table friendly, this dataset will have to be restructured into a 4 column one - Week, Equipment, Jobsite and Widgets as shown below:

I have achieved the desired result by using Data > Get & Transform (also known as Power Query in earlier versions of MS Excel).  The solution is dynamic for new rows and columns added to the data on the Input worksheet - one simple has to go to Data > Refresh All.  You may download my solution workbook from here.

In this workbook, there is another example of how one can transform a multi heading dataset into a Pivot Table ready dataset.  The primary difference between this and the previous dataset is that there are 2 descriptive columns to the left (as against only one in the previous example).

Rearrange travel data to clearly show travel from and travel to locations

{0 Comments}

Consider a 2 column dataset as shown below

User Location
AAA Tokyo
AAA Osaka
AAA Nagoya
AAA Hakone
AAA Kyoto
BBB Sapporo
BBB Nara
CCC Tokyo
CCC Hakone
CCC Osaka
DDD Osaka
DDD Tokyo

Customer AAA travelled from Tokyo to Osaka, Osaka to Nagoya, Nagoya to Hakone and Hakone to Kyoto.  All locations appear in a single column.  To analyse customer travel information very clearly, one may want to rearrange the dataset as follows:

User From To
AAA TOKYO OSAKA
AAA OSAKA NAGOYA
AAA NAGOYA HAKONE
AAA HAKONE KYOTO
BBB SAPPORO NARA
CCC TOKYO HAKONE
CCC HAKONE OSAKA
DDD OSAKA TOKYO

I have solved this problem using Power Query. You may download my solution workbook from here.

Sort, comma separated entries appearing in a cell, in ascending order

{7 Comments}

Assume a single column dataset as shown below.  As you can observe, there are multiple entries appearing the same cell separated by commas.

untitled

The objective is to sort, in ascending order, the entries in each cell.  The expected result is shown below.

untitled

I have solved this problem using Power Query a.k.a Data > Get & Transform in Excel 2016.  You may download my solution workbook from here.

Search for multiple phrases within a cell and extract all those phrases in another column

{0 Comments}

Assume a simple two column dataset as shown below.  In column A are some text strings.  In each cell are multiple phrases separated by commas.  In column C are some phrases - one in each cell.  The objective is to search for all phrases in each cell and extract the phrases found in another column of the same row.

untitled

The expected result is:

untitled

I have solved this problem using Power Query a.k.a. Data > Get & Transform in Excel 2016.  You may download my solution workbook from here.