Category: DATA STRUCTURING

Restructure the layout of datasets

{0 Comments}

I'd like to discuss 2 cases of restructuring layouts of datasets:

Case 1

Consider the 7 columns dataset below.  For every Doc Number, there are 6 columns - 3 columns for the name of the employee and another 3 columns for the Percent of each employee.

Doc Number Emp 1 Emp 1 % Emp 2 Emp 2 % Emp 3 Emp 3 %
111111 John 5% Eric 6% Jane 7%
222222 Mary 10% Joe 4% Jill 25%

The objective is to restructure the data layout to a 3 column one  - Doc Number, Employee Name and Employee Percent.  Such a revised data layout would enable filtering, Pivoting, charting etc.  Furthermore, the solution should be dynamic for new columns added.  The expected data layout is:

Doc Number Employee Percent
111111 John 5%
111111 Eric 6%
111111 Jane 7%
222222 Mary 10%
222222 Joe 4%
222222 Jill 25%

I have solved this problem using Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Case 2

Consider the 7 column dataset below (there are actually 13 columns.  For want of space only 7 have been shown).  For every Phone Number, there are up to 12 columns - 6 columns for the date on which the call was placed and the other 6 for the remark jotted down by the service agent based on the interaction with the customer.  The software from where the data is downloaded is capable of downloading only 6 columns per Phone Number.  If there were more than 6 calls placed on the same customer, then multiple rows will appear for that same number.  A case in point being Phone number 561 below - you will see that this number appears multiple times in the table below.

Phone Date 1 Remark 1 Date 2 Remark 2 Date 3 Remark 3
561 12/03/17 Interested 17/03/17 Not Interested 22/03/17 Not Reachable
562 13-03-17 Not Interested 18-03-17 Call Busy 23-03-17 Not Reachable
561 14-03-17 Not Interested 19-03-17 Not Reachable 24-03-17 Call Dropped
564 15-03-17 Call Busy 20-03-17 Not Interested 25-03-17 Call Later
562 16-03-17 Call Busy 21-03-17 Call Busy 26-03-17 Call Busy
561 17-03-17 Interested 22-03-17 Call Later 27-03-17 Call Dropped
563 18-03-17 Call Later 23-03-17 Interested 28-03-17 Call Busy
568 19-03-17 Call Dropped 24-03-17 Call Busy 29-03-17 Call Busy
561 20-03-17 Call Busy 25-03-17 Not Reachable 30-03-17 Not Interested
565 21-03-17 Call Dropped 26-03-17 Call Later 31-03-17 Call Later
562 22-03-17 Interested
574 23-03-17 Not Interested 28-03-17 Call Dropped 02-04-17 Not Interested
563 24-03-17 Interested 29-03-17 Not Reachable 03-04-17 Call Later
572 25-03-17 Call Dropped 30-03-17 Call Busy 04-04-17 Interested
573 26-03-17 Call Dropped 31-03-17 Not Interested 05-04-17 Not Reachable

The objective here is to rearrange the layout so that only one row exists per Phone number.  The result should be dynamic for new Phone numbers added and the same number being repeated in many more rows.  The expected data layout is (for want of space only limited columns have been shown here):

Phone Custom.1 Custom.2 Custom.3 Custom.4 Custom.5 Custom.6
561 3/12/2017 Interested 3/17/2017 Not Interested 3/22/2017 Not Reachable
562 3/13/2017 Not Interested 3/18/2017 Call Busy 3/23/2017 Not Reachable
564 3/15/2017 Call Busy 3/20/2017 Not Interested 3/25/2017 Call Later
563 3/18/2017 Call Later 3/23/2017 Interested 3/28/2017 Call Busy
568 3/19/2017 Call Dropped 3/24/2017 Call Busy 3/29/2017 Call Busy
565 3/21/2017 Call Dropped 3/26/2017 Call Later 3/31/2017 Call Later
574 3/23/2017 Not Interested 3/28/2017 Call Dropped 4/2/2017 Not Interested
572 3/25/2017 Call Dropped 3/30/2017 Call Busy 4/4/2017 Interested
573 3/26/2017 Call Dropped 3/31/2017 Not Interested 4/5/2017 Not Reachable

I have solved this problem using formulas and Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Converting a matrix data layout to a tabular layout

{67 Comments}

In this file, the source data sheet is a pasted special pivot table.  The task is to convert this pivot table data layout to the result shown in the "Desired result" sheet.  Effectively, we need to convert a matrix like data layout to a tabular layout.  We need to "denormalise" the pivot table.

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (no formulas) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).  You may refer to Power Query result worksheet.

You may watch a short video of my solution here

Solution for all versions of MS Excel

Please try the following steps:

1. Select A6:F234 on the Source Data worksheet
2. Press Ctrl+G > Special > Blanks > OK.  This process will select all blank cells in range A6:F234
3. With the selection in place, press the = key
4. Press the up arrow key
5. Press Ctrl+Enter.  You will now see all blanks cells filled up the values just above.
6. Select A6:F234, copy the range and paste them as values (Alt+E+S+V)

To denormaise the pivot table, the multiple columns appearing in the row labels area of the pivot table (range A6:F234) will need to be crunched into one column by concatenating.  The result is shown in G7:G35 of "Source data adjusted" sheet.  Likewise, the multiple rows appearing in the columns labels area of the pivot table (range G4:DE5) will need to be crunched into one row by concatenating.  The result is shown in H6:DF6 of "Source data adjusted" sheet.

The process of denormalising is as follows:

1. Select range G6:DF235 of the "Source data adjusted" sheet
2. Press Alt+D+P > Multiple Consolidation ranges > Next
3. Select "I will create the page fields" > Next
4. In the range, box, select G6:DF235 > Add > Next
5. Select New Worksheet > Finish.  A new worksheet will get created with a pivot table in it.
6. From the pivot table field list, uncheck Row and Column.  Only Value should remain checked.  The output should appear as shown in "Sheet2"
7. Double click on 6167 in the pivot table.  Another sheet (sheet3) will open up with data in three columns
8. Insert five columns after column A
9. Select range A2:A23588 and go to Data > Text to columns > Delimited > Next > Other > : > Next > Finish
10. Insert one column after column G
11. Select range G2:G23588 and go to Data > Text to columns > Delimited > Next > Other > : > Next > Finish
12. Give meaningful headings in range A1:I1

To exactly match the data on the "Desired Result" sheet, select range A1:I23588 on "Sheet3" and sort column E (CFA code) in ascending order.

Display text entries in the data area of a pivot table

{56 Comments}

Assume a two column database of approximately 20,000 rows (say from A1:B20237).  The first column has vendor names and the second column has Part codes (alphanumeric string).  Headings are in A1:B1, say Vendor in cell A1 and Part_code in cell B1.  One vendor supplies multiple parts and therefore there would be repetitions of vendor names in column A.

The task is to reorient this database to show all part numbers in column A and parts supplied by those vendors should appear in different columns of that row.  If one attempts to create a pivot table, the problem would be with the data area of the pivot table.  When one drags the part codes (alphanumeric string) column to the data area, count of part number would appear instead of the actual part number.  This is the very nature of a pivot table.  Since it is a summarisation tool, one can never see text entries in the data area of a pivot table.

You may download the workbook showing both solutions (using MS Query and Power Query) from here.

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (minimal formulas) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).  You may refer to Solution2 - Power Query worksheet.

You may watch a short video of my solution here

Solution for all versions of MS Excel

You may refer to Solution1 - MS Query worksheet.  To reorient the data i.e. to view text entries in the data area of a pivot table, try the following approach:

1. In cell C1, type Count.  So the value in cell A1 is Vendor, in cell B1 is Part_code and in cell C1 is Count
2. In cell C2, enter =$B$1&" "&COUNTIF(A$2:A2,A2) and copy down.  Since the number of rows here are quite many, i would suggest copying in batches i.e. first copy down 5,000 rows, then another 5,000 rows and so on.  This is solely to prevent Excel from crashing and/or taking unduly long to process.
3. Select range A1:C20237 and press Ctrl+F3 > New.  In the name box, enter Dummy
4. Select range A1:C20237 and press Ctrl+T to convert to a Table.  This is to take care of data being added beyond row 20237.
5. Save the file and open a new worksheet (in the same workbook)
6. While on any cell in this new worksheet, go to Data > From Other Sources > From Microsoft Query > Excel Files > OK
7. Navigate to the folder where you saved the file in step 5 above, select the file and click on OK
8. With Dummy selected by default, press the > symbol.  This will show you three column headings in the right hand side box
9. Click on Next three times
10. Select the second option button and click on Finish
11. Click on the SQL button and delete all contents that you see there.  Type the following in the blank white space

TRANSFORM first(Part_code)
SELECT Vendor
FROM dummy
GROUP BY Vendor
PIVOT count

12. Click on OK
13. Go to Data > Return Data to MS Excel
14. In the Import Data box, select Table and in Existing sheet, select cell A1
15. Click on Finish

Data should now appear in the desired format.

For any changes to data in range A2:B20237, right-click on any cell in the output range and select Refresh.  Furthermore, since the range has been converted to a Table, you may even add further rows of data and simply right-click to Refresh the output data range.

To improve performance of your workbook, you may want to copy the formulas in column C and paste them as Values (Paste Special > Values).  Once again, do not paste special all values at once - do so in groups of 5,000 rows to prevent Excel from crashing and/or taking unduly long to process.