Restructure the layout of datasets

{ 4 Comments }

I’d like to discuss 2 cases of restructuring layouts of datasets:

Case 1

Consider the 7 columns dataset below.  For every Doc Number, there are 6 columns – 3 columns for the name of the employee and another 3 columns for the Percent of each employee.

Doc Number Emp 1 Emp 1 % Emp 2 Emp 2 % Emp 3 Emp 3 %
111111 John 5% Eric 6% Jane 7%
222222 Mary 10% Joe 4% Jill 25%

The objective is to restructure the data layout to a 3 column one  – Doc Number, Employee Name and Employee Percent.  Such a revised data layout would enable filtering, Pivoting, charting etc.  Furthermore, the solution should be dynamic for new columns added.  The expected data layout is:

Doc Number Employee Percent
111111 John 5%
111111 Eric 6%
111111 Jane 7%
222222 Mary 10%
222222 Joe 4%
222222 Jill 25%

I have solved this problem using Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Case 2

Consider the 7 column dataset below (there are actually 13 columns.  For want of space only 7 have been shown).  For every Phone Number, there are up to 12 columns – 6 columns for the date on which the call was placed and the other 6 for the remark jotted down by the service agent based on the interaction with the customer.  The software from where the data is downloaded is capable of downloading only 6 columns per Phone Number.  If there were more than 6 calls placed on the same customer, then multiple rows will appear for that same number.  A case in point being Phone number 561 below – you will see that this number appears multiple times in the table below.

Phone Date 1 Remark 1 Date 2 Remark 2 Date 3 Remark 3
561 12/03/17 Interested 17/03/17 Not Interested 22/03/17 Not Reachable
562 13-03-17 Not Interested 18-03-17 Call Busy 23-03-17 Not Reachable
561 14-03-17 Not Interested 19-03-17 Not Reachable 24-03-17 Call Dropped
564 15-03-17 Call Busy 20-03-17 Not Interested 25-03-17 Call Later
562 16-03-17 Call Busy 21-03-17 Call Busy 26-03-17 Call Busy
561 17-03-17 Interested 22-03-17 Call Later 27-03-17 Call Dropped
563 18-03-17 Call Later 23-03-17 Interested 28-03-17 Call Busy
568 19-03-17 Call Dropped 24-03-17 Call Busy 29-03-17 Call Busy
561 20-03-17 Call Busy 25-03-17 Not Reachable 30-03-17 Not Interested
565 21-03-17 Call Dropped 26-03-17 Call Later 31-03-17 Call Later
562 22-03-17 Interested
574 23-03-17 Not Interested 28-03-17 Call Dropped 02-04-17 Not Interested
563 24-03-17 Interested 29-03-17 Not Reachable 03-04-17 Call Later
572 25-03-17 Call Dropped 30-03-17 Call Busy 04-04-17 Interested
573 26-03-17 Call Dropped 31-03-17 Not Interested 05-04-17 Not Reachable

The objective here is to rearrange the layout so that only one row exists per Phone number.  The result should be dynamic for new Phone numbers added and the same number being repeated in many more rows.  The expected data layout is (for want of space only limited columns have been shown here):

Phone Custom.1 Custom.2 Custom.3 Custom.4 Custom.5 Custom.6
561 3/12/2017 Interested 3/17/2017 Not Interested 3/22/2017 Not Reachable
562 3/13/2017 Not Interested 3/18/2017 Call Busy 3/23/2017 Not Reachable
564 3/15/2017 Call Busy 3/20/2017 Not Interested 3/25/2017 Call Later
563 3/18/2017 Call Later 3/23/2017 Interested 3/28/2017 Call Busy
568 3/19/2017 Call Dropped 3/24/2017 Call Busy 3/29/2017 Call Busy
565 3/21/2017 Call Dropped 3/26/2017 Call Later 3/31/2017 Call Later
574 3/23/2017 Not Interested 3/28/2017 Call Dropped 4/2/2017 Not Interested
572 3/25/2017 Call Dropped 3/30/2017 Call Busy 4/4/2017 Interested
573 3/26/2017 Call Dropped 3/31/2017 Not Interested 4/5/2017 Not Reachable

I have solved this problem using formulas and Power Query a.k.a. Get & Transform under the Data menu in the Ribbon.  You may download my solution from here.

Leave a Comment

Your email address will not be published.

*

  • I found your article on structuring interesting but i have a different problem. I have several tables in one csv file like so

    Population
    State 2006 2007 2008 2009
    Austin 1515.485 1577.856 1633.87 1682.338
    Baltimore 2658.162 2667.619 2679.819 2696.018
    Birmingham 1098.818 1107.256 1117.101 1125.271
    Boston 4427.356 4447.838 4483.141 4527.219854
    Buffalo 1141.712 1137.678101 1136.364 1135.377075
    Employment
    Austin 736.3416874 772.041675 791.06665 774.1500124
    Baltimore 1304.041663 1315.591656 1311.883331 1271.283331
    Birmingham 532.78335 537.5750094 533.65835 504.3749939
    Boston 2432.032742 2469.946328 2488.538051 2409.106774
    GDP 2006 2006 2006 2006
    Austin 75.2357625 78.907995 81.0912475 79.07640491
    Baltimore 142.4562092 143.0674591 143.7930262 144.0364532
    Birmingham 47.63894824 47.92583742 48.00674872 45.80583746
    Boston 295.4216638 303.5958897 304.4910121 300.8553522

    three tables in one, repeated dates and header for each table in its on rows. I have tried , transposing, filling and pivoting. None have been effective?

    • Hi,

      Sorry fort the delay in replying. I saw your post only today. Your data has not been pasted properly. Please share the link from where I can download your MS Excel file. In that file, please show the result clearly.

  • Hello sir, for the case 1. I can’t seem to find the solution from the link you have provide, instead I only can see the excel of your data before and after. Is that possible for me to know how did you do for the case 1 problem? Thank you and have a nice day

    • Hi,

      I have solved the problem using the Query Editor. Go to Data > queries and Connections. In the right hand side pane, right click on the Query and click on Edit. In the Applied step pane, you will be able to see all steps which i followed in the Query Editor.