Category: DATA APPEND

Merge and append data from two worksheets

{4 Comments}

Visualise a workbook with two worksheets. The tab name of the first worksheet is "My Table" and of the other worksheet is "My colleagues Table". Each worksheet has 4 columns - First Name, Last Name, Pin code and Mobile Number. In "My Table", the first, second and third columns are filled up and in "My colleagues table", first, second and fourth columns are filled up. Furthermore, there are a few additional names on "My colleagues worksheet". The task is to create a combined database from both these worksheets and for doing so, the following sub tasks have to be performed:

1. Bring over data from the 4th column of "My colleagues Table" to the 4th column of "My Table" (Merge data); and
2. Bring over the additional rows from "My colleagues Table" to "My Table" (Append data)

If you are using the Power Query add-in, then a few simple steps (no formulas) can solve this problem. The result will be dynamic and refreshable (just as in a Pivot Table).

You may download the workbook from here

You may watch a short video of my solution here

Append data from alternate columns of the same table

{0 Comments}

On the Data worksheet, data for payments received from different customers is shown invoice wise.  Since payments are received in tranches, they are shown in different columns - column C and D show dates and amounts received in the first tranche respectively and column E and F show dates and amounts received in the second tranche respectively.

One may want to generate the list of Invoice Number, Customer Name and the Amount for a particular date.  Since dates are split across two columns, it will not be possible to filter the dataset.  In order to filter the dataset, one will have to append data from alternate columns.

I have been able to accomplish this by using Power Query - a free add-in from Microsoft for Excel 2010 and higher versions.

You may refer to my solution in this workbook.

You may watch a short video of my solution here

Append data from two worksheets with different structures

{2 Comments}

Assume a file with two worksheets.  The first worksheet has data in column A:H (8 columns) and the second one has data from B:J (9 columns).  One may want to create a third worksheet with specific columns of data appended from both worksheets - the columns to be shown in the third worksheet are not  at the same position on both worksheets i.e. while the Inventory Date data is in column A on sheet1, it is in column H on sheet2.

You may refer to my solution in "Resultant Data - final" worksheet of this workbook.

I have solved this problem using Power Query - a free add-in from Microsoft for Excel 2010 and higher versions.  The result is dynamic and refreshable (just as in a Pivot Table).

You may watch a short video of my solution here

Compute MODE of all numbers split across multiple worksheets

{10 Comments}

Assume numbers are typed in range A1:A2 of multiple worksheets in a workbook.  The task is the compute the MODE of these numbers.  Mode is defined as the value which appears most frequently in a range of cells.  So, if one types 1,3,4,3,5,6 in range A1:A6, then the mode will be 3 - 3 appears maximum number of times in the range.

In MS Excel, there is a built in way to compute the MODE.  The formula for the same is

=MODE(A1:A6)

Unfortunately, MODE() is not a 3D function and therefore, something like this return a #REF error

=MODE(sheet1:sheet3!A1:A6)

This behavior seems somewhat vague because other basic Mathematical and Statistical functions such as SUM(), COUNT(), AVERAGE(), MAX(), MIN(), VAR(), and STDEV() work just fine across multiple worksheets.

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem.

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (no formulas at all) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).  In Power Query, there is a feature to append data from multiple worksheets into one running range.  Once this is done, the running range can be transferred to an MS Excel worksheet on which the MODE function can be used.

Solution for all versions of MS Excel

To compute MODE across multiple worksheets, you may refer to my solution in this workbook.

Create a Pivot Table from multiple worksheets in different workbooks

{65 Comments}

Assume two workbooks named Book1.xls and Book2.xls.  In Book1.xls, there is a sheet named Jan and in the Book2.xls, there is a worksheet named Feb.  In both the worksheets - Jan and Feb, the following exist

1. Same number of columns on both worksheets; and
2. Same order of columns on both worksheets; and
3. Same spellings of headings on both worksheets

As you can observe in zipped file, all three aspects mentioned above are in place.  One may want to perform any of the following tasks:

1. Create one pivot table from both worksheets (which are in different workbooks); and/or
2. Consolidate data from both worksheets (which are in different workbooks) one below the other

To curtail the workbook size, I have deleted rows of data.  In each workbook, there were 65,000 rows of data on each Jan worksheet and Feb worksheet.

While one simple way would be to copy and paste data from Jan and Feb worksheet in a third worksheet and then create a pivot table, the following shortcomings exist with this method

1. Copying and pasting data from multiple workbooks into one is a manual process; and
2. Since the pivot table will be based on the manually created sheet (by copying and pasting), changes in the two original workbooks will not update the pivot table on refreshing unless the same changes are carried out in the consolidated sheet (created by copying and pasting above) as well.  So this leads to duplication of effort.

Both problems outlined above can be resolved by using MS Query.  The technique mentioned below will work as long as the three conditions mentioned above are satisfied.  Furthermore, since MS Query will only recognize named ranged with rows up to 65,536, the number of rows of data in each of the worksheets (Jan and Feb) should not exceed 65,536.  The combined rows in both worksheets could be any number.

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (no formulas at all) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).  In order for the Power Query solution to work, all files will have to be saved as CSV files in the same folder (Since a CSV files can have only one worksheet, the Jan and Feb worksheets in the other workbook will have to become two workbooks - Jan.csv and Feb.csv).  In Power Query, there is a feature to append data from multiple CSV files lying in a specific folder into one running range.  Once this is done, the running range can either be transferred to an MS Excel worksheet (if the number or rows are up to 1 million) from where a Pivot Table can be created OR the running range can be loaded to the Data Model (Excel 2013) [the Data Model will be able to accommodate more than 1 million rows depending upon the hardware and software configuration of the machine] from where a Pivot Table can be created.

You may watch a short video here:

Solution for all versions of MS Excel

The steps for creating a pivot table from multiple worksheets (both in the same file) are:

1. Both workbooks - Book1.xls and Book2.xls are saved on the desktop.
2. Open Book1.xls, select the data on the Jan sheet (including the first row as the header row - on the Jan sheet, it will be A1:S4.  Ensure that the header row has some distinctive formatting such as Bold or some colour) and press Ctrl+F3 > New.  In the Name box, type Dummy and click on OK > Close.
3. To cross check that the name assigned above has indeed been assigned correctly, select the data range once again and in the Name box (left of the formula bar), Dummy should appear.
4. Select A1:S4 of the Jan sheet again and press Ctrl+T to convert this range into a Table.  Ensure that the "My Table has headers" box is checked.  Save and close Book1.xks
5. Repeat steps 2 - 4 for the Feb sheet on Book2.xls as well.  In step 3, just change the name to Dummy1.  Save and close Book2.xls
6. Open a new workbook and go to Data > From Other Sources > From Microsoft Query
7. Under Databases, select Excel files > OK
8. In the Directories dialog box, navigate to the folder on the desktop where the Book1.xls file is saved.  So for me, it is saved under C:\Users\Ashish\Desktop\ and double click on the folder where Book1.xls file is saved
9. In the left hand side window, select the Book1.xls file and click on OK
10. With Dummy selected, click on the > symbol to bring over all columns of this named range to the right hand side box
11. Click on Next three times
12. Select the option of Return Data to Microsoft Excel
13. At this stage, if you wish to get data in a Table form then select Table.  If you wish to create a pivot table, select the second option button - Pivot Table.  Select any cell where you would like to the result to appear, say cell A1.
14. Click on OK.  A counter will run at the bottom left hand side with the title of Reading Data
15. If you had selected pivot table in step 13 above, then the pivot table grid/layout will appear
16. You may now drag fields to create a pivot table

So the Table or pivot table which you have created so far is only from Jan sheet of Book1.xls.  We have to also include data from Feb sheet of Book2.xls.

17. If you had created a Table in step 13 above, then click on any cell in the Table and under Table Tools (yellow button in the MS Excel title bar) > Options, click on the small drop down arrow under Refresh and select Connection Properties.  If you had created a Pivot Table in step 13 above, then select any cell in the pivot table and under Pivot Table Tools (red button in the MS Excel title bar) > Options, click on the small drop down arrow under Refresh and select Connection Properties.
18. In the Command text box, you see something like this

SELECT dummy.Invoice_number, dummy.Date, dummy.`Article Number`, dummy.`Article Name`, dummy.`Tax Type`, dummy.Brand, dummy.Category, dummy.`Sports Code`, dummy.`Product Type`, dummy.MRP, dummy.`Customer #`, dummy.`Customer Name`, dummy.Region, dummy.`Area Code`, dummy.Quantity, dummy.`Line Discount`, dummy.`Sales Value`, dummy.`Before Tax`, dummy.`VAT Amount` FROM `C:\Users\Ashish\Desktop\Book1.xls`.dummy dummy

Replace this lengthy statement with the following shorter one

SELECT * from `C:\Users\Ashish\Desktop\Book1.xls`.dummy
Union all
SELECT * from `C:\Users\Ashish\Desktop\Book2.xls`.dummy1

19. Click on OK and the Table/Pivot Table should update
20. Save the file as Consolidated.xls and close it.

Update Table/Pivot Table for addition/editing in existing named ranges

To see the effect of a dynamic Table/pivot table, edit data in Jan and/or Feb sheets of Book1.xls and/or Book2.xls.  Save and close the file(s).  Open Consolidated.xls and refresh the Table/Pivot Table (Right click and Refresh).  At the bottom right had side the counter will run again and (step 14 above) and once it has read all data, the Table/pivot table will update.

Furthermore, since the two named ranges (Dummy and Dummy1) have been converted into Tables, even if you add data by rows (with no row being left entirely blank), when you right click to Refresh the pivot Table, data of new rows will appear.

Update Table/Pivot Table for addition of new worksheets

Assume you now want to add data from a worksheet titled March in Book3.xls.  To include this sheet in the pivot Table, follow the under mentioned steps:

1. Open Book3.xls

2. Follow steps 2 - 4 mentioned above for the March sheet.  In step 3, just change the name to Dummy3

3. Save and close the workbook

4. Open Consolidated.xls

5. Select any cell in the pivot table and under Pivot Table Tools (red button in the MS Excel title bar) > Options, click on the small drop down arrow under Refresh and select Connection Properties.  If you selected Table (instead of Pivot Table) in step 13 above, click on any cell in the Table and under Table Tools (yellow button in the MS Excel title bar) > Options, click on the small drop down arrow under Refresh and select Connection Properties.

6. Click on the Definition tab and under Command Text, add the following at the end of the SQL query:

Union all
SELECT * from `C:\Users\Ashish\Desktop\Book3.xls`.dummy3

7. When you click on OK, the counter will run again and the pivot should reflect data from March sheet.

Create a Pivot Table from multiple worksheets in the same workbook

{310 Comments}

Assume a workbook with two worksheets - Jan and Feb.  In both the worksheets, the following exist

1. Same number of columns on both worksheets; and
2. Same order of columns on both worksheets; and
3. Same spellings of headings on both worksheets

As you can observe in this workbook, all three aspects mentioned above are in place.  One may want to perform any of the following tasks:

1. Create one pivot table from both worksheets; and/or
2. Consolidate data from both worksheets (one below the other)

To curtail the workbook size, I have deleted rows of data.  In the actual workbook, there were 65,000 rows of data on each worksheet.

While one simple way would be to copy and paste data from both worksheets into one and then create a pivot table, the following shortcomings exist with this method

1. Copying and pasting data from multiple worksheets into one is a manual process; and
2. Since the pivot table will be based on the manually created sheet (by copying and pasting), changes in the two original sheets will not update the pivot table on refreshing unless the same changes are carried out in the consolidated sheet (created by copying and pasting above) as well.  So this leads to duplication of effort.

Both problems outlined above can be resolved by using MS Query.  The technique mentioned below will work as long as the three conditions mentioned above are satisfied.  Furthermore, since MS Query will only recognize named ranged with rows up to 65,536, the number of rows of data in each of the worksheets should not exceed 65,536.  The combined rows in both worksheets could be any number.

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem.

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (no formulas at all) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).  In Power Query, there is a feature to append data from multiple worksheets into one running range.  Once this is done, the running range can either be transferred to an MS Excel worksheet (if the number or rows are up to 1 million) from where a Pivot Table can be created OR the running range can be loaded to the Data Model (Excel 2013) [the Data Model will be able to accommodate more than 1 million rows depending upon the hardware and software configuration of the machine] from where a Pivot Table can be created.

Solution for all versions of MS Excel

You may watch a short video of my solution here

The steps for creating a pivot table from multiple worksheets (both in the same file) are:

1. Suppose the workbook with the two worksheets (Jan and Feb) is named Monthly data.xls and is saved on the desktop.
2. Select the data on the Jan sheet (including the first row as the header row - on the Jan sheet, it will be A1:S4.  Ensure that the header row has some distinctive formatting such as Bold or some colour) and press Ctrl+F3 > New.  In the Name box, type Dummy and click on OK > Close.
3. To cross check that the name assigned above has indeed been assigned correctly, select the data range once again and in the Name box (left of the formula bar), Dummy should appear.
4. Select A1:S4 of the Jan sheet again and press Ctrl+T to convert this range into a Table.  Ensure that the "My Table has headers" box is checked.
5. Repeat steps 2 - 4 for the Feb sheet as well.  In step 3, just change the name to Dummy1
6. Save the workbook as Consolidated.xls (save as Excel workbook if the combined rows in Dummy and Dummy1 exceed 65,536 rows) and close it
7. Open a new workbook (Ctrl+N) and go to Data > From Other Sources > From Microsoft Query
8. Under Databases, select Excel files > OK
9. In the Directories dialog box, navigate to the folder on the desktop where the Monthly Data.xls file is saved.  So for me, it is saved under C:\Users\Ashish\Desktop\ and double click on the folder where Monthly Data.xls file is saved
10. In the left hand side window, select the Monthly Data file and click on OK
11. With Dummy selected, click on the > symbol to bring over all columns of this named range to the right hand side box
12. Select Dummy1 and click on the > symbol to bring over all columns of this named range to the right hand side box
13. Click on Next and on the message box which appears, click on OK
14. Click on the SQL button and delete all content from the SQL Statement box
15. Type this SQL statement in the box there

Select * from dummy
union all
Select * from dummy1

16. Click on OK and on the message box which appears, click on OK
17. Under File, select the last option - Return Data to Microsoft Excel
18. At this stage, if you wish to get data from Dummy and Dummy1, one below the other, then select Table.  If you directly want a pivot table, select the second option button - Pivot Table.  Select any cell where you would like to the result to appear, say cell A1.
19. Click on OK.  A counter will run at the bottom left hand side with the title of Reading Data
20. If you had selected pivot table in step 18 above, then the pivot table grid/layout will appear.
21. You may now drag fields to create a pivot table
22. Close the workbook

Update Table/Pivot Table for addition/editing in existing named ranges

To see the effect of a dynamic Table/Pivot Table, edit data in Jan and/or Feb sheets of Monthly data.xls.  Save and close the file.  Open Consolidated.xls and refresh the Table/Pivot Table (Right click and Refresh).  At the bottom right had side the counter will run again and (step 19 above) and once it has read all data, the Table/Pivot Table will update.

Furthermore, since the two named ranges (Dummy and Dummy1) have been converted into Tables, even if you add data by rows (with no row being left entirely blank), when you right click to Refresh the Table/Pivot Table, data of new rows will appear.

Update Table/Pivot Table for addition of new worksheets

Assume you now add a new worksheet, titled March in Monthly data.xls.  To include this sheet in the pivot Table, follow the under mentioned steps:

1. Open Monthly data.xls

2. Follow steps 2 - 4 mentioned above for the March sheet.  In step 3, just change the name to Dummy2

3. Save and close the workbook

4. Open Consolidated.xls

5. Select any cell in the pivot table and under Pivot Table Tools (red button in the MS Excel title bar) > Options, click on the small drop down arrow under Refresh and select Connection Properties.  If you selected Table (instead of Pivot Table) in step 18 above, click on any cell in the Table and under Table Tools (yellow button in the MS Excel title bar) > Options, click on the small drop down arrow under Refresh and select Connection Properties.

6. Click on the Definition tab and under Command Text, add the following at the end of the SQL query:

union all
Select * from dummy2

7. When you click on OK, the counter will run again and the pivot should reflect data from March sheet.

Update Table/Pivot Table when files are mailed to someone else

If you now mail both files - Monthly Data.xls and Consolidated.xls to someone else, that other person will not be able to refresh the pivot table because the path specified in step 9 above would be different for that other person.  To reestablish the path, follow the under mentioned steps:

1. Open Consolidated.xls

2. Select any cell in the pivot table and under Pivot Table Tools (red button in the MS Excel title bar) > Options, click on the small drop down arrow under Refresh and select Connection Properties.  If you selected Table (instead of Pivot Table) in step 18 above, click on any cell in the Table and under Table Tools (yellow button in the MS Excel title bar) > Options, click on the small drop down arrow under Refresh and select Connection Properties.

3. Click on the Definition tab and under Connection String, edit the following:

a. From DBQ= till the next ;, change the file path to the new path where the mail recipient has saved the file.  So, on my system, between DBQ= and the next ;, the path is C:\Users\Ashish\Desktop\Jan.xls.  This will change to C:\Users\John\Desktop\Data\Jan.xls

b. From DefaultDir= till next ;, change the folder path to the new path where the mail recipient has saved the file.  So, on my system, between DefaultDir= and the next ;, the path is C:\Users\Ashish\Desktop\.  This will change to C:\Users\John\Desktop\Data\