Category: DATA EXTRACTION

Compare two sheets and prepare a Discrepancy Report

{4 Comments}

In this workbook, the worksheet named Bookings shows flight booking data as per the company's records.  The Worksheet named Vendor data shows bookings as per the vendor's records.  There are discrepancies at three levels between these two sheets.

1. There could be PNR's in the company records (column B of Bookings) which are not present in the vendor's records (column C of Vendor Data)
2. There could be PNR's in the vendor's records (column C of Vendor Data) which are not present in the company records (column B of Bookings)
3. For PNR's that match between two sheets, the fares could be different (column K of bookings sheet and column L of Vendor sheet)

The objective is to:

1. Create a sheet named PNR discrepancy which lists down points 1 and 2 mentioned above
2. Create a sheet named Fare Mismatches which lists down PNR wise fare mismatches between two sheets - point 3 above.

Worksheet formulas in a newly copied worksheet should point to the previous worksheet

{13 Comments}

Assume a worksheet with formulas referring to the previous worksheet.  When this sheet is copied by right clicking the sheet and selecting Move or Copy > Copy, formulas in this newly created sheet, should change to refer to the previous sheet.

In other words, if sheet2 has formulas referring to sheet1, then when sheet3 is created (by Move or Copy > Copy of sheet2), the formulas in sheet3 should refer to sheet2.

You may refer to my solution and Instructions in this workbook.

Extract specific number of characters from an alphanumeric string without breaking any word

{2 Comments}

When downloading data from an ERP into Excel, a Remarks column has entries which can be as lengthy as 300 characters per cell.  Before uploading this data into another database, entries in the Remarks column need to be split into multiple cells of upto 50 characters per cell.  Furthermore, when splitting data into multiple cells, it should be ensured that no word gets split/broken.  So if the 50th character is alphabet k of the word Like, then that cell should hold words till the word which appears just before Like so that the length does not exceed 50 characters.

You may refer to my solution in the attached workbook.

Split data from a master document into various worksheets based on a template sheet

{19 Comments}

Assume a base data sheet with information for various vendors.  There is also a worksheets which has the template of the reconciliation statement which is sent to all vendors.  At the click of a button, one may want vendor worksheets to be created from the template.

Furthermore, one may want to do either of the following:

1. Create sheets for all vendors at once;

2. Create individual sheets selected by the end user

You may refer to my solution in this workbook.

Extract text from a custom formatted cell

{23 Comments}

In range G5:G44, some numbers are entered.  These numbers are custom formatted with General" Dr" for debit entries and General" Cr" for credit entries.

One may want to extract the Dr and Cr in range H5:H44.

The following process will get the desired result:

1. Press Ctrl+F3 > New and in the Name box, type cell_format
2. In the Refers to box, enter this formula and click on Close

=GET.CELL(53,data!$G5)

3. In cell H5, enter this formula and copy down till cell H44

=RIGHT(cell_format,2)

Dr and Cr should now get appear in range H5:H44.

Transfer data from one Excel file to multiple Excel files

{20 Comments}

Assume a two column database on a sheet named Demerge.  There are duplicate values appearing in column A of this database.  One may want to transfer all records of each unique entry in column A to as many different workbooks are there are unique entries in column A.  So if there are six unique values in column A, then six workbooks should get created (with two columns) and these workbooks should get stored on a specific folder in the desktop.

You may refer to my solution in this workbook.

Performing an iterative lookup to return closest match

{3 Comments}

Suppose a worksheet name is Code_details.  The following values lie in range A2:A14 of this sheet:

A12,AS478,QW447,EQ46,RYT9985,VCX147,BNM159,ASQ478,VC325,ASW675,A123,
ASDFG1234567890 and ASDFG123456789

Suppose there is another worksheet named Summary where the following values lie in range B5:B7:

ASQ47809876,ASW675458 and QWERT

Now one may want to lookup ASQ47809876 in A2:A14 of Code_details sheet.  If this exact value is not found there, then the lookup value should be trimmed by one digit/character from the extreme right i.e. the lookup value should now become ASQ4780987.  The search should carry on till an exact match is found.

You may refer to my solution in this workbook.

Force Excel tables to work across workbooks

{2 Comments}

Assume a two column database in range B3:C11 of sheet1 of Book1.xlsx.  Headings are in B3:C3.  B4:B11 has names and C4:C11 has numbers.  Save this file on the desktop.

Open another workbook (Book2.xlsx) and type any name in cell B6 of sheet1 (the name should be one of the names mentioned in range B4:B11 of sheet1 of Book1.xlsx).  Save this file on the desktop.

With both workbooks open, enter the following formula in cell C6 of sheet1 of Book2.xlsx

=VLOOKUP(B6,[book1.xlsx]Sheet1!$B$4:$C$11,2,0)

The result would be correct.  Now enter a text value in cell B12 of sheet1 of Book1.xlsx and a corresponding number in cell C12.  Enter the same name in cell B6 (overwrite the existing value) of sheet1 of Book2.xlsx.

The result will be #N/A in cell C12 because the range in the VLOOKUP() function is still till row 11 only.  While the easiest way to get the answer would be to edit C11 to C12 in the formula, this would be a manual process for every occasion when data expands in sheet1 of Book1.xlsx.  To make the range in the VLOOKUP() auto expand for new entries added in sheet1 of Book1.xlsx, you may try the following:

1. Delete the data in range B12:C12 of sheet1 of Book1.xlsx
2. Select B3:C11 and press Ctrl+L or Ctrl+T (Insert > Table) to convert the range to a Table
3. In cell B6 of sheet1 of Book2.xlsx, enter any name from range B4:B11 of sheet1 of Book1.xlsx
4. In cell C6 of sheet1 of Book2.xlsx, rewrite the VLOOKUP() function.  It will now look like this

=VLOOKUP(B6,book1.xlsx!Table1[#Data],2,0)

The benefit of converting the source data range into a Table is that for new rows added to the source data table, the range in the VLOOKUP() function will keep expanding.  Therefore, you will not need to manually edit the range in the VLOOKUP() function.

However, now try this to see where the Table feature fails:

1. Save and close Book1.xlsx
2. Press F2 on cell C6 of sheet1 of Book2.xlsx and then Enter

The result will be #REF!.  This happens because named ranged do not work across workbooks - when converting the source data range into a table, each column inherited a name from the column heading (book1.xlsx!Table1[#Data]).  When you open Book1.xlsx, the #REF! error will be replaced with the actual value.

To make the formula work when the source workbook is closed, try this

1. In any of the two Excel workbooks, go to File > Options > Formula  and uncheck the box for "Use Table names in formulas" (available under "Working with Formulas").  This path is for Excel 2010.
2. In cell D6 of sheet1 of Book2.xlsx, rewrite the VLOOKUP() function.  It will now look like this

=VLOOKUP(B6,[book1.xlsx]Sheet1!$B$4:$C$12,2,0)

Close Book1.xlsx.  When you press F2 and Enter on C6 and D6 of sheet1 of Book2.xlsx, C6 will return the #REF! error while cell D6 would show the result just fine.

Therefore, when working with Excel tables across workbooks, ensure that the setting under Excel Options is deactivated.

Return an exact value via the LOOKUP() function

{8 Comments}

Assume a two column database in range A3:B6.  Headings are in A2:B2.  In range A3:A6 are numbers, say 60,30,20 and 10.  In range B3:B6 are some names, say Ashish, Sameer, Gautam and Sanjay.

In cell A7, enter any name, say Sameer.  The task is to extract the corresponding number from the range (A3:B6) into cell B7.  Needless to say that VLOOKUP() will not work here because the lookup_value i.e. Sameer is not in the first column of the table_array i.e. in range A3:A6.  To get the answer, one may write the following INDEX() and MATCH() combination of functions in cell B7

=INDEX($A$3:$A$6,MATCH($A7,$B$3:$B$6,0),1)

So with Sameer in cell A7, the result in cell B7 will be 30.  One may further wrap this arround the IFERROR() function to trap errors.

One may also try the LOOKUP function in cell C7:

=LOOKUP(A7,$B$3:$B$6,$A$3:$A$6)

The answer would be 30 which is correct.  However, if you type Gautam in cell A7, the result will change to 60 in cell C7 (which is wrong!!!).  The answer in cell B7 will be correct i.e. 20.

The reason for the LOOKUP() function anomaly is that range B3:B6 is not sorted in ascending order.  The Help menu on the LOOKUP() function states that the lookup_vector of the LOOKUP() function should be sorted in ascending order.  Therefore, if you now select A2:B6 and sort the name column in ascending order, the result in cell C7 will become 20.

So one difference between the INDEX(), MATCH() combination of funtions and LOOKUP() function is that for the LOOKUP() function to work, the lookup_vector should be sorted in ascending order.  There is no requirement for sorting when using the INDEX() and MATCH() combination of functions.

Now type Samir (not Sameer) in cell A8.  When you copy B7:C7 to cell B8, the result in cell B8 will be #N/A and in C8 will be 30.  The result should ideally be #N/A because there is no Samir in range B3:B8.  Surprisingly, even though the name is misspelt, the LOOKUP returns a result (not an error).  This is happening because if the LOOKUP() function does not find an exact match, it finds for the largest value less than the lookup_value (provided the lookup_vector is sorted in ascending order).  The largest value less than Samir in range B3:B6 is Sameer.  Likewise, if you enter Ashwin in cell A9 and copy B8:C8 to cell B9, the result will be #N/A and 60.

The names entered in range A8:A9 would suggest that the LOOKUP() function can take care of spelling mistakes in the lookup_value.  Well this is not true because if you enter Asheesh as the lookup_value in cell A10, the result will be #N/A in B10:C10.  This is because since it cannot find Asheesh in range B3:B6, it searches for values less than Asheesh which do not Exist - Ashish > Asheesh.

From the discussion above, one may therefore conclude that the LOOKUP() function is not the appropriate function to use for exact matches especially when dealing with text lookups.

However, with a small tweak in the LOOKUP() function, one can coerce an exact search:

=LOOKUP(2,1/($B$3:$B$6=$A7),$A$3:$A$6)

If you try this formula in cell D7 and copy down, the result will be the same as the INDEX() and MATCH combination of functions which you wrote in B7:B10.

Extract data from multiple cells of closed Excel files

{219 Comments}

Assume a folder named Test on the desktop.  In this Test folder, there is another folder named Survey reports.  In the Survey report folder, there are multiple Excel files.  All the Excel files have the same structure i.e. if there is an Operator Names in cell C9 of one Excel file, then in other Excel files as well, there will be the operator name in cell C9 itself.

In another workbook (say Book1.xlsx), one may want to extract data from specific cells of all Excel files lying in the Survey report folder.  Therefore, in sheet1 of Book1.xlsx, one may want to show in A3:C3 data from cells C4, R9 and cell C16 of the first Excel file from the Survey report folder.  Likewise, in sheet1 of Book1.xlsx, one may want to show in A4:C4 data from cell C4, R9 and cell C16 of the second Excel file from the Survey report folder and so on.

While one way is to open each file manually and then link individual cells, this is a very time-consuming process if there are many files in the Survey reports folder.

You may follow the process mentioned below to extract data from specific cells of multiple closed Excel files.

1. All files in the Survey reports folder should be closed
2. In sheet1 of Book1.xlsx, type C4, R9 and C16 in A1:C1
3. From cell D3 downwards, type names of all Excel files (along with their extension i.e. xls, xlsx) in the Survey report folder.  To generate this list of Excel files names automatically, refer to the following post.
4. Download and install the Morefunc addin from here.  This addin will allow us to use the INDIRECT.EXT function.  For MS Excel 2007 and prior versions, one can simply download, install and use this addin.  In Excel 2007, once this addin is installed, it will appear under Formulas > Morefunc.  For Excel 2010, the process for installing this addin is as follows:

a. Unzip the downloaded folder and double-click on the Setup file
b. Navigate to the following folder C:\Program Files\Morefunc and copy three files - Morefunc, Morefunc11 and Morefunc12
c. Navigate to the following folder C:\Program Files\Microsoft Office\Office14\Library and paste the files there
d. Open MS Excel 2010
e. Go to Files > Options > Addins > Manage Excel Add-ins > Go
f. Check the following three boxes - Morefunc (add-in functions), Morefunc Tools and Morefunc12
g. Click on OK
h. Close MS Excel 2010 and reopen
i. Morefunc should now appear under the Formulas Tab in the Ribbon

5. In cell A3 of sheet1 of Book1.xlsx, enter the following formula and copy till C3 and downwards

=INDIRECT.EXT("'C:\Users\Ashish\Desktop\test\Survey Reports\["&$D3&"]Sheet1'!"&A$1)

You will observe that even with all Excel files in the Survey reports folder closed, the INDIRECT.EXT function will display the data from respective cells of those Excel files.

Since data is being extracted from closed Excel files via an addin, performance of the workbook will take a hit.  Please try this on limited Excel files first.