Category: DATA EXTRACTION

Search for multiple phrases within a cell and extract all those phrases in another column

{ 0 Comments }

Assume a simple two column dataset as shown below.  In column A are some text strings.  In each cell are multiple phrases separated by commas.  In column C are some phrases – one in each cell.  The objective is to search for all phrases in each cell and extract the phrases found in another column […]

Read More →

Remove special characters from a string

{ 4 Comments }

Hi, Assume a column of names as follows: Name Mohammed Zia-Ul Haque Steven Thomas – ,-Rohit Sunil Ahir-Chowdhary.- Anuj ———– Sameer — ..,Mohit — Rajeev Nair. Monalisa . Das Vijeta … –,.Anjana. M.U..,- Please observe that there are special characters before the name, within the name and after the name.  The task is to remove […]

Read More →

Extract City, State and Pin code from an address string

{ 66 Comments }

For a lot of Excel users, a common problem is to extract City, State and Pin Code from an address string.  What compounds the issue is that there is no standardisation in an address string.  For e.g., one may end the address with a Pin code, while others may end it with a State and Country.  […]

Read More →

Remove abbreviations appearing before a name

{ 2 Comments }

Assume there is a list of names appearing in range A2:A9.  Before these names, there are abbreviations such as Dr., Mr. Mrs. etc.  To worsen the situation, there may/may not be a space after the abbreviation.  A sample dataset may look like this: Mr.Ramakrishna Mr Ramakrishna Mr. Ramakrishna MrRamakrishna PhD.Saurav Dr (Mrs.) Indu Sharma Dr. (Mr) […]

Read More →

Perform a lookup with inexact text strings and/or spelling mistakes

{ 31 Comments }

Any lookup function i.e., VLOOKUP(), LOOKUP(), INDEX()/MATCH(), HLOOKUP() requires the text lookup variable (the first input of the VLOOKUP(), HLOOKUP() etc. ) to be spelled exactly the same as the entries in the lookup column.  If the spelling does not match, the result is #NA.  To some extent, inexact searches can be performed by concatenating wildcard […]

Read More →

Creating Exception Reports

{ 23 Comments }

One worksheet shows the date wise courses (External Course, Internal Course and Elearning Course) taken by staff members.  The data also shows the Line Managers that they report to.  Another worksheet lists down all staff members and the Line Managers that they report to. From data on the first worksheet, one can easily determine (via a Pivot Table) […]

Read More →

Speeding up a lookup task on a large database

{ 0 Comments }

Performing calculations or writing formulas on large databases pose the following major problems: 1. Time taken to process is unduly long; and 2. File size increases disproportionately A typical problem is fetching data from another worksheet or workbook via the VLOOKUP() function.  Quite often, when one confirms the VLOOKUP() formula in a single cell with the Enter […]

Read More →

Generate a list of missing invoice numbers

{ 2 Comments }

In a range of cells, assume there is a list of invoices raised during a certain period.  Since adequate care was not taken to number the invoices sequentially, there are missing invoice numbers in the Excel range.  So, it is possible that after raising invoice number 501, 502 and 503 the next invoice raised was 507. Given […]

Read More →

LOOKUP unique data from multiple columns where search string appears multiple times

{ 2 Comments }

In a list with multiple occurrences, the VLOOKUP() function in Excel will only return the first occurrence of the string being searched.  Furthermore,  the VLOOKUP() function returns data from one column only. To work with a list where there are multiple occurrences of a particular string and data has to be extracted from multiple columns, […]

Read More →

Filtering a database by both rows and columns

{ 34 Comments }

In range A1:BG2185 of this workbook, there is macro economic and demographic data for years from 1984 to 2009 (26 years) for 84 countries .  Furthermore, there are 57 parameters being tracked for each of the 26 years for all 84 countries. There are missing observations for some years across many parameters and many countries.  […]

Read More →