# Tags: VLOOKUP

Assume a three column dataset with Patient ID, Smoking Status and Review Date

 PatientID SmokingStatus ReviewDate P1 10-03-2018 P1 9 09-03-2018 P1 1 08-03-2018 P1 4 07-03-2018 P2 9 10-03-2018 P2 9 09-03-2018 P2 9 08-03-2018 P2 9 07-03-2018 P3 2 10-03-2018 P3 09-03-2018 P3 9 08-03-2018 P4 9 10-03-2018 P4 1 09-03-2018 P4 4 08-03-2018

The objective is the create another 3 column dataset with the following conditions:

1. If the patient's latest smoking status is other than Blank or 9, then consider that as the smoking status of the patient; and
2. If the patient's latest smoking status is blank or 9, then consider the previous smoking status that is not blank or 9; and
3. If the patient's smoking status is blank or 9 on all dates, then consider the smoking status as 9

The expected result is:

 PatientID Last date when the smoking status was other than 9 or Blank Smoking status on that date P1 08-Mar-18 1 P2 10-Mar-18 9 P3 10-Mar-18 2 P4 09-Mar-18 1

I have solved this question using 3 methods - PowerPivot, Advanced Filters and formulas.  You may download my solution workbook from here.

Here is a sample dataset of learners who have cleared different stages of multiple courses on offer within an Organisation:

 Learner Stage completed Course Bill Stage 1 Public Speaking Bill Stage 2 Public Speaking Bill Stage 3 Public Speaking Susan Stage 1 Effective Communication Bob Stage 1 Public Speaking Bob Stage 2 Public Speaking Sheila Stage 1 Effective Communication Sheila Stage 2 Effective Communication Sheila Stage 3 Effective Communication Frank Stage 1 Effective Communication Frank Stage 2 Effective Communication Henry Stage 1 Public Speaking Henry Stage 2 Public Speaking Bill Stage 1 Effective Communication Bill Stage 2 Effective Communication

From this sample dataset, one may want to know how many participants have completed each stage of these multiple courses.  The expected result is shown below:

 Row Labels Stage 1 Stage 2 Stage 3 Effective Communication 1 2 1 Public Speaking 2 1 Grand Total 1 3 2

In this workbook, I have shared 2 solutions - one using formulas and the other using the Power Query & PowerPivot.

Refer to a simple 5 column representative inventory dataset of a Glass manufacturer:

 Model Length (MM) Wide (MM) Thk (MM) CAT HX9-G-ARD 1071 273 3.5 A MYP-G-3RD 580 535 3.2 B EPO-G-3RD 580 535 3.2 A MYG-G-3R 966 350 3.2 A MYN-G-3RD 649 530 3.2 A GM SPIN-G-3FD 882 395 3.2 A MY8-G-AR 880 400 3.5 B GM2-G-AR 880 400 3.5 A

From this inventory data, one has to furnish customer orders based on specific dimensions demanded by them.  A typical Customer request would be to supply glass sheets as per the following dimensions

 Length (MM) Wide (MM) Thk (MM) CAT 780 542 3.5 A

The firm may or may not have glass sheets of this specific size.  The objective is to identify glass sheets, from the inventory on hand, which match customer specifications.  If there is no exact match, then one must be able to obtain all inventory items which have the same Thk (MM) and CAT as the customer specified dimensions but the Length and thickness should be more than equal to the customer specified dimensions.  The length and width can then be trimmed to match the exact customer dimensions.  Furthermore, the result returned should:

1. List only the Top 30 glass sheets available in inventory; and
2. List those Top 30 glass sheets in ascending order of wastage (wastage caused when the glass sheet is trimmed to match the customer specified dimensions)

You may refer to my solution in this workbook.  I have shared two solutions - one using Excel formulas and the other using Power Query a.k.a. Get and Transform in Excel 2016.  Please read the Comments in cells F1, J9 and J16 of the "Solutions" worksheet.  The difference between the 2 solutions is:

1. Formula driven solution - This is in range J10:AM14 of the Solutions worksheet.  This is a semi dynamic solution (as compared to the Power Query solution).  To get the models in ascending order of wastage, one will have to create an Area column in the base data and sort that column in ascending order.
2. Power Query solution - This is in range J17:AM21 of the Solutions worksheet.  This is a dynamic solution.  Just change the customer specified dimensions in range G2:J2 of the Data and Query worksheet.  Thereafter just right click on any cell in the range below and select refresh.

Performing calculations or writing formulas on large databases pose the following major problems:

1. Time taken to process is unduly long; and
2. File size increases disproportionately

A typical problem is fetching data from another worksheet or workbook via the VLOOKUP() function.  Quite often, when one confirms the VLOOKUP() formula in a single cell with the Enter key, then at the bottom right of your MS Excel screen (the task bar), a white progress bar appears which reads "Calculating (4 processors) 1%".  Once has to wait for the processing to complete before copying the formula down.  The process of copying is obviously just as time consuming and resource intensive.

In these situations, a workaround is the JOIN the tables in MS Access and then get the data back in MS Excel.  For accomplishing this, one must have at least some working knowledge of "SQL Query" writing.

If you are using the Power Query add-in for MS Excel 2010 and higher versions, then one can fetch data from another worksheet or workbook without writing a single formula.  Furthermore, there is absolutely no lag in generating the result.

You may refer to my Power Query solution in this workbook (This is a 20 MB file)

You may watch a short video of my solution here

Assume that there are three separate tables showing the following information:

1. Date of visit data for visitors to a certain recreation facility.  The same visitor may visit the facility multiple times
2. Fee per visit in different cites
3. Region in which each city lies

The question is to analyse the three tables above via a Pivot Table to generate the following:

1. Region wise and visitor wise:
a.  Fee per visit
b.  Frequency of visit
c.  Revenue
2. Revenue collected by month wise and by visitor

In this workbook, I have shared two solutions:

1. PowerPivot solution - This solution answers both questions above without using any calculated/ancillary columns in the base data.  By establishing simple relationships in the PowerPivot window and by writing two calculated Field formulas, both questions above have been answered.  To use this PowerPivot solution, you need to be using the PowerPivot add-in for MS Excel.  This add-in is only available for Excel 2010 and higher versions.

2. Pivot Table solution - This solution answers both questions above by using calculated/ancillary columns in the base data.

Here's another example.  Assume a four column table showing Date of session, Client, Location and Participant Name.  Assume another four column table showing Client, Date of session, Date of invoice and Amount Billed.  The task is to determine the Amount billed per location.  You may refer to my PowerPivot and Pivot Table solution in this workbook.

Assume a two column database of patient ID's and service availed.  One patient may avail the same service multiple times in a year due to which that record may appear as many times as the service is availed.  For e.g., if patient A001 avails the Radiology service twice, then A001 and Radiology will appear in two rows.

Once may want to create the following two reports from this database:

1. A list containing all those records where the patient availed just one service; and
2. A list containing all those records where the patient availed more than one service

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem

Solution for MS Excel 2010 and higher versions

If you are using the PowerPivot add-in, then a calculated column formula can resolve this problem.

Solution for all versions of MS Excel

I have shared two solutions here:

1. Array formula and advanced filters; and
2. Only array formulas

For better understanding of the question and to view the final solution, please refer to this workbook.

The VLOOKUP() function returns data from a lookup_array irrespective of the filter setting of the lookup_array.  To make the lookup_array of the VLOOKUP() function work only on the visible cells of a filtered range, refer to this workbook.

One may want to create interactive graphs by allowing users to select indicators and years to be plotted (via check boxes and scroll bars).  I have presented three cases in this workbook.

Validate a range of cells based on multiple conditions - the exact question can be read from here.  You may refer to my solution in the this file.

The VLOOKUP() function accepts a definite lookup array (second input in the VLOOKUP()) specified by the end-user.  In a situation where the lookup array is unknown, a function may have to cycle through various lookup ranges to cull out the required value.

You may refer to my solution in this workbook.