Category: FILTERS

Determine the most recent status after satisfying certain conditions


Assume a three column dataset with Patient ID, Smoking Status and Review Date (dates are in dd-mm-yyyy format):

PatientID   SmokingStatus   ReviewDate
P1          (blank)         10-03-2018
P1          9               09-03-2018
P1          1               08-03-2018
P1          4               07-03-2018
P2          9               10-03-2018
P2          9               09-03-2018
P2          9               08-03-2018
P2          9               07-03-2018
P3          2               10-03-2018
P3          (blank)         09-03-2018
P3          9               08-03-2018
P4          9               10-03-2018
P4          1               09-03-2018
P4          4               08-03-2018

The objective is to create another three column dataset that satisfies the following conditions:

  1. If the patient's latest smoking status is other than Blank or 9, then consider that as the smoking status of the patient; and
  2. If the patient's latest smoking status is blank or 9, then consider the previous smoking status that is not blank or 9; and
  3. If the patient's smoking status is blank or 9 on all dates, then consider the smoking status as 9

The expected result is:

PatientID   Last date when the smoking status was other than 9 or blank   Smoking status on that date
P1          08-Mar-18                                                     1
P2          10-Mar-18                                                     9
P3          10-Mar-18                                                     2
P4          09-Mar-18                                                     1
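
For readers who want to check the three rules outside Excel, here is a minimal Python sketch of the same logic.  It is not one of the workbook solutions; the function name latest_status and the use of an empty string for a blank status are assumptions made for illustration, and the dates are read as dd-mm-yyyy, which is what the expected result implies.

```python
from datetime import datetime

IGNORED = {"", "9"}  # assumption: "" stands for a blank status; blank or 9 is not usable

def latest_status(rows):
    """rows: (patient_id, status, 'dd-mm-yyyy') tuples -> {patient_id: (date, status)}."""
    by_patient = {}
    for pid, status, when in rows:
        review_date = datetime.strptime(when, "%d-%m-%Y").date()
        by_patient.setdefault(pid, []).append((review_date, status))
    result = {}
    for pid, reviews in by_patient.items():
        reviews.sort(key=lambda r: r[0], reverse=True)  # newest review first
        usable = [r for r in reviews if r[1] not in IGNORED]
        # Rules 1 and 2: newest usable review.  Rule 3: latest date with status 9.
        result[pid] = usable[0] if usable else (reviews[0][0], "9")
    return result

rows = [
    ("P1", "", "10-03-2018"), ("P1", "9", "09-03-2018"),
    ("P1", "1", "08-03-2018"), ("P1", "4", "07-03-2018"),
    ("P2", "9", "10-03-2018"), ("P2", "9", "09-03-2018"),
    ("P2", "9", "08-03-2018"), ("P2", "9", "07-03-2018"),
    ("P3", "2", "10-03-2018"), ("P3", "", "09-03-2018"), ("P3", "9", "08-03-2018"),
    ("P4", "9", "10-03-2018"), ("P4", "1", "09-03-2018"), ("P4", "4", "08-03-2018"),
]
summary = latest_status(rows)
```

With the sample data above, summary should agree with the expected result table, row for row.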

I have solved this question using 3 methods - PowerPivot, Advanced Filters and formulas.  You may download my solution workbook from here.

Filtering a database by both rows and columns


In range A1:BG2185 of this workbook, there is macroeconomic and demographic data for the years 1984 to 2009 (26 years) for 84 countries.  There are 57 parameters tracked for each of the 26 years for all 84 countries.

There are missing observations for some years across many parameters and many countries.  If a country has more than 11 missing observations for any one parameter, then that country and parameter should be listed in the sheet named "Missing Observations".  The "Missing Observations" sheet should show three columns:

1. Country Name
2. Parameter with more than 11 missing observations
3. Observations missing
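
The counting rule behind this report can be sketched in Python as follows.  This is only an illustration of the logic, not the workbook solution; the function name missing_report, the "Country" key and the parameter names "GDP" and "Population" are hypothetical, and None is assumed to mark a missing observation.

```python
from collections import Counter

def missing_report(rows, parameters, threshold=11):
    """rows: one dict per country-year, with a 'Country' key and one key per parameter."""
    missing = Counter()
    for row in rows:
        for parameter in parameters:
            if row.get(parameter) is None:  # assumption: None marks a missing observation
                missing[(row["Country"], parameter)] += 1
    # Keep only country/parameter pairs with MORE than `threshold` missing values.
    return sorted((c, p, n) for (c, p), n in missing.items() if n > threshold)

# 26 yearly rows for one country: GDP is missing 12 times, Population 11 times.
rows = [
    {
        "Country": "Albania",
        "Year": 1984 + i,
        "GDP": None if i < 12 else 1.0,
        "Population": None if i < 11 else 1.0,
    }
    for i in range(26)
]
report = missing_report(rows, ["GDP", "Population"])
```

Only the GDP row is reported here, because 11 missing observations is not "more than 11".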

Filter on a column of Date and time values


Assume a three column database - Site ID, Reason & Date and time.  The data is available in range A2:C6 of sheet1 of this workbook.  The objective is to view all those records where Mains failed between 23:00 and 03:00 (irrespective of the date).

Applying a Custom Filter > Between on the Date and time column (column C) with 23:00 in the "Greater than or equal to" box and 03:00 in the "Less than or equal to" box would be incorrect because dates are also present in the same column.  The other option would be to use Data > Text to Columns to separate Date and Time into two columns.  If one then applies the same custom filter criteria on the Time column, nothing would show up because MS Excel treats 23:00 as greater than 03:00, so no time can satisfy both conditions.  If one reverses the criteria, i.e. "Greater than or equal to" 03:00 and "Less than or equal to" 23:00, the filter would return the wrong records: those between 03:00 and 23:00.

The solution is to solve this via Advanced Filters.  The steps for solving are mentioned in the file.
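
The underlying difficulty is that the 23:00 to 03:00 window wraps past midnight, so the two conditions have to be joined with OR rather than AND.  The Python sketch below shows that test on its own; it is not the Advanced Filter criteria from the file, and the function name in_window is an assumption for illustration.

```python
from datetime import datetime, time

def in_window(stamp, start=time(23, 0), end=time(3, 0)):
    """True if the time part of `stamp` lies in [start, end], ignoring the date."""
    t = stamp.time()
    if start <= end:
        # Ordinary window within one day, e.g. 09:00 to 17:00.
        return start <= t <= end
    # Window wraps past midnight: late evening OR early morning.
    return t >= start or t <= end

stamps = [
    datetime(2018, 3, 10, 23, 30),
    datetime(2018, 3, 11, 2, 59),
    datetime(2018, 3, 11, 12, 0),
]
selected = [s for s in stamps if in_window(s)]
```

Here the first two timestamps are kept and the midday one is dropped, whatever their dates are.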

Compare value of one cell with value of next visible cell of a filtered range


Assume data in range B4:B18 (heading is in cell B3).  To compare the value in the current row with the value in the row below, one may enter the following formula in cell C4 and copy it down:

=B4=B5

TRUE indicates that the value matches the value in the next row.  However, the above formula would fail when an auto filter is applied to the range, because it compares with the next row even when that row is hidden.

To compare the value in the current row with the value in the next visible row below, refer to my solution in this workbook.
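
The intended comparison can be sketched in Python as below.  This is an illustration of the logic only and not the workbook formula; the function name matches_next_visible and the list of visibility flags are assumptions standing in for the filtered range.

```python
def matches_next_visible(values, visible):
    """For each visible row, report whether it equals the next visible row below it.

    Hidden rows get None; the last visible row gets False (nothing below to match).
    """
    result = [None] * len(values)
    has_next = False
    next_value = None
    # Walk upwards so the most recently seen visible value is the "next visible" one.
    for i in range(len(values) - 1, -1, -1):
        if not visible[i]:
            continue
        result[i] = has_next and values[i] == next_value
        has_next, next_value = True, values[i]
    return result

values = [1, 2, 1, 1, 3]
visible = [True, False, True, True, False]
flags = matches_next_visible(values, visible)
```

In this example the first row matches the third row because the second row is hidden, which is the case that =B4=B5 gets wrong.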

Dynamically filter data from one worksheet to another


Assume there is data in range A2:BG2185 of a worksheet (Range A1:O1 have headings).  Column A has years and column B has countries.  Other columns have some text/numeric data.  The years in column A range from 1984 to 2009 for each country mentioned in column B.  Therefore, Albania would appear in range B2:B27 and 1984-2009 would appear in range A2:A27.  Next, Algeria would appear in range B28:B53 and 1984-2009 would appear in range A28:A53 and so on.  For 84 countries, the number of rows occupied will be 2,184.  You may download the workbook from here.

The objective is to show in another worksheet of the same workbook, all data from year 1990 onwards (year 1990 included) for each country.  Furthermore, if data gets added (by rows) or edited in this base data sheet, the result sheet should update.

Depending on the version of MS Excel you are using, there are two ways to solve this problem.

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (no formulas at all) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).

You may refer to my solution at this link

Solution for all versions of MS Excel

The following process should get the desired result

1. Select range A1:O82 and press Ctrl+F3 > New > Name.  In the Name box, enter Dummy and click on OK.
2. Select range A1:O82 and press Ctrl+T.  Ensure that the "My data has headers" box is checked and click on OK
3. Save the file (assume on the Desktop for now)
4. Open a blank worksheet and go to Data > From Other Sources > From Microsoft Query
5. Select Excel files and click on OK
6. In the right hand side panel, navigate to the folder where you saved the file in step 3 (Desktop in this case).  In the left hand side panel, select the specific Excel file which you saved in step 3 and click on Next
7. With Dummy appearing in the left hand side panel, click on the > symbol to transfer all columns of Dummy to the right hand side panel and click on Next
8. In the Column to filter box, select Year and in the right hand side Year box, select "is greater than or equal to" and then select 1990
9. Click on OK twice
10. Select "Return Data to Microsoft Excel" and click on Finish
11. In the Import Data box, click on OK

The result sheet will now show data for all countries from year 1990 onwards (year 1990 included), and it will be linked to the source data sheet.  If you add rows to the source data sheet or edit data in existing rows, just right click on any cell in the result sheet and select Refresh.
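
Conceptually, the steps above set up a stored query against the named range that is run again on every Refresh.  The Python sketch below imitates that idea with the standard sqlite3 module; it is an analogy rather than what Microsoft Query does internally, and the table layout (Year, Country, Value) and the function name refresh are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the named range Dummy, reduced to three columns.
conn.execute("CREATE TABLE Dummy (Year INTEGER, Country TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO Dummy VALUES (?, ?, ?)",
    [(1989, "Albania", 1.0), (1990, "Albania", 2.0), (1991, "Algeria", 3.0)],
)

def refresh(connection):
    """Re-run the stored query, as Refresh does on the result sheet."""
    query = (
        "SELECT Year, Country, Value FROM Dummy "
        "WHERE Year >= 1990 ORDER BY Country, Year"
    )
    return connection.execute(query).fetchall()

before = refresh(conn)
# A row is added to the source data, then the result is refreshed.
conn.execute("INSERT INTO Dummy VALUES (2009, 'Algeria', 4.0)")
after = refresh(conn)
```

The 1989 row never appears in the result, and the row added later shows up only after the query is run again.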

Shade alternate band of rows in a filtered range


Assume the following numbers in range C5:C22.  Heading is in cell C4.

Range C5:C7 - 11
Range C8:C11 - 14
Range C12:C13 - 23
Range C14:C19 - 56
Range C20:C22 - 78

One may want to colour cells for every change in number in range C5:C22, subject to the following two conditions:

1. Shading should be for alternate bands of rows i.e. C5:C7 should be coloured, C8:C11 should not be coloured, C12:C13 should be coloured, C14:C19 should not be coloured and C20:C22 should be coloured.

2. Shading should adjust to the filter applied to the range i.e. if range C5:C22 is filtered to exclude 14 and 56, then numbers 11 and 78 should be coloured i.e. range C5:C7 and C20:C22 should be coloured.

To accomplish this, try the following:

a. Select cell C5 and go to Home > Conditional Formatting > New Rule > Use a formula to determine which cells to format

b. In the formula box there, enter the following formula

=AND(ISODD(COUNT(1/FREQUENCY(IF(SUBTOTAL(3,OFFSET(C$5,ROW(C$5:C5)-ROW(C$5),)),MATCH(C$5:C5,C$5:C5,0)),ROW(C$5:C5)-ROW(C$5)))),COUNT(1/FREQUENCY(IF(SUBTOTAL(3,OFFSET(C$5,ROW($C$5:$C$22)-ROW(C$5),)),MATCH($C$5:$C$22,$C$5:$C$22,0)),ROW($C$5:$C$22)-ROW(C$5)))>1)

c. Click on Format and select any Fill colour

d. Click on OK > OK > Apply

e. Copy cell C5, select range C6:C22 and right-click > Paste Special > Formats

With no filter in place, colouring will happen as mentioned in point 1 above.  Auto filter the range now on some values and the colour banding should change.
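
As a reading aid for the formula in step b, the Python sketch below reproduces its two tests: the count of distinct visible values from the first row down to the current row must be odd, and the visible rows must contain more than one distinct value.  The function name shaded_rows and the visibility flags are assumptions for illustration; this is a restatement of the logic, not a replacement for the conditional formatting rule.

```python
def shaded_rows(values, visible):
    """Return one True/False per row: should the row be shaded?"""
    total_distinct = len({v for v, show in zip(values, visible) if show})
    seen = set()
    shaded = []
    for value, show in zip(values, visible):
        if show:
            seen.add(value)  # distinct visible values from the top down to this row
        shaded.append(show and total_distinct > 1 and len(seen) % 2 == 1)
    return shaded

# The numbers from range C5:C22 above.
values = [11] * 3 + [14] * 4 + [23] * 2 + [56] * 6 + [78] * 3

unfiltered = shaded_rows(values, [True] * len(values))
filtered = shaded_rows(values, [v not in (14, 56) for v in values])

shaded_unfiltered = sorted({v for v, s in zip(values, unfiltered) if s})
shaded_filtered = sorted({v for v, s in zip(values, filtered) if s})
```

With no filter, the bands for 11, 23 and 78 are shaded, as in point 1.  With 14 and 56 filtered out, 11 and 78 are shaded, as in point 2.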

Extract farthest/latest date based on multiple conditions


Assume a four column database showing Site ID, Customer, Status and Requested Date.  On the same Site ID, the same customer may have a different status on different dates.  In such a scenario, one may want to know the farthest/latest requested date and its corresponding status for all unique combinations of Site ID and Customer.

I initially attempted to solve this problem by using a pivot table but the pivot output was incorrect.  The pivot was returning the farthest/latest date for every status of a particular Site ID and Customer.  Ideally, it should show only the farthest/latest date and its corresponding status, so that only one row shows up in the final output for each Site ID and Customer combination.  Therefore, the Pivot Table solution did not work.

You may download this workbook for a better description of the problem and my workaround.

Depending on the version of MS Excel you are using, there are two ways to solve this problem.

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (no formulas at all) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).

Solution for all versions of MS Excel

You may refer to my Advanced Filter (along with formula) solution in the workbook.
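
The desired output, one row per Site ID and Customer combination carrying the latest requested date and the status on that date, can be sketched in Python as follows.  This is not the workbook solution; the function name latest_per_pair and the sample site, customer and status values are hypothetical.

```python
from datetime import date

def latest_per_pair(rows):
    """rows: (site_id, customer, status, requested_date) -> {(site_id, customer): (date, status)}."""
    latest = {}
    for site, customer, status, requested in rows:
        key = (site, customer)
        # Keep the row only if it is newer than what is stored for this pair.
        if key not in latest or requested > latest[key][0]:
            latest[key] = (requested, status)
    return latest

rows = [
    ("S1", "Acme", "Pending", date(2014, 1, 5)),
    ("S1", "Acme", "Approved", date(2014, 3, 9)),
    ("S1", "Bolt", "Pending", date(2014, 2, 1)),
    ("S2", "Acme", "Rejected", date(2014, 1, 20)),
    ("S2", "Acme", "Pending", date(2014, 1, 2)),
]
result = latest_per_pair(rows)
```

Unlike the pivot described above, the result has exactly one entry per pair, regardless of how many statuses that pair has had.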

Remove duplicates after satisfying additional conditions


Assume a two column database of patient IDs and services availed.  One patient may avail the same service multiple times in a year, due to which that record may appear as many times as the service is availed.  For example, if patient A001 avails the Radiology service twice, then A001 and Radiology will appear in two rows.

One may want to create the following two reports from this database:

1. A list containing all those records where the patient availed just one service; and
2. A list containing all those records where the patient availed more than one service

Depending on the version of MS Excel you are using, there are two ways to solve this problem.

Solution for MS Excel 2010 and higher versions

If you are using the PowerPivot add-in, then a calculated column formula can resolve this problem.

Solution for all versions of MS Excel

I have shared two solutions here:

1. Array formula and advanced filters; and
2. Only array formulas

For better understanding of the question and to view the final solution, please refer to this workbook.
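
The sketch below shows one possible reading of the two reports in Python.  It assumes that "one service" means one distinct service per patient and that repeated rows are collapsed into a single row; the workbook may define this differently.  The function name split_by_service_count and the "Pathology" service are hypothetical.

```python
def split_by_service_count(records):
    """records: list of (patient_id, service) tuples -> (single_service_rows, multi_service_rows)."""
    services = {}
    for patient, service in records:
        services.setdefault(patient, set()).add(service)
    unique_rows = sorted(set(records))  # assumption: duplicate rows are removed
    single = [row for row in unique_rows if len(services[row[0]]) == 1]
    multiple = [row for row in unique_rows if len(services[row[0]]) > 1]
    return single, multiple

records = [
    ("A001", "Radiology"),
    ("A001", "Radiology"),   # same service availed twice
    ("A002", "Radiology"),
    ("A002", "Pathology"),
]
single, multiple = split_by_service_count(records)
```

Under this reading, A001 lands in the first report with a single Radiology row, and both of A002's services land in the second report.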

Sum visible cells of a filtered range ignoring errors


In a multi column database, assume a filter has been applied on some columns.  After filtering, some error values (such as #DIV/0!, #N/A etc.) appear in one of the numeric columns.  The objective is to sum the numbers in that column.  The usual SUM() function would not work because it returns an error if its range contains an error value.  Furthermore, SUM() would also include the hidden rows of a filtered range.  The SUBTOTAL(9,range) function would not work either because the range in the SUBTOTAL() function should also be error free.

Assume that the database is in range A11:B20.  Column A has been filtered and column B is the column of numbers which also has the error values.

For Excel 2007 and prior versions, the following array formula (confirmed by Ctrl+Shift+Enter) will sum the visible rows of a filtered range, ignoring error values.

=SUM(IF(NOT(ISERROR(SUBTOTAL(109,OFFSET(B10,ROW(B11:B20)-ROW(B10),)))),SUBTOTAL(109,OFFSET(B10,ROW(B11:B20)-ROW(B10),))))

For Excel 2010 and higher versions, the following formula will work (function 9 is SUM and option 7 ignores both hidden rows and error values):

=AGGREGATE(9,7,B11:B20)
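
The behaviour both formulas aim for can be sketched in Python as below: skip hidden rows, skip error values, and add up what is left.  The function name sum_visible is an assumption, and error values are represented here as text such as "#DIV/0!" purely for illustration.

```python
def sum_visible(cells, visible):
    """Sum numeric cells on visible rows, ignoring errors and hidden rows."""
    total = 0
    for value, show in zip(cells, visible):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if show and is_number:
            total += value
    return total

cells = [10, "#DIV/0!", 5, "#N/A", 7]       # hypothetical stand-in for B11:B20
visible = [True, True, False, True, True]   # the third row is filtered out
total = sum_visible(cells, visible)
```

Here the hidden 5 and the two error values are skipped, so only 10 and 7 are added.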