Category: FILTERS

After filtering a dataset, allow the user to display only specific columns in the result


Here is a simple 5 column dataset showing basic employee information.

The objective is to show only those rows of data in which the Score (column D) is greater than 3.  While one can solve this with a simple filter, the solution will not be dynamic.  To get a dynamic solution, one may use the FILTER() dynamic array function that comes with the Microsoft 365 subscription service.

In cell G2, one may simply write this formula:

=FILTER(A2:E9,D2:D9>3)

This is a far better solution because it is formula driven and thereby dynamic.  So all good till here.  Now let's make it a little interesting.

The revised objective is to show only those rows of data in which the Score (column D) is greater than 3 and to display only 3 columns - Name, Department and Score (columns 1, 3 and 4) in the end result.  This can be solved using Data > Advanced Filter but the result will be static.  To get a dynamic solution, one may use a nested FILTER() function in cell G2:

=FILTER(FILTER(A2:E9,D2:D9>3),{1,0,1,1,0})

This formula will return the same number of rows (3 rows) as the previous FILTER() function returned with only 3 columns - Name, Department and Score.  The 1's and 0's in the formula denote whether one would like to see the particular column in the end result or not.  So once again, all good till here.

Now, let's take it a notch higher.  What if there were 35 columns in the original dataset and one wanted to see the data appearing in columns 1, 6, 22, 25, 29 and 34?  It will be quite time consuming to enter the 1's and 0's in order within the FILTER() function.  So the question here is how does one save time and effort?  Ideally one should be able to just enter the column headings one wants to see in the end result.

As you can observe in the image above, one simply has to specify the columns to extract in column I and the result populates from column K rightwards and downwards.  Using dynamic array formulas and the FILTER() function, one saves the effort of entering 1's and 0's in the FILTER() function (the FILTER() function has been written in cell K2 - the download link of the file is shared below).  If one types Address in cell I5, then Address would automatically appear in cell N1 and so will the entries in range N2:N4.  While this is a good solution, it is not a perfect one.  In column I, if one changes the order of the headings i.e. one enters Name, Dept, Address and Score (rather than Name, Dept, Score and Address), the result under columns M and N would be incorrect (see image below).

Solving this challenge led me to using Data > Get & Transform.  I made use of the latest feature introduced in Power Query called "Data Types" (I received this feature update on December 4, 2020).

As one can see in the image above, one simply has to select any heading one wants and that appears in the next available column.
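The behaviour both solutions aim for (keep the rows that pass the Score test, then return only the requested headings in the order in which they were requested) can be sketched outside Excel as well.  The Python snippet below is only an illustration of that logic and is not the formula or the Power Query steps used in the workbook.  The "Age" heading, the function name and every data value are made up for the example.

```python
# Hypothetical sample data: the "Age" heading and all values are invented;
# Name, Dept, Score and Address are the headings mentioned in the post.
HEADERS = ["Name", "Age", "Dept", "Score", "Address"]
ROWS = [
    ["Asha", 31, "Sales", 4, "Delhi"],
    ["Ben", 45, "HR", 2, "Pune"],
    ["Chen", 28, "IT", 5, "Mumbai"],
    ["Dev", 39, "Sales", 3, "Goa"],
]


def filter_and_pick(headers, rows, wanted, min_score=3):
    """Keep rows whose Score exceeds min_score and return only the wanted
    columns, in the order in which the headings were requested."""
    score_idx = headers.index("Score")
    # Position of each requested heading; the order follows `wanted`.
    col_idx = [headers.index(h) for h in wanted]
    kept = [r for r in rows if r[score_idx] > min_score]
    return [list(wanted)] + [[r[i] for i in col_idx] for r in kept]


result = filter_and_pick(HEADERS, ROWS, ["Name", "Dept", "Address", "Score"])
for line in result:
    print(line)
```

Because each requested heading is looked up individually, changing the order of the requested headings changes the order of the output columns, which is the behaviour the 1's and 0's approach cannot provide.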

You may download my solution workbook from here in which I have shown both the formula based and the Power Query solutions.

Determine the top selling location for each product


Visualise a 3 column dataset as shown below - Location, Product and Sales.  Each location can have multiple products (Location A has Banana, Apple and Carrot) and each product can be sold in multiple locations (Banana is sold in locations A, B and F).

The objective is to determine the location with highest sales for each product.  So for Banana, maximum sale value is 25 and location of maximum sales value is B.  Likewise for Orange, maximum sales value is 49 and location of maximum sales value is A.  The expected result is:

I have 4 solutions to this problem:

  1. Advanced Filters - This is a static solution.  For any changes in the source data range, one will have to re-enter the 3 inputs in the Advanced Filter window
  2. Formulas - This is a semi-dynamic solution.  To make it fully dynamic, one will have to write an array formula to first extract all unique product names in a column.  The array formula to extract product names in a column can be obtained from here.
  3. Power Query - This is a dynamic solution.  For any changes in the source data sheet, one just has to go to Data > Refresh All
  4. PowerPivot - This is a dynamic solution.  For any changes in the source data sheet, one just has to go to Data > Refresh All
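Whichever of the four methods is used, the underlying logic is the same: for each product, keep the row with the highest sales value.  The Python sketch below only illustrates that logic and is not one of the four workbook solutions.  The sales rows are hypothetical, except that they reproduce the two results quoted above (Banana: 25 at location B, Orange: 49 at location A).

```python
# Hypothetical (location, product, sales) rows. Only the Banana and Orange
# maxima are taken from the post; every other value is invented.
SALES = [
    ("A", "Banana", 12),
    ("B", "Banana", 25),
    ("F", "Banana", 7),
    ("A", "Orange", 49),
    ("C", "Orange", 30),
    ("A", "Apple", 18),
    ("D", "Apple", 22),
]


def top_location_per_product(rows):
    """Return {product: (location, sales)} for the highest sale of each product.
    On a tie, the row that appears first is kept."""
    best = {}
    for location, product, sales in rows:
        if product not in best or sales > best[product][1]:
            best[product] = (location, sales)
    return best


top = top_location_per_product(SALES)
for product, (location, sales) in top.items():
    print(product, location, sales)
```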

You may download my solution workbook from here.

Determine the most recent status after satisfying certain conditions


Assume a three column dataset with Patient ID, Smoking Status and Review Date

PatientID   SmokingStatus   ReviewDate
P1          (blank)         10-03-2018
P1          9               09-03-2018
P1          1               08-03-2018
P1          4               07-03-2018
P2          9               10-03-2018
P2          9               09-03-2018
P2          9               08-03-2018
P2          9               07-03-2018
P3          2               10-03-2018
P3          (blank)         09-03-2018
P3          9               08-03-2018
P4          9               10-03-2018
P4          1               09-03-2018
P4          4               08-03-2018

The objective is to create another 3 column dataset with the following conditions:

  1. If the patient's latest smoking status is other than Blank or 9, then consider that as the smoking status of the patient; and
  2. If the patient's latest smoking status is blank or 9, then consider the previous smoking status that is not blank or 9; and
  3. If the patient's smoking status is blank or 9 on all dates, then consider the smoking status as 9

The expected result is:

PatientID   Last date when the smoking status was other than 9 or Blank   Smoking status on that date
P1          08-Mar-18                                                     1
P2          10-Mar-18                                                     9
P3          10-Mar-18                                                     2
P4          09-Mar-18                                                     1
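The three conditions can be checked against the sample data with a short Python sketch.  This is only an illustration of the rules and not the PowerPivot, Advanced Filter or formula solution from the workbook.  Blank statuses are represented as None, dates are read as day-month-year, and the function name is an assumption made for the example.

```python
from datetime import datetime

# Sample data from the post; None stands for a blank smoking status.
RECORDS = [
    ("P1", None, "10-03-2018"), ("P1", 9, "09-03-2018"),
    ("P1", 1, "08-03-2018"), ("P1", 4, "07-03-2018"),
    ("P2", 9, "10-03-2018"), ("P2", 9, "09-03-2018"),
    ("P2", 9, "08-03-2018"), ("P2", 9, "07-03-2018"),
    ("P3", 2, "10-03-2018"), ("P3", None, "09-03-2018"),
    ("P3", 9, "08-03-2018"),
    ("P4", 9, "10-03-2018"), ("P4", 1, "09-03-2018"),
    ("P4", 4, "08-03-2018"),
]


def latest_valid_status(records):
    """Return {patient: (date, status)} following the three conditions."""
    by_patient = {}
    for pid, status, date_text in records:
        day = datetime.strptime(date_text, "%d-%m-%Y").date()
        by_patient.setdefault(pid, []).append((day, status))

    result = {}
    for pid, visits in by_patient.items():
        visits.sort(key=lambda v: v[0], reverse=True)  # newest review first
        valid = [(d, s) for d, s in visits if s not in (None, 9)]
        if valid:
            # Conditions 1 and 2: most recent status that is not blank or 9.
            result[pid] = valid[0]
        else:
            # Condition 3: blank or 9 throughout, so report 9 on the latest date.
            result[pid] = (visits[0][0], 9)
    return result


summary = latest_valid_status(RECORDS)
for pid, (day, status) in summary.items():
    print(pid, day.strftime("%d-%b-%y"), status)
```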

I have solved this question using 3 methods - PowerPivot, Advanced Filters and formulas.  You may download my solution workbook from here.

Filtering a database by both rows and columns


In range A1:BG2185 of this workbook, there is macroeconomic and demographic data for the years 1984 to 2009 (26 years) for 84 countries.  Furthermore, there are 57 parameters being tracked for each of the 26 years for all 84 countries.

There are missing observations for some years across many parameters and many countries.  If a country has more than 11 missing observations in any one parameter, then that country and parameter should be listed in the sheet named "Missing Observations".  The "Missing Observations" sheet should show three columns:

1. Country Name
2. Parameter with more than 11 missing observations
3. Observations missing
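The rule behind the "Missing Observations" sheet can be sketched in Python as a count of blanks per country and parameter, keeping only the counts above 11.  This is a minimal illustration under stated assumptions and not the solution in the workbook: the parameter names "GDP" and "Population", the placeholder values and the function name are all hypothetical, and None stands for a missing observation.

```python
def missing_report(rows, parameters, threshold=11):
    """Return (country, parameter, missing_count) for every country/parameter
    pair with more than `threshold` missing observations."""
    counts = {}
    for row in rows:
        for p in parameters:
            if row[p] is None:
                key = (row["Country"], p)
                counts[key] = counts.get(key, 0) + 1
    return [(c, p, n) for (c, p), n in counts.items() if n > threshold]


# Hypothetical data for two countries over 1984-2009 (26 years each).
rows = []
for year in range(1984, 2010):
    rows.append({
        "Year": year,
        "Country": "Albania",
        "GDP": None if year < 1996 else 1.0,         # 12 missing years
        "Population": None if year < 1987 else 1.0,  # 3 missing years
    })
    rows.append({"Year": year, "Country": "Algeria", "GDP": 1.0, "Population": 1.0})

report = missing_report(rows, ["GDP", "Population"])
print(report)
```

Only the Albania/GDP pair is reported here, because its 12 blanks exceed the limit of 11 while the 3 blanks under Population do not.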

Filter on a column of Date and time values


Assume a three column database - Site ID, Reason & Date and time.  The data is available in range A2:C6 of sheet1 of this workbook.  The objective is to view all those records where Mains failed between 23:00 and 03:00 (irrespective of the date).

Applying a Custom Filter > Between on the Date and time column (column C) with a value of 23:00 in the "Greater than or equal to" box and 03:00 in the "Less than or equal to" box would be incorrect because dates are also present in the same column.  The other option would be to use Data > Text to Columns to separate Date and Time into two columns.  If one then applies the same custom filtering criteria on the Time column, nothing would show up - this is because MS Excel treats 23:00 as greater than 03:00, so no time can satisfy both conditions.  If one reverses the custom filter criteria i.e. "Greater than or equal to" as 03:00 and "Less than or equal to" as 23:00, incorrect results would show up because that selects the daytime hours instead.

The solution is to solve this via Advanced Filters.  The steps for solving are mentioned in the file.
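The key point is that a window which crosses midnight needs an OR test on the time portion (at or after 23:00, or at or before 03:00) rather than an AND test.  The Python sketch below illustrates only that idea and is not the Advanced Filter set-up from the file.  The site names, reasons and timestamps are hypothetical.

```python
from datetime import datetime, time


def in_overnight_window(stamp, start=time(23, 0), end=time(3, 0)):
    """True when the time part of `stamp` lies in the window, ignoring the date."""
    t = stamp.time()
    if start <= end:
        # Ordinary window within a single day.
        return start <= t <= end
    # Window crosses midnight: late evening OR early morning.
    return t >= start or t <= end


# Hypothetical (site ID, reason, date and time) records.
RECORDS = [
    ("Site 1", "Mains failed", datetime(2012, 5, 1, 23, 30)),
    ("Site 2", "Mains failed", datetime(2012, 5, 2, 2, 15)),
    ("Site 3", "Mains failed", datetime(2012, 5, 2, 14, 0)),
    ("Site 4", "Door open", datetime(2012, 5, 3, 0, 45)),
    ("Site 5", "Mains failed", datetime(2012, 5, 3, 3, 0)),
]

matches = [r for r in RECORDS
           if r[1] == "Mains failed" and in_overnight_window(r[2])]
for site, reason, stamp in matches:
    print(site, reason, stamp)
```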

Compare value of one cell with value of next visible cell of a filtered range


Assume data in range B4:B18 (heading is in cell B3).  To compare the value in the current row with the value in the row below, one may enter the following formula in cell C4 and copy down

=B4=B5

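A result of TRUE indicates that the value matches the value in the next row.  However, the above formula would fail when an auto filter is applied to the range, because it always compares with the row immediately below, whether or not that row is visible.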

To compare the value in the current row with the value in the next visible row below, refer to my solution in this workbook.
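The intended comparison can be described in Python as follows: for each visible row, find the next row below it that is also visible and compare the two values.  This is a sketch of the logic only, not the formula used in the workbook.  The sample values and the visibility flags (True for a row that survives the filter) are hypothetical.

```python
def matches_next_visible(values, visible):
    """For each visible row, return True when its value equals the value of the
    next visible row below it. Hidden rows get None; the last visible row gets
    False because there is nothing below it to compare with."""
    result = [None] * len(values)
    next_value = None
    have_next = False
    # Walk upwards so the next visible value is always known.
    for i in range(len(values) - 1, -1, -1):
        if not visible[i]:
            continue
        result[i] = have_next and values[i] == next_value
        next_value = values[i]
        have_next = True
    return result


values = [5, 7, 5, 8, 8, 5]
visible = [True, False, True, True, False, True]
flags = matches_next_visible(values, visible)
print(flags)
```

In this example the first row is compared with the third row (the second row is hidden), so it reports a match even though a plain row-below comparison of 5 with 7 would not.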

Dynamically filter data from one worksheet to another


Assume there is data in range A2:BG2185 of a worksheet (Range A1:O1 has headings).  In column A are years and in column B are Countries.  Other columns have some text/numeric data.  In column A are years ranging from 1984 to 2009 for each country mentioned in column B.  Therefore, Albania would appear in range B2:B27 and 1984-2009 would appear in range A2:A27.  Next, Algeria would appear in range B28:B53 and 1984-2009 would appear in range A28:A53 and so on.  For 84 countries, the number of rows occupied will be 2,184.  You may download the workbook from here.

The objective is to show in another worksheet of the same workbook, all data from year 1990 onwards (year 1990 included) for each country.  Furthermore, if data gets added (by rows) or edited in this base data sheet, the result sheet should update.

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (no formulas at all) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).

You may refer to my solution at this link

Solution for all versions of MS Excel

The following process should get the desired result

1. Select range A1:O82 and press Ctrl+F3 > New > Name.  In the Name box, enter Dummy and click on OK.
2. Select range A1:O82 and press Ctrl+T.  Ensure that the "My data has headers" box is checked and click on OK
3. Save the file (assume on the Desktop for now)
4. Open a blank worksheet and go to Data > From Other Sources > From Microsoft Query
5. Select Excel files and click on OK
6. In the right hand side panel, navigate to the folder where you saved the file in step 3 (Desktop in this case).  In the left hand side panel, select the specific Excel file which you saved in step 3 and click on Next
7. With Dummy appearing in the left hand side panel, click on the > symbol to transfer all columns of Dummy to the right hand side panel and click on Next
8. In the Column to filter box, select Year and in the right hand side Year box, select "is greater than or equal to" and then select 1990
9. Click on OK Twice
10. Select "Return Data to Microsoft Excel" and click on Finish
11. In the Import Data box, click on OK

The data on this sheet will now show data for all countries from year 1990 onwards (year 1990 included) and this will be linked to the source data sheet.  If you add data by rows to the source data sheet or edit data in existing rows, just right click on any cell in the result sheet and select Refresh.

Shade alternate band of rows in a filtered range


Assume the following numbers in range C5:C22.  Heading is in cell C4.

Range C5:C7 - 11
Range C8:C11 - 14
Range C12:C13 - 23
Range C14:C19 - 56
Range C20:C22 - 78

One may want to colour cells for every change in number in range C5:C22 with the following two modifications:

1. Shading should be for alternate band of rows i.e. C5:C7 should be coloured, C8:C11 should not be coloured, C12:C13 should be coloured, C14:C19 should not be coloured and C20:C22 should be coloured.

2. Shading should vary with the range being filtered i.e. If range C5:C22 is filtered to exclude 14 and 56, then numbers 11 and 78 should be coloured i.e. range C5:C7 and C20:C22 should be coloured.

To accomplish this, try the following:

a. Select cell C5 and go to Home > Format > Conditional formatting > New Rule > Use a formula to determine which cells to format

b. In the formula box there, enter the following formula

=AND(ISODD(COUNT(1/FREQUENCY(IF(SUBTOTAL(3,OFFSET(C$5,ROW(C$5:C5)-ROW(C$5),)),MATCH(C$5:C5,C$5:C5,0)),ROW(C$5:C5)-ROW(C$5)))),COUNT(1/FREQUENCY(IF(SUBTOTAL(3,OFFSET(C$5,ROW($C$5:$C$22)-ROW(C$5),)),MATCH($C$5:$C$22,$C$5:$C$22,0)),ROW($C$5:$C$22)-ROW(C$5)))>1)

c. Click on Format and select any Fill colour

d. Click on OK > OK > Apply

e. Copy cell C5, select range C6:C22 and right-click > Paste Special > Formats

With no filter in place, colouring will happen as mentioned in point 1 above.  Auto filter the range now on some values and the colour banding should change.
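One reading of the conditional formatting formula is this: a row is shaded when the number of distinct visible values from the top of the range down to that row is odd, provided that more than one distinct value is visible overall.  The Python sketch below implements that reading so the two requirements can be checked against the sample numbers.  It is an interpretation offered for illustration, not a translation verified against Excel, and the function name and visibility flags are assumptions.

```python
def shaded_rows(values, visible):
    """Return one flag per row: True when the row should be shaded."""
    total_distinct = len({v for v, shown in zip(values, visible) if shown})
    seen = set()
    flags = []
    for v, shown in zip(values, visible):
        if shown:
            seen.add(v)
        # Shade odd-numbered bands of visible values, but only when there is
        # more than one visible band in total.
        flags.append(shown and len(seen) % 2 == 1 and total_distinct > 1)
    return flags


# Numbers from range C5:C22 in the post (18 rows).
VALUES = [11] * 3 + [14] * 4 + [23] * 2 + [56] * 6 + [78] * 3

unfiltered = shaded_rows(VALUES, [True] * len(VALUES))
print(sorted({v for v, f in zip(VALUES, unfiltered) if f}))

# Filter that excludes 14 and 56.
shown = [v not in (14, 56) for v in VALUES]
filtered = shaded_rows(VALUES, shown)
print(sorted({v for v, f in zip(VALUES, filtered) if f}))
```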

Extract farthest/latest date based on multiple conditions


Assume a four column database showing Site ID, Customer, Status and Requested Date.  On the same Site ID, the same customer may have a different status on different dates.  In such a scenario, one may want to know the farthest/latest requested date and its corresponding status for all unique combinations of Site ID and Customer.

I initially attempted to solve this problem by using a pivot table but the pivot output was incorrect.  The pivot was returning the farthest/latest date for each status of a particular Site ID and Customer.  Ideally, it should show only the farthest/latest date and its corresponding status, so that only one row shows up in the final output for each Site ID and Customer combination.  Therefore, the Pivot Table solution did not work.
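The required output (one row per Site ID and Customer combination, carrying the latest requested date and the status on that date) can be sketched in Python as follows.  This only illustrates the logic and is not the Advanced Filter or Power Query solution.  The site IDs, customer names, statuses and dates are hypothetical.

```python
from datetime import date

# Hypothetical (site ID, customer, status, requested date) rows.
REQUESTS = [
    ("S1", "Acme", "Pending", date(2020, 1, 5)),
    ("S1", "Acme", "Approved", date(2020, 3, 9)),
    ("S1", "Zen", "Rejected", date(2020, 2, 1)),
    ("S2", "Acme", "Approved", date(2019, 12, 1)),
    ("S2", "Acme", "Pending", date(2020, 4, 2)),
]


def latest_request(rows):
    """Return {(site, customer): (status, requested_date)} for the latest date."""
    latest = {}
    for site, customer, status, requested in rows:
        key = (site, customer)
        if key not in latest or requested > latest[key][1]:
            latest[key] = (status, requested)
    return latest


latest = latest_request(REQUESTS)
for (site, customer), (status, requested) in latest.items():
    print(site, customer, status, requested)
```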

You may download this workbook for a better description of the problem and my workaround.

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (no formulas at all) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table)

Solution for all versions of MS Excel

You may refer to my Advanced Filter (along with formula) solution in the workbook.