Tags: FILTER

Filtering on 2 date fields within one Table

{0 Comments}

This table contains a list of all the inspections created and completed within different time periods.

untitled

The objective is to create two Tables from this single table - one showing the Accounts created within the chosen time period and another showing the those that were closed within the same time period.  Here are screenshots of the expected results.

untitled1 untitled2 untitled3 untitled4

You may download my PowerBI desktop solution workbook from here.  The same solution can be obtained in Excel as well (using Power Query and PowerPivot).

Compute transaction fee based on a tiered pricing model

{0 Comments}

Consider a simple dataset as shown below:
untitled
For each tier, the tier rate is incrementally applied to the volume within the tier volume range.  Given the following transaction volumes, one may want to compute the transaction fee
untitled1
The expected result is shown below
untitled2

As one can observe, for a transaction value of 400,000, the fee has been computed as 3% on the first 100,000 and 2.5% on the next 300,000. You may download my solution workbook from here. In the file, I have shared 2 solutions - a conventional formula based one and a PowerPivot solution.

I have also solved a similar question here.

Perform an aggregation on Top x items after satisfying certain conditions

{0 Comments}

Visualise a 5 column dataset as show below.  This is a very small sample of the actual dataset.  It shows the date on which supplies were received for each item from Vendors and whether those supplies had errors in them.  Finally those identified errors have been bucketed into relevant categories.  The Item ID# is a code assigned by the Client Organisation.  The format of the ID# is X [Last two digits of year of receipt] [Four digit Number].  For items received from a certain vendor, the last four digits of any item will always be in ascending order of the Date received.  In other words. for items received from a certain vendor, the last 4 digits of an item received on 13 January 2017 will always be greater than the last 4 digits of an item received on 11 January 2017 from the same vendor.  There will never be repetitions in the Item ID# column.

Vendor Name Item ID# Date Received Error with Item? Error Category
California X170016 1-16-2017 No
California X170014 1-13-2017 Yes Labeling Error
California X170015 1-13-2017 Yes Packaging Error
California X170008 1-9-2017 Yes Quality Issue
California X170003 1-2-2017 No
California X160645 12-26-2016 Yes Packaging Error
California X160646 12-26-2016 No
California X160644 12-25-2016 Yes Labeling Error
California X160638 12-20-2016 Yes Quality Issue
California X160633 12-15-2016 No
California X160626 12-8-2016 No
California X160625 12-7-2016 Yes Packaging Error
California X160624 12-5-2016 Yes Labeling Error
California X160618 11-23-2016 Yes Quality Issue
California X160613 11-13-2016 No
California X160606 10-30-2016 No

The objective is to compute the error rate by vendor and Error category for the 10 most recent transaction dates with that specific vendor.  So, for vendor Name "California" and  Error category as "Packing Error", this ratio should be computed as = Number of packing Errors on 10 most recent dates/10.

Here is a simple snapshot of the Data for California.  I have filtered the dataset where Vendor Name is California and then sorted the Date received column in descending order.  Please note that when i filter the dataset on California, a lot more rows are returned.  I am only showing the Top 10 rows here because that is what is important for solving this question.

Vendor Name Item ID# Date Received Error with Item? Error Category
California X170016 1-16-2017 No
California X170014 1-13-2017 Yes Labeling Error
California X170015 1-13-2017 Yes Packaging Error
California X170008 1-9-2017 Yes Quality Issue
California X170003 1-2-2017 No
California X160645 12-26-2016 Yes Packaging Error
California X160646 12-26-2016 No
California X160644 12-25-2016 Yes Labeling Error
California X160638 12-20-2016 Yes Quality Issue
California X160633 12-15-2016 No

The expected result is:

Row Labels Labeling Error Packaging Error Quality Issue Factory Error
Boise 30.00%
California 20.00% 20.00% 20.00%
Detroit 70.00% 30.00%
Ekalaka 20.00% 20.00%

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Compute the average of values against the 5 most recent dates of each Category

{0 Comments}

Here is a simple 3 column dataset showing Categories, Date and Value

Catagorie Date Value
Fish 08-12-2015 6
Crab 05-12-2015 7
Crab 04-12-2015 6
Bird 27-11-2015 4
Snow 25-11-2015 10
Cat 21-11-2015 7
Dog 12-11-2015 5
Dog 28-10-2015 5
Fish 12-10-2015 3
Bird 11-10-2015 9
Dog 22-09-2015 9
Crab 17-08-2015 8
Cat 11-08-2015 1
Fish 20-07-2015 5
Crab 03-07-2015 2
Fish 02-06-2015 8
Fish 26-05-2015 9
Dog 14-05-2015 4
Snow 07-05-2015 7
Bird 03-05-2015 9
Cat 20-04-2015 10
Cat 15-04-2015 2
Snow 13-04-2015 3
Crab 29-03-2015 5
Crab 23-03-2015 2
Bird 14-03-2015 5
Cat 14-03-2015 1
Dog 26-02-2015 9
Fish 16-02-2015 4
Fish 08-02-2015 6
Bird 18-01-2015 1
Snow 06-01-2015 10

The objective is to compute category wise average of values against the 5 most recent dates.  So the expected result is:

Categorie Category wise average values on 5 most recent dates
Bird 5.60
Cat 4.20
Crab 5.60
Dog 6.40
Fish 6.20
Snow 7.50

So for the Category of Fish (dates are sorted in descending order), the average should be 6.2 (average of 6,3,5,8,9)

Catagorie Date Value
Fish 08-12-2015 6
Fish 12-10-2015 3
Fish 20-07-2015 5
Fish 02-06-2015 8
Fish 26-05-2015 9
Fish 16-02-2015 4
Fish 08-02-2015 6

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Determine the total number of projects by Status

{0 Comments}

Here's a simple 3 column table showing Date, Project name (Cat.) and Status of the project.  Each project can have multiple status entries on different dates.  So as you can observe, project "alpha_9383993" was In Progress on Oct 2, 2017, remained so on October 5, 2017 but was completed on October 6, 2017.

Date Cat. Status
02-Oct-17 alpha_9383993 In Progress
03-Oct-17 Pulse_9387388 In Progress
04-Oct-17 Pulse_9387388 Rework
05-Oct-17 alpha_9383993 In Progress
06-Oct-17 alpha_9383993 Completed
07-Oct-17 Pulse_9387388 Completed
08-Oct-17 Oppo_tes_9383 In Progress
09-Oct-17 Oppo_Max_8977 Rework

The objective is to determine the count of projects by Status as per the most recent status of every project.  So the expected result is:

Row Labels measure 2
Completed 2
In Progress 1
Rework 1

The result for In Progress should be one because there is only one such project - Oppo_tes_9383.  Project alpha_9383993 should not be counted because it was completed on October 6, 2017.  Likewise the result for Rework should be one because there is only one such project - Oppo_Max_8977.  Project Pulse_9387388 should not be counted because it was completed on October 7,2017.

I have solved this problem with the PowerPivot.  You may download my solution workbook from here.

In a Pivot Table, compute highest revenue earned on any day from each customer and the date thereof

{0 Comments}

Here's a simple dataset showing the Date of sale, Customer Name and Sales amount.

Date Customer Name Sales amount
12-03-2017 A 1
12-03-2017 A 2
12-03-2017 A 3
12-03-2017 B 4
12-03-2017 B 5
12-03-2017 B 6
12-03-2017 B 7
12-03-2017 B 8
13-03-2017 A 1
13-03-2017 A 1

The objective is to determine Customer wise:

  1. Highest revenue; and
  2. Date on which that highest revenue was earned

The expected result is

Row Labels Highest revenue Date of highest revenue
A 6 12-Mar-2017
B 30 12-Mar-2017
Grand Total 36

So, for A, revenue earned on March 12, 2017 is 6 and on March 13, 2017 is 2.  Since the higher of the two is 6, that is the result which should appear in the Pivot Table.

I have solved this problem with the PowerPivot.  You may download my solution workbook from here.

In a Pivot Table, show the most frequently appearing text entry by a certain parameter

{0 Comments}

Here's a simple two column dataset

Comment Identifier Intervals
A 3pm-6pm
A 9pm-12pm
S 3pm-6pm
S 3pm-6pm
S 9pm-12pm
A 9pm-12pm
S 9pm-12pm
D 3pm-6pm
A 9pm-12pm
A 9pm-12pm
A 9pm-12pm
A 3pm-6pm
A 3pm-6pm

For identifiers listed in column A, there are time intervals in column B. Note that for a certain identifier, a time interval can appear multiple times. The objective is two-fold:

  1. For each identifier, show the time interval which appears most frequently; and
  2. For each identifier, compute the count of the time interval which appears most frequently

The expected result is:

untitled

As you can observe, the result for S is two time periods - 3pm-6pm and 9pm-12pm.  This is because each of them appears twice.

You may download my solution from here.

Alter the behaviour of a filter/slicer from OR to AND

{3 Comments}

Here is a simple two column dataset which shows EmpID in column A and DateWorked in column B.  My objective is to filter the table to show those employees who worked ONLY on August 17 and August 18.  I'd like to exclude employees who:

  1. Did not work on both these days; and
  2. Worked on both these dates but also worked on other dates

A simple filter on the DatesWorked column with the criteria of August 17 and August 18, would return 7 employees.  This is clearly wrong because this will show employees who:

  1. Worked on any of the two chosen dates; and
  2. Worked on dates other than the two chosen dates

From Table 1, one can clearly observe that the correct result should be EmpID E.  Only EmpID E worked on August 17 and 18.  While others also worked on both these dates, they worked on other dates as well and should therefore be excluded.

The problem outlined above holds true for selections made via a slicer in a Pivot Table as well.  So this is a generic problem with a filter/slicer - conditions specified therein behave like OR conditions, not like AND conditions.

Base dataset

EmpID DateWorked
A 14-08-2017
A 15-08-2017
A 16-08-2017
A 17-08-2017
A 18-08-2017
B 15-08-2017
B 16-08-2017
B 18-08-2017
C 14-08-2017
C 15-08-2017
C 16-08-2017
C 17-08-2017
C 18-08-2017
D 14-08-2017
D 15-08-2017
D 16-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 16-08-2017
F 17-08-2017
G 18-08-2017

Filtered dataset (after specifying the criteria as 17 Aug and 18 Aug)

EmpID DateWorked
A 17-08-2017
A 18-08-2017
B 18-08-2017
C 17-08-2017
C 18-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 17-08-2017
G 18-08-2017

So the question is "How does one change the behavior of a filter/slicer from OR to AND?". To make things worse, the data is arranged as follows:

EmpID DateWorked
A 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
B 15/8/17;16/8/17;18/8/17
C 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
D 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
E 17/8/17;18/8/17
F 16/8/17;17/8/17
G 18-08-17

As you can see, there are multiple dates in a single cell separated by a ;.

You may refer to my Power Query and PowerPivot solution here.

Compute standard hours spent on weekdays by Tier, Week, Month and Country

{0 Comments}

Imagine a Sales dataset with the following columns - ID, Country, Start date, End date and Tier. Here's a snapshot of the table:

ID Country Start date End date Tier
33948 ES 25-Sep-17 28-Sep-17 3
19820 US 08-Oct-17 17-Oct-17 4
9118 US 27-Oct-17 03-Nov-17 2
1563 ES 02-Sep-17 07-Sep-17 2
11087 US 18-Oct-17 27-Oct-17 1
15057 US 05-Nov-17 13-Nov-17 4
13567 ES 18-Oct-17 26-Oct-17 4
22362 DE 28-Aug-17 30-Aug-17 4
31301 FR 04-Nov-17 10-Nov-17 2
22838 US 05-Sep-17 13-Sep-17 4

There is another 2 column table which lists down the Tier wise standard hours. Here's a snapshot of the table:

Tier Daily load hh
1 0.7
2 0.72
3 0.8
4 1.2

The result which one expects from these two tables is shown below:

Tier
Month Week 1 2 3 4 Grand total   Slicer
9 35 x x x x x US
36 x x x x x UK
37 x x x x x FR
38 x x x x x IT
39 x x x x x ES
Grand total   x x x x x    

Let's take a specific example to show the result expected for the US Region. From the last row of the Sales Table, one can see that there is a ID 22838 in US which spans the date range 05-Sept-17 to 13-Sep-17. These dates span week numbers 36 (from 05-Sept-17 to 08-Sep-17) and 37 (from 11-Sept-17 to 13-Sep-17) [The week numbers have been obtained by using Excel's WEEKNUM() function]. So, is US, for Tier 4, in Week 36, the standard hours worked on weekdays would be 4.8 (4 days * 1.2 hours per day). Likewise, in week 37, the standard hours worked on weekdays would be 3.6 (3 days * 1.2 hours).

I have solved this problem using Power Query a.k.a Get & Transform (Available under Data) in Excel 2016 and PowerPivot.  You may download my solution workbook from here.

Determine cumulative interest payable on an annuity with varying time periods

{0 Comments}

Imagine a fixed monthly amount due to an Organisation for services rendered to various customers.  While an invoice is raised every month by this Organisation, not all pay up the dues on time.  For unpaid dues, the Organisation charges its client interest ranging from 3% to 9% per annum.  The objective is to determine cumulative interest payable by various customers to Organisation X.

The base data looks like this

Client Monthly revenue Int. calculation start date Int. calculation end date Interest rate
Client A 33,967 01-Aug-16 25-Jul-17 9.00%
Client B 123 12-Sep-16 30-Nov-17 4.00%

Given the dataset above, the total interest payable by Client A is Rs. 16,237.20.  The calculation is shown below:

From To Days for which interest should be paid Principal Interest
02-Aug-16 31-Aug-16 328.00 33,967.00 2,745.26
01-Sep-16 30-Sep-16 298.00 33,967.00 2,494.17
01-Oct-16 31-Oct-16 267.00 33,967.00 2,234.71
01-Nov-16 30-Nov-16 237.00 33,967.00 1,983.62
01-Dec-16 31-Dec-16 206.00 33,967.00 1,724.16
01-Jan-17 31-Jan-17 175.00 33,967.00 1,464.70
01-Feb-17 28-Feb-17 147.00 33,967.00 1,230.34
01-Mar-17 31-Mar-17 116.00 33,967.00 970.88
01-Apr-17 30-Apr-17 86.00 33,967.00 719.79
01-May-17 31-May-17 55.00 33,967.00 460.33
01-Jun-17 30-Jun-17 25.00 33,967.00 209.24
01-Jul-17 25-Jul-17 - 33,967.00 -
Total       16,237.20

You may download my solution workbook with from here. I have solved this problem using normal Excel formulas and the PowerPivot.