Tags: CALCULATE

Alter the behaviour of a filter/slicer from OR to AND

{3 Comments}

Here is a simple two column dataset which shows EmpID in column A and DateWorked in column B.  My objective is to filter the table to show those employees who worked ONLY on August 17 and August 18.  I'd like to exclude employees who:

  1. Did not work on both these days; and
  2. Worked on both these dates but also worked on other dates

A simple filter on the DateWorked column with the criteria August 17 and August 18 would return 7 employees.  This is clearly wrong because it will show employees who:

  1. Worked on any of the two chosen dates; and
  2. Worked on dates other than the two chosen dates

From the base dataset below, one can see that the correct result is EmpID E: only EmpID E worked on August 17 and 18 and on no other date.  While others also worked on both these dates, they worked on other dates as well and should therefore be excluded.

The problem outlined above holds true for selections made via a slicer in a Pivot Table as well.  So this is a generic problem with a filter/slicer - conditions specified therein behave like OR conditions, not like AND conditions.

Base dataset

EmpID DateWorked
A 14-08-2017
A 15-08-2017
A 16-08-2017
A 17-08-2017
A 18-08-2017
B 15-08-2017
B 16-08-2017
B 18-08-2017
C 14-08-2017
C 15-08-2017
C 16-08-2017
C 17-08-2017
C 18-08-2017
D 14-08-2017
D 15-08-2017
D 16-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 16-08-2017
F 17-08-2017
G 18-08-2017

Filtered dataset (after specifying the criteria as 17 Aug and 18 Aug)

EmpID DateWorked
A 17-08-2017
A 18-08-2017
B 18-08-2017
C 17-08-2017
C 18-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 17-08-2017
G 18-08-2017

So the question is "How does one change the behavior of a filter/slicer from OR to AND?". To make things worse, the data is arranged as follows:

EmpID DateWorked
A 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
B 15/8/17;16/8/17;18/8/17
C 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
D 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
E 17/8/17;18/8/17
F 16/8/17;17/8/17
G 18-08-17

As you can see, a single cell holds multiple dates separated by semicolons.

You may refer to my Power Query and PowerPivot solution here.
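The OR versus AND distinction can also be sketched outside Excel. The Python below is only an illustration of the logic, not the Power Query and PowerPivot solution itself: it splits the semicolon-delimited cells, collects the dates worked per employee, and then compares an "any selected date" test (what a filter or slicer does) with an "exactly the selected dates" test. The function names are hypothetical.

```python
from collections import defaultdict

# Rows copied from the semicolon-delimited layout above (EmpID, DateWorked).
raw_rows = [
    ("A", "14/8/17;15/8/17;16/8/17;17/8/17;18/8/17"),
    ("B", "15/8/17;16/8/17;18/8/17"),
    ("C", "14/8/17;15/8/17;16/8/17;17/8/17;18/8/17"),
    ("D", "14/8/17;15/8/17;16/8/17;17/8/17;18/8/17"),
    ("E", "17/8/17;18/8/17"),
    ("F", "16/8/17;17/8/17"),
    ("G", "18-08-17"),
]

def normalise(date_text):
    """Turn '17/8/17' or '18-08-17' into a (day, month, year) tuple."""
    day, month, year = date_text.strip().replace("-", "/").split("/")
    return int(day), int(month), int(year)

def dates_by_employee(rows):
    """Split each cell on ';' and collect the set of dates per EmpID."""
    result = defaultdict(set)
    for emp_id, cell in rows:
        for part in cell.split(";"):
            result[emp_id].add(normalise(part))
    return result

def worked_on_any(rows, selected):
    """OR behaviour: employees with at least one of the selected dates."""
    return sorted(e for e, d in dates_by_employee(rows).items() if d & selected)

def worked_only_on(rows, selected):
    """AND behaviour: employees whose dates are exactly the selected ones."""
    return sorted(e for e, d in dates_by_employee(rows).items() if d == selected)

selected_dates = {(17, 8, 17), (18, 8, 17)}
print(worked_on_any(raw_rows, selected_dates))   # all 7 employees
print(worked_only_on(raw_rows, selected_dates))  # ['E']
```

Comparing whole sets handles both exclusion rules at once: a missing date and an extra date each make the sets unequal.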

Compute standard hours spent on weekdays by Tier, Week, Month and Country

{0 Comments}

Imagine a Sales dataset with the following columns - ID, Country, Start date, End date and Tier. Here's a snapshot of the table:

ID Country Start date End date Tier
33948 ES 25-Sep-17 28-Sep-17 3
19820 US 08-Oct-17 17-Oct-17 4
9118 US 27-Oct-17 03-Nov-17 2
1563 ES 02-Sep-17 07-Sep-17 2
11087 US 18-Oct-17 27-Oct-17 1
15057 US 05-Nov-17 13-Nov-17 4
13567 ES 18-Oct-17 26-Oct-17 4
22362 DE 28-Aug-17 30-Aug-17 4
31301 FR 04-Nov-17 10-Nov-17 2
22838 US 05-Sep-17 13-Sep-17 4

A second, two-column table lists the standard hours per day for each Tier. Here's a snapshot of the table:

Tier Daily load hh
1 0.7
2 0.72
3 0.8
4 1.2

The result which one expects from these two tables is shown below:

Month Week Tier 1 Tier 2 Tier 3 Tier 4 Grand total
9 35 x x x x x
9 36 x x x x x
9 37 x x x x x
9 38 x x x x x
9 39 x x x x x
Grand total - x x x x x

Slicer: US, UK, FR, IT, ES

Let's take a specific example to show the result expected for the US. From the last row of the Sales table, one can see that ID 22838 in the US spans the date range 05-Sep-17 to 13-Sep-17. These dates span week numbers 36 (from 05-Sep-17 to 08-Sep-17) and 37 (from 11-Sep-17 to 13-Sep-17) [the week numbers have been obtained by using Excel's WEEKNUM() function]. So, in the US, for Tier 4, in week 36, the standard hours worked on weekdays would be 4.8 (4 days * 1.2 hours per day). Likewise, in week 37, the standard hours worked on weekdays would be 3.6 (3 days * 1.2 hours per day).

I have solved this problem using Power Query a.k.a Get & Transform (Available under Data) in Excel 2016 and PowerPivot.  You may download my solution workbook from here.
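The arithmetic in the US example can be checked with a short Python sketch. It is not the Power Query solution from the workbook; it simply walks every day in the date range, keeps Monday to Friday, and groups the count by month and week. The helper `excel_weeknum` is a hypothetical name and assumes WEEKNUM()'s default behaviour (weeks start on Sunday and the week containing 1 January is week 1). The end date is assumed to be inclusive.

```python
from collections import defaultdict
from datetime import date, timedelta

DAILY_LOAD = {1: 0.7, 2: 0.72, 3: 0.8, 4: 1.2}  # Tier -> standard hours per day

def excel_weeknum(d):
    """Week number under the assumed WEEKNUM default: Sunday start,
    week 1 is the week that contains 1 January."""
    jan1 = date(d.year, 1, 1)
    offset = (jan1.weekday() + 1) % 7  # days from the preceding Sunday to 1 Jan
    return ((d - jan1).days + offset) // 7 + 1

def weekday_hours(start, end, tier):
    """Standard hours on Mon-Fri between start and end (both inclusive),
    keyed by (month, week number)."""
    weekday_counts = defaultdict(int)
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            weekday_counts[(day.month, excel_weeknum(day))] += 1
        day += timedelta(days=1)
    return {key: n * DAILY_LOAD[tier] for key, n in weekday_counts.items()}

# ID 22838: US, 05-Sep-17 to 13-Sep-17, Tier 4
hours = weekday_hours(date(2017, 9, 5), date(2017, 9, 13), tier=4)
for (month, week), value in sorted(hours.items()):
    print(month, week, round(value, 2))
# 9 36 4.8
# 9 37 3.6
```

Summing these dictionaries over all rows of a country and tier would give the cells of the expected Pivot Table.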

Determine cumulative interest payable on an annuity with varying time periods

{0 Comments}

Imagine a fixed monthly amount due to an Organisation for services rendered to various customers.  While this Organisation raises an invoice every month, not all customers pay their dues on time.  For unpaid dues, the Organisation charges its customers interest ranging from 3% to 9% per annum.  The objective is to determine the cumulative interest payable by each customer to the Organisation.

The base data looks like this

Client Monthly revenue Int. calculation start date Int. calculation end date Interest rate
Client A 33,967 01-Aug-16 25-Jul-17 9.00%
Client B 123 12-Sep-16 30-Nov-17 4.00%

Given the dataset above, the total interest payable by Client A is Rs. 16,237.20.  The calculation is shown below:

From To Days for which interest should be paid Principal Interest
02-Aug-16 31-Aug-16 328.00 33,967.00 2,745.26
01-Sep-16 30-Sep-16 298.00 33,967.00 2,494.17
01-Oct-16 31-Oct-16 267.00 33,967.00 2,234.71
01-Nov-16 30-Nov-16 237.00 33,967.00 1,983.62
01-Dec-16 31-Dec-16 206.00 33,967.00 1,724.16
01-Jan-17 31-Jan-17 175.00 33,967.00 1,464.70
01-Feb-17 28-Feb-17 147.00 33,967.00 1,230.34
01-Mar-17 31-Mar-17 116.00 33,967.00 970.88
01-Apr-17 30-Apr-17 86.00 33,967.00 719.79
01-May-17 31-May-17 55.00 33,967.00 460.33
01-Jun-17 30-Jun-17 25.00 33,967.00 209.24
01-Jul-17 25-Jul-17 - 33,967.00 -
Total       16,237.20
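The table can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, not the formulas in the workbook. Two things are inferred from the figures rather than stated in the post: each month's amount accrues interest from the "To" date of that month until the interest end date, and a year is taken as 365.25 days (that divisor is what reproduces 2,745.26 for 328 days). The function name is hypothetical.

```python
import calendar
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption inferred from the figures in the table

def cumulative_interest(monthly_amount, start, end, annual_rate):
    """Sum simple interest on each month's unpaid amount, from the end of
    that month (or the end date, if earlier) up to the end date."""
    total = 0.0
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        last_day = calendar.monthrange(year, month)[1]
        period_end = min(date(year, month, last_day), end)
        days_outstanding = (end - period_end).days
        total += monthly_amount * annual_rate * days_outstanding / DAYS_PER_YEAR
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return total

client_a = cumulative_interest(33967, date(2016, 8, 1), date(2017, 7, 25), 0.09)
print(f"{client_a:,.2f}")  # 16,237.20
```

The same call with Client B's row would give that client's total under the same assumptions.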

You may download my solution workbook from here. I have solved this problem using normal Excel formulas and PowerPivot.

Determine number of learners who have completed different stages of multiple online courses

{4 Comments}

Here is a sample dataset of learners who have cleared different stages of multiple courses on offer within an Organisation:

Learner Stage completed Course
Bill Stage 1 Public Speaking
Bill Stage 2 Public Speaking
Bill Stage 3 Public Speaking
Susan Stage 1 Effective Communication
Bob Stage 1 Public Speaking
Bob Stage 2 Public Speaking
Sheila Stage 1 Effective Communication
Sheila Stage 2 Effective Communication
Sheila Stage 3 Effective Communication
Frank Stage 1 Effective Communication
Frank Stage 2 Effective Communication
Henry Stage 1 Public Speaking
Henry Stage 2 Public Speaking
Bill Stage 1 Effective Communication
Bill Stage 2 Effective Communication

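From this sample dataset, one may want to know how many learners have reached each stage of each course, with every learner counted once under the highest stage completed.  The Grand Total row counts each learner once, under the highest stage completed in any course.  The expected result is shown below:

Row Labels Stage 1 Stage 2 Stage 3
Effective Communication 1 2 1
Public Speaking - 2 1
Grand Total 1 3 2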

In this workbook, I have shared 2 solutions - one using formulas and the other using Power Query and PowerPivot.
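The expected result is consistent with counting each learner under the highest stage completed, per course for the body of the table and across all courses for the Grand Total. That counting rule is read off the numbers, so treat the Python below as a sketch under that assumption rather than a transcription of either solution in the workbook.

```python
from collections import Counter

# (Learner, Stage completed, Course) rows from the sample dataset
records = [
    ("Bill", 1, "Public Speaking"), ("Bill", 2, "Public Speaking"),
    ("Bill", 3, "Public Speaking"), ("Susan", 1, "Effective Communication"),
    ("Bob", 1, "Public Speaking"), ("Bob", 2, "Public Speaking"),
    ("Sheila", 1, "Effective Communication"), ("Sheila", 2, "Effective Communication"),
    ("Sheila", 3, "Effective Communication"), ("Frank", 1, "Effective Communication"),
    ("Frank", 2, "Effective Communication"), ("Henry", 1, "Public Speaking"),
    ("Henry", 2, "Public Speaking"), ("Bill", 1, "Effective Communication"),
    ("Bill", 2, "Effective Communication"),
]

def highest_stage(rows, key):
    """Map each key (e.g. learner, or course + learner) to its highest stage."""
    best = {}
    for row in rows:
        k = key(row)
        best[k] = max(best.get(k, 0), row[1])
    return best

# Body of the table: highest stage per learner within each course
per_course = highest_stage(records, key=lambda r: (r[2], r[0]))
by_course = Counter((course, stage) for (course, _), stage in per_course.items())

# Grand Total row: highest stage per learner across all courses
grand_total = Counter(highest_stage(records, key=lambda r: r[0]).values())

print(by_course[("Effective Communication", 2)])  # 2
print(by_course[("Public Speaking", 1)])          # 0
print(dict(sorted(grand_total.items())))          # {1: 1, 2: 3, 3: 2}
```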

Compute an average for the same day in the past 3 years

{4 Comments}

Assume a simple two column dataset with dates in column A and numbers in column B. The dates in column A are from January 1, 2013 to December 31, 2016 and numbers in column B are for the period January 1, 2013 to December 31, 2015 (there are no numbers for January 1, 2016 to December 31, 2016).

The objective is to "Compute an average for each day of calendar year 2016. The average should be for the occurrence of that day in the previous 3 years". Here's an example:

1. January 1, 2016 was a Friday (the first Friday of 2016) and is in cell A1097
2. In cell B1097, the average should be computed as: Average of the "First Friday of each of the previous 3 years"
3. January 8, 2016 was a Friday (the second Friday of 2016) and is in cell A1104
4. In cell B1104, the average should be computed as: Average of the "Second Friday of each of the previous 3 years"
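The matching rule in the example above (n-th occurrence of a weekday in the year) can be sketched in Python as follows. This is an illustration of the date logic, not the PowerPivot measure. The function names and the sample numbers are hypothetical, and one behaviour is an assumption: when a previous year has no n-th occurrence of that weekday (a 53rd Friday, say), that year is simply left out of the average.

```python
from datetime import date, timedelta

def nth_weekday_in_year(year, weekday, n):
    """Date of the n-th occurrence of `weekday` (Monday=0) in `year`,
    or None if the year has no such occurrence."""
    jan1 = date(year, 1, 1)
    first = jan1 + timedelta(days=(weekday - jan1.weekday()) % 7)
    candidate = first + timedelta(weeks=n - 1)
    return candidate if candidate.year == year else None

def matching_days(d, years_back=3):
    """The same n-th weekday in each of the previous `years_back` years."""
    n = (d.timetuple().tm_yday - 1) // 7 + 1  # d is the n-th such weekday of its year
    matches = (nth_weekday_in_year(d.year - k, d.weekday(), n)
               for k in range(years_back, 0, -1))
    return [m for m in matches if m is not None]

def same_day_average(d, values):
    """Average of the numbers recorded on the matching days, if any."""
    found = [values[m] for m in matching_days(d) if m in values]
    return sum(found) / len(found) if found else None

# Hypothetical numbers for the first Friday of 2013, 2014 and 2015
sample = {date(2013, 1, 4): 10.0, date(2014, 1, 3): 20.0, date(2015, 1, 2): 30.0}
print(matching_days(date(2016, 1, 1)))
print(same_day_average(date(2016, 1, 1), sample))  # 20.0
```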

I have solved this problem with the help of PowerPivot. You may refer to my solution in this workbook.

Workaround to the problem of creating a Pivot chart after using “% of row total” calculation in a Pivot Table

{0 Comments}

Here is a dashboard created with a Pivot Table, a Pivot chart and slicers.  In the Pivot Table, the percentages have been computed using "% of row total".

The Pivot chart shows two columns per month - one for Complete and the other for Incomplete.  The objective is to show only the Complete column per month in the Pivot chart.  If one filters the Pivot Table only on Complete, then the Pivot chart shows an unexpected result - each column will go up to 100%.  This happens because all calculations in a Pivot Table happen only on the visible cells.  Once the Incomplete column is hidden, the "% of row total" calculation will return 100%, which in turn is reflected in the Pivot chart.

I have resolved this problem with the help of PowerPivot.  You may refer to my solution in this workbook.
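The 100% effect is easy to reproduce with plain arithmetic. The Python below uses made-up monthly counts (the real figures are in the workbook) and contrasts a percentage computed over the visible items only with one computed against the full row total. The second function shows one way to keep the denominator independent of the filter; it is not necessarily how the PowerPivot measure in the workbook is written.

```python
# Hypothetical monthly counts, for illustration only
counts = {
    "Jan": {"Complete": 30, "Incomplete": 10},
    "Feb": {"Complete": 12, "Incomplete": 12},
}

def pct_of_visible_total(row, visible):
    """Share of each visible item in the total of the visible items only,
    mimicking '% of row total' after a filter has been applied."""
    total = sum(row[item] for item in visible)
    return {item: row[item] / total for item in visible}

def pct_complete(row):
    """Share of Complete in the full row total, whatever is visible."""
    return row["Complete"] / sum(row.values())

for month, row in counts.items():
    filtered = pct_of_visible_total(row, ["Complete"])["Complete"]
    print(month, filtered, pct_complete(row))
# Jan 1.0 0.75
# Feb 1.0 0.5
```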

Perform an “Affinity analysis” to identify co-selling products

{2 Comments}

Affinity analysis encompasses a broad set of analytic techniques aimed at uncovering the associations and connections between specific objects: these might be visitors to a website (customers or audience), products in a store or content items on a media site. Of these, “market basket analysis” is perhaps the most common example. In a market basket analysis, one analyses combinations of products that frequently co-occur in transactions.  For example: of all the shoppers today, how many purchased both a school uniform and a school bag?  A retailer can use this information to:

1. Improve the customer shopping experience by rearranging the store layout (place products that co-occur together close to one another); and
2. Run a focused marketing campaign (e.g. target customers who buy a school uniform with offers on school bags, to encourage them to spend more on their shopping basket)

Online retailers and publishers can use this type of analysis to:

1. Inform the placement of content items on their media sites, or products in their catalogue
2. Drive recommendation engines (like Amazon’s customers who bought this product also bought these products…)
3. Deliver targeted marketing (e.g. emailing customers who bought specific products about other products, and offers on those products, that are likely to interest them)

Consider a dataset with four columns - Date, Order Numbers, Items sold and Item Description.  To simplify, let's ignore columns 1 and 4 for now.  One may want to know the answer to the following question:

For a particular item sold (say Item X), in how many orders (order frequency) were Item X and Item Y sold together, or in how many orders were Item X and Item Z sold together?

In the Excel workbook (download link below), number of rows in the Sales data sheet is 265,321 (file size is 11 MB) and the number of unique items sold is 23,201.  In Excel, one has to pair each of the 23,201 items with the other 23,200 items to know if there is any affiliation or not.  The sheer size of that dataset would make any formula unusable here.
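On a small scale, the counting itself is straightforward. The Python below is a sketch on hypothetical order lines, not the PowerPivot measure: it groups items by order number and, for a chosen anchor item, counts the orders in which each other item appears alongside it.

```python
from collections import defaultdict

# Hypothetical (order number, item) rows; an item can repeat within an order
sales = [
    (1, "X"), (1, "Y"), (1, "Z"),
    (2, "X"), (2, "Y"), (2, "X"),
    (3, "X"),
    (4, "Y"), (4, "Z"),
]

def co_sold_counts(rows, anchor):
    """For every item, count the distinct orders that contain both that
    item and the anchor item. The anchor's own entry is the number of
    orders containing the anchor."""
    items_by_order = defaultdict(set)
    for order, item in rows:
        items_by_order[order].add(item)  # a set ignores repeated lines
    counts = defaultdict(int)
    for items in items_by_order.values():
        if anchor in items:
            for item in items:
                counts[item] += 1
    return dict(counts)

result = co_sold_counts(sales, "X")
print(result["X"], result["Y"], result["Z"])  # 3 2 1
```

Because only orders containing the anchor are scanned, this avoids building every pair of the 23,201 items up front, which mirrors why the Pivot Table is filtered to one item at a time.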

I have solved this problem with the help of PowerPivot.  Please note that in the PowerPivot solution worksheet, I have filtered the Row labels on a specific Item number.  This is because if I clear the filter criteria to view all Item numbers, I get a message saying that the number of rows exceeds 1 million.  Also, since the calculations are very intensive, recalculation takes time when you change the criteria in the filter dropdown of the Pivot Table.  Therefore, once you change the filter criteria in the Pivot Table, wait for the Reading data counter (bottom right on the taskbar) to finish processing.

Interpretation

1. Cell C5 - 1,725 orders contained the Item number TNB521711234E.  This figure can be verified via the following steps:
a. On the Sales data worksheet, filter the item number column on TNB521711234E.
b. Copy the resulting rows of the Order number column to another worksheet
c. Select the data so copied and go to Data > Remove Duplicates
d. Count the ensuing rows
2. Cell C6 - Of the 1,725 orders containing Item number TNB521711234E, 540 contained Item number TNB52C1
3. Cell C7 - Of the 1,725 orders containing Item number TNB521711234E, 446 contained Item number EMT34

You may refer to my solution in this workbook.

Quantify combination courses opted by students

{0 Comments}

Assume a dataset with two columns which lists down the student names in column A and courses opted for in column B.  Since one student can opt for multiple courses and the same course can be taken up by multiple students, there can be repetitions in both columns.  The objective is to create a matrix like data structure (with courses appearing in both row and column labels) with numbers inside the matrix quantifying the "Number of students who opted for course A and C".  So, for all possible course combinations, one may want to know the number of students who opted for those combinations.

The description above can be extended to cases where buying behavior has to be analysed.  A store manager may want to know "How many people who buy Brand A also buy Brand B."

Here's a snapshot of the source data and expected result

The number 1 in cell H4 (and cell F6) means that there is only one student who opted for courses B and D.  Likewise, 3 in cell H5 (and cell G6) means that 3 students opted for courses C and D.
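The matrix can be sketched in Python as a count of course pairs per student. The enrolment rows below are hypothetical (the snapshot of the source data is not reproduced here) and were chosen so that one student takes courses B and D and three students take courses C and D, matching the two cells described above. This illustrates the counting only, not the DAX formula in the workbook.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (student, course) rows
enrolments = [
    ("Ann", "A"), ("Ann", "C"), ("Ann", "D"),
    ("Ben", "B"), ("Ben", "D"),
    ("Cy", "C"), ("Cy", "D"),
    ("Dee", "C"), ("Dee", "D"),
    ("Eve", "A"),
]

def pair_counts(rows):
    """Count, for every unordered pair of courses, the students who opted for both."""
    courses_by_student = defaultdict(set)
    for student, course in rows:
        courses_by_student[student].add(course)
    counts = Counter()
    for courses in courses_by_student.values():
        for pair in combinations(sorted(courses), 2):
            counts[pair] += 1
    return counts

pairs = pair_counts(enrolments)
courses = sorted({course for _, course in enrolments})

# Print the symmetric matrix; the diagonal is left as "-"
for row in courses:
    cells = ["-" if row == col else str(pairs[tuple(sorted((row, col)))])
             for col in courses]
    print(row, " ".join(cells))
```

Storing each pair once in sorted order and looking it up from both directions is what makes the matrix symmetric, just as cells H4 and F6 hold the same number.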

You may refer to my Power Query and PowerPivot solution in this workbook.  Power Query has been used for generating a dynamic list of Courses and Power Pivot has been used for writing the DAX formula for quantifying within the matrix.

Identify buy and sell break points

{0 Comments}

Assume a two column dataset with Date in the first column and Price in the second one.  The purpose is to identify times to buy and sell - buying would be just after the lowest low is confirmed and selling would be before or just after the highest high is in place. Confirmation is achieved through crossover of moving averages. This data is being used for back-testing buy and sell criteria.

Snapshot of base data

Snapshot of expected result

The Lowest Low is the lowest price that occurs before the next Highest High.  The Highest High is the highest price that occurs before the next Lowest Low.  2.77 is the lowest low after the highest high of 3.69 and 3.23 is the highest high after the lowest low of 2.77.

You may refer to my solution in this workbook.

Rank numbers in a range after satisfying conditions

{0 Comments}

Assume a five column dataset - ID, Age, Gender, Time and Class.  For chosen IDs, the objective is to:

1. Assign a Rank (in ascending order of time i.e. lowest time will be rank 1 and so on) to each ID
2. Determine the overall place of each ID - Count of unique time entries less than or equal to the chosen ID's time entry

These can be computed with the VLOOKUP(), RANK(), FREQUENCY(), INDIRECT() functions and array formulas.  You may refer to range H3:K8 of the Sample worksheet.  So far so good.

What adds to the problem is meeting the objectives outlined above after satisfying additional conditions.  For example, one may want to give conditions such as Age between 20 and 35 and colours as Orange and Yellow.  Computing the Rank and the Overall place after satisfying these conditions makes the formulas fairly complex.
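The two measures, with conditions applied first, can be sketched in Python. The rows are hypothetical, and two points are assumptions: the colours are taken to be values of the Class column, and both Rank and Overall place are computed over the rows that pass the conditions. Ties share a rank, as they would with Excel's RANK().

```python
# Hypothetical rows: (ID, Age, Gender, Time, Class)
rows = [
    (101, 25, "M", 12.5, "Orange"),
    (102, 31, "F", 11.0, "Yellow"),
    (103, 40, "M", 10.0, "Orange"),
    (104, 22, "F", 12.5, "Yellow"),
    (105, 28, "M", 13.0, "Green"),
    (106, 34, "F", 11.0, "Orange"),
]

def rank_and_place(rows, keep):
    """Return {ID: (rank, overall place)} for the rows accepted by `keep`.

    rank  = 1 + number of qualifying times strictly lower (ascending rank)
    place = number of unique qualifying times less than or equal to this time
    """
    times = [r[3] for r in rows if keep(r)]
    unique_times = set(times)
    result = {}
    for r in rows:
        if keep(r):
            t = r[3]
            rank = 1 + sum(1 for x in times if x < t)
            place = sum(1 for x in unique_times if x <= t)
            result[r[0]] = (rank, place)
    return result

def condition(r):
    """Age between 20 and 35, Class either Orange or Yellow."""
    return 20 <= r[1] <= 35 and r[4] in {"Orange", "Yellow"}

result = rank_and_place(rows, condition)
print(result)
# {101: (3, 2), 102: (1, 1), 104: (3, 2), 106: (1, 1)}
```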

I have been able to solve this problem with the help of PowerPivot.  You may download my solution workbook from this link.