# Tags: FILTER

Imagine a fixed monthly amount due to an Organisation for services rendered to various customers.  While an invoice is raised every month by this Organisation, not all pay up the dues on time.  For unpaid dues, the Organisation charges its client interest ranging from 3% to 9% per annum.  The objective is to determine cumulative interest payable by various customers to Organisation X.

The base data looks like this

 Client Monthly revenue Int. calculation start date Int. calculation end date Interest rate Client A 33,967 01-Aug-16 25-Jul-17 9.00% Client B 123 12-Sep-16 30-Nov-17 4.00%

Given the dataset above, the total interest payable by Client A is Rs. 16,237.20.  The calculation is shown below:

 From To Days for which interest should be paid Principal Interest 02-Aug-16 31-Aug-16 328.00 33,967.00 2,745.26 01-Sep-16 30-Sep-16 298.00 33,967.00 2,494.17 01-Oct-16 31-Oct-16 267.00 33,967.00 2,234.71 01-Nov-16 30-Nov-16 237.00 33,967.00 1,983.62 01-Dec-16 31-Dec-16 206.00 33,967.00 1,724.16 01-Jan-17 31-Jan-17 175.00 33,967.00 1,464.70 01-Feb-17 28-Feb-17 147.00 33,967.00 1,230.34 01-Mar-17 31-Mar-17 116.00 33,967.00 970.88 01-Apr-17 30-Apr-17 86.00 33,967.00 719.79 01-May-17 31-May-17 55.00 33,967.00 460.33 01-Jun-17 30-Jun-17 25.00 33,967.00 209.24 01-Jul-17 25-Jul-17 - 33,967.00 - Total 16,237.20

You may download my solution workbook with from here. I have solved this problem using normal Excel formulas and the PowerPivot.

Here is a sample dataset of learners who have cleared different stages of multiple courses on offer within an Organisation:

 Learner Stage completed Course Bill Stage 1 Public Speaking Bill Stage 2 Public Speaking Bill Stage 3 Public Speaking Susan Stage 1 Effective Communication Bob Stage 1 Public Speaking Bob Stage 2 Public Speaking Sheila Stage 1 Effective Communication Sheila Stage 2 Effective Communication Sheila Stage 3 Effective Communication Frank Stage 1 Effective Communication Frank Stage 2 Effective Communication Henry Stage 1 Public Speaking Henry Stage 2 Public Speaking Bill Stage 1 Effective Communication Bill Stage 2 Effective Communication

From this sample dataset, one may want to know how many participants have completed each stage of these multiple courses.  The expected result is shown below:

 Row Labels Stage 1 Stage 2 Stage 3 Effective Communication 1 2 1 Public Speaking 2 1 Grand Total 1 3 2

In this workbook, I have shared 2 solutions - one using formulas and the other using the Power Query & PowerPivot.

Assume a simple two column dataset with dates in column A and numbers in column B. The dates in column A are from January 1, 2013 to December 31, 2016 and numbers in column B are for the period January 1, 2013 to December 31, 2015 (there are no numbers for January 1, 2016 to December 31, 2016).

The objective is to "Compute an average for each day of calendar year 2016. The average should be for the occurrence of that day in the previous 3 years". Here's an example:

1. January 1, 2016 was a Friday (the first Friday of 2016) and is in cell A1097
2. In cell B1097, the average should be computed as: Average of the "First Friday of each of the previous 3 years"
3. January 8, 2016 was a Friday (the second Friday of 2016) and is in cell A1104
4. In cell B1104, the average should be computed as: Average of the "Second Friday of each of the previous 3 years"

I have solved this problem with the help of the PowerPivot. You may refer to my solution in this workbook.

Affinity analysis encompasses a broad set of Analytic techniques aimed at uncovering the associations and connections between specific objects: these might be visitors to a website (customers or audience), products in a store or content items on a media site. Of these, “market basket analysis” is perhaps the most common example. In a market basket analysis, one analyses combinations of products that frequently co-occur in transactions.  For e.g., Of all the shoppers today, how many purchased a school uniform and a school bag.  A retailer can use this information to:

1. Improve the customer shopping experience by rearranging the store layout (place products that co-occur together close to one another); and
2. Run a focused marketing campaign (e.g. target customers who buy a school uniform with offers on school bags, to encourage them to spend more on their shopping basket)

Online retailers and publishers can use this type of analysis to:

1. Inform the placement of content items on their media sites, or products in their catalogue
2. Drive recommendation engines (like Amazon’s customers who bought this product also bought these products…)
3. Deliver targeted marketing (e.g. emailing customers who bought products specific products with other products and offers on those products that are likely to be interesting to them)

Consider a dataset with four columns - Date, Order Numbers, Items sold and Item Description.  To simplify, let's ignore columns 1 and 4 for now.  One may want to know the answer to the following question:

For a particular item sold (say Item X), in how many orders (order frequency) were items X and Item Y were sold together OR in how many orders (order frequency) were items X and Item Z together.

In the Excel workbook (download link below), number of rows in the Sales data sheet is 265,321 (file size is 11 MB) and the number of unique items sold is 23,201.  In Excel, one has to pair each of the 23,201 items with the other 23,200 items to know if there is any affiliation or not.  The sheer size of that dataset would make any formula unusable here.

I have solved this problem with the help of the PowerPivot.  Please note that in the PowerPivot solution worksheet, I have filtered the Row labels on a specific Item number.  This is so because if I attempt to clear the filter criteria to view all Item numbers, then I get a message saying that the number of rows exceeds 1 million.  Also, since the calculations are very intensive, recalculation takes time when you change the criteria in the filter dropdown of the Pivot Table.  Therefore, once you change the filter criteria in the Pivot Table, wait for the Reading data counter (bottom right on the taskbar) to finish processing.

Interpretation

1. Cell C5 - 1,725 orders contained the Item number TNB521711234E.  This figure can be verified via the following steps:
a. On the Sales data worksheet, filter the item number column on TNB521711234E.
b. Copy the resulting rows of the Order number column to another worksheet
c. Select the data so copied and go to Data > Remove Duplicates
d. Count the ensuing rows
2. Cell C6 - Of the 1725 orders containing Item number TNB521711234E, 540 contained Item number TNB52C1
3. Cell C7 - Of the 1725 orders containing Item number TNB521711234E, 446 contained Item number EMT34

You may refer to my solution in this workbook.

Assume there are four interrelated tables. One may want to create a pivot that allows one to filter data by using the slicers. Data should be filtered by the following interdependent slicers selections: Customer, Country and segment.  The logic behind the pivot when using the slicers shall be as follows:

1. Feature N is only shown if relevant to Customer X in Segment Y and Competitors do not possess Feature N
2. Competitor X is only shown if Competitor X exists and is active in Country Z and is relevant to Customer Y in Country Z

So after slicer selections are made, the idea is to display all features that one can offer and are relevant to the respective customer in the respective segment and country, regardless of whether the competitors can offer them or not.  So if one competes with competitor 1 in a specific project and offers features 1, 3, 7, offering the very same features to our shared customer does not make sense.  The customer won't see a benefit in choosing me over competitor 1.

Here's an elaborate example:

1. If one selects Customer 1/Segment A/Country 1 from the three slicers, then the Pivot Table should display as follows:

1. Row Labels - Display features in pivot if they are implemented or relevant.  The ones in Blue are implemented and the ones in green are relevant.  The pivot now shows that competitor 1 does not have features 2 and 9 which. Thus one's sales pitch will focus on offering features 2 and 9.  This section should also show data for Feature benefit calculation.

2. Column labels - Competitor 1 and 2 are displayed in the Pivot Table because they are both relevant in Country 1.

3. Competitor has/does not have feature (Value area section) - The following competitors have the same features I can offer my customer 1 in segment A:

• Competitor 1 has features           1, 3, 7
• Competitor 2 has features           3

Therefore they are marked with an “x” whenever they have the features.

The other competitors also have features that I offer to customer 1 in segment A, but since they are not active in Country 1, they are automatically not displayed when using the slicers.

You may refer to my solution in this workbook.

You may also view a video of my Power Query solution here:

Assume a five column dataset - ID, Age, Gender, Time and Class.  For chosen ID's, the objective is to:

1. Assign a Rank (in ascending order of time i.e. lowest time will be rank 1 and so on) to each ID
2. Determine the overall place of each ID - Count of unique time entries lesser than equal to the chosen ID' time entry

These can be computed with the VLOOKUP(), RANK(), FREQUENCY(), INDIRECT() functions and array formulas.  You may refer to range H3:K8 of the Sample worksheet.  So far so good.

What adds to the problem is to meet the objectives outlined above after satisfying additional conditions.  For e.g., one may want to give conditions such as Age between 20 and 35 and colours as Orange and Yellow.  Carrying out computations for ranking and Overall place after satisfying these conditions will make the formulas fairly complex.

I have been able to solve this problem with the help of the PowerPivot.  You may download my solution workbook from this link.

Assume a three column dataset which has Year, Company ID and Cash flows.  For each Company, there are cash flows for multiple years.  So for Company ID A001, there are 7 rows, one each for 2010 to 2004 and cash flows appearing in a third column.  Let's assume the number of rows are 750,000.

The task is to compute the year on year growth rate in a fourth column.  While this problem can easily be solved by writing a formula in a fourth column, copying that formula all the way down to 750,000 rows will be time consuming and processor intensive.

I have been able to solve this problem using PowerPivot.  You may download the workbook from here.

You may refer to related questions at this link

Assume a three column dataset showing Audit ID, Date of receipt of audit mandate and Date of audit completion.  There are other columns as well but they are not important for our Analysis.  One may want to compute the following month wise:

1. Which (Audit ID) are the audits pending at the end of every month; and
2. When (Date of receipt of audit mandate) was the mandate for these pending audits received; and
3. Ageing of these pending audits i.e. this would be computed as the last date of the month less Date of receipt of audit mandate

Here's an example:

In January 2014, there are a total of 10 audits reports which were received (Filter "Date of receipt of audit mandate" column on January 2014).  Of these 10 audits, 4 were completed in January 2014 (Filter "Date of audit completion" column on January 2014) itself and therefore there are 6 pending audits.  To this figure of 6, we need to add the audits pending from previous months.  If one filters column "Date of receipt of audit mandate" column on Oct-Dec 2013 and "Date of audit completion" column on dates after January 2014, 8 rows will appear.  This means that there are 8 audits which were received before 1 January 2014 but were completed only after 31 January 2014.  So the total number of pending audits as at 31 January 2014 are 8+6=14.  This task needs to be carried out for all months.

You may refer to my solution in this workbook.

Assume two databases:

1. One showing employee headcount (one row per employee) which has all employee details such as Name, ID, Date of Joining, Supervisor name, Department etc. (Range A1:R781 of Source worksheet)

2. The other showing data for employees who resigned. (Range U1:Z36 of Source worksheet)

The task is to compute the attrition rate for selected Group and selected months.  Groups and months will be selected from slicers.

In the attached workbook, one can see the aborted Pivot Table attempt and the successful PowerPivot solution.  Refer to cell I25 of Abortive Pivot Table attempt worksheet to see how attrition rate should be computed.

Visualise Sales Data of a Non-Alcoholic Beverage Company with basic columnar information such as Date of Sale, Time of Sale, Brand, Stock Keeping Unit (SKU), State, City, Quantity sold, Unit Price and Salesman Code.  In this sales dataset, each line item represents one visit for one SKU.  If nothing is sold in a certain visit, then the SKU column displays No Sale.  So effectively there is a line item for each visit whether or not something is sold in that visit.

From this simple Sales dataset, here are a few questions which one may need to find answers to:

1. How did the Company perform (in both years 2013 and 2014) on two of the most critical Key Performance Indicators (KPI's) - Quantity sold and Number of Visits.  Also, what is the month wise break up of these two KPI's.

2. Study and slice the two KPI's from various perspectives such as "Type of Outlet visited", "Type of Visit" - Scheduled or Unscheduled, "Day of week", "Brand", "Sub brand".

3. Over a period of time, how did various SKU's fair on the twin planks of "Effort" i.e. Number of visits YTD and "Business Generated" i.e. Quantity sold YTD.

4. Analyse the performance of the Company on both KPI's:
a. During Festive season/Promotional periods/Events; and
b. During different months of the same year; and
c. During same month of different years; and
d. Quarter to Date

5. "Complimentary Product sold Analysis" - Analysis displayed on online retailers such as Amazon.com - "Customers who bought this also bought this".  So in the Sales dataset referred to above, one may want to know "In this month, outlets which bought this SKU, also bought this much quantity of these other SKU's."

6. "Outlet Rank slippage" - Which are the Top 10 Outlets in 2013 and what rank did they maintain in 2014.  What is the proportion of quantity sold by each of the Top 10 outlets of 2013 to:
a. Total quantity sold by all Top 10 outlets in 2013; and
b. Total quantity sold by all outlets in 2013

7. In any selected month, which new outlets did the Company forge partnerships with

8. Which employees visited their assigned outlets once in two or three weeks instead of visiting them once every week (as required by Management).

9. Which outlets were not visited at all in a particular month

10. Business generated from loyal Customers - Loyal Customers are those who transacted with the Company in a chosen month and in the previous 2 months.

These are only a few of my favourite questions which I needed answers to when I first reviewed this Sales Data.  Using Microsoft Excel's Business Intelligence Tools (Power Query, PowerPivot and Power View), I could answer all questions stated above and a lot more.

You may watch a short video of my solution here