Tags: IF

Quantify combination courses opted by students

{0 Comments}

Assume a dataset with two columns which lists down the student names in column A and courses opted for in column B.  Since one student can opt for multiple courses and the same course can be taken up by multiple students, there can be repetitions in both columns.  The objective is to create a matrix like data structure (with courses appearing in both row and column labels) with numbers inside the matrix quantifying the "Number of students who opted for course A and C".  So, for all possible course combinations, one may want to know the number of students who opted for those combinations.

The description above can be extended to cases where buying behavior has to be analysed.  A sore manager may want to know "How many people who buy Brand A also buy Brand B."

Here's a snapshot of the source data and expected result

The number 1 in cell H4 (and cell F6) means that there is only one student who opted for courses B and D.  Likewise, 3 in cell H5 (and cell G6) means that 3 students opted for courses C and D.

You may refer to my Power Query and PowerPivot solution in this workbook.  Power Query has been used for generating a dynamic list of Courses and Power Pivot has been used for writing the DAX formula for quantifying within the matrix.

Identify buy and sell break points

{0 Comments}

Assume a two column dataset with Date in the first column and Price in the second one.  The purpose is to identify times to buy and sell - buying would be just after the lowest low is confirmed and sell before or just after the highest high is in place. Confirmation is achieved through crossover of moving averages. This data is being used in back testing buy and sell criteria.

Snapshot of base data

Snapshot of expected result

The Lowest Low is the lowest price that occurs before the next Highest High.  The Highest High is the highest price that occurs before the next Lowest Low..  2.77 is the lowest low after the highest high of 3.69 and 3.23 is highest high after the lowest low of 2.77.

You may refer to my solution in this workbook.

Perform a Competitor, Feature and Customer Analysis with the PowerPivot

{2 Comments}

Assume there are four interrelated tables. One may want to create a pivot that allows one to filter data by using the slicers. Data should be filtered by the following interdependent slicers selections: Customer, Country and segment.  The logic behind the pivot when using the slicers shall be as follows:

1. Feature N is only shown if relevant to Customer X in Segment Y and Competitors do not possess Feature N
2. Competitor X is only shown if Competitor X exists and is active in Country Z and is relevant to Customer Y in Country Z

So after slicer selections are made, the idea is to display all features that one can offer and are relevant to the respective customer in the respective segment and country, regardless of whether the competitors can offer them or not.  So if one competes with competitor 1 in a specific project and offers features 1, 3, 7, offering the very same features to our shared customer does not make sense.  The customer won't see a benefit in choosing me over competitor 1.

Here's an elaborate example:

1. If one selects Customer 1/Segment A/Country 1 from the three slicers, then the Pivot Table should display as follows:

1. Row Labels - Display features in pivot if they are implemented or relevant.  The ones in Blue are implemented and the ones in green are relevant.  The pivot now shows that competitor 1 does not have features 2 and 9 which. Thus one's sales pitch will focus on offering features 2 and 9.  This section should also show data for Feature benefit calculation.

2. Column labels - Competitor 1 and 2 are displayed in the Pivot Table because they are both relevant in Country 1.

3. Competitor has/does not have feature (Value area section) - The following competitors have the same features I can offer my customer 1 in segment A:

  • Competitor 1 has features           1, 3, 7
  • Competitor 2 has features           3

Therefore they are marked with an “x” whenever they have the features.

The other competitors also have features that I offer to customer 1 in segment A, but since they are not active in Country 1, they are automatically not displayed when using the slicers.

You may refer to my solution in this workbook.

You may also view a video of my Power Query solution here:

Analyse all possible combinations of cheques received and identify the combination which gives maximum benefit to the Customer

{11 Comments}

Scenario 1

In a Stock Broking business (Firm), Customers deposit margin money through cheques. Although cheques are received from customers at branches, clearing (realization) of the same normally takes 3-4 days.  On one hand, the Firm wishes to allow trading to the customers against the deposited cheques from the day it receives the cheque, on other hand there is a risk associated with this  i.e cheque may get dishonored and the Firm may incur huge financial losses because of allowing customers to trade in anticipation of the cheque realization. Hence, there is a trade off between customer centricity & Financial Risk management.

The Firm may decide to take a calculated risk in order to retain its customer centric approach (it is a delightful experience for customers if they get trading benefit against the deposited cheque instantly before realization). The Firm may decide to give the benefit of cheque which are below a set threshold amount (say 50,000) instantly and for cheques which are above threshold amount the credit will be given only after realization. Sometimes there are  multiple unrealized cheques deposited by one client but the Firm needs to cap the total benefit of unrealized cheques to 50,000 (partial benefit against any cheque is not possible i.e. either a cheque will be considered in full or not considered) E.g. if a customer deposits 3 cheques of 40,000 each then also the total instant benefit against all unrealized cheques shall be capped at 50,000 - hence only one cheques of 40,000 will be credited to customers ledger before realization.

Now the question here is with respect to selection of cheques to give best results i.e. maximum total below 50,000.  Consider the following example:

Cheque1: 55,000
Cheque2: 44,000
Cheque3: 28,000
Cheque4: 7,000
Cheque5: 6,500
Cheque6: 4,000
Cheque7: 3,000
Cheque8: 500

The Firm wants to identify the "Combination of cheques" which gives the maximum benefit to the Customer subject to a limit of 50,000.  The result should be cheques 3-8.  They add up to 49,000.

Here's another example:

Cheque1: 55,000
Cheque2: 45,000
Cheque3: 25,000
Cheque4: 5,000

The result here should be cheque 2 and 4.  They add up to 50,000.

Here are my thoughts:

1. I have assumed that the total number of cheques received from customer will be capped to 8.  This limit can be relaxed but processing will then take time.

2. Create all possible combinations of cheques which can be credited.  So, since no partial credit is allowed, the cheque can either be accepted in full or be rejected.  Hence the possible values can be 2 i.e. 1 (Accepted) or 0 (Rejected).

3. So if there are 8 cheques received from a certain customer, then there will be 2^8 i.e. 256 possible combinations.  This can also be computed by using the =PERMUTATIONA(2,8) formula.  I'd like to than Saurabh Gupta for sharing the formula to generate all these combinations.

4. After creating all possible combinations, add the cheque amount for each combination.

5. Scan this total column and highlight the one which is the largest value <= the threshold value i.e. 50,000.

You may refer to my solution in the Solution worksheet of this workbook.

Scenario 2

Here's another example.  Say, one has a list of numbers as shown below:

1072.94
1643.02
1248.09
2167.34
276.85
322.05
910.22
1146.39
56.50

Now one may want to know which combination of numbers adds up to or comes closest to 5115.52 (our lookup value).  If none of the combinations adds up exactly to the lookup value, then return the combination which adds to the largest value less then equal to the lookup value.  In the example above, the second, third and fourth add up to 5114.95.

You may find my solution in this workbook.

Rank numbers in a range after satisfying conditions

{0 Comments}

Assume a five column dataset - ID, Age, Gender, Time and Class.  For chosen ID's, the objective is to:

1. Assign a Rank (in ascending order of time i.e. lowest time will be rank 1 and so on) to each ID
2. Determine the overall place of each ID - Count of unique time entries lesser than equal to the chosen ID' time entry

These can be computed with the VLOOKUP(), RANK(), FREQUENCY(), INDIRECT() functions and array formulas.  You may refer to range H3:K8 of the Sample worksheet.  So far so good.

What adds to the problem is to meet the objectives outlined above after satisfying additional conditions.  For e.g., one may want to give conditions such as Age between 20 and 35 and colours as Orange and Yellow.  Carrying out computations for ranking and Overall place after satisfying these conditions will make the formulas fairly complex.

I have been able to solve this problem with the help of the PowerPivot.  You may download my solution workbook from this link.

Compute year on year growth in a Pivot Table

{0 Comments}

Assume a three column dataset which has Year, Company ID and Cash flows.  For each Company, there are cash flows for multiple years.  So for Company ID A001, there are 7 rows, one each for 2010 to 2004 and cash flows appearing in a third column.  Let's assume the number of rows are 750,000.

The task is to compute the year on year growth rate in a fourth column.  While this problem can easily be solved by writing a formula in a fourth column, copying that formula all the way down to 750,000 rows will be time consuming and processor intensive.

I have been able to solve this problem using PowerPivot.  You may download the workbook from here.

You may refer to related questions at this link

1. Computing growth % inside a pivot table
2. Compute Pro rata growth rate within a Pivot Table

Compute month wise pending audits

{0 Comments}

Assume a three column dataset showing Audit ID, Date of receipt of audit mandate and Date of audit completion.  There are other columns as well but they are not important for our Analysis.  One may want to compute the following month wise:

1. Which (Audit ID) are the audits pending at the end of every month; and
2. When (Date of receipt of audit mandate) was the mandate for these pending audits received; and
3. Ageing of these pending audits i.e. this would be computed as the last date of the month less Date of receipt of audit mandate

Here's an example:

In January 2014, there are a total of 10 audits reports which were received (Filter "Date of receipt of audit mandate" column on January 2014).  Of these 10 audits, 4 were completed in January 2014 (Filter "Date of audit completion" column on January 2014) itself and therefore there are 6 pending audits.  To this figure of 6, we need to add the audits pending from previous months.  If one filters column "Date of receipt of audit mandate" column on Oct-Dec 2013 and "Date of audit completion" column on dates after January 2014, 8 rows will appear.  This means that there are 8 audits which were received before 1 January 2014 but were completed only after 31 January 2014.  So the total number of pending audits as at 31 January 2014 are 8+6=14.  This task needs to be carried out for all months.

You may refer to my solution in this workbook.

Consider a Pivot Table Value field column as a criteria for computing another Value Field column

{8 Comments}

Assume a simple three column dataset showing hours worked by different machine on different dates.  So column A is Date, column B is Machine Name and column C is hours worked.  There are duplicates appearing in column A and B .  Blanks in column C depict machine idle time.

The task is to create a simple three column dataset showing all unique Machine names in the first column, Last day on which the machine worked in the second column and hours worked on the last day in the third column.

This problem can be solved by using formulas (Refer first worksheet of the workbook) but if one has to use a Pivot Table, then there would be a few problems.

1. The Grand Total for the Date Field should be blank because on cannot determine the Last day on which the machine worked across different machine types.  A conventional Pivot Table shows the Maximum of all dates appearing in the Date Field.

2. The Grand Total for the Hours worked Field should be a summation of the total hours worked on last day across all machine types.  A conventional Pivot Table shows the Maximum of all hours worked appearing in the Hours worked Field.

3. The biggest problem of them all is that there is no way to give a criteria as the Last day for that machine for computing another Field in the Pivot Table.  Please refer the file for a better understanding.

This problem can be solved using the PowerPivot.  You may refer to my solution in this workbook.

Sales data modelling and interactive visualisations

{30 Comments}

Visualise Sales Data of a Non-Alcoholic Beverage Company with basic columnar information such as Date of Sale, Time of Sale, Brand, Stock Keeping Unit (SKU), State, City, Quantity sold, Unit Price and Salesman Code.  In this sales dataset, each line item represents one visit for one SKU.  If nothing is sold in a certain visit, then the SKU column displays No Sale.  So effectively there is a line item for each visit whether or not something is sold in that visit.

From this simple Sales dataset, here are a few questions which one may need to find answers to:

1. How did the Company perform (in both years 2013 and 2014) on two of the most critical Key Performance Indicators (KPI's) - Quantity sold and Number of Visits.  Also, what is the month wise break up of these two KPI's.

2. Study and slice the two KPI's from various perspectives such as "Type of Outlet visited", "Type of Visit" - Scheduled or Unscheduled, "Day of week", "Brand", "Sub brand".

3. Over a period of time, how did various SKU's fair on the twin planks of "Effort" i.e. Number of visits YTD and "Business Generated" i.e. Quantity sold YTD.

4. Analyse the performance of the Company on both KPI's:
a. During Festive season/Promotional periods/Events; and
b. During different months of the same year; and
c. During same month of different years; and
d. Quarter to Date

5. "Complimentary Product sold Analysis" - Analysis displayed on online retailers such as Amazon.com - "Customers who bought this also bought this".  So in the Sales dataset referred to above, one may want to know "In this month, outlets which bought this SKU, also bought this much quantity of these other SKU's."

6. "Outlet Rank slippage" - Which are the Top 10 Outlets in 2013 and what rank did they maintain in 2014.  What is the proportion of quantity sold by each of the Top 10 outlets of 2013 to:
a. Total quantity sold by all Top 10 outlets in 2013; and
b. Total quantity sold by all outlets in 2013

7. In any selected month, which new outlets did the Company forge partnerships with

8. Which employees visited their assigned outlets once in two or three weeks instead of visiting them once every week (as required by Management).

9. Which outlets were not visited at all in a particular month

10. Business generated from loyal Customers - Loyal Customers are those who transacted with the Company in a chosen month and in the previous 2 months.

These are only a few of my favourite questions which I needed answers to when I first reviewed this Sales Data.  Using Microsoft Excel's Business Intelligence Tools (Power Query, PowerPivot and Power View), I could answer all questions stated above and a lot more.

You may watch a short video of my solution here

Perform different calculations in the Subtotal/Grand Total column of a Pivot Table

{33 Comments}

Visualise a Pivot Table with a few Fields dragged in the Report filter, Row labels and Value Area section.  In the Column labels are two fields, Month and then Year - so in the column labels, for every month, there is data for three years 2005, 2006 and 2007.  For some months, there is data for two years only 2005 and 2006.  In the Value area section are fields such as Net Amount, Quantity, Bonus etc. and the summarization function applied to them is SUM.  There is no complication in creating the Pivot Table described above.

The actual requirement is to customise the Subtotal column of the Pivot Table as follows:

1. For the monthly subtotals, the Net Amount and Bonus figure are to computed as a difference of 2005 and 2006 i.e. SUM of quantity of 2005 - SUM of quantity of 2006.  The Grand total column should be a a summation of individual subtotals.
2. Average Selling price for every year is to be computed as as Net Amount/(Ttl Bonus + Quantity).  For the monthly subtotals, the figure is to be computed as

=(Net Amount of 2005/((Bonus of 2005+Quantity of 2005)) - (Net Amount of 2006/((Bonus of 2006+Quantity of 2006))

The Grand Total column is to be left blank for Average Selling Price,

As you can observe, the subtotal column (for the months) will have different formulas running for different Fields.

A conventional Pivot Table does not allow one to have custom formulas in the Subtotal columns.  I have been able to resolve this problem by using the free Power Pivot add-in from Microsoft for Excel 2010 and higher versions.

You may refer to my solution in this workbook.

Here's another example.  Assume a dataset with three columns - Date, Manager and Amount.  There are repetitions in the Data and Manager column.  One may want to know the maximum amount per month per Manager.  While this is easy to accomplish with a Conventional Pivot Table as well, the problem occurs in the Subtotal/Grand Total cells of a Pivot Table.  The Subtotal/Grand Total cells assume the same function as has been used in the "Summarise Values field by".  So, while in the "Summarise values fields by" section, one may want to use the Maximum function, in the subtotal cell, one may want to use the sum function.

You may refer to my solution in this workbook.