Tags: VALUES

Show sales only for corresponding months in prior years

{0 Comments}

Refer to this simple Sales dataset

untitled

The objective is to create a simple matrix with months in the row labels, years in the column labels and sales figures in the value area section.  The twist in the question is that for years prior to the current year (2018 in this dataset), sales should only appear till the month for which there is data for the current year.  For e.g., for 2018, data is only till Month 4 and therefore for prior years as well, data should only appear till Month 4.  As and when Sales data gets added below row 17, data for prior years should also go up to that month.

The expected result is

untitled1

You may download my PBI file from here. The same solution can be obtained in Excel as well (using Power Query and PowerPivot).

Filtering on 2 date fields within one Table

{0 Comments}

This table contains a list of all the inspections created and completed within different time periods.

untitled

The objective is to create two Tables from this single table - one showing the Accounts created within the chosen time period and another showing the those that were closed within the same time period.  Here are screenshots of the expected results.

untitled1 untitled2 untitled3 untitled4

You may download my PowerBI desktop solution workbook from here.  The same solution can be obtained in Excel as well (using Power Query and PowerPivot).

Compute transaction fee based on a tiered pricing model

{0 Comments}

Consider a simple dataset as shown below:
untitled
For each tier, the tier rate is incrementally applied to the volume within the tier volume range.  Given the following transaction volumes, one may want to compute the transaction fee
untitled1
The expected result is shown below
untitled2

As one can observe, for a transaction value of 400,000, the fee has been computed as 3% on the first 100,000 and 2.5% on the next 300,000. You may download my solution workbook from here. In the file, I have shared 2 solutions - a conventional formula based one and a PowerPivot solution.

I have also solved a similar question here.

Perform an aggregation on Top x items after satisfying certain conditions

{0 Comments}

Visualise a 5 column dataset as show below.  This is a very small sample of the actual dataset.  It shows the date on which supplies were received for each item from Vendors and whether those supplies had errors in them.  Finally those identified errors have been bucketed into relevant categories.  The Item ID# is a code assigned by the Client Organisation.  The format of the ID# is X [Last two digits of year of receipt] [Four digit Number].  For items received from a certain vendor, the last four digits of any item will always be in ascending order of the Date received.  In other words. for items received from a certain vendor, the last 4 digits of an item received on 13 January 2017 will always be greater than the last 4 digits of an item received on 11 January 2017 from the same vendor.  There will never be repetitions in the Item ID# column.

Vendor Name Item ID# Date Received Error with Item? Error Category
California X170016 1-16-2017 No
California X170014 1-13-2017 Yes Labeling Error
California X170015 1-13-2017 Yes Packaging Error
California X170008 1-9-2017 Yes Quality Issue
California X170003 1-2-2017 No
California X160645 12-26-2016 Yes Packaging Error
California X160646 12-26-2016 No
California X160644 12-25-2016 Yes Labeling Error
California X160638 12-20-2016 Yes Quality Issue
California X160633 12-15-2016 No
California X160626 12-8-2016 No
California X160625 12-7-2016 Yes Packaging Error
California X160624 12-5-2016 Yes Labeling Error
California X160618 11-23-2016 Yes Quality Issue
California X160613 11-13-2016 No
California X160606 10-30-2016 No

The objective is to compute the error rate by vendor and Error category for the 10 most recent transaction dates with that specific vendor.  So, for vendor Name "California" and  Error category as "Packing Error", this ratio should be computed as = Number of packing Errors on 10 most recent dates/10.

Here is a simple snapshot of the Data for California.  I have filtered the dataset where Vendor Name is California and then sorted the Date received column in descending order.  Please note that when i filter the dataset on California, a lot more rows are returned.  I am only showing the Top 10 rows here because that is what is important for solving this question.

Vendor Name Item ID# Date Received Error with Item? Error Category
California X170016 1-16-2017 No
California X170014 1-13-2017 Yes Labeling Error
California X170015 1-13-2017 Yes Packaging Error
California X170008 1-9-2017 Yes Quality Issue
California X170003 1-2-2017 No
California X160645 12-26-2016 Yes Packaging Error
California X160646 12-26-2016 No
California X160644 12-25-2016 Yes Labeling Error
California X160638 12-20-2016 Yes Quality Issue
California X160633 12-15-2016 No

The expected result is:

Row Labels Labeling Error Packaging Error Quality Issue Factory Error
Boise 30.00%
California 20.00% 20.00% 20.00%
Detroit 70.00% 30.00%
Ekalaka 20.00% 20.00%

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Compute the average of values against the 5 most recent dates of each Category

{0 Comments}

Here is a simple 3 column dataset showing Categories, Date and Value

Catagorie Date Value
Fish 08-12-2015 6
Crab 05-12-2015 7
Crab 04-12-2015 6
Bird 27-11-2015 4
Snow 25-11-2015 10
Cat 21-11-2015 7
Dog 12-11-2015 5
Dog 28-10-2015 5
Fish 12-10-2015 3
Bird 11-10-2015 9
Dog 22-09-2015 9
Crab 17-08-2015 8
Cat 11-08-2015 1
Fish 20-07-2015 5
Crab 03-07-2015 2
Fish 02-06-2015 8
Fish 26-05-2015 9
Dog 14-05-2015 4
Snow 07-05-2015 7
Bird 03-05-2015 9
Cat 20-04-2015 10
Cat 15-04-2015 2
Snow 13-04-2015 3
Crab 29-03-2015 5
Crab 23-03-2015 2
Bird 14-03-2015 5
Cat 14-03-2015 1
Dog 26-02-2015 9
Fish 16-02-2015 4
Fish 08-02-2015 6
Bird 18-01-2015 1
Snow 06-01-2015 10

The objective is to compute category wise average of values against the 5 most recent dates.  So the expected result is:

Categorie Category wise average values on 5 most recent dates
Bird 5.60
Cat 4.20
Crab 5.60
Dog 6.40
Fish 6.20
Snow 7.50

So for the Category of Fish (dates are sorted in descending order), the average should be 6.2 (average of 6,3,5,8,9)

Catagorie Date Value
Fish 08-12-2015 6
Fish 12-10-2015 3
Fish 20-07-2015 5
Fish 02-06-2015 8
Fish 26-05-2015 9
Fish 16-02-2015 4
Fish 08-02-2015 6

I have solved this problem using the PowerPivot. You may download my solution workbook from here.

Determine the total number of projects by Status

{0 Comments}

Here's a simple 3 column table showing Date, Project name (Cat.) and Status of the project.  Each project can have multiple status entries on different dates.  So as you can observe, project "alpha_9383993" was In Progress on Oct 2, 2017, remained so on October 5, 2017 but was completed on October 6, 2017.

Date Cat. Status
02-Oct-17 alpha_9383993 In Progress
03-Oct-17 Pulse_9387388 In Progress
04-Oct-17 Pulse_9387388 Rework
05-Oct-17 alpha_9383993 In Progress
06-Oct-17 alpha_9383993 Completed
07-Oct-17 Pulse_9387388 Completed
08-Oct-17 Oppo_tes_9383 In Progress
09-Oct-17 Oppo_Max_8977 Rework

The objective is to determine the count of projects by Status as per the most recent status of every project.  So the expected result is:

Row Labels measure 2
Completed 2
In Progress 1
Rework 1

The result for In Progress should be one because there is only one such project - Oppo_tes_9383.  Project alpha_9383993 should not be counted because it was completed on October 6, 2017.  Likewise the result for Rework should be one because there is only one such project - Oppo_Max_8977.  Project Pulse_9387388 should not be counted because it was completed on October 7,2017.

I have solved this problem with the PowerPivot.  You may download my solution workbook from here.

In a Pivot Table, show the most frequently appearing text entry by a certain parameter

{0 Comments}

Here's a simple two column dataset

Comment Identifier Intervals
A 3pm-6pm
A 9pm-12pm
S 3pm-6pm
S 3pm-6pm
S 9pm-12pm
A 9pm-12pm
S 9pm-12pm
D 3pm-6pm
A 9pm-12pm
A 9pm-12pm
A 9pm-12pm
A 3pm-6pm
A 3pm-6pm

For identifiers listed in column A, there are time intervals in column B. Note that for a certain identifier, a time interval can appear multiple times. The objective is two-fold:

  1. For each identifier, show the time interval which appears most frequently; and
  2. For each identifier, compute the count of the time interval which appears most frequently

The expected result is:

untitled

As you can observe, the result for S is two time periods - 3pm-6pm and 9pm-12pm.  This is because each of them appears twice.

You may download my solution from here.

Alter the behaviour of a filter/slicer from OR to AND

{3 Comments}

Here is a simple two column dataset which shows EmpID in column A and DateWorked in column B.  My objective is to filter the table to show those employees who worked ONLY on August 17 and August 18.  I'd like to exclude employees who:

  1. Did not work on both these days; and
  2. Worked on both these dates but also worked on other dates

A simple filter on the DatesWorked column with the criteria of August 17 and August 18, would return 7 employees.  This is clearly wrong because this will show employees who:

  1. Worked on any of the two chosen dates; and
  2. Worked on dates other than the two chosen dates

From Table 1, one can clearly observe that the correct result should be EmpID E.  Only EmpID E worked on August 17 and 18.  While others also worked on both these dates, they worked on other dates as well and should therefore be excluded.

The problem outlined above holds true for selections made via a slicer in a Pivot Table as well.  So this is a generic problem with a filter/slicer - conditions specified therein behave like OR conditions, not like AND conditions.

Base dataset

EmpID DateWorked
A 14-08-2017
A 15-08-2017
A 16-08-2017
A 17-08-2017
A 18-08-2017
B 15-08-2017
B 16-08-2017
B 18-08-2017
C 14-08-2017
C 15-08-2017
C 16-08-2017
C 17-08-2017
C 18-08-2017
D 14-08-2017
D 15-08-2017
D 16-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 16-08-2017
F 17-08-2017
G 18-08-2017

Filtered dataset (after specifying the criteria as 17 Aug and 18 Aug)

EmpID DateWorked
A 17-08-2017
A 18-08-2017
B 18-08-2017
C 17-08-2017
C 18-08-2017
D 17-08-2017
D 18-08-2017
E 17-08-2017
E 18-08-2017
F 17-08-2017
G 18-08-2017

So the question is "How does one change the behavior of a filter/slicer from OR to AND?". To make things worse, the data is arranged as follows:

EmpID DateWorked
A 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
B 15/8/17;16/8/17;18/8/17
C 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
D 14/8/17;15/8/17;16/8/17;17/8/17;18/8/17
E 17/8/17;18/8/17
F 16/8/17;17/8/17
G 18-08-17

As you can see, there are multiple dates in a single cell separated by a ;.

You may refer to my Power Query and PowerPivot solution here.

Quantify combination courses opted by students

{0 Comments}

Assume a dataset with two columns which lists down the student names in column A and courses opted for in column B.  Since one student can opt for multiple courses and the same course can be taken up by multiple students, there can be repetitions in both columns.  The objective is to create a matrix like data structure (with courses appearing in both row and column labels) with numbers inside the matrix quantifying the "Number of students who opted for course A and C".  So, for all possible course combinations, one may want to know the number of students who opted for those combinations.

The description above can be extended to cases where buying behavior has to be analysed.  A sore manager may want to know "How many people who buy Brand A also buy Brand B."

Here's a snapshot of the source data and expected result

The number 1 in cell H4 (and cell F6) means that there is only one student who opted for courses B and D.  Likewise, 3 in cell H5 (and cell G6) means that 3 students opted for courses C and D.

You may refer to my Power Query and PowerPivot solution in this workbook.  Power Query has been used for generating a dynamic list of Courses and Power Pivot has been used for writing the DAX formula for quantifying within the matrix.

Compute month wise pending audits

{0 Comments}

Assume a three column dataset showing Audit ID, Date of receipt of audit mandate and Date of audit completion.  There are other columns as well but they are not important for our Analysis.  One may want to compute the following month wise:

1. Which (Audit ID) are the audits pending at the end of every month; and
2. When (Date of receipt of audit mandate) was the mandate for these pending audits received; and
3. Ageing of these pending audits i.e. this would be computed as the last date of the month less Date of receipt of audit mandate

Here's an example:

In January 2014, there are a total of 10 audits reports which were received (Filter "Date of receipt of audit mandate" column on January 2014).  Of these 10 audits, 4 were completed in January 2014 (Filter "Date of audit completion" column on January 2014) itself and therefore there are 6 pending audits.  To this figure of 6, we need to add the audits pending from previous months.  If one filters column "Date of receipt of audit mandate" column on Oct-Dec 2013 and "Date of audit completion" column on dates after January 2014, 8 rows will appear.  This means that there are 8 audits which were received before 1 January 2014 but were completed only after 31 January 2014.  So the total number of pending audits as at 31 January 2014 are 8+6=14.  This task needs to be carried out for all months.

You may refer to my solution in this workbook.