Tags: ADDCOLUMNS

Determine latest condition of each equipment and show a month wise count

{0 Comments}

There are 100 machines in a factory.  Every machine has different test frequency. In a given month, not every machine is tested but we still have the last known rating (from some previous month) of that machine.  We have to show the latest rating of each machine for each month in a stacked column chart. This way, the total number will remain 100 every month in the chart, but the rating distribution (color based on legend) will change based on last available rating of that machine.

For example, in January, 35 machines were tested. So we have latest ratings of these 35 machines. But as the rest of the machines also have some previous rating, the graph needs to show all 100, with last available rating.

The expected result should look like this

You may download my PBI Desktop file from here.  The very same DAX formulas can be written in the DAX formula language of MS Excel as well.

Segment towns according to volume contribution and market share with a slicer

{0 Comments}

This post is an extension to the one I posted here - Segment towns according to volume contribution and market share. Here's a simple dataset of Shampoo sales in the state of Rajasthan, India.

For a chosen segment, one may want to segment the 4 towns based on the following conditions:
Based on the two screenshots shared above, the desired result is shown in the screenshot below:
The difference between this solution at the previous one (the link of which I have shared above) is that in this one we want to drag the Classification (range E16:E17) to either the row/column/report filter section of the Pivot Table use it as a slicer.  The current limitation with measures that one writes in PowerPivot's is that measures cannot be used in either row/column/report filter section or as a slicer of/in a Pivot Table.  So in the previous solution, I had written a measure to return the result as Headroom, Stronghold, Emerging or small in only the value area section of the Pivot Table.  One could not drag that measure into the row labels of a Pivot Table.  In this solution, one can drag the Town classification to the row/column/report filter section or even to the slicer (see images below)
You may download my solution workbook from here.

Segment customers into dynamic buckets

{0 Comments}

Consider a 4 column table - Respondent ID, Device ID, App Name and Category.  So this dataset shows which apps are installed on which device ID by which user and which category do the apps fall into.  It is a small dataset with only 4 columns and 2,000 rows.

The question on this dataset is - "I would like to segment the total user base by Categories into the following 9 buckets:

  1. Those who only have 1 app installed; and
  2. Those who have 2 apps installed; and
  3. Those who have 3 apps installed; and
  4. Those who have 4 apps installed; and
  5. Those who have 5 apps installed; and
  6. Those who have 6 apps installed; and
  7. Those who have 7 apps installed; and
  8. Those who have 8 - 10 apps installed; and
  9. Those who have 10+ apps installed

The expected result is a Pivot Table with buckets in the column labels, Categories in the row labels and number of people in the value area section (as shown below)

Here's how one can interpret the Pivot Table shown above:

  1. Cell B50 - There are 75 people who only have 1 "Tool" app installed
  2. Cell J44 - There is just 1 person who has 10+ Photography apps installed.

I have solved this problem using Power Query and PowerPivot.  Since these two Business Intelligence (BI) tools are available in PowerBI desktop (PBI) as well, you may download a folder with both files (the MS Excel workbook and PBI file) from here.

In a Pivot Table, show the most frequently appearing text entry by a certain parameter

{0 Comments}

Here's a simple two column dataset

Comment Identifier Intervals
A 3pm-6pm
A 9pm-12pm
S 3pm-6pm
S 3pm-6pm
S 9pm-12pm
A 9pm-12pm
S 9pm-12pm
D 3pm-6pm
A 9pm-12pm
A 9pm-12pm
A 9pm-12pm
A 3pm-6pm
A 3pm-6pm

For identifiers listed in column A, there are time intervals in column B. Note that for a certain identifier, a time interval can appear multiple times. The objective is two-fold:

  1. For each identifier, show the time interval which appears most frequently; and
  2. For each identifier, compute the count of the time interval which appears most frequently

The expected result is:

untitled

As you can observe, the result for S is two time periods - 3pm-6pm and 9pm-12pm.  This is because each of them appears twice.

You may download my solution from here.

Data slicing and analysis with the Power Pivot

{0 Comments}

Visualise an MS Excel file with two worksheets:

  1. Employee headcount – a multi column dataset with information such as Employee code, Date of Joining, Age, Division, Department and Location.  Each row represents data for one employee.  The number of rows on this worksheet is approximately 700.
  2. Training Data - a multi column dataset with information such as Employee code, Training Date from, Training Date to, Training Program Name, Training Program Category (Internal and External), Training Location and Training Service Provider.  Each row represents one training attended by one employee.  The number of rows on this worksheet is approximately 2,600.

Let’s suppose that the training calendar of this company runs from July to June.  Some questions (only few mentioned for illustration purposes) which a Training Manager may need answers to are:

1)   How may unique employees were trained each year; and
a)   Of the unique employees trained, how many were first time trainees and how many were repeat trainees
i)   Of the first time trainees:
(1)    How many joined this year
(2)    How many joined in past years
ii)  Of the first time trainees:
(1)    How many were trained within the first year of joining
(2)    How many were trained in the second year of joining
(3)    How many were trained in the third year of joining
(4)    How many were trained after three years of joining
iii)  Of the repeat trainees:
(1)    What is the average gap (in days) between trainings
(2)    What is the minimum gap (in days) between trainings
(3)    What is the maximum gap (in days) between trainings

Getting answers to the questions mentioned above would entail writing a lot of lookup related formulas, applying filters, copying and pasting and then creating Pivot Tables.  While the example taken above is that of a training database, you may envision “drilling down to and slicing” any dataset – Marketing, Sales, Purchase etc.

You may watch a short video of my solution here

In these two workbooks, you will be able to see the level to which one can drill down and analyse data using the Power Pivot add-in.  When you open this workbook, please go the first worksheet and make the relevant choice of MS Excel version first so that you start looking at the Analysis from the correct worksheet.

1. Analysing Training data of a company; and
2. Analysing Sales data of a company

You will be able to see the analysis in these workbooks only if you are using one of the following versions of MS Office:

1. Excel 2013 Professional Plus; or
2. Excel 2010 with the Power Pivot add-in installed.  Power Pivot is a free add-in from Microsoft which can be downloaded from here.

Lastly, if you are using the Power Pivot add-in in Excel 2010, you will not be able to see the underlying Data Model or the calculated Field formulas because this workbook has been created in Excel 2013 Professional Plus and unfortunately the Power Pivot model is not backward compatible.  However, all the analysis performed in this workbook can be performed in Excel 2010 as well (with the Power Pivot add-in installed).

Compute year wise weighted average on a large dataset

{2 Comments}

Assume a dataset with a Key Performance Indicator (KPI) [appearing in one column] data for years ranging from 1985 to 2010 for 114 countries.  This dataset has 170,000 rows of data and one row below the last row for every country, there is a total of the KPI column.  So, if there are 25 rows for India, then in the 26th row, there will be the total appearing for numbers in the KPI column.  The same is occurring for other countries as well.

There is another dataset (in another worksheet of the same workbook) which has an index value for the same countries and same date range as the first dataset.  The second dataset is relatively smaller (with only 1315 rows) because the index value is not available for all years of each country.

The objective is to determine the year wise (for all years from 1985 to 2010) weighted average of KPI.  An illustration of the weighted average computation has been shown in range F5:H10 of the "Index" worksheet of the workbook link shared below.

Solving this problem using Pivot Table, filters, formulas will slow down processing speed due to sheer size of data.  I have solved this problem using the Power Pivot tool (for Excel 2010 and higher versions).

You may refer to my solution in this workbook.