Tags: DISTINCTCOUNT

Data slicing and analysis with the Power Pivot

{0 Comments}

Visualise an MS Excel file with two worksheets:

  1. Employee headcount – a multi column dataset with information such as Employee code, Date of Joining, Age, Division, Department and Location.  Each row represents data for one employee.  The number of rows on this worksheet is approximately 700.
  2. Training Data - a multi column dataset with information such as Employee code, Training Date from, Training Date to, Training Program Name, Training Program Category (Internal and External), Training Location and Training Service Provider.  Each row represents one training attended by one employee.  The number of rows on this worksheet is approximately 2,600.

Let’s suppose that the training calendar of this company runs from July to June.  Some questions (only few mentioned for illustration purposes) which a Training Manager may need answers to are:

1)   How may unique employees were trained each year; and
a)   Of the unique employees trained, how many were first time trainees and how many were repeat trainees
i)   Of the first time trainees:
(1)    How many joined this year
(2)    How many joined in past years
ii)  Of the first time trainees:
(1)    How many were trained within the first year of joining
(2)    How many were trained in the second year of joining
(3)    How many were trained in the third year of joining
(4)    How many were trained after three years of joining
iii)  Of the repeat trainees:
(1)    What is the average gap (in days) between trainings
(2)    What is the minimum gap (in days) between trainings
(3)    What is the maximum gap (in days) between trainings

Getting answers to the questions mentioned above would entail writing a lot of lookup related formulas, applying filters, copying and pasting and then creating Pivot Tables.  While the example taken above is that of a training database, you may envision “drilling down to and slicing” any dataset – Marketing, Sales, Purchase etc.

You may watch a short video of my solution here

In these two workbooks, you will be able to see the level to which one can drill down and analyse data using the Power Pivot add-in.  When you open this workbook, please go the first worksheet and make the relevant choice of MS Excel version first so that you start looking at the Analysis from the correct worksheet.

1. Analysing Training data of a company; and
2. Analysing Sales data of a company

You will be able to see the analysis in these workbooks only if you are using one of the following versions of MS Office:

1. Excel 2013 Professional Plus; or
2. Excel 2010 with the Power Pivot add-in installed.  Power Pivot is a free add-in from Microsoft which can be downloaded from here.

Lastly, if you are using the Power Pivot add-in in Excel 2010, you will not be able to see the underlying Data Model or the calculated Field formulas because this workbook has been created in Excel 2013 Professional Plus and unfortunately the Power Pivot model is not backward compatible.  However, all the analysis performed in this workbook can be performed in Excel 2010 as well (with the Power Pivot add-in installed).

Story telling with Excel Power BI

{6 Comments}

With Power Business Intelligence (BI) tools of Excel 2013, one can metamorphose raw data and/or results of complex calculations into stunning and interactive visualizations.  Power View (one of the four components of Power BI) allows one to create a PPT like flow in Excel thus allowing one to weave a story.  To be able to interact with/create visualizations, you will need to install Microsoft Office Professional Plus 2013 (this version will already have two of the four components of Power BI - PowerPivot and Power View).  Additionally, you will have to install the following add-ins from Microsoft (the other two components of Power BI)

1. Power Query; and
2. Power Map

I have tried to showcase the prowess of Power BI tools of Excel 2013 in these two workbooks:

1. An overview of the BRIC Economies
2. Sales data analysis

You may watch a video of my work at this link

Compute configuration count using Set Theory and Venn Diagrams

{0 Comments}

Assume a table which lists attendees for a Company's Annual day function.  In this Table, data for every attendee is shown on a separate row so if an employee attends the function with his/her spouse and three children, then there will be 5 rows for that employee.

The question is to determine the count of the following family configuration:

1. Employees only (those who attended without spouse and children); and
2. Employees, spouse and children (Family); and
3. Employees and spouse (no children); and
4. Employees and children (no spouse)

You may refer to my solution in this workbook.  I have solved this problem using:

1. MS Excel Formulas based on Set Theory and Venn Diagram; and
2. PowerPivot

Count unique values with conditions on large databases

{8 Comments}

Given a database of 50,000 rows, counting unique values with conditions using formulas would either adversely effect workbook performance or would not work in the first place at all.

In this workbook, I have shown the technique to count unique values with conditions on a large database

1. Using PowerPivot - Will only work in Excel 2010 and higher versions

2. Using a  normal Pivot Table and SUMPRODUCT() function - Will work for all versions but is not as efficient as the PowerPivot solution.

To count unique values with conditions on small databases, you may refer to the following link

Remove duplicates after satisfying additional conditions

{3 Comments}

Assume a two column database of patient ID's and service availed.  One patient may avail the same service multiple times in a year due to which that record may appear as many times as the service is availed.  For e.g., if patient A001 avails the Radiology service twice, then A001 and Radiology will appear in two rows.

Once may want to create the following two reports from this database:

1. A list containing all those records where the patient availed just one service; and
2. A list containing all those records where the patient availed more than one service

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem

Solution for MS Excel 2010 and higher versions

If you are using the PowerPivot add-in, then a calculated column formula can resolve this problem.

Solution for all versions of MS Excel

I have shared two solutions here:

1. Array formula and advanced filters; and
2. Only array formulas

For better understanding of the question and to view the final solution, please refer to this workbook.