Tags: FILTER

Data slicing and analysis with the Power Pivot

{0 Comments}

Visualise an MS Excel file with two worksheets:

  1. Employee headcount – a multi column dataset with information such as Employee code, Date of Joining, Age, Division, Department and Location.  Each row represents data for one employee.  The number of rows on this worksheet is approximately 700.
  2. Training Data - a multi column dataset with information such as Employee code, Training Date from, Training Date to, Training Program Name, Training Program Category (Internal and External), Training Location and Training Service Provider.  Each row represents one training attended by one employee.  The number of rows on this worksheet is approximately 2,600.

Let’s suppose that the training calendar of this company runs from July to June.  Some questions (only few mentioned for illustration purposes) which a Training Manager may need answers to are:

1)   How may unique employees were trained each year; and
a)   Of the unique employees trained, how many were first time trainees and how many were repeat trainees
i)   Of the first time trainees:
(1)    How many joined this year
(2)    How many joined in past years
ii)  Of the first time trainees:
(1)    How many were trained within the first year of joining
(2)    How many were trained in the second year of joining
(3)    How many were trained in the third year of joining
(4)    How many were trained after three years of joining
iii)  Of the repeat trainees:
(1)    What is the average gap (in days) between trainings
(2)    What is the minimum gap (in days) between trainings
(3)    What is the maximum gap (in days) between trainings

Getting answers to the questions mentioned above would entail writing a lot of lookup related formulas, applying filters, copying and pasting and then creating Pivot Tables.  While the example taken above is that of a training database, you may envision “drilling down to and slicing” any dataset – Marketing, Sales, Purchase etc.

You may watch a short video of my solution here

In these two workbooks, you will be able to see the level to which one can drill down and analyse data using the Power Pivot add-in.  When you open this workbook, please go the first worksheet and make the relevant choice of MS Excel version first so that you start looking at the Analysis from the correct worksheet.

1. Analysing Training data of a company; and
2. Analysing Sales data of a company

You will be able to see the analysis in these workbooks only if you are using one of the following versions of MS Office:

1. Excel 2013 Professional Plus; or
2. Excel 2010 with the Power Pivot add-in installed.  Power Pivot is a free add-in from Microsoft which can be downloaded from here.

Lastly, if you are using the Power Pivot add-in in Excel 2010, you will not be able to see the underlying Data Model or the calculated Field formulas because this workbook has been created in Excel 2013 Professional Plus and unfortunately the Power Pivot model is not backward compatible.  However, all the analysis performed in this workbook can be performed in Excel 2010 as well (with the Power Pivot add-in installed).

Perform different calculations in the Subtotal/Grand Total column of a Pivot Table

{33 Comments}

Visualise a Pivot Table with a few Fields dragged in the Report filter, Row labels and Value Area section.  In the Column labels are two fields, Month and then Year - so in the column labels, for every month, there is data for three years 2005, 2006 and 2007.  For some months, there is data for two years only 2005 and 2006.  In the Value area section are fields such as Net Amount, Quantity, Bonus etc. and the summarization function applied to them is SUM.  There is no complication in creating the Pivot Table described above.

The actual requirement is to customise the Subtotal column of the Pivot Table as follows:

1. For the monthly subtotals, the Net Amount and Bonus figure are to computed as a difference of 2005 and 2006 i.e. SUM of quantity of 2005 - SUM of quantity of 2006.  The Grand total column should be a a summation of individual subtotals.
2. Average Selling price for every year is to be computed as as Net Amount/(Ttl Bonus + Quantity).  For the monthly subtotals, the figure is to be computed as

=(Net Amount of 2005/((Bonus of 2005+Quantity of 2005)) - (Net Amount of 2006/((Bonus of 2006+Quantity of 2006))

The Grand Total column is to be left blank for Average Selling Price,

As you can observe, the subtotal column (for the months) will have different formulas running for different Fields.

A conventional Pivot Table does not allow one to have custom formulas in the Subtotal columns.  I have been able to resolve this problem by using the free Power Pivot add-in from Microsoft for Excel 2010 and higher versions.

You may refer to my solution in this workbook.

Here's another example.  Assume a dataset with three columns - Date, Manager and Amount.  There are repetitions in the Data and Manager column.  One may want to know the maximum amount per month per Manager.  While this is easy to accomplish with a Conventional Pivot Table as well, the problem occurs in the Subtotal/Grand Total cells of a Pivot Table.  The Subtotal/Grand Total cells assume the same function as has been used in the "Summarise Values field by".  So, while in the "Summarise values fields by" section, one may want to use the Maximum function, in the subtotal cell, one may want to use the sum function.

You may refer to my solution in this workbook.

Compute year wise weighted average on a large dataset

{2 Comments}

Assume a dataset with a Key Performance Indicator (KPI) [appearing in one column] data for years ranging from 1985 to 2010 for 114 countries.  This dataset has 170,000 rows of data and one row below the last row for every country, there is a total of the KPI column.  So, if there are 25 rows for India, then in the 26th row, there will be the total appearing for numbers in the KPI column.  The same is occurring for other countries as well.

There is another dataset (in another worksheet of the same workbook) which has an index value for the same countries and same date range as the first dataset.  The second dataset is relatively smaller (with only 1315 rows) because the index value is not available for all years of each country.

The objective is to determine the year wise (for all years from 1985 to 2010) weighted average of KPI.  An illustration of the weighted average computation has been shown in range F5:H10 of the "Index" worksheet of the workbook link shared below.

Solving this problem using Pivot Table, filters, formulas will slow down processing speed due to sheer size of data.  I have solved this problem using the Power Pivot tool (for Excel 2010 and higher versions).

You may refer to my solution in this workbook.

Story telling with Excel Power BI

{6 Comments}

With Power Business Intelligence (BI) tools of Excel 2013, one can metamorphose raw data and/or results of complex calculations into stunning and interactive visualizations.  Power View (one of the four components of Power BI) allows one to create a PPT like flow in Excel thus allowing one to weave a story.  To be able to interact with/create visualizations, you will need to install Microsoft Office Professional Plus 2013 (this version will already have two of the four components of Power BI - PowerPivot and Power View).  Additionally, you will have to install the following add-ins from Microsoft (the other two components of Power BI)

1. Power Query; and
2. Power Map

I have tried to showcase the prowess of Power BI tools of Excel 2013 in these two workbooks:

1. An overview of the BRIC Economies
2. Sales data analysis

You may watch a video of my work at this link

Compute configuration count using Set Theory and Venn Diagrams

{0 Comments}

Assume a table which lists attendees for a Company's Annual day function.  In this Table, data for every attendee is shown on a separate row so if an employee attends the function with his/her spouse and three children, then there will be 5 rows for that employee.

The question is to determine the count of the following family configuration:

1. Employees only (those who attended without spouse and children); and
2. Employees, spouse and children (Family); and
3. Employees and spouse (no children); and
4. Employees and children (no spouse)

You may refer to my solution in this workbook.  I have solved this problem using:

1. MS Excel Formulas based on Set Theory and Venn Diagram; and
2. PowerPivot

Computing penalties by Employee, Group and Labour type using a PowerPivot

{0 Comments}

Assume a database of Maximum allowed pay and Actual pay for each employee.  The employees have been further categorized into Groups and labour categories.

The task is to create three Pivot Tables (one each with Employee name, Group and Labour category in the row labels) with the following information in the Value area section:

1. Actual salary
2. Maximum permissible salary
3. OverUnderMax - This is calculated as the difference between Actual Salary and Maximum permissible salary
4. Penalty - Maximum of 0 and OverUnderMax

As can be seen in the Pivot Table worksheets of this workbook, there is a problem with the result of the calculated field formula result in the Grand Total cell.

You may refer to my solution in the PowerPivot worksheets.  Please note that to see my result in the PowerPivot worksheets, you will have to use the PowerPivot add-in for MS Excel 2010/2013.

Count unique values with conditions on large databases

{8 Comments}

Given a database of 50,000 rows, counting unique values with conditions using formulas would either adversely effect workbook performance or would not work in the first place at all.

In this workbook, I have shown the technique to count unique values with conditions on a large database

1. Using PowerPivot - Will only work in Excel 2010 and higher versions

2. Using a  normal Pivot Table and SUMPRODUCT() function - Will work for all versions but is not as efficient as the PowerPivot solution.

To count unique values with conditions on small databases, you may refer to the following link

Calculate a unique count with conditions in a Pivot Table

{12 Comments}

Assume a three column table arranged as follows: Circle, Date of Fault and ID.  Dates in the date range span one week - November 26, 2012 to December 2, 2012.  A particular equipment can be only one specific Region and the same equipment an go faulty multiple times within one week.  Data for one week is about 8,400 rows.

There are three questions to be answered from this data:

1. The Circle wise, count of ID's which went faulty more than twice between November 26, 2012 and December 2, 2012; and
2. The Circle wise, count of faulty instances more than twice between November 26, 2012 and December 2, 2012; and
3. Determine individual sites for 1 and 2 above

The difference between 1 and 2 above is "If a certain ID goes down 4 times, then for question1, the answer should be 1.  For question2, the answer should be 4."

The first question basically boils down to "Count of unique ID's by Circle which went faulty more than twice."

There are two ways one can go about answering the questions above:

Solution A - For Excel 2010 and higher versions - This solution is for those using the PowerPivot MS Excel add-in for Excel 2010 and higher versions.

Solution B - For all versions of MS Excel - This solution will work in all versions of MS Excel but for those using Excel 2010 and higher versions, the PowerPivot solution would be far more efficient.

The steps for creating a pivot table under Solution B for answering both questions above are:

1.  Count of downtime sites.xlsx is saved in a folder on the desktop;
2. Open the workbook, select the data on the Base_Data sheet (including the first row as the header row – it will be range A1:C8741.  Ensure that the header row has some distinctive formatting such as Bold or some colour) and press Ctrl+F3 > New.  In the Name box, type Dummy and click on OK > Close.
3. To cross check that the name assigned above has indeed been assigned correctly, select the data range once again and in the Name box (left of the formula bar), Dummy should appear.
4. Select range A1:C8741 of the Base_Data sheet again and press Ctrl+T to convert this range into a Table.  Ensure that the “My Table has headers” box is checked.  Save the workbook.
5. Open a new worksheet and go to Data > From Other Sources > From Microsoft Query
6. Under Databases, select Excel files > OK
7. In the Directories dialog box, navigate to the folder on the desktop where the workbook file is saved.  So for me, it is saved under C:\Users\Ashish\Desktop\ and double click on the folder where the workbook is saved.
8. In the left hand side window, select the Count of downtime sites.xlsx file and click on OK
9. With Dummy selected, click on the > symbol to bring over all columns of this named range to the right hand side box 10. Click on Next three times
11. Select the option of View Data or Edit Query in Microsoft Query
12. Click on the SQL button, delete the contents in the white space there and paste the following SQL Query

SELECT ucase(dummy.Circle) AS 'Circle', ucase(dummy.Indus_Site_ID) AS 'Indus_site_ID', Count(dummy.Indus_Site_ID) AS 'fault_frequency'
FROM `C:\Users\Ashish\Desktop\Count of downtime sites.xlsx`.dummy dummy
GROUP BY ucase(dummy.Circle), ucase(dummy.Indus_Site_ID)
HAVING (Count(dummy.Indus_Site_ID)>2)

13. Click on OK and on the message box which appears, click on OK
14. Under File, select the last option – Return Data to Microsoft Excel
15. At this stage, if you wish to get data in a tabular form, then select Table.  If you directly want a pivot table, select the second option button – Pivot Table.  For this example, select Pivot Table and in the cell reference box, select any cell where you would like to the result to appear, say cell A1.  Click on OK
16. A counter will run at the bottom left hand side with the title of Reading Data
17. Drag Circle and ID to the to the Row Labels
18. Drag Fault Frequency to the Value Area twice
19. Right click on any one number in the fault frequency column and under Summarise Value by, select Count
20. Right click on any value in the ID column and under Expand/Collapse, select Collapse Entire Field.

Count entries in a range which exclude certain user defined words

{5 Comments}

Assume the following sentences in range B2:B6

B2 - This is an Apple Pie
B3 - An apple a day keeps a doctor away
B4 - These Pears, apples and mangoes are sweet
B5 - In this season, prices of mangoes have increased
B6 - This is a glass of Guava juice

In range B9:B10, type Mango and Apple.  The objective is to count the sentences in range B2:B6 which exclude the words mentioned in range B9:B10.

The array formula (Ctrl+Shift+Enter) for getting the count of exclusions would be:

=SUMPRODUCT(1*(MMULT(TRANSPOSE(1*(ISTEXT($B$9:$B$10))),1*(ISERROR(SEARCH($B$9:$B$10,TRANSPOSE($B2:$B$6)))))=COUNTA($B$9:$B$10)))

The result should be 1.

The array formula (Ctrl+Shift+Enter) for getting the count of inclusions (sentences which include these words), would be:

=COUNTA($B$2:$B$6)-SUMPRODUCT(1*(MMULT(TRANSPOSE(1*(ISTEXT($B$9:$B$10))),1*(ISERROR(SEARCH($B$9:$B$10,TRANSPOSE($B2:$B$6)))))=COUNTA($B$9:$B$10)))

The result should be 4.

One can also solve the problem with the PowerPivot tool.  You may refer to my PowerPivot solution in this workbook.

Analysing customer walkin data by date and service taken

{5 Comments}

Assume data is in range A3:E10.  In A4:A10, various service types are mentioned.  In B3:E3, dates are mentioned from June 1, 2012 to June 4, 2012.  In range B4:E10 are numbers representing number of customers.  One may want to answer the following questions from this data:

1. For every date, total number of customer walkins and total number of services taken; and
2. For every date, new customer walkins and new services taken; and
3. For every date, repeat customer walkins and repeat services taken

While the first and third questions are fairly straight forward to solve, some deliberation would be required for the second question.  A new service type taken on June 3, 2012 would be one that has not been taken by any customer from June 1 - 2, 2012.  So if cell A8 has Service type E and cell D8 (data for June 3, 2012) has 3 (3 customer took service type E on June 3, 2012), then this service should be counted only if there is no figure in range B8:C8 i.e. no customer took this service on June 1 - June 2, 2012.

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in and the PowerPivot add-in, then a few simple steps and minimal DAX formulas can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).

Solution for all versions of MS Excel

While this solution works for all versions of MS Excel, it uses an array formula (Ctrl+Shift+Enter).  Array formulas, if used extensively in the workbook, adversely effect the system's performance.

You may refer to my solution in the this workbook.