Compute an average for the same day in the past 3 years

{0 Comments}

Assume a simple two column dataset with dates in column A and numbers in column B. The dates in column A are from January 1, 2013 to December 31, 2016 and numbers in column B are for the period January 1, 2013 to December 31, 2015 (there are no numbers for January 1, 2016 to December 31, 2016).

The objective is to "Compute an average for each day of calendar year 2016. The average should be for the occurrence of that day in the previous 3 years". Here's an example:

1. January 1, 2016 was a Friday (the first Friday of 2016) and is in cell A1097
2. In cell B1097, the average should be computed as: Average of the "First Friday of each of the previous 3 years"
3. January 8, 2016 was a Friday (the second Friday of 2016) and is in cell A1104
4. In cell B1104, the average should be computed as: Average of the "Second Friday of each of the previous 3 years"

I have solved this problem with the help of the PowerPivot. You may refer to my solution in this workbook.

Visualising data flows using Custom Visuals

{0 Comments}

Assume a 4 column dataset (a small sample) as follows:

City of Origin City of destination Mode of Transport Passengers travelled
New Delhi Pune Air 123
New Delhi Mumbai Air 213
New Delhi Kolkata Air 125
Chandigarh Jammu Bus 785
Chandigarh Amritsar Train 567

Given this dataset, one may want answers to the following questions:

1. Of all those passengers who originated their journey (City of Origin) from Chandigarh, how many terminated their journey (City of destination) in New Delhi via different modes of transport; and
2. Of all those passengers who terminated their journey (City of destination) in Jammu, how many arrived in Amritsar (City of Origin) via different modes of transport; and
3. Of all those passengers who travelled by Bus, how many travelled from City A (City of Origin) to City X,Y,Z (City of destination)

While one can analyse/slice and dice this data using Pivot Tables, one cannot visualize this data very clearly (even after creating a Pivot chart).  I have attempted to visualize this data using a software called PowerBI desktop (a free for download and use Business Intelligence software from Microsoft which rolls all of Excel's BI tools into 1 - PowerPivot, Power Query, Power Map and Power View).

You may download the source Excel workbook and the Power BI desktop workbook from this link.

You may also watch a short video here:

 

Sales data modelling and interactive visualisations of an E-Commerce Company

{4 Comments}

In this workbook, I have Sales data of an E-Commerce Company for 3 months.  The typical columns in the base data are:

1. Order Date/Time
2. City to which orders were shipped
3. Order Number
4. Payment Type i.e. Cash on delivery, Net Banking, EMI's
5. Order Status i.e. Delivered or cancelled
6. SKU's which the ordered items fall into
7. Products which the ordered SKU's fall into
8. Categories which the ordered products fall into

Given this simple tabular representation, one may want to analyse and visualize this dataset from multiple perspectives based on user selections, such as

"What was the revenue earned from the Top 5 products in the A100 category in April for orders shipped to New Delhi?"

In this query framed above, the end user should have the leeway to select any/all of the underlined facets.  So one can either choose revenue earned or Number of orders.  Likewise, one can either select Top 5 products or Top 15 products/Top 5 SKU's etc.

With relative ease, one should also be able to "Perform an affiliate analysis" showing which categories are ordered together (to study affiliations).  Please review this post for an independent discussion on "Affinity Analysis".

Furthermore, one should be able to perform a free form timeline search such as  - "I would like to study growth in Total revenue of March 2-8 2015 over Feb 1-4 2015"

You may download the workbook from the link shared above.

You may watch similar videos showcasing the capabilities of Business Intelligence in MS Excel:

1. Analyse Sales data of a Beverage Company
2.
Analyse Training data of a Company

Here's a video showing the capabilities of this Sales data model

You may also watch this short video to see how I visualized the revenue flow from Categories to Shipping cities during different Order periods using Custom visuals available in PowerBI desktop.

Please feel free to download the PowerBI desktop workbook of the video shown above from here.

For a detailed over of Sankey diagrams (a Custom visual available in PowerBI desktop), you may refer to my Blog article here.

Show multiple text entries in one cell of a Pivot Table

{0 Comments}

Imagine a three column dataset as shown below

ID Prize Year
A X 9
A Y 10
B X 9
B Y 9
B Z 10

As you can observe, there are duplicates that appear in each column.  The task it to convert this tabular data structure to a matrix like one with ID's appearing down one column, Years appearing in one row and at each intersecting cell, the prizes should appear.  Please note that there can be multiple prizes for each ID and year combination - ID is B and year is 9 in two rows but the prizes are different.

The expected solution is shown below

ID 9 10
A X Y
B X,Y Z

I have solved this problem by using Power Query.  You may download the workbook from here.

You may also watch a short video here

Summarise data with multiple wildcard OR conditions

{0 Comments}

Imagine a dataset as shown below

Status Text Number
Active bat 1
Passive erq 2
Passive cat 3
Active enm 4
Active dog 5

Suppose the data is in range A3:C7 (headings are in A2:C2).  The objective is to add numbers where the following conditions are met:

1. Status should be Active; and
2. Text should contain either b or c or d

So the result for the dataset above should be 6 i.e. 1+5.  Sounding like a simple case of summarising based on multiple conditions, one may attempt using the SUMIFS() function here.  However, a single SUMIFS() function will not work because a SUMIFS() can only accept multiple AND conditions (not OR conditions).  The only way to make the SUMIFS() function work is to use 3 of them and then add them up.

So this SUMIFS() function will work

=SUMIFS($C$3:$C$7,$A$3:$A$7,A11,$B$3:$B$7,B11)+SUMIFS($C$3:$C$7,$A$3:$A$7,A12,$B$3:$B$7,B12)+SUMIFS($C$3:$C$7,$A$3:$A$7,A13,$B$3:$B$7,B13)

A11:A13 has Active and B11:B13 has *a*,*b*,*c*.

While this looks like a solution, the problem is that there will have to be one SUMIFS() function for each OR condition.  So the more the number of OR conditions, the more the SUMIFS() functions and the lengthier the formula.

In this workbook, I have shared two alternative solution.  One uses the DSUM() function and other uses a combination of MMULT(), TRANSPOSE() and SUMPRODUCT() functions.

Transpose data column wise

{0 Comments}

Here's a simple two dataset:

Project ID Employee
A00 Ashish Mathur
A00 Henry
A00 Paul
A00 Sanjay Singh
L02 Geeta Puri
L02 Ajay Singh
L02 Robert
L02 Mike
L02 Ajay Kumar Venkatesh
L02 Ashish Mathur
L02 Geeta Kukreja
A00 Sameer Mehta
L02 Harish Rai

As one can observe, there are repetitions in the Project ID column.  One may want to transpose this data column wise like this

A00 L02
Ashish Mathur Geeta Puri
Henry Ajay Singh
Paul Robert
Sanjay Singh Mike
Sameer Mehta Ajay Kumar Venkatesh
Ashish Mathur
Geeta Kukreja
Harish Rai

In this workbook, I have solved the problem using Excel formulas and Power Query.

You may watch a short video of my solution here

Compute product wise YTD Revenue from a matrix like/Cross tabular dataset

{0 Comments}

Assume the following data layout

Name Budget April Actual April Budget May Actual May Budget June Actual June
a 2 1 1 4 3
b 4 4 2 1
c 2 3 3 3
d 2 1
e 5 6 4 8 6

As one can observe here, there are two sub columns for each month - Budget and Actual.  From this data layout, we wish to compute the Product wise YTD budget and Actual sales figure.  So for example, if one selects May in a drop down, then the YTD budget for Product B should be 6 and YTD actual for Product B should be 5.

The expected solution should look like this

Month Name May
Products YTD Budget YTD Actual
a 2 2
b 6 5
c 2 3
e 9 14
Grand Total 19 24

You may download my solution workbook from this link.

You may watch a short video of my solution here

Workaround to the problem of creating a Pivot chart after using “% of row total” calculation in a Pivot Table

{0 Comments}

Here is a dashboard created with a Pivot Table, a Pivot chart and slicers (Click to enlarge image).  In the Pivot Table, the % have been computed using "% of row total".

Untitled

The Pivot chart shows two columns per month - one for complete and the other for incomplete.  The objective is to show only the Complete column per month in the Pivot chart.  If one filters the Pivot Table only on Complete, then the Pivot chart shows an unexpected result - each column will go upto 100%.  This happens because all calculations in a Pivot Table happen only on the visible cells.  Once the Incomplete column is hidden, the "% of row total" calculation will return 100% which in turn reflects in the Pivot chart.

I have resolved this problem with the help of the PowerPivot.  You may refer to my solution in this workbook.

Create a daily work schedule

{2 Comments}

Imagine a 4 column dataset as shown below:

Resource Name Start End Projects
Raghav 02-06-2015 05-06-2015 p4
Raghav 02-06-2015 07-06-2015 p5
Raghav 08-06-2015 12-06-2015 p1
Raghav 16-06-2015 19-06-2015 p2
Raghav 22-06-2015 26-06-2015 p3
Ashish 03-06-2015 09-06-2015 p6
Ashish 04-06-2015 07-06-2015 p7

One may want to transform this dataset to the below mentioned one.  There will obviously be a lot more columns to the right (I'm only showing a snip)

Employee Name 01-06-2015 02-06-2015 03-06-2015 04-06-2015 05-06-2015
Raghav p4 & P5 p4 & P5 p4 & P5 p4 & P5
Ashish p6 p6 & P7 p6 & P7

I have solved this problem with the help of Microsoft Power Query. You may download my solution workbook from here.

You may watch a short video of my solution at this link

Perform an “Affinity analysis” to identify co-selling products

{0 Comments}

Affinity analysis encompasses a broad set of Analytic techniques aimed at uncovering the associations and connections between specific objects: these might be visitors to a website (customers or audience), products in a store or content items on a media site. Of these, “market basket analysis” is perhaps the most common example. In a market basket analysis, one analyses combinations of products that frequently co-occur in transactions.  For e.g., Of all the shoppers today, how many purchased a school uniform and a school bag.  A retailer can use this information to:

1. Improve the customer shopping experience by rearranging the store layout (place products that co-occur together close to one another); and
2. Run a focused marketing campaign (e.g. target customers who buy a school uniform with offers on school bags, to encourage them to spend more on their shopping basket)

Online retailers and publishers can use this type of analysis to:

1. Inform the placement of content items on their media sites, or products in their catalogue
2. Drive recommendation engines (like Amazon’s customers who bought this product also bought these products…)
3. Deliver targeted marketing (e.g. emailing customers who bought products specific products with other products and offers on those products that are likely to be interesting to them)

Consider a dataset with four columns - Date, Order Numbers, Items sold and Item Description.  To simplify, let's ignore columns 1 and 4 for now.  One may want to know the answer to the following question:

For a particular item sold (say Item X), in how many orders (order frequency) were items X and Item Y were sold together OR in how many orders (order frequency) were items X and Item Z together.

In the Excel workbook (download link below), number of rows in the Sales data sheet is 265,321 (file size is 11 MB) and the number of unique items sold is 23,201.  In Excel, one has to pair each of the 23,201 items with the other 23,200 items to know if there is any affiliation or not.  The sheer size of that dataset would make any formula unusable here.

I have solved this problem with the help of the PowerPivot.  Please note that in the PowerPivot solution worksheet, I have filtered the Row labels on a specific Item number.  This is so because if I attempt to clear the filter criteria to view all Item numbers, then I get a message saying that the number of rows exceeds 1 million.  Also, since the calculations are very intensive, recalculation takes time when you change the criteria in the filter dropdown of the Pivot Table.  Therefore, once you change the filter criteria in the Pivot Table, wait for the Reading data counter (bottom right on the taskbar) to finish processing.

Interpretation

1. Cell C5 - 1,725 orders contained the Item number TNB521711234E.  This figure can be verified via the following steps:
a. On the Sales data worksheet, filter the item number column on TNB521711234E.
b. Copy the resulting rows of the Order number column to another worksheet
c. Select the data so copied and go to Data > Remove Duplicates
d. Count the ensuing rows
2. Cell C6 - Of the 1725 orders containing Item number TNB521711234E, 540 contained Item number TNB52C1
3. Cell C7 - Of the 1725 orders containing Item number TNB521711234E, 446 contained Item number EMT34

You may refer to my solution in this workbook.