Assume a single row of data with numbers and blanks appearing at random intervals. The objective is to sum the largest 5 of last 10 numbers in that row. Solving this problem entails multiple steps:

Identify the last 10 numbers in that row i.e. starting from the right hand side, identify the last 10 numbers

Identify the largest 5 of those 10 numbers

Sum those largest 5 numbers

Here are the steps

Suppose the numbers and blanks are in range A2:V2

Type 10 in cell X1

Enter this array formula (Ctrl+Shift+Enter) in cell X2

This post is in continuation of an earlier post where I applied Excel's Business Intelligence tools (PowerPivot, Power Query and PowerView) to analyse the Sales data of an E-Commerce Company. So, just for starters, in that post, I have basically sliced and diced the Sales dataset of an E-Commerce Company from multiple perspectives/facets to know the performance of this Company by Categories, Products, SKU's, Shipping Cities, Modes of payment etc.

After analysing the data, I have also visualised that data using PowerView. From the link shared above, you may download the workbook, watch the YouTube video and see a PowerBI desktop custom visual ("Sankey Diagram"). In this post, I have taken the same dataset and showcased/discussed the following:

1. How one can discover insights from this data with minimal effort using a Custom PowerBI desktop visual called "Sand Dance"; and
2. How one can query the dataset using "Natural Language" on a web browser (using www.powerbi.com); and
3. How one can query the dataset using "Natural Language" using Cortana (Microsoft's personal digital assistant in Windows 10).

For aspects 2 and 3 above, here are a few "Natural Language queries" which returned the correct result:

1. Show total revenue and growth in total revenue over previous month where order status is delivered by month in ascending order of month order as a Table
2. Show total revenue by category as a column chart
3. Show total revenue by order period as a pie chart in descending order of total revenue
4. Show total revenue by order period as a pie chart in descending order of total revenue where day of week is Sunday
5. Show Business generated from new categories by month where order period is mid day, payment type is COD sorted by month order in ascending order as a table
6. Show total revenue where portion of month is first half of month

Enough talking!!. You may view all three aspects mentioned above in this YouTube video

You may download the Powerbi desktop workbook from here and play around with the Sand Dance visual yourself. The PowerBI.com service also allows one to Publish reports to the Web (which can be viewed and interacted with by anyone). This is currently in preview stage and may become a payable service later. You may view and interact with the Sand Dance visual here:

Consider this simple two column table showing text entries in column A and the corresponding numbers in column. Assume this data is in range A2:B11 (headings are in A1:B1).

text

Value

A

1

B

2

C

3

D

4

E

5

F

6

G

7

H

8

I

9

J

0

The objective is to generate the numeric code for text code of any length entered in a certain cell. For example, a user will type a certain text code, say ABEJ and the expected result should be 1250. For JABF, the result should be 0126. The text entry and text length are both user determined.

With ABEJ, typed in cell D2, enter this array formula in cell E2

Assume a simple two column dataset with dates in column A and numbers in column B. The dates in column A are from January 1, 2013 to December 31, 2016 and numbers in column B are for the period January 1, 2013 to December 31, 2015 (there are no numbers for January 1, 2016 to December 31, 2016).

The objective is to "Compute an average for each day of calendar year 2016. The average should be for the occurrence of that day in the previous 3 years". Here's an example:

1. January 1, 2016 was a Friday (the first Friday of 2016) and is in cell A1097
2. In cell B1097, the average should be computed as: Average of the "First Friday of each of the previous 3 years"
3. January 8, 2016 was a Friday (the second Friday of 2016) and is in cell A1104
4. In cell B1104, the average should be computed as: Average of the "Second Friday of each of the previous 3 years"

I have solved this problem with the help of the PowerPivot. You may refer to my solution in this workbook.

Assume a 4 column dataset (a small sample) as follows:

City of Origin

City of destination

Mode of Transport

Passengers travelled

New Delhi

Pune

Air

123

New Delhi

Mumbai

Air

213

New Delhi

Kolkata

Air

125

Chandigarh

Jammu

Bus

785

Chandigarh

Amritsar

Train

567

Given this dataset, one may want answers to the following questions:

1. Of all those passengers who originated their journey (City of Origin) from Chandigarh, how many terminated their journey (City of destination) in New Delhi via different modes of transport; and
2. Of all those passengers who terminated their journey (City of destination) in Jammu, how many arrived in Amritsar (City of Origin) via different modes of transport; and
3. Of all those passengers who travelled by Bus, how many travelled from City A (City of Origin) to City X,Y,Z (City of destination)

While one can analyse/slice and dice this data using Pivot Tables, one cannot visualize this data very clearly (even after creating a Pivot chart). I have attempted to visualize this data using a software called PowerBI desktop (a free for download and use Business Intelligence software from Microsoft which rolls all of Excel's BI tools into 1 - PowerPivot, Power Query, Power Map and Power View).

You may download the source Excel workbook and the Power BI desktop workbook from this link.

In this workbook, I have Sales data of an E-Commerce Company for 3 months. The typical columns in the base data are:

1. Order Date/Time
2. City to which orders were shipped
3. Order Number
4. Payment Type i.e. Cash on delivery, Net Banking, EMI's
5. Order Status i.e. Delivered or cancelled
6. SKU's which the ordered items fall into
7. Products which the ordered SKU's fall into
8. Categories which the ordered products fall into

Given this simple tabular representation, one may want to analyse and visualize this dataset from multiple perspectives based on user selections, such as

"What was the revenue earned from the Top 5products in the A100 category in April for orders shipped to New Delhi?"

In this query framed above, the end user should have the leeway to select any/all of the underlined facets. So one can either choose revenue earned or Number of orders. Likewise, one can either select Top 5 products or Top 15 products/Top 5 SKU's etc.

With relative ease, one should also be able to "Perform an affiliate analysis" showing which categories are ordered together (to study affiliations). Please review this post for an independent discussion on "Affinity Analysis".

Furthermore, one should be able to perform a free form timeline search such as - "I would like to study growth in Total revenue of March 2-8 2015 over Feb 1-4 2015"

You may download the workbook from the link shared above.

You may watch similar videos showcasing the capabilities of Business Intelligence in MS Excel:

Here's a video showing the capabilities of this Sales data model

You may also watch this short video to see how I visualized the revenue flow from Categories to Shipping cities during different Order periods using Custom visuals available in PowerBI desktop.

Please feel free to download the PowerBI desktop workbook of the video shown above from here.

For a detailed overview of Sankey diagrams (a Custom visual available in PowerBI desktop), you may refer to my Blog article here.

Another great Custom visual (Sand Dance) which allows data discovery has been shown at this link. At that link, you will also be able to see how I queried the underlying dataset using "Natural Language".

As you can observe, there are duplicates that appear in each column. The task it to convert this tabular data structure to a matrix like one with ID's appearing down one column, Years appearing in one row and at each intersecting cell, the prizes should appear. Please note that there can be multiple prizes for each ID and year combination - ID is B and year is 9 in two rows but the prizes are different.

The expected solution is shown below

ID

9

10

A

X

Y

B

X,Y

Z

I have solved this problem by using Power Query. You may download the workbook from here.

Suppose the data is in range A3:C7 (headings are in A2:C2). The objective is to add numbers where the following conditions are met:

1. Status should be Active; and
2. Text should contain either b or c or d

So the result for the dataset above should be 6 i.e. 1+5. Sounding like a simple case of summarising based on multiple conditions, one may attempt using the SUMIFS() function here. However, a single SUMIFS() function will not work because a SUMIFS() can only accept multiple AND conditions (not OR conditions). The only way to make the SUMIFS() function work is to use 3 of them and then add them up.

While this looks like a solution, the problem is that there will have to be one SUMIFS() function for each OR condition. So the more the number of OR conditions, the more the SUMIFS() functions and the lengthier the formula.

In this workbook, I have shared two alternative solution. One uses the DSUM() function and other uses a combination of MMULT(), TRANSPOSE() and SUMPRODUCT() functions.

As one can observe here, there are two sub columns for each month - Budget and Actual. From this data layout, we wish to compute the Product wise YTD budget and Actual sales figure. So for example, if one selects May in a drop down, then the YTD budget for Product B should be 6 and YTD actual for Product B should be 5.

The expected solution should look like this

Month Name

May

Products

YTD Budget

YTD Actual

a

2

2

b

6

5

c

2

3

e

9

14

Grand Total

19

24

You may download my solution workbook from this link.