Transpose data column-wise


Here's a simple two-column dataset:

Project ID Employee
A00 Ashish Mathur
A00 Henry
A00 Paul
A00 Sanjay Singh
L02 Geeta Puri
L02 Ajay Singh
L02 Robert
L02 Mike
L02 Ajay Kumar Venkatesh
L02 Ashish Mathur
L02 Geeta Kukreja
A00 Sameer Mehta
L02 Harish Rai

As one can observe, there are repetitions in the Project ID column.  One may want to transpose this data column-wise, so that each Project ID becomes a column heading with its employees listed beneath it, like this:

A00            L02
Ashish Mathur  Geeta Puri
Henry          Ajay Singh
Paul           Robert
Sanjay Singh   Mike
Sameer Mehta   Ajay Kumar Venkatesh
               Ashish Mathur
               Geeta Kukreja
               Harish Rai

In this workbook, I have solved the problem using Excel formulas and Power Query.
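For readers who want to see the logic outside Excel, here is a minimal Python sketch of the same reshaping.  It is not the formula or Power Query solution from the workbook, and the function name `transpose_by_key` is purely illustrative.  It groups employees by Project ID in order of first appearance and pads the shorter column with blanks.

```python
from itertools import zip_longest

ROWS = [
    ("A00", "Ashish Mathur"), ("A00", "Henry"), ("A00", "Paul"),
    ("A00", "Sanjay Singh"), ("L02", "Geeta Puri"), ("L02", "Ajay Singh"),
    ("L02", "Robert"), ("L02", "Mike"), ("L02", "Ajay Kumar Venkatesh"),
    ("L02", "Ashish Mathur"), ("L02", "Geeta Kukreja"),
    ("A00", "Sameer Mehta"), ("L02", "Harish Rai"),
]

def transpose_by_key(rows):
    """Return (header, body): one column per key, values stacked beneath it."""
    groups = {}  # dicts keep insertion order, so keys stay in first-seen order
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    header = list(groups)
    # zip_longest pads the shorter columns with empty strings
    body = [list(line) for line in zip_longest(*groups.values(), fillvalue="")]
    return header, body

header, body = transpose_by_key(ROWS)
for line in [header] + body:
    print("\t".join(line))
```

The output is tab-separated, so it can be pasted straight back into a worksheet.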

You may watch a short video of my solution here

Compute product wise YTD Revenue from a matrix like/Cross tabular dataset


Assume the following data layout

Name Budget April Actual April Budget May Actual May Budget June Actual June
a 2 1 1 4 3
b 4 4 2 1
c 2 3 3 3
d 2 1
e 5 6 4 8 6

As one can observe here, there are two sub-columns for each month - Budget and Actual.  From this data layout, we wish to compute the product-wise YTD Budget and YTD Actual sales figures.  So for example, if one selects May in a drop down, then the YTD budget for Product B (April + May) should be 6 and the YTD actual for Product B should be 5.

The expected solution should look like this

Month Name May
Products YTD Budget YTD Actual
a 2 2
b 6 5
c 2 3
e 9 14
Grand Total 19 24
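
The YTD logic can be sketched in Python as follows.  This is only an illustration, not the method used in the workbook.  The names `MONTHS`, `DATA` and `ytd` are hypothetical, and the position of the blank cells in each row is an assumption: it was chosen so that the figures agree with the expected result above.  Products with no data up to the selected month (product d under this assumption) are left out, as in the expected result.

```python
MONTHS = ["April", "May", "June"]

# (budget, actual) per month; None marks a blank cell.
# ASSUMPTION: where the blanks sit in each row is a guess that is
# consistent with the expected May figures.
DATA = {
    "a": {"April": (2, 1), "May": (None, 1), "June": (4, 3)},
    "b": {"April": (4, 4), "May": (2, 1), "June": (None, None)},
    "c": {"April": (2, 3), "May": (None, None), "June": (3, 3)},
    "d": {"April": (None, None), "May": (None, None), "June": (2, 1)},
    "e": {"April": (5, 6), "May": (4, 8), "June": (6, None)},
}

def ytd(data, selected_month):
    """Sum budget and actual from the first month up to selected_month."""
    cutoff = MONTHS.index(selected_month) + 1
    result = {}
    for product, by_month in data.items():
        cells = [by_month[m] for m in MONTHS[:cutoff]]
        if all(b is None and a is None for b, a in cells):
            continue  # nothing recorded yet for this product
        budget = sum(b or 0 for b, _ in cells)
        actual = sum(a or 0 for _, a in cells)
        result[product] = (budget, actual)
    return result

for product, (budget, actual) in ytd(DATA, "May").items():
    print(product, budget, actual)
```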

You may download my solution workbook from this link.

You may watch a short video of my solution here

Workaround to the problem of creating a Pivot chart after using “% of row total” calculation in a Pivot Table


Here is a dashboard created with a Pivot Table, a Pivot chart and slicers.  In the Pivot Table, the percentages have been computed using "% of row total".


The Pivot chart shows two columns per month - one for Complete and the other for Incomplete.  The objective is to show only the Complete column per month in the Pivot chart.  If one filters the Pivot Table only on Complete, then the Pivot chart shows an unexpected result - each column will go up to 100%.  This happens because all calculations in a Pivot Table happen only on the visible cells.  Once the Incomplete column is hidden, the "% of row total" calculation returns 100%, which in turn is reflected in the Pivot chart.
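
The effect is easy to reproduce with a few lines of Python.  The counts below are hypothetical, and the functions `pct_of_row_total` and `pct_complete` are illustrative names, not the PowerPivot measure from the workbook.  The first function mimics a calculation that sees only the visible columns; the second always divides by the full row total.

```python
def pct_of_row_total(row, visible):
    """Share of each visible column in the total of the visible columns only."""
    total = sum(row[col] for col in visible)
    return {col: row[col] / total for col in visible}

# Hypothetical task counts for one month
row = {"Complete": 30, "Incomplete": 10}

# Both columns visible: Complete is 30 / 40
print(pct_of_row_total(row, ["Complete", "Incomplete"]))

# Incomplete filtered out: Complete is 30 / 30, the misleading 100%
print(pct_of_row_total(row, ["Complete"]))

def pct_complete(row):
    """Share of Complete against the full row total, whatever is displayed."""
    return row["Complete"] / sum(row.values())

print(pct_complete(row))
```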

I have resolved this problem with the help of PowerPivot.  You may refer to my solution in this workbook.

Create a daily work schedule


Imagine a 4 column dataset as shown below:

Resource Name Start End Projects
Raghav 02-06-2015 05-06-2015 p4
Raghav 02-06-2015 07-06-2015 p5
Raghav 08-06-2015 12-06-2015 p1
Raghav 16-06-2015 19-06-2015 p2
Raghav 22-06-2015 26-06-2015 p3
Ashish 03-06-2015 09-06-2015 p6
Ashish 04-06-2015 07-06-2015 p7

One may want to transform this dataset into the layout shown below, with one row per employee and one column per day.  There will obviously be a lot more columns to the right (I'm only showing a snippet).

Employee Name  01-06-2015  02-06-2015  03-06-2015  04-06-2015  05-06-2015
Raghav                     p4 & P5     p4 & P5     p4 & P5     p4 & P5
Ashish                                 p6          p6 & P7     p6 & P7

I have solved this problem with the help of Microsoft Power Query. You may download my solution workbook from here.
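
As a rough illustration of the transformation (not the Power Query steps from the workbook), here is a short Python sketch.  It assumes the dates are in day-month-year order, which the value 16-06-2015 implies, and the function name `daily_schedule` is hypothetical.  Each date in the range is checked against every assignment, and overlapping projects are joined with " & ".

```python
from datetime import date, timedelta

ASSIGNMENTS = [
    ("Raghav", date(2015, 6, 2), date(2015, 6, 5), "p4"),
    ("Raghav", date(2015, 6, 2), date(2015, 6, 7), "p5"),
    ("Raghav", date(2015, 6, 8), date(2015, 6, 12), "p1"),
    ("Raghav", date(2015, 6, 16), date(2015, 6, 19), "p2"),
    ("Raghav", date(2015, 6, 22), date(2015, 6, 26), "p3"),
    ("Ashish", date(2015, 6, 3), date(2015, 6, 9), "p6"),
    ("Ashish", date(2015, 6, 4), date(2015, 6, 7), "p7"),
]

def daily_schedule(assignments, first_day, last_day):
    """Return {name: {day: 'p1 & p2'}} for every day from first_day to last_day."""
    n_days = (last_day - first_day).days + 1
    days = [first_day + timedelta(days=i) for i in range(n_days)]
    schedule = {}
    for name, start, end, project in assignments:
        row = schedule.setdefault(name, {day: [] for day in days})
        for day in days:
            if start <= day <= end:  # Start and End are both inclusive
                row[day].append(project)
    return {name: {day: " & ".join(projects) for day, projects in row.items()}
            for name, row in schedule.items()}

schedule = daily_schedule(ASSIGNMENTS, date(2015, 6, 1), date(2015, 6, 30))
for name, row in schedule.items():
    print(name, [row[date(2015, 6, d)] for d in range(1, 6)])
```

Project codes are written out exactly as they appear in the source rows, so the sketch prints "p4 & p5" where the table above shows "p4 & P5".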

You may watch a short video of my solution at this link

Perform an “Affinity analysis” to identify co-selling products


Affinity analysis encompasses a broad set of analytic techniques aimed at uncovering the associations and connections between specific objects: these might be visitors to a website (customers or audience), products in a store or content items on a media site. Of these, “market basket analysis” is perhaps the most common example. In a market basket analysis, one analyses combinations of products that frequently co-occur in transactions.  For example: of all the shoppers today, how many purchased both a school uniform and a school bag?  A retailer can use this information to:

1. Improve the customer shopping experience by rearranging the store layout (place products that co-occur together close to one another); and
2. Run a focused marketing campaign (e.g. target customers who buy a school uniform with offers on school bags, to encourage them to spend more on their shopping basket)

Online retailers and publishers can use this type of analysis to:

1. Inform the placement of content items on their media sites, or products in their catalogue
2. Drive recommendation engines (like Amazon’s customers who bought this product also bought these products…)
3. Deliver targeted marketing (e.g. emailing customers who bought specific products with offers on other products that are likely to interest them)

Consider a dataset with four columns - Date, Order Numbers, Items sold and Item Description.  To simplify, let's ignore columns 1 and 4 for now.  One may want to know the answer to the following question:

For a particular item sold (say Item X), in how many orders (order frequency) were Item X and Item Y sold together, and in how many orders were Item X and Item Z sold together?

In the Excel workbook (download link below), the Sales data sheet has 265,321 rows (file size is 11 MB) and the number of unique items sold is 23,201.  In Excel, one has to pair each of the 23,201 items with the other 23,200 items to know whether there is any affinity between them.  The sheer size of that dataset would make any formula unusable here.
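
To make the counting logic concrete, here is a minimal Python sketch that works on a tiny, made-up list of (order number, item) pairs.  It is not the PowerPivot solution, and the name `co_occurrence` and the sample data are hypothetical.  Items are first collected into one set per order, so an item that appears on several lines of the same order is counted once, which corresponds to the Remove Duplicates check on order numbers.

```python
from collections import Counter, defaultdict

def co_occurrence(order_lines, item):
    """Return (orders containing item, Counter of other items in those orders)."""
    items_by_order = defaultdict(set)
    for order_number, sold_item in order_lines:
        items_by_order[order_number].add(sold_item)

    orders_with_item = 0
    partners = Counter()
    for items in items_by_order.values():
        if item in items:
            orders_with_item += 1
            partners.update(items - {item})  # every other item in the same order
    return orders_with_item, partners

# Hypothetical sales lines: (order number, item)
LINES = [
    (1, "X"), (1, "Y"), (1, "X"),
    (2, "X"), (2, "Z"),
    (3, "Y"), (3, "Z"),
    (4, "X"), (4, "Y"), (4, "Z"),
]

total, partners = co_occurrence(LINES, "X")
print(total)                   # orders containing X
print(partners.most_common())  # how often each other item shared an order with X
```

Because only the orders containing the chosen item are scanned, the sketch avoids building all 23,201 x 23,200 item pairs up front.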

I have solved this problem with the help of PowerPivot.  Please note that in the PowerPivot solution worksheet, I have filtered the Row labels on a specific Item number.  This is because if I clear the filter to view all Item numbers, I get a message saying that the number of rows exceeds 1 million.  Also, since the calculations are very intensive, recalculation takes time when you change the criteria in the filter dropdown of the Pivot Table.  Therefore, once you change the filter criteria in the Pivot Table, wait for the Reading data counter (bottom right on the taskbar) to finish processing.


1. Cell C5 - 1,725 orders contained the Item number TNB521711234E.  This figure can be verified via the following steps:
a. On the Sales data worksheet, filter the item number column on TNB521711234E.
b. Copy the resulting rows of the Order number column to another worksheet
c. Select the data so copied and go to Data > Remove Duplicates
d. Count the ensuing rows
2. Cell C6 - Of the 1,725 orders containing Item number TNB521711234E, 540 contained Item number TNB52C1
3. Cell C7 - Of the 1,725 orders containing Item number TNB521711234E, 446 contained Item number EMT34

You may refer to my solution in this workbook.

Remove special characters from a string



Assume a column of names as follows:

Mohammed Zia-Ul Haque
Steven Thomas -
,-Rohit Sunil Ahir-Chowdhary.-
Anuj -----------
Sameer --
..,Mohit --
Rajeev Nair.
Monalisa . Das
Vijeta ...
--,.Anjana. M.U..,-

Please observe that there are special characters before the name, within the name and after the name.  The task is to remove special characters before and after the name.  The expected result is shown below:

Expected Result
Mohammed Zia-Ul Haque
Steven Thomas
Rohit Sunil Ahir-Chowdhary
Rajeev Nair
Monalisa . Das
Anjana. M.U

The array formula (Ctrl+Shift+Enter) to make this work is


I have solved a similar problem at this link as well, but that solution requires the use of an add-in.  This is because, in that case, the special characters and numbers need to be removed from within the string as well.  In other words, everything except letters needs to be removed from the alphanumeric string (no matter where the numbers and special characters are - beginning, middle or end).
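
Both variants can be sketched with regular expressions in Python.  This is not the array formula or the add-in mentioned above; the function names `trim_special` and `letters_only` are hypothetical, and the sketch assumes that anything other than a letter or digit counts as a special character for trimming purposes.

```python
import re

def trim_special(text):
    """Strip non-alphanumeric characters from both ends; keep those inside."""
    return re.sub(r"^[\W_]+|[\W_]+$", "", text)

def letters_only(text):
    """Remove everything except the letters A-Z and a-z, wherever it occurs."""
    return re.sub(r"[^A-Za-z]", "", text)

NAMES = [
    "Mohammed Zia-Ul Haque",
    "Steven Thomas -",
    ",-Rohit Sunil Ahir-Chowdhary.-",
    "Anuj -----------",
    "Monalisa . Das",
    "--,.Anjana. M.U..,-",
]

for name in NAMES:
    print(repr(trim_special(name)))
```

Note that `letters_only` also removes spaces, since a space is not a letter.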

Filter the Rank Field in a Pivot Table


Imagine a two column dataset - Customer Code and Material Number (with alphanumeric data).  The objective is to determine the second highest quantity sold per Customer code.

Since we will first have to determine the Customer wise and Material Number wise quantity sold, a good way to get started is to use a Pivot Table.  One can drag Customer Code and Material Number to the Row labels and Material Number (again) to the Value area section.  We can then sort the numbers in the Value area section in descending order.

Now comes the bit of filtering the Pivot Table to only view the Material Numbers per Customer code which rank second.  To do so, one can try the following steps:

1. Add the Material Number field once again to the Value area section
2. Right click on any cell in the MaterialNumber2 column and choose the "Rank Largest to Smallest" in the "Show Values as" option.  This option is available only in Excel 2010 and higher versions.
3. Click on the Filter drop down of the Row labels heading and select Material Number in the drop down there.  Now go to Value Filters > Equal > Count of Material Number2 > 2

While the steps above sound very logical, the result is incorrect.  The filter criterion gets applied to the underlying Count column rather than to the Rank column.  So the result will be all rows where the Count is 2.
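
The intended result, where the rank is computed first and the filter is applied to the rank, can be sketched in Python as follows.  The sample rows and the name `second_ranked` are hypothetical, and the treatment of ties (materials with equal counts share a rank, and the next distinct count gets the next rank) is an assumption rather than a statement about how Excel ranks.

```python
from collections import Counter

def second_ranked(rows):
    """Return {customer: [materials whose row count ranks second]}."""
    counts = Counter(rows)  # (customer, material) -> number of rows
    per_customer = {}
    for (customer, material), n in counts.items():
        per_customer.setdefault(customer, {})[material] = n

    result = {}
    for customer, materials in per_customer.items():
        distinct_counts = sorted(set(materials.values()), reverse=True)
        if len(distinct_counts) < 2:
            continue  # no second rank for this customer
        second = distinct_counts[1]
        result[customer] = sorted(m for m, n in materials.items() if n == second)
    return result

# Hypothetical (Customer Code, Material Number) rows
ROWS = (
    [("C1", "M1")] * 3 + [("C1", "M2")] * 2 + [("C1", "M3")]
    + [("C2", "M4")] * 2 + [("C2", "M5")] * 2 + [("C2", "M6")]
    + [("C3", "M7")]
)

print(second_ranked(ROWS))
```

Filtering on a count of 2 instead, as the Pivot Table does, would return M2 for C1 but M4 and M5 for C2, which are the top-ranked materials there rather than the second-ranked one.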

One can overcome this problem by using PowerPivot.  You may refer to my solution in this workbook.

Remove duplicates from rows


Assume a simple matrix-like data structure as shown below.  As one can observe, for every unique ID, there are duplicate traits appearing.  The objective is to remove duplicate traits from each row.  The expected solution is shown in the second table.

Unique ID Trait1 Trait2 Trait3 Trait4 Trait5
a 1 1 2
b 1 2 1 3 3
c 2 2 1 3 4

Unique ID Trait 1 Trait 2 Trait 3 Trait 4
a 1 2
b 1 2 3
c 1 2 3 4

While there is a feature to remove duplicates from columns, there is no feature to do so from rows.  You may download my workbook from here.
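
The row-wise logic is short when written in Python.  This sketch is not the approach used in the workbook, and `dedupe_row` is a hypothetical name.  Judging by row c of the expected result (2 2 1 3 4 becomes 1 2 3 4), the distinct traits are listed in ascending order, so the sketch sorts them; row a is entered with only its three non-blank values.

```python
def dedupe_row(traits):
    """Distinct, non-blank traits of one row in ascending order."""
    return sorted({t for t in traits if t is not None})

TABLE = {
    "a": [1, 1, 2],
    "b": [1, 2, 1, 3, 3],
    "c": [2, 2, 1, 3, 4],
}

for unique_id, traits in TABLE.items():
    print(unique_id, *dedupe_row(traits))
```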

You may also watch a short video of my solution here

Flip a string with conditions


Assume a column with some IP addresses.  Each cell has 4 portions separated by dots.  The number of digits in each portion can either be 2 or 3.  The following tasks need to be performed on this dataset:

1. Drop the last portion of the string
2. Flip the string
3. Add the following text to the flipped string - "" (without double quotes)

The base data and expected solution are shown below

While this can be resolved by writing formulas, I have shown a Power Query solution in this workbook.
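
For comparison, the three tasks can be sketched in a few lines of Python.  Two things here are assumptions: "flip" is taken to mean reversing the order of the remaining portions (not the characters within them), and the text to append is passed in as the parameter `suffix`, because the actual text is not shown above.  The function name `flip_ip` and the sample address are hypothetical.

```python
def flip_ip(address, suffix=""):
    """Drop the last portion, reverse the order of the rest, append suffix."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated portions, got {address!r}")
    flipped = ".".join(reversed(parts[:3]))  # drop portion 4, flip the others
    return flipped + suffix

# Hypothetical address; suffix left empty because the source text is unknown
print(flip_ip("192.168.10.25"))
```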

You may also watch a short video of my solution here

Consolidate multiple rows of data and remove blank rows


Imagine a dataset representing questions answered by various respondents in a survey.  The first column has the Respondent's Name and thereafter there is one column for each question posed in the survey.  The data extracted from the system suffers from the following flaws:

1. There are blank rows; and
2. Multiple questions answered by one respondent appear in multiple lines rather than in one line.

One may want to delete the blank rows and consolidate the dataset to show only one row per respondent.

Here's a snapshot of the source data and expected result

You may refer to my solution in this workbook.
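
The consolidation logic can be illustrated with a short Python sketch.  It is not the solution from the workbook; the respondent names, answers and the function name `consolidate` are hypothetical.  It assumes the respondent's name is repeated on every non-blank line and that each question is answered at most once per respondent, so the first non-blank answer found is kept.

```python
def is_blank(value):
    return value is None or value == ""

def consolidate(rows):
    """Drop fully blank rows and merge the rest into one row per respondent."""
    merged = {}
    for name, *answers in rows:
        if is_blank(name) and all(is_blank(a) for a in answers):
            continue  # skip blank rows
        current = merged.setdefault(name, [None] * len(answers))
        for i, answer in enumerate(answers):
            if not is_blank(answer) and current[i] is None:
                current[i] = answer  # keep the first non-blank answer
    return [[name, *answers] for name, answers in merged.items()]

# Hypothetical survey extract: name, Q1, Q2, Q3
ROWS = [
    ["Asha", "Yes", None, None],
    [None, None, None, None],
    ["Asha", None, "No", None],
    ["Ravi", None, None, "Maybe"],
    ["Ravi", "No", None, None],
]

for row in consolidate(ROWS):
    print(row)
```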

Here's a short video of my solution