Remove duplicates from rows

{0 Comments}

Assume a simple matrix like data structure as shown below.  As one can observe, for every unique ID, there are duplicate traits appearing.  The objective is to remove duplicate traits from each row.  The Expected solution is show in the second table.

Unique ID Trait1 Trait2 Trait3 Trait4 Trait5
a 1 1 2
b 1 2 1 3 3
c 2 2 1 3 4
Unique ID Trait 1 Trait 2 Trait 3 Trait 4
a 1 2
b 1 2 3
c 1 2 3 4

While there is a feature to remove duplicates from columns, there is no feature to do so from rows.  You may download my workbook from here.

You may also watch a short video of my solution here

Flip a string with conditions

{2 Comments}

Assume a column with some IP addresses.  Each cell has 4 portions separate by a dot.  The number of digits in each portion can either be 2 or 3.  The following tasks need to be performed on this dataset:

1. Drop the last portion of the string
2. Flip the string
3. Add the following text to the flipped string - "in-addr.arpa" (without double quotes)

The base data and expected solution is shown below

While this can be resolved by writing formulas, I have shown a Power Query solution in this workbook.

You may also watch a short video of my solution here

Consolidate multiple rows of data and remove blank rows

{0 Comments}

Imagine a dataset representing questions answered by various respondents in a survey.  The first column has the Respondent's Name and thereafter there is one column each for a question posed in the survey.  The data extracted from the system suffers from the following fallacies:

1. There are blank rows; and
2. Multiple questions answered by one respondent appear in multiple lines rather than in one line.

One may want to delete the blank rows and consolidate the dataset to show only one row per respondent.

Here's a snapshot of the source data and expected result

You may refer to my solution in this workbook.

Here's a short video of my solution

Quantify combination courses opted by students

{0 Comments}

Assume a dataset with two columns which lists down the student names in column A and courses opted for in column B.  Since one student can opt for multiple courses and the same course can be taken up by multiple students, there can be repetitions in both columns.  The objective is to create a matrix like data structure (with courses appearing in both row and column labels) with numbers inside the matrix quantifying the "Number of students who opted for course A and C".  So, for all possible course combinations, one may want to know the number of students who opted for those combinations.

The description above can be extended to cases where buying behavior has to be analysed.  A sore manager may want to know "How many people who buy Brand A also buy Brand B."

Here's a snapshot of the source data and expected result

The number 1 in cell H4 (and cell F6) means that there is only one student who opted for courses B and D.  Likewise, 3 in cell H5 (and cell G6) means that 3 students opted for courses C and D.

You may refer to my Power Query and PowerPivot solution in this workbook.  Power Query has been used for generating a dynamic list of Courses and Power Pivot has been used for writing the DAX formula for quantifying within the matrix.

Compute potential Sales of a retail outlet

{0 Comments}

The objective is to assist a Store Manager with computing potential sales across different products and colours.  To start with let's assume two datasets:

1. Customer-Colour dataset - a two column table which lists down the colour preference of each customer; and
2. Colour-Product-Price dataset - a three column table which lists down the multiple products associated with each colour and the associated prices

My objective is to compute the "Potential Sales" if each customer buys one unit of each colour of each product.  While this can be resolved by using formulas (see Formula solution worksheet of the workbook), I'd like to resolve this problem with the help of the Power Query and PowerPivot tools.

The initial challenge will be to establish a relationship between the two tables because one cannot establish Many to Many relationships in a PowerPivot.  The two tables above are a perfect example of Many to Many relationships because each customer likes multiple colours and each colour is associated with multiple products.

You may download my solution workbook from this link.

You may also view a short video of my solution here:

Merge data from multiple cells into a single cell

{8 Comments}

Assume a simple two column dataset with Name in column A and Department in column B.  There are repetitions in the Name column.  Here's a snapshot of the base data:

One may want to merge data from multiple cells into a single cell - basically a view where all unique names are listed in a column and all departments for each name appear in a single cell (separated by commas).  The expected result should look like this:

You may view my solution in this workbook.

You may also watch a short video of my solution here

To accomplish the reverse of this i.e. start at Table 2, process the dataset and arrive at Table1, you may refer to my solution at this link.

Identify buy and sell break points

{0 Comments}

Assume a two column dataset with Date in the first column and Price in the second one.  The purpose is to identify times to buy and sell - buying would be just after the lowest low is confirmed and sell before or just after the highest high is in place. Confirmation is achieved through crossover of moving averages. This data is being used in back testing buy and sell criteria.

Snapshot of base data

Snapshot of expected result

The Lowest Low is the lowest price that occurs before the next Highest High.  The Highest High is the highest price that occurs before the next Lowest Low..  2.77 is the lowest low after the highest high of 3.69 and 3.23 is highest high after the lowest low of 2.77.

You may refer to my solution in this workbook.

Customise Row/Column appearances in Pivot Tables

{0 Comments}

Assume a simple four column dataset with the following columns - User, Month, Leads and Sales.  The dataset shows the user and month wise leads generated and revenue earned.  One may want to analyse this data in a Pivot Table with the User field appearing in the Row labels section, Months field appearing in the Column labels section and the other two Fields appearing in the Value area section.  One can easily create this Pivot Table by dragging field in the quadrant of the Pivot Table Field list pane or in the Pivot Table grid directly.

However, the one customization one may want is to show the Leads generated for all months combined only (not month wise).  The Pivot Table should look as follows:

You may refer to my solution in this workbook.

You may view a video of my solution here

Perform a Competitor, Feature and Customer Analysis with the PowerPivot

{0 Comments}

Assume there are four interrelated tables. One may want to create a pivot that allows one to filter data by using the slicers. Data should be filtered by the following interdependent slicers selections: Customer, Country and segment.  The logic behind the pivot when using the slicers shall be as follows:

1. Feature N is only shown if relevant to Customer X in Segment Y and Competitors do not possess Feature N
2. Competitor X is only shown if Competitor X exists and is active in Country Z and is relevant to Customer Y in Country Z

So after slicer selections are made, the idea is to display all features that one can offer and are relevant to the respective customer in the respective segment and country, regardless of whether the competitors can offer them or not.  So if one competes with competitor 1 in a specific project and offers features 1, 3, 7, offering the very same features to our shared customer does not make sense.  The customer won't see a benefit in choosing me over competitor 1.

Here's an elaborate example:

1. If one selects Customer 1/Segment A/Country 1 from the three slicers, then the Pivot Table should display as follows:

1. Row Labels - Display features in pivot if they are implemented or relevant.  The ones in Blue are implemented and the ones in green are relevant.  The pivot now shows that competitor 1 does not have features 2 and 9 which. Thus one's sales pitch will focus on offering features 2 and 9.  This section should also show data for Feature benefit calculation.

2. Column labels - Competitor 1 and 2 are displayed in the Pivot Table because they are both relevant in Country 1.

3. Competitor has/does not have feature (Value area section) - The following competitors have the same features I can offer my customer 1 in segment A:

  • Competitor 1 has features           1, 3, 7
  • Competitor 2 has features           3

Therefore they are marked with an “x” whenever they have the features.

The other competitors also have features that I offer to customer 1 in segment A, but since they are not active in Country 1, they are automatically not displayed when using the slicers.

You may refer to my solution in this workbook.

You may also view a video of my Power Query solution here:

Auto detect sum range when copying and pasting

{11 Comments}

Assume a simple two column data range as follows:

Product Amount
A 1
S 2
D 3
E 4
Product A
F 5
G 6
R 7
Product B
q 8
w 9
s 10
d 11
c 12
v 13
b 14
Product C

In the table above, one may want to compute the Product wise revenue.  Since the number of items falling in every product is different, one cannot copy and paste the SUM function from one product to another.  So one can adopt any one of the following two approaches:

1. Write the SUM function thrice; or
2. Use the short cut key for adding.  Here is the process (please note that this process will only work when there are no blank cells in the second column.  If there are blank cells, then the process mentioned below can be modified to still get the desired result):

a. Select range B2:B18
b. Press Ctrl+G > Special > Blanks > OK
c. Press Alt+= (this is the short cut key to generate the SUM function)

Now consider a different scenario.  Assume that the two column data range has the summarization row at the top of each Product block (rather than at the bottom as in the table above)

Product Amount
Product A
A 1
S 2
D 3
E 4
Product B
F 5
G 6
R 7
Product C
q 8
w 9
s 10
d 11
c 12
v 13
b 14

The second method described above will not work in this case because the Alt+= shortcut key only identifies thr range above or to the left.  To solve this problem, we need to write a formula for Product A which when copied down will auto adjust the range height.

You may refer to my solution in this workbook.  I have discussed two variants of the table above.