Combine unique entries from a range of cells after satisfying a condition


Here is a simple three column dataset showing Item ID, Catalogue Name and Catalogue Year

Item ID Catalogue Name Catalogue Year
A Spring 2012
A Spring 2016
A Spring 2017
B Fall 2014
B Fall 2014
B Fall 2016

The second and third last rows are clear duplicate rows. The task is to create a fourth column (titled as All Year) in this dataset with concatenated entries from the Catalogue Year column for each Item ID. So the expected result is:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2016
B Fall 2014 2014, 2016
B Fall 2016 2014, 2016

Notice that for Item ID B, the result in every cell of the fourth column is 2014, 2016 and not 2014,2014,2016.  So while joining (concatenating in technical MS Excel terminology) entries from the "Catalogues Year" column in the "All Year" column, only unique entries for that Item ID have to be considered.

Excel 2016 has a TEXTJOIN() function which gets use close to the final result but does not take into account the unique entries only (it takes all entries into account all entries).  On using the TEXTJOIN() array formula in a spare column, here is the result i got:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2014, 2016
B Fall 2014 2014, 2014, 2016
B Fall 2016 2014, 2014, 2016

Notice that for Item ID B, the result is 2014, 2014, 2017 and not 2014, 2017.  So the TEXTJOIN() function does not work.

I have solved this problem using Power Query (Get and Transform in Excel 2016).  You may download my solution workbook from here.

Prepare an invigilation schedule for each teacher by different time periods


Imagine a multi column exam invigilation schedule with the following information

  1. S. No.
  2. Name of staff
  3. Designation of staff member
  4. Two columns for each day on which there is an exam - one for Morning and another for Afternoon
  5. A * under each column if that particular staff member has to be an invigilator during that time period
S. No. Name of staff Designation 09/11/2017-Morning 09/11/2017-Afternoon 10/11/2017-Morning 10/11/2017-Afternoon
1 Tom Lecturer * * *
2 Sam Lecturer * *

This dataset stretches into many more columns.  So if the exams last for 15 days, there will be 30 columns.  The objective is to condense the column expanding dataset into a 5 column one - S. No., Name of Staff, Designation, Morning and Evening.  Under the Morning and Evening columns, the different dates have to be separated by commas in that one single cell.  The expected result is:

Name of staff S. No. Designation Morning Afternoon
Tom 1 Lecturer 09/11/2017, 10/11/2017, 11/11/2017, 13/11/2017, 14/11/2017, 16/11/2017, 17/11/2017 10/11/2017, 13/11/2017, 14/11/2017, 15/11/2017, 16/11/2017, 17/11/2017
Sam 2 Lecturer 09/11/2017, 13/11/2017, 15/11/2017, 16/11/2017 09/11/2017

I can think of 3 ways to solve this problem. Here's a brief about each of those methods:

  1. Using Power Query and Excel's functions - Using Power Query, one can first unpivot the data and then use the TEXTJOIN() function.  This would be an array formula.
  2. Using Power Query only - Using Power Query, one can first unpivot the data and then modify the code to concatenate all dates in a single cell for that invigilator.  One can then Pivot the data in Power Query itself
  3. Using Power Query and PowerPivot - Using Power Query, one can first unpivot the data and then use the CONCATENATEX() function of the PowerPivot.

As you can observe, the common thread in all solutions is that one has to first unpivot the dataset.

You may refer to all my 3 solutions in this workbook.

Return the specific product which satisfies the user defined feature combination


Here is a simple matrix like data layout which shows the features available in every product.  Let's assume that this data is in range A2:E8 (including the header row)

Functions Product A Product B Product C Product D
Function 1 x
Function 2 x x
Function 3 x x x
Function 4 x x
Function 5 x x
Function 6 x

Here is a user created input table of his/her requirements.  Let's assume that this data is in range A12:B18 (including the header row)

Functions Input
Function 1
Function 2 x
Function 3 x
Function 4
Function 5 x
Function 6

The expected result is the "Product" which meets the user defined function combinations. The result should be Product B. If there are different products which meet the user's requirements, only the first one will be returned as a result. If one would like all products to be returned, the one can use Power Query to resolve this problem.

=IFERROR(INDEX($B$2:$E$2,1,MATCH(COUNTA($A$13:$A$18),MMULT(1*(TRANSPOSE(B3:E8=B13:B18)),1*(ISNUMBER(ROW(INDIRECT("1:"&COUNTA($A$3:$A$8)))))),0)),"No such product")

Please note that this is an array formula so please confirm the formula with Ctrl+Shift+Enter.

Determine number of learners who have completed different stages of multiple online courses


Here is a sample dataset of learners who have cleared different stages of multiple courses on offer within an Organisation:

Learner Stage completed Course
Bill Stage 1 Public Speaking
Bill Stage 2 Public Speaking
Bill Stage 3 Public Speaking
Susan Stage 1 Effective Communication
Bob Stage 1 Public Speaking
Bob Stage 2 Public Speaking
Sheila Stage 1 Effective Communication
Sheila Stage 2 Effective Communication
Sheila Stage 3 Effective Communication
Frank Stage 1 Effective Communication
Frank Stage 2 Effective Communication
Henry Stage 1 Public Speaking
Henry Stage 2 Public Speaking
Bill Stage 1 Effective Communication
Bill Stage 2 Effective Communication

From this sample dataset, one may want to know how many participants have completed each stage of these multiple courses.  The expected result is shown below:

Row Labels Stage 1 Stage 2 Stage 3
Effective Communication 1 2 1
Public Speaking 2 1
Grand Total 1 3 2

In this workbook, I have shared 2 solutions - one using formulas and the other using the Power Query & PowerPivot.

Return best possible fit, to manually entered dimensions, with the intent to minimise wastage


Refer to a simple 5 column representative inventory dataset of a Glass manufacturer:

Model Length (MM) Wide (MM) Thk (MM) CAT
HX9-G-ARD 1071 273 3.5 A
MYP-G-3RD 580 535 3.2 B
EPO-G-3RD 580 535 3.2 A
MYG-G-3R 966 350 3.2 A
MYN-G-3RD 649 530 3.2 A
GM SPIN-G-3FD 882 395 3.2 A
MY8-G-AR 880 400 3.5 B
GM2-G-AR 880 400 3.5 A

From this inventory data, one has to furnish customer orders based on specific dimensions demanded by them.  A typical Customer request would be to supply glass sheets as per the following dimensions

Length (MM) Wide (MM) Thk (MM) CAT
780 542 3.5 A

The firm may or may not have glass sheets of this specific size.  The objective is to identify glass sheets, from the inventory on hand, which match customer specifications.  If there is no exact match, then one must be able to obtain all inventory items which have the same Thk (MM) and CAT as the customer specified dimensions but the Length and thickness should be more than equal to the customer specified dimensions.  The length and width can then be trimmed to match the exact customer dimensions.  Furthermore, the result returned should:

  1. List only the Top 30 glass sheets available in inventory; and
  2. List those Top 30 glass sheets in ascending order of wastage (wastage caused when the glass sheet is trimmed to match the customer specified dimensions)

You may refer to my solution in this workbook.  I have shared two solutions - one using Excel formulas and the other using Power Query a.k.a. Get and Transform in Excel 2016.  Please read the Comments in cells F1, J9 and J16 of the "Solutions" worksheet.  The difference between the 2 solutions is:

  1. Formula driven solution - This is in range J10:AM14 of the Solutions worksheet.  This is a semi dynamic solution (as compared to the Power Query solution).  To get the models in ascending order of wastage, one will have to create an Area column in the base data and sort that column in ascending order.
  2. Power Query solution - This is in range J17:AM21 of the Solutions worksheet.  This is a dynamic solution.  Just change the customer specified dimensions in range G2:J2 of the Data and Query worksheet.  Thereafter just right click on any cell in the range below and select refresh.

Sum the largest 5 of the last 10 numbers in a row ignoring blanks


Assume a single row of data with numbers and blanks appearing at random intervals.  The objective is to sum the largest 5 of last 10 numbers in that row.  Solving this problem entails multiple steps:

  1. Identify the last 10 numbers in that row i.e. starting from the right hand side, identify the last 10 numbers
  2. Identify the largest 5 of those 10 numbers
  3. Sum those largest 5 numbers

Here are the steps

  1. Suppose the numbers and blanks are in range A2:V2
  2. Type 10 in cell X1
  3. Enter this array formula (Ctrl+Shift+Enter) in cell X2


Convert a text entry into its number equivalent


Consider this simple two column table showing text entries in column A and the corresponding numbers in column.  Assume this data is in range A2:B11 (headings are in A1:B1).

text Value
A 1
B 2
C 3
D 4
E 5
F 6
G 7
H 8
I 9
J 0

The objective is to generate the numeric code for text code of any length entered in a certain cell.  For example, a user will type a certain text code, say ABEJ and the expected result should be 1250.  For JABF, the result should be 0126.  The text entry and text length are both user determined.

With ABEJ, typed in cell D2, enter this array formula in cell E2


This formula can now be copied down for generating the numeric code for all text codes entered in column D.

Summarise data with multiple wildcard OR conditions


Imagine a dataset as shown below

Status Text Number
Active bat 1
Passive erq 2
Passive cat 3
Active enm 4
Active dog 5

Suppose the data is in range A3:C7 (headings are in A2:C2).  The objective is to add numbers where the following conditions are met:

1. Status should be Active; and
2. Text should contain either b or c or d

So the result for the dataset above should be 6 i.e. 1+5.  Sounding like a simple case of summarising based on multiple conditions, one may attempt using the SUMIFS() function here.  However, a single SUMIFS() function will not work because a SUMIFS() can only accept multiple AND conditions (not OR conditions).  The only way to make the SUMIFS() function work is to use 3 of them and then add them up.

So this SUMIFS() function will work


A11:A13 has Active and B11:B13 has *a*,*b*,*c*.

While this looks like a solution, the problem is that there will have to be one SUMIFS() function for each OR condition.  So the more the number of OR conditions, the more the SUMIFS() functions and the lengthier the formula.

In this workbook, I have shared two alternative solution.  One uses the DSUM() function and other uses a combination of MMULT(), TRANSPOSE() and SUMPRODUCT() functions.

Remove special characters from a string



Assume a column of names as follows:

Mohammed Zia-Ul Haque
Steven Thomas -
,-Rohit Sunil Ahir-Chowdhary.-
Anuj -----------
Sameer --
..,Mohit --
Rajeev Nair.
Monalisa . Das
Vijeta ...
--,.Anjana. M.U..,-

Please observe that there are special characters before the name, within the name and after the name.  The task is to remove special characters before and after the name.  The expected result is shown below:

Expected Result
Mohammed Zia-Ul Haque
Steven Thomas
Rohit Sunil Ahir-Chowdhary
Rajeev Nair
Monalisa . Das
Anjana. M.U

The array formula (Ctrl+Shift+Enter) to make this work is


I have solved a similar problem at this link as well but that requires the usage of an add-in.  This is so because the special characters and numbers need to be removed from within the string as well.  In other words, everything except letters need to be removed from the alphanumeric string (no matter where the numbers and special characters are - beginning, middle or at the end).

Analyse all possible combinations of cheques received and identify the combination which gives maximum benefit to the Customer


Scenario 1

In a Stock Broking business (Firm), Customers deposit margin money through cheques. Although cheques are received from customers at branches, clearing (realization) of the same normally takes 3-4 days.  On one hand, the Firm wishes to allow trading to the customers against the deposited cheques from the day it receives the cheque, on other hand there is a risk associated with this  i.e cheque may get dishonored and the Firm may incur huge financial losses because of allowing customers to trade in anticipation of the cheque realization. Hence, there is a trade off between customer centricity & Financial Risk management.

The Firm may decide to take a calculated risk in order to retain its customer centric approach (it is a delightful experience for customers if they get trading benefit against the deposited cheque instantly before realization). The Firm may decide to give the benefit of cheque which are below a set threshold amount (say 50,000) instantly and for cheques which are above threshold amount the credit will be given only after realization. Sometimes there are  multiple unrealized cheques deposited by one client but the Firm needs to cap the total benefit of unrealized cheques to 50,000 (partial benefit against any cheque is not possible i.e. either a cheque will be considered in full or not considered) E.g. if a customer deposits 3 cheques of 40,000 each then also the total instant benefit against all unrealized cheques shall be capped at 50,000 - hence only one cheques of 40,000 will be credited to customers ledger before realization.

Now the question here is with respect to selection of cheques to give best results i.e. maximum total below 50,000.  Consider the following example:

Cheque1: 55,000
Cheque2: 44,000
Cheque3: 28,000
Cheque4: 7,000
Cheque5: 6,500
Cheque6: 4,000
Cheque7: 3,000
Cheque8: 500

The Firm wants to identify the "Combination of cheques" which gives the maximum benefit to the Customer subject to a limit of 50,000.  The result should be cheques 3-8.  They add up to 49,000.

Here's another example:

Cheque1: 55,000
Cheque2: 45,000
Cheque3: 25,000
Cheque4: 5,000

The result here should be cheque 2 and 4.  They add up to 50,000.

Here are my thoughts:

1. I have assumed that the total number of cheques received from customer will be capped to 8.  This limit can be relaxed but processing will then take time.

2. Create all possible combinations of cheques which can be credited.  So, since no partial credit is allowed, the cheque can either be accepted in full or be rejected.  Hence the possible values can be 2 i.e. 1 (Accepted) or 0 (Rejected).

3. So if there are 8 cheques received from a certain customer, then there will be 2^8 i.e. 256 possible combinations.  This can also be computed by using the =PERMUTATIONA(2,8) formula.  I'd like to than Saurabh Gupta for sharing the formula to generate all these combinations.

4. After creating all possible combinations, add the cheque amount for each combination.

5. Scan this total column and highlight the one which is the largest value <= the threshold value i.e. 50,000.

You may refer to my solution in the Solution worksheet of this workbook.

Scenario 2

Here's another example.  Say, one has a list of numbers as shown below:


Now one may want to know which combination of numbers adds up to or comes closest to 5115.52 (our lookup value).  If none of the combinations adds up exactly to the lookup value, then return the combination which adds to the largest value less then equal to the lookup value.  In the example above, the second, third and fourth add up to 5114.95.

You may find my solution in this workbook.