Tags: ARRAY FORMULA

Extract City, State and Pin code from an address string

For a lot of Excel users, a common problem is to extract City, State and Pin Code from an address string.  What compounds the issue is that there is no standardisation in an address string.  For example, one address may end with a Pin code, while another may end with a State and Country.  Some other variations could be:

1. End the address with Contact Numbers
2. There are no spaces between City, State and Pin code

There are of course many other variations which are possible and the sheer number of these variations makes it difficult to list down all of them.

Here is an example of address strings from which the City, State and Pin code need to be extracted into three columns:

1. F-45, Pocket 1, Sector 198, Noida - 201303, Uttar Pradesh, India
2. S-45/B, Pocket 1-C, Phase II, Sector 189, Gurgaon, Haryana - 122002
3. RR-45/B, Pocket II-S, Cross 2, Noida - 201303, India
4. T-45, Sector 198, Lucknow - 226001, Uttar Pradesh, India
5. V-45(A), Sector 193C,Allahabad-211002 Uttar Pradesh India
6. V-45(A), Sector 193C,Allahabad-211002Uttar PradeshIndia

You may refer to my solution in this workbook.
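
As a rough illustration of the idea only (this is not the workbook's formula solution), here is a minimal Python sketch.  It assumes a lookup list of known cities and states, which is an assumption made for this example, and it treats any standalone run of six digits as the Pin code.  The names KNOWN_CITIES, KNOWN_STATES, first_match and extract_parts are hypothetical.

```python
import re

# Hypothetical lookup lists; a real list would have to cover every city and state in the data.
KNOWN_STATES = ("Uttar Pradesh", "Haryana")
KNOWN_CITIES = ("Noida", "Gurgaon", "Lucknow", "Allahabad")

# A Pin code is assumed to be exactly six digits not touching any other digit.
PIN_PATTERN = re.compile(r"(?<!\d)\d{6}(?!\d)")


def first_match(address, candidates):
    """Return the first candidate found anywhere in the address, or an empty string."""
    lowered = address.lower()
    for name in candidates:
        if name.lower() in lowered:
            return name
    return ""


def extract_parts(address):
    """Split an address into city, state and pin using substring and regex searches."""
    pin = PIN_PATTERN.search(address)
    return {
        "city": first_match(address, KNOWN_CITIES),
        "state": first_match(address, KNOWN_STATES),
        "pin": pin.group(0) if pin else "",
    }


print(extract_parts("V-45(A), Sector 193C,Allahabad-211002Uttar PradeshIndia"))
```

Because the search is by substring, missing spaces between City, State and Pin code do not matter, but a city name that also appears inside a street name would be picked up wrongly.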

Prioritise investment liquidation to minimise Capital Gains

Assume a five column dataset with the Fund Name in column A, Market Value in column B, Gain/Loss in column C, Gain/Loss % in column D [column C/column B] and Type of investment in column E (Short term or Long term).  Since multiple investments can be made in one fund, there can be multiple line items with the same Fund Name (with different market values).  Let's call this dataset the Investment schedule.

The objective is to determine the following:

a. Minimum total gain realised from sale of one/multiple funds; and
b. Short term gain included in (a) above; and
c. Total Long term gain realised from sale of one/multiple funds; and
d. If Long term investments fall short for liquidating the particular fund, then how much is the gain/loss from forcibly liquidating the short term investments.

You may refer to my solution in this workbook.

Consider a Pivot Table Value field column as a criteria for computing another Value Field column

Assume a simple three column dataset showing hours worked by different machines on different dates.  So column A is Date, column B is Machine Name and column C is hours worked.  There are duplicates appearing in columns A and B.  Blanks in column C depict machine idle time.

The task is to create a simple three column dataset showing all unique Machine names in the first column, Last day on which the machine worked in the second column and hours worked on the last day in the third column.

This problem can be solved by using formulas (refer to the first worksheet of the workbook) but if one has to use a Pivot Table, then there would be a few problems.

1. The Grand Total for the Date Field should be blank because one cannot determine the Last day on which the machine worked across different machine types.  A conventional Pivot Table shows the Maximum of all dates appearing in the Date Field.

2. The Grand Total for the Hours worked Field should be a summation of the total hours worked on last day across all machine types.  A conventional Pivot Table shows the Maximum of all hours worked appearing in the Hours worked Field.

3. The biggest problem of them all is that there is no way to specify the Last day for that machine as a criterion for computing another Field in the Pivot Table.  Please refer to the file for a better understanding.

This problem can be solved using PowerPivot.  You may refer to my solution in this workbook.
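
To make the required logic concrete, here is a minimal Python sketch (not the PowerPivot solution itself).  It assumes that a blank in column C arrives as None and that duplicate rows for the same date and machine should have their hours added together.  Both are assumptions for this example, as are the function names and the sample rows.

```python
from datetime import date


def last_working_day(rows):
    """Map each machine to (last date with hours, hours worked on that date).

    rows is an iterable of (date, machine, hours); hours is None when the machine was idle.
    """
    latest = {}
    for day, machine, hours in rows:
        if hours is None:
            continue  # idle time never counts as a working day
        if machine not in latest or day > latest[machine][0]:
            latest[machine] = (day, hours)
        elif day == latest[machine][0]:
            # duplicate date/machine row: accumulate the hours (assumption)
            latest[machine] = (day, latest[machine][1] + hours)
    return latest


def grand_total_hours(latest):
    """Sum of hours worked on each machine's own last working day."""
    return sum(hours for _, hours in latest.values())


# Hypothetical sample data
rows = [
    (date(2014, 1, 1), "M1", 5),
    (date(2014, 1, 2), "M1", 3),
    (date(2014, 1, 2), "M1", 2),
    (date(2014, 1, 3), "M1", None),
    (date(2014, 1, 1), "M2", 4),
    (date(2014, 1, 2), "M2", None),
]
result = last_working_day(rows)
print(result, grand_total_hours(result))
```

The grand total here is the sum of the per-machine figures, which is the behaviour asked for in point 2 above; no grand total is produced for the date column.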

Compute configuration count using Set Theory and Venn Diagrams

Assume a table which lists attendees for a Company's Annual day function.  In this Table, data for every attendee is shown on a separate row so if an employee attends the function with his/her spouse and three children, then there will be 5 rows for that employee.

The question is to determine the count of each of the following family configurations:

1. Employees only (those who attended without spouse and children); and
2. Employees, spouse and children (Family); and
3. Employees and spouse (no children); and
4. Employees and children (no spouse)

You may refer to my solution in this workbook.  I have solved this problem using:

1. MS Excel Formulas based on Set Theory and Venn Diagram; and
2. PowerPivot
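
The set logic behind the four counts can be sketched in Python as follows.  This is only an illustration under assumptions: each row is taken to be an (employee id, relation) pair with relation equal to "Employee", "Spouse" or "Child", and the function name configuration_counts is hypothetical.

```python
def configuration_counts(rows):
    """Count family configurations from a list of (employee_id, relation) rows."""
    employees = {emp for emp, _ in rows}
    with_spouse = {emp for emp, rel in rows if rel == "Spouse"}
    with_children = {emp for emp, rel in rows if rel == "Child"}
    return {
        # outside both circles of the Venn diagram
        "employee_only": len(employees - (with_spouse | with_children)),
        # intersection of the two circles
        "family": len(with_spouse & with_children),
        # spouse circle minus the intersection
        "employee_and_spouse": len(with_spouse - with_children),
        # children circle minus the intersection
        "employee_and_children": len(with_children - with_spouse),
    }


# Hypothetical attendee list
rows = [
    (1, "Employee"),
    (2, "Employee"), (2, "Spouse"), (2, "Child"), (2, "Child"),
    (3, "Employee"), (3, "Spouse"),
    (4, "Employee"), (4, "Child"),
]
print(configuration_counts(rows))
```

The four counts always add up to the number of distinct employees, which is a handy cross-check.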

Perform an iterative sum of Top n values across multiple columns

A tournament has 18 participating teams with 25 players in each team.  Each team has to play five rounds of the Tournament and not all players play all rounds.  Scores earned by each player in each round are shown in individual cells.  If a player does not play a round, that cell is left empty.

The task is to sum the scores of the 18 highest scoring players of each team for each round.  Only the highest 18 players per team count towards the team's score.  If several players are tied at position 18, then only one of them should be included in the overall score.

One solution is to sort each round of scores for each team in descending order and sum the highest 18 values.  This is obviously a time consuming process.

There could be two other ways to solve this problem:

Formula driven solution - This uses a spare column, a lengthy formula and the Data > Table functionality.  Since Data > Table is a series of array formulas, this solution makes the workbook very sluggish.  You may refer to my solution in this workbook.

PowerPivot solution - This solution is far better than the formula driven one inasmuch as no spare columns, lengthy formulas or Data > Tables are used.  The solution in this workbook adds the scores of the highest 18 scoring players per round (if several players are tied at position 18, then all of their scores are included in the overall score).  To use this PowerPivot solution, you need the PowerPivot add-in for MS Excel.  This add-in is only available for Excel 2010 and higher versions.
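
The two tie-handling rules described above can be compared with a small Python sketch.  It is an illustration only, not either workbook solution: it assumes one team's scores for one round arrive as a list in which None marks a player who did not play, and the function names are hypothetical.

```python
import heapq


def top_n_total(scores, n=18):
    """Sum exactly the n highest scores; a tie at position n contributes one score only."""
    played = [s for s in scores if s is not None]
    return sum(heapq.nlargest(n, played))


def top_n_total_with_ties(scores, n=18):
    """Sum the n highest scores plus every further score tied with the n-th one."""
    played = sorted((s for s in scores if s is not None), reverse=True)
    if len(played) <= n:
        return sum(played)
    cutoff = played[n - 1]  # score sitting at position n
    return sum(s for s in played if s >= cutoff)


# Hypothetical round with a three-way tie at position 3
round_scores = [10, None, 7, 7, 9, 7, 2]
print(top_n_total(round_scores, n=3))            # 10 + 9 + 7
print(top_n_total_with_ties(round_scores, n=3))  # 10 + 9 + 7 + 7 + 7
```

The first function matches the rule stated in the task, and the second matches the behaviour noted for the PowerPivot workbook.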

Compute MODE of all numbers split across multiple worksheets

Assume numbers are typed in range A1:A2 of multiple worksheets in a workbook.  The task is to compute the MODE of these numbers.  Mode is defined as the value which appears most frequently in a range of cells.  So, if one types 1,3,4,3,5,6 in range A1:A6, then the mode will be 3, because 3 appears the maximum number of times in the range.

In MS Excel, there is a built in way to compute the MODE.  The formula for the same is

=MODE(A1:A6)

Unfortunately, MODE() is not a 3D function and therefore, something like this returns a #REF error:

=MODE(sheet1:sheet3!A1:A6)

This behavior seems somewhat inconsistent because other basic Mathematical and Statistical functions such as SUM(), COUNT(), AVERAGE(), MAX(), MIN(), VAR(), and STDEV() work just fine across multiple worksheets.

Depending upon the version of MS Excel which you are using, there could be two ways to solve this problem.

Solution for MS Excel 2010 and higher versions

If you are using the Power Query add-in, then a few simple steps (no formulas at all) can solve this problem.  The result will be dynamic and refreshable (just as in a Pivot Table).  In Power Query, there is a feature to append data from multiple worksheets into one running range.  Once this is done, the running range can be transferred to an MS Excel worksheet on which the MODE function can be used.

Solution for all versions of MS Excel

To compute MODE across multiple worksheets, you may refer to my solution in this workbook.
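
The append-then-compute idea can be sketched in a few lines of Python.  This is an illustration only, not the workbook or Power Query solution: it assumes each worksheet's numbers have already been read into a list keyed by sheet name, and that None should be returned when no value repeats.  The function name mode_across_sheets and the sample data are hypothetical.

```python
from collections import Counter


def mode_across_sheets(sheets):
    """Return the most frequent value across all sheets, or None if nothing repeats.

    sheets maps a sheet name to the list of values in its range; blanks are None.
    """
    counts = Counter(
        value
        for values in sheets.values()
        for value in values
        if value is not None
    )
    if not counts:
        return None
    value, frequency = counts.most_common(1)[0]
    return value if frequency > 1 else None


# Hypothetical workbook contents
workbook = {
    "Sheet1": [1, 3, 4],
    "Sheet2": [3, 5, 6],
    "Sheet3": [4, 3],
}
print(mode_across_sheets(workbook))
```

All values are pooled into one running collection first, which is the same shape as appending the worksheets into a single range before applying MODE.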

Show granular as well as total figures on the Summary sheet

Assume there is data for complaints received and complaints resolved for two years - 2009 and 2010 - for some regions.  Each Region has its own worksheet.  There are three headings on each sheet - Particulars, 2009 and 2010.  While the Particulars column has the text "Complaints received" and "Complaints resolved", the year columns have the number of complaints received and resolved.

The task is to create a Summary sheet which:

1. Shows the consolidated figure of complaints received and complaints resolved for both years
2. Shows the individual regions which make up the consolidated figure in 1 above.

In this workbook, you may see the data layout and four different ways of resolving the two tasks mentioned above.

You may also refer to the following related posts:

1. Summarise data from multiple worksheets with one condition
2. Summarise data from multiple worksheets with one condition - Part II
3. Summarise data from multiple worksheets
4. Summarise data from multiple worksheets with multiple conditions
5. Summarise data from multiple worksheets with multiple conditions - Part II
6. Summarise data from different cells of multiple worksheets

Generate a list of missing invoice numbers

In a range of cells, assume there is a list of invoices raised during a certain period.  Since adequate care was not taken to number the invoices sequentially, there are missing invoice numbers in the Excel range.  So, it is possible that after raising invoice numbers 501, 502 and 503, the next invoice raised was 507.

Given the following:

1. Starting and ending invoice number (two numbers) from the invoice book; and
2. List of actual invoice numbers raised during a certain period (as elaborated above)

one may want to generate a list of missing invoices.

You may refer to two solutions in this workbook.
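
The underlying logic is a set difference between the full sequence and the invoices actually raised.  Here is a minimal Python sketch of that idea (not either of the workbook solutions); the function name missing_invoices is hypothetical and invoice numbers are assumed to be whole numbers.

```python
def missing_invoices(first, last, raised):
    """List every invoice number from first to last (inclusive) that was never raised."""
    raised_set = set(raised)  # set membership keeps each check fast
    return [number for number in range(first, last + 1) if number not in raised_set]


# Using the numbers from the example above, with 508 as an assumed ending number
print(missing_invoices(501, 508, [501, 502, 503, 507]))
```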

LOOKUP unique data from multiple columns where search string appears multiple times

In a list with multiple occurrences, the VLOOKUP() function in Excel will only return the first occurrence of the string being searched.  Furthermore, the VLOOKUP() function returns data from one column only.

To work with a list where there are multiple occurrences of a particular string and data has to be extracted from multiple columns, you may refer to my solution in this workbook.

To work with a list where there are multiple occurrences of a particular string and data has to be extracted from one column only, you may refer to my solution at this link.
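
For comparison, the behaviour being asked for (every unique match, from several columns) looks like this in a minimal Python sketch.  It is an illustration only, not the workbook solution: rows are assumed to be tuples, columns are referred to by zero-based position, and the function name lookup_all and the sample data are hypothetical.

```python
def lookup_all(rows, key, key_col, return_cols):
    """Return the unique combinations of return_cols for every row whose key_col equals key."""
    found = []
    for row in rows:
        if row[key_col] == key:
            picked = tuple(row[col] for col in return_cols)
            if picked not in found:  # keep first-seen order, drop repeats
                found.append(picked)
    return found


# Hypothetical list with repeated search strings
rows = [
    ("Apple", "Red", 10),
    ("Banana", "Yellow", 5),
    ("Apple", "Green", 7),
    ("Apple", "Red", 10),
]
print(lookup_all(rows, "Apple", 0, (1, 2)))
```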

Sum data from a particular cell of last n sheets only

A number is entered in cell A1 of every sheet in a multi sheet workbook.  The task is to show, in cell C2 of the Summary sheet, the sum of the numbers in cell A1 of the last x sheets, where x is the number of sheets (entered in cell B2 of the Summary sheet) from which numbers have to be summed up.  So if the number entered in cell B2 of the Summary sheet is 3, then sum the numbers from cell A1 of the last three sheets.  Worksheets can get added, removed or renamed.

You may refer to my solution in this workbook.
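
The intended calculation can be sketched in Python as below.  This is only an illustration of the logic, not the workbook formula: it assumes the A1 values have already been collected into a list of (sheet name, value) pairs in tab order with the Summary sheet left out, and the function name sum_last_sheets is hypothetical.

```python
def sum_last_sheets(sheet_values, count):
    """Sum the A1 values of the last count sheets.

    sheet_values is a list of (sheet_name, a1_value) pairs in workbook tab order.
    """
    if count <= 0:
        return 0  # guard: a slice starting at -0 would otherwise take every sheet
    return sum(value for _, value in sheet_values[-count:])


# Hypothetical workbook: sheet names play no part, so renaming a sheet changes nothing
sheets = [("Jan", 10), ("Feb", 20), ("Mar", 30), ("Apr", 40)]
print(sum_last_sheets(sheets, 3))
```

Because only the position of each sheet is used, adding or removing sheets simply changes which entries fall within the last x.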