# Category: DATA SUMMARISING

Imagine a dataset as shown below

 Status Text Number Active bat 1 Passive erq 2 Passive cat 3 Active enm 4 Active dog 5

Suppose the data is in range A3:C7 (headings are in A2:C2).  The objective is to add numbers where the following conditions are met:

1. Status should be Active; and
2. Text should contain either b or c or d

So the result for the dataset above should be 6 i.e. 1+5.  Sounding like a simple case of summarising based on multiple conditions, one may attempt using the SUMIFS() function here.  However, a single SUMIFS() function will not work because a SUMIFS() can only accept multiple AND conditions (not OR conditions).  The only way to make the SUMIFS() function work is to use 3 of them and then add them up.

So this SUMIFS() function will work

=SUMIFS(\$C\$3:\$C\$7,\$A\$3:\$A\$7,A11,\$B\$3:\$B\$7,B11)+SUMIFS(\$C\$3:\$C\$7,\$A\$3:\$A\$7,A12,\$B\$3:\$B\$7,B12)+SUMIFS(\$C\$3:\$C\$7,\$A\$3:\$A\$7,A13,\$B\$3:\$B\$7,B13)

A11:A13 has Active and B11:B13 has *a*,*b*,*c*.

While this looks like a solution, the problem is that there will have to be one SUMIFS() function for each OR condition.  So the more the number of OR conditions, the more the SUMIFS() functions and the lengthier the formula.

In this workbook, I have shared two alternative solution.  One uses the DSUM() function and other uses a combination of MMULT(), TRANSPOSE() and SUMPRODUCT() functions.

A tournament has 18 participating teams with 25 players in each team.  Each team has to play five rounds of the Tournament and not all players play all rounds.  Scores earned by each player in each round are shown in individual cells.  If a player does not play a round, that cell is left empty.

The task is to sum the highest 18 scoring players for each round.  Only the highest 18 players per team count towards the teams score.  If few of the players have the same score at position 18 then only one of them should be included in the overall score.

One solution is to sort each round of scores for each team in descending order and sum the highest 18 values.  This is obviously a time consuming process.

There could be two others ways to solve this problem

Formula driven solution - This uses a spare column, a lengthy formula and the Data > Table functionality.  Since Data > Table is a series of array formula, this solution makes the workbook very sluggish.  You may refer to my solution in this workbook.

PowerPivot solution - This solution is far better than the formula driven one in as much as no spare columns, lengthy formulas or Data > Tables have been used.  The solution in this workbook adds the scores of the highest 18 scoring players per round (If few of the players have the same score at position 18 then all scores are included in the overall score).  To use this PowerPivot solution, you need to be using the PowerPivot add-in for MS Excel.  This add-in is only available for Excel 2010 and higher versions.

Assume date wise revenue data has been shown on two sheets - Sports wear and Fitness equipment (Data worksheets).  On each of these two worksheets, there are two columns - Date and Revenue.  On the Summary worksheet, the user would like to summarise data (Count instances and sum revenue figures) based on multiple conditions.  The conditions for summarising are:

1. User specifies upper and lower revenue figures - Revenue figures from both sheets which satisfy the conditions (two conditions) should get added as well as rows should get counted (to know instances)
2. User specifies upper and lower date limits - Revenue figures from both sheets which satisfy the conditions (two conditions) should get added as well as rows should get counted (to know instances)
3. User specifies upper and lower revenue figures AND User specifies upper and lower date limits - Revenue figures from both sheets lying between these figures and dates should get added as well as rows should get counted (to know instances)

The restrictions are as follows:

1. Do not want to consolidate data from two data worksheets into a single worksheet
2. Do not want to perform any ancillary calculations on the two data worksheets

If these conditions are relaxed, then alternative solution such as Pivot Tables and SUMPRODUCT() function can be used.

You may refer to my solution in this workbook.

I have solved a related problem of summarising data from multiple worksheets here.

Assume there is data for complaints received and complaints resolved for two years - 2009 and 2010 for some regions regions.  Each Region has its own worksheet.  There are three headings on each sheet - Particulars, 2009 and 2010.  While the Particulars column has the text "Complaints received" and "Complaints resolved", the year columns have number of complaints received and resolved.

The task is to create a Summary sheet which:

1. Show the consolidated figure of complaints received and complaints resolved for both years
2. Show the individual regions which make up the consolidated figure in 1 above.

In this workbook, you may see the data layout and four different ways of resolving the two tasks mentioned above.

You may also refer to the following related posts:

Data on the sheet named "Sales Data Master" shows the Territory wise, Show Title wise and Sales man wise forecast and actual data.  One may want to summarise data based on an unknown combination of conditions.  Conditions may be entered as:

1. All three conditions specified
2. No condition specified
3. Only first condition specified
4. Only second condition specified
5. Only third condition specified
6. First and second condition specified
7. First and third condition specified
8. Second and third condition specified

You may refer to two solution workbooks here

1. Faster because data sheet formulas have been Pasted Special as Values - Download

A number is entered in cell A1 of every sheet in a multi sheet workbook.  The task is to sum numbers in cell C2 of the Summary sheet from cell A1 of last x sheets - x is the number of sheets (mentioned in cell B2 of the Summary sheet) from which numbers have to be summed up.  So if the number entered in cell B2 of sheet is 3, then sum numbers from cell A1 of last three sheets.  Worksheets can get added, removed or renamed.

You may refer to my solution in this workbook.

In each of the four worksheets (one for each week) of this workbook, there is a two column database showing Employee ID and Revenue earned by that employee in that week.  Not all employees appear on all sheets and there is no fixed cell reference for any employee on any sheet - therefore employee A001 can be on cell B4 of week 1 sheet but on cell B10 on week 2 sheet.

In the Summary, all unique Employee ID's have been listed in range B4:B17.  The question is "How can the employee wise monthly revenue be computed in range C4:C17".

I can think of two ways of solving this problem.

Method 1

Create a Pivot Table from four worksheets (without first copying data from all four worksheets into one).  To create a Pivot Table from multiple worksheets, you may refer to my solution at this link.

Method 2

Write a multi sheet array formula.

Assume a three column table arranged as follows: Circle, Date of Fault and ID.  Dates in the date range span one week - November 26, 2012 to December 2, 2012.  A particular equipment can be only one specific Region and the same equipment an go faulty multiple times within one week.  Data for one week is about 8,400 rows.

There are three questions to be answered from this data:

1. The Circle wise, count of ID's which went faulty more than twice between November 26, 2012 and December 2, 2012; and
2. The Circle wise, count of faulty instances more than twice between November 26, 2012 and December 2, 2012; and
3. Determine individual sites for 1 and 2 above

The difference between 1 and 2 above is "If a certain ID goes down 4 times, then for question1, the answer should be 1.  For question2, the answer should be 4."

The first question basically boils down to "Count of unique ID's by Circle which went faulty more than twice."

There are two ways one can go about answering the questions above:

Solution A - For Excel 2010 and higher versions - This solution is for those using the PowerPivot MS Excel add-in for Excel 2010 and higher versions.

Solution B - For all versions of MS Excel - This solution will work in all versions of MS Excel but for those using Excel 2010 and higher versions, the PowerPivot solution would be far more efficient.

The steps for creating a pivot table under Solution B for answering both questions above are:

1.  Count of downtime sites.xlsx is saved in a folder on the desktop;
2. Open the workbook, select the data on the Base_Data sheet (including the first row as the header row – it will be range A1:C8741.  Ensure that the header row has some distinctive formatting such as Bold or some colour) and press Ctrl+F3 > New.  In the Name box, type Dummy and click on OK > Close.
3. To cross check that the name assigned above has indeed been assigned correctly, select the data range once again and in the Name box (left of the formula bar), Dummy should appear.
4. Select range A1:C8741 of the Base_Data sheet again and press Ctrl+T to convert this range into a Table.  Ensure that the “My Table has headers” box is checked.  Save the workbook.
5. Open a new worksheet and go to Data > From Other Sources > From Microsoft Query
6. Under Databases, select Excel files > OK
7. In the Directories dialog box, navigate to the folder on the desktop where the workbook file is saved.  So for me, it is saved under C:\Users\Ashish\Desktop\ and double click on the folder where the workbook is saved.
8. In the left hand side window, select the Count of downtime sites.xlsx file and click on OK
9. With Dummy selected, click on the > symbol to bring over all columns of this named range to the right hand side box 10. Click on Next three times
11. Select the option of View Data or Edit Query in Microsoft Query
12. Click on the SQL button, delete the contents in the white space there and paste the following SQL Query

SELECT ucase(dummy.Circle) AS 'Circle', ucase(dummy.Indus_Site_ID) AS 'Indus_site_ID', Count(dummy.Indus_Site_ID) AS 'fault_frequency'
FROM `C:\Users\Ashish\Desktop\Count of downtime sites.xlsx`.dummy dummy
GROUP BY ucase(dummy.Circle), ucase(dummy.Indus_Site_ID)
HAVING (Count(dummy.Indus_Site_ID)>2)

13. Click on OK and on the message box which appears, click on OK
14. Under File, select the last option – Return Data to Microsoft Excel
15. At this stage, if you wish to get data in a tabular form, then select Table.  If you directly want a pivot table, select the second option button – Pivot Table.  For this example, select Pivot Table and in the cell reference box, select any cell where you would like to the result to appear, say cell A1.  Click on OK
16. A counter will run at the bottom left hand side with the title of Reading Data
17. Drag Circle and ID to the to the Row Labels
18. Drag Fault Frequency to the Value Area twice
19. Right click on any one number in the fault frequency column and under Summarise Value by, select Count
20. Right click on any value in the ID column and under Expand/Collapse, select Collapse Entire Field.

Assume a worksheet which has scores (1-10) on four future dates on nine questions.  Each row represents responses for one question.  Likewise there are 20 respondents i.e. 20 worksheets which have the same structure.

One may want to summarise data from all 20 worksheets into one worksheet.  The question and solution are more clearly explained in the attached file.

I have presented two cases here - Example 1 and Example 2 in this workbook.

You may also want to refer to the following related posts:

In range G13:G17, assume the following entries - VG:25, NS:50, NS:20, #DIV/0! and VG:150.  To sum up the values where VG appears, you may refer to my solution in this workbook.