Tags: IF

Compute hours spent on projects given resource allocation

{2 Comments}

In the dataset below column A has the Employee Name, column B and C are the assignment start and end dates, Column D is the location and columns E to J are the Month-Year columns.  So each row represents data for an employee on a particular project.  The numbers in range E2:J8 represent how much that particular employee is aligned to the particular project i.e. a value of 1 means that the employee is dedicated solely to that project, 1.4 means that the employee will be spending extra hours on that project and 0.1 indicates that the employee will be working on multiple other projects.

The objective is to create another column (column K in the second screenshot) which will show the number of hours the employee will spend on the project.  The number of hours will be computed as number of working days in a month (treat Saturday and Sunday as weekends) * time allocation to that project (the numbers in range E2:J8) * 8.5 hours per day for an Offshore project and 8 hours per day for other projects.

The raw data sheet looks like this

The expected result is

The figure in cell K3 has been computed as:

  • Number of working days between November 11, 2018 and November 30, 2018 are 15.  So 15 * 1 = 15
  • Number of working days between December 1, 2018 and December 12, 2018 are 8.  So 8 * 0.5 = 4
  • Total effective working days are 15 + 4 = 19
  • Since it is an Offshore project, the hours per day would be 8.5.  Therefore total effective hours: 19 * 8.5 = 161.5

I have solved this problem using 3 methods:

  1. Excel formulas - Refer worksheet named "Formula output"
  2. Power Query and PowerPivot - Refer worksheet named "Power Pivot output"
  3. Power Query only - Refer worksheet named "Power Query output"

You may download my solution workbook from here.

Compute Relative Size Factor per vendor

{0 Comments}

Relative size factor (RSF) is a test to identify anomalies where the largest amount for subsets in a given key is outside the norm for those subsets. This test compares the top two amounts for each subset and calculates the RSF for each. In order to identify potential fraudulent activities in invoice payment data, one utilizes the largest and the second-largest amounts to calculate a ratio based on purchases that are grouped by vendors.  You may read more on this topic here.

Here is a 3 column dataset.  The first column is Vendor Number, the second is Invoice number and last is invoice amount.  There can be multiple invoices per vendor.  The objective is to determine the highest invoice value for a vendor and divide that by the second highest invoice value for that same vendor to get a ratio.  The same needs to be done for all vendors.  An interesting case in the dataset below is Vendor_No V4439 - there are 2 instances of highest value for this vendor i.e. 25,378.30 and another 2 instances of second highest value i.e. 24,068.25.  The RSF for this case will be 25,378.30/24,068.25.  If there is no instance of second highest value for a vendor, then the result should be 0.

The expected result is:

I have solved this question with the help of the PowerPivot.  You may download my solution workbook from here.

Analyse free flowing text data or user entered remarks from multiple perspectives

{0 Comments}

Here is a 2 column dataset - UserID in column A and Remarks in Column B.  This dataset basically tabulates the remarks/comments shared by different users.  Entries in the Remarks column are basically free flowing text entries which have the following inconsistencies/nuances:

  1. Users reported multiple errors which are separated by comma, Alt+Enter (same line within the cell) and numbered bullets
  2. Users committed spelling mistakes (see arrows in Table1)
  3. A user ID may be repeated in column A

Given this dataset, one may want to "hunt" for specific "keyword Groups" (column E above) in each user remark cell and get meaningful insights.  Some questions which one would like to have answers to are:

  1. How may users reported each type of keyword Group - "How may users used the Unresponsive keyword?".  See Pivot Table1 below.
  2. Which are the keyword Groups that each user reported - "Which are the different keyword groups reported by UserID A004?".  See Pivot Table2 below.
  3. How many users reported each of the different keyword Groups - "How many users reported all 3 problems of Slow, unresponsiveness and crash".  See Pivot Table 3 below.
  4. How may users who used this keyword group also used this keyword group - "How many users who reported Crash also reported Unresponsive?".  See Pivot Table 4 below.

This was quite a formidable challenge to solve because of spelling mistakes and multiple keywords reported in each cell.  I have solved this problem with the help of Power Query and PowerPivot.  You may download my workbook from here.

Show Project wise status in a Pivot Table

{0 Comments}

Visualise a simple 6 column Table as shown below - Project Name and the finish date for each of the 5 stages that the projects go through.  Each project goes through 5 stages - Requirement (Req), Development (Dev), UAT, Implement and Warranty.

The objective is to report on the status of each project at the end of each month based on which stage is/was completed in that month.  So, if a given project's requirements are completed in January and development completes some time in March, the one would expect the output of the report to show the project's status in January and February as "Req" and in March as "Dev" respectively.  February should also show "Req" because the next stage was completed only in March (although it may have started in January).  If multiple stages complete in one month, then the report should display only the most recently completed stage.  So, if Project A completed both Requirements and Development stages in January, the report should show only "Dev" as the stage completed in January.

For the data shared above, the expected result is:

You may download my solution workbook from here.

Determine the lowest bidding vendor(s) for each product in a Pivot Table

{0 Comments}

Imagine a dataset like this.  This dataset shows vendors that submitted proposals for supplying various parts to a Company.  There is one column for each of the twelve months.

untitled

Via a simple Pivot Table, one can determine the lowest bidding vendor per product (part) for any chosen month.  However, one may also want to know the names of those vendors for each product (as seen in column G below).  Notice, that Vendor 2 and Vendor 3 submitted the lowest bid for Product 1 and therefore both names should appear in the result.

untitled

I have solved this problem using PowerPivot and Power Query a.k.a. Data > Get & Transform in Excel 2016.  You may download my solution workbook from here.

Show sales only for corresponding months in prior years

{0 Comments}

Refer to this simple Sales dataset

untitled

The objective is to create a simple matrix with months in the row labels, years in the column labels and sales figures in the value area section.  The twist in the question is that for years prior to the current year (2018 in this dataset), sales should only appear till the month for which there is data for the current year.  For e.g., for 2018, data is only till Month 4 and therefore for prior years as well, data should only appear till Month 4.  As and when Sales data gets added below row 17, data for prior years should also go up to that month.

The expected result is

untitled1

You may download my PBI file from here. The same solution can be obtained in Excel as well (using Power Query and PowerPivot).

Filtering on 2 date fields within one Table

{0 Comments}

This table contains a list of all the inspections created and completed within different time periods.

untitled

The objective is to create two Tables from this single table - one showing the Accounts created within the chosen time period and another showing the those that were closed within the same time period.  Here are screenshots of the expected results.

untitled1 untitled2 untitled3 untitled4

You may download my PowerBI desktop solution workbook from here.  The same solution can be obtained in Excel as well (using Power Query and PowerPivot).

Determine the most recent status after satisfying certain conditions

{0 Comments}

Assume a three column dataset with Patient ID, Smoking Status and Review Date

PatientID SmokingStatus ReviewDate
P1 10-03-2018
P1 9 09-03-2018
P1 1 08-03-2018
P1 4 07-03-2018
P2 9 10-03-2018
P2 9 09-03-2018
P2 9 08-03-2018
P2 9 07-03-2018
P3 2 10-03-2018
P3 09-03-2018
P3 9 08-03-2018
P4 9 10-03-2018
P4 1 09-03-2018
P4 4 08-03-2018

The objective is the create another 3 column dataset with the following conditions:

  1. If the patient's latest smoking status is other than Blank or 9, then consider that as the smoking status of the patient; and
  2. If the patient's latest smoking status is blank or 9, then consider the previous smoking status that is not blank or 9; and
  3. If the patient's smoking status is blank or 9 on all dates, then consider the smoking status as 9

The expected result is:

PatientID Last date when the smoking status was other than 9 or Blank Smoking status on that date
P1 08-Mar-18 1
P2 10-Mar-18 9
P3 10-Mar-18 2
P4 09-Mar-18 1

I have solved this question using 3 methods - PowerPivot, Advanced Filters and formulas.  You may download my solution workbook from here.

Combine unique entries from a range of cells after satisfying a condition

{2 Comments}

Here is a simple three column dataset showing Item ID, Catalogue Name and Catalogue Year

Item ID Catalogue Name Catalogue Year
A Spring 2012
A Spring 2016
A Spring 2017
B Fall 2014
B Fall 2014
B Fall 2016

The second and third last rows are clear duplicate rows. The task is to create a fourth column (titled as All Year) in this dataset with concatenated entries from the Catalogue Year column for each Item ID. So the expected result is:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2016
B Fall 2014 2014, 2016
B Fall 2016 2014, 2016

Notice that for Item ID B, the result in every cell of the fourth column is 2014, 2016 and not 2014,2014,2016.  So while joining (concatenating in technical MS Excel terminology) entries from the "Catalogues Year" column in the "All Year" column, only unique entries for that Item ID have to be considered.

Excel 2016 has a TEXTJOIN() function which gets use close to the final result but does not take into account the unique entries only (it takes all entries into account all entries).  On using the TEXTJOIN() array formula in a spare column, here is the result i got:

Item ID Catalogue Name Catalogue Year All Year
A Spring 2012 2012, 2016, 2017
A Spring 2016 2012, 2016, 2017
A Spring 2017 2012, 2016, 2017
B Fall 2014 2014, 2014, 2016
B Fall 2014 2014, 2014, 2016
B Fall 2016 2014, 2014, 2016

Notice that for Item ID B, the result is 2014, 2014, 2017 and not 2014, 2017.  So the TEXTJOIN() function does not work.

I have solved this problem using Power Query (Get and Transform in Excel 2016).  You may download my solution workbook from here.

Prepare an invigilation schedule for each teacher by different time periods

{5 Comments}

Imagine a multi column exam invigilation schedule with the following information

  1. S. No.
  2. Name of staff
  3. Designation of staff member
  4. Two columns for each day on which there is an exam - one for Morning and another for Afternoon
  5. A * under each column if that particular staff member has to be an invigilator during that time period
S. No. Name of staff Designation 09/11/2017-Morning 09/11/2017-Afternoon 10/11/2017-Morning 10/11/2017-Afternoon
1 Tom Lecturer * * *
2 Sam Lecturer * *

This dataset stretches into many more columns.  So if the exams last for 15 days, there will be 30 columns.  The objective is to condense the column expanding dataset into a 5 column one - S. No., Name of Staff, Designation, Morning and Evening.  Under the Morning and Evening columns, the different dates have to be separated by commas in that one single cell.  The expected result is:

Name of staff S. No. Designation Morning Afternoon
Tom 1 Lecturer 09/11/2017, 10/11/2017, 11/11/2017, 13/11/2017, 14/11/2017, 16/11/2017, 17/11/2017 10/11/2017, 13/11/2017, 14/11/2017, 15/11/2017, 16/11/2017, 17/11/2017
Sam 2 Lecturer 09/11/2017, 13/11/2017, 15/11/2017, 16/11/2017 09/11/2017

I can think of 3 ways to solve this problem. Here's a brief about each of those methods:

  1. Using Power Query and Excel's functions - Using Power Query, one can first unpivot the data and then use the TEXTJOIN() function.  This would be an array formula.
  2. Using Power Query only - Using Power Query, one can first unpivot the data and then modify the code to concatenate all dates in a single cell for that invigilator.  One can then Pivot the data in Power Query itself
  3. Using Power Query and PowerPivot - Using Power Query, one can first unpivot the data and then use the CONCATENATEX() function of the PowerPivot.

As you can observe, the common thread in all solutions is that one has to first unpivot the dataset.

You may refer to all my 3 solutions in this workbook.