Tabulating data from multiple unstructured Excel files

Data downloaded from applications and ERPs is often not in a filter- or Pivot-ready format. In such cases, a lot of time must first be invested in getting that data into proper order before analysis can even begin. What makes this situation worse is that data is downloaded every month in that unstructured […]

Calculate rolling sum for the past week by ignoring blank cells

Assume a simple dataset as shown in the image below. The input data is in columns A and B only; the desired outcome is in columns C and D. The objective is to calculate the 7-day rolling sum and average (as shown in columns C and D), ignoring blank cells. So in cell C8, […]

Append data from multiple worksheets of multiple workbooks where each worksheet has a different heading

A folder contains multiple workbooks, each with an unknown number of worksheets. Each worksheet has data for one year and has 13 columns: the first is for the Product and the other 12 are for each month of the year. So sheet1 of Book1 has Product in column 1, 1 Jan […]

Summarise data by most recent status

Here’s a simple 3 column dataset showing Date, ID and Status: the status of each ID by Date. So, the narrative for ID A is: It was “New” on Jan 1. It remained “New” until Jan 14. On Jan 15, the status changed to “Open”. It remained “Open” till Jan 31 and the status […]
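
As a rough illustration of the idea in the title, here is a minimal Python sketch (not the post's own method, which is not shown in this excerpt). It keeps the latest status seen for each ID and then counts IDs per status. The function name `latest_status`, the year 2024 and the row for ID B are assumptions made up for the example; only ID A's "New" on Jan 1 and "Open" on Jan 15 come from the text above.

```python
from collections import Counter
from datetime import date


def latest_status(rows):
    """Return {id: status} using the row with the latest date for each ID.

    rows is an iterable of (date, id, status) tuples in any order.
    """
    latest = {}
    for day, item_id, status in rows:
        # Keep this row if the ID is new or the row is at least as recent.
        if item_id not in latest or day >= latest[item_id][0]:
            latest[item_id] = (day, status)
    return {item_id: status for item_id, (_, status) in latest.items()}


# Hypothetical sample rows; the year and ID "B" are invented for illustration.
rows = [
    (date(2024, 1, 1), "A", "New"),
    (date(2024, 1, 15), "A", "Open"),
    (date(2024, 1, 10), "B", "New"),
]

status_by_id = latest_status(rows)
# Summary: how many IDs currently sit in each status.
summary = Counter(status_by_id.values())
```

Keeping the date alongside the status means the input does not have to be sorted first.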

Segment customers into dynamic buckets

Consider a 4 column table: Respondent ID, Device ID, App Name and Category. This dataset shows which apps are installed on which device ID by which user, and which category each app falls into. It is a small dataset with only 4 columns and 2,000 rows. The question on this dataset is […]

Compute hours spent on projects given resource allocation

In the dataset below, column A has the Employee Name, columns B and C are the assignment start and end dates, column D is the location and columns E to J are the Month-Year columns. So each row represents data for an employee on a particular project. The numbers in range E2:J8 represent how much […]

Customer analysis by Country and time period

Here is a Sales dataset of 8 columns and 29 rows. It details the revenue earned and cash collected by service type, Customer, Country and Period. For a selected Country and time period, there could be customers using both services or only one of them. There are 2 broad questions that one may […]

Compute Relative Size Factor per vendor

Relative size factor (RSF) is a test to identify anomalies where the largest amount for subsets in a given key is outside the norm for those subsets. This test compares the top two amounts for each subset and calculates the RSF for each. To identify potentially fraudulent activities in invoice payment data, one […]
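
To make the definition concrete, here is a minimal Python sketch, assuming RSF is taken as the largest amount divided by the second largest amount within each vendor (one common reading of "compares the top two amounts"; the post's own formula is not shown in this excerpt). The function name `relative_size_factors` and the sample payments are hypothetical.

```python
from collections import defaultdict


def relative_size_factors(payments):
    """Return {vendor: largest amount / second largest amount}.

    payments is an iterable of (vendor, amount) pairs. Vendors with fewer
    than two amounts, or a non-positive second largest amount, are skipped
    because the ratio is undefined for them.
    """
    by_vendor = defaultdict(list)
    for vendor, amount in payments:
        by_vendor[vendor].append(amount)

    rsf = {}
    for vendor, amounts in by_vendor.items():
        if len(amounts) < 2:
            continue
        top, second = sorted(amounts, reverse=True)[:2]
        if second > 0:
            rsf[vendor] = top / second
    return rsf


# Hypothetical invoice payments, invented for illustration.
sample = [
    ("V1", 100.0), ("V1", 120.0), ("V1", 1200.0),
    ("V2", 50.0), ("V2", 55.0),
    ("V3", 10.0),
]
rsf = relative_size_factors(sample)
```

In this sample, V1's largest payment is ten times its second largest, so it would stand out for review, while V2's ratio stays close to 1.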

Analyse free flowing text data or user entered remarks from multiple perspectives

Here is a 2 column dataset: UserID in column A and Remarks in column B. This dataset tabulates the remarks/comments shared by different users. Entries in the Remarks column are free-flowing text entries which have the following inconsistencies/nuances: Users reported multiple errors which are separated by comma, Alt+Enter (same line within […]

Determine the top selling location for each product

Visualise a 3 column dataset as shown below: Location, Product and Sales. Each location can have multiple products (Location A has Banana, Apple and Carrot) and each product can be sold in multiple locations (Banana is sold in locations A, B and F). The objective is to determine the location with the highest sales for […]
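
The objective can be sketched in a few lines of Python (an illustration only, not the post's own solution): total the sales per product and location, then keep the location with the largest total for each product. The function name `top_location_per_product` and all sales figures are hypothetical; only the product and location names come from the text above.

```python
from collections import defaultdict


def top_location_per_product(rows):
    """Return {product: (location, total_sales)} for the best location.

    rows is an iterable of (location, product, sales) tuples. Sales for the
    same product and location are summed before comparing. On a tie, the
    location encountered first is kept.
    """
    totals = defaultdict(float)
    for location, product, sales in rows:
        totals[(product, location)] += sales

    best = {}
    for (product, location), total in totals.items():
        if product not in best or total > best[product][1]:
            best[product] = (location, total)
    return best


# Hypothetical sales figures, invented for illustration.
rows = [
    ("A", "Banana", 10), ("B", "Banana", 25), ("F", "Banana", 5),
    ("A", "Apple", 7), ("A", "Carrot", 3),
]
best = top_location_per_product(rows)
```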
