# Tags: USERELATIONSHIP

Here's a simple sales dataset for a retail store that sells Apple products. Since a customer can transact multiple times, there will be repetitions in the Cust ID column. While Cust ID 123 and 782 purchased multiple products from the same store in one transaction, Cust ID 53 purchased multiple products from different stores (Store ID 165 and 45), and therefore the Order IDs are also different (Order ID 2 and 6).

On this small sample, one may want to identify cross-selling and up-selling opportunities, i.e. which customers can be approached to sell more products to. For example, one may want to know which customers have bought only one product so far. A case in point is the Apple Watch: Customer ID 2442 and 428 bought only this product. The other customers who bought the Apple Watch also bought at least one more product. Therefore, Cust ID 2442 and 428 could be approached to buy other products as well.
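To make the "bought only one product" rule concrete, here is a minimal Python sketch of the logic. It is not the Power BI solution itself. The rows in `sales` are hypothetical, modeled on the customers described above, and the function name `single_product_customers` is my own.

```python
from collections import defaultdict

# Hypothetical (cust_id, product) rows modeled on the description above;
# the real sales table is not reproduced here.
sales = [
    (53, "APPLE WATCH"), (53, "AIRPORT"), (53, "IPHONE 8S"),
    (123, "APPLE WATCH"), (123, "IPHONE X"), (123, "IPOD"),
    (2442, "APPLE WATCH"),
    (428, "APPLE WATCH"),
]

def single_product_customers(rows):
    """Map each customer who bought exactly one distinct product to that product."""
    products_by_customer = defaultdict(set)
    for cust_id, product in rows:
        products_by_customer[cust_id].add(product)
    return {
        cust_id: next(iter(products))
        for cust_id, products in products_by_customer.items()
        if len(products) == 1
    }

print(single_product_customers(sales))
```

Using a set per customer means repeat purchases of the same product still count as one product.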

Solving this with conventional Excel formulas and Pivot Tables would prove to be a formidable challenge. I have solved this problem using Power BI Desktop, a data visualisation tool from Microsoft (it can be downloaded free from the Microsoft website). This problem can also be solved in MS Excel using Power Query and Power Pivot.

There are three sections in the image below: a table at the top (First Table), slicers on the right and another table at the bottom (Second Table).

Interpretation of First Table

1. A 1 at the intersection of APPLE TV (row labels) and APPLE TV (column labels) means that 1 customer bought the APPLE TV.
   1. A 1 at the intersection of APPLE TV (row labels) and MACBOOK AIR (column labels) means that the 1 customer who bought the APPLE TV also bought the MACBOOK AIR.
2. A 4 at the intersection of APPLE WATCH (row labels) and APPLE WATCH (column labels) means that 4 customers bought the APPLE WATCH.
   1. A 1 in other columns of the same row represents the other products which those customers bought.
   2. When one right-clicks on APPLE WATCH and selects "Drill down", one can see which customers bought the other products as well.
      1. Customer 53 bought the APPLE WATCH, AIRPORT and IPHONE 8S. Customer 123 bought the APPLE WATCH, IPHONE X and IPOD.
      2. Customers 2442 and 428 did not buy any other product.
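
The way one row of the First Table is read can be sketched in Python as follows. This is only an illustration of the counting logic, not the Power BI measure. The `sales` rows are hypothetical and the helper `co_purchase_row` is a name I made up for this example.

```python
from collections import defaultdict

# Hypothetical (cust_id, product) rows consistent with the interpretation above.
sales = [
    (53, "APPLE WATCH"), (53, "AIRPORT"), (53, "IPHONE 8S"),
    (123, "APPLE WATCH"), (123, "IPHONE X"), (123, "IPOD"),
    (2442, "APPLE WATCH"), (428, "APPLE WATCH"),
]

def co_purchase_row(rows, row_product):
    """For one row label, return {column product: set of customers who bought both}."""
    baskets = defaultdict(set)
    for cust_id, product in rows:
        baskets[cust_id].add(product)
    row = defaultdict(set)
    for cust_id, products in baskets.items():
        if row_product in products:
            for product in products:
                row[product].add(cust_id)
    return dict(row)

watch_row = co_purchase_row(sales, "APPLE WATCH")
counts = {product: len(customers) for product, customers in watch_row.items()}
print(counts["APPLE WATCH"])  # diagonal cell: customers who bought the Apple Watch
print(watch_row["IPOD"])      # drill-down: customers who bought both products
```

Keeping the set of customers per cell, rather than only the count, is what makes the drill-down possible.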

Interpretation of Second Table

This table lists the customers (and their transaction details) who bought only the one product selected by the user in the filter section (see the red oval selection in the image). These two customers could therefore be approached to sell more products to.

Affinity analysis encompasses a broad set of analytic techniques aimed at uncovering the associations and connections between specific objects: these might be visitors to a website (customers or audience), products in a store or content items on a media site. Of these, “market basket analysis” is perhaps the most common example. In a market basket analysis, one analyses combinations of products that frequently co-occur in transactions. For example: of all the shoppers today, how many purchased both a school uniform and a school bag? A retailer can use this information to:

1. Improve the customer shopping experience by rearranging the store layout (place products that co-occur together close to one another); and
2. Run a focused marketing campaign (e.g. target customers who buy a school uniform with offers on school bags, to encourage them to spend more on their shopping basket)

Online retailers and publishers can use this type of analysis to:

1. Inform the placement of content items on their media sites, or products in their catalogue
2. Drive recommendation engines (like Amazon’s customers who bought this product also bought these products…)
3. Deliver targeted marketing (e.g. emailing customers who bought specific products with other products, and offers on those products, that are likely to interest them)

Consider a dataset with four columns - Date, Order Numbers, Items sold and Item Description.  To simplify, let's ignore columns 1 and 4 for now.  One may want to know the answer to the following question:

For a particular item sold (say Item X), in how many orders (order frequency) were Item X and Item Y sold together, or in how many orders were Item X and Item Z sold together?

In the Excel workbook (download link below), the number of rows in the Sales data sheet is 265,321 (file size is 11 MB) and the number of unique items sold is 23,201. In Excel, one has to pair each of the 23,201 items with the other 23,200 items to know whether there is any association between them. The sheer size of that dataset would make any formula unusable here.
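
To illustrate the question, and why brute-force pairing is impractical, here is a minimal Python sketch. It counts only the item pairs that actually appear together in an order, rather than testing every possible pair. It is not the PowerPivot solution. The rows and the function name `pair_order_counts` are hypothetical, and X, Y and Z are placeholders.

```python
from collections import Counter, defaultdict
from itertools import combinations

def pair_order_counts(rows):
    """Count, for each unordered item pair, the orders that contain both items."""
    items_by_order = defaultdict(set)
    for order_no, item in rows:
        items_by_order[order_no].add(item)
    counts = Counter()
    for items in items_by_order.values():
        # sorted() gives each pair one canonical key, e.g. ("X", "Y")
        counts.update(combinations(sorted(items), 2))
    return counts

# Hypothetical (order number, item) rows.
rows = [(1, "X"), (1, "Y"), (2, "X"), (2, "Y"), (2, "Z"), (3, "X"), (3, "X")]
counts = pair_order_counts(rows)
print(counts[("X", "Y")], counts[("X", "Z")])

# Number of ordered (item, other item) pairings a brute-force approach would examine.
n_items = 23_201
print(n_items * (n_items - 1))
```

Because each order's items are held in a set, an item that appears twice in the same order is counted once for that order.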

I have solved this problem with the help of PowerPivot. Please note that in the PowerPivot solution worksheet, I have filtered the Row labels on a specific Item number. This is because if I clear the filter criteria to view all Item numbers, I get a message saying that the number of rows exceeds 1 million. Also, since the calculations are very intensive, recalculation takes time when you change the criteria in the filter dropdown of the Pivot Table. Therefore, once you change the filter criteria in the Pivot Table, wait for the Reading data counter (bottom right on the taskbar) to finish processing.

Interpretation

1. Cell C5 - 1,725 orders contained the Item number TNB521711234E. This figure can be verified via the following steps:
   a. On the Sales data worksheet, filter the item number column on TNB521711234E.
   b. Copy the resulting rows of the Order number column to another worksheet.
   c. Select the data so copied and go to Data > Remove Duplicates.
   d. Count the ensuing rows.
2. Cell C6 - Of the 1,725 orders containing Item number TNB521711234E, 540 contained Item number TNB52C1.
3. Cell C7 - Of the 1,725 orders containing Item number TNB521711234E, 446 contained Item number EMT34.
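
The manual verification (filter, remove duplicates, count) amounts to a distinct count of order numbers. A small Python sketch of that idea follows. The rows are hypothetical stand-ins for the Sales data sheet and `orders_containing` is a name I chose for illustration, so the printed counts are for this toy data, not the workbook.

```python
def orders_containing(rows, item):
    """Distinct order numbers that include the given item (filter, then de-duplicate)."""
    return {order_no for order_no, item_no in rows if item_no == item}

# Hypothetical (order number, item number) rows; the real sheet has 265,321 rows.
rows = [
    (1001, "TNB521711234E"), (1001, "TNB52C1"),
    (1002, "TNB521711234E"), (1002, "TNB521711234E"),  # same item twice in one order
    (1003, "EMT34"),
]

base = orders_containing(rows, "TNB521711234E")  # analogue of cell C5
both = base & orders_containing(rows, "TNB52C1")  # analogue of cell C6
print(len(base), len(both))
```

The set intersection plays the role of "of the orders containing the first item, how many also contained the second".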

You may refer to my solution in this workbook.

Assume a dataset with two columns, listing student names in column A and the courses they opted for in column B. Since one student can opt for multiple courses and the same course can be taken up by multiple students, there can be repetitions in both columns. The objective is to create a matrix-like data structure (with courses appearing in both row and column labels) in which each number quantifies, for example, the "Number of students who opted for course A and C". So, for all possible course combinations, one may want to know the number of students who opted for those combinations.

The description above can be extended to cases where buying behavior has to be analysed. A store manager may want to know "How many people who buy Brand A also buy Brand B?"

Here's a snapshot of the source data and expected result

The number 1 in cell H4 (and cell F6) means that there is only one student who opted for courses B and D.  Likewise, 3 in cell H5 (and cell G6) means that 3 students opted for courses C and D.
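
For readers who want to see the matrix logic outside Excel, here is a minimal Python sketch. It is not the Power Query / DAX solution. The student names and enrolments are hypothetical, chosen only so that courses B and D share 1 student and courses C and D share 3, matching the two cells described above.

```python
from collections import defaultdict

# Hypothetical (student, course) rows chosen so that B-D = 1 and C-D = 3.
enrolments = [
    ("Ann", "C"), ("Ann", "D"),
    ("Bob", "C"), ("Bob", "D"),
    ("Cat", "B"), ("Cat", "C"), ("Cat", "D"),
    ("Dan", "A"), ("Dan", "B"),
]

def course_matrix(rows):
    """Return (courses, matrix) where matrix[i][j] counts students taking both courses."""
    courses_by_student = defaultdict(set)
    for student, course in rows:
        courses_by_student[student].add(course)
    courses = sorted({course for _, course in rows})
    index = {course: i for i, course in enumerate(courses)}
    matrix = [[0] * len(courses) for _ in courses]
    for taken in courses_by_student.values():
        for a in taken:
            for b in taken:
                matrix[index[a]][index[b]] += 1
    return courses, matrix

courses, matrix = course_matrix(enrolments)
for course, row in zip(courses, matrix):
    print(course, row)
```

The matrix is symmetric, which is why the same number shows up in two mirrored cells. In this sketch the diagonal holds each course's total number of students, which may differ from how the snapshot treats the diagonal.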

You may refer to my Power Query and Power Pivot solution in this workbook. Power Query has been used to generate a dynamic list of courses, and Power Pivot has been used to write the DAX formula that computes the numbers inside the matrix.