Affinity analysis encompasses a broad set of Analytic techniques aimed at uncovering the associations and connections between specific objects: these might be visitors to a website (customers or audience), products in a store or content items on a media site. Of these, “market basket analysis” is perhaps the most common example. In a market basket analysis, one analyses combinations of products that frequently co-occur in transactions. For e.g., Of all the shoppers today, how many purchased a school uniform and a school bag. A retailer can use this information to:
1. Improve the customer shopping experience by rearranging the store layout (place products that co-occur together close to one another); and
2. Run a focused marketing campaign (e.g. target customers who buy a school uniform with offers on school bags, to encourage them to spend more on their shopping basket)
Online retailers and publishers can use this type of analysis to:
1. Inform the placement of content items on their media sites, or products in their catalogue
2. Drive recommendation engines (like Amazon’s customers who bought this product also bought these products…)
3. Deliver targeted marketing (e.g. emailing customers who bought products specific products with other products and offers on those products that are likely to be interesting to them)
Consider a dataset with four columns - Date, Order Numbers, Items sold and Item Description. To simplify, let's ignore columns 1 and 4 for now. One may want to know the answer to the following question:
For a particular item sold (say Item X), in how many orders (order frequency) were items X and Item Y were sold together OR in how many orders (order frequency) were items X and Item Z together.
In the Excel workbook (download link below), number of rows in the Sales data sheet is 265,321 (file size is 11 MB) and the number of unique items sold is 23,201. In Excel, one has to pair each of the 23,201 items with the other 23,200 items to know if there is any affiliation or not. The sheer size of that dataset would make any formula unusable here.
I have solved this problem with the help of the PowerPivot. Please note that in the PowerPivot solution worksheet, I have filtered the Row labels on a specific Item number. This is so because if I attempt to clear the filter criteria to view all Item numbers, then I get a message saying that the number of rows exceeds 1 million. Also, since the calculations are very intensive, recalculation takes time when you change the criteria in the filter dropdown of the Pivot Table. Therefore, once you change the filter criteria in the Pivot Table, wait for the Reading data counter (bottom right on the taskbar) to finish processing.
1. Cell C5 - 1,725 orders contained the Item number TNB521711234E. This figure can be verified via the following steps:
a. On the Sales data worksheet, filter the item number column on TNB521711234E.
b. Copy the resulting rows of the Order number column to another worksheet
c. Select the data so copied and go to Data > Remove Duplicates
d. Count the ensuing rows
2. Cell C6 - Of the 1725 orders containing Item number TNB521711234E, 540 contained Item number TNB52C1
3. Cell C7 - Of the 1725 orders containing Item number TNB521711234E, 446 contained Item number EMT34
You may refer to my solution in this workbook.