Perform an “Affinity analysis” to identify co-selling products

{ 6 Comments }

Affinity analysis encompasses a broad set of Analytic techniques aimed at uncovering the associations and connections between specific objects: these might be visitors to a website (customers or audience), products in a store or content items on a media site. Of these, “market basket analysis” is perhaps the most common example. In a market basket analysis, one analyses combinations of products that frequently co-occur in transactions.  For e.g., Of all the shoppers today, how many purchased a school uniform and a school bag.  A retailer can use this information to:

1. Improve the customer shopping experience by rearranging the store layout (place products that co-occur together close to one another); and
2. Run a focused marketing campaign (e.g. target customers who buy a school uniform with offers on school bags, to encourage them to spend more on their shopping basket)

Online retailers and publishers can use this type of analysis to:

1. Inform the placement of content items on their media sites, or products in their catalogue
2. Drive recommendation engines (like Amazon’s customers who bought this product also bought these products…)
3. Deliver targeted marketing (e.g. emailing customers who bought products specific products with other products and offers on those products that are likely to be interesting to them)

Consider a dataset with four columns – Date, Order Numbers, Items sold and Item Description.  To simplify, let’s ignore columns 1 and 4 for now.  One may want to know the answer to the following question:

For a particular item sold (say Item X), in how many orders (order frequency) were items X and Item Y were sold together OR in how many orders (order frequency) were items X and Item Z together.

In the Excel workbook (download link below), number of rows in the Sales data sheet is 265,321 (file size is 11 MB) and the number of unique items sold is 23,201.  In Excel, one has to pair each of the 23,201 items with the other 23,200 items to know if there is any affiliation or not.  The sheer size of that dataset would make any formula unusable here.

I have solved this problem with the help of the PowerPivot.  Please note that in the PowerPivot solution worksheet, I have filtered the Row labels on a specific Item number.  This is so because if I attempt to clear the filter criteria to view all Item numbers, then I get a message saying that the number of rows exceeds 1 million.  Also, since the calculations are very intensive, recalculation takes time when you change the criteria in the filter dropdown of the Pivot Table.  Therefore, once you change the filter criteria in the Pivot Table, wait for the Reading data counter (bottom right on the taskbar) to finish processing.

Interpretation

1. Cell C5 – 1,725 orders contained the Item number TNB521711234E.  This figure can be verified via the following steps:
a. On the Sales data worksheet, filter the item number column on TNB521711234E.
b. Copy the resulting rows of the Order number column to another worksheet
c. Select the data so copied and go to Data > Remove Duplicates
d. Count the ensuing rows
2. Cell C6 – Of the 1725 orders containing Item number TNB521711234E, 540 contained Item number TNB52C1
3. Cell C7 – Of the 1725 orders containing Item number TNB521711234E, 446 contained Item number EMT34

You may refer to my solution in this workbook.

Leave a Comment

Your email address will not be published.

*

  • Hello,
    I am working on a very similar problem. So far I set up the Power Pivot in a similar way to you. However, I cannot seem to access the “Calculated field 1” equation. This has stopped my progress. Would you be willing to share your calculation equation with me?
    Thanks!
    – Michael

    • Hi,

      This is the calculated Field 1 equation

      =CALCULATE ([Distinct Orders],CALCULATETABLE(SUMMARIZE(Sales_data,Sales_data[Order Number]),ALL(Items_numbers),USERELATIONSHIP(Sales_data[2nd Item Number],Item_numbers1[2nd Item Number])))

      Distinct Count is computed as

      =DISTINCTCOUNT(Sales_data[Order Number])

      Hope this helps.

  • Can you please explain how I can replace it with my own data? I attempted to add my product sales data to your template but receive an error that ‘Column ‘2nd Item Number’ in Table ‘Item_numbers’ contains blank values and this is not allowed for columns on the one side of a many-to-one relationship or for columns that are used as the primary key of a table’. How do I update the queries and tables to reflect my data? Thank you!

    • Hi,
      Share the version of the file in which you have replaced my data with yours. Ensure that you do not tamper with the headings in my datasets. Let me formulas remain intact.

      • Hi Ashish, I figured out the issue – it was simple user error. My data had less rows and I didn’t refresh the table size which was creating the ‘blank values’ error. It is working as expected now. Thank you!