Here is a 2 column dataset – UserID in column A and Remarks in Column B. This dataset basically tabulates the remarks/comments shared by different users. Entries in the Remarks column are basically free flowing text entries which have the following inconsistencies/nuances:
- Users reported multiple errors which are separated by comma, Alt+Enter (same line within the cell) and numbered bullets
- Users committed spelling mistakes (see arrows in Table1)
- A user ID may be repeated in column A
Given this dataset, one may want to “hunt” for specific “keyword Groups” (column E above) in each user remark cell and get meaningful insights. Some questions which one would like to have answers to are:
- How may users reported each type of keyword Group – “How may users used the Unresponsive keyword?”. See Pivot Table1 below.
- Which are the keyword Groups that each user reported – “Which are the different keyword groups reported by UserID A004?”. See Pivot Table2 below.
- How many users reported each of the different keyword Groups – “How many users reported all 3 problems of Slow, unresponsiveness and crash”. See Pivot Table 3 below.
- How may users who used this keyword group also used this keyword group – “How many users who reported Crash also reported Unresponsive?”. See Pivot Table 4 below.
This was quite a formidable challenge to solve because of spelling mistakes and multiple keywords reported in each cell. I have solved this problem with the help of Power Query and PowerPivot. You may download my workbook from here.