Analyse free flowing text data or user entered remarks from multiple perspectives


Here is a 2 column dataset – UserID in column A and Remarks in Column B.  This dataset basically tabulates the remarks/comments shared by different users.  Entries in the Remarks column are basically free flowing text entries which have the following inconsistencies/nuances:

  1. Users reported multiple errors which are separated by comma, Alt+Enter (same line within the cell) and numbered bullets
  2. Users committed spelling mistakes (see arrows in Table1)
  3. A user ID may be repeated in column A

Given this dataset, one may want to “hunt” for specific “keyword Groups” (column E above) in each user remark cell and get meaningful insights.  Some questions which one would like to have answers to are:

  1. How may users reported each type of keyword Group – “How may users used the Unresponsive keyword?”.  See Pivot Table1 below.
  2. Which are the keyword Groups that each user reported – “Which are the different keyword groups reported by UserID A004?”.  See Pivot Table2 below.
  3. How many users reported each of the different keyword Groups – “How many users reported all 3 problems of Slow, unresponsiveness and crash”.  See Pivot Table 3 below.
  4. How may users who used this keyword group also used this keyword group – “How many users who reported Crash also reported Unresponsive?”.  See Pivot Table 4 below.

This was quite a formidable challenge to solve because of spelling mistakes and multiple keywords reported in each cell.  I have solved this problem with the help of Power Query and PowerPivot.  You may download my workbook from here.

Leave a Comment

Your email address will not be published.