Analyse free flowing text data or user entered remarks from multiple perspectives

{ 0 Comments }

By Ashish Mathur · January 19, 2019 · POWER QUERY + POWERPIVOT · Leave a comment

Here is a 2 column dataset – UserID in column A and Remarks in Column B. This dataset basically tabulates the remarks/comments shared by different users. Entries in the Remarks column are basically free flowing text entries which have the following inconsistencies/nuances:

Users reported multiple errors which are separated by comma, Alt+Enter (same line within the cell) and numbered bullets
Users committed spelling mistakes (see arrows in Table1)
A user ID may be repeated in column A

Given this dataset, one may want to “hunt” for specific “keyword Groups” (column E above) in each user remark cell and get meaningful insights. Some questions which one would like to have answers to are:

How may users reported each type of keyword Group – “How may users used the Unresponsive keyword?”. See Pivot Table1 below.
Which are the keyword Groups that each user reported – “Which are the different keyword groups reported by UserID A004?”. See Pivot Table2 below.
How many users reported each of the different keyword Groups – “How many users reported all 3 problems of Slow, unresponsiveness and crash”. See Pivot Table 3 below.
How may users who used this keyword group also used this keyword group – “How many users who reported Crash also reported Unresponsive?”. See Pivot Table 4 below.

This was quite a formidable challenge to solve because of spelling mistakes and multiple keywords reported in each cell. I have solved this problem with the help of Power Query and PowerPivot. You may download my workbook from here.

Leave a Comment Cancel reply