First, I'm not sure which tool or script is best for what I want to do. I'd guess that either Excel or SQL would do the job. If you need something else, that shouldn't be a problem as long as it's relatively easy to use.
The objective of the project is to sort a list containing many thousands of keywords. Here is exactly what I need:
1. The script will analyze all the keywords in the list and report a table of the TOP 100 words and phrases with the most occurrences. So let's say we have 3 keywords:
a. computer printer
b. computer supplies
c. computer printer supplies
The tool would report the following:
Computer: 3 occurrences
Printer: 2 occurrences
Supplies: 2 occurrences
Computer printer: 2 occurrences
Computer supplies: 1 occurrence
Computer printer supplies: 1 occurrence
This is the first step: figuring out which words are the most popular in the document.
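To make step 1 concrete, here is a minimal sketch in Python (only an illustration, since Excel or SQL would be fine too). It assumes a "word" is anything separated by spaces, that matching ignores upper/lower case, and that phrases are runs of up to 3 consecutive words. The name `count_phrases` and the `max_words` limit are made up for the example.

```python
from collections import Counter

def count_phrases(keywords, max_words=3):
    """Count every run of 1..max_words consecutive words across all keywords."""
    counts = Counter()
    for keyword in keywords:
        words = keyword.lower().split()  # assumption: words are space-separated
        for size in range(1, max_words + 1):
            for start in range(len(words) - size + 1):
                counts[" ".join(words[start:start + size])] += 1
    return counts

keywords = ["computer printer", "computer supplies", "computer printer supplies"]
counts = count_phrases(keywords)

# Keep only the 100 most frequent words and phrases.
for phrase, n in counts.most_common(100):
    print(f"{phrase}: {n}")
```

On the three sample keywords this gives the same counts as the list above. It would also report "printer supplies" once, which my list leaves out.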
2. Once we get the TOP 100 words, I want to be able to select, among those 100, the ones I want to use. Let's call them the Selected Groups. So I need some sort of selection process where I check off what I'm interested in. For example, I would select 1. Printer and 2. Supplies.
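The selection itself could be as simple as picking entries by their position in the ranked list. A rough sketch, assuming the list is already sorted by frequency; `pick_groups` and the short sample list are hypothetical, and a checkbox column in a spreadsheet would serve the same purpose:

```python
def pick_groups(top_phrases, chosen_numbers):
    """Return the phrases at the given 1-based positions in the ranked list."""
    return [top_phrases[n - 1] for n in chosen_numbers]

# Hypothetical, shortened stand-in for the TOP 100 list.
top_phrases = ["computer", "printer", "supplies", "computer printer"]

selected_groups = pick_groups(top_phrases, [2, 3])
print(selected_groups)  # ['printer', 'supplies']
```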
3. I now want the tool to sort all the keywords in the Selected Groups according to what they contain. The tool would then output something like:
- Printer: computer printer
- Supplies: computer supplies
As you can see, the issue is sorting the third keyword, 'computer printer supplies', which would belong to both groups. We will simply use a rule that says a keyword belongs to the group of the first selected word found in it, reading from left to right. So in the case of 'computer printer supplies', the first selected word found is 'Printer', and the keyword would be allocated to the Printer group.
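The "first word found" rule could look something like the sketch below. It assumes each Selected Group is a single word and that matching ignores case. A multi-word group such as "computer printer" would need phrase matching instead. `assign_group` is a made-up name.

```python
def assign_group(keyword, selected_groups):
    """Return the selected word that appears earliest in the keyword, or None."""
    selected = {g.lower() for g in selected_groups}
    for word in keyword.lower().split():  # scan the keyword left to right
        if word in selected:
            return word
    return None

selected_groups = ["printer", "supplies"]
keywords = ["computer printer", "computer supplies", "computer printer supplies"]

groups = {g.lower(): [] for g in selected_groups}
for keyword in keywords:
    group = assign_group(keyword, selected_groups)
    if group is not None:
        groups[group].append(keyword)

print(groups)
```

Here 'computer printer supplies' ends up under "printer", because 'printer' comes before 'supplies' in the keyword.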
4. Once this is done, most of the keywords should belong to a group. However, some may not, so I want to know which ones.
I need this ASAP. Let me know if you need more details.
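Finding the leftovers amounts to listing every keyword that contains none of the selected words. A small sketch under the same assumptions (single-word groups, case ignored); `find_ungrouped` and the 'laptop bag' keyword are invented for the example:

```python
def find_ungrouped(keywords, selected_groups):
    """Return the keywords that contain none of the selected words."""
    selected = {g.lower() for g in selected_groups}
    return [k for k in keywords if selected.isdisjoint(k.lower().split())]

# 'laptop bag' is a hypothetical keyword that matches no group.
keywords = ["computer printer", "computer supplies", "laptop bag"]
ungrouped = find_ungrouped(keywords, ["printer", "supplies"])
print(ungrouped)  # ['laptop bag']
```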