I need a database of English words for an automatic content categorization system. It must contain at least 200,000 words.
For each word, it must give probabilities for 3-5 categories (for use with the Naive Bayes method).
The list of categories must look like:
[url removed, login to view]
It MUST be hierarchical. For example, the entry for one word might be:
Shopping -> 0.3
Library -> 0.3
Library:Education -> 0.2
Entertainment:Humor & Fun -> 0.2
(etc.) Hope you get the idea :)
The database must also include stop-words (e.g. "to be" -> no category) and word combinations.
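To make the request more concrete, here is a rough sketch of how I imagine using such a database. Everything in it is an assumption for illustration only: the sample words, their probabilities, the SMOOTHING value and the function names are made up, and it assumes all categories are equally likely beforehand, so that adding the logs of the per-word category probabilities gives a Naive Bayes style score.

```python
import math

# Hypothetical sample of the requested data: word -> {category path: probability}.
# Hierarchy levels are separated by ":" as in the example above.
WORD_CATEGORIES = {
    "book": {
        "Shopping": 0.3,
        "Library": 0.3,
        "Library:Education": 0.2,
        "Entertainment:Humor & Fun": 0.2,
    },
    "joke": {"Entertainment:Humor & Fun": 0.9, "Library": 0.1},
    "credit card": {"Shopping": 1.0},  # a word combination
}

# Stop-words carry no category and are ignored.
STOP_WORDS = {"to be", "the", "a"}

# Assumed small probability for a category that a word does not list.
SMOOTHING = 1e-3


def tokenize(text):
    """Split text into tokens, preferring known two-word combinations."""
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and (pair in WORD_CATEGORIES or pair in STOP_WORDS):
            tokens.append(pair)
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return [t for t in tokens if t not in STOP_WORDS]


def classify(text):
    """Return the best-scoring category, or None if no known word is found."""
    tokens = [t for t in tokenize(text) if t in WORD_CATEGORIES]
    categories = {c for t in tokens for c in WORD_CATEGORIES[t]}
    scores = {
        c: sum(math.log(WORD_CATEGORIES[t].get(c, SMOOTHING)) for t in tokens)
        for c in categories
    }
    return max(scores, key=scores.get) if scores else None


def top_level(category):
    """Return the top level of a hierarchical category path."""
    return category.split(":")[0]
```

For example, classify("to be a credit card") would skip the stop-words, match the combination "credit card" and return "Shopping", and top_level("Library:Education") gives "Library".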
Please give your price for it. I hope someone already has such a database.
I don't care about the data format - any format that can be converted will be appreciated.