We need an application developer with expertise in statistical programs such as SAS or other Business Intelligence/Data Mining tools.
The task involves several thousand customers who are currently classified based upon a scheme that involves their purchased items, not including pricing. The universe of purchasable items numbers 457, and customers can buy varying quantities of each item, but there is some (yet to be uncovered) correlation between what they buy and the quantities in which they buy it.
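To make the data layout concrete, here is a minimal sketch of how the purchases could be arranged as one 457-element quantity vector per customer, with a plain Pearson correlation between two item columns. The row layout `(customer_id, item_index, quantity)`, the function names and the sample values are assumptions for illustration only; they do not reflect our actual schema.

```python
from collections import defaultdict
from math import sqrt

N_ITEMS = 457  # size of the item universe stated in the brief


def build_quantity_matrix(purchases):
    """Turn (customer_id, item_index, quantity) rows into one vector per customer."""
    matrix = defaultdict(lambda: [0.0] * N_ITEMS)
    for customer_id, item_index, quantity in purchases:
        matrix[customer_id][item_index] += quantity
    return dict(matrix)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def item_correlation(matrix, item_a, item_b):
    """Correlate the quantities of two items across all customers."""
    customers = sorted(matrix)
    return pearson([matrix[c][item_a] for c in customers],
                   [matrix[c][item_b] for c in customers])


# Hypothetical sample rows: (customer_id, item_index, quantity)
sample = [("c1", 0, 2), ("c1", 5, 4), ("c2", 0, 1),
          ("c2", 5, 2), ("c3", 0, 3), ("c3", 5, 6)]
matrix = build_quantity_matrix(sample)
r = item_correlation(matrix, 0, 5)
```

In this made-up sample every customer buys twice as many of item 5 as of item 0, so the two columns are perfectly correlated.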
Our objective is to take a single store, look at the universe of its customers, and retain their original classification marker for reference. Then, using Statistica, we want to do another classification based upon the charges for the items these customers purchased.
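As a rough illustration of what a charge-based classification means, the sketch below converts quantities into charges and groups customers with a plain k-means. This is a stand-in only: the actual work is to be done with Statistica's unsupervised classification, and the unit prices, the number of clusters and all function names here are hypothetical.

```python
def charge_vector(quantities, unit_prices):
    """Charge per item = quantity bought times unit price."""
    return [q * p for q, p in zip(quantities, unit_prices)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Plain k-means with a deterministic farthest-point start."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: 3 items instead of 457, 4 customers
unit_prices = [2.0, 10.0, 0.5]
quantities = [[1, 0, 4], [2, 0, 2], [0, 3, 0], [0, 4, 1]]
charges = [charge_vector(q, unit_prices) for q in quantities]
labels, centroids = kmeans(charges, k=2)
```

The original classification marker would simply be carried alongside each customer's row so the two labels can be compared afterwards.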
Then we want to compare the original classification scheme with the new price-based scheme to determine the correlation between the two. Within the clusters, we also want to determine the correlation between the items purchased, using PCA, ANOVA and p-value analysis.
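One possible way to quantify how closely the two schemes agree is a cross-tabulation together with the adjusted Rand index, sketched below. The choice of this index is an assumption on our part, not a requirement; the PCA, ANOVA and p-value work is expected to be done in Statistica and is not shown. The label values are made up.

```python
from collections import Counter
from math import comb


def contingency(labels_a, labels_b):
    """Count customers for every (original class, new class) pair."""
    return Counter(zip(labels_a, labels_b))


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two labelings: 1.0 = identical grouping, about 0 = chance."""
    n = len(labels_a)
    if n < 2 or n != len(labels_b):
        raise ValueError("need two labelings of the same length, at least 2 items")
    sum_cells = sum(comb(c, 2) for c in contingency(labels_a, labels_b).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both schemes put everyone in one class
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: same grouping under different names
original = ["A", "A", "B", "B", "C", "C"]
price_based = [1, 1, 0, 0, 2, 2]
ari_same = adjusted_rand_index(original, price_based)

# Hypothetical labels: the new scheme splits every original class in half
ari_split = adjusted_rand_index(["A", "A", "B", "B"], [0, 1, 0, 1])
```

The index ignores how the classes are named, which matters here because the new clusters will not share labels with the original markers.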
Once we have completed such an analysis with a single store, we want to do comparisons with stores of similar characteristics, such as geography, size, etc. Essentially, we’ll be doing the same analysis, but we’ll now also retain the store number so we can compare across stores.
As well, once we’re successful with these tasks, we want to use historical data and complete the same analysis to detect any trends.
With Statistica, the output is a spreadsheet table. From those tables, I want to produce web-enabled interactive graphs using Fusion Charts.
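The table-to-chart step could look roughly like the sketch below, which turns an exported CSV table into a JSON document for a single-series chart. It assumes the table is exported as CSV and that the chart accepts a JSON layout with `chart` and `data` keys holding `label`/`value` pairs; both points, and the column names, are assumptions to be checked against the Fusion Charts documentation and the actual Statistica export.

```python
import csv
import io
import json


def table_to_chart_json(csv_text, label_column, value_column, caption):
    """Map two columns of an exported table onto label/value pairs for a chart."""
    rows = csv.DictReader(io.StringIO(csv_text))
    data = [{"label": row[label_column], "value": row[value_column]} for row in rows]
    return json.dumps({"chart": {"caption": caption}, "data": data}, indent=2)


# Hypothetical exported table: number of customers in each cluster
exported = "cluster,customers\n1,120\n2,85\n3,40\n"
chart_json = table_to_chart_json(exported, "cluster", "customers",
                                 "Customers per cluster")
```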
Finally, we want to create a model to enable stores to make these comparisons on a batch basis, at the end of their business day. In other words, we’ll take the model and enable stores to run their results through the model on a daily basis.
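A minimal sketch of what the end-of-day batch run might involve, assuming the trained model can be reduced to a stored set of cluster centroids and each day's customers are assigned to the nearest one. The JSON model layout, the function names and the sample figures are all hypothetical; the real model format will depend on what Statistica produces.

```python
import json


def nearest_cluster(vector, centroids):
    """Index of the centroid closest to this customer's charge vector."""
    return min(range(len(centroids)),
               key=lambda j: sum((x - c) ** 2 for x, c in zip(vector, centroids[j])))


def score_day(model_json, daily_rows):
    """Assign every customer seen today to a cluster of the stored model."""
    centroids = json.loads(model_json)["centroids"]
    return {customer_id: nearest_cluster(vector, centroids)
            for customer_id, vector in daily_rows.items()}


# Hypothetical stored model and one day's charge vectors (3 items instead of 457)
model_json = json.dumps({"centroids": [[3.0, 0.0, 1.5], [0.0, 35.0, 0.25]]})
today = {"c9": [2.5, 1.0, 0.0], "c10": [0.0, 28.0, 1.0]}
assignments = score_day(model_json, today)
```

Keeping the model as a small data file means a store only needs the scoring step each evening, not the full analysis.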
The developer will have access to the data and tools on our server; the data is stored in a SQL Server database. The Statistica algorithms to be used are the unsupervised classification algorithm and the Random Forest algorithm. We also anticipate using the Kohonen Self-Organizing Maps algorithm from the Statistica Automated Neural Networks offering, and the Statistica Multivariate Exploratory Techniques.
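For anyone unfamiliar with Kohonen maps, the sketch below shows the basic idea on a tiny one-dimensional map: each input pulls its best-matching unit, and to a lesser degree that unit's neighbours, toward itself. It is a simplified illustration and not Statistica's implementation; the map size, learning-rate schedule, neighbourhood width and sample data are all assumptions.

```python
import math


def best_matching_unit(sample, weights):
    """Index of the map unit whose weight vector is closest to the sample."""
    return min(range(len(weights)),
               key=lambda u: sum((x - w) ** 2 for x, w in zip(sample, weights[u])))


def train_som(samples, n_units=4, epochs=30, lr_start=0.5):
    """Train a 1-D Kohonen map (n_units >= 2) with a Gaussian neighbourhood."""
    dim = len(samples[0])
    lows = [min(s[d] for s in samples) for d in range(dim)]
    highs = [max(s[d] for s in samples) for d in range(dim)]
    # Deterministic start: units spaced evenly between per-feature min and max
    weights = [[lows[d] + (highs[d] - lows[d]) * u / (n_units - 1) for d in range(dim)]
               for u in range(n_units)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for sample in samples:
            progress = step / total_steps
            lr = lr_start * (1.0 - progress)                     # learning rate decays to 0
            sigma = max(n_units / 2.0 * (1.0 - progress), 0.5)   # neighbourhood shrinks
            winner = best_matching_unit(sample, weights)
            for u in range(n_units):
                influence = math.exp(-((u - winner) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    weights[u][d] += lr * influence * (sample[d] - weights[u][d])
            step += 1
    return weights


# Hypothetical charge vectors forming two well-separated groups
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
           [10.0, 10.0], [9.0, 10.0], [10.0, 9.0]]
weights = train_som(samples)
low_unit = best_matching_unit([0.0, 0.0], weights)
high_unit = best_matching_unit([10.0, 10.0], weights)
```

After training, customers with similar charge patterns land on the same or neighbouring units, which is what makes the map usable as a classification.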
If additional clarification or explanation is needed, please feel free to contact me.