I am currently working on a project which involves establishing the correlations between prices of investment grade corporate bonds on various electronic trading platforms. Further to this I would like to investigate the correlation between these prices and several external factors which may influence them.
An individual inputs a series of prices onto a screen that represents an electronic marketplace for trading OTC (over-the-counter) instruments, in this case investment-grade corporate bonds. The price is a spread in bps (or %) over a specified government bond. Prices can be input or withdrawn at any time and may be a bid (the price at which the individual buys), an offer (the price at which the individual sells), or both. The notional size can be anything as long as it is greater than or equal to a specified minimum. For the purpose of this treatment we will not take account of the size the market is made in. Each data set will contain at most 50 prices.
Using a piece of logic we can establish a good estimate of the mid price for each time point at which spreads are entered onto the platform. The theoretical mid is the arithmetic average of the bid and offer, although sometimes we will need to exclude one side of the price, or even both, if they are identified as outliers in the data set. Thus we can establish a table of mid time vs mid price and plot these on a scatter graph to get an idea of the evolution of price against time. What we are trying to see is whether, for a certain period of time, an individual is adjusting consecutive prices such that they are trending in a particular direction. There may be external factors feeding into this trend and hence we will need to consider these also.
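As a rough sketch of the mid-price logic described above: the outlier rule (a scaled median absolute deviation with a threshold of 5) and the handling of one-sided quotes (shifting the surviving side by the median half bid-offer width) are my own assumptions for illustration, not a settled method. The names `mad_outlier_flags` and `mid_prices` are hypothetical.

```python
from statistics import median


def mad_outlier_flags(values, threshold=5.0):
    """Flag values more than `threshold` scaled MADs away from the median.

    ASSUMPTION: this rule is only a placeholder for the real outlier logic.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical: flag nothing.
        return [False] * len(values)
    return [abs(v - med) / (1.4826 * mad) > threshold for v in values]


def mid_prices(quotes, threshold=5.0):
    """Build the (time, mid) table from quotes given as (time, bid, offer).

    A withdrawn or missing side is passed as None. Spreads are in bps.
    """
    bid_idx = [i for i, q in enumerate(quotes) if q[1] is not None]
    off_idx = [i for i, q in enumerate(quotes) if q[2] is not None]
    bid_flags = mad_outlier_flags([quotes[i][1] for i in bid_idx], threshold)
    off_flags = mad_outlier_flags([quotes[i][2] for i in off_idx], threshold)
    good_bid = {i for i, bad in zip(bid_idx, bid_flags) if not bad}
    good_off = {i for i, bad in zip(off_idx, off_flags) if not bad}
    two_sided = good_bid & good_off

    # Typical half bid-offer width, used only for one-sided points (ASSUMPTION).
    half_width = None
    if two_sided:
        half_width = median((quotes[i][2] - quotes[i][1]) / 2 for i in two_sided)

    table = []
    for i, (t, bid, offer) in enumerate(quotes):
        if i in two_sided:
            table.append((t, (bid + offer) / 2))   # theoretical mid
        elif half_width is not None and i in good_bid:
            table.append((t, bid + half_width))    # offer excluded or missing
        elif half_width is not None and i in good_off:
            table.append((t, offer - half_width))  # bid excluded or missing
        # otherwise both sides are unusable and the time point is dropped
    return table


if __name__ == "__main__":
    sample = [(0, 100, 96), (1, 101, 97), (2, 102, 98),
              (3, 250, 99), (4, 104, None), (5, 105, 101)]
    for row in mid_prices(sample):
        print(row)
```

With a strongly trending series a global median is a crude reference point, so a rolling window may be a better basis for the outlier check.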
At the moment I do not have a data set so the discussion is theoretical.
I would like to work out robust methodologies to satisfy the following:
1\ What kind of correlation is there between the random variables established in our table of mid time vs mid price? This may be that the prices are increasing, decreasing or that there is no clear relationship. The second part of this will be to test the statistic derived from the association to determine whether it is statistically significant.
Before establishing this, I believe that the distribution of spreads from prices of corporate bonds would look something like what we would see in equity prices, and hence the distribution will have fat tails. The significance of this is that we are unable to assume that the underlying prices are normally distributed. From my understanding, calculating a Pearson correlation coefficient or a linear regression requires the assumption of an underlying normal distribution, which would not be met here. It may be possible to use a Spearman rank correlation calculation to successfully identify an increasing or decreasing (monotonic) trend within the set of prices. Even if that were possible, it still leaves us without a way of testing that answer for statistical significance because of the almost certain failure of ANOVA in this situation.
If the Spearman Rank correlation can be successfully applied, then is it possible to use these results by fitting them to a Normal distribution to calculate probabilities of very high and very low correlations?
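To make point 1\ concrete, here is a minimal sketch of a Spearman rank correlation between mid time and mid price, paired with a permutation test as one possible distribution-free significance check. The permutation test is my own suggestion rather than anything required above, and it assumes the observations are exchangeable under the null hypothesis, which autocorrelated prices may violate. The function names are hypothetical.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the Spearman coefficient.

    Shuffles the price ranks against the time ranks and counts how often
    the shuffled |rho| is at least as large as the observed |rho|.
    """
    rng = random.Random(seed)
    rank_x, rank_y = ranks(x), ranks(y)
    observed = pearson(rank_x, rank_y)
    shuffled = rank_y[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rank_x, shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    times = list(range(10))
    mids = [98, 99, 99, 101, 100, 103, 104, 104, 106, 107]
    rho, p = permutation_p_value(times, mids, n_perm=2000)
    print(f"rho={rho:.3f}, p={p:.4f}")
```

Because the null distribution is built from the data themselves, no normality assumption is needed, which seems to suit data sets of at most 50 points.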
2\ On establishing the correlation/association between the prices as above, we then need to work out what the correlation is between this and a set of independent, external variables. These may include factors such as single stock, equity index and credit index prices. Essentially we want to strip out the impact of these external variables in order to isolate the effect of an individual moving the prices themselves.
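One possible way to approach point 2\ is a partial rank correlation: the Spearman correlation between mid time and mid price after the rank-linear influence of the external factors has been removed. The sketch below uses the standard recursive formula for partial correlation applied to ranks. Treating this as the right way to "strip out" the factors is an assumption on my part, and the function names are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, controls):
    """Correlation of x and y with the series in `controls` partialled out.

    Uses the recursion
        r_xy.Zw = (r_xy.Z - r_xw.Z * r_yw.Z)
                  / sqrt((1 - r_xw.Z**2) * (1 - r_yw.Z**2)).
    It breaks down (division by zero) if a control is perfectly
    correlated with x or y.
    """
    if not controls:
        return pearson(x, y)
    *rest, w = controls
    r_xy = partial_corr(x, y, rest)
    r_xw = partial_corr(x, w, rest)
    r_yw = partial_corr(y, w, rest)
    return (r_xy - r_xw * r_yw) / ((1 - r_xw ** 2) * (1 - r_yw ** 2)) ** 0.5


def partial_spearman(x, y, controls):
    """Partial Spearman correlation: partial correlation computed on ranks."""
    return partial_corr(ranks(x), ranks(y), [ranks(c) for c in controls])


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5]
    mids = [2, 1, 3, 5, 4]
    credit_index = [1, 3, 2, 5, 4]  # made-up illustrative factor levels
    print(partial_spearman(times, mids, []))
    print(partial_spearman(times, mids, [credit_index]))
```

If the partial coefficient stays large once the factors are controlled for, the trend is less likely to be explained by those factors alone. Its significance would still need a separate test, for example a permutation scheme along the lines used for point 1\.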
In summary, I need some detailed advice on how to deal with the issues raised in points 1\ and 2\. In particular, I am interested in the aspects related to the lack of normal distributions in the underlying variables and hence the limited statistical analysis remaining to us.