I have a large data set consisting of many different time series, each covering at most 12 years, and not all of them start at the same point in time. I want to find common patterns across the series so that I can cluster similar ones together. The first step would be some data cleaning (establishing a common starting point for all time series, since some are shorter and some longer), followed by using distance/similarity measures to cluster together the time series that follow similar paths.
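To make the "common starting point" idea concrete, here is a minimal sketch of one possible interpretation: re-index every series by elapsed periods since its own first observation, then truncate all series to the length of the shortest one. The function name `align_to_common_start`, the `(period, value)` input layout, and the toy data are assumptions for illustration only, not my actual data.

```python
def align_to_common_start(series_by_id):
    """Re-index each series from its own first observation and
    truncate all series to the length of the shortest one.

    series_by_id: dict mapping a series id to a list of (period, value)
    pairs (hypothetical layout, assumed for this sketch).
    """
    # Sort each series by period and keep only the values, so that
    # position 0 is always "first observed period" for that series.
    values = {
        sid: [value for _, value in sorted(observations)]
        for sid, observations in series_by_id.items()
    }
    # The shortest series determines the common length.
    common_len = min(len(v) for v in values.values())
    return {sid: v[:common_len] for sid, v in values.items()}


# Toy data: series "b" starts two years later and is shorter.
raw = {
    "a": [(2005, 1.0), (2006, 2.0), (2007, 3.0), (2008, 4.0)],
    "b": [(2007, 10.0), (2008, 20.0), (2009, 30.0)],
}
aligned = align_to_common_start(raw)
```

Truncating throws away the tail of the longer series. Since dynamic time warping can compare series of unequal length, the truncation may only be needed for measures that require equal lengths.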
1) Ensure the data cleaning is done correctly so that all time series share a common starting point
2) Identify a suitable (dis)similarity measure so that time series can be grouped/clustered according to similar patterns (candidates: dynamic time warping (DTWARP), compression-based dissimilarity (CDM) or normalized compression distance (NCD) -> compute all of them and check which one gives the best outcome)
3) Run a conventional clustering algorithm on the (dis)similarities between the time series to form groups, using k-means or partitioning around medoids (PAM)
4) Look for the reasons behind these patterns with a multinomial logistic regression, to see how some additional variables affect membership in the respective clusters
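Steps 2 and 3 could be sketched roughly as follows, assuming a plain dynamic time warping distance with absolute differences as the local cost, and a simple alternating k-medoids loop on the precomputed distance matrix. Note that this loop is a simplified stand-in and not the full PAM swap procedure; the function names, the naive initialisation (first `k` series as medoids), and the toy series are assumptions for illustration.

```python
import math


def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance,
    using |a - b| as the local cost and no window constraint."""
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix
    (simplified; not the PAM build/swap algorithm)."""
    medoids = list(range(k))  # naive deterministic start (assumption)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of its cluster.
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids


# Toy series: two rising and two falling paths of unequal length.
series = [
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 3.0, 4.0],
    [4.0, 4.0, 3.0, 2.0, 1.0, 0.0],
]
dist = [[dtw_distance(a, b) for b in series] for a in series]
labels, medoids = k_medoids(dist, k=2)
```

The resulting `labels` would then serve as the dependent variable for the multinomial logistic regression in step 4. Because k-medoids only needs a distance matrix, the same clustering code could be reused with CDM or NCD in place of the DTW distance, whereas k-means requires averaging the series themselves.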