The University of Massachusetts Amherst



Modeling multivariable hydrological series: Principal component analysis or independent component analysis?

Title: Modeling multivariable hydrological series: Principal component analysis or independent component analysis?
Publication Type: Journal Article
Year of Publication: 2007
Authors: Brown C, Westra S, Lall U, Sharma A
Journal: Water Resources Research
Start Page: W06429

The generation of synthetic multivariate rainfall and/or streamflow time series that accurately simulate both the spatial and temporal dependence of the original multivariate series remains a challenging problem in hydrology and frequently requires either the estimation of a large number of model parameters or significant simplifying assumptions on the model structure. As an alternative, we propose a relatively parsimonious two-step approach to generating synthetic multivariate time series at monthly or longer timescales, by first transforming the data to a set of statistically independent univariate time series and then applying a univariate time series model to the transformed data. The transformation is achieved through a technique known as independent component analysis (ICA), which uses an approximation of mutual information to maximize the independence between the transformed series. We compare this with principal component analysis (PCA), which merely removes the covariance (or spatial correlation) of the multivariate time series, without necessarily ensuring complete independence. Both methods are tested using a monthly multivariate data set of reservoir inflows from Colombia. We show that the discrepancy between the synthetically generated data and the original data, measured as the mean integrated squared bias, is reduced by 25% when using ICA compared with PCA for the full joint distribution and by 28% when considering marginal densities in isolation. These results suggest that there may be significant benefits to maximizing statistical independence, rather than merely removing correlation, when developing models for the synthetic generation of multivariate time series.
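
The two-step approach described in the abstract (transform to independent components, model each component separately, transform back) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`whiten`, `fastica_rotation`, `simulate_ar1`, `generate_synthetic`) are hypothetical. The ICA step assumes a symmetric FastICA iteration with a tanh contrast, and the univariate model is assumed to be an AR(1) with resampled residuals, standing in for whatever univariate model the paper actually applies. Seasonality in monthly data is ignored here.

```python
import numpy as np

def whiten(X):
    """PCA step: centre X (n_samples x n_series) and rescale to identity covariance."""
    mean = X.mean(axis=0)
    Xc = X - mean
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    K = eigvec / np.sqrt(eigval)  # column j scaled by 1/sqrt(eigval[j])
    return Xc @ K, K, mean

def fastica_rotation(Z, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh contrast) on whitened data; returns an orthogonal W."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W = np.linalg.qr(rng.standard_normal((d, d)))[0]
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)  # g(w_i^T z) for every sample and component
        W_new = (G.T @ Z) / n - np.diag((1.0 - G**2).mean(axis=0)) @ W
        U, _, Vt = np.linalg.svd(W_new)  # symmetric decorrelation: (W W^T)^(-1/2) W
        W_new = U @ Vt
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W

def simulate_ar1(s, n_out, rng):
    """Assumed univariate model: AR(1) fitted by lag-1 regression, residuals resampled."""
    s0 = s - s.mean()
    phi = (s0[1:] @ s0[:-1]) / (s0[:-1] @ s0[:-1])
    resid = s0[1:] - phi * s0[:-1]
    out = np.empty(n_out)
    prev = rng.choice(s0)
    for t in range(n_out):
        prev = phi * prev + rng.choice(resid)
        out[t] = prev
    return out + s.mean()

def generate_synthetic(X, n_out, use_ica=True, seed=0):
    """Transform, simulate each component independently, then back-transform."""
    rng = np.random.default_rng(seed)
    Z, K, mean = whiten(X)
    # With use_ica=False the components are the whitened principal components.
    W = fastica_rotation(Z, seed=seed) if use_ica else np.eye(Z.shape[1])
    S = Z @ W.T
    S_syn = np.column_stack(
        [simulate_ar1(S[:, j], n_out, rng) for j in range(S.shape[1])]
    )
    return S_syn @ W @ np.linalg.inv(K) + mean
```

Calling `generate_synthetic(X, n_out)` with `use_ica=False` simulates the decorrelated principal components directly, which gives a PCA baseline to compare against the ICA variant. Because PCA only removes correlation, any remaining higher-order dependence between components is lost when they are simulated separately, which is the effect the abstract's comparison targets.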