A Model-Based Machine Learning Approach to Scalable Portfolio Selection


Ana Paula S. Gularte (1,2) and Vitor V. Curtis (1,2); (1) Aeronautics Institute of Technology (ITA), Brazil; (2) Federal University of São Paulo (UNIFESP), Brazil


This study proposes a scalable approach to asset selection and allocation that integrates machine-learning clustering methods into portfolio optimization models. The methodology applies Uniform Manifold Approximation and Projection (UMAP) and ensemble clustering techniques to preselect assets from the Ibovespa and S&P 500 indices. The research compares three allocation models and finds that the Hierarchical Risk Parity model outperformed the others, with a Sharpe ratio of 1.11. The pandemic produced drawdowns close to 30% in the portfolios, which nevertheless recovered in 111 to 149 trading days. The portfolios outperformed the indices in cumulative returns, with similar annual volatilities of 20%. Preprocessing with UMAP yielded clusters with higher discriminatory power, as measured by internal cluster validation metrics, which helped reduce the size of the optimal portfolio allocation problem. Overall, this study highlights the potential of machine learning in portfolio optimization and provides a useful framework for investment practitioners.


Portfolio Selection, Cluster Analysis, Hierarchical Risk Parity, Inverse-Variance Portfolio, Mean-Variance
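
The abstract reports results as a Sharpe ratio and as drawdowns, and the keywords name inverse-variance allocation. As a reading aid, the sketch below shows one common way these quantities can be computed from daily returns. It is not the authors' implementation and does not cover Hierarchical Risk Parity or the clustering step. The function names, the 252-day annualization factor, the zero risk-free rate, and the return series are all assumptions made for illustration.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor, not stated in the source


def inverse_variance_weights(returns_by_asset):
    """Weight each asset proportionally to 1 / sample variance of its returns."""
    inv_var = {
        name: 1.0 / statistics.variance(series)
        for name, series in returns_by_asset.items()
    }
    total = sum(inv_var.values())
    return {name: value / total for name, value in inv_var.items()}


def annualized_sharpe(daily_returns, risk_free_daily=0.0):
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the wealth curve, as a positive fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst


# Made-up daily returns for two hypothetical assets; "B" is exactly twice "A",
# so its variance is four times larger.
returns = {
    "A": [0.01, -0.02, 0.015, 0.005, 0.01],
    "B": [0.02, -0.04, 0.03, 0.01, 0.02],
}
weights = inverse_variance_weights(returns)
n_days = len(returns["A"])
portfolio = [sum(weights[k] * returns[k][t] for k in returns) for t in range(n_days)]

print(weights)
print(annualized_sharpe(portfolio))
print(max_drawdown(portfolio))
```

Because asset "B" has four times the variance of "A" in this toy data, the inverse-variance rule assigns it one quarter of the relative weight, which illustrates how the scheme shifts capital toward less volatile assets without using any correlation information.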