AUTHOR=Chi Jiarui TITLE=Interpretable multimodal reasoning for robo-advisory: the FinErva framework JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 8 - 2025 YEAR=2026 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1752580 DOI=10.3389/frai.2025.1752580 ISSN=2624-8212 ABSTRACT=The rapid development of robo-advisory and quantitative investment has been accompanied by persistent concerns about limited personalization and the opacity of black-box models operating on multimodal financial information. This paper addresses these issues from a decision-support perspective by constructing FinErva, a multimodal chain-of-thought dataset tailored to financial applications. FinErva comprises 7,544 manually verified question–answer pairs, divided into two economically relevant tasks: contract and disclosure understanding (FinErva-Pact) and candlestick-chart-based technical analysis (FinErva-Price). Building on this dataset, the paper proposes a two-stage training framework, Supervised-CoT Learning followed by Self-CoT Refinement, and applies it to eight vision–language models, each with fewer than 0.8 billion parameters. Empirical results show that these lightweight models approach the performance of finance professionals and clearly outperform non-expert investors. Overall, the findings indicate that appropriately designed multimodal chain-of-thought supervision enables interpretable modeling of key research tasks such as contract review and chart interpretation under realistic computational and deployment constraints, providing new data and methodology for the development of personalized, explainable, and operationally feasible AI systems in investment advisory and risk management.