Introduction

This is a simple portfolio chooser that works on time series of investment funds and selects portfolios using the Sharpe Ratio and the Efficient Frontier.

This is part of a dual study comparing implementations of the same data science project in Python and Rust. Check out the Rust repository if you are interested.

Configuration

The main parameters for a run of the portfolio chooser are set in the config/config.toml file.

Running

You can directly run the full pipeline with

make run

or directly

python -m investments.main

and check the visualization of the efficient frontier with

make viz

or

make viz_hull

The visualization commands require Firefox. If you don't have it, open the corresponding HTML files in the data folder directly instead.

Pipeline

Each part of the pipeline can be run separately through the corresponding subpackage.

Raw files

For the fund time series, we are currently capturing monthly data by using https://dados.cvm.gov.br/dataset/fii-doc-inf_mensal. This seems to be restricted only to real estate, so we probably want to expand this in the future.

Meanwhile, since we can't download this data programmatically or in a structured form, we copy the tables from the XP funds list for each year separately into an Excel spreadsheet and export it as CSV. For the pipeline to work, these files must be named {CNPJ}_{YEAR}.csv, where the slash character / in {CNPJ} is replaced with an underscore. For example, this is a valid name: 32.319.351_0001-56_2023.csv
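The naming rule above can be sketched in a few lines of Python. The helper name raw_file_name is hypothetical and is not part of the package; it only illustrates the convention.

```python
def raw_file_name(cnpj: str, year: int) -> str:
    """Build the expected raw file name for a fund and a year.

    Hypothetical helper, shown only to illustrate the naming convention.
    """
    # A slash cannot appear in a file name, so it is swapped for an underscore.
    safe_cnpj = cnpj.replace("/", "_")
    return f"{safe_cnpj}_{year}.csv"


print(raw_file_name("32.319.351/0001-56", 2023))
```

With the CNPJ from the example above, this yields the same file name shown in the text.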

For the CDI time series, we copy and paste the data from the following link: https://brasilindicadores.com.br/cdi/.

Preprocessed files

Preprocessing transforms the return into a simple multiplier, e.g. a monthly return of +1.2% becomes 1.012 in a "values" column.
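As a minimal sketch of that conversion (the function name to_multiplier is an assumption, not the package's API), a percentage becomes a multiplier by dividing by 100 and adding 1. One convenience of this form is that multipliers for consecutive months compound by plain multiplication.

```python
import math


def to_multiplier(monthly_return_pct: float) -> float:
    """Convert a monthly return in percent (e.g. 1.2) into a multiplier (e.g. 1.012)."""
    return 1.0 + monthly_return_pct / 100.0


# Hypothetical monthly returns in percent, for illustration only.
monthly_pct = [1.2, -0.5, 0.8]
multipliers = [to_multiplier(p) for p in monthly_pct]

# Compounded growth over the three months is the product of the multipliers.
growth = math.prod(multipliers)
print(multipliers, growth)
```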

A "dt" column contains the date in the format YYYY-MM-01. The funds-related CSV, "funds.csv", also has an additional column "CNPJ_Fundo", which identifies the fund (cf. https://www.gov.br/receitafederal/pt-br/servicos/cadastro/cnpj).
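Since the data is monthly, the YYYY-MM-01 format amounts to pinning every date to the first day of its month. One way to produce such a value with the standard library is sketched below; month_start is a hypothetical name, and the package may build the column differently.

```python
from datetime import date


def month_start(d: date) -> str:
    """Return the first day of d's month as an ISO string (YYYY-MM-01)."""
    return d.replace(day=1).isoformat()


print(month_start(date(2023, 5, 17)))
```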

To run this part of the pipeline, run

python -m investments.preprocess.main

Timeseries

This stage produces pickled files with the time series of each fund, structured according to the TimeSeries class.

To run this part of the pipeline, run

python -m investments.timeseries.main

Outputs

We get the risk-return plots in both .png and .html format, the latter for some interactivity.
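To make the axes of such a plot concrete, here is a rough sketch of how a risk-return point and a Sharpe ratio could be derived from the monthly multipliers. It makes several assumptions that the package may not share: risk is taken as the sample standard deviation of monthly returns, the CDI series serves as the risk-free benchmark, and the function names are hypothetical.

```python
from statistics import mean, stdev


def risk_return(multipliers):
    """Return (risk, mean return) for a series of monthly multipliers."""
    returns = [m - 1.0 for m in multipliers]
    # Risk is assumed here to be the sample standard deviation of the returns.
    return stdev(returns), mean(returns)


def sharpe_ratio(fund_multipliers, cdi_multipliers):
    """Mean excess return over the CDI, divided by its standard deviation."""
    excess = [f - c for f, c in zip(fund_multipliers, cdi_multipliers)]
    return mean(excess) / stdev(excess)


# Made-up data for illustration only; both series cover the same months.
fund = [1.02, 1.00, 1.03, 1.01]
cdi = [1.01, 1.01, 1.01, 1.01]

risk, ret = risk_return(fund)
print(risk, ret, sharpe_ratio(fund, cdi))
```

A higher Sharpe ratio means more return above the benchmark per unit of volatility, which is why it works as a ranking criterion between funds.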

Furthermore, we get the convex hull of the plotted points, which makes the efficient frontier easier to identify.
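The link between the hull and the frontier is that, on a risk-return plot, the efficient candidates sit on the upper-left boundary of the hull. The sketch below extracts that boundary with a pure-Python monotone chain scan. It is an illustration under assumptions, not the package's implementation; the function names and the sample points are made up.

```python
def cross(o, a, b):
    """Z component of the cross product of OA and OB (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def efficient_frontier(points):
    """Upper-left boundary of the convex hull of (risk, return) points."""
    # For each risk level keep only the best return.
    best = {}
    for risk, ret in points:
        best[risk] = max(best.get(risk, ret), ret)
    pts = sorted(best.items())

    # Upper hull, scanning left to right and keeping only right turns.
    hull = []
    for p in pts:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)

    # Past the highest return, extra risk is no longer rewarded, so cut there.
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[: peak + 1]


# Made-up (risk, return) points for illustration only.
points = [(1.0, 1.0), (2.0, 3.0), (2.5, 1.5), (3.0, 2.0),
          (3.0, 3.2), (4.0, 4.0), (5.0, 3.5)]
frontier = efficient_frontier(points)
print(frontier)
```

Points that fall below the line joining two frontier points are dropped, because a mix of those two points would offer a better return at the same risk.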

To run this part of the pipeline, run

python -m investments.outputs.main