This is a simple portfolio chooser based on time series of investment funds, based on the Sharpe Ratio and the Efficient Frontier.
This is part of a dual study trying to compare the implementation of the same data science project in Python and Rust. Check out the Rust repository in case you are interested.
Main parameters for the run of the portfolio chooser can be selected in the
config/config.toml
file.
You can directly run the full pipeline with
make run
or directly
python -m investments.main
and check the visualization of the efficient frontier with
make viz
or
make viz_hull
In order to run the visualization commands, you need Firefox. In case you don't have it, just open the corresponding htmls in the data folder directly instead.
Each part of the pipeline can be run separately through the according subpackage.
For the fund time series, we are currently capturing monthly data by using https://dados.cvm.gov.br/dataset/fii-doc-inf_mensal. This seems to be restricted only to real estate, so we probably want to expand this in the future.
Meanwhile, since we aren't able to download this data programatically or structurally,
we copy the tables from XP funds list for each year separately to an excel spreadsheet
and export it as csv. In order for the pipeline to work, we establish that these files
should be names as {CNPJ}_{YEAR}.csv
, where {CNPJ}
switches the slash char /
for
an underline. As an example, this would be a valid name: 32.319.351_0001-56_2023.csv
For the CDI time series, we capture the data directly by copy-pasting the data in the following link: https://brasilindicadores.com.br/cdi/.
Preprocessing transforms the rentability into a simple multiplier, e.g. a monthly
rentability of +1.2%
gets translated into 1.012
on a "values"
column.
A "dt"
column contains date in the format YYYY-MM-01
. The funds-related csv,
"funds.csv"
also has an additional column "CNPJ_Fundo"
, corresponding to an
identifier of the fund (c.f. https://www.gov.br/receitafederal/pt-br/servicos/cadastro/cnpj).
To run this part of the pipeline, run
python -m investments.preprocess.main
Consists of pickled files of the time series of each fund, according to the TimeSeries
class.
To run this part of the pipeline, run
python -m investments.timeseries.main
We get the risk-return plots, both in .png
and .html
format, the later for some
interactiveness.
Furthermore, we get the convex hull of the plot to more easily identify the efficient frontier.
To run this part of the pipeline, run
python -m investments.outputs.main