This script extracts documentation for Power BI artifacts on the Tenant using:
- Power BI REST APIs with SPN
- DAX Studio CLI
- Power BI Desktop
All combined with some logic. 🤯
```mermaid
flowchart TD
    A[Start] -->|Request access token| B[get_token]
    B -->|Fetch tenant metadata| C[get_tenant_metadata]
    C -->|Save JSON files| D[Save metadata to results/tenant_metadata]
    D -->|Extract dataset information| E[get_info_datasets]
    E -->|Run DAX queries| F[Generate CSV files for each dataset]
    F -->|Save to results/datasets_info| G[Save dataset info]
    D -->|Export dataflows JSON| H[get_dataflows]
    H -->|Save JSON files| I[Save to results/dataflows_json]
    D -->|Identify PRO datasets| J[get_pro_datasets]
    J -->|Export PBIX files| K[Save to results/exported_pbix]
    K -->|Extract dataset info from PBIX| L[get_info_pro_datasets]
    L -->|Run DAX Query locally| M[Generate CSV files for each dataset]
    M -->|Save to results/datasets_info| N[Save dataset info]
    G & N -->|Generate documentation| O[create_documentation]
    O -->|Save DOCX files| P[Save to results/documentation]
    P -->|Process completed| Q[End]
```
This is a native Python script that runs locally. Besides Python itself and some additional libraries, you need to have DAX Studio and Power BI Desktop installed on the machine that will run the code.
Every Power BI developer should already have these installed, right?
See more details in the Installation section.
Note
The script is written with `def` functions that segment each step, making it easier and clearer to debug and maintain the code.
When the script runs, this function requests the access_token using the Service Principal properly configured in Microsoft Entra.
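For reference, here is a minimal sketch of what this request can look like using the client-credentials flow; the environment-variable names are illustrative, not necessarily the ones the script uses:

```python
import os
import requests

# Illustrative variable names; store your SPN secrets however you prefer
# (environment variables or Key Vault, as recommended in the Installation section).
tenant_id = os.environ["AZURE_TENANT_ID"]
client_id = os.environ["AZURE_CLIENT_ID"]
client_secret = os.environ["AZURE_CLIENT_SECRET"]

def get_token() -> str:
    """Request an access token for the Power BI REST APIs via client credentials."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://analysis.windows.net/powerbi/api/.default",
    }
    response = requests.post(url, data=payload)
    response.raise_for_status()
    return response.json()["access_token"]
```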
With the obtained access_token, this function makes several requests to the Power BI REST APIs to extract the tenant metadata, saving the .json files in the results/tenant_metadata folder. These files contain metadata for workspaces, dataflows, datasets, and reports. Each JSON file includes the IDs and hashes needed to reconstruct the entire environment. After completing this step, the file structure will be as follows:
```
pbi-docs (repo-root)/
└── results/
    └── tenant_metadata/
        ├── dataflows.json
        ├── datasets.json
        ├── reports.json
        └── workspaces.json
```
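A rough sketch of the calls behind this step, using the Power BI admin endpoints (this assumes the SPN has tenant-level read permissions; the exact endpoints and query options the script uses may differ):

```python
import json
import os
import requests

ADMIN = "https://api.powerbi.com/v1.0/myorg/admin"
OUT = os.path.join("results", "tenant_metadata")

def get_tenant_metadata(token: str) -> None:
    """Pull tenant-wide metadata and save one JSON file per artifact type."""
    headers = {"Authorization": f"Bearer {token}"}
    endpoints = {
        "workspaces": f"{ADMIN}/groups?$top=5000",  # $top is required here
        "datasets": f"{ADMIN}/datasets",
        "dataflows": f"{ADMIN}/dataflows",
        "reports": f"{ADMIN}/reports",
    }
    os.makedirs(OUT, exist_ok=True)
    for name, url in endpoints.items():
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        with open(os.path.join(OUT, f"{name}.json"), "w", encoding="utf-8") as f:
            json.dump(response.json(), f, indent=2)
```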
This function is undoubtedly the most disruptive part of this process 😱.
With the metadata extracted from the tenant, we connect to each dataset with the DAX Studio CLI and run a DAX query to obtain all tables, columns, measures, relationships, calculation groups, and much more...
Important
At this point, only datasets in workspaces on dedicated capacities (Fabric, Embedded, and PPU) are processed, as the query depends on the XMLA Endpoint, which is not available under PRO licensing. But I developed a cool feature that also includes PRO users 🫴.
The DAX queries generate six *.csv files for each dataset and save them in the results/datasets_info/ folder, creating one subfolder per workspace and dataset. For example, for Dataset A in Workspace A, it would look like:

```
results/datasets_info/Workspace A/Dataset A/partitions.csv
results/datasets_info/Workspace A/Dataset A/columns.csv
results/datasets_info/Workspace A/Dataset A/measures.csv
results/datasets_info/Workspace A/Dataset A/relationships.csv
results/datasets_info/Workspace A/Dataset A/parameters.csv
results/datasets_info/Workspace A/Dataset A/calculation_groups.csv
```
The file tree would look like this:

```
pbi-docs (repo-root)/
└── results/
    └── datasets_info/
        ├── Workspace A/
        │   ├── Dataset A/
        │   │   ├── partitions.csv
        │   │   ├── columns.csv
        │   │   ├── measures.csv
        │   │   ├── relationships.csv
        │   │   ├── parameters.csv
        │   │   └── calculation_groups.csv
        │   └── Dataset B/
        │       ├── partitions.csv
        │       ├── columns.csv
        │       ├── measures.csv
        │       ├── relationships.csv
        │       ├── parameters.csv
        │       └── calculation_groups.csv
        └── Workspace C/
            └── Dataset C/
                ├── partitions.csv
                ├── columns.csv
                ├── measures.csv
                ├── relationships.csv
                ├── parameters.csv
                └── calculation_groups.csv
```
Yes, this is the same DAX query I developed in July 2024 to obtain the documentation of a dataset locally. I just gave it a boost 🚀
See the old repository here.
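Under the hood, each extraction boils down to shelling out to the DAX Studio CLI against the workspace's XMLA Endpoint. A sketch of the idea follows; note that the dscmd arguments shown are an assumption, so verify them against your installed DAX Studio version before relying on them:

```python
import subprocess

# Path to the DAX Studio CLI (see the Installation section).
cmd = r"C:\Program Files\DAX Studio\dscmd.exe"

def run_dax_to_csv(workspace: str, dataset: str, query_file: str, out_csv: str) -> None:
    """Run a DAX query over the XMLA Endpoint and export the result to CSV.

    NOTE: the exact dscmd flags are an assumption in this sketch; check the
    DAX Studio command-line documentation for your installed version.
    """
    server = f"powerbi://api.powerbi.com/v1.0/myorg/{workspace}"  # XMLA Endpoint
    subprocess.run(
        [cmd, "csv", out_csv, "-s", server, "-d", dataset, "-f", query_file],
        check=True,
    )
```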
This function exports the JSON definition of each dataflow from the tenant. It is advisable to keep these files for potential recoveries and migrations. They are saved in the dataflows_json folder, with file names following this pattern:

```
workspace_name$dataflow_name.json
```

Examples:

```
Workspace A$Dataflow A.json
Workspace B$Other Dataflow.json
```
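A sketch of a single export using the Dataflows - Get Dataflow endpoint, which returns the dataflow definition as JSON (the helper name and parameters are illustrative):

```python
import json
import os
import requests

OUT = os.path.join("results", "dataflows_json")

def export_dataflow(token: str, group_id: str, dataflow_id: str,
                    workspace_name: str, dataflow_name: str) -> None:
    """Download one dataflow definition and save it as workspace$dataflow.json."""
    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
           f"/dataflows/{dataflow_id}")
    response = requests.get(url, headers={"Authorization": f"Bearer {token}"})
    response.raise_for_status()
    os.makedirs(OUT, exist_ok=True)
    path = os.path.join(OUT, f"{workspace_name}${dataflow_name}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(response.json(), f, indent=2)
```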
Remember when I said I hadn't forgotten about PRO users?
This function reads the tenant_metadata folder and filters out the datasets hosted in dedicated capacities, listing only the PRO datasets and exporting them to the local results/exported_pbix/ folder, using a structure similar to the previous sections, with workspaces as subfolders.
```
pbi-docs (repo-root)/
└── results/
    └── exported_pbix/
        ├── Workspace A/
        │   ├── Dataset A.pbix
        │   └── Dataset B.pbix
        └── Workspace C/
            ├── Dataset A.pbix
            └── Dataset D.pbix
```
Important
The API method used is reports/export. There is no method to export a dataset by itself, but exporting the report brings its dataset along with it. Naturally, this does not cover reports that connect live to a dataset hosted elsewhere. Therefore, always keep a standard report connected to the dataset, so the data can be obtained through that standard report.
See more at: https://learn.microsoft.com/en-us/rest/api/power-bi/reports/export-report-in-group
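A sketch of one such export call; the response body is the binary .pbix, which is written under the workspace subfolder (the helper name is illustrative):

```python
import os
import requests

OUT = os.path.join("results", "exported_pbix")

def export_pbix(token: str, group_id: str, report_id: str,
                workspace_name: str, dataset_name: str) -> None:
    """Export a report (and the dataset it carries) to a local PBIX file."""
    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
           f"/reports/{report_id}/Export")
    response = requests.get(url, headers={"Authorization": f"Bearer {token}"})
    response.raise_for_status()
    folder = os.path.join(OUT, workspace_name)
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, f"{dataset_name}.pbix"), "wb") as f:
        f.write(response.content)
```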
This function, similar to the premium datasets flow, obtains the tables, columns, measures, etc., from the exported PBIX files and adds the data to the datasets_info folder.
The difference here is that, since we do not have the XMLA Endpoint to connect the DAX Studio CLI to the dataset, we open Power BI Desktop with each PBIX file and run the DAX query locally. Once the data is extracted, Power BI Desktop is automatically closed, and the cycle repeats for each PBIX file. Amazing, right?
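In outline, each cycle might look like the sketch below: launch Power BI Desktop with the PBIX, wait for its embedded Analysis Services instance (msmdsrv.exe) to come up, discover its local port, run the query, and close Desktop. The port-discovery approach shown here is one option, not necessarily the one the script uses:

```python
import subprocess
import time
import psutil

# The same constant configured in the Installation section below.
pbi_desktop = r"C:\Program Files\WindowsApps\Microsoft.MicrosoftPowerBIDesktop_2.140.1205.0_x64__8wekyb3d8bbwe\bin\PBIDesktop.exe"

def with_local_desktop(pbix_path: str, run_query) -> None:
    """Open a PBIX in Power BI Desktop, run a callback against its local
    Analysis Services instance, then close Desktop again."""
    proc = subprocess.Popen([pbi_desktop, pbix_path])
    time.sleep(60)  # crude fixed wait for the model to load; polling is better
    # Power BI Desktop hosts a local msmdsrv.exe; find its listening TCP port.
    for child in psutil.Process(proc.pid).children(recursive=True):
        if child.name().lower() == "msmdsrv.exe":
            listeners = [c for c in child.connections(kind="tcp")
                         if c.status == psutil.CONN_LISTEN]
            if listeners:
                run_query(f"localhost:{listeners[0].laddr.port}")  # e.g., dscmd
            break
    proc.kill()  # close Power BI Desktop before the next PBIX
```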
Having all the extracted data in their respective directories, this function creates a Microsoft Word .docx document for each of the extracted datasets and saves them in the documentation folder, with file names in the format workspace_name$report_name.docx.
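As an illustration, the document assembly can be done along these lines with the python-docx library (shown here for clarity; the script's own docx library and document layout may differ):

```python
import os
import pandas as pd
from docx import Document  # pip install python-docx

def create_doc(workspace: str, dataset: str) -> None:
    """Build a Word document from the CSVs extracted for one dataset."""
    src = os.path.join("results", "datasets_info", workspace, dataset)
    doc = Document()
    doc.add_heading(f"{dataset} ({workspace})", level=1)
    # One table per extracted CSV; measures.csv shown as an example.
    measures = pd.read_csv(os.path.join(src, "measures.csv"))
    doc.add_heading("Measures", level=2)
    table = doc.add_table(rows=1, cols=len(measures.columns))
    for i, col in enumerate(measures.columns):
        table.rows[0].cells[i].text = str(col)
    for _, row in measures.iterrows():
        cells = table.add_row().cells
        for i, value in enumerate(row):
            cells[i].text = str(value)
    out = os.path.join("results", "documentation")
    os.makedirs(out, exist_ok=True)
    # File-name pattern from the section above.
    doc.save(os.path.join(out, f"{workspace}${dataset}.docx"))
```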
Important
Follow these steps one by one carefully.
- Ensure that you have the following software already installed:
  - Microsoft Power BI Desktop (MS Store)
  - DAX Studio (Download here)
  - Python (python.org)
  - The libraries pandas, pythonnet, psutil, and pydocx. If you don't have them, run:

    ```
    pip install pandas pythonnet psutil pydocx
    ```

  - VS Code (MS Store)
  - Git (Download here)
- Open the GitHub repo, fork it, and clone it into VS Code;
- Open src/pbi_docs.py;
- Open Power BI Desktop. With Power BI Desktop still open, open the Task Manager (CTRL+ALT+DEL). In the list of running apps, find the Power BI Desktop task and expand it. Right-click it and choose Open file location. Find the PBIDesktop.exe file, right-click it, and choose Copy as path. Paste it into the code at the pbi_desktop constant, for example:

  ```python
  # Path Power BI Desktop
  pbi_desktop = r"C:\Program Files\WindowsApps\Microsoft.MicrosoftPowerBIDesktop_2.140.1205.0_x64__8wekyb3d8bbwe\bin\PBIDesktop.exe"
  ```
- Check that the paths to the DAX Studio components are correctly referenced, for example:

  ```python
  # Path DAX Studio CLI
  cmd = r"C:\Program Files\DAX Studio\dscmd.exe"
  # Path Analysis Services
  ssas_dll = r"C:\Program Files\DAX Studio\bin\Microsoft.AnalysisServices.dll"
  ```
- Configure a Service Principal app in the Azure Portal (Microsoft Entra)
  - It is recommended to store the credentials in environment variables (or Key Vault)
  - Enable the APIs and the XMLA Endpoint in the Fabric Admin Portal
  - Grant the Service Principal access to the workspaces
- Run the script
- Enjoy your documentation in the results folder!
- Share with the Community!

In the pbi folder you can refresh the Power BI report, a ready-made use case for the result files. 🤯 Just confirm the path parameter in Power BI. Enjoy!
We welcome contributions from the community! If you have suggestions, bug reports, or want to contribute code, please follow these steps:
- Fork the repository on GitHub.
- Create a new branch with a descriptive name.
- Make your changes and commit them with clear and concise messages.
- Push your changes to your forked repository.
- Open a pull request to the main repository.
Please ensure your code adheres to the project's coding standards and includes appropriate tests. We appreciate your contributions and look forward to collaborating with you!
This project is licensed under the MIT License. See the LICENSE file for more details.
For any questions or inquiries, please reach out to us via the GitHub repository's issue tracker or contact the project maintainer directly.
Thank you for using and contributing to PBI-DOCS! Let's make data documentation easier and more efficient together, and keep pushing the boundaries of the Microsoft Fabric and Power BI communities!
If you like this project, give it a ⭐ and share it with friends!