diff --git a/docs/source/contribution_guidelines_code_auto_doc_example.rst b/docs/source/contribution_guidelines_code_auto_doc_example.rst index a5bc273a1..ebccae5c5 100644 --- a/docs/source/contribution_guidelines_code_auto_doc_example.rst +++ b/docs/source/contribution_guidelines_code_auto_doc_example.rst @@ -315,13 +315,13 @@ And to reference a module named vDataFrame: .. seealso:: - :py:mod:`vDataFrame` + :py:func:`~verticapy.vDataFrame` **Output:** .. seealso:: - :py:mod:`vDataFrame` + :py:func:`~verticapy.vDataFrame` Now you can go through the below examples to understand the usage in detail. From the examples you will note a few things: diff --git a/docs/source/examples_business_booking.rst b/docs/source/examples_business_booking.rst index 54aeb766d..47f05ff80 100644 --- a/docs/source/examples_business_booking.rst +++ b/docs/source/examples_business_booking.rst @@ -279,7 +279,7 @@ It looks like there are two main predictors: 'mode_hotel_cluster_count' and 'tri - look for a shorter trip duration. - not click as much (spend more time at the same web page). -Let's add our prediction to the :py:mod:`vDataFrame`. +Let's add our prediction to the :py:func:`~verticapy.vDataFrame`. .. code-block:: python diff --git a/docs/source/examples_business_churn.rst b/docs/source/examples_business_churn.rst index bde4c7fbd..52ce9f5a5 100644 --- a/docs/source/examples_business_churn.rst +++ b/docs/source/examples_business_churn.rst @@ -203,7 +203,7 @@ ________ Machine Learning ----------------- -:py:func:`~verticapy.machine_learning.vertica.LogisticRegression` is a very powerful algorithm and we can use it to detect churns. Let's split our :py:mod:`vDataFrame` into training and testing set to evaluate our model. +:py:func:`~verticapy.machine_learning.vertica.LogisticRegression` is a very powerful algorithm and we can use it to detect churn. Let's split our :py:func:`~verticapy.vDataFrame` into training and testing sets to evaluate our model. ..
ipython:: python diff --git a/docs/source/examples_business_football.rst b/docs/source/examples_business_football.rst index 5565a2f1b..958a78e01 100644 --- a/docs/source/examples_business_football.rst +++ b/docs/source/examples_business_football.rst @@ -979,7 +979,7 @@ To compute a ``k-means`` model, we need to find a value for 'k'. Let's draw an : model_kmeans.fit("football_clustering", predictors) model_kmeans.clusters_ -Let's add the prediction to the :py:mod:`vDataFrame`. +Let's add the prediction to the :py:func:`~verticapy.vDataFrame`. .. code-block:: python @@ -1983,7 +1983,7 @@ Looking at the importance of each feature, it seems like direct confrontations a .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_football_features_importance.html -Let's add the predictions to the :py:mod:`vDataFrame`. +Let's add the predictions to the :py:func:`~verticapy.vDataFrame`. Draws are pretty rare, so we'll only consider them if a tie was very likely to occur. diff --git a/docs/source/examples_business_insurance.rst b/docs/source/examples_business_insurance.rst index 1d7e72148..947b6167d 100644 --- a/docs/source/examples_business_insurance.rst +++ b/docs/source/examples_business_insurance.rst @@ -38,7 +38,7 @@ You can skip the below cell if you already have an established connection. vp.connect("VerticaDSN") -Let's create a new schema and assign the data to a :py:mod:`vDataFrame` object. +Let's create a new schema and assign the data to a :py:func:`~verticapy.vDataFrame` object. .. code-block:: ipython diff --git a/docs/source/examples_business_movies.rst b/docs/source/examples_business_movies.rst index d5260c40c..7adfd46a4 100644 --- a/docs/source/examples_business_movies.rst +++ b/docs/source/examples_business_movies.rst @@ -43,7 +43,7 @@ You can skip the below cell if you already have an established connection. vp.connect("VerticaDSN") -Let's create a new schema and assign the data to a :py:mod:`vDataFrame` object. 
+Let's create a new schema and assign the data to a :py:func:`~verticapy.vDataFrame` object. .. code-block:: ipython @@ -349,7 +349,7 @@ Let's join our notoriety metrics for actors and directors with the main dataset. ], ) -As we did many operation, it can be nice to save the :py:mod:`vDataFrame` as a table in the Vertica database. +Since we performed many operations, it's a good idea to save the :py:func:`~verticapy.vDataFrame` as a table in the Vertica database. .. code-block:: python @@ -754,7 +754,7 @@ Let's create a model to evaluate an unbiased score for each different movie. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_movies_filmtv_complete_model_report.html -The model is good. Let's add it in our :py:mod:`vDataFrame`. +The model is good. Let's add it to our :py:func:`~verticapy.vDataFrame`. .. code-block:: python @@ -926,7 +926,7 @@ By looking at the elbow curve, we can choose 15 clusters. Let's create a ``k-mea model_kmeans.fit(filmtv_movies_complete, predictors) model_kmeans.clusters_ -Let's add the clusters in the :py:mod:`vDataFrame`. +Let's add the clusters to the :py:func:`~verticapy.vDataFrame`. .. code-block:: python diff --git a/docs/source/examples_business_smart_meters.rst b/docs/source/examples_business_smart_meters.rst index 8fd517847..28253e76f 100644 --- a/docs/source/examples_business_smart_meters.rst +++ b/docs/source/examples_business_smart_meters.rst @@ -44,7 +44,7 @@ You can skip the below cell if you already have an established connection. vp.connect("VerticaDSN") -Create the :py:mod:`vDataFrame` of the datasets: +Create the :py:func:`~verticapy.vDataFrame` of the datasets: .. code-block:: python diff --git a/docs/source/examples_business_spam.rst b/docs/source/examples_business_spam.rst index 1804c1f4b..1ea5bce9c 100644 --- a/docs/source/examples_business_spam.rst +++ b/docs/source/examples_business_spam.rst @@ -138,7 +138,7 @@ Let's compute some statistics using the length of the message. ..
raw:: html :file: /project/data/VerticaPy/docs/figures/examples_spam_table_clean_2.html -Let's add the most occurent words in our :py:mod:`vDataFrame` and compute the correlation vector. +Let's add the most frequently occurring words to our :py:func:`~verticapy.vDataFrame` and compute the correlation vector. .. code-block:: python diff --git a/docs/source/examples_business_spotify.rst b/docs/source/examples_business_spotify.rst index f05130c8a..2717d86be 100644 --- a/docs/source/examples_business_spotify.rst +++ b/docs/source/examples_business_spotify.rst @@ -88,7 +88,7 @@ Create a new schema, "spotify". Data Loading ------------- -Load the datasets into the :py:mod:`vDataFrame` with :py:func:`~verticapy.read_csv` and then view them with :py:func:`~verticapy.vDataFrame.head`. +Load the datasets into the :py:func:`~verticapy.vDataFrame` with :py:func:`~verticapy.read_csv` and then view them with :py:func:`~verticapy.vDataFrame.head`. .. code-block:: diff --git a/docs/source/examples_learn_iris.rst b/docs/source/examples_learn_iris.rst index d77fb69bd..3908aa730 100644 --- a/docs/source/examples_learn_iris.rst +++ b/docs/source/examples_learn_iris.rst @@ -221,7 +221,7 @@ Let's plot the model to see the perfect separation. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_model_plot.html -We can add this probability to the :py:mod:`vDataFrame`. +We can add this probability to the :py:func:`~verticapy.vDataFrame`. .. code-block:: python @@ -275,7 +275,7 @@ Let's create a model to classify the Iris virginica. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_iris_table_ml_cv_2.html -We have another excellent model. Let's add it to the :py:mod:`vDataFrame`. +We have another excellent model. Let's add it to the :py:func:`~verticapy.vDataFrame`. ..
code-block:: python diff --git a/docs/source/examples_learn_pokemon.rst b/docs/source/examples_learn_pokemon.rst index 6af81da22..687faeaec 100644 --- a/docs/source/examples_learn_pokemon.rst +++ b/docs/source/examples_learn_pokemon.rst @@ -250,7 +250,7 @@ In terms of missing values, our only concern is the Pokemon's second type (Type_ .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_pokemon_table_clean_2.html -Let's use the current_relation method to see how our data preparation so far on the :py:mod:`vDataFrame` generates SQL code. +Let's use the current_relation method to see how our data preparation so far on the :py:func:`~verticapy.vDataFrame` generates SQL code. .. ipython:: python diff --git a/docs/source/examples_learn_titanic.rst b/docs/source/examples_learn_titanic.rst index 5d027abcf..240f51dc5 100644 --- a/docs/source/examples_learn_titanic.rst +++ b/docs/source/examples_learn_titanic.rst @@ -302,7 +302,7 @@ Survival correlates strongly with whether or not a passenger has a lifeboat (the - Passengers with a lifeboat - Passengers without a lifeboat -Before we move on: we did a lot of work to clean up this data, but we haven't saved anything to our Vertica database! Let's look at the modifications we've made to the :py:mod:`vDataFrame`. +Before we move on: we did a lot of work to clean up this data, but we haven't saved anything to our Vertica database! Let's look at the modifications we've made to the :py:func:`~verticapy.vDataFrame`. .. ipython:: python @@ -322,7 +322,7 @@ VerticaPy dynamically generates SQL code whenever you make modifications to your vp.set_option("sql_on", False) print(titanic.info()) -Let's move on to modeling our data. Save the :py:mod:`vDataFrame` to your Vertica database. +Let's move on to modeling our data. Save the :py:func:`~verticapy.vDataFrame` to your Vertica database. .. 
ipython:: python :okwarning: diff --git a/docs/source/examples_understand_africa_education.rst b/docs/source/examples_understand_africa_education.rst index 5dd342eaa..65f6d4118 100644 --- a/docs/source/examples_understand_africa_education.rst +++ b/docs/source/examples_understand_africa_education.rst @@ -260,7 +260,7 @@ Eight seems to be a suitable number of clusters. Let's compute a ``k-means`` mod model = KMeans(n_cluster = 8) model.fit(africa, X = ["lon", "lat"]) -We can add the prediction to the :py:mod:`vDataFrame` and draw the scatter map. +We can add the prediction to the :py:func:`~verticapy.vDataFrame` and draw the scatter map. .. code-block:: python @@ -500,7 +500,7 @@ Let's look at the feature importance for each model. Feature importance between between math score and the reading score are almost identical. -We can add these predictions to the main :py:mod:`vDataFrame`. +We can add these predictions to the main :py:func:`~verticapy.vDataFrame`. .. code-block:: python diff --git a/docs/source/examples_understand_covid19.rst b/docs/source/examples_understand_covid19.rst index b7d093542..d35724cb7 100644 --- a/docs/source/examples_understand_covid19.rst +++ b/docs/source/examples_understand_covid19.rst @@ -283,14 +283,14 @@ Because of the upward monotonic trend, we can also look at the correlation betwe covid19["elapsed_days"] = covid19["date"] - fun.min(covid19["date"])._over(by = [covid19["state"]]) -We can generate the SQL code of the :py:mod:`vDataFrame` -to see what happens behind the scenes when we modify our data from within the :py:mod:`vDataFrame`. +We can generate the SQL code of the :py:func:`~verticapy.vDataFrame` +to see what happens behind the scenes when we modify our data from within the :py:func:`~verticapy.vDataFrame`. .. 
ipython:: python print(covid19.current_relation()) -The :py:mod:`vDataFrame` memorizes all of our operations on the data to dynamically generate the correct SQL statement and passes computation and aggregation to Vertica. +The :py:func:`~verticapy.vDataFrame` memorizes all of our operations on the data to dynamically generate the correct SQL statement and passes computation and aggregation to Vertica. Let's see the correlation between the number of deaths and the other variables. diff --git a/docs/source/user_guide_data_ingestion.rst b/docs/source/user_guide_data_ingestion.rst index f8cb4d34c..24329ec2c 100644 --- a/docs/source/user_guide_data_ingestion.rst +++ b/docs/source/user_guide_data_ingestion.rst @@ -148,7 +148,7 @@ In the following example, we will use :py:func:`~verticapy.read_csv` to ingest a titanic = load_titanic() -To convert a subset of the dataset to a CSV file, select the desired rows in the dataset and use the :py:func:`~verticapy.to_csv` :py:mod:`vDataFrame` method: +To convert a subset of the dataset to a CSV file, select the desired rows in the dataset and use the :py:func:`~verticapy.to_csv` :py:func:`~verticapy.vDataFrame` method: .. ipython:: python diff --git a/docs/source/user_guide_data_preparation_decomposition.rst b/docs/source/user_guide_data_preparation_decomposition.rst index ecb7fb349..a149f1986 100644 --- a/docs/source/user_guide_data_preparation_decomposition.rst +++ b/docs/source/user_guide_data_preparation_decomposition.rst @@ -89,7 +89,7 @@ Notice that the predictors are now independant and combined together and they ha model.explained_variance_ -Most of the information is in the first two components with more than 97.7% of explained variance. We can export this result to a :py:mod:`vDataFrame`. +Most of the information is in the first two components with more than 97.7% of explained variance. We can export this result to a :py:func:`~verticapy.vDataFrame`. .. 
code-block:: diff --git a/docs/source/user_guide_data_preparation_features_engineering.rst b/docs/source/user_guide_data_preparation_features_engineering.rst index 04cb11121..23f28e9b6 100644 --- a/docs/source/user_guide_data_preparation_features_engineering.rst +++ b/docs/source/user_guide_data_preparation_features_engineering.rst @@ -10,7 +10,7 @@ Features engineering makes use of many techniques - too many to go over in this Customized Features Engineering -------------------------------- -To build a customized feature, you can use the :py:func:`~verticapy.vDataFrame.eval` method of the :py:mod:`vDataFrame`. Let's look at an example with the well-known 'Titanic' dataset. +To build a customized feature, you can use the :py:func:`~verticapy.vDataFrame.eval` method of the :py:func:`~verticapy.vDataFrame`. Let's look at an example with the well-known 'Titanic' dataset. .. code-block:: python diff --git a/docs/source/user_guide_full_stack_complex_data_vmap.rst b/docs/source/user_guide_full_stack_complex_data_vmap.rst index 1fb40c58d..df39d5a4b 100644 --- a/docs/source/user_guide_full_stack_complex_data_vmap.rst +++ b/docs/source/user_guide_full_stack_complex_data_vmap.rst @@ -4,14 +4,10 @@ Complex Data Types & VMaps ========================== - Setup ------ - -In order to work with complex -data types in VerticaPy, you'll need to -complete the following three setup tasks: +In order to work with complex data types in VerticaPy, you'll need to complete the following three setup tasks: - Import relevant libraries: @@ -19,28 +15,20 @@ complete the following three setup tasks: import verticapy as vp -- Connect to Vertica: - -.. code-block:: python +- Connect to Vertica. This example uses an existing connection called "VerticaDSN". For details on how to create a connection, see the :ref:`connection` tutorial. 
- vp.new_connection( - { - "host": "10.211.55.14", - "port": "5433", - "database": "testdb", - "password": "XxX", - "user": "dbadmin" - }, - name = "Vertica_New_Connection" - ) +.. note:: You can skip the below cell if you already have an established connection. +.. code-block:: python + + vp.connect("VerticaDSN") Check your VerticaPy version to make sure you have access to the right functions: .. ipython:: python - @suppress import verticapy as vp + vp.__version__ You can make it easier to keep track of your work by creating a custom schema: @@ -61,52 +49,32 @@ We also set the path to our data: You can download the demo datasets from `here `_. - Loading Complex Data --------------------- There are two ways to load a nested data file: -- Load directly using :py:func:`verticapy.read_json`. - In this case, you will need to use an additional parameter - to identify all the data types. The function loads the - data using flex tables and VMaps (Native Vertica MAPS, - which are flexible but not optimally performant). - -- Load using :py:func:`verticapy.read_file`. The function preidcts the complex data structure. +- Load directly using :py:func:`~verticapy.read_json`. In this case, you will need to use an additional parameter to identify all the data types. The function loads the data using flex tables and VMaps (Native Vertica MAPS, which are flexible but not optimally performant). +- Load using :py:func:`~verticapy.read_file`. The function predicts the complex data structure. Let's try both: .. code-block:: python - import verticapy as vp data = vp.read_json( path + "laliga/2008.json", schema = "public", ingest_local = False, use_complex_dt = True, - genSQL = True ) -Similar to the use of :py:func:`verticapy.read_json` above, -we can use :py:func:`verticapy.read_file` to ingest the complex data directly: - -.. code-block:: python - - data = vp.read_file( - path = path + "laliga/2005.json", - ingest_local = False, - schema = "complex_vmap_test", - ) - data - ..
ipython:: python :suppress: from verticapy.datasets import load_laliga data = load_laliga() - res = data + res = data.head(100) html_file = open("/project/data/VerticaPy/docs/figures/ug_fs_complex_data.html", "w") html_file.write(res._repr_html_()) html_file.close() @@ -114,11 +82,34 @@ we can use :py:func:`verticapy.read_file` to ingest the complex data directly: .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_fs_complex_data.html -We can also use the handy ``genSQL`` parameter to generate -(but not execute) the SQL needed to create the final relation: +Similar to the use of :py:func:`~verticapy.read_json` above, we can use :py:func:`~verticapy.read_file` to ingest the complex data directly: + +.. code-block:: python + + data = vp.read_file( + path = path + "laliga/2005.json", + ingest_local = False, + schema = "complex_vmap_test", + ) + data.head(100) + +.. raw:: html + :file: /project/data/VerticaPy/docs/figures/ug_fs_complex_data.html + +We can also use the handy ``genSQL`` parameter to generate (but not execute) the SQL needed to create the final relation: .. note:: This is a great way to customize the data ingestion or alter the final relation types. +.. code-block:: python + + data = vp.read_json( + path + "laliga/2008.json", + schema = "public", + ingest_local = False, + use_complex_dt = True, + genSQL = True, + ) + .. code-block:: SQL CREATE TABLE "complex_vmap_test"."laliga_2005" ( @@ -174,16 +165,11 @@ We can also use the handy ``genSQL`` parameter to generate FROM '/scratch_b/qa/ericsson/laliga/2005.json' PARSER FJsonParser(); - - Feature Exploration --------------------- - -In the generated SQL from the above example, we can see -that the ``away_team`` column is a ROW type with a complex -structure consisting of many sub-columns.
We can convert -this column into a JSON and view its contents: +In the generated SQL from the above example, we can see that the ``away_team`` column is a ROW type with a complex +structure consisting of many sub-columns. We can convert this column into a JSON and view its contents: .. code-block:: python @@ -200,7 +186,7 @@ this column into a JSON and view its contents: .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_fs_complex_data_2.html -As with a normal vDataFrame, we can easily extract the values from the sub-columns: +As with a normal :py:func:`~verticapy.vDataFrame`, we can easily extract the values from the sub-columns: .. code-block:: python @@ -240,9 +226,7 @@ These nested structures can be used to create features: data["name_home"] = data["home_team"]["home_team_name"]; - -We can even flatten the nested structure inside a json file, -either flattening the entire file or just particular columns: +We can even flatten the nested structure inside a json file, either flattening the entire file or just particular columns: .. code-block:: python @@ -253,7 +237,7 @@ either flattening the entire file or just particular columns: ingest_local = False, flatten_maps = True, ) - data + data.head(100) .. 
ipython:: python :suppress: @@ -261,12 +245,14 @@ either flattening the entire file or just particular columns: vp.drop("complex_vmap_test.laliga_flat") path = "/project/data/VerticaPy/docs" path = path[0:-5] + "/verticapy/datasets/data/" - data = vp.read_json(path = path + "laliga/2008.json", - table_name = "laliga_flat", - schema = "complex_vmap_test", - ingest_local = True, - flatten_maps=True,) - res = data + data = vp.read_json( + path = path + "laliga/2008.json", + table_name = "laliga_flat", + schema = "complex_vmap_test", + ingest_local = True, + flatten_maps=True, + ) + res = data.head(100) html_file = open("/project/data/VerticaPy/docs/figures/ug_fs_complex_flatten.html", "w") html_file.write(res._repr_html_()) html_file.close() @@ -274,41 +260,29 @@ either flattening the entire file or just particular columns: .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_fs_complex_flatten.html -We can see that all the columns from the JSON file have been -flattened and multiple columns have been created for each -sub-column. This causes some loss in data structure, -but makes it easy to see the data and to use it for model building. +We can see that all the columns from the JSON file have been flattened and multiple columns have been created for each +sub-column. This causes some loss in data structure, but makes it easy to see the data and to use it for model building. -It is important to note that the data type of certain -columns (home_team.managers) is now ``VMap``, and not the ``ROW`` -type that we saw in the above cells. Even though both are -used to capture nested data, there is in a subtle difference between the two. +It is important to note that the data type of certain columns (home_team.managers) is now ``VMap``, and not the ``ROW`` +type that we saw in the above cells. Even though both are used to capture nested data, there is a subtle difference between the two.
-**VMap:** More flexible as it stores the data as a string of maps, -allowing the ingestion of data in varying shapes. The shape is -not fixed and new keys can easily be handled. This is a great -option when we don't know the structure in advance, or if the structure changes over time. +- **VMap:** More flexible as it stores the data as a string of maps, allowing the ingestion of data in varying shapes. The shape is not fixed and new keys can easily be handled. This is a great option when we don't know the structure in advance, or if the structure changes over time. -**Row:** More rigid because the dictionaries, including -all the data types, are fixed when they are defined. Newly -parsed keys are ignored. But because of it's rigid structure, -it is much more performant than VMaps. They are best used when -the file structure is known in advance. +- **Row:** More rigid because the dictionaries, including all the data types, are fixed when they are defined. Newly +  parsed keys are ignored. But because of its rigid structure, it is much more performant than VMaps. They are best used when the file structure is known in advance. - -To deconvolve the nested structure, we can use the ``flatten_arrays`` -parameter in order to make the output strictly formatted. However, it -can be an expensive process. +To deconvolve the nested structure, we can use the ``flatten_arrays`` parameter to make the output strictly formatted. However, this can be an expensive process. .. code-block:: python - vp.drop("complex_vmap_test.laliga_flat") - data = vp.read_json(path = path + "laliga/2008.json", - table_name = "laliga_flat", - schema = "complex_vmap_test", - ingest_local = False, - flatten_arrays=True,) - data + data = vp.read_json( + path = path + "laliga/2008.json", + table_name = "laliga_flat", + schema = "complex_vmap_test", + ingest_local = False, + flatten_arrays=True, + ) + data.head(100) .. ipython:: python :suppress: @@ -316,12 +290,14 @@ can be an expensive process.
vp.drop("complex_vmap_test.laliga_flat") path = "/project/data/VerticaPy/docs" path = path[0:-5] + "/verticapy/datasets/data/" - data = vp.read_json(path = path + "laliga/2008.json", - table_name = "laliga_flat", - schema = "complex_vmap_test", - ingest_local = True, - flatten_arrays=True,) - res = data + data = vp.read_json( + path = path + "laliga/2008.json", + table_name = "laliga_flat", + schema = "complex_vmap_test", + ingest_local = True, + flatten_arrays=True, + ) + res = data.head(100) html_file = open("/project/data/VerticaPy/docs/figures/ug_fs_complex_flatten_arrays.html", "w") html_file.write(res._repr_html_()) html_file.close() @@ -349,7 +325,6 @@ We can even convert columns into other formats, such as string: Or integer: - .. code-block:: python data["match_week"].astype(int) @@ -368,9 +343,9 @@ Or integer: It is also possible to: -- Cast ``str`` to ``array`` -- Cast complex data types to ``json`` str -- Cast ``str`` to ``VMAP``s +- Cast ``str`` to ``array``. +- Cast complex data types to ``json`` str. +- Cast ``str`` to ``VMAP``s. - And much more... Multiple File Ingestion @@ -396,7 +371,7 @@ We can also do this for other file types. For example, csv: table_name = "cities_all", schema = "complex_vmap_test", ingest_local = False, - insert = True + insert = True, ) Materialize @@ -413,6 +388,7 @@ When we do not materialize a table, it automatically becomes a flextable: ingest_local = False, materialize = False, ) + data.head(100) .. 
ipython:: python :suppress: @@ -420,12 +396,14 @@ When we do not materialize a table, it automatically becomes a flextable: vp.drop("complex_vmap_test.laliga_verticapy_test_json") path = "/project/data/VerticaPy/docs" path = path[0:-5] + "/verticapy/datasets/data/" - data = vp.read_json(path = path + "laliga/*.json", - table_name = "laliga_verticapy_test_json", - schema = "complex_vmap_test", - ingest_local = True, - materialize = False,) - res = data + data = vp.read_json( + path = path + "laliga/*.json", + table_name = "laliga_verticapy_test_json", + schema = "complex_vmap_test", + ingest_local = True, + materialize = False, + ) + res = data.head(100) html_file = open("/project/data/VerticaPy/docs/figures/ug_fs_complex_materialize.html", "w") html_file.write(res._repr_html_()) html_file.close() @@ -433,7 +411,7 @@ When we do not materialize a table, it automatically becomes a flextable: .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_fs_complex_materialize.html -Some of the columns are VMAPs: +Some of the columns are ``VMAPs``: .. ipython:: python @@ -441,14 +419,12 @@ Some of the columns are VMAPs: for m in managers: print(data[m].isvmap()) -We can easily flatten the VMaps virtual columns by -using the :py:func:`vDataFrame.flat_vmap` method: +We can easily flatten the VMaps virtual columns by using the :py:func:`~vDataFrame.flat_vmap` method: .. code-block:: python data.flat_vmap(managers).drop(managers) - .. ipython:: python :suppress: @@ -458,8 +434,8 @@ using the :py:func:`vDataFrame.flat_vmap` method: html_file.write(res._repr_html_()) html_file.close() -.. .. raw:: html -.. :file: /project/data/VerticaPy/docs/figures/ug_fs_complex_materialize_flat.html +.. 
raw:: html + :file: /project/data/VerticaPy/docs/figures/ug_fs_complex_materialize_flat.html To check for a flex table, we can use the following function: @@ -469,8 +445,7 @@ To check for a flex table, we can use the following function: isflextable(table_name = "laliga_verticapy_test_json", schema = "complex_vmap_test") -We can then manually materialize the flextable using the -convenient :py:func:`vDataFrame.to_db` method: +We can then manually materialize the flextable using the convenient :py:func:`~vDataFrame.to_db` method: .. ipython:: python @@ -478,17 +453,14 @@ convenient :py:func:`vDataFrame.to_db` method: vp.drop("complex_vmap_test.laliga_to_db") data.to_db("complex_vmap_test.laliga_to_db"); -Once we have stored the database, we can easily create -a :py:func:`vDataFrame` of the relation: - +Once we have stored the database, we can easily create a :py:func:`~verticapy.vDataFrame` of the relation: .. ipython:: python data_new = vp.vDataFrame("complex_vmap_test.laliga_to_db") - Transformations ------------------ +---------------- First, we load the dataset. @@ -510,12 +482,11 @@ First, we load the dataset. .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_fs_complex_cities.html -Once we have data in the form of :py:func:`vDataFrame`, -we can readily convert it to a ``JSON`` file: +Once we have data in the form of :py:func:`~verticapy.vDataFrame`, we can readily convert it to a ``JSON`` file: .. ipython:: python - data.to_json(path = "amazon_json.json"); + data.to_json(path = "amazon_json.json") Now we can load the new JSON file and see the contents: @@ -540,8 +511,7 @@ Let's look at the begining portion of the string: json_str[0:100] -We can edit a portion of the string and save it again. -We'll change the name of the first State from ACRE to XXXX: +We can edit a portion of the string and save it again. We'll change the name of the first State from ACRE to XXXX: .. ipython:: python @@ -551,7 +521,6 @@ Now we can save this edited strings file: .. 
ipython:: python - out_file = open(path + "amazon_edited.json", "w") out_file.write(json_str) out_file.close() @@ -559,7 +528,8 @@ Now we can save this edited strings file: If we look at the new file, we can see the updated changes: .. ipython:: python - + + @suppress vp.drop("complex_vmap_test.amazon_edit") data = vp.read_json( path = path + "amazon_edited.json", diff --git a/docs/source/user_guide_full_stack_geopandas.rst b/docs/source/user_guide_full_stack_geopandas.rst index 700d36c15..cebca80df 100644 --- a/docs/source/user_guide_full_stack_geopandas.rst +++ b/docs/source/user_guide_full_stack_geopandas.rst @@ -4,7 +4,7 @@ Integrating with GeoPandas =========================== -As of version 0.4.0, VerticaPy features GeoPandas integration. This allows you to easily export a :py:mod:`vDataFrame` as a GeoPandas DataFrame, giving you more control over geospatial data. +As of version 0.4.0, VerticaPy features GeoPandas integration. This allows you to easily export a :py:func:`~verticapy.vDataFrame` as a GeoPandas DataFrame, giving you more control over geospatial data. This example demonstrates the advantages of GeoPandas integration with the 'world' dataset. diff --git a/docs/source/user_guide_full_stack_linear_regression.rst b/docs/source/user_guide_full_stack_linear_regression.rst index 2235130bc..853ee128f 100644 --- a/docs/source/user_guide_full_stack_linear_regression.rst +++ b/docs/source/user_guide_full_stack_linear_regression.rst @@ -438,7 +438,7 @@ Example with decomposition Let's look at the same dataset, but use decomposition techniques to filter out unimportant information. We don't have to normalize our data or look at correlations with these types of methods. -We'll begin by repeating the data preparation process of the previous section and export the resulting :py:mod:`vDataFrame` to Vertica. +We'll begin by repeating the data preparation process of the previous section and export the resulting :py:func:`~verticapy.vDataFrame` to Vertica. .. 
code-block:: ipython diff --git a/docs/source/user_guide_full_stack_vdataframe_magic.rst b/docs/source/user_guide_full_stack_vdataframe_magic.rst index bb180d88d..0d642ac42 100644 --- a/docs/source/user_guide_full_stack_vdataframe_magic.rst +++ b/docs/source/user_guide_full_stack_vdataframe_magic.rst @@ -4,7 +4,7 @@ The 'Magic' Methods of the vDataFrame ====================================== -VerticaPy 0.3.2 introduces the 'Magic' methods, which offer some additional flexilibility for mathematical operations in the :py:mod:`vDataFrame`. These methods let you handle many operations in a 'pandas-like' or Pythonic style. +VerticaPy 0.3.2 introduces the 'Magic' methods, which offer some additional flexibility for mathematical operations in the :py:func:`~verticapy.vDataFrame`. These methods let you handle many operations in a 'pandas-like' or Pythonic style. .. code-block:: ipython @@ -246,7 +246,7 @@ Not Equal Operator (!=) 'Pythonic' Invokation of Vertica Functions ------------------------------------------- -You can easily apply Vertica functions to your :py:mod:`vDataFrame`. Here, we use Vertica's COALESCE function to impute the 'age' of the passengers in our dataset. +You can easily apply Vertica functions to your :py:func:`~verticapy.vDataFrame`. Here, we use Vertica's COALESCE function to impute the 'age' of the passengers in our dataset. .. code-block:: ipython @@ -264,7 +264,7 @@ You can easily apply Vertica functions to your :py:mod:`vDataFrame`. Here, we us Slicing the vDataFrame ----------------------- -You can now slice the :py:mod:`vDataFrame` with indexing operators. +You can now slice the :py:func:`~verticapy.vDataFrame` with indexing operators. ..
code-block:: ipython diff --git a/docs/source/user_guide_introduction_best_practices.rst b/docs/source/user_guide_introduction_best_practices.rst index 66cb6eb35..c8bf8a08d 100644 --- a/docs/source/user_guide_introduction_best_practices.rst +++ b/docs/source/user_guide_introduction_best_practices.rst @@ -10,9 +10,9 @@ Restrict objects and operations to essential columns As VerticaPy is effectively an abstraction of SQL, any database-level optimizations you make in your Vertica database carry over to VerticaPy. In Vertica, optimization is centered on projections, which are collections of table columns—from one or more tables—stored on disk in a format that optimizes query execution. When you write queries in terms of the original tables, the query uses the projections to return query results. For details about creating and designing projections, see the Projections section in the Vertica documentation. -Projections are created and managed in the Vertica database, but you can leverage the power of projections in VerticaPy with features such as the :py:mod:`vDataFrame`'s usecols parameter, which specifies the columns from the input relation to include in the :py:mod:`vDataFrame`. As columnar databases perform better when there are fewer columns in the query, especially when you are working with large datasets, limiting :py:mod:`vDataFrame` and operations to essential columns can lead to a significant performance improvement. By default, most :py:mod:`vDataFrame` methods use all numerical columns in the :py:mod:`vDataFrame`, but you can restrict the operation to specific columns. +Projections are created and managed in the Vertica database, but you can leverage the power of projections in VerticaPy with features such as the :py:func:`~verticapy.vDataFrame`'s usecols parameter, which specifies the columns from the input relation to include in the :py:func:`~verticapy.vDataFrame`. 
As columnar databases perform better when there are fewer columns in the query, especially when you are working with large datasets, limiting :py:func:`~verticapy.vDataFrame` and operations to essential columns can lead to a significant performance improvement. By default, most :py:func:`~verticapy.vDataFrame` methods use all numerical columns in the :py:func:`~verticapy.vDataFrame`, but you can restrict the operation to specific columns. -In the following examples, we'll demonstrate how to create a `vDataFrame` from specific columns in the input relation, and then run methods on that :py:mod:`vDataFrame`. First, load the titanic dataset into Vertica using the :py:func:`~verticapy.datasets.load_titanic` function: +In the following examples, we'll demonstrate how to create a `vDataFrame` from specific columns in the input relation, and then run methods on that :py:func:`~verticapy.vDataFrame`. First, load the titanic dataset into Vertica using the :py:func:`~verticapy.datasets.load_titanic` function: .. code-block:: python @@ -78,7 +78,7 @@ To turn off the SQL code generation option: # Turning off SQL. vp.set_option("sql_on", False) -To restrict the operation to specific columns in the :py:mod:`vDataFrame`, provide the column names in the `columns` parameter: +To restrict the operation to specific columns in the :py:func:`~verticapy.vDataFrame`, provide the column names in the `columns` parameter: .. code-block:: python @@ -105,7 +105,7 @@ Instead of specifying essential columns to include, some methods allow you to li .. note:: - To list all columns in a :py:mod:`vDataFrame`, including non-numerical columns, use the :py:func:`~verticapy.vDataFrame.get_columns` method. + To list all columns in a :py:func:`~verticapy.vDataFrame`, including non-numerical columns, use the :py:func:`~verticapy.vDataFrame.get_columns` method. 
You can then use this truncated list of columns in another method call; for instance, to compute a correlation matrix: @@ -126,12 +126,12 @@ You can then use this truncated list of columns in another method call; for inst Save the current relation -------------------------- -The :py:mod:`vDataFrame` works like a `view`, a stored query that encapsulates one or more SELECT statements. +The :py:func:`~verticapy.vDataFrame` works like a `view`, a stored query that encapsulates one or more SELECT statements. If the generated relation uses many different functions, the computation time for each method call is greatly increased. Small transformations don't drastically slow down computation, but heavy transformations (multiple joins, frequent use of advanced analytical funcions, moving windows, etc.) can result in noticeable slowdown. When performing computationally expensive operations, you can aid performance by saving the vDataFrame structure as a table in the Vertica database. We will demonstrate this process in the following example. -First, create a :py:mod:`vDataFrame`, then perform some operations on that :py:mod:`vDataFrame`: +First, create a :py:func:`~verticapy.vDataFrame`, then perform some operations on that :py:func:`~verticapy.vDataFrame`: .. code-block:: python @@ -162,7 +162,7 @@ To understand how Vertica executes the different aggregations in the above relat Looking at the plan and its associated relation, it's clear that the transformations we applied to the vDataFrame result in a complicated relation. -Each method call to the :py:mod:`vDataFrame` must use this relation for computation. +Each method call to the :py:func:`~verticapy.vDataFrame` must use this relation for computation. .. note:: @@ -406,13 +406,13 @@ To monitor how VerticaPy is computing the aggregations, use the :py:func:`~verti VerticaPy allows you to send multiple queries, either iteratively or concurrently, to the database when computing aggregations. 
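The block-splitting idea behind `ncols_block` can be sketched in a few lines of plain Python. This is a simplified illustration of the concept only, not VerticaPy's internal implementation; the column names and the `public.expedia` table reference are placeholders:

```python
def column_blocks(columns, ncols_block):
    """Split a column list into consecutive blocks of at most ncols_block columns."""
    return [columns[i:i + ncols_block] for i in range(0, len(columns), ncols_block)]

# 20 numeric columns, as in the expedia example
columns = [f"col_{i}" for i in range(20)]

# ncols_block = 5 turns one wide query into 4 smaller AVG queries
queries = [
    "SELECT " + ", ".join(f"AVG({c})" for c in block) + " FROM public.expedia;"
    for block in column_blocks(columns, 5)
]
print(len(queries))  # 4
```

Each smaller query touches fewer columns, which lowers the peak load on the database at the cost of extra round trips.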
-First, let's send a single query to compute the average for all columns in the :py:mod:`vDataFrame`: +First, let's send a single query to compute the average for all columns in the :py:func:`~verticapy.vDataFrame`: .. ipython:: python display(vdf.avg(ncols_block = 20)) -We see that there was one SELECT query for all columns in the :py:mod:`vDataFrame`. +We see that there was one SELECT query for all columns in the :py:func:`~verticapy.vDataFrame`. You can reduce the impact on the system by using the `ncols_block` parameter to split the computation into multiple iterative queries, where the value of the parameter is the number of columns included in each query. For example, setting `ncols_block` to 5 will split the computation, which consists of 20 total columns, into 4 separate queries, each of which computes the average for 5 columns: diff --git a/docs/source/user_guide_introduction_vdf.rst index ba5a260ae..284484b08 100644 --- a/docs/source/user_guide_introduction_vdf.rst +++ b/docs/source/user_guide_introduction_vdf.rst @@ -3,37 +3,17 @@ The Virtual DataFrame ===================== +The Virtual DataFrame (vDataFrame) is the core object of the VerticaPy library. Leveraging the power of Vertica and the flexibility of Python, the :py:func:`~verticapy.vDataFrame` is a Python object that lets you manipulate the data representation in a Vertica database without modifying the underlying data. The data represented by a :py:func:`~verticapy.vDataFrame` remains in the Vertica database, bypassing the limitations of working memory. When a :py:func:`~verticapy.vDataFrame` is created or altered, VerticaPy formulates the operation as an SQL query and pushes the computation to the Vertica database, harnessing Vertica's massive parallel processing and in-built functions. Vertica then aggregates and returns the result to VerticaPy. In essence, vDataFrames behave similarly to `views `_ in the Vertica database. 
+For more information about Vertica's performance advantages, including its columnar orientation and parallelization across +nodes, see the `Vertica documentation `_. -The Virtual DataFrame (vDataFrame) is the core object of the -VerticaPy library. Leveraging the power of Vertica and the -flexibility of Python, the :py:func:`verticapy.vDataFrame` is a Python object that -lets you manipulate the data representation in a Vertica -database without modifying the underlying data. The data -represented by a :py:func:`verticapy.vDataFrame` remains in the Vertica database, -bypassing the limitations of working memory. When a :py:func:`verticapy.vDataFrame` -is created or altered, VerticaPy formulates the operation as -an SQL query and pushes the computation to the Vertica database, -harnessing Vertica's massive parallel processing and in-built -functions. Vertica then aggregates and returns the result to -VerticaPy. In essence, vDataFrames behave similar to -`views `_ -in the Vertica database. - -For more information about Vertica's performance advantages, -including its columnar orientation and parallelization across -nodes, see the -`Vertica documentation `_. - -In the following tutorial, we will introduce the basic -functionality of the :py:func:`verticapy.vDataFrame` and then explore the ways -in which they utilize in-database processing to enhance performance. - +In the following tutorial, we will introduce the basic functionality of the :py:func:`~verticapy.vDataFrame` and then explore the ways in which they utilize in-database processing to enhance performance. Creating vDataFrames --------------------- -First, run the :py:func:`verticapy.datasets.load_titanic` function to ingest into +First, run the :py:func:`~verticapy.datasets.load_titanic` function to ingest into Vertica a dataset with information about titanic passengers: .. code-block:: python @@ -54,12 +34,9 @@ Vertica a dataset with information about titanic passengers: .. 
raw:: html :file: /project/data/VerticaPy/docs/figures/user_guide_introduction_best_practices_laod_titanic.html +You can create a :py:func:`~verticapy.vDataFrame` from either an existing relation or a customized relation. -You can create a :py:func:`verticapy.vDataFrame` from either an existing -relation or a customized relation. - -To create a :py:func:`verticapy.vDataFrame` using an existing relation, in this -case the Titanic dataset, provide the name of the dataset: +To create a :py:func:`~verticapy.vDataFrame` using an existing relation, in this case the Titanic dataset, provide the name of the dataset: .. code-block:: python @@ -67,10 +44,7 @@ case the Titanic dataset, provide the name of the dataset: vp.vDataFrame("public.titanic") - -To create a :py:func:`verticapy.vDataFrame` using a customized relation, -specify the SQL query for that relation as the argument: - +To create a :py:func:`~verticapy.vDataFrame` using a customized relation, specify the SQL query for that relation as the argument: .. code-block:: python @@ -88,18 +62,14 @@ specify the SQL query for that relation as the argument: .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_intro_vdf_1.html - -For more examples of creating vDataFrames, see vDataFrame. +For more examples of creating vDataFrames, see :py:func:`~verticapy.vDataFrame`. In-memory vs. in-database ---------------------------- +-------------------------- -The following examples demonstrate the performance advantages of -loading and processing data in-database versus in-memory. +The following examples demonstrate the performance advantages of loading and processing data in-database versus in-memory. -First, we download the -`Expedia dataset `_ - from Kaggle and then load it into Vertica: +First, we download the `Expedia dataset `_ from Kaggle and then load it into Vertica: .. 
note:: @@ -110,7 +80,7 @@ First, we download the vp.read_csv("expedia.csv", schema = "public", parse_nrows = 20000000) -Once the data is loaded into the Vertica database, we can create a :py:func:`verticapy.vDataFrame` using the relation that contains the Expedia dataset: +Once the data is loaded into the Vertica database, we can create a :py:func:`~verticapy.vDataFrame` using the relation that contains the Expedia dataset: .. ipython:: python @@ -121,19 +91,15 @@ Once the data is loaded into the Vertica database, we can create a :py:func:`ver vp.read_csv( "/project/data/VerticaPy/docs/source/_static/website/examples/data/booking/expedia.csv", schema = "public", - parse_nrows = 20000000 + parse_nrows = 20000000, ) start_time = time.time() expedia = vp.vDataFrame("public.expedia") print("elapsed time = {}".format(time.time() - start_time)) +The :py:func:`~verticapy.vDataFrame` was created in about a second. All the data—about 4GB—is stored in Vertica, requiring no in-memory data loading. -The :py:func:`verticapy.vDataFrame` was created in about a second. -All the data—about 4GB—is stored in Vertica, -requiring no in-memory data loading. - -Now, to compare the above result with in-memory -loading, we load about half the dataset into pandas: +Now, to compare the above result with in-memory loading, we load about half the dataset into pandas: .. note:: @@ -141,7 +107,6 @@ loading, we load about half the dataset into pandas: avoid running the following code if your computer has less than 2GB of memory. - .. code-block:: python import pandas as pd @@ -154,7 +119,23 @@ loading, we load about half the dataset into pandas: elapsed_time = time.time() - start_time L_time.append(elapsed_time) +.. 
code-block:: python + + import pandas as pd + + L_nrows = [10000, 100000, 149814] + L_time = [] + for nrows in L_nrows: + start_time = time.time() + expedia_df = pd.read_csv( + "expedia.csv", + nrows = nrows, + ) + elapsed_time = time.time() - start_time + L_time.append(elapsed_time) + .. ipython:: python + :suppress: import pandas as pd @@ -164,25 +145,17 @@ loading, we load about half the dataset into pandas: start_time = time.time() expedia_df = pd.read_csv( "/project/data/VerticaPy/docs/source/_static/website/examples/data/booking/expedia.csv", - nrows = nrows + nrows = nrows, ) elapsed_time = time.time() - start_time L_time.append(elapsed_time) - .. ipython:: python for i in range(len(L_time)): print("nrows = {}; elapsed time = {}".format(L_nrows[i], L_time[i])) - -It took an order of magnitude more to load into memory -compared with the time required to create the -vDataFrame. Loading data into -pandas is quite fast when the data volume is low -(less than some MB), but as the size of the dataset increases, -the load time can become exponentially more expensive, as seen -in the following plot: +It took an order of magnitude more to load into memory compared with the time required to create the :py:func:`~verticapy.vDataFrame`. Loading data into pandas is quite fast when the data volume is low (less than some MB), but as the size of the dataset increases, the load time can become exponentially more expensive, as seen in the following plot: .. ipython:: python @@ -192,10 +165,7 @@ in the following plot: @savefig ug_intro_vdf_plot_2 plt.show() -Even after the data is loaded into memory, the -performance is very slow. The following example -removes non-numeric columns from the dataset, then -computes a correlation matrix: +Even after the data is loaded into memory, the performance is very slow. The following example removes non-numeric columns from the dataset, then computes a correlation matrix: .. 
ipython:: python @@ -205,10 +175,7 @@ computes a correlation matrix: expedia_df.corr(); print(f"elapsed time = {time.time() - start_time}") - -Let's compare the performance in-database using a -vDataFrame to compute the correlation matrix of -the entire dataset: +Let's compare the performance in-database using a :py:func:`~verticapy.vDataFrame` to compute the correlation matrix of the entire dataset: .. ipython:: python @@ -218,16 +185,11 @@ the entire dataset: expedia.corr(show = False); print(f"elapsed time = {time.time() - start_time}") - -VerticaPy also caches the computed aggregations. -With this cache available, we can repeat the -correlation matrix computation almost instantaneously: +VerticaPy also caches the computed aggregations. With this cache available, we can repeat the correlation matrix computation almost instantaneously: .. note:: - If necessary, you can deactivate the cache by calling - the :py:func:`verticapy.set_option` function with - the `cache` parameter set to False. + If necessary, you can deactivate the cache by calling the :py:func:`~verticapy.set_option` function with the `cache` parameter set to False. .. ipython:: python @@ -235,14 +197,12 @@ correlation matrix computation almost instantaneously: expedia.corr(show = False); print(f"elapsed time = {time.time() - start_time}") - Memory usage +++++++++++++ Now, we will examine how the memory usage compares between in-memory and in-database. -First, use the pandas `info()` method to -explore the DataFrame's memory usage: +First, use the pandas `info()` method to explore the DataFrame's memory usage: .. ipython:: python @@ -265,39 +225,23 @@ Compare this with vDataFrame: .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_intro_vdf_mem.html +The :py:func:`~verticapy.vDataFrame` only uses about 37KB! By storing the data in the Vertica database, and only recording the +user's data modifications in memory, the memory usage is reduced to a minimum. 
-The :py:func:`verticapy.vDataFrame` only uses about 37KB! By storing the data -in the Vertica database, and only recording the -user's data modifications in memory, the memory -usage is reduced to a minimum. +With VerticaPy, we can take advantage of Vertica's structure and scalability, providing fast queries without ever loading the data into memory. In the above examples, we've seen that in-memory processing is much more expensive in both computation and memory usage. This often leads to the decision to downsample the data, which sacrifices the possibility of further data insights. -With VerticaPy, we can take advantage of Vertica's -structure and scalability, providing fast queries -without ever loading the data into memory. In the -above examples, we've seen that in-memory processing -is much more expensive in both computation and memory -usage. This often leads to the decesion to downsample -the data, which sacrfices the possibility of further -data insights. +The :py:func:`~verticapy.vDataFrame` structure +---------------------------------------------- -The :py:func:`verticapy.vDataFrame` structure -------------------------- +Now that we've seen the performance and memory benefits of the vDataFrame, let's dig into some of the underlying structures and methods that produce these great results. -Now that we've seen the performance and memory -benefits of the vDataFrame, let's dig into some -of the underlying structures and methods that -produce these great results. -vDataFrames are composed of columns called -vDataColumns. To view all vDataColumns in a -vDataFrame, use the :py:func:`verticapy.get_columns` method: +vDataFrames are composed of columns called vDataColumns. To view all vDataColumns in a :py:func:`~verticapy.vDataFrame`, use the :py:func:`~verticapy.vDataFrame.get_columns` method: .. 
ipython:: python expedia.get_columns() -To access a :py:func:`verticapy.vDataColumn`, -specify the column name in square brackets, for example: +To access a :py:func:`~verticapy.vDataColumn`, specify the column name in square brackets, for example: .. note:: @@ -318,21 +262,16 @@ specify the column name in square brackets, for example: .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_intro_vdf_describe.html -Each vDataColumn has its own catalog to save user -modifications. In the previous example, we computed -some aggregations for the ``is_booking`` column. Let's -look at the catalog for that vDataColumn: +Each :py:func:`~verticapy.vDataColumn` has its own catalog to save user modifications. In the previous example, we computed +some aggregations for the ``is_booking`` column. Let's look at the catalog for that :py:func:`~verticapy.vDataColumn`: .. ipython:: python expedia["is_booking"]._catalog -The catalog is updated whenever major changes are -made to the data. +The catalog is updated whenever major changes are made to the data. -We can also view the vDataFrame's backend SQL code -generation by setting the ``sql_on`` parameter to -True with the :py:func:`verticapy.set_option` function: +We can also view the vDataFrame's backend SQL code generation by setting the `sql_on` parameter to ``True`` with the :py:func:`~verticapy.set_option` function: .. code-block:: python @@ -402,7 +341,6 @@ True with the :py:func:`verticapy.set_option` function: FROM "public"."expedia" ) VERTICAPY_SUBTABLE; - .. ipython:: python :suppress: @@ -414,22 +352,16 @@ True with the :py:func:`verticapy.set_option` function: .. 
raw:: html :file: /project/data/VerticaPy/docs/figures/ug_intro_vdf_describe_cnt.html -To control whether each query outputs its elasped time, -use the ``time_on`` parameter of the :py:func:`verticapy.set_option` -function: +To control whether each query outputs its elapsed time, use the ``time_on`` parameter of the :py:func:`~verticapy.set_option` function: .. ipython:: python vp.set_option("sql_on", False) - expedia = vp.vDataFrame("public.expedia") # creating a new :py:func:`verticapy.vDataFrame` to delete the catalog + expedia = vp.vDataFrame("public.expedia") # creating a new vDataFrame to delete the catalog vp.set_option("time_on", True) expedia.corr() -The aggregation's for each vDataColumn are saved to its catalog. -If we again call the :py:func:`verticapy.vDataFrame.corr` method, it'll complete in a -couple seconds—the time needed to draw the graphic—because -the aggregations have already been computed and saved during -the last call: +The aggregations for each vDataColumn are saved to its catalog. If we again call the :py:func:`~verticapy.vDataFrame.corr` method, it'll complete in a couple seconds—the time needed to draw the graphic—because the aggregations have already been computed and saved during the last call: .. ipython:: python @@ -444,19 +376,13 @@ To turn off the elapsed time and the SQL code generation options: vp.set_option("sql_on", False) vp.set_option("time_on", False) -You can obtain the current :py:func:`verticapy.vDataFrame` relation -with the :py:func:`verticapy.vDataFrame.current_relation` -method: +You can obtain the current :py:func:`~verticapy.vDataFrame` relation with the :py:func:`~verticapy.vDataFrame.current_relation` method: .. ipython:: python print(expedia.current_relation()) -The generated SQL for the relation changes according to the user's -modifications. 
For example, if we impute the missing values of the -``orig_destination_distance`` vDataColumn by its average and then -drop the ``is_package`` vDataColumn, these changes are reflected -in the relation: +The generated SQL for the relation changes according to the user's modifications. For example, if we impute the missing values of the ``orig_destination_distance`` vDataColumn by its average and then drop the ``is_package`` vDataColumn, these changes are reflected in the relation: .. ipython:: python @@ -464,16 +390,12 @@ in the relation: expedia["is_package"].drop(); print(expedia.current_relation()) -Notice that the ``is_package`` column has been removed -from the ``SELECT`` statement and the -``orig_destination_distance`` is now using a ``COALESCE SQL`` function. +Notice that the ``is_package`` column has been removed from the ``SELECT`` statement and the ``orig_destination_distance`` is now using a ``COALESCE`` SQL function. vDataFrame attributes and management ------------------------------------- -The :py:func:`verticapy.vDataFrame` has many attributes and methods, some of -which were demonstrated in the above examples. -vDataFrames have two types of attributes: +The :py:func:`~verticapy.vDataFrame` has many attributes and methods, some of which were demonstrated in the above examples. vDataFrames have two types of attributes: - Virtual Columns (vDataColumn) - Main attributes (columns, main_relation ...) @@ -487,29 +409,24 @@ The vDataFrame's main attributes are stored in the ``_vars`` dictionary: expedia._vars Data types ------------- +----------- +vDataFrames use the data types of their vDataColumns. The behavior of some :py:func:`~verticapy.vDataFrame` methods depends on the data type of the columns. 
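The type-to-category mapping described in this section can be sketched as a small helper. This is an illustration of the idea only; the cardinality threshold and the type-name matching are assumptions, not VerticaPy's actual internals:

```python
def vdc_category(ctype, cardinality, max_cardinality=6):
    """Rough sketch: map a column's SQL type and cardinality to a category."""
    ctype = ctype.lower()
    if ctype.startswith(("date", "timestamp")):
        return "date"
    if ctype.startswith("int"):
        # low-cardinality integers are treated like categories
        return "categorical" if cardinality <= max_cardinality else "numeric"
    if ctype.startswith(("float", "numeric")):
        return "numeric"
    # varchar and anything else defaults to categorical
    return "categorical"

print(vdc_category("int", 3))       # categorical
print(vdc_category("int", 1000))    # numeric
print(vdc_category("varchar", 50))  # categorical
```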
-For example, computing a histogram for a numerical data -type is not the same as computing a histogram for a categorical data type. +For example, computing a histogram for a numerical data type is not the same as computing a histogram for a categorical data type. -The :py:func:`verticapy.vDataFrame` identifies four main data types: +The :py:func:`~verticapy.vDataFrame` identifies four main data types: - ``int``: integers are treated like categorical data types when their cardinality is low; otherwise, they are considered numeric - - ``float``: numeric data types - - ``date``: date-like data types (including timestamp) - - ``text``: categorical data types Data types not included in the above list are automatically treated as categorical. You can examine the data types of -the vDataColumns in a :py:func:`verticapy.vDataFrame` using the -:py:func:`verticapy.vDataFrame.dtypes` method: +the vDataColumns in a :py:func:`~verticapy.vDataFrame` using the +:py:func:`~verticapy.vDataFrame.dtypes` method: .. code-block:: python @@ -526,17 +443,14 @@ the vDataColumns in a :py:func:`verticapy.vDataFrame` using the .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_intro_vdf_expedia_dtypes.html -To convert the data type of a vDataColumn, use -the :py:func:`verticapy.vDataColumn.astype` method: +To convert the data type of a vDataColumn, use the :py:func:`~verticapy.vDataColumn.astype` method: .. ipython:: python expedia["hotel_market"].astype("varchar"); expedia["hotel_market"].ctype() -To view the category of a specific vDataColumn, -specify the vDataColumn and use the -:py:func:`verticapy.vDataColumn.category` method: +To view the category of a specific :py:func:`~verticapy.vDataColumn`, specify the :py:func:`~verticapy.vDataColumn` and use the :py:func:`~verticapy.vDataColumn.category` method: .. 
ipython:: python @@ -545,10 +459,7 @@ specify the vDataColumn and use the Exporting, saving, and loading ------------------------------- -The :py:func:`verticapy.vDataFrame.save` and -:py:func:`verticapy.vDataFrame.load` functions -allow you to save and load vDataFrames: +The :py:func:`~verticapy.vDataFrame.save` and :py:func:`~verticapy.vDataFrame.load` functions allow you to save and load vDataFrames: .. code-block:: python @@ -567,23 +478,16 @@ allow you to save and load vDataFrames: .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_intro_vdf_expedia_filter.html -To return a :py:func:`verticapy.vDataFrame` to a previously saved structure, -use the :py:func:`verticapy.vDataFrame.load` function: +To return a :py:func:`~verticapy.vDataFrame` to a previously saved structure, use the :py:func:`~verticapy.vDataFrame.load` function: .. ipython:: python expedia = expedia.load(); print(expedia.shape()) -Because vDataFrames are views of data stored in the -connected Vertica database, any modifications made to -the :py:func:`verticapy.vDataFrame` are not reflected in the underlying -data in the database. To save a vDataFrame's relation -to the database, use the :py:func:`verticapy.vDataFrame.to_db` -method. +Because vDataFrames are views of data stored in the connected Vertica database, any modifications made to the :py:func:`~verticapy.vDataFrame` are not reflected in the underlying data in the database. To save a :py:func:`~verticapy.vDataFrame`'s relation to the database, use the :py:func:`~verticapy.vDataFrame.to_db` method. -It's good practice to examine the expected disk usage of -the :py:func:`verticapy.vDataFrame` before exporting it to the database: +It's good practice to examine the expected disk usage of the :py:func:`~verticapy.vDataFrame` before exporting it to the database: .. code-block:: python @@ -600,12 +504,11 @@ the :py:func:`verticapy.vDataFrame` before exporting it to the database: .. 
raw:: html :file: /project/data/VerticaPy/docs/figures/ug_intro_vdf_expedia_storage_gb.html -If you decide that there is sufficient space to store the -vDataFrame in the database, run the :py:func:`verticapy.vDataFrame.to_db` method: +If you decide that there is sufficient space to store the :py:func:`~verticapy.vDataFrame` in the database, run the :py:func:`~verticapy.vDataFrame.to_db` method: .. code-block:: python expedia.to_db( "public.expedia_clean", - relation_type = "table" + relation_type = "table", ) \ No newline at end of file diff --git a/docs/source/user_guide_performance_qprof.rst b/docs/source/user_guide_performance_qprof.rst index 004199250..aca4e2679 100644 --- a/docs/source/user_guide_performance_qprof.rst +++ b/docs/source/user_guide_performance_qprof.rst @@ -4,237 +4,223 @@ Getting started with Query Profiler ==================================== +This starter notebook will help you get up and running with the Query Profiler (:py:func:`~verticapy.performance.vertica.QueryProfiler`) tool in VerticaPyLab and demonstrate functionality through various examples and use cases. +This tool is a work in progress and the VerticaPy team is continuously adding new features. -This starter notebook will help you get up and running with the Query -Profiler (QProf) tool in VerticaPyLab and demonstrate functionality -through various examples and use cases. - -This tool is a work in progress and the VerticaPy team -is continuously adding new features. - -See also :py:func:`verticapy.performance.vertica.QueryProfiler`, -:py:func:`verticapy.performance.vertica.QueryProfilerInterface`, -:py:func:`verticapy.performance.vertica.QueryProfilerComparison`. +See also :py:func:`~verticapy.performance.vertica.QueryProfiler`, +:py:func:`~verticapy.performance.vertica.QueryProfilerInterface`, +:py:func:`~verticapy.performance.vertica.QueryProfilerComparison`. VerticaPyLab ------------- -The easiest way to use the QProf tool is through VerticaPyLab. 
-For installation instructions, see :ref:`getting_started`. - -Before using the QProf tool, confirm that you are connected -to a Vertica database. If not, follow the connection instructions -in a Jupyter notebook or connect using the Connect option on the -VerticaPyLab homepage. +The easiest way to use the :py:func:`~verticapy.performance.vertica.QueryProfiler` tool is through VerticaPyLab. For installation instructions, see :ref:`getting_started`. +Before using the :py:func:`~verticapy.performance.vertica.QueryProfiler` tool, confirm that you are connected to a Vertica database. If not, follow the connection instructions in a Jupyter notebook or connect using the Connect option on the VerticaPyLab homepage. QueryProfiler -------------- -The QueryProfiler object is a python object that includes many -built-in methods for analyzing queries and their performance. -There are a few different ways to create a QProf object. - +The :py:func:`~verticapy.performance.vertica.QueryProfiler` object is a Python object that includes many built-in methods for analyzing queries and their performance. There are a few different ways to create a :py:func:`~verticapy.performance.vertica.QueryProfiler` object. Create and save a QueryProfiler object -++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++ First, import the verticapy package and load the datasets: .. 
ipython:: python - import verticapy as vp - from verticapy.datasets import load_titanic, load_amazon - # load datasets - titanic = load_titanic() - amazon = load_amazon() + import verticapy as vp + from verticapy.datasets import load_titanic, load_amazon + # load datasets + titanic = load_titanic() + amazon = load_amazon() -Create QProf object from transaction id and statement id +Create :py:func:`~verticapy.performance.vertica.QueryProfiler` object from transaction id and statement id +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -Before creating the QProf object, take a look at the data -used by the query: +Before creating the :py:func:`~verticapy.performance.vertica.QueryProfiler` object, take a look at the data used by the query: .. code-block:: python - display(amazon) + amazon.head(100) .. ipython:: python - :suppress: + :suppress: - res = amazon - html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_amazon.html", "w") - html_file.write(res._repr_html_()) - html_file.close() + res = amazon.head(100) + html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_amazon.html", "w") + html_file.write(res._repr_html_()) + html_file.close() .. raw:: html - :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_amazon.html - + :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_amazon.html .. code-block:: python - display(titanic) + titanic.head(100) .. ipython:: python - :suppress: + :suppress: - res = titanic - html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_titanic.html", "w") - html_file.write(res._repr_html_()) - html_file.close() + res = titanic.head(100) + html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_titanic.html", "w") + html_file.write(res._repr_html_()) + html_file.close() .. 
raw:: html - :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_titanic.html + :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_titanic.html +We can now run some queries to create a :py:func:`~verticapy.performance.vertica.QueryProfiler` object. One way to do so is by using a query's `statement_id` and `transaction_id`. -We can now run some queries to create a QProf object. -One way to do so is by using the queries statement id -and transaction id. - -To allow for SQL execution in Jupyter cells, -load the sql extension: +To allow for SQL execution in Jupyter cells, load the SQL extension: .. ipython:: python - %load_ext verticapy.sql + %load_ext verticapy.sql Next, let us run the queries: .. code-block:: python - %%sql - SELECT date, MONTH(date) as month, AVG(number) as avg_number_test from public.amazon group by date order by avg_number_test desc; + %%sql + SELECT + date, + MONTH(date) as month, + AVG(number) as avg_number_test + FROM public.amazon + GROUP BY date + ORDER BY avg_number_test DESC; .. ipython:: python - :suppress: + :suppress: - query = """ - SELECT date, MONTH(date) as month, AVG(number) as avg_number_test from public.amazon group by date order by avg_number_test desc; - """ - res = vp.vDataFrame(query) - query_1 = query - html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_sql.html", "w") - html_file.write(res._repr_html_()) - html_file.close() + query = """ + SELECT + date, + MONTH(date) as month, + AVG(number) as avg_number_test + FROM public.amazon + GROUP BY date + ORDER BY avg_number_test DESC; + """ + res = vp.vDataFrame(query) + query_1 = query + html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_sql.html", "w") + html_file.write(res._repr_html_()) + html_file.close() .. 
raw:: html - :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_sql.html - + :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_sql.html .. code-block:: python - %%sql - SELECT - a.date, - MONTH(a.date) AS month, - AVG(a.number) AS avg_number_test, - b.max_number - FROM - public.amazon AS a - JOIN ( - SELECT - date, - MAX(number) AS max_number - FROM - public.amazon - GROUP BY - date - ) AS b - ON - a.date = b.date - GROUP BY - a.date, b.max_number - ORDER BY - avg_number_test DESC; + %%sql + SELECT + a.date, + MONTH(a.date) AS month, + AVG(a.number) AS avg_number_test, + b.max_number + FROM + public.amazon AS a + JOIN ( + SELECT + date, + MAX(number) AS max_number + FROM + public.amazon + GROUP BY + date + ) AS b + ON + a.date = b.date + GROUP BY + a.date, b.max_number + ORDER BY + avg_number_test DESC; .. ipython:: python - :suppress: + :suppress: - query = """ - SELECT - a.date, - MONTH(a.date) AS month, - AVG(a.number) AS avg_number_test, - b.max_number - FROM - public.amazon AS a - JOIN ( - SELECT - date, - MAX(number) AS max_number - FROM - public.amazon - GROUP BY - date - ) AS b - ON - a.date = b.date - GROUP BY - a.date, b.max_number - ORDER BY - avg_number_test DESC; - """ - query_2 = query - res = vp.vDataFrame(query_2) - html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_sql_2.html", "w") - html_file.write(res._repr_html_()) - html_file.close() + query = """ + SELECT + a.date, + MONTH(a.date) AS month, + AVG(a.number) AS avg_number_test, + b.max_number + FROM + public.amazon AS a + JOIN ( + SELECT + date, + MAX(number) AS max_number + FROM + public.amazon + GROUP BY + date + ) AS b + ON + a.date = b.date + GROUP BY + a.date, b.max_number + ORDER BY + avg_number_test DESC; + """ + query_2 = query + res = vp.vDataFrame(query_2) + html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_sql_2.html", "w") + html_file.write(res._repr_html_()) + 
html_file.close() .. raw:: html - :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_sql_2.html + :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_sql_2.html -In order to create a QProf object from a query, -we need the queries statement_id and transaction_id, -both of which are found in the QUERY_REQUESTS system table: +In order to create a :py:func:`~verticapy.performance.vertica.QueryProfiler` object from a query, we need the query's `statement_id` and `transaction_id`, both of which are found in the ``QUERY_REQUESTS`` system table: .. code-block:: python - from verticapy.performance.vertica import QueryProfiler, QueryProfilerInterface + from verticapy.performance.vertica import QueryProfiler, QueryProfilerInterface - qprof = QueryProfiler((45035996273780927,76)) + qprof = QueryProfiler((45035996273780927,76)) -To create a QProf object w/ multiple queries, provide a list of tuples +To create a :py:func:`~verticapy.performance.vertica.QueryProfiler` object with multiple queries, provide a list of tuples: .. code-block:: python - qprof = QueryProfilerInterface([(45035996273780927,74), (45035996273780075,6)]) + qprof = QueryProfilerInterface([(45035996273780927,74), (45035996273780075,6)]) -Once the QProf object is created, you can run the get_queries() method to view the queries contained in the QProf object: +Once the :py:func:`~verticapy.performance.vertica.QueryProfiler` object is created, you can run the :py:func:`~verticapy.performance.vertica.QueryProfiler.get_queries` method to view the queries contained in the :py:func:`~verticapy.performance.vertica.QueryProfiler` object: .. code-block:: python - qprof.get_queries() + qprof.get_queries() .. 
ipython:: python - :suppress: - :okwarning: + :suppress: + :okwarning: - from verticapy.performance.vertica import QueryProfiler, QueryProfilerInterface - qprof = QueryProfilerInterface([query_1, query_2]) - res = qprof.get_queries() - html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_get_queries.html", "w") - html_file.write(res._repr_html_()) - html_file.close() + from verticapy.performance.vertica import QueryProfiler, QueryProfilerInterface + qprof = QueryProfilerInterface([query_1, query_2]) + res = qprof.get_queries() + html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_get_queries.html", "w") + html_file.write(res._repr_html_()) + html_file.close() .. raw:: html - :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_get_queries.html - - + :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_get_queries.html To visualize the query plan, run :py:func:`~verticapy.performance.vertica.QueryProfilerInterface.get_qplan_tree`, -which is customizable, allowing you to specify certain -metrics or focus on a specified tree path: - +which is customizable, allowing you to specify certain metrics or focus on a specified tree path: .. image:: ../../source/_static/website/user_guides/performance/user_guide_performance_qprof_get_qplan_tree.PNG :width: 80% :align: center -Create a QProf object directly from a query -++++++++++++++++++++++++++++++++++++++++++++ +Create a :py:func:`~verticapy.performance.vertica.QueryProfiler` object directly from a query +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -You can also create the QProf Object directly from an SQL Command: +You can also create the :py:func:`~verticapy.performance.vertica.QueryProfiler` object directly from an SQL command: .. 
code-block:: python @@ -248,52 +234,50 @@ You can also create the QProf Object directly from an SQL Command: Save the QueryProfiler object in a target schema +++++++++++++++++++++++++++++++++++++++++++++++++ -After you create a QProf object, you can save it to a target schema. +After you create a :py:func:`~verticapy.performance.vertica.QueryProfiler` object, you can save it to a target schema. + In this example, we will save the object to the ``sc_demo`` schema: .. ipython:: python vp.create_schema("sc_demo") -To save the QProf object, specify the ``target_schema`` and, optionally, -a ``key_id`` (it is a unique key which is used to search for the stored Qprof object) -when creating the QProf object: +To save the :py:func:`~verticapy.performance.vertica.QueryProfiler` object, specify the `target_schema` and, optionally, a `key_id` (a unique key used to search for the stored object) when creating the :py:func:`~verticapy.performance.vertica.QueryProfiler` object: .. code-block:: python # Save it to your schema qprof = QueryProfiler( (45035996273780927, 76), - target_schema='sc_demo', + target_schema = "sc_demo", key_id = "unique_xx1", - overwrite=True, + overwrite = True, ) -Load a QProf object -------------------- - +Load a :py:func:`~verticapy.performance.vertica.QueryProfiler` object +---------------------------------------------------------------------- -To load a previously saved QProf, simply provide its ``target_schema`` and ``key_id``: +To load a previously saved :py:func:`~verticapy.performance.vertica.QueryProfiler`, simply provide its `target_schema` and `key_id`: .. code-block:: python from verticapy.performance.vertica import QueryProfiler, QueryProfilerInterface - #Someone else can now connect to my DB and use the object. + + # Someone else can now connect to my DB and use the object. 
qprof = QueryProfiler( target_schema = "sc_demo", - key_id = "unique_xx1" + key_id = "unique_xx1", ) - Export and import ------------------ -You can export and import QProf objects as .tar files. +You can export and import :py:func:`~verticapy.performance.vertica.QueryProfiler` objects as .tar files. Export +++++++ -To export a QProf object, use the export_profile() method: +To export a :py:func:`~verticapy.performance.vertica.QueryProfiler` object, use the :py:func:`~verticapy.performance.vertica.QueryProfiler.export_profile` method: .. code-block:: python @@ -301,41 +285,35 @@ To export a QProf object, use the export_profile() method: .. note:: - There is also a shell script which helps you export - ``qprof`` data without python. See - `qprof_export `_. - + There is also a shell script which helps you export ``qprof`` data without Python. See `qprof_export `_. Import +++++++ -To import a QProf object, use the -:py:func:`verticapy.performance.vertica.QueryProfiler.import_profile` -method and provide the ``target_schema`` and ``key_id``. -Make sure the ``key_id`` is unique/unused. Let us create -a new schema to load this into: +To import a :py:func:`~verticapy.performance.vertica.QueryProfiler` object, use the :py:func:`~verticapy.performance.vertica.QueryProfiler.import_profile` method and provide the `target_schema` and `key_id`. + +Make sure the `key_id` is unique and not already in use. Let us create a new schema to load this into: .. code-block:: python vp.create_schema("sc_demo_1") qprof = QueryProfiler.import_profile( - target_schema="sc_demo_1", - key_id="unique_load_xx1", - filename="test_export_1.tar", - auto_initialize = True + target_schema = "sc_demo_1", + key_id = "unique_load_xx1", + filename = "test_export_1.tar", + auto_initialize = True, ) Methods & attributes --------------------- -The QProf object includes many useful methods and attributes -to aid in the analysis of query performence. 
+The :py:func:`~verticapy.performance.vertica.QueryProfiler` object includes many useful methods and attributes to aid in the analysis of query performance. Access performance tables ++++++++++++++++++++++++++ -With the QProf object, you can access any of the following tables: +With the :py:func:`~verticapy.performance.vertica.QueryProfiler` object, you can access any of the following tables: .. ipython:: python @@ -345,12 +323,12 @@ For example, view the ``QUERY_EVENTS`` table: .. code-block:: python - qprof.get_table('query_events') + qprof.get_table("query_events") .. ipython:: python :suppress: - res = qprof.get_table('query_events') + res = qprof.get_table("query_events") html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_query_events.html", "w") html_file.write(res._repr_html_()) html_file.close() @@ -365,39 +343,37 @@ Or the ``DC_EXPLAIN_PLANS`` table: qprof.get_table('dc_explain_plans') .. ipython:: python - :suppress: + :suppress: - res = qprof.get_table('dc_explain_plans') - html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_dc_explain_plans.html", "w") - html_file.write(res._repr_html_()) - html_file.close() + res = qprof.get_table('dc_explain_plans') + html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_dc_explain_plans.html", "w") + html_file.write(res._repr_html_()) + html_file.close() .. raw:: html - :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_dc_explain_plans.html - + :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_dc_explain_plans.html Or the ``QUERY_CONSUMPTION`` table: .. code-block:: python - qprof.get_table('query_consumption') + qprof.get_table("query_consumption") .. 
ipython:: python - :suppress: + :suppress: - res = qprof.get_table('query_consumption') - html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_query_consumption.html", "w") - html_file.write(res._repr_html_()) - html_file.close() + res = qprof.get_table("query_consumption") + html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_query_consumption.html", "w") + html_file.write(res._repr_html_()) + html_file.close() .. raw:: html - :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_query_consumption.html + :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_query_consumption.html Get query information ++++++++++++++++++++++ -You can retrieve the query information, such as -``transaction id`` and ``statement id``, from the QProf object: +You can retrieve the query information, such as `transaction_id` and `statement_id`, from the :py:func:`~verticapy.performance.vertica.QueryProfiler` object: .. ipython:: python :okwarning: @@ -409,7 +385,7 @@ You can retrieve the query information, such as """ ) -View the statement and transaction ids: +View the `transaction_id` and `statement_id`: .. ipython:: python @@ -442,7 +418,6 @@ View the number of query steps in a bar graph: .. raw:: html :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_bar.html - .. ipython:: python qprof.get_qplan() @@ -484,21 +459,20 @@ To view the cpu time of the query in a bar graph: .. 
raw:: html :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_cpu_bar.html -QProf execution report -+++++++++++++++++++++++ +:py:func:`~verticapy.performance.vertica.QueryProfiler` execution report ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -The QProf object can also generate a report that includes various performence metrics, -including which operation took the most amount of time: +The :py:func:`~verticapy.performance.vertica.QueryProfiler` object can also generate a report that includes various performance metrics, including which operation took the most time: .. code-block:: python - qprof.get_qexecution_report().sort({'exec_time_us':'desc'}) + qprof.get_qexecution_report().sort({"exec_time_us": "desc"}) .. ipython:: python :suppress: :okwarning: - res = qprof.get_qexecution_report().sort({'exec_time_us':'desc'}) + res = qprof.get_qexecution_report().sort({"exec_time_us": "desc"}) html_file = open("/project/data/VerticaPy/docs/figures/user_guides_performance_qprof_query_report.html", "w") html_file.write(res._repr_html_()) html_file.close() @@ -522,14 +496,10 @@ To view the query execution details: .. raw:: html :file: /project/data/VerticaPy/docs/figures/user_guides_performance_qprof_last.html +:py:func:`~verticapy.performance.vertica.QueryProfiler` Summary Report Export +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -QProf Summary Report Export -++++++++++++++++++++++++++++ - -You can also easily export the entire report in an HTML -format. This report can be read without having any -connection to database or a jupyter environment making -it very convenient to share and analyze offline. +You can also export the entire report in HTML format. This report can be read without a connection to the database or a Jupyter environment, making it convenient to share and analyze offline. .. code-block:: python