From d593c2a7d0fb073bcc4c9a4030aafb239e250b69 Mon Sep 17 00:00:00 2001 From: Badr Date: Wed, 23 Oct 2024 08:26:28 -0400 Subject: [PATCH] corrections 2 --- docs/source/examples.rst | 3 -- .../examples_business_africa_education.rst | 4 +- docs/source/examples_business_battery.rst | 18 ++------ docs/source/examples_business_booking.rst | 12 ++--- docs/source/examples_business_churn.rst | 2 +- .../examples_business_credit_card_fraud.rst | 10 ++-- docs/source/examples_business_football.rst | 6 +-- docs/source/examples_business_insurance.rst | 2 +- docs/source/examples_business_movies.rst | 10 ++-- .../source/examples_business_smart_meters.rst | 4 +- docs/source/examples_business_spam.rst | 4 +- docs/source/examples_business_spotify.rst | 14 +++--- docs/source/examples_learn_commodities.rst | 6 +-- docs/source/examples_learn_iris.rst | 8 ++-- docs/source/examples_learn_pokemon.rst | 2 +- docs/source/examples_learn_titanic.rst | 6 +-- docs/source/examples_understand_amazon.rst | 4 +- docs/source/examples_understand_covid19.rst | 9 ++-- docs/source/user_guide.rst | 5 -- .../user_guide_data_exploration_charts.rst | 6 +-- ...er_guide_data_exploration_correlations.rst | 6 +-- ...ata_exploration_descriptive_statistics.rst | 4 +- docs/source/user_guide_data_ingestion.rst | 2 +- ...r_guide_data_preparation_decomposition.rst | 6 +-- ...user_guide_data_preparation_duplicates.rst | 4 +- .../user_guide_data_preparation_encoding.rst | 6 +-- ..._data_preparation_features_engineering.rst | 8 ++-- .../user_guide_data_preparation_joins.rst | 2 +- ..._guide_data_preparation_missing_values.rst | 8 ++-- ...r_guide_data_preparation_normalization.rst | 8 ++-- .../user_guide_data_preparation_outliers.rst | 16 +++---- ...er_guide_full_stack_dblink_integration.rst | 5 +- .../user_guide_full_stack_geopandas.rst | 6 +-- ...ser_guide_full_stack_linear_regression.rst | 10 ++-- docs/source/user_guide_full_stack_to_json.rst | 46 ++++++------------- ...user_guide_full_stack_train_test_split.rst | 10 ++-- ...user_guide_full_stack_vdataframe_magic.rst | 6 +-- ...user_guide_introduction_best_practices.rst | 22 ++++----- .../user_guide_introduction_installation.rst | 10 ++-- ...er_guide_machine_learning_introduction.rst | 2 +- ..._guide_machine_learning_model_tracking.rst | 10 ++-- ...ser_guide_machine_learning_time_series.rst | 4 +- 42 files changed, 151 insertions(+), 185 deletions(-) diff --git a/docs/source/examples.rst b/docs/source/examples.rst index 947033450..9bc8810a0 100644 --- a/docs/source/examples.rst +++ b/docs/source/examples.rst @@ -1,12 +1,9 @@ .. _examples: - ============ Examples ============ - - .. grid:: 1 1 2 2 .. grid-item:: diff --git a/docs/source/examples_business_africa_education.rst b/docs/source/examples_business_africa_education.rst index f023d1451..da83f472b 100644 --- a/docs/source/examples_business_africa_education.rst +++ b/docs/source/examples_business_africa_education.rst @@ -260,7 +260,7 @@ Eight seems to be a suitable number of clusters. Let's compute a ``k-means`` mod model = KMeans(n_cluster = 8) model.fit(africa, X = ["lon", "lat"]) -We can add the prediction to the ``vDataFrame`` and draw the scatter map. +We can add the prediction to the :py:mod:`vDataFrame` and draw the scatter map. .. code-block:: python @@ -501,7 +501,7 @@ Let's look at the feature importance for each model. Feature importance between between math score and the reading score are almost identical. -We can add these predictions to the main ``vDataFrame``. +We can add these predictions to the main :py:mod:`vDataFrame`. .. 
code-block:: python diff --git a/docs/source/examples_business_battery.rst b/docs/source/examples_business_battery.rst index 9fc2d38e7..7d61db351 100644 --- a/docs/source/examples_business_battery.rst +++ b/docs/source/examples_business_battery.rst @@ -20,11 +20,7 @@ Dataset ++++++++ In this example of **predictive maintenance**, we propose a data-driven method -to estimate the health of a battery using the -`Li-ion battery dataset `_ -released by NASA. - - +to estimate the health of a battery using the `Li-ion battery dataset `_ released by NASA. This dataset includes information on Li-ion batteries over several charge and discharge cycles at room temperature. Charging was at a constant current @@ -87,8 +83,7 @@ Let us now ingest the data. Understanding the Data ----------------------- -Let's examine our data. Here, we use `vDataFrame.head()` -to retrieve the first five rows of the dataset. +Let's examine our data. Here, we use :py:func:`~verticapy.vDataFrame.head` to retrieve the first five rows of the dataset. .. ipython:: python :suppress: @@ -103,7 +98,7 @@ to retrieve the first five rows of the dataset. :file: /project/data/VerticaPy/docs/figures/examples_battery_table_head.html -Let's perform a few aggregations with `vDataFrame.describe()` to get a high-level overview of the dataset. +Let's perform a few aggregations with :py:func:`~verticapy.vDataFrame.describe` to get a high-level overview of the dataset. .. code-block:: python @@ -567,12 +562,9 @@ and the time needed to reach minimum voltage and maximum temperature. Machine Learning ----------------- +AutoML tests several models and returns input scores for each. We can use this to find the best model for our dataset. -AutoML tests several models and returns input -scores for each. We can use this to find the best model for our dataset. - -.. note:: We are only using the three algorithms, but you can change the `estiamtor` parameter to try all the 'native' algorithms. - ``estiamtor = 'native' `` +.. note:: We are only using the three algorithms, but you can change the `estimator` parameter to try all the 'native' algorithms: ``estimator = 'native' ``. .. code-block:: python diff --git a/docs/source/examples_business_booking.rst b/docs/source/examples_business_booking.rst index 8442aa375..98a5da743 100644 --- a/docs/source/examples_business_booking.rst +++ b/docs/source/examples_business_booking.rst @@ -77,7 +77,7 @@ Data Exploration and Preparation Sessionization is the process of gathering clicks for a certain period of time. We usually consider that after 30 minutes of inactivity, the user session ends (``date_time - lag(date_time) > 30 minutes``). For these kinds of use cases, aggregating sessions with meaningful statistics is the key for making accurate predictions. -We start by using the ``sessionize`` method to create the variable 'session_id'. We can then use this variable to aggregate the data. +We start by using the :py:func:`~verticapy.vDataFrame.sessionize` method to create the variable 'session_id'. We can then use this variable to aggregate the data. .. code-block:: python @@ -234,7 +234,7 @@ We can see huge links between some of the variables ('mode_hotel_cluster_count' Machine Learning ----------------- -Let's create our ``LogisticRegression`` model. +Let's create our :py:func:`~verticapy.machine_learning.vertica.LogisticRegression` model. .. ipython:: python @@ -279,7 +279,7 @@ It looks like there are two main predictors: 'mode_hotel_cluster_count' and 'tri - look for a shorter trip duration. 
- not click as much (spend more time at the same web page). -Let's add our prediction to the ``vDataFrame``. +Let's add our prediction to the :py:mod:`vDataFrame`. .. code-block:: python @@ -304,7 +304,7 @@ Let's add our prediction to the ``vDataFrame``. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_expedia_predict_proba_1.html -While analyzing the following boxplot (prediction partitioned by 'is_booking'), we can notice that the ``cutoff`` is around 0.22 because most of the positive predictions have a probability between 0.23 and 0.5. Most of the negative predictions are between 0.05 and 0.2. +While analyzing the following boxplot (prediction partitioned by 'is_booking'), we can notice that the `cutoff` is around 0.22 because most of the positive predictions have a probability between 0.23 and 0.5. Most of the negative predictions are between 0.05 and 0.2. .. code-block:: python @@ -320,13 +320,13 @@ While analyzing the following boxplot (prediction partitioned by 'is_booking'), .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_expedia_predict_boxplot_1.html -Let's confirm our hypothesis by computing the best ``cutoff``. +Let's confirm our hypothesis by computing the best `cutoff`. .. ipython:: python model_logit.score(metric = "best_cutoff") -Let's look at the efficiency of our model with a cutoff of ``0.22``. +Let's look at the efficiency of our model with a cutoff of 0.22. .. code-block:: python diff --git a/docs/source/examples_business_churn.rst b/docs/source/examples_business_churn.rst index 8da9b321a..bde4c7fbd 100644 --- a/docs/source/examples_business_churn.rst +++ b/docs/source/examples_business_churn.rst @@ -203,7 +203,7 @@ ________ Machine Learning ----------------- -``LogisticRegression`` is a very powerful algorithm and we can use it to detect churns. Let's split our ``vDataFrame`` into training and testing set to evaluate our model. +:py:func:`~verticapy.machine_learning.vertica.LogisticRegression` is a very powerful algorithm and we can use it to detect churns. Let's split our :py:mod:`vDataFrame` into training and testing set to evaluate our model. .. ipython:: python diff --git a/docs/source/examples_business_credit_card_fraud.rst b/docs/source/examples_business_credit_card_fraud.rst index 33caba098..974820e01 100644 --- a/docs/source/examples_business_credit_card_fraud.rst +++ b/docs/source/examples_business_credit_card_fraud.rst @@ -328,7 +328,7 @@ We will split the dataset into a train (day 1) and a test (day 2). Supervising would make this pretty easy since it would just be a binary classification problem. We can use different algorithms to optimize the prediction. Our dataset is unbalanced, so the AUC might be a good metric to evaluate the model. The PRC AUC would also be a relevant metric. -``LogisticRegression`` works well with monotonic relationships. Since we have a lot of independent features that correlate with the response, it should be a good first model to use. +:py:func:`~verticapy.machine_learning.vertica.LogisticRegression` works well with monotonic relationships. Since we have a lot of independent features that correlate with the response, it should be a good first model to use. .. code-block:: python @@ -398,7 +398,7 @@ Due to the complexity of the computations, anomalies are difficult to detect in - **Machine Learning:** We need to use easily-deployable algorithms to perform real-time fraud detection. Isolation forests and ``k-means`` can be easily deployed and they work well for detecting anomalies. 
- **Rules & Thresholds:** The z-score can be an efficient solution for detecting global outliers. -- **Decomposition:** Robust ``PCA`` is another technique for detecting outliers. +- **Decomposition:** Robust :py:func:`~verticapy.machine_learning.vertica.PCA` is another technique for detecting outliers. Before using these techniques, let's draw some scatter plots to get a better idea of what kind of anomalies we can expect. @@ -453,7 +453,7 @@ For the rest of this example, we'll investigate labels and how they can help us We begin by examining ``k-means`` clustering, which partitions the data into k clusters. -We can use an elbow curve to find a suitable number of clusters. We can then add more clusters then the amount suggested by the ``elbow`` curve to create clusters mainly composed of anomalies. Clusters with relatively fewer elements can then be investigated by an expert to label the anomalies. +We can use an elbow curve to find a suitable number of clusters. We can then add more clusters then the amount suggested by the :py:func:`~verticapy.machine_learning.model_selection.elbow` curve to create clusters mainly composed of anomalies. Clusters with relatively fewer elements can then be investigated by an expert to label the anomalies. From there, we perform the following procedure: @@ -535,7 +535,7 @@ Notice that clusters with fewer elemenets tend to contain much more fraudulent e **Outliers of the distribution** -Let's use the ``z-score`` to detect global outliers of the distribution. +Let's use the ``Z-score`` to detect global outliers of the distribution. .. code-block:: python @@ -635,7 +635,7 @@ We can catch outliers with a neighbors score. Again, the main problem with these **Other Techniques** -Other scalable techniques that can solve this problem are robust ``PCA`` and isolation forest. +Other scalable techniques that can solve this problem are robust :py:func:`~verticapy.machine_learning.vertica.PCA` and isolation forest. Conclusion ----------- diff --git a/docs/source/examples_business_football.rst b/docs/source/examples_business_football.rst index 5d1878d40..3ccc75936 100644 --- a/docs/source/examples_business_football.rst +++ b/docs/source/examples_business_football.rst @@ -903,7 +903,7 @@ Let's export the result to our Vertica database. Team Rankings with k-means --------------------------- -To compute a ``k-means`` model, we need to find a value for 'k'. Let's draw an ``elbow`` curve to find a suitable number of clusters. +To compute a ``k-means`` model, we need to find a value for 'k'. Let's draw an :py:func:`~verticapy.machine_learning.model_selection.elbow` curve to find a suitable number of clusters. .. code-block:: python @@ -975,7 +975,7 @@ To compute a ``k-means`` model, we need to find a value for 'k'. Let's draw an ` model_kmeans.fit("football_clustering", predictors) model_kmeans.clusters_ -Let's add the prediction to the ``vDataFrame``. +Let's add the prediction to the :py:mod:`vDataFrame`. .. code-block:: python @@ -1974,7 +1974,7 @@ Looking at the importance of each feature, it seems like direct confrontations a .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_football_features_importance.html -Let's add the predictions to the ``vDataFrame``. +Let's add the predictions to the :py:mod:`vDataFrame`. Draws are pretty rare, so we'll only consider them if a tie was very likely to occur. 
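As a rough sketch of the thresholding idea above, the probability-based rule could be written as follows; the model, relation, and class names (``model_rf``, ``football``, ``predictors``, ``'Draw'``) are placeholders and the 0.5 cutoff is arbitrary:

.. code-block:: python

    # Add the probability of a draw and the raw multiclass prediction
    # (illustrative names; adapt them to the trained model at hand).
    model_rf.predict_proba(football, X = predictors, name = "prob_draw", pos_label = "Draw")
    model_rf.predict(football, X = predictors, name = "pred_outcome")

    # Only keep 'Draw' when a tie is very likely; otherwise fall back to
    # the most probable team.
    football.eval(
        name = "final_outcome",
        expr = "CASE WHEN prob_draw >= 0.5 THEN 'Draw' ELSE pred_outcome END",
    )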
diff --git a/docs/source/examples_business_insurance.rst b/docs/source/examples_business_insurance.rst index 52ee0dfbf..1d7e72148 100644 --- a/docs/source/examples_business_insurance.rst +++ b/docs/source/examples_business_insurance.rst @@ -38,7 +38,7 @@ You can skip the below cell if you already have an established connection. vp.connect("VerticaDSN") -Let's create a new schema and assign the data to a ``vDataFrame`` object. +Let's create a new schema and assign the data to a :py:mod:`vDataFrame` object. .. code-block:: ipython diff --git a/docs/source/examples_business_movies.rst b/docs/source/examples_business_movies.rst index 3c77cb9e2..d5260c40c 100644 --- a/docs/source/examples_business_movies.rst +++ b/docs/source/examples_business_movies.rst @@ -43,7 +43,7 @@ You can skip the below cell if you already have an established connection. vp.connect("VerticaDSN") -Let's create a new schema and assign the data to a ``vDataFrame`` object. +Let's create a new schema and assign the data to a :py:mod:`vDataFrame` object. .. code-block:: ipython @@ -349,7 +349,7 @@ Let's join our notoriety metrics for actors and directors with the main dataset. ], ) -As we did many operation, it can be nice to save the ``vDataFrame`` as a table in the Vertica database. +As we did many operation, it can be nice to save the :py:mod:`vDataFrame` as a table in the Vertica database. .. code-block:: python @@ -754,7 +754,7 @@ Let's create a model to evaluate an unbiased score for each different movie. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_movies_filmtv_complete_model_report.html -The model is good. Let's add it in our ``vDataFrame``. +The model is good. Let's add it in our :py:mod:`vDataFrame`. .. code-block:: python @@ -871,7 +871,7 @@ Since ``k-means`` clustering is sensitive to unnormalized data, let's normalize .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_movies_filmtv_normalize_minmax.html -Let's compute the ``elbow`` curve to find a suitable number of clusters. +Let's compute the :py:func:`~verticapy.machine_learning.model_selection.elbow` curve to find a suitable number of clusters. .. ipython:: python @@ -926,7 +926,7 @@ By looking at the elbow curve, we can choose 15 clusters. Let's create a ``k-mea model_kmeans.fit(filmtv_movies_complete, predictors) model_kmeans.clusters_ -Let's add the clusters in the ``vDataFrame``. +Let's add the clusters in the :py:mod:`vDataFrame`. .. code-block:: python diff --git a/docs/source/examples_business_smart_meters.rst b/docs/source/examples_business_smart_meters.rst index 28f38f1de..d745f4480 100644 --- a/docs/source/examples_business_smart_meters.rst +++ b/docs/source/examples_business_smart_meters.rst @@ -44,7 +44,7 @@ You can skip the below cell if you already have an established connection. vp.connect("VerticaDSN") -Create the ``vDataFrames`` of the datasets: +Create the :py:mod:`vDataFrame` of the datasets: .. code-block:: python @@ -217,7 +217,7 @@ The dataset 'sm_meters' is pretty important. In particular, the type of residenc :width: 100% :align: center -Based on the scatter plot, five seems like the optimal number of clusters. Let's verify this hypothesis using an ``elbow`` curve. +Based on the scatter plot, five seems like the optimal number of clusters. Let's verify this hypothesis using an :py:func:`~verticapy.machine_learning.model_selection.elbow` curve. .. 
code-block:: python diff --git a/docs/source/examples_business_spam.rst b/docs/source/examples_business_spam.rst index 8cb5c499b..4f02dd97f 100644 --- a/docs/source/examples_business_spam.rst +++ b/docs/source/examples_business_spam.rst @@ -106,7 +106,7 @@ Let's compute some statistics using the length of the message. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_spam_table_describe.html -**Notice:** spam tends to be longer than a normal message. First, let's create a view with just spam. Then, we'll use the ``CountVectorizer`` to create a dictionary and identify keywords. +**Notice:** spam tends to be longer than a normal message. First, let's create a view with just spam. Then, we'll use the :py:func:`~verticapy.machine_learning.vertica.CountVectorizer` to create a dictionary and identify keywords. .. code-block:: python @@ -138,7 +138,7 @@ Let's compute some statistics using the length of the message. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_spam_table_clean_2.html -Let's add the most occurent words in our ``vDataFrame`` and compute the correlation vector. +Let's add the most occurent words in our :py:mod:`vDataFrame` and compute the correlation vector. .. code-block:: python diff --git a/docs/source/examples_business_spotify.rst b/docs/source/examples_business_spotify.rst index 49c836c0f..ba214a6a8 100644 --- a/docs/source/examples_business_spotify.rst +++ b/docs/source/examples_business_spotify.rst @@ -88,7 +88,7 @@ Create a new schema, "spotify". Data Loading ------------- -Load the datasets into the ``vDataFrame`` with ``read_csv()`` and then view them with ``display()``. +Load the datasets into the :py:mod:`vDataFrame` with :py:func:`~verticapy.read_csv` and then view them with :py:func:`~verticapy.vDataFrame.head`. .. code-block:: @@ -521,14 +521,14 @@ Define a list of predictors and the response, and then save the normalized versi Machine Learning ----------------- -We can use ``AutoML`` to easily get a well-performing model. +We can use :py:func:`~verticapy.machine_learning.vertica.automl.AutoML` to easily get a well-performing model. .. ipython:: python # define a random seed so models tested by AutoML produce consistent results vp.set_option("random_state", 2) -``AutoML`` automatically tests several machine learning models and picks the best performing one. +:py:func:`~verticapy.machine_learning.vertica.automl.AutoML` automatically tests several machine learning models and picks the best performing one. .. ipython:: python :okwarning: @@ -569,7 +569,7 @@ Train the model. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_spotify_automl_plot.html -Extract the best model according to ``AutoML``. From here, we can look at the model type and its hyperparameters. +Extract the best model according to :py:func:`~verticapy.machine_learning.vertica.automl.AutoML`. From here, we can look at the model type and its hyperparameters. .. ipython:: python @@ -581,7 +581,7 @@ Extract the best model according to ``AutoML``. From here, we can look at the mo print(bm_type) print(hyperparams) -Thanks to ``AutoML``, we know best model type and its hyperparameters. Let's create a new model with this information in mind. +Thanks to :py:func:`~verticapy.machine_learning.vertica.automl.AutoML`, we know best model type and its hyperparameters. Let's create a new model with this information in mind. .. code-block:: @@ -797,7 +797,7 @@ Let's start by taking the averages of these numerical features for each artist. .. 
raw:: html :file: /project/data/VerticaPy/docs/figures/examples_spotify_artists_features.html -Grouping means clustering, so we use an ``elbow`` curve to find a suitable number of clusters. +Grouping means clustering, so we use an :py:func:`~verticapy.machine_learning.model_selection.elbow` curve to find a suitable number of clusters. .. ipython:: python :okwarning: @@ -915,4 +915,4 @@ Let's see how our model groups these artists together: Conclusion ----------- -We were able to predict the popularity Polish songs with a ``RandomForestRegressor`` model suggested by ``AutoML``. We then created a ``k-means`` model to group artists into "genres" (clusters) based on the feature-commonalities in their tracks. \ No newline at end of file +We were able to predict the popularity Polish songs with a :py:func:`~verticapy.machine_learning.vertica.RandomForestRegressor` model suggested by :py:func:`~verticapy.machine_learning.vertica.automl.AutoML`. We then created a ``k-means`` model to group artists into "genres" (clusters) based on the feature-commonalities in their tracks. \ No newline at end of file diff --git a/docs/source/examples_learn_commodities.rst b/docs/source/examples_learn_commodities.rst index f32a71b43..eb4b82b16 100644 --- a/docs/source/examples_learn_commodities.rst +++ b/docs/source/examples_learn_commodities.rst @@ -320,12 +320,12 @@ Moving on to the correlation matrix, we can see many events that changed drastic .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_commodities_table_corr_2.html -We can see strong correlations between most of the variables. A vector autoregression (``VAR``) model seems ideal. +We can see strong correlations between most of the variables. A vector autoregression (:py:func:`~verticapy.machine_learning.vertica.VAR`) model seems ideal. Machine Learning ----------------- -Let's create the ``VAR`` model to predict the value of various commodities. +Let's create the :py:func:`~verticapy.machine_learning.vertica.VAR` model to predict the value of various commodities. .. code-block:: python @@ -441,7 +441,7 @@ Our model is excellent. Let's predict the values these commodities in the near f .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_commodities_table_pred_plot_4.html -The model performs well but may be somewhat unstable. To improve it, we could apply data preparation techniques, such as seasonal decomposition, before building the ``VAR`` model. +The model performs well but may be somewhat unstable. To improve it, we could apply data preparation techniques, such as seasonal decomposition, before building the :py:func:`~verticapy.machine_learning.vertica.VAR` model. Conclusion ----------- diff --git a/docs/source/examples_learn_iris.rst b/docs/source/examples_learn_iris.rst index 180431d84..d77fb69bd 100644 --- a/docs/source/examples_learn_iris.rst +++ b/docs/source/examples_learn_iris.rst @@ -170,7 +170,7 @@ Our strategy is simple: we'll use two Linear Support Vector Classification (SVC) Machine Learning ----------------- -Let's build the first ``LinearSVC`` to predict if a flower is an Iris setosa. +Let's build the first :py:func:`~verticapy.machine_learning.vertica.LinearSVC` to predict if a flower is an Iris setosa. .. code-block:: python @@ -221,7 +221,7 @@ Let's plot the model to see the perfect separation. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_model_plot.html -We can add this probability to the ``vDataFrame``. +We can add this probability to the :py:mod:`vDataFrame`. .. 
code-block:: python @@ -275,7 +275,7 @@ Let's create a model to classify the Iris virginica. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_iris_table_ml_cv_2.html -We have another excellent model. Let's add it to the ``vDataFrame``. +We have another excellent model. Let's add it to the :py:mod:`vDataFrame`. .. code-block:: python @@ -294,7 +294,7 @@ We have another excellent model. Let's add it to the ``vDataFrame``. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_model_predict_proba_2.html -Let's evaluate our final model (the combination of two ``LinearSVC``s). +Let's evaluate our final model (the combination of two :py:func:`~verticapy.machine_learning.vertica.LinearSVC`). .. code-block:: python diff --git a/docs/source/examples_learn_pokemon.rst b/docs/source/examples_learn_pokemon.rst index e94087262..6af81da22 100644 --- a/docs/source/examples_learn_pokemon.rst +++ b/docs/source/examples_learn_pokemon.rst @@ -250,7 +250,7 @@ In terms of missing values, our only concern is the Pokemon's second type (Type_ .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_pokemon_table_clean_2.html -Let's use the current_relation method to see how our data preparation so far on the ``vDataFrame`` generates SQL code. +Let's use the current_relation method to see how our data preparation so far on the :py:mod:`vDataFrame` generates SQL code. .. ipython:: python diff --git a/docs/source/examples_learn_titanic.rst b/docs/source/examples_learn_titanic.rst index 5f3b098f8..38930bb6f 100644 --- a/docs/source/examples_learn_titanic.rst +++ b/docs/source/examples_learn_titanic.rst @@ -217,7 +217,7 @@ The "sibsp" column represents the number of siblings for each passenger, while t titanic["family_size"] = titanic["parch"] + titanic["sibsp"] + 1 -Let's move on to outliers. We have several tools for locating outliers (``LocalOutlier Factor``, ``DBSCAN``, ``k-means``...), but we'll just use winsorization in this example. Again, "fare" has many outliers, so we'll start there. +Let's move on to outliers. We have several tools for locating outliers (:py:func:`~verticapy.machine_learning.vertica.LocalOutlierFactor`, :py:func:`~verticapy.machine_learning.vertica.DBSCAN`, ``k-means``...), but we'll just use winsorization in this example. Again, "fare" has many outliers, so we'll start there. .. code-block:: python @@ -302,7 +302,7 @@ Survival correlates strongly with whether or not a passenger has a lifeboat (the - Passengers with a lifeboat - Passengers without a lifeboat -Before we move on: we did a lot of work to clean up this data, but we haven't saved anything to our Vertica database! Let's look at the modifications we've made to the ``vDataFrame``. +Before we move on: we did a lot of work to clean up this data, but we haven't saved anything to our Vertica database! Let's look at the modifications we've made to the :py:mod:`vDataFrame`. .. ipython:: python @@ -322,7 +322,7 @@ VerticaPy dynamically generates SQL code whenever you make modifications to your vp.set_option("sql_on", False) print(titanic.info()) -Let's move on to modeling our data. Save the ``vDataFrame`` to your Vertica database. +Let's move on to modeling our data. Save the :py:mod:`vDataFrame` to your Vertica database. .. 
ipython:: python :okwarning: diff --git a/docs/source/examples_understand_amazon.rst b/docs/source/examples_understand_amazon.rst index 04226cdab..f067ceeb5 100644 --- a/docs/source/examples_understand_amazon.rst +++ b/docs/source/examples_understand_amazon.rst @@ -73,7 +73,7 @@ We can explore our data by displaying descriptive statistics of all the columns. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_amazon_table_describe.html -Using the ``describe()`` method, we can see that our data ranges from the beginning of 1998 to the end of 2017. +Using the :py:func:`~verticapy.vDataFrame.describe` method, we can see that our data ranges from the beginning of 1998 to the end of 2017. .. code-block:: python @@ -224,7 +224,7 @@ Although it would be preferable to use seasonal decomposition and predict the re Machine Learning ----------------- -Since the seasonality occurs monthly, we set ``p = 12``. There is no trend in the data, and we observe some moving average in the residuals, so ``q`` should be around ``2``. Let's proceed with building the model. +Since the seasonality occurs monthly, we set ``p = 12``. There is no trend in the data, and we observe some moving average in the residuals, so `q` should be around 2. Let's proceed with building the model. .. code-block:: python diff --git a/docs/source/examples_understand_covid19.rst b/docs/source/examples_understand_covid19.rst index 4889eece0..07b62467c 100644 --- a/docs/source/examples_understand_covid19.rst +++ b/docs/source/examples_understand_covid19.rst @@ -283,13 +283,14 @@ Because of the upward monotonic trend, we can also look at the correlation betwe covid19["elapsed_days"] = covid19["date"] - fun.min(covid19["date"])._over(by = [covid19["state"]]) -We can generate the SQL code of the ``vDataFrame`` to see what happens behind the scenes when we modify our data from within the ``vDataFrame``. +We can generate the SQL code of the :py:mod:`vDataFrame` +to see what happens behind the scenes when we modify our data from within the :py:mod:`vDataFrame`. .. ipython:: python print(covid19.current_relation()) -The ``vDataFrame`` memorizes all of our operations on the data to dynamically generate the correct SQL statement and passes computation and aggregation to Vertica. +The :py:mod:`vDataFrame` memorizes all of our operations on the data to dynamically generate the correct SQL statement and passes computation and aggregation to Vertica. Let's see the correlation between the number of deaths and the other variables. @@ -306,7 +307,7 @@ Let's see the correlation between the number of deaths and the other variables. .. raw:: html :file: /project/data/VerticaPy/docs/figures/examples_covid19_table_plot_corr_5.html -We can see clearly a high correlation for some variables. We can use them to compute a ``SARIMAX`` model, but we'll stick to a VAR model for this study. +We can see clearly a high correlation for some variables. We can use them to compute a ``SARIMAX`` model, but we'll stick to a :py:func:`~verticapy.machine_learning.vertica.VAR` model for this study. Let's compute the total number of deaths and cases to create our VAR model. @@ -334,7 +335,7 @@ Let's compute the total number of deaths and cases to create our VAR model. Machine Learning ----------------- -Let's create a ``VAR`` model to predict the number of COVID-19 deaths and cases in the USA. +Let's create a :py:func:`~verticapy.machine_learning.vertica.VAR` model to predict the number of COVID-19 deaths and cases in the USA. .. 
code-block:: python diff --git a/docs/source/user_guide.rst b/docs/source/user_guide.rst index 305e253d5..b0991d92f 100644 --- a/docs/source/user_guide.rst +++ b/docs/source/user_guide.rst @@ -1,12 +1,9 @@ .. _user_guide: - ============ User Guide ============ - - .. grid:: 1 1 2 2 .. grid-item:: @@ -24,10 +21,8 @@ User Guide (such as :py:class:`~vDataFrame`, :py:class:`~vDataColumn` etc), and set the stage for a seamless analytical journey. - :bdg-primary:`vDataFrame` :bdg-primary:`vDataColumn` - .. grid-item:: .. card:: Data Ingestion diff --git a/docs/source/user_guide_data_exploration_charts.rst b/docs/source/user_guide_data_exploration_charts.rst index c254406b7..7b13979f7 100644 --- a/docs/source/user_guide_data_exploration_charts.rst +++ b/docs/source/user_guide_data_exploration_charts.rst @@ -24,7 +24,7 @@ First, let's import the modules needed for this notebook. Let's start with pies and histograms. Drawing the pie or histogram of a categorical column in VerticaPy is quite easy. -.. note:: You can conveniently switch between the three available plotting libraries using :py:func:`verticapy.set_option`. +.. note:: You can conveniently switch between the three available plotting libraries using :py:func:`~verticapy.set_option`. .. code-block:: @@ -385,7 +385,7 @@ It is also possible to use SHP datasets to draw maps. max_cardinality = 100 ) -Time-series plots are also available with the ``plot`` method. +Time-series plots are also available with the :py:func:`~verticapy.vDataFrame.plot` method. .. ipython:: python @@ -394,7 +394,7 @@ Time-series plots are also available with the ``plot`` method. @savefig user_guides_data_exploration_amazon_time.png amazon["number"].plot(ts = "date", by = "state") -Since time-series plots do not aggregate the data, it's important to choose the correct ``start_date`` and ``end_date``. +Since time-series plots do not aggregate the data, it's important to choose the correct `start_date` and `end_date`. .. code-block:: python diff --git a/docs/source/user_guide_data_exploration_correlations.rst b/docs/source/user_guide_data_exploration_correlations.rst index 1ef1c2ecc..a2d1f87c0 100644 --- a/docs/source/user_guide_data_exploration_correlations.rst +++ b/docs/source/user_guide_data_exploration_correlations.rst @@ -162,7 +162,7 @@ Binary features are considered numerical, but this isn't technically accurate. S .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_de_plot_corr_7.html -Lastly, we'll look at the relationship between categorical columns. In this case, the 'Cramer's V' method is very efficient. Since there is no position in the Euclidean space for those variables, the 'Cramer's V' coefficients cannot be negative (which is a sign of an opposite relationship) and they will range in the interval ``[0,1]``. +Lastly, we'll look at the relationship between categorical columns. In this case, the 'Cramer's V' method is very efficient. Since there is no position in the Euclidean space for those variables, the 'Cramer's V' coefficients cannot be negative (which is a sign of an opposite relationship) and they will range in the interval `[0,1]`. .. code-block:: python @@ -180,7 +180,7 @@ Lastly, we'll look at the relationship between categorical columns. In this case .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_de_plot_corr_8.html -Sometimes, we just need to look at the correlation between a response and other variables. The parameter ``focus`` will isolate and show us the specified correlation vector. 
+Sometimes, we just need to look at the correlation between a response and other variables. The parameter `focus` will isolate and show us the specified correlation vector. .. code-block:: python @@ -198,7 +198,7 @@ Sometimes, we just need to look at the correlation between a response and other .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_de_plot_corr_9.html -Sometimes a correlation coefficient can lead to incorrect assumptions, so we should always look at the coefficient ``p-value``. +Sometimes a correlation coefficient can lead to incorrect assumptions, so we should always look at the coefficient `p-value`. .. ipython:: python diff --git a/docs/source/user_guide_data_exploration_descriptive_statistics.rst b/docs/source/user_guide_data_exploration_descriptive_statistics.rst index 992b02955..bc7e8788a 100644 --- a/docs/source/user_guide_data_exploration_descriptive_statistics.rst +++ b/docs/source/user_guide_data_exploration_descriptive_statistics.rst @@ -185,8 +185,8 @@ You can also use the 'groupby' method to compute customized aggregations. :file: /project/data/VerticaPy/docs/figures/user_guides_data_exploration_descriptive_stats_group_by_python.html Computing many aggregations at the same time can be resource intensive. -You can use the parameters ``ncols_block`` and ``processes`` to manage the ressources. +You can use the parameters `ncols_block` and `processes` to manage the ressources. -For example, the parameter ``ncols_block`` will divide the main query into smaller using a specific number of columns. The parameter `processes` allows you to manage the number of queries you want to send at the same time. +For example, the parameter `ncols_block` will divide the main query into smaller using a specific number of columns. The parameter `processes` allows you to manage the number of queries you want to send at the same time. An entire example is available in the :py:func:`~verticapy.vDataFrame.agg` documentation. \ No newline at end of file diff --git a/docs/source/user_guide_data_ingestion.rst b/docs/source/user_guide_data_ingestion.rst index 38658802a..f8cb4d34c 100644 --- a/docs/source/user_guide_data_ingestion.rst +++ b/docs/source/user_guide_data_ingestion.rst @@ -148,7 +148,7 @@ In the following example, we will use :py:func:`~verticapy.read_csv` to ingest a titanic = load_titanic() -To convert a subset of the dataset to a CSV file, select the desired rows in the dataset and use the :py:func:`~verticapy.to_csv` ``vDataFrame`` method: +To convert a subset of the dataset to a CSV file, select the desired rows in the dataset and use the :py:func:`~verticapy.to_csv` :py:mod:`vDataFrame` method: .. ipython:: python diff --git a/docs/source/user_guide_data_preparation_decomposition.rst b/docs/source/user_guide_data_preparation_decomposition.rst index a2f7345d6..ecb7fb349 100644 --- a/docs/source/user_guide_data_preparation_decomposition.rst +++ b/docs/source/user_guide_data_preparation_decomposition.rst @@ -6,7 +6,7 @@ Decomposition Decomposition is the process of using an orthogonal transformation to convert a set of observations of possibly-correlated variables (with numerical values) into a set of values of linearly-uncorrelated variables called principal components. -Since some algorithms are sensitive to correlated predictors, it can be a good idea to use the ``PCA`` (Principal Component Analysis: Decomposition Technique) before applying the algorithm. 
Since some algorithms are also sensitive to the number of predictors, we'll have to be picky with which variables we include. +Since some algorithms are sensitive to correlated predictors, it can be a good idea to use the :py:func:`~verticapy.machine_learning.vertica.PCA` (Principal Component Analysis: Decomposition Technique) before applying the algorithm. Since some algorithms are also sensitive to the number of predictors, we'll have to be picky with which variables we include. To demonstrate data decomposition in VerticaPy, we'll use the well-known 'Iris' dataset. @@ -65,7 +65,7 @@ Let's compute the PCA of the different elements. ], ) -Let's compute the correlation matrix of the result of the ``PCA``. +Let's compute the correlation matrix of the result of the :py:func:`~verticapy.machine_learning.vertica.PCA`. .. code-block:: python @@ -89,7 +89,7 @@ Notice that the predictors are now independant and combined together and they ha model.explained_variance_ -Most of the information is in the first two components with more than 97.7% of explained variance. We can export this result to a ``vDataFrame``. +Most of the information is in the first two components with more than 97.7% of explained variance. We can export this result to a :py:mod:`vDataFrame`. .. code-block:: diff --git a/docs/source/user_guide_data_preparation_duplicates.rst b/docs/source/user_guide_data_preparation_duplicates.rst index e6be420fe..944f217fe 100644 --- a/docs/source/user_guide_data_preparation_duplicates.rst +++ b/docs/source/user_guide_data_preparation_duplicates.rst @@ -30,7 +30,7 @@ Let's use the Iris dataset to understand the tools VerticaPy gives you for handl .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_table_duplicates_1.html -To find all the duplicates, you can use the ``duplicated`` method. +To find all the duplicates, you can use the :py:func:`~verticapy.vDataFrame.duplicated` method. .. code-block:: python @@ -49,7 +49,7 @@ To find all the duplicates, you can use the ``duplicated`` method. As you might expect, some flowers might share the exact same characteristics. But we have to be careful; this doesn't mean that they are real duplicates. In this case, we don't have to drop them. -That said, if we did want to drop these duplicates, we can do so with the ``drop_duplicates`` method. +That said, if we did want to drop these duplicates, we can do so with the :py:func:`~verticapy.vDataFrame.drop_duplicates` method. .. code-block:: python diff --git a/docs/source/user_guide_data_preparation_encoding.rst b/docs/source/user_guide_data_preparation_encoding.rst index dbee163dd..1f269c983 100644 --- a/docs/source/user_guide_data_preparation_encoding.rst +++ b/docs/source/user_guide_data_preparation_encoding.rst @@ -54,7 +54,7 @@ Let's look at the 'age' of the passengers. .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_plot_encoding_2.html -By using the ``discretize`` method, we can discretize the data using equal-width binning. +By using the :py:func:`~verticapy.vDataFrame.discretize` method, we can discretize the data using equal-width binning. .. code-block:: python @@ -118,7 +118,7 @@ Computing categories using a response column can also be a good solution. .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_plot_encoding_5.html -We can view the available techniques in the ``discretize`` method with the ``help`` method. +We can view the available techniques in the :py:func:`~verticapy.vDataFrame.discretize` method with the :py:func:`help` method. .. 
ipython:: python @@ -183,4 +183,4 @@ Let's use a mean encoding on the 'home.dest' variable. .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_table_encoding_8.html -VerticaPy offers many encoding techniques. For example, the ``case_when`` and ``decode`` methods allow the user to use a customized encoding on a column. The ``discretize`` method allows you to reduce the number of categories in a column. It's important to get familiar with all the techniques available so you can make informed decisions about which to use for a given dataset. \ No newline at end of file +VerticaPy offers many encoding techniques. For example, the :py:func:`~verticapy.vDataFrame.case_when` and :py:func:`~verticapy.vDataFrame.decode` methods allow the user to use a customized encoding on a column. The :py:func:`~verticapy.vDataFrame.discretize` method allows you to reduce the number of categories in a column. It's important to get familiar with all the techniques available so you can make informed decisions about which to use for a given dataset. \ No newline at end of file diff --git a/docs/source/user_guide_data_preparation_features_engineering.rst b/docs/source/user_guide_data_preparation_features_engineering.rst index a305233a8..04cb11121 100644 --- a/docs/source/user_guide_data_preparation_features_engineering.rst +++ b/docs/source/user_guide_data_preparation_features_engineering.rst @@ -10,7 +10,7 @@ Features engineering makes use of many techniques - too many to go over in this Customized Features Engineering -------------------------------- -To build a customized feature, you can use the ``eval`` method of the ``vDataFrame``. Let's look at an example with the well-known 'Titanic' dataset. +To build a customized feature, you can use the :py:func:`~verticapy.vDataFrame.eval` method of the :py:mod:`vDataFrame`. Let's look at an example with the well-known 'Titanic' dataset. .. code-block:: python @@ -53,12 +53,12 @@ The feature 'parch' corresponds to the number of parents and children on-board. .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_table_fe_2.html -When using the ``eval`` method, you can enter any SQL expression and VerticaPy will evaluate it! +When using the :py:func:`~verticapy.vDataFrame.eval` method, you can enter any SQL expression and VerticaPy will evaluate it! Regular Expressions -------------------- -To compute features using regular expressions, we'll use the ``regexp`` method. +To compute features using regular expressions, we'll use the :py:func:`~verticapy.vDataFrame.regexp` method. .. ipython:: python @@ -173,7 +173,7 @@ For each state, let's compute the previous number of forest fires. Moving Windows --------------- -Moving windows are powerful features. Moving windows are managed by the ``rolling`` method in VerticaPy. +Moving windows are powerful features. Moving windows are managed by the :py:func:`~verticapy.vDataFrame.rolling` method in VerticaPy. .. ipython:: python diff --git a/docs/source/user_guide_data_preparation_joins.rst b/docs/source/user_guide_data_preparation_joins.rst index 74f01ae61..e9376ea68 100644 --- a/docs/source/user_guide_data_preparation_joins.rst +++ b/docs/source/user_guide_data_preparation_joins.rst @@ -72,7 +72,7 @@ Third, we have the names of each airline. Notice that each dataset has a primary or secondary key on which to join the data. For example, we can join the 'flights' dataset to the 'airlines' and 'airport' datasets using the corresponding IATA code. 
-To join datasets in VerticaPy, use the vDataFrame's ``join`` method. +To join datasets in VerticaPy, use the vDataFrame's :py:func:`~verticapy.vDataFrame.join` method. .. ipython:: python diff --git a/docs/source/user_guide_data_preparation_missing_values.rst b/docs/source/user_guide_data_preparation_missing_values.rst index f67f5372f..53e8b5c68 100644 --- a/docs/source/user_guide_data_preparation_missing_values.rst +++ b/docs/source/user_guide_data_preparation_missing_values.rst @@ -36,7 +36,7 @@ To see how to handle missing values in VerticaPy, we'll use the well-known 'Tita .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_table_mv_1.html -We can examine the missing values with the ``count`` method. +We can examine the missing values with the :py:func:`~verticapy.vDataFrame.count` method. .. code-block:: python @@ -53,7 +53,7 @@ We can examine the missing values with the ``count`` method. .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_table_mv_2.html -The missing values for 'boat' are MNAR; missing values simply indicate that the passengers didn't pay for a lifeboat. We can replace all the missing values with a new category 'No Lifeboat' using the ``fillna`` method. +The missing values for 'boat' are MNAR; missing values simply indicate that the passengers didn't pay for a lifeboat. We can replace all the missing values with a new category 'No Lifeboat' using the :py:func:`~verticapy.vDataFrame.fillna` method. .. code-block:: python @@ -97,7 +97,7 @@ Missing values for 'age' seem to be MCAR, so the best way to impute them is with .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_table_mv_4.html -The features 'embarked' and 'fare' have a couple missing values. Instead of using a technique to impute them, we can just drop them with the ``dropna`` method. +The features 'embarked' and 'fare' have a couple missing values. Instead of using a technique to impute them, we can just drop them with the :py:func:`~verticapy.vDataFrame.dropna` method. .. code-block:: python @@ -116,7 +116,7 @@ The features 'embarked' and 'fare' have a couple missing values. Instead of usin .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_table_mv_5.html -The ``fillna`` method offers many options. Let's use the ``help`` method to view its parameters. +The :py:func:`~verticapy.vDataFrame.fillna` method offers many options. Let's use the :py:func:`help` function to view its parameters. .. ipython:: python diff --git a/docs/source/user_guide_data_preparation_normalization.rst b/docs/source/user_guide_data_preparation_normalization.rst index 30aab4779..2da50167f 100644 --- a/docs/source/user_guide_data_preparation_normalization.rst +++ b/docs/source/user_guide_data_preparation_normalization.rst @@ -4,7 +4,7 @@ Normalization ============== -Normalizing data is crucial when using machine learning algorithms because of how sensitive most of them are to un-normalized data. For example, the neighbors-based and ``k-means`` algorithms use the ``p-distance`` in their learning phase. Normalization is the first step before using a linear regression due to Gauss-Markov assumptions. +Normalizing data is crucial when using machine learning algorithms because of how sensitive most of them are to un-normalized data. For example, the neighbors-based and `k-means` algorithms use the `p-distance` in their learning phase. Normalization is the first step before using a linear regression due to Gauss-Markov assumptions. 
Unnormalized data can also create complications for the convergence of some ML algorithms. Normalization is also a way to encode the data and to retain the global distribution. When we know the estimators to use to normalize the data, we can easily un-normalize the data and come back to the original distribution. @@ -12,7 +12,7 @@ There are three main normalization techniques: - **Z-Score:** We reduce and center the feature values using the average and standard deviation. This normalization is sensitive to outliers. - **Robust Z-Score:** We reduce and center the feature values using the median and the median absolute deviation. This normalization is robust to outliers. -- **Min-Max:** We reduce the feature values by using a bijection to ``[0,1]``. The max will reach 1 and the min will reach 0. This normalization is robust to outliers. +- **Min-Max:** We reduce the feature values by using a bijection to `[0,1]`. The max will reach 1 and the min will reach 0. This normalization is robust to outliers. To demonstrate data normalization in VerticaPy, we will use the well-known 'Titanic' dataset. @@ -53,7 +53,7 @@ Let's look at the 'fare' and 'age' of the passengers. .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_table_norm_2.html -These lie in different numerical intervals so it's probably a good idea to normalize them. To normalize data in VerticaPy, we can use the ``normalize`` method. +These lie in different numerical intervals so it's probably a good idea to normalize them. To normalize data in VerticaPy, we can use the :py:func:`~verticapy.vDataFrame.normalize` method. .. ipython:: python @@ -80,4 +80,4 @@ The three main normalization techniques are available. Let's normalize the 'fare .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_table_norm_3.html -Both of the features now scale in ``[0,1]``. It is also possible to normalize by a specific partition with the ``by`` parameter. \ No newline at end of file +Both of the features now scale in `[0,1]`. It is also possible to normalize by a specific partition with the `by` parameter. \ No newline at end of file diff --git a/docs/source/user_guide_data_preparation_outliers.rst b/docs/source/user_guide_data_preparation_outliers.rst index 4d3eb2bb3..630d04eac 100644 --- a/docs/source/user_guide_data_preparation_outliers.rst +++ b/docs/source/user_guide_data_preparation_outliers.rst @@ -14,7 +14,7 @@ Outliers consist of three main types: - **Contextual Outliers:** Values deviate significantly from the rest of the data points in the same context. - **Collective Outliers:** Values that aren't global or contextual outliers, but as a collection deviate significantly from the entire dataset. -Global outliers are often the most critical type and can add a significant amount of bias into the data. Fortunately, we can easily identify these outliers by computing the ``Z-Score``. +Global outliers are often the most critical type and can add a significant amount of bias into the data. Fortunately, we can easily identify these outliers by computing the `Z-Score`. Let's look at some examples using the `Heart Disease `_ dataset. This dataset contains information on patients who are likely to have heart-related complications. @@ -56,7 +56,7 @@ Let's focus on a patient's maximum heart rate (thalach) and the cholesterol (cho .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_dp_plot_outliers_2.html -We can see some outliers of the distribution: people with high cholesterol and others with a very low heart rate. 
Let's compute the global outliers using the ``outlier`` method.
+We can see some outliers of the distribution: people with high cholesterol and others with a very low heart rate. Let's compute the global outliers using the :py:func:`~verticapy.vDataFrame.outlier` method.
 
 .. code-block:: python
 
@@ -76,7 +76,7 @@ We can see some outliers of the distribution: people with high cholesterol and o
 .. raw:: html
     :file: /project/data/VerticaPy/docs/figures/ug_dp_plot_outliers_3.html
 
-It is also possible to draw an outlier plot using the ``outliers_plot`` method.
+It is also possible to draw an outlier plot using the :py:func:`~verticapy.vDataFrame.outliers_plot` method.
 
 .. code-block:: python
 
@@ -94,9 +94,9 @@ It is also possible to draw an outlier plot using the ``outliers_plot`` method.
 .. raw:: html
     :file: /project/data/VerticaPy/docs/figures/ug_dp_plot_outliers_4.html
 
-We've detected some global outliers in the distribution and we can impute these with the ``fill_outliers`` method.
+We've detected some global outliers in the distribution and we can impute these with the :py:func:`~verticapy.vDataFrame.fill_outliers` method.
 
-Generally, you can identify global outliers with the ``Z-Score``. We'll consider a ``Z-Score`` greater than 3 indicates that the datapoint is an outlier. Some less precise techniques consider the data points belonging in the first and last alpha-quantile as outliers. You're free to choose either of these strategies when filling outliers.
+Generally, you can identify global outliers with the `Z-Score`. We'll consider that a `Z-Score` greater than 3 indicates the data point is an outlier. Some less precise techniques consider the data points belonging in the first and last alpha-quantile as outliers. You're free to choose either of these strategies when filling outliers.
 
 .. code-block:: python
 
@@ -140,7 +140,7 @@ Generally, you can identify global outliers with the ``Z-Score``. We'll consider
 .. raw:: html
     :file: /project/data/VerticaPy/docs/figures/ug_dp_plot_outliers_5.html
 
-Other techniques like ``DBSCAN`` or local outlier factor (``LOF``) can be to used to check other data points for outliers.
+Other techniques like :py:func:`~verticapy.machine_learning.vertica.DBSCAN` or local outlier factor (`LOF`) can be used to check other data points for outliers.
 
 .. code-block:: python
 
@@ -192,7 +192,7 @@ Other techniques like ``DBSCAN`` or local outlier factor (``LOF``) can be to use
 .. raw:: html
     :file: /project/data/VerticaPy/docs/figures/ug_dp_plot_outliers_7.html
 
-While ``DBSCAN`` identifies outliers when computing the clusters, ``LOF`` computes an outlier score. Generally, a ``LOF`` Score greater than 1.5 indicates an outlier.
+While :py:func:`~verticapy.machine_learning.vertica.DBSCAN` identifies outliers when computing the clusters, `LOF` computes an outlier score. Generally, a `LOF` score greater than 1.5 indicates an outlier.
 
 .. code-block:: python
 
@@ -244,4 +244,4 @@ While ``DBSCAN`` identifies outliers when computing the clusters, ``LOF`` comput
 .. raw:: html
     :file: /project/data/VerticaPy/docs/figures/ug_dp_plot_outliers_9.html
 
-We have many other techniques like the ``k-means`` clustering for finding outliers, but the most important method is using the ``Z-Score``.
+We have many other techniques like the `k-means` clustering for finding outliers, but the most important method is using the `Z-Score`. 
After identifying outliers, we just have to decide how to impute the missing values. We'll focus on missing values in the next lesson. \ No newline at end of file diff --git a/docs/source/user_guide_full_stack_dblink_integration.rst b/docs/source/user_guide_full_stack_dblink_integration.rst index 0a6f71d9f..17ffb71b2 100644 --- a/docs/source/user_guide_full_stack_dblink_integration.rst +++ b/docs/source/user_guide_full_stack_dblink_integration.rst @@ -204,7 +204,7 @@ For the above examples, the queries were pushed to the external database. If the function is unique to Vertica, it automatically fetches the data from the external database to compute on the Vertica server. -Let's try an example with the :py:func:`verticapy.vDataFrame.describe` function, which is a +Let's try an example with the :py:func:`~verticapy.vDataFrame.describe` function, which is a unique Vertica function. .. code-block:: python @@ -627,8 +627,7 @@ Pandas.DataFrame The joins also work with pandas.Dataframe. We can perform the same query that required multiple joins, but now with a local Pandas dataframe. -We can read a local passengers CSV file using :py:func:`verticapy.read_csv` -or we could create an artificial dataset as well. +We can read a local passengers CSV file using :py:func:`~verticapy.read_csv` or we could create an artificial dataset as well. .. code-block:: python diff --git a/docs/source/user_guide_full_stack_geopandas.rst b/docs/source/user_guide_full_stack_geopandas.rst index c2a0a01f4..700d36c15 100644 --- a/docs/source/user_guide_full_stack_geopandas.rst +++ b/docs/source/user_guide_full_stack_geopandas.rst @@ -4,7 +4,7 @@ Integrating with GeoPandas =========================== -As of version 0.4.0, VerticaPy features GeoPandas integration. This allows you to easily export a ``vDataFrame`` as a GeoPandas DataFrame, giving you more control over geospatial data. +As of version 0.4.0, VerticaPy features GeoPandas integration. This allows you to easily export a :py:mod:`vDataFrame` as a GeoPandas DataFrame, giving you more control over geospatial data. This example demonstrates the advantages of GeoPandas integration with the 'world' dataset. @@ -35,7 +35,7 @@ This example demonstrates the advantages of GeoPandas integration with the 'worl .. raw:: html :file: SPHINX_DIRECTORY/figures/ug_fs_table_gpd_1.html -The ``apply`` function of the VerticaPy stats module allows you to apply any Vertica function to the data. Let's compute the area of each country. +The :py:func:`~verticapy.vDataFrame.apply` function of the VerticaPy stats module allows you to apply any Vertica function to the data. Let's compute the area of each country. .. code-block:: python @@ -110,7 +110,7 @@ From there, we can draw any geospatial object. ax = ax, ) -You can also draw maps using the ``geo_plot`` method. +You can also draw maps using the :py:func:`~verticapy.vDataFrame.geo_plot` method. .. ipython:: python :okwarning: diff --git a/docs/source/user_guide_full_stack_linear_regression.rst b/docs/source/user_guide_full_stack_linear_regression.rst index e8a5ac67c..2235130bc 100644 --- a/docs/source/user_guide_full_stack_linear_regression.rst +++ b/docs/source/user_guide_full_stack_linear_regression.rst @@ -382,7 +382,7 @@ Let's draw a residual plot. .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_fs_plot_lr_13.html -We see a high heteroscedasticity, indicating that we can't trust the ``p-value`` of the coefficients. +We see a high heteroscedasticity, indicating that we can't trust the `p-value` of the coefficients. .. 
ipython:: python @@ -438,7 +438,7 @@ Example with decomposition Let's look at the same dataset, but use decomposition techniques to filter out unimportant information. We don't have to normalize our data or look at correlations with these types of methods. -We'll begin by repeating the data preparation process of the previous section and export the resulting ``vDataFrame`` to Vertica. +We'll begin by repeating the data preparation process of the previous section and export the resulting :py:mod:`vDataFrame` to Vertica. .. code-block:: ipython @@ -526,7 +526,7 @@ We'll begin by repeating the data preparation process of the previous section an .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_fs_table_lr_16.html -Let's create our principal component analysis (``PCA``) model. +Let's create our principal component analysis (:py:func:`~verticapy.machine_learning.vertica.PCA`) model. .. code-block:: ipython @@ -560,7 +560,7 @@ Let's create our principal component analysis (``PCA``) model. .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_fs_table_lr_17.html -We can verify the Gauss-Markov assumptions with our ``PCA`` model. +We can verify the Gauss-Markov assumptions with our :py:func:`~verticapy.machine_learning.vertica.PCA` model. .. code-block:: python @@ -610,4 +610,4 @@ As you can see, we've created a much more accurate model here than in our first Conclusion ----------- -We've seen two techniques that can help us create powerful linear regression models. While the first method normalized the data and looked for correlations, the second method applied a ``PCA`` model. The second one allows us to confirm the Gauss-Markov assumptions - an essential part of using linear models. \ No newline at end of file +We've seen two techniques that can help us create powerful linear regression models. While the first method normalized the data and looked for correlations, the second method applied a :py:func:`~verticapy.machine_learning.vertica.PCA` model. The second one allows us to confirm the Gauss-Markov assumptions - an essential part of using linear models. \ No newline at end of file diff --git a/docs/source/user_guide_full_stack_to_json.rst b/docs/source/user_guide_full_stack_to_json.rst index 18aea4964..098bf77a3 100644 --- a/docs/source/user_guide_full_stack_to_json.rst +++ b/docs/source/user_guide_full_stack_to_json.rst @@ -7,7 +7,6 @@ Example: XGBoost.to_json Connect to Vertica -------------------- - For a demonstration on how to create a new connection to Vertica, see :ref:`connection`. In this example, we will use an existing connection named 'VerticaDSN'. @@ -17,19 +16,13 @@ existing connection named 'VerticaDSN'. import verticapy as vp vp.connect("VerticaDSN") - Create a Schema (Optional) --------------------------- +Schemas allow you to organize database objects in a collection, similar to a namespace. If you create a database object +without specifying a schema, Vertica uses the 'public' schema. For example, to specify the 'example_table' in 'example_schema', you would use: 'example_schema.example_table'. -Schemas allow you to organize database objects in a collection, -similar to a namespace. If you create a database object -without specifying a schema, Vertica uses the 'public' -schema. For example, to specify the 'example_table' in 'example_schema', -you would use: 'example_schema.example_table'. - -To keep things organized, this example creates the 'xgb_to_json' -schema and drops it (and its associated tables, views, etc.) 
at the end: +To keep things organized, this example creates the 'xgb_to_json' schema and drops it (and its associated tables, views, etc.) at the end: .. ipython:: python :suppress: @@ -57,7 +50,7 @@ For a full list, check out :ref:`datasets`. You can also load your own data. To ingest data from a CSV file, -use the :py:func:`verticapy.read_csv` function. +use the :py:func:`~verticapy.read_csv` function. Create a vDataFrame -------------------- @@ -73,14 +66,14 @@ To create a vDataFrame out of a table in your Vertica database, specify its sche Create an XGB model ------------------- -Create a :py:func:`verticapy.machine_learning.vertica.ensemble.XGBClassifier` model. +Create a :py:func:`~verticapy.machine_learning.vertica.ensemble.XGBClassifier` model. Unlike a vDataFrame object, which simply queries the table it -was created with, the VerticaPy :py:func:`verticapy.machine_learning.vertica.ensemble.XGBClassifier` object creates +was created with, the VerticaPy :py:func:`~verticapy.machine_learning.vertica.ensemble.XGBClassifier` object creates and then references a model in Vertica, so it must be stored in a schema like any other database object. -This example creates the 'my_model' :py:func:`verticapy.machine_learning.vertica.ensemble.XGBClassifier` model in +This example creates the 'my_model' :py:func:`~verticapy.machine_learning.vertica.ensemble.XGBClassifier` model in the 'xgb_to_json' schema: This example loads the Titanic dataset with the load_titanic function @@ -98,16 +91,11 @@ into a table called 'titanic' in the 'xgb_to_json' schema: Prepare the Data ----------------- -While Vertica XGBoost supports columns of type VARCHAR, -Python XGBoost does not, so you must encode the categorical +While Vertica XGBoost supports columns of type VARCHAR, Python XGBoost does not, so you must encode the categorical columns you want to use. You must also drop or impute missing values. -This example drops 'age,' 'fare,' 'sex,' 'embarked,' and -'survived' columns from the vDataFrame and then encodes the -'sex' and 'embarked' columns. These changes are applied to -the vDataFrame's query and does not affect the main -"xgb_to_json.titanic' table stored in Vertica: +This example drops 'age', 'fare', 'sex', 'embarked', and 'survived' columns from the vDataFrame and then encodes the 'sex' and 'embarked' columns. These changes are applied to the vDataFrame's query and do not affect the main 'xgb_to_json.titanic' table stored in Vertica: .. ipython:: python @@ -129,8 +117,6 @@ the vDataFrame's query and does not affect the main .. raw:: html :file: /project/data/VerticaPy/docs/figures/ug_fs_to_json_vdf.html - - Split your data into training and testing: .. ipython:: python @@ -158,7 +144,7 @@ Train the model with fit(): Evaluate the Model -------------------- -Evaluate the model with ``.report()``: +Evaluate the model with :py:func:`~verticapy.machine_learning.vertica.ensemble.XGBClassifier.report`: .. code-block:: ipython @@ -189,16 +175,12 @@ To export and save the model as a JSON file, specify a filename: model.to_json("exported_xgb_model.json"); -Unlike Python XGBoost, Vertica does not store some information like -'sum_hessian' or 'loss_changes,' and the exported model from -``to_json()`` replaces this information with a list of zeroes These information are replaced by a list filled with zeros.
+Unlike Python XGBoost, Vertica does not store some information like 'sum_hessian' or 'loss_changes,' and the exported model from :py:func:`~verticapy.machine_learning.vertica.ensemble.XGBClassifier.to_json` replaces this information with a list of zeroes. Make Predictions with an Exported Model ---------------------------------------- -This exported model can be used with the Python XGBoost API right away, -and exported models make identical predictions in Vertica and Python: +This exported model can be used with the Python XGBoost API right away, and exported models make identical predictions in Vertica and Python: .. ipython:: python @@ -222,7 +204,7 @@ Clean the Example Environment Drop the 'xgb_to_json' schema, using CASCADE to drop any database objects stored inside (the 'titanic' table, the -:py:func:`verticapy.machine_learning.vertica.ensemble.XGBClassifier` +:py:func:`~verticapy.machine_learning.vertica.ensemble.XGBClassifier` model, etc.), then delete the 'exported_xgb_model.json' file: .. ipython:: python diff --git a/docs/source/user_guide_full_stack_train_test_split.rst b/docs/source/user_guide_full_stack_train_test_split.rst index b48908417..7474509df 100644 --- a/docs/source/user_guide_full_stack_train_test_split.rst +++ b/docs/source/user_guide_full_stack_train_test_split.rst @@ -6,7 +6,7 @@ Train Test Split Before you test a supervised model, you'll need separate, non-overlapping sets for training and testing. -In VerticaPy, the ``train_test_split`` method uses a random number generator to decide how to split the data. +In VerticaPy, the :py:func:`~verticapy.vDataFrame.train_test_split` method uses a random number generator to decide how to split the data. .. code-block:: ipython @@ -29,7 +29,7 @@ In VerticaPy, the ``train_test_split`` method uses a random number generator to .. raw:: html :file: SPHINX_DIRECTORY/figures/ug_fs_table_tts_1.html -The ``SEEDED_RANDOM`` function chooses a number in the interval ``[0,1)``. Since the seed is user-provided, these results are reproducible. In this example, passing '0' as the seed always returns the same value. +The ``SEEDED_RANDOM`` function chooses a number in the interval `[0,1)`. Since the seed is user-provided, these results are reproducible. In this example, passing '0' as the seed always returns the same value. .. code-block:: ipython @@ -67,7 +67,7 @@ A different seed will generate a different value. .. raw:: html :file: SPHINX_DIRECTORY/figures/ug_fs_table_tts_3.html -The ``train_test_split`` function generates a random seed and we can then share that seed between the training and testing sets. +The :py:func:`~verticapy.vDataFrame.train_test_split` function generates a random seed and we can then share that seed between the training and testing sets. .. ipython:: python @@ -88,7 +88,7 @@ The ``train_test_split`` function generates a random seed and we can then share test.shape() -Note that ``SEEDED_RANDOM`` depends on the order of your data. That is, if your data isn't sorted by a unique feature, the selected data might be inconsistent. To avoid this, we'll want to use the ``order_by`` parameter. +Note that ``SEEDED_RANDOM`` depends on the order of your data. That is, if your data isn't sorted by a unique feature, the selected data might be inconsistent. To avoid this, we'll want to use the `order_by` parameter. .. ipython:: python @@ -104,7 +104,7 @@ Let's create a model and evaluate it.
model = LinearRegression() -When fitting the model with the ``fit`` function, you can use the parameter ``test_relation`` to score your data on a specific relation. +When fitting the model with the :py:func:`~verticapy.machine_learning.vertica.LinearRegression.fit` method, you can use the parameter `test_relation` to score your data on a specific relation. .. ipython:: python diff --git a/docs/source/user_guide_full_stack_vdataframe_magic.rst b/docs/source/user_guide_full_stack_vdataframe_magic.rst index 9aa388385..5d4df9248 100644 --- a/docs/source/user_guide_full_stack_vdataframe_magic.rst +++ b/docs/source/user_guide_full_stack_vdataframe_magic.rst @@ -4,7 +4,7 @@ The 'Magic' Methods of the vDataFrame ====================================== -VerticaPy 0.3.2 introduces the 'Magic' methods, which offer some additional flexilibility for mathematical operations in the ``vDataFrame``. These methods let you handle many operations in a 'pandas-like' or Pythonic style. +VerticaPy 0.3.2 introduces the 'Magic' methods, which offer some additional flexibility for mathematical operations in the :py:mod:`vDataFrame`. These methods let you handle many operations in a 'pandas-like' or Pythonic style. .. code-block:: ipython @@ -239,7 +239,7 @@ You can now filter your data with conditional operators like and ('&'), or ('|') 'Pythonic' Invokation of Vertica Functions ------------------------------------------- -You can easily apply Vertica functions to your ``vDataFrame``. Here, we use Vertica's COALESCE function to impute the 'age' of the passengers in our dataset. +You can easily apply Vertica functions to your :py:mod:`vDataFrame`. Here, we use Vertica's COALESCE function to impute the 'age' of the passengers in our dataset. .. code-block:: ipython @@ -257,7 +257,7 @@ You can easily apply Vertica functions to your ``vDataFrame``. Here, we use Vert Slicing the vDataFrame ----------------------- -You can now slice the ``vDataFrame`` with indexing operators. +You can now slice the :py:mod:`vDataFrame` with indexing operators. .. code-block:: ipython diff --git a/docs/source/user_guide_introduction_best_practices.rst b/docs/source/user_guide_introduction_best_practices.rst index 401ecc02d..66cb6eb35 100644 --- a/docs/source/user_guide_introduction_best_practices.rst +++ b/docs/source/user_guide_introduction_best_practices.rst @@ -10,9 +10,9 @@ Restrict objects and operations to essential columns As VerticaPy is effectively an abstraction of SQL, any database-level optimizations you make in your Vertica database carry over to VerticaPy. In Vertica, optimization is centered on projections, which are collections of table columns—from one or more tables—stored on disk in a format that optimizes query execution. When you write queries in terms of the original tables, the query uses the projections to return query results. For details about creating and designing projections, see the Projections section in the Vertica documentation. -Projections are created and managed in the Vertica database, but you can leverage the power of projections in VerticaPy with features such as the `vDataFrame`'s usecols parameter, which specifies the columns from the input relation to include in the `vDataFrame`. As columnar databases perform better when there are fewer columns in the query, especially when you are working with large datasets, limiting `vDataFrame` and operations to essential columns can lead to a significant performance improvement.
By default, most `vDataFrame` methods use all numerical columns in the `vDataFrame`, but you can restrict the operation to specific columns. +Projections are created and managed in the Vertica database, but you can leverage the power of projections in VerticaPy with features such as the :py:mod:`vDataFrame`'s usecols parameter, which specifies the columns from the input relation to include in the :py:mod:`vDataFrame`. As columnar databases perform better when there are fewer columns in the query, especially when you are working with large datasets, limiting :py:mod:`vDataFrame` and operations to essential columns can lead to a significant performance improvement. By default, most :py:mod:`vDataFrame` methods use all numerical columns in the :py:mod:`vDataFrame`, but you can restrict the operation to specific columns. -In the following examples, we'll demonstrate how to create a `vDataFrame` from specific columns in the input relation, and then run methods on that `vDataFrame`. First, load the titanic dataset into Vertica using the :py:func:`~verticapy.datasets.load_titanic` function: +In the following examples, we'll demonstrate how to create a `vDataFrame` from specific columns in the input relation, and then run methods on that :py:mod:`vDataFrame`. First, load the titanic dataset into Vertica using the :py:func:`~verticapy.datasets.load_titanic` function: .. code-block:: python @@ -78,7 +78,7 @@ To turn off the SQL code generation option: # Turning off SQL. vp.set_option("sql_on", False) -To restrict the operation to specific columns in the ``vDataFrame``, provide the column names in the `columns` parameter: +To restrict the operation to specific columns in the :py:mod:`vDataFrame`, provide the column names in the `columns` parameter: .. code-block:: python @@ -105,7 +105,7 @@ Instead of specifying essential columns to include, some methods allow you to li .. note:: - To list all columns in a ``vDataFrame``, including non-numerical columns, use the :py:func:`~verticapy.vDataFrame.get_columns` method. + To list all columns in a :py:mod:`vDataFrame`, including non-numerical columns, use the :py:func:`~verticapy.vDataFrame.get_columns` method. You can then use this truncated list of columns in another method call; for instance, to compute a correlation matrix: @@ -126,12 +126,12 @@ You can then use this truncated list of columns in another method call; for inst Save the current relation -------------------------- -The ``vDataFrame`` works like a `view`, a stored query that encapsulates one or more SELECT statements. +The :py:mod:`vDataFrame` works like a `view`, a stored query that encapsulates one or more SELECT statements. If the generated relation uses many different functions, the computation time for each method call is greatly increased. Small transformations don't drastically slow down computation, but heavy transformations (multiple joins, frequent use of advanced analytical funcions, moving windows, etc.) can result in noticeable slowdown. When performing computationally expensive operations, you can aid performance by saving the vDataFrame structure as a table in the Vertica database. We will demonstrate this process in the following example. -First, create a ``vDataFrame``, then perform some operations on that `vDataFrame`: +First, create a :py:mod:`vDataFrame`, then perform some operations on that :py:mod:`vDataFrame`: .. 
code-block:: python @@ -162,13 +162,13 @@ To understand how Vertica executes the different aggregations in the above relat Looking at the plan and its associated relation, it's clear that the transformations we applied to the vDataFrame result in a complicated relation. -Each method call to the ``vDataFrame`` must use this relation for computation. +Each method call to the :py:mod:`vDataFrame` must use this relation for computation. .. note:: To better understand your queries, check out the :ref:`~verticapy.performance.vertica.qprof.QueryProfiler` function. -To save the relation as a table in the Vertica and replace the current relation in VerticaPy with the new table relation, use the ``to_db()`` method with the `inplace` parameter set to True: +To save the relation as a table in Vertica and replace the current relation in VerticaPy with the new table relation, use the :py:func:`~verticapy.vDataFrame.to_db` method with the `inplace` parameter set to True: .. code-block:: python @@ -203,7 +203,7 @@ When dealing with very large datasets, it's best to take caution before saving r Use the help function ---------------------- -For a quick and convenient way to view information about an object or function, use the `help()` function: +For a quick and convenient way to view information about an object or function, use the :py:func:`help` function: .. ipython:: python @@ -406,13 +406,13 @@ To monitor how VerticaPy is computing the aggregations, use the :py:func:`~verti VerticaPy allows you to send multiple queries, either iteratively or concurrently, to the database when computing aggregations. -First, let's send a single query to compute the average for all columns in the ``vDataFrame``: +First, let's send a single query to compute the average for all columns in the :py:mod:`vDataFrame`: .. ipython:: python display(vdf.avg(ncols_block = 20)) -We see that there was one SELECT query for all columns in the `vDataFrame`. +We see that there was one SELECT query for all columns in the :py:mod:`vDataFrame`. You can reduce the impact on the system by using the `ncols_block` parameter to split the computation into multiple iterative queries, where the value of the parameter is the number of columns included in each query. For example, setting `ncols_block` to 5 will split the computation, which consists of 20 total columns, into 4 separate queries, each of which computes the average for 5 columns: diff --git a/docs/source/user_guide_introduction_installation.rst b/docs/source/user_guide_introduction_installation.rst index 8a07f51b9..780922d45 100644 --- a/docs/source/user_guide_introduction_installation.rst +++ b/docs/source/user_guide_introduction_installation.rst @@ -15,12 +15,12 @@ Before connecting to a database, you must satisfy the following requirements: - Install VerticaPy on your machine For more information about these installations, -see :ref:`gettting_started`. +see :ref:`getting_started`. Connect to a DB ---------------- -To connect to a database for the first time, use the :py:func:`verticapy.new_connection` function, replacing the configuration values with the credentials for your database: +To connect to a database for the first time, use the :py:func:`~verticapy.new_connection` function, replacing the configuration values with the credentials for your database: ..
code-block:: python @@ -43,13 +43,13 @@ To connect to a database for the first time, use the :py:func:`verticapy.new_con import verticapy as vp The connection is saved to the VerticaPy connection file under the name specified in the name parameter. To reconnect to -the database using this connection, run the :py:func:`verticapy.connect` function with the name of the connection as the argument value: +the database using this connection, run the :py:func:`~verticapy.connect` function with the name of the connection as the argument value: .. code-block:: python vp.connect("Vertica_Connection") -To view all available connections, use :py:func:`verticapy.available_connection`. +To view all available connections, use :py:func:`~verticapy.available_connection`. .. code-block:: python @@ -63,7 +63,7 @@ If you need to confirm the parameters for a given function, you can also use the help(vp.new_connection) -For an interactive start guide, you can use the ``help_start()`` function: +For an interactive start guide, you can use the :py:func:`~verticapy.help_start()` function: .. code-block:: python diff --git a/docs/source/user_guide_machine_learning_introduction.rst b/docs/source/user_guide_machine_learning_introduction.rst index d1b33b125..6f95449c4 100644 --- a/docs/source/user_guide_machine_learning_introduction.rst +++ b/docs/source/user_guide_machine_learning_introduction.rst @@ -100,7 +100,7 @@ When we have more than two categories, we use the expression 'Multiclass Classif Unsupervised Learning ---------------------- -These algorithms are to used to segment the data (``k-means``, ``DBSCAN``, etc.) or to detect anomalies (``Local Outlier Factor``, ``Z-Score`` Techniques...). In particular, they're useful for finding patterns in data without labels. For example, let's use a k-means algorithm to create different clusters on the Iris dataset. Each cluster will represent a flower's species. +These algorithms are used to segment the data (``k-means``, :py:func:`~verticapy.machine_learning.vertica.DBSCAN`, etc.) or to detect anomalies (:py:func:`~verticapy.machine_learning.vertica.LocalOutlierFactor`, ``Z-Score`` techniques...). In particular, they're useful for finding patterns in data without labels. For example, let's use a k-means algorithm to create different clusters on the Iris dataset. Each cluster will represent a flower's species. .. code-block:: python diff --git a/docs/source/user_guide_machine_learning_model_tracking.rst b/docs/source/user_guide_machine_learning_model_tracking.rst index bdb7a3e83..aedd51034 100644 --- a/docs/source/user_guide_machine_learning_model_tracking.rst +++ b/docs/source/user_guide_machine_learning_model_tracking.rst @@ -27,7 +27,7 @@ The following example demonstrates how the model tracking feature can be used fo predictors = ["age", "fare", "pclass"] response = "survived" -We then define a ``vExperiment`` object to track the candidate models. To define the experiment object, specify the following parameters: +We then define a :py:func:`~verticapy.mlops.model_tracking.vExperiment` object to track the candidate models. To define the experiment object, specify the following parameters: - experiment_name: The name of the experiment. - test_relation: Relation or vDF to use to test the model.
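As a rough, hypothetical sketch of how these pieces fit together (the ``X`` and ``y`` arguments, the ``add_model`` method, and the ``train``/``test`` relations are assumptions to verify against the model-tracking reference; the import paths and the ``predictors``/``response`` variables come from the surrounding example):

.. code-block:: python

    from verticapy.mlops.model_tracking import vExperiment
    from verticapy.machine_learning.vertica import LogisticRegression

    # Hypothetical sketch: define the experiment that will track candidate models.
    my_experiment_1 = vExperiment(
        experiment_name = "titanic_lr_experiment",  # assumed demo name
        test_relation = test,                       # relation used to score candidates (assumed variable)
        X = predictors,
        y = response,
    )

    # Train a candidate model, then add it so its test metrics are tracked.
    model = LogisticRegression("lr_max_iter_100", max_iter = 100)
    model.fit(train, predictors, response)
    my_experiment_1.add_model(model)

The ``load_best_model`` and ``drop`` calls shown below operate on this same experiment object.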
@@ -104,7 +104,7 @@ So far we have only added three models to the experiment, but we could add many top_model = my_experiment_1.load_best_model(metric = "auc") The experiment object facilitates not only model tracking but also makes cleanup super easy, especially in real-world -scenarios where there is often a large number of leftover models. The ``drop`` method drops from the database the info of the experiment and all associated models other than those specified in the keeping_models list. +scenarios where there is often a large number of leftover models. The :py:func:`~verticapy.mlops.model_tracking.vExperiment.drop` method removes the experiment's information from the database, along with all associated models other than those specified in the keeping_models list. .. ipython:: python :okwarning: @@ -112,7 +112,7 @@ scenarios where there is often a large number of leftover models. The ``drop`` m my_experiment_1.drop(keeping_models=[top_model.model_name]) Experiments are also helpful for performing grid search on hyper-parameters. The following example shows how they can -be used to study the impact of the max_iter parameter on the prediction performance of ``LogisticRegression`` models. +be used to study the impact of the max_iter parameter on the prediction performance of :py:func:`~verticapy.machine_learning.vertica.LogisticRegression` models. .. ipython:: python :okwarning: @@ -150,9 +150,9 @@ To showcase model versioning, we will begin by registering the ``top_model`` pic top_model.register("top_model_demo") -When the model owner registers the model, its ownership changes to ``DBADMIN``, and the previous owner receives ``USAGE`` privileges. Registered models are referred to by their registered_name and version. Only DBADMIN or a user with the MLSUPERVISOR role can change the status of a registered model. We have provided the ``RegisteredModel`` class in VerticaPy for working with registered models. +When the model owner registers the model, its ownership changes to ``DBADMIN``, and the previous owner receives ``USAGE`` privileges. Registered models are referred to by their registered_name and version. Only DBADMIN or a user with the MLSUPERVISOR role can change the status of a registered model. We have provided the :py:func:`~verticapy.mlops.model_versioning.RegisteredModel` class in VerticaPy for working with registered models. -We will now make a ``RegisteredModel`` object for our recently registered model and change its status to "production". We can then use the registered model for scoring. +We will now make a :py:func:`~verticapy.mlops.model_versioning.RegisteredModel` object for our recently registered model and change its status to "production". We can then use the registered model for scoring. .. ipython:: python diff --git a/docs/source/user_guide_machine_learning_time_series.rst b/docs/source/user_guide_machine_learning_time_series.rst index 55ad5155c..0dfddf733 100644 --- a/docs/source/user_guide_machine_learning_time_series.rst +++ b/docs/source/user_guide_machine_learning_time_series.rst @@ -83,7 +83,7 @@ To help visualize the seasonality of forest fires, we'll draw some autocorrelati Forest fires follow a predictable, seasonal pattern, so it should be easy to predict future forest fires with past data. -VerticaPy offers several models, including a multiple time series model. For this example, let's use a ARIMA model. +VerticaPy offers several models, including a multiple time series model.
For this example, let's use an :py:func:`~verticapy.machine_learning.vertica.ARIMA` model. .. ipython:: python @@ -96,7 +96,7 @@ VerticaPy offers several models, including a multiple time series model. For thi ts = "date", ) -Just like with other regression models, we'll evaluate our model with the ``report()`` method. +Just like with other regression models, we'll evaluate our model with the :py:func:`~verticapy.machine_learning.vertica.ARIMA.report` method. .. code-block::