corrections 2
oualib committed Oct 23, 2024

1 parent 512900d commit d593c2a
Showing 42 changed files with 151 additions and 185 deletions.
3 changes: 0 additions & 3 deletions docs/source/examples.rst
@@ -1,12 +1,9 @@
.. _examples:


============
Examples
============



.. grid:: 1 1 2 2

.. grid-item::
4 changes: 2 additions & 2 deletions docs/source/examples_business_africa_education.rst
@@ -260,7 +260,7 @@ Eight seems to be a suitable number of clusters. Let's compute a ``k-means`` mod
model = KMeans(n_cluster = 8)
model.fit(africa, X = ["lon", "lat"])
We can add the prediction to the ``vDataFrame`` and draw the scatter map.
We can add the prediction to the :py:mod:`vDataFrame` and draw the scatter map.
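A minimal sketch of what that step can look like, assuming the ``model`` and ``africa`` objects defined above (the ``"cluster"`` column name is our own choice, not the example's actual code):

.. code-block:: python

    # Write the cluster assignment into a new column of the vDataFrame,
    # then draw a geographic scatter plot colored by cluster.
    model.predict(africa, X = ["lon", "lat"], name = "cluster")
    africa.scatter(["lon", "lat"], by = "cluster")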


.. code-block:: python
@@ -501,7 +501,7 @@ Let's look at the feature importance for each model.

Feature importance for the math score and the reading score is almost identical.

We can add these predictions to the main ``vDataFrame``.
We can add these predictions to the main :py:mod:`vDataFrame`.
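As a rough sketch of that step (the model and column names below, ``model_math``, ``model_reading``, ``pred_math``, and ``pred_reading``, are placeholders, not the example's actual identifiers):

.. code-block:: python

    # Compare the two feature-importance charts, then write each model's
    # prediction back into the main vDataFrame as a new column.
    model_math.features_importance()
    model_reading.features_importance()
    model_math.predict(africa, name = "pred_math")
    model_reading.predict(africa, name = "pred_reading")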

.. code-block:: python
18 changes: 5 additions & 13 deletions docs/source/examples_business_battery.rst
@@ -20,11 +20,7 @@ Dataset
++++++++

In this example of **predictive maintenance**, we propose a data-driven method
to estimate the health of a battery using the
`Li-ion battery dataset <https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/>`_
released by NASA.


to estimate the health of a battery using the `Li-ion battery dataset <https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/>`_ released by NASA.

This dataset includes information on Li-ion batteries over several charge
and discharge cycles at room temperature. Charging was at a constant current
@@ -87,8 +83,7 @@ Let us now ingest the data.
Understanding the Data
-----------------------

Let's examine our data. Here, we use `vDataFrame.head()`
to retrieve the first five rows of the dataset.
Let's examine our data. Here, we use :py:func:`~verticapy.vDataFrame.head` to retrieve the first five rows of the dataset.

.. ipython:: python
:suppress:
@@ -103,7 +98,7 @@ to retrieve the first five rows of the dataset.
:file: /project/data/VerticaPy/docs/figures/examples_battery_table_head.html


Let's perform a few aggregations with `vDataFrame.describe()` to get a high-level overview of the dataset.
Let's perform a few aggregations with :py:func:`~verticapy.vDataFrame.describe` to get a high-level overview of the dataset.
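A minimal sketch of this kind of exploration (``battery`` is a placeholder name for the ingested vDataFrame):

.. code-block:: python

    # Look at the first rows, then compute summary statistics
    # for the numerical columns.
    battery.head(5)
    battery.describe(method = "numerical")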


.. code-block:: python
@@ -567,12 +562,9 @@ and the time needed to reach minimum voltage and maximum temperature.
Machine Learning
-----------------

AutoML tests several models and returns input scores for each. We can use this to find the best model for our dataset.

AutoML tests several models and returns input
scores for each. We can use this to find the best model for our dataset.

.. note:: We are only using the three algorithms, but you can change the `estiamtor` parameter to try all the 'native' algorithms.
``estiamtor = 'native' ``
.. note:: We are only using the three algorithms, but you can change the `estimator` parameter to try all the 'native' algorithms: ``estimator = 'native'``.
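A hedged sketch of such a run (the vDataFrame name, predictor list, and response column below are illustrative assumptions):

.. code-block:: python

    from verticapy.machine_learning.vertica.automl import AutoML

    # Let AutoML benchmark the Vertica-native algorithms and keep the best one.
    model = AutoML(estimator = "native")
    model.fit(battery, X = predictors, y = "failure")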

.. code-block:: python
12 changes: 6 additions & 6 deletions docs/source/examples_business_booking.rst
@@ -77,7 +77,7 @@ Data Exploration and Preparation

Sessionization is the process of grouping a user's clicks over a period of time. We usually consider that the user session ends after 30 minutes of inactivity (``date_time - lag(date_time) > 30 minutes``). For these kinds of use cases, aggregating sessions with meaningful statistics is the key to making accurate predictions.

We start by using the ``sessionize`` method to create the variable 'session_id'. We can then use this variable to aggregate the data.
We start by using the :py:func:`~verticapy.vDataFrame.sessionize` method to create the variable 'session_id'. We can then use this variable to aggregate the data.
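A minimal sketch of that call, assuming the clickstream vDataFrame is named ``expedia`` and has ``date_time`` and ``user_id`` columns (these names are assumptions):

.. code-block:: python

    # Group clicks into sessions: a new session starts after
    # 30 minutes of inactivity for the same user.
    expedia.sessionize(
        ts = "date_time",
        by = ["user_id"],
        session_threshold = "30 minutes",
        name = "session_id",
    )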

.. code-block:: python
@@ -234,7 +234,7 @@ We can see huge links between some of the variables ('mode_hotel_cluster_count'
Machine Learning
-----------------

Let's create our ``LogisticRegression`` model.
Let's create our :py:func:`~verticapy.machine_learning.vertica.LogisticRegression` model.

.. ipython:: python
@@ -279,7 +279,7 @@ It looks like there are two main predictors: 'mode_hotel_cluster_count' and 'tri
- look for a shorter trip duration.
- not click as much (spend more time at the same web page).

Let's add our prediction to the ``vDataFrame``.
Let's add our prediction to the :py:mod:`vDataFrame`.

.. code-block:: python
@@ -304,7 +304,7 @@ Let's add our prediction to the ``vDataFrame``.
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_expedia_predict_proba_1.html

While analyzing the following boxplot (prediction partitioned by 'is_booking'), we can notice that the ``cutoff`` is around 0.22 because most of the positive predictions have a probability between 0.23 and 0.5. Most of the negative predictions are between 0.05 and 0.2.
While analyzing the following boxplot (prediction partitioned by 'is_booking'), we can notice that the `cutoff` is around 0.22 because most of the positive predictions have a probability between 0.23 and 0.5. Most of the negative predictions are between 0.05 and 0.2.

.. code-block:: python
@@ -320,13 +320,13 @@ While analyzing the following boxplot (prediction partitioned by 'is_booking'),
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_expedia_predict_boxplot_1.html

Let's confirm our hypothesis by computing the best ``cutoff``.
Let's confirm our hypothesis by computing the best `cutoff`.

.. ipython:: python
model_logit.score(metric = "best_cutoff")
Let's look at the efficiency of our model with a cutoff of ``0.22``.
Let's look at the efficiency of our model with a cutoff of 0.22.
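For instance, a hedged sketch of scoring at that cutoff (assuming the fitted ``model_logit`` from above, and that ``score`` accepts a ``cutoff`` argument for classification metrics, as in recent VerticaPy versions):

.. code-block:: python

    # Evaluate the classifier when probabilities above 0.22 are labeled positive.
    model_logit.score(metric = "accuracy", cutoff = 0.22)
    model_logit.score(metric = "f1", cutoff = 0.22)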

.. code-block:: python
2 changes: 1 addition & 1 deletion docs/source/examples_business_churn.rst
@@ -203,7 +203,7 @@ ________
Machine Learning
-----------------

``LogisticRegression`` is a very powerful algorithm and we can use it to detect churns. Let's split our ``vDataFrame`` into training and testing set to evaluate our model.
:py:func:`~verticapy.machine_learning.vertica.LogisticRegression` is a very powerful algorithm and we can use it to detect churns. Let's split our :py:mod:`vDataFrame` into training and testing set to evaluate our model.
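A sketch of that workflow, with placeholder names for the vDataFrame (``churn``), the predictor list, and the response column:

.. code-block:: python

    from verticapy.machine_learning.vertica import LogisticRegression

    # Split the data, train the model, and display a full evaluation report.
    train, test = churn.train_test_split(test_size = 0.2)
    model_churn = LogisticRegression()
    model_churn.fit(train, X = predictors, y = "churned", test_relation = test)
    model_churn.report()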

.. ipython:: python
10 changes: 5 additions & 5 deletions docs/source/examples_business_credit_card_fraud.rst
@@ -328,7 +328,7 @@ We will split the dataset into a train (day 1) and a test (day 2).

A supervised approach would make this pretty easy since it would just be a binary classification problem. We can use different algorithms to optimize the prediction. Our dataset is imbalanced, so the AUC might be a good metric to evaluate the model. The PRC AUC would also be relevant.

``LogisticRegression`` works well with monotonic relationships. Since we have a lot of independent features that correlate with the response, it should be a good first model to use.
:py:func:`~verticapy.machine_learning.vertica.LogisticRegression` works well with monotonic relationships. Since we have a lot of independent features that correlate with the response, it should be a good first model to use.

.. code-block:: python
@@ -398,7 +398,7 @@ Due to the complexity of the computations, anomalies are difficult to detect in

- **Machine Learning:** We need to use easily-deployable algorithms to perform real-time fraud detection. Isolation forests and ``k-means`` can be easily deployed and they work well for detecting anomalies.
- **Rules & Thresholds:** The z-score can be an efficient solution for detecting global outliers.
- **Decomposition:** Robust ``PCA`` is another technique for detecting outliers.
- **Decomposition:** Robust :py:func:`~verticapy.machine_learning.vertica.PCA` is another technique for detecting outliers.

Before using these techniques, let's draw some scatter plots to get a better idea of what kind of anomalies we can expect.

@@ -453,7 +453,7 @@ For the rest of this example, we'll investigate labels and how they can help us

We begin by examining ``k-means`` clustering, which partitions the data into k clusters.

We can use an elbow curve to find a suitable number of clusters. We can then add more clusters then the amount suggested by the ``elbow`` curve to create clusters mainly composed of anomalies. Clusters with relatively fewer elements can then be investigated by an expert to label the anomalies.
We can use an elbow curve to find a suitable number of clusters. We can then add more clusters than the number suggested by the :py:func:`~verticapy.machine_learning.model_selection.elbow` curve to create clusters mainly composed of anomalies. Clusters with relatively fewer elements can then be investigated by an expert to label the anomalies.

From there, we perform the following procedure:

@@ -535,7 +535,7 @@ Notice that clusters with fewer elements tend to contain much more fraudulent e

**Outliers of the distribution**

Let's use the ``z-score`` to detect global outliers of the distribution.
Let's use the ``Z-score`` to detect global outliers of the distribution.
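A minimal sketch of a Z-score based flag (the vDataFrame name, column list, and threshold below are assumptions for illustration):

.. code-block:: python

    # Flag rows whose Z-score exceeds 3 on the selected features.
    data.outliers(
        columns = ["Amount", "V1", "V2"],
        name = "global_outlier",
        threshold = 3.0,
    )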

.. code-block:: python
@@ -635,7 +635,7 @@ We can catch outliers with a neighbors score. Again, the main problem with these

**Other Techniques**

Other scalable techniques that can solve this problem are robust ``PCA`` and isolation forest.
Other scalable techniques that can solve this problem are robust :py:func:`~verticapy.machine_learning.vertica.PCA` and isolation forest.

Conclusion
-----------
6 changes: 3 additions & 3 deletions docs/source/examples_business_football.rst
Original file line number Diff line number Diff line change
@@ -903,7 +903,7 @@ Let's export the result to our Vertica database.
Team Rankings with k-means
---------------------------

To compute a ``k-means`` model, we need to find a value for 'k'. Let's draw an ``elbow`` curve to find a suitable number of clusters.
To compute a ``k-means`` model, we need to find a value for 'k'. Let's draw an :py:func:`~verticapy.machine_learning.model_selection.elbow` curve to find a suitable number of clusters.
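A hedged sketch of that step, assuming the features live in the ``football_clustering`` relation used below and that ``predictors`` is the column list defined earlier in the example:

.. code-block:: python

    from verticapy.machine_learning.model_selection import elbow

    # Plot the within-cluster dispersion for a range of candidate k values.
    elbow("football_clustering", X = predictors, n_cluster = (1, 15))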

.. code-block:: python
@@ -975,7 +975,7 @@ To compute a ``k-means`` model, we need to find a value for 'k'. Let's draw an `
model_kmeans.fit("football_clustering", predictors)
model_kmeans.clusters_
Let's add the prediction to the ``vDataFrame``.
Let's add the prediction to the :py:mod:`vDataFrame`.

.. code-block:: python
@@ -1974,7 +1974,7 @@ Looking at the importance of each feature, it seems like direct confrontations a
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_football_features_importance.html

Let's add the predictions to the ``vDataFrame``.
Let's add the predictions to the :py:mod:`vDataFrame`.

Draws are pretty rare, so we'll only consider them if a tie was very likely to occur.

2 changes: 1 addition & 1 deletion docs/source/examples_business_insurance.rst
@@ -38,7 +38,7 @@ You can skip the below cell if you already have an established connection.
vp.connect("VerticaDSN")
Let's create a new schema and assign the data to a ``vDataFrame`` object.
Let's create a new schema and assign the data to a :py:mod:`vDataFrame` object.

.. code-block:: ipython
10 changes: 5 additions & 5 deletions docs/source/examples_business_movies.rst
@@ -43,7 +43,7 @@ You can skip the below cell if you already have an established connection.
vp.connect("VerticaDSN")
Let's create a new schema and assign the data to a ``vDataFrame`` object.
Let's create a new schema and assign the data to a :py:mod:`vDataFrame` object.

.. code-block:: ipython
@@ -349,7 +349,7 @@ Let's join our notoriety metrics for actors and directors with the main dataset.
],
)
As we did many operation, it can be nice to save the ``vDataFrame`` as a table in the Vertica database.
As we have performed many operations, it can be useful to save the :py:mod:`vDataFrame` as a table in the Vertica database.

.. code-block:: python
@@ -754,7 +754,7 @@ Let's create a model to evaluate an unbiased score for each different movie.
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_movies_filmtv_complete_model_report.html

The model is good. Let's add it in our ``vDataFrame``.
The model is good. Let's add it in our :py:mod:`vDataFrame`.

.. code-block:: python
@@ -871,7 +871,7 @@ Since ``k-means`` clustering is sensitive to unnormalized data, let's normalize
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_movies_filmtv_normalize_minmax.html

Let's compute the ``elbow`` curve to find a suitable number of clusters.
Let's compute the :py:func:`~verticapy.machine_learning.model_selection.elbow` curve to find a suitable number of clusters.

.. ipython:: python
@@ -926,7 +926,7 @@ By looking at the elbow curve, we can choose 15 clusters. Let's create a ``k-mea
model_kmeans.fit(filmtv_movies_complete, predictors)
model_kmeans.clusters_
Let's add the clusters in the ``vDataFrame``.
Let's add the clusters in the :py:mod:`vDataFrame`.


.. code-block:: python
4 changes: 2 additions & 2 deletions docs/source/examples_business_smart_meters.rst
Original file line number Diff line number Diff line change
@@ -44,7 +44,7 @@ You can skip the below cell if you already have an established connection.
vp.connect("VerticaDSN")
Create the ``vDataFrames`` of the datasets:
Create the :py:mod:`vDataFrame` of the datasets:

.. code-block:: python
@@ -217,7 +217,7 @@ The dataset 'sm_meters' is pretty important. In particular, the type of residenc
:width: 100%
:align: center

Based on the scatter plot, five seems like the optimal number of clusters. Let's verify this hypothesis using an ``elbow`` curve.
Based on the scatter plot, five seems like the optimal number of clusters. Let's verify this hypothesis using an :py:func:`~verticapy.machine_learning.model_selection.elbow` curve.

.. code-block:: python
4 changes: 2 additions & 2 deletions docs/source/examples_business_spam.rst
@@ -106,7 +106,7 @@ Let's compute some statistics using the length of the message.
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_spam_table_describe.html

**Notice:** spam tends to be longer than a normal message. First, let's create a view with just spam. Then, we'll use the ``CountVectorizer`` to create a dictionary and identify keywords.
**Notice:** spam tends to be longer than a normal message. First, let's create a view with just spam. Then, we'll use the :py:func:`~verticapy.machine_learning.vertica.CountVectorizer` to create a dictionary and identify keywords.
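A rough sketch of that idea (the view and column names are placeholders, and the exact :py:func:`~verticapy.machine_learning.vertica.CountVectorizer` calls may differ slightly between VerticaPy versions):

.. code-block:: python

    from verticapy.machine_learning.vertica import CountVectorizer

    # Build a dictionary of tokens from the spam-only view and inspect it.
    vocab_model = CountVectorizer()
    vocab_model.fit("spam_only_view", X = ["content"])
    vocab_model.transform()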

.. code-block:: python
@@ -138,7 +138,7 @@ Let's compute some statistics using the length of the message.
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_spam_table_clean_2.html

Let's add the most occurent words in our ``vDataFrame`` and compute the correlation vector.
Let's add the most frequent words to our :py:mod:`vDataFrame` and compute the correlation vector.

.. code-block:: python
14 changes: 7 additions & 7 deletions docs/source/examples_business_spotify.rst
Original file line number Diff line number Diff line change
@@ -88,7 +88,7 @@ Create a new schema, "spotify".
Data Loading
-------------

Load the datasets into the ``vDataFrame`` with ``read_csv()`` and then view them with ``display()``.
Load the datasets into the :py:mod:`vDataFrame` with :py:func:`~verticapy.read_csv` and then view them with :py:func:`~verticapy.vDataFrame.head`.
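A minimal sketch of that loading step (the file path is an illustrative assumption):

.. code-block:: python

    import verticapy as vp

    # Ingest the CSV file into the "spotify" schema and take a quick look.
    tracks = vp.read_csv("tracks.csv", schema = "spotify")
    tracks.head(10)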

.. code-block::
@@ -521,14 +521,14 @@ Define a list of predictors and the response, and then save the normalized versi
Machine Learning
-----------------

We can use ``AutoML`` to easily get a well-performing model.
We can use :py:func:`~verticapy.machine_learning.vertica.automl.AutoML` to easily get a well-performing model.

.. ipython:: python
# define a random seed so models tested by AutoML produce consistent results
vp.set_option("random_state", 2)
``AutoML`` automatically tests several machine learning models and picks the best performing one.
:py:func:`~verticapy.machine_learning.vertica.automl.AutoML` automatically tests several machine learning models and picks the best performing one.

.. ipython:: python
:okwarning:
@@ -569,7 +569,7 @@ Train the model.
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_spotify_automl_plot.html

Extract the best model according to ``AutoML``. From here, we can look at the model type and its hyperparameters.
Extract the best model according to :py:func:`~verticapy.machine_learning.vertica.automl.AutoML`. From here, we can look at the model type and its hyperparameters.

.. ipython:: python
@@ -581,7 +581,7 @@ Extract the best model according to ``AutoML``. From here, we can look at the mo
print(bm_type)
print(hyperparams)
Thanks to ``AutoML``, we know best model type and its hyperparameters. Let's create a new model with this information in mind.
Thanks to :py:func:`~verticapy.machine_learning.vertica.automl.AutoML`, we know the best model type and its hyperparameters. Let's create a new model with this information in mind.

.. code-block::
@@ -797,7 +797,7 @@ Let's start by taking the averages of these numerical features for each artist.
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_spotify_artists_features.html

Grouping means clustering, so we use an ``elbow`` curve to find a suitable number of clusters.
Grouping means clustering, so we use an :py:func:`~verticapy.machine_learning.model_selection.elbow` curve to find a suitable number of clusters.

.. ipython:: python
:okwarning:
@@ -915,4 +915,4 @@ Let's see how our model groups these artists together:
Conclusion
-----------

We were able to predict the popularity Polish songs with a ``RandomForestRegressor`` model suggested by ``AutoML``. We then created a ``k-means`` model to group artists into "genres" (clusters) based on the feature-commonalities in their tracks.
We were able to predict the popularity of Polish songs with a :py:func:`~verticapy.machine_learning.vertica.RandomForestRegressor` model suggested by :py:func:`~verticapy.machine_learning.vertica.automl.AutoML`. We then created a ``k-means`` model to group artists into "genres" (clusters) based on the feature commonalities in their tracks.
6 changes: 3 additions & 3 deletions docs/source/examples_learn_commodities.rst
@@ -320,12 +320,12 @@ Moving on to the correlation matrix, we can see many events that changed drastic
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_commodities_table_corr_2.html

We can see strong correlations between most of the variables. A vector autoregression (``VAR``) model seems ideal.
We can see strong correlations between most of the variables. A vector autoregression (:py:func:`~verticapy.machine_learning.vertica.VAR`) model seems ideal.

Machine Learning
-----------------

Let's create the ``VAR`` model to predict the value of various commodities.
Let's create the :py:func:`~verticapy.machine_learning.vertica.VAR` model to predict the value of various commodities.
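A heavily hedged sketch of that model (the vDataFrame, column names, and the exact ``fit`` signature below are assumptions; check the :py:func:`~verticapy.machine_learning.vertica.VAR` reference for your VerticaPy version):

.. code-block:: python

    from verticapy.machine_learning.vertica import VAR

    # Fit a vector autoregression of order 3 on the commodity price columns.
    # Column and parameter names here are placeholders.
    model_var = VAR(p = 3)
    model_var.fit(commodities, ts = "date", y = ["Gold", "Oil", "Silver"])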

.. code-block:: python
@@ -441,7 +441,7 @@ Our model is excellent. Let's predict the values these commodities in the near f
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/examples_commodities_table_pred_plot_4.html

The model performs well but may be somewhat unstable. To improve it, we could apply data preparation techniques, such as seasonal decomposition, before building the ``VAR`` model.
The model performs well but may be somewhat unstable. To improve it, we could apply data preparation techniques, such as seasonal decomposition, before building the :py:func:`~verticapy.machine_learning.vertica.VAR` model.

Conclusion
-----------