first correction
oualib committed Oct 23, 2024
1 parent c6965ee commit 512900d
Showing 6 changed files with 217 additions and 383 deletions.
32 changes: 15 additions & 17 deletions docs/source/user_guide_data_exploration_charts.rst
@@ -28,13 +28,16 @@ Let's start with pies and histograms. Drawing the pie or histogram of a categori

.. code-block::
# Setting the plotting lib
vp.set_option("plotting_lib", "highcharts")
titanic = load_titanic()
titanic["pclass"].bar()
.. ipython:: python
:suppress:
# Setting the plotting lib
vp.set_option("plotting_lib", "highcharts")
titanic = load_titanic()
fig = titanic["pclass"].bar()
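The aggregation behind a bar or pie chart of a categorical column is just a frequency count per category. A minimal plain-Python sketch of that idea (purely illustrative; VerticaPy computes the counts in-database):

```python
from collections import Counter

def category_counts(values):
    """Frequency of each category -- the aggregation behind a bar/pie chart."""
    return dict(Counter(values))

pclass = [1, 3, 3, 2, 3, 1]
print(category_counts(pclass))  # {1: 2, 3: 3, 2: 1}
```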
@@ -141,28 +144,28 @@ You can also change the occurrences to another aggregation with the `method` and
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/user_guides_data_exploration_titanic_age_hist_avs.html


VerticaPy uses the same process for other graphics, like 2-dimensional histograms and bar charts.

Let us showcase another plotting library for these plots.


.. code-block::
# Setting the plotting lib
vp.set_option("plotting_lib", "plotly")
titanic.bar(["pclass", "survived"])
.. ipython:: python
:suppress:
# Setting the plotting lib
vp.set_option("plotting_lib", "plotly")
fig = titanic.bar(["pclass", "survived"])
fig.write_html("/project/data/VerticaPy/docs/figures/user_guides_data_exploration_titanic_bar_pclass_surv.html")
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/user_guides_data_exploration_titanic_bar_pclass_surv.html


.. note:: VerticaPy has three main plotting libraries. See the :ref:`chart_gallery` section for all the different plots.

.. code-block::
@@ -242,7 +245,7 @@ Box plots are useful for understanding statistical dispersion.
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/user_guides_data_exploration_titanic_boxplot_one.html

Scatter and bubble plots are also useful for identifying patterns in your data. Note, however, that these methods don't use aggregations; VerticaPy downsamples the data before plotting. You can use the 'max_nb_points' to limit the number of points and avoid unnecessary memory usage.
Scatter and bubble plots are also useful for identifying patterns in your data. Note, however, that these methods don't use aggregations; VerticaPy downsamples the data before plotting. You can use the `max_nb_points` to limit the number of points and avoid unnecessary memory usage.

.. code-block::
Expand Down Expand Up @@ -323,8 +326,10 @@ For more information on scatter look at :py:mod:`verticapy.vDataFrame.scatter`.
Hexbin plots can be useful for generating heatmaps. These summarize data in a similar way to scatter plots, but compute aggregations to get the final results.
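The bin-then-aggregate idea behind hexbin can be sketched in a few lines of plain Python, using square cells instead of hexagons for simplicity (purely illustrative; the real computation happens in-database):

```python
from collections import defaultdict

def grid_avg(points, values, cell=1.0):
    """Average `values` over square grid cells (a stand-in for hexagonal bins)."""
    acc = defaultdict(lambda: [0.0, 0])  # cell -> [sum, count]
    for (x, y), v in zip(points, values):
        key = (int(x // cell), int(y // cell))
        acc[key][0] += v
        acc[key][1] += 1
    return {k: s / n for k, (s, n) in acc.items()}

pts = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4)]
vals = [1.0, 3.0, 5.0]
print(grid_avg(pts, vals))  # {(0, 0): 2.0, (1, 0): 5.0}
```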

.. ipython:: python
# Setting the plotting lib
vp.set_option("plotting_lib", "matplotlib")
@savefig user_guides_data_exploration_iris_hexbin.png
iris.hexbin(
["SepalLengthCm", "SepalWidthCm"],
@@ -337,6 +342,7 @@ Hexbin, scatter, and bubble plots also allow you to provide a background image.
.. code-block:: python
africa = load_africa_education()
# displaying avg students score in Africa
africa.hexbin(
["lon", "lat"],
@@ -349,6 +355,7 @@ Hexbin, scatter, and bubble plots also allow you to provide a background image.
:suppress:
africa = load_africa_education()
# displaying avg students score in Africa
@savefig user_guides_data_exploration_africa_hexbin.png
africa.hexbin(
@@ -360,17 +367,6 @@ Hexbin, scatter, and bubble plots also allow you to provide a background image.
It is also possible to use SHP datasets to draw maps.

.. code-block:: python
africa = load_africa_education()
# displaying avg students score in Africa
africa.hexbin(
["lon", "lat"],
method = "avg",
of = "zralocp",
img = "img/africa.png",
)
.. ipython:: python
# Africa Dataset
@@ -412,7 +408,9 @@ Since time-series plots do not aggregate the data, it's important to choose the
:suppress:
:okwarning:
# Setting the plotting lib
vp.set_option("plotting_lib", "plotly")
fig = amazon["number"].plot(
ts = "date",
by = "state",
20 changes: 15 additions & 5 deletions docs/source/user_guide_data_exploration_descriptive_statistics.rst
@@ -17,7 +17,7 @@ The :py:func:`~verticapy.vDataFrame.agg` method is the best way to compute multi
help(vp.vDataFrame.agg)
This is a tremendously useful function for understanding your data.
Let's use the `churn dataset <https://github.com/vertica/VerticaPy/tree/master/docs/source/notebooks/data_exploration/correlations/data>`_.
Let's use the `churn dataset <https://github.com/vertica/VerticaPy/blob/master/examples/business/churn/customers.csv>`_.

.. code-block::
@@ -122,7 +122,9 @@ You can also use the `groupby` method to compute customized aggregations.
"gender",
"Contract",
],
["AVG(DECODE(Churn, 'Yes', 1, 0)) AS Churn"],
[
"AVG(DECODE(Churn, 'Yes', 1, 0)) AS Churn",
],
)
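What the `AVG(DECODE(Churn, 'Yes', 1, 0))` expression computes is the churn rate per group: `DECODE` maps `'Yes'` to 1 and everything else to 0, and `AVG` of that indicator is the fraction of churners. A plain-Python sketch of the same computation (illustrative only; VerticaPy pushes the SQL to Vertica):

```python
from collections import defaultdict

def churn_rate_by_group(rows):
    """AVG(DECODE(Churn, 'Yes', 1, 0)) per (gender, contract) group."""
    acc = defaultdict(lambda: [0, 0])  # group -> [yes_count, total]
    for gender, contract, churn in rows:
        g = acc[(gender, contract)]
        g[0] += 1 if churn == "Yes" else 0
        g[1] += 1
    return {k: yes / total for k, (yes, total) in acc.items()}

rows = [
    ("F", "Month-to-month", "Yes"),
    ("F", "Month-to-month", "No"),
    ("M", "Two year", "No"),
]
print(churn_rate_by_group(rows))
# {('F', 'Month-to-month'): 0.5, ('M', 'Two year'): 0.0}
```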
.. ipython:: python
@@ -133,7 +135,9 @@ You can also use the `groupby` method to compute customized aggregations.
"gender",
"Contract",
],
["AVG(DECODE(Churn, 'Yes', 1, 0)) AS Churn"],
[
"AVG(DECODE(Churn, 'Yes', 1, 0)) AS Churn",
],
)
html_file = open("/project/data/VerticaPy/docs/figures/user_guides_data_exploration_descriptive_stats_group_by.html", "w")
html_file.write(res._repr_html_())
@@ -148,7 +152,10 @@ You can also use the `groupby` method to compute customized aggregations.
import verticapy.sql.functions as fun
vdf.groupby(
["gender", "Contract"],
[
"gender",
"Contract",
],
[
fun.min(vdf["tenure"])._as("min_tenure"),
fun.max(vdf["tenure"])._as("max_tenure"),
@@ -161,7 +168,10 @@ You can also use the `groupby` method to compute customized aggregations.
import verticapy.sql.functions as fun
res = vdf.groupby(
["gender", "Contract"],
[
"gender",
"Contract",
],
[
fun.min(vdf["tenure"])._as("min_tenure"),
fun.max(vdf["tenure"])._as("max_tenure"),
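The `fun.min` / `fun.max` aggregations above compute, per group, the smallest and largest `tenure`. The same logic in self-contained Python (an illustrative analogue, not VerticaPy's implementation):

```python
def min_max_by_group(rows):
    """MIN and MAX of tenure per (gender, contract), like fun.min / fun.max."""
    out = {}
    for gender, contract, tenure in rows:
        key = (gender, contract)
        lo, hi = out.get(key, (tenure, tenure))
        out[key] = (min(lo, tenure), max(hi, tenure))
    return out

rows = [("F", "One year", 3), ("F", "One year", 40), ("M", "Two year", 12)]
print(min_max_by_group(rows))
# {('F', 'One year'): (3, 40), ('M', 'Two year'): (12, 12)}
```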
17 changes: 7 additions & 10 deletions docs/source/user_guide_data_ingestion.rst
@@ -111,6 +111,7 @@ To ingest the file into Vertica, remove the `genSQL` parameter from the above co
:file: /project/data/VerticaPy/docs/figures/user_guide_data_ingestion_iris.html

When the file to ingest is not located on your local machine, and is on the server instead, then you must set the `ingest_local` parameter to False.

`ingest_local` is True by default.

.. note:: In some cases where the CSV file has a very complex structure, local ingestion might fail. If this occurs, you will have to move the file into the database and then ingest the file from that location.
@@ -130,8 +131,7 @@ syntax in the path parameter (in this case for multiple CSV files): `path = "pat
Ingest CSV files
----------------

In addition to :py:func:`~verticapy.read_file`, you can also ingest CSV files with the :py:func:`~verticapy.read_csv` function,
which ingests the file using flex tables. This function provides options not available in :py:func:`~verticapy.read_file`, such as:
In addition to :py:func:`~verticapy.read_file`, you can also ingest CSV files with the :py:func:`~verticapy.read_csv` function, which ingests the file using flex tables. This function provides options not available in :py:func:`~verticapy.read_file`, such as:

- `sep`: specify the column separator.
- `parse_nrows`: the function creates a file of nrows from the data file to identify
@@ -140,18 +140,15 @@ the data types. This file is then dropped and the entire data file is ingested.

For a full list of supported options, see :py:func:`~verticapy.read_csv` or use the :py:func:`~verticapy.help` function.

In the following example, we will use :py:func:`~verticapy.read_csv` to ingest a
subset of the Titanic dataset. To begin, load the entire Titanic dataset using the
:py:func:`~verticapy.datasets.load_titanic` function:
In the following example, we will use :py:func:`~verticapy.read_csv` to ingest a subset of the Titanic dataset. To begin, load the entire Titanic dataset using the :py:func:`~verticapy.datasets.load_titanic` function:

.. ipython:: python
from verticapy.datasets import load_titanic
titanic = load_titanic()
To convert a subset of the dataset to a CSV file, select the desired rows in
the dataset and use the :py:func:`~verticapy.to_csv` vDataFrame method:
To convert a subset of the dataset to a CSV file, select the desired rows in the dataset and use the :py:func:`~verticapy.to_csv` ``vDataFrame`` method:

.. ipython:: python
@@ -163,7 +160,8 @@ Before ingesting the above CSV file, we can check its columns and their data typ

.. ipython:: python
vp.pcsv(path = "titanic_subset.csv",
vp.pcsv(
path = "titanic_subset.csv",
sep = ",",
na_rep = "",
)
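The kind of inspection `vp.pcsv` performs, guessing a column's type from the values in a sample, can be sketched with the standard library alone. This is a rough, hypothetical analogue for intuition, not VerticaPy's actual parser:

```python
import csv
import io

def infer_types(csv_text, na_rep=""):
    """Guess int/float/varchar per column from a CSV sample."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    types = {}
    for col in rows[0]:
        vals = [r[col] for r in rows if r[col] != na_rep]
        if all(v.lstrip("-").isdigit() for v in vals):
            types[col] = "int"
        else:
            try:
                [float(v) for v in vals]
                types[col] = "float"
            except ValueError:
                types[col] = "varchar"
    return types

sample = "name,age,fare\nAllen,29,211.5\nBraund,35,7.25\n"
print(infer_types(sample))  # {'name': 'varchar', 'age': 'int', 'fare': 'float'}
```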
@@ -212,8 +210,7 @@ For a full list of supported options, see the :py:func:`~verticapy.read_json` or

VerticaPy also provides a :py:func:`~verticapy.pjson` function to parse JSON files to identify columns and their respective data types.

In the following example, we load the iris dataset using the :py:func:`~verticapy.datasets.load_iris` function,
convert the vDataFrame to JSON format with the :py:func:`~verticapy.to_json` method, then ingest the JSON file into Vertica:
In the following example, we load the iris dataset using the :py:func:`~verticapy.datasets.load_iris` function, convert the vDataFrame to JSON format with the :py:func:`~verticapy.to_json` method, then ingest the JSON file into Vertica:

.. code-block:: python
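The column-and-type discovery that `vp.pjson` performs on a JSON file can likewise be sketched with the standard `json` module. A rough, hypothetical analogue for intuition only:

```python
import json

def infer_json_columns(json_text):
    """Map each key in a list-of-records JSON string to a type name."""
    records = json.loads(json_text)
    sql_type = {bool: "boolean", int: "int", float: "float", str: "varchar"}
    cols = {}
    for rec in records:
        for key, val in rec.items():
            # keep the type seen for the first occurrence of each key
            cols.setdefault(key, sql_type.get(type(val), "varchar"))
    return cols

sample = '[{"SepalLengthCm": 5.1, "Species": "setosa"}]'
print(infer_json_columns(sample))
# {'SepalLengthCm': 'float', 'Species': 'varchar'}
```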