Reorganise query execution functions. (#580)

Puts all the various ways to execute an SQL command into 3 categories:

1. "exec" functions return `result` (generally).
2. "query" functions wrap "exec" ones, plus string conversion.
3. "stream" functions are like "query" ones, but stream the query.

To make these things fall into place, I'm renaming the recently added `for_each()` to `for_stream()`, and providing a `for_query()` cousin.

Eventually, I hope `pqxx::result` can just _disappear_ from most users' consciousness. The normal ways to execute a query will be...

* _exec0_ just for queries that return no data,
* _query_ functions for small result sets or exotic queries, and
* _stream_ functions for regular queries returning large result sets.

(As a separate effort, I would like to integrate use of parameterised statements into the regular execution functions, so you just pass some `pqxx::params` to those basic functions. Un-parameterised statements will be nothing but a hidden optimisation.)
jtv · Jul 6, 2022 · 38cf12a
1 parent 05a45d2

Showing 7 changed files with 319 additions and 114 deletions.
diff --git a/NEWS b/NEWS
@@ -1,6 +1,11 @@
 7.7.4
- - New ways to query a single row!  `query01()` and `query1()`.
+ - `transaction_base::for_each()` is now called `for_stream()`. (#580)
+ - New `transaction_base::for_query()` is similar, but non-streaming. (#580)
+ - Query data and iterate directly as client-side types: `query()`. (#580)
+ - New ways to query a single row!  `query01()` and `query1()`. (#580)
+ - We now have 3 kinds of execution: "exec", "query", and "stream" functions.
  - Use C++23 `std::unreachable()` where available.
+ - `result::iter()` return value now keeps its `result` alive.
 7.7.3
  - Fix up more damage done by auto-formatting.
  - New `result::for_each()`: simple iteration and conversion of rows.  (#528)

diff --git a/README.md b/README.md
@@ -92,12 +92,16 @@ in standard C++ style (as in `<iostream>` etc.), but an editor will still
 recognize them as files containing C++ code.
 
 Continuing the list of classes, you may also need the result class
-(`pqxx/result.hxx`).  In a nutshell, you create a `connection` based on a
-Postgres connection string (see below), create a `work` in the context of that
-connection, and run one or more queries on the work which return `result`
-objects.  The results are containers of rows of data, each of which you can
-treat as an array of strings: one for each field in the row.  But there are
-other ways to query the database.
+(`pqxx/result.hxx`).  In a nutshell, you create a pqxx::connection based on a
+Postgres connection string (see below), create a pqxx::work (a transaction
+object) in the context of that connection, and run one or more queries and/or
+SQL commands on that.
+
+Depending on how you execute a query, it can return a stream of `std::tuple`
+(each representing one row); or a pqxx::result object which holds both the
+result data and additional metadata: how many rows your query returned and/or
+modified, what the column names are, and so on.  A pqxx::result is a container
+of pqxx::row, and a pqxx::row is a container of pqxx::field.
 
 Here's an example with all the basics to get you going:
 
@@ -111,52 +115,50 @@ Here's an example with all the basics to get you going:
         {
             // Connect to the database.  You can have multiple connections open
             // at the same time, even to the same database.
-            pqxx::connection C;
-            std::cout << "Connected to " << C.dbname() << '\n';
+            pqxx::connection c;
+            std::cout << "Connected to " << c.dbname() << '\n';
 
             // Start a transaction.  A connection can only have one transaction
             // open at the same time, but after you finish a transaction, you
             // can start a new one on the same connection.
-            pqxx::work W{C};
-
-            // Perform a query and retrieve all results.
-            pqxx::result R{W.exec("SELECT name FROM employee")};
+            pqxx::work tx{c};
 
-            // Iterate over results.
-            std::cout << "Found " << R.size() << "employees:\n";
-            for (auto row: R)
-                std::cout << row[0].c_str() << '\n';
+            // Query data of two columns, converting them to std::string and
+            // int respectively.  Iterate the rows.
+            for (auto [name, salary] : tx.query<std::string, int>(
+                "SELECT name, salary FROM employee ORDER BY name"))
+            {
+                std::cout << name << " earns " << salary << ".\n";
+            }
 
             // For large amounts of data, "streaming" the results is more
             // efficient.  It does not work for all types of queries though.
-            // What's really nice is that you don't need to iterate result
-            // objects.  This just converts the fields straight to the C++
-            // types you need.
             //
-            // You can use std::string_view for fields here, which is not
+            // You can read fields as std::string_view here, which is not
             // something you can do in most places.  A string_view becomes
             // meaningless when the underlying string ceases to exist.  In this
             // one situation, you can convert a field to string_view and it
             // will be valid for just that one iteration of the loop.  The next
             // iteration may overwrite or deallocate its buffer space.
-            for (auto [name, salary] : W.stream<std::string_view, int>(
+            for (auto [name, salary] : tx.stream<std::string_view, int>(
                 "SELECT name, salary FROM employee"))
             {
                 std::cout << name << " earns " << salary << ".\n";
             }
 
-            // Execute a statement (and check that it returns 0 rows of data).
+            // Execute a statement, and check that it returns 0 rows of data.
+            // This will throw pqxx::unexpected_rows if the query returns rows.
             std::cout << "Doubling all employees' salaries...\n";
-            W.exec0("UPDATE employee SET salary = salary*2");
+            tx.exec0("UPDATE employee SET salary = salary*2");
 
-            // Easy way to query a value from the database.
-            int my_salary = W.query_value<int>(
+            // Shorthand: conveniently query a single value from the database.
+            int my_salary = tx.query_value<int>(
                 "SELECT salary FROM employee WHERE name = 'Me'");
             std::cout << "I now earn " << my_salary << ".\n";
 
-            // Or, query one whole row.  This will throw an exception unless
-            // the result contains exactly 1 row.
-            auto [top_name, top_salary] = W.query1<std::string, int>(
+            // Or, query one whole row.  This function will throw an exception
+            // unless the result contains exactly 1 row.
+            auto [top_name, top_salary] = tx.query1<std::string, int>(
                 R"(
                     SELECT name, salary
                     FROM employee
@@ -166,14 +168,23 @@ Here's an example with all the basics to get you going:
             std::cout << "Top earner is " << top_name << " with a salary of "
                       << top_salary << ".\n";
 
-            // Commit the transaction.
+            // If you need to access the result metadata, not just the actual
+            // field values, use the "exec" functions.  Most of them return
+            // pqxx::result objects.
+            pqxx::result res = tx.exec("SELECT * FROM employee");
+            std::cout << "Columns:\n";
+            for (pqxx::row_size col = 0; col < res.columns(); ++col)
+                std::cout << res.column_name(col) << '\n';
+
+            // Commit the transaction.  If you don't do this, the database will
+            // undo any changes you made in the transaction.
             std::cout << "Making changes definite: ";
-            W.commit();
+            tx.commit();
             std::cout << "OK.\n";
         }
         catch (std::exception const &e)
         {
-            std::cerr << e.what() << '\n';
+            std::cerr << "ERROR: " << e.what() << '\n';
             return 1;
         }
         return 0;
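
The README comment about `std::string_view` fields in a streaming loop can be illustrated without a database. The following is a minimal sketch, not libpqxx code: `line_source` and `collect_names` are hypothetical names, and the class only imitates a source that reuses one buffer for every row. It shows why a view must be copied into a `std::string` if the data has to outlive the current iteration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a streaming source (not part of libpqxx).  It
// hands out views into one internal buffer, which it overwrites on each call.
class line_source
{
public:
  explicit line_source(std::vector<std::string> lines) :
    m_lines{std::move(lines)}
  {}

  // Load the next line into the shared buffer and point `out` at it.
  // Returns false once the lines run out.
  bool next(std::string_view &out)
  {
    if (m_pos >= m_lines.size()) return false;
    m_buffer = m_lines[m_pos++];
    out = m_buffer;
    return true;
  }

private:
  std::vector<std::string> m_lines;
  std::size_t m_pos = 0;
  std::string m_buffer;
};

// Collect every name.  Each view is only valid until the next call to
// next(), so we copy it into an owning std::string before moving on.
std::vector<std::string> collect_names(line_source &src)
{
  std::vector<std::string> names;
  std::string_view name;
  while (src.next(name)) names.emplace_back(name);
  return names;
}
```

Storing the `std::string_view` values themselves in the vector would leave every element pointing at a buffer that has since been overwritten.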

diff --git a/include/pqxx/doc/accessing-results.md b/include/pqxx/doc/accessing-results.md
@@ -1,30 +1,111 @@
 Accessing results and result rows                   {#accessing-results}
 =================================
 
-When you execute a query using one of the transaction "exec*" functions, you
-normally get a `result` object back.  A `result` is a container of `row`s.
+A query produces a result set consisting of rows, and each row consists of
+fields.  There are several ways to receive this data.
 
-(There are exceptions: `exec1` expects exactly one row of data, so it returns
-just a `row`, not a full `result`.  And `exec0` expects no data at all, so it
-returns nothing.)
+The fields are "untyped."  That is to say, libpqxx has no opinion on what their
+types are.  The database sends the data in a very flexible textual format.
+When you read a field, you specify what type you want it to be, and libpqxx
+converts the text format to that type for you.
 
-Result objects are an all-or-nothing affair.  An `exec*` function waits until
-it's received all the result data, and only then will it return.  _(There is a
-faster, easier way of executing queries with large result sets, so see
-"streaming rows" below as well.)_
+If a value does not conform to the format for the type you specify, the
+conversion fails.  For example, if you have strings that all happen to contain
+numbers, you can read them as `int`.  But if any of the values is empty, or
+it's null (for a type that doesn't support null), or it's some string that does
+not look like an integer, or it's too large, you can't convert it to `int`.
 
-For example, your code might do:
+So usually, reading result data from the database means not just retrieving the
+data; it also means converting it to some target type.
+
+
+Querying rows of data
+---------------------
+
+The simplest way to query rows of data is to call one of a transaction's
+"query" functions, passing as template arguments the types of columns you want
+to get back (e.g. `int`, `std::string`, `double`, and so on) and as a regular
+argument the query itself.
+
+You can then iterate over the result to go over the rows of data:
 
 ```cxx
-    pqxx::result r = tx.exec("SELECT * FROM mytable");
+    for (auto [id, value] :
+        tx.query<int, std::string>("SELECT id, name FROM item"))
+    {
+        std::cout << id << '\t' << value << '\n';
+    }
 ```
 
-Now, how do you access the data inside `r`?
+The "query" functions execute your query, load the complete result data from
+the database, and then, as you iterate, convert each row to a tuple of the C++
+types that you indicated.
+
+There are different query functions for querying any number of rows (`query()`);
+querying exactly one row of data as a `std::tuple`, throwing an error if there
+are zero rows or more than one (`query1()`); or querying zero or one rows as a
+`std::optional<std::tuple>` (`query01()`).
+
+Streaming rows
+--------------
 
-Result sets act as standard C++ containers of rows.  Rows act as standard
-C++ containers of fields.  So the easiest way to go through them is:
+There's another way to go through the rows coming out of a query.  It's
+usually easier and faster if there are a lot of rows, but there are drawbacks.
+
+**One,** you start getting rows before all the data has come in from the
+database.  That speeds things up, but what happens if you lose your network
+connection while transferring the data?  Your application may already have
+processed some of the data before finding out that the rest isn't coming.  If
+that is a problem for your application, streaming may not be the right choice.
+
+**Two,** streaming only works for some types of query.  The `stream()` function
+wraps your query in a PostgreSQL `COPY` command, and `COPY` only supports a few
+commands: `SELECT`, `VALUES`, or an `INSERT`, `UPDATE`, or `DELETE` with a
+`RETURNING` clause.  See the `COPY` documentation here:
+[
+    https://www.postgresql.org/docs/current/sql-copy.html
+](https://www.postgresql.org/docs/current/sql-copy.html).
+
+**Three,** when you convert a field to a "view" type (such as
+`std::string_view` or `std::basic_string_view<std::byte>`), the view points to
+underlying data which only stays valid until you iterate to the next row or
+exit the loop.  So if you want to use that data for longer than a single
+iteration of the streaming loop, you'll have to store it somewhere yourself.
+
+Now for the good news.  Streaming does make it very easy to query data and loop
+over it:
 
 ```cxx
+    for (auto [id, name, x, y] :
+        tx.stream<int, std::string_view, float, float>(
+            "SELECT id, name, x, y FROM point"))
+      process(id + 1, "point-" + std::string{name}, x * 10.0, y * 10.0);
+```
+
+The conversion to C++ types (here `int`, `std::string_view`, and two `float`s)
+is built into the function.  You never even see `row` objects, `field` objects,
+iterators, or conversion methods.  You just put in your query and you receive
+your data.
+
+
+
+Results with metadata
+---------------------
+
+Sometimes you want more from a query result than just rows of data.  You may
+need to know right away how many rows of result data you received, or how many
+rows your `UPDATE` statement has affected, or the names of the columns, etc.
+
+For that, use the transaction's "exec" query execution functions.  Apart from a
+few exceptions, these return a `pqxx::result` object.  A `result` is a container
+of `pqxx::row` objects, so you can iterate them as normal, or index them like
+you would index an array.  Each `row` in turn is a container of `pqxx::field`.
+Each `field` holds a value, but doesn't know its type.  You specify the type
+when you read the value.
+
+For example, your code might do:
+
+```cxx
+    pqxx::result r = tx.exec("SELECT * FROM mytable");
     for (auto const &row: r)
     {
        for (auto const &field: row) std::cout << field.c_str() << '\t';
@@ -116,45 +197,3 @@ This becomes really helpful with the array-indexing operator.  With regular
 C++ iterators you would need ugly expressions like `(*row)[0]` or
 `row->operator[](0)`.  With the iterator types defined by the result and
 row classes you can simply say `row[0]`.
-
-
-Streaming rows
---------------
-
-There's another way to go through the rows coming out of a query.  It's
-usually easier and faster, but there are drawbacks.
-
-**One,** you start getting rows before all the data has come in from the
-database.  That speeds things up, but what happens if you lose your network
-connection while transferring the data?  Your application may already have
-processed some of the data before finding out that the rest isn't coming.  If
-that is a problem for your application, streaming may not be the right choice.
-
-**Two,** streaming only works for some types of query.  The `stream()` function
-wraps your query in a PostgreSQL `COPY` command, and `COPY` only supports a few
-commands: `SELECT`, `VALUES`, or an `INSERT`, `UPDATE`, or `DELETE` with a
-`RETURNING` clause.  See the `COPY` documentation here:
-[
-    https://www.postgresql.org/docs/current/sql-copy.html
-](https://www.postgresql.org/docs/current/sql-copy.html).
-
-**Three,** when you convert a field to a "view" type (such as
-`std::string_view` or `std::basic_string_view<std::byte>`), the view points to
-underlying data which only stays valid until you iterate to the next row or
-exit the loop.  So if you want to use that data for longer than a single
-iteration of the streaming loop, you'll have to store it somewhere yourself.
-
-Now for the good news.  Streaming does make it very easy to query data and loop
-over it:
-
-```cxx
-    for (auto [id, name, x, y] :
-        tx.stream<int, std::string_view, float, float>(
-            "SELECT id, name, x, y FROM point"))
-      process(id + 1, "point-" + name, x * 10.0, y * 10.0);
-```
-
-The conversion to C++ types (here `int`, `std::string_view`, and two `float`s)
-is built into the function.  You never even see `row` objects, `field` objects,
-iterators, or conversion methods.  You just put in your query and you receive
-your data.
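
The new `accessing-results.md` text says that fields arrive as text and that a conversion fails when the text does not fit the requested type. Here is a minimal sketch of that idea using only the standard library. It is not how libpqxx implements its conversions: `text_to_int` is a hypothetical helper, it does not model SQL nulls, and it returns an empty `std::optional` where libpqxx would report the failure by throwing an exception.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not a libpqxx function): parse a field's text as int.
// Returns std::nullopt if the text is empty, is not entirely an integer, or
// does not fit in an int.
std::optional<int> text_to_int(std::string_view text)
{
  int value = 0;
  char const *const begin = text.data();
  char const *const end = begin + text.size();
  auto const [ptr, ec] = std::from_chars(begin, end, value);
  // Reject parse errors, overflow, and trailing characters such as "12abc".
  if (ec != std::errc{} || ptr != end) return std::nullopt;
  return value;
}
```

With this helper, `"42"` and `"-7"` convert, while an empty string, `"12abc"`, and a number too large for `int` all fail, matching the failure cases the documentation lists.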
diff --git a/include/pqxx/doc/streams.md b/include/pqxx/doc/streams.md
@@ -55,9 +55,9 @@ then you begin processing.  With `stream_from` you can be processing data on
 the client side while the server is still sending you the rest.
 
 You don't actually need to create a `stream_from` object yourself, though you
-can.  Two shorthand functions, @ref pqxx::transaction_base::stream
-and @ref pqxx::transaction_base::for_each, can create the streams for you with
-a minimum of overhead.
+can if you want to.  Two shorthand functions,
+@ref pqxx::transaction_base::stream and @ref pqxx::transaction_base::for_stream,
+can each create the streams for you with a minimum of overhead.
 
 Not all kinds of queries will work in a stream.  Internally the streams make
 use of PostgreSQL's `COPY` command, so see the PostgreSQL documentation for

diff --git a/include/pqxx/internal/result_iter.hxx b/include/pqxx/internal/result_iter.hxx
@@ -91,7 +91,7 @@ public:
   iterator end() const { return {}; }
 
 private:
-  pqxx::result const &m_home;
+  pqxx::result const m_home;
 };
 } // namespace pqxx::internal
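
The change from `pqxx::result const &m_home` to `pqxx::result const m_home` is what lets the object returned by `result::iter()` keep its `result` alive. The sketch below illustrates the general pattern with hypothetical types (`handle`, `rows_view`, `make_view`), none of which come from libpqxx. It assumes a handle that is cheap to copy because its copies share the underlying data.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a result object: a cheap-to-copy handle whose
// copies share ownership of the same rows.
class handle
{
public:
  explicit handle(std::vector<std::string> rows) :
    m_rows{std::make_shared<std::vector<std::string>>(std::move(rows))}
  {}

  std::vector<std::string> const &rows() const { return *m_rows; }

private:
  std::shared_ptr<std::vector<std::string>> m_rows;
};

// Iteration helper that stores its handle by value.  The stored copy shares
// ownership, so the rows stay alive for as long as the view does.  With a
// `handle const &` member, the view would dangle as soon as the original
// handle went out of scope.
class rows_view
{
public:
  explicit rows_view(handle const &home) : m_home{home} {}

  auto begin() const { return m_home.rows().begin(); }
  auto end() const { return m_home.rows().end(); }

private:
  handle const m_home;
};

// The local handle is destroyed when this function returns, but the view
// still owns a copy of it.
rows_view make_view()
{
  std::vector<std::string> rows{"a", "b", "c"};
  handle h{std::move(rows)};
  return rows_view{h};
}
```

Iterating over the value returned by `make_view()` is safe here only because the member is a copy. The cost is one extra shared-ownership copy per view.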