Background
After some offline discussion, we identified the need for a groupBy capability in ActionbaseQuery. When running multi-hop queries, the raw result is a flat list of edges. In many use cases (e.g. "for each friend, collect their wishlist items"), the caller needs to group and aggregate those results by a key — currently this has to be done client-side.
This proposal adds a top-level groupBy field to the v3 query request that groups the output of a specified hop and collects values into per-group lists.
Why a Top-Level Field (Not Aggregator / PostProcessor)?
Both Aggregator and PostProcessor currently follow a "flat DataFrame in → flat DataFrame out, cells are primitives" contract. groupBy produces a nested structure — cells containing arrays — which breaks this contract.
- Aggregator — designed for scalar reductions (Count, Sum). Supporting collect_list would either break the return type (Mono&lt;DataFrame&gt;) and the applyAggregators chain, or require a new DataType.ARRAY that ripples through DataFrame / Row / StructType / toJsonFormat / PostProcessor — classic YAGNI.
- PostProcessor — designed for row-wise reshaping (JsonObject: 1 row → 1 row, SplitExplode: 1 row → N rows). groupBy is row-reduction (N rows → fewer rows), a fundamentally different semantic.
- Top-level field — applied at the final boundary of ActionbaseQueryExecutor.query(), so the "flat DataFrame" invariant is preserved throughout the pipeline. The nested output never feeds into a subsequent hop, keeping the internal contract intact.
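To make the "final boundary" argument concrete, here is a minimal runnable sketch of where the grouping would hook in. All names here (ExecutorBoundarySketch, query, the row representation) are hypothetical stand-ins, not the real executor API; the point is only that grouping runs once, after the pipeline, so everything upstream still sees flat rows.

```java
import java.util.*;
import java.util.function.*;

// Hypothetical stand-in for ActionbaseQueryExecutor; illustrates the
// placement of groupBy, not the real types or signatures.
public class ExecutorBoundarySketch {
    // The pipeline (hops, aggregators, post-processors) only ever sees
    // flat rows, modeled here as List<Map<String, Object>>.
    public static List<Map<String, Object>> query(
            List<Map<String, Object>> flatRows,
            Function<List<Map<String, Object>>, List<Map<String, Object>>> groupBy) {
        // ... hops, aggregators, and post-processors all run on flat data ...
        // Grouping is applied exactly once, at the response boundary;
        // its nested output never feeds back into the pipeline.
        return groupBy == null ? flatRows : groupBy.apply(flatRows);
    }
}
```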
Proposed Design
Setup
hop1: SCAN follows (index: created_at_desc)

| source | target | createdAt |
|--------|--------|-----------|
| 1000   | 2001   | 200       |
| 1000   | 2000   | 100       |
hop2: CACHE wishlist (cache: recent_wishlist)

| source | target | createdAt |
|--------|--------|-----------|
| 2001   | 5002   | 500       |
| 2001   | 5001   | 400       |
| 2000   | 5000   | 300       |
Request
curl -X POST http://localhost:8080/graph/v3/query \
-H 'Content-Type: application/json' \
-d '{
"query": [
{
"type": "SCAN",
"name": "hop1",
"database": "social",
"table": "follows",
"source": {"type": "VALUE", "value": [1000]},
"direction": "OUT",
"index": "created_at_desc"
},
{
"type": "CACHE",
"name": "hop2",
"database": "commerce",
"table": "wishlist",
"source": {"type": "REF", "ref": "hop1", "field": "target"},
"direction": "OUT",
"cache": "recent_wishlist",
"limit": 10,
"include": true
}
],
"groupBy": {
"target": "hop2",
"keys": ["source"],
"collect": [
{"field": "target", "alias": "wishes"}
]
}
}'
| Field | Description |
|-------|-------------|
| target | The name of the query item to apply grouping to. The item must have include: true. |
| keys | List of field names to group by. |
| collect | Fields whose values are accumulated into per-group lists. |
| collect[].field | Column name to collect values from. |
| collect[].alias | (optional, defaults to field) Output name for the collected list. |
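The intended semantics can be sketched in plain Java. This is an illustrative model, not the actual implementation: rows are represented as Map&lt;String, Object&gt;, and the method name groupRows is hypothetical. It groups by the key fields and appends each collect field's value to a per-group list, preserving the hop's row order (so hop2's createdAt-descending order carries into each wishes list).

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the groupBy semantics; names are hypothetical.
public class GroupBySketch {
    /**
     * Groups rows by the given key fields and collects each "collect"
     * field's values into a per-group list (field -> output alias).
     */
    public static List<Map<String, Object>> groupRows(
            List<Map<String, Object>> rows,
            List<String> keys,
            Map<String, String> collect) {
        // LinkedHashMap keeps groups in first-seen order, so the
        // hop's original sort order is preserved across groups.
        Map<List<Object>, Map<String, Object>> groups = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            List<Object> groupKey =
                    keys.stream().map(row::get).collect(Collectors.toList());
            Map<String, Object> out = groups.computeIfAbsent(groupKey, k -> {
                Map<String, Object> g = new LinkedHashMap<>();
                for (String key : keys) g.put(key, row.get(key));
                for (String alias : collect.values()) g.put(alias, new ArrayList<Object>());
                return g;
            });
            // Append this row's value to each collected list.
            for (Map.Entry<String, String> c : collect.entrySet()) {
                ((List<Object>) out.get(c.getValue())).add(row.get(c.getKey()));
            }
        }
        return new ArrayList<>(groups.values());
    }
}
```

Running this over the hop2 rows above with keys=["source"] and collect target as wishes yields exactly the two grouped rows shown in the Response below.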
Response
{
"items": [
{
"name": "hop2",
"data": [
{"source": 2001, "wishes": [5002, 5001]},
{"source": 2000, "wishes": [5000]}
],
"rows": 2,
"stats": [],
"offset": null,
"hasNext": false
}
]
}
offset and hasNext are not meaningful for grouped output, but NamedJsonFormat requires them.
Feedback welcome on the schema shape and semantics before we start implementation.