157 changes: 157 additions & 0 deletions apl/aggregation-function/phrases.mdx
@@ -0,0 +1,157 @@
---
title: 'phrases'
description: 'This page explains how to use the phrases aggregation function in APL.'
---

The `phrases` aggregation extracts and counts common phrases or word sequences from text fields across a dataset. It analyzes text content to identify frequently occurring phrases, helping you discover patterns, trends, and common topics in your data.

You can use this aggregation to identify common user queries, discover trending topics, extract key phrases from logs, or analyze conversation patterns in AI applications.

## For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

<AccordionGroup>
<Accordion title="Splunk SPL users">

In Splunk SPL, there’s no built-in phrases function, but you might use the `rare` or `top` commands on tokenized text.

<CodeGroup>
```sql Splunk example
| rex field=message max_match=0 "(?<words>\w+)"
| top words
```

```kusto APL equivalent
['sample-http-logs']
| summarize phrases(uri, 10)
```
</CodeGroup>

</Accordion>
<Accordion title="ANSI SQL users">

In ANSI SQL, you would need complex string manipulation and grouping to extract common phrases.

<CodeGroup>
```sql SQL example
SELECT
  phrase,
  COUNT(*) as frequency
FROM (
  SELECT UNNEST(SPLIT(message, ' ')) as phrase
  FROM logs
)
GROUP BY phrase
ORDER BY frequency DESC
LIMIT 10
```

```kusto APL equivalent
['sample-http-logs']
| summarize phrases(uri, 10)
```
</CodeGroup>

</Accordion>
</AccordionGroup>

## Usage

### Syntax

```kusto
summarize phrases(column, max_phrases)
```

### Parameters

- **column** (string, required): The column containing text data from which to extract phrases.
- **max_phrases** (long, optional): The maximum number of top phrases to return. Default is 10.

### Returns

Returns a dynamic array containing the most common phrases found in the specified column, ordered by frequency.
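
As a minimal sketch of the optional parameter, assuming `max_phrases` can be omitted as the parameter list states, the following query would fall back to the default of 10 phrases. The dataset and field are the `sample-http-logs` and `uri` used elsewhere on this page. The output column name `top_uri_phrases` is illustrative.

```kusto
['sample-http-logs']
// max_phrases is omitted, so the documented default of 10 applies
| summarize top_uri_phrases = phrases(uri)
```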

## Use case examples

<Tabs>
<Tab title="Log analysis">

Extract common URI patterns from requests that return a 404 status to find the missing paths that clients request most often.

**Query**

```kusto
['sample-http-logs']
| where status == '404'
| summarize common_404_paths = phrases(uri, 20)
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20where%20status%20%3D%3D%20'404'%20%7C%20summarize%20common_404_paths%20%3D%20phrases(uri%2C%2020)%22%7D)

**Output**

| common_404_paths |
|------------------|
| ["/api/v1/users/profile", "/assets/old-logo.png", "/docs/deprecated", ...] |

This query identifies the most common 404 error paths, helping you fix broken links or redirect old URLs.

</Tab>
<Tab title="OpenTelemetry traces">

Analyze common operation names across traces to understand service usage patterns.

**Query**

```kusto
['otel-demo-traces']
| where ['service.name'] == 'frontend'
| summarize common_operations = phrases(name, 15) by ['service.name']
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B'otel-demo-traces'%5D%20%7C%20where%20%5B'service.name'%5D%20%3D%3D%20'frontend'%20%7C%20summarize%20common_operations%20%3D%20phrases(name%2C%2015)%20by%20%5B'service.name'%5D%22%7D)

**Output**

| service.name | common_operations |
|--------------|-------------------|
| frontend | ["HTTP GET", "cart.checkout", "product.view", "user.login", ...] |

This query reveals the most common operations in your frontend service, helping you understand usage patterns.

</Tab>
<Tab title="Security logs">

Identify common patterns in potentially malicious requests by analyzing suspicious URIs.

**Query**

```kusto
['sample-http-logs']
| where status in ('403', '401') or uri contains '..'
| summarize suspicious_patterns = phrases(uri, 25)
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20where%20status%20in%20('403'%2C%20'401')%20or%20uri%20contains%20'..'%20%7C%20summarize%20suspicious_patterns%20%3D%20phrases(uri%2C%2025)%22%7D)

**Output**

| suspicious_patterns |
|---------------------|
| ["../../../etc/passwd", "/admin/login", "/.env", "/wp-admin", ...] |

This query identifies common attack patterns in your logs, helping you understand security threats and improve defenses.

</Tab>
</Tabs>
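
The same pattern might apply to AI conversation data. The sketch below assumes a hypothetical `ai-logs` dataset with a `messages` field, as used in the GenAI functions overview, and assumes that `genai_extract_user_prompt` returns the prompt as a string. Treat it as an illustration and not a verified query.

```kusto
['ai-logs']
// Pull the user prompt out of each conversation (assumed field: messages)
| extend user_query = genai_extract_user_prompt(messages)
// Surface the most common phrases across user prompts
| summarize common_user_phrases = phrases(user_query, 10)
```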

## List of related functions

- [make_list](/apl/aggregation-function/make-list): Creates an array of all values. Use this when you need all occurrences rather than common phrases.
- [make_set](/apl/aggregation-function/make-set): Creates an array of unique values. Use this for distinct values without frequency analysis.
- [topk](/apl/aggregation-function/topk): Returns top K values by a specific aggregation. Use this for numerical top values rather than phrase extraction.
- [count](/apl/aggregation-function/count): Counts occurrences. Combine with `summarize ... by` for manual phrase counting if you need more control.
- [dcount](/apl/aggregation-function/dcount): Counts distinct values. Use this to understand the variety of phrases before extracting top ones.
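
For comparison, manual counting with `count` might look like the sketch below. It counts whole `uri` values instead of extracted phrases, so it approximates the idea and isn't an equivalent of `phrases`. The column name `frequency` is illustrative.

```kusto
['sample-http-logs']
// Count how often each full uri value occurs
| summarize frequency = count() by uri
// Keep the 10 most frequent values
| top 10 by frequency
```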

19 changes: 19 additions & 0 deletions apl/apl-features.mdx
@@ -38,6 +38,7 @@ keywords: ['axiom documentation', 'documentation', 'axiom', 'APL', 'axiom proces
| Aggregation function | [topkif](/apl/aggregation-function/topkif) | Calculates the top values of an expression in records for which the predicate evaluates to true. |
| Aggregation function | [variance](/apl/aggregation-function/variance) | Calculates the variance of an expression across the group. |
| Aggregation function | [varianceif](/apl/aggregation-function/varianceif) | Calculates the variance of an expression in records for which the predicate evaluates to true. |
| Aggregation function | [phrases](/apl/aggregation-function/phrases) | Extracts and counts common phrases or word sequences from text fields. |
| Array function | [array_concat](/apl/scalar-functions/array-functions/array-concat) | Concatenates arrays into one. |
| Array function | [array_extract](/apl/scalar-functions/array-functions/array-extract) | Extracts values from a nested array. |
| Array function | [array_iff](/apl/scalar-functions/array-functions/array-iff) | Filters array by condition. |
@@ -103,6 +104,24 @@ keywords: ['axiom documentation', 'documentation', 'axiom', 'APL', 'axiom proces
| Datetime function | [unixtime_nanoseconds_todatetime](/apl/scalar-functions/datetime-functions/unixtime-nanoseconds-todatetime) | Converts nanosecond Unix timestamp to datetime. |
| Datetime function | [unixtime_seconds_todatetime](/apl/scalar-functions/datetime-functions/unixtime-seconds-todatetime) | Converts second Unix timestamp to datetime. |
| Datetime function | [week_of_year](/apl/scalar-functions/datetime-functions/week-of-year) | Returns the ISO 8601 week number from a datetime expression. |
| GenAI function | [genai_concat_contents](/apl/scalar-functions/genai-functions/genai-concat-contents) | Concatenates message contents from a GenAI conversation array. |
| GenAI function | [genai_conversation_turns](/apl/scalar-functions/genai-functions/genai-conversation-turns) | Counts the number of conversation turns in GenAI messages. |
| GenAI function | [genai_cost](/apl/scalar-functions/genai-functions/genai-cost) | Calculates the total cost for input and output tokens. |
| GenAI function | [genai_estimate_tokens](/apl/scalar-functions/genai-functions/genai-estimate-tokens) | Estimates the number of tokens in a text string. |
| GenAI function | [genai_extract_assistant_response](/apl/scalar-functions/genai-functions/genai-extract-assistant-response) | Extracts the assistant’s response from a GenAI conversation. |
| GenAI function | [genai_extract_function_results](/apl/scalar-functions/genai-functions/genai-extract-function-results) | Extracts function call results from GenAI messages. |
| GenAI function | [genai_extract_system_prompt](/apl/scalar-functions/genai-functions/genai-extract-system-prompt) | Extracts the system prompt from a GenAI conversation. |
| GenAI function | [genai_extract_tool_calls](/apl/scalar-functions/genai-functions/genai-extract-tool-calls) | Extracts tool calls from GenAI messages. |
| GenAI function | [genai_extract_user_prompt](/apl/scalar-functions/genai-functions/genai-extract-user-prompt) | Extracts the user prompt from a GenAI conversation. |
| GenAI function | [genai_get_content_by_index](/apl/scalar-functions/genai-functions/genai-get-content-by-index) | Gets message content by index position. |
| GenAI function | [genai_get_content_by_role](/apl/scalar-functions/genai-functions/genai-get-content-by-role) | Gets message content by role. |
| GenAI function | [genai_get_pricing](/apl/scalar-functions/genai-functions/genai-get-pricing) | Gets pricing information for a specific AI model. |
| GenAI function | [genai_get_role](/apl/scalar-functions/genai-functions/genai-get-role) | Gets the role of a message at a specific index. |
| GenAI function | [genai_has_tool_calls](/apl/scalar-functions/genai-functions/genai-has-tool-calls) | Checks if GenAI messages contain tool calls. |
| GenAI function | [genai_input_cost](/apl/scalar-functions/genai-functions/genai-input-cost) | Calculates the cost for input tokens. |
| GenAI function | [genai_is_truncated](/apl/scalar-functions/genai-functions/genai-is-truncated) | Checks if a GenAI response was truncated. |
| GenAI function | [genai_message_roles](/apl/scalar-functions/genai-functions/genai-message-roles) | Extracts all message roles from a GenAI conversation. |
| GenAI function | [genai_output_cost](/apl/scalar-functions/genai-functions/genai-output-cost) | Calculates the cost for output tokens. |
| Hash function | [hash_md5](/apl/scalar-functions/hash-functions#hash-md5) | Returns MD5 hash. |
| Hash function | [hash_sha1](/apl/scalar-functions/hash-functions#hash-sha1) | Returns SHA-1 hash. |
| Hash function | [hash_sha256](/apl/scalar-functions/hash-functions#hash-sha256) | Returns SHA256 hash. |
74 changes: 74 additions & 0 deletions apl/scalar-functions/genai-functions.mdx
@@ -0,0 +1,74 @@
---
title: 'GenAI functions'
description: 'This page provides an overview of GenAI functions in APL for analyzing and processing GenAI conversation data.'
---

GenAI functions in APL help you analyze and process GenAI conversation data, including messages, token usage, costs, and conversation metadata. These functions are useful when working with logs or data from large language models (LLMs) and AI systems.

## When to use GenAI functions

Use GenAI functions when you need to:

- Extract specific information from AI conversation logs, such as user prompts, assistant responses, or system prompts
- Calculate token costs and usage metrics for LLM API calls
- Analyze conversation structure and flow, including turn counts and message roles
- Process and filter conversation messages based on roles or content
- Determine pricing information for different AI models
- Detect truncation or tool calls in AI responses

## Available GenAI functions

| Function | Description |
|:---------|:------------|
| [genai_concat_contents](/apl/scalar-functions/genai-functions/genai-concat-contents) | Concatenates message contents from a conversation array |
| [genai_conversation_turns](/apl/scalar-functions/genai-functions/genai-conversation-turns) | Counts the number of conversation turns |
| [genai_cost](/apl/scalar-functions/genai-functions/genai-cost) | Calculates the total cost for input and output tokens |
| [genai_estimate_tokens](/apl/scalar-functions/genai-functions/genai-estimate-tokens) | Estimates the number of tokens in a text string |
| [genai_extract_assistant_response](/apl/scalar-functions/genai-functions/genai-extract-assistant-response) | Extracts the assistant’s response from a conversation |
| [genai_extract_function_results](/apl/scalar-functions/genai-functions/genai-extract-function-results) | Extracts function call results from messages |
| [genai_extract_system_prompt](/apl/scalar-functions/genai-functions/genai-extract-system-prompt) | Extracts the system prompt from a conversation |
| [genai_extract_tool_calls](/apl/scalar-functions/genai-functions/genai-extract-tool-calls) | Extracts tool calls from messages |
| [genai_extract_user_prompt](/apl/scalar-functions/genai-functions/genai-extract-user-prompt) | Extracts the user prompt from a conversation |
| [genai_get_content_by_index](/apl/scalar-functions/genai-functions/genai-get-content-by-index) | Gets message content by index position |
| [genai_get_content_by_role](/apl/scalar-functions/genai-functions/genai-get-content-by-role) | Gets message content by role |
| [genai_get_pricing](/apl/scalar-functions/genai-functions/genai-get-pricing) | Gets pricing information for a specific model |
| [genai_get_role](/apl/scalar-functions/genai-functions/genai-get-role) | Gets the role of a message at a specific index |
| [genai_has_tool_calls](/apl/scalar-functions/genai-functions/genai-has-tool-calls) | Checks if messages contain tool calls |
| [genai_input_cost](/apl/scalar-functions/genai-functions/genai-input-cost) | Calculates the cost for input tokens |
| [genai_is_truncated](/apl/scalar-functions/genai-functions/genai-is-truncated) | Checks if a response was truncated |
| [genai_message_roles](/apl/scalar-functions/genai-functions/genai-message-roles) | Extracts all message roles from a conversation |
| [genai_output_cost](/apl/scalar-functions/genai-functions/genai-output-cost) | Calculates the cost for output tokens |

## Common use cases

### Analyzing conversation costs

Calculate the total cost of AI conversations across different models and usage patterns.

```kusto
['ai-logs']
| extend total_cost = genai_cost(model, input_tokens, output_tokens)
| summarize sum(total_cost) by model
```

### Extracting conversation components

Extract specific parts of conversations for analysis or debugging.

```kusto
['ai-logs']
| extend user_query = genai_extract_user_prompt(messages)
| extend ai_response = genai_extract_assistant_response(messages)
| project _time, user_query, ai_response
```

### Monitoring token usage

Track and analyze token consumption patterns.

```kusto
['ai-logs']
| extend estimated_tokens = genai_estimate_tokens(content)
| summarize avg(estimated_tokens), max(estimated_tokens) by model
```
