Merge pull request #720 from RizaFarheen/main
AI Task Reference Doc Updates
nhandt2021 committed Jul 5, 2024
2 parents 5a66dca + 942835b commit d8f9d6f
Showing 17 changed files with 79 additions and 72 deletions.
13 changes: 6 additions & 7 deletions docs/reference-docs/ai-tasks/llm-chat-complete.md
@@ -39,19 +39,18 @@ A system task to complete the chat query. It can be used to instruct the model's

## Input Parameters

| Parameter | Description |
| --------- | ----------- |
| llmProvider | Select the required LLM provider. You can only choose providers to which you have access for at least one model from that provider.<br/><br/>**Note:** If you haven’t configured your AI / LLM provider on your Orkes console, navigate to the **Integrations** tab and set it up. Refer to the documentation for [integrating LLM providers with Orkes console and providing access to required groups](https://orkes.io/content/category/integrations/ai-llm). |
| model | Choose from the available language models provided by the selected LLM provider. You can only choose models for which you have access.<br/><br/>For example, if your LLM provider is Azure OpenAI and you’ve configured *text-davinci-003* as the language model, you can select it here. |
| instructions | Set the ground rules/instructions for the chat so the model responds only to specific queries and does not deviate from the objective.<br/><br/>Under this field, you can also choose an AI prompt you have created. You can only use prompts to which you have access.<br/><br/>**Note:** If you haven’t created an AI prompt for your language model, refer to the documentation on [how to create AI Prompts in Orkes Conductor and provide access to required groups](https://orkes.io/content/reference-docs/ai-tasks/prompt-template). |
| promptVariables | The instructions/prompts can include **_promptVariables_**, allowing for dynamic input. These variables support multiple data types, including string, number, boolean, null, and object/array. |
| messages | Choose the role and messages to complete the chat query.<p align="center"><img src="/content/img/llm-chat-complete-messages.png" alt="Role and messages in LLM Chat complete task" width="50%" height="auto"></img></p><ul><li>Under ‘Role,’ choose the required role for the chat completion. It can take values such as *user*, *assistant*, *system*, or *human*.<ul><li>The roles “user” and “human” represent the user asking questions or initiating the conversation.</li><li>The roles “assistant” and “system” refer to the model responding to the user queries.</li></ul></li><li>Under “Message”, choose the corresponding input to be provided. It can also be [passed as variables](https://orkes.io/content/developer-guides/passing-inputs-to-task-in-conductor). </li></ul> |
| temperature | A parameter to control the randomness of the model’s output. Higher temperatures, such as 1.0, make the output more random and creative, whereas a lower value makes the output more deterministic and focused.<br/><br/>Example: If you're using a text blurb as input and want to categorize it based on its content type, opt for a lower temperature setting. Conversely, if you're providing text inputs and intend to generate content such as emails or blogs, use a higher temperature setting. |
| stopWords | Provide the stop words to be omitted during the text generation process. It can be string or object/array.<br/><br/>In LLM, stop words may be filtered out or given less importance during the text generation process to ensure that the generated text is coherent and contextually relevant. |
| topP | Another parameter to control the randomness of the model’s output. This parameter defines a probability threshold and then chooses tokens whose cumulative probability exceeds this threshold.<br/><br/>For example: Imagine you want to complete the sentence: “She walked into the room and saw a ______.” Now, the top 4 words the LLM model would consider based on the highest probabilities would be:<ul><li>Cat - 35%</li><li>Dog - 25% </li><li>Book - 15% </li><li>Chair - 10%</li></ul>If you set the top-p parameter to 0.70, the AI will consider tokens until their cumulative probability reaches or exceeds 70%. Here's how it works:<ul><li>Adding "Cat" (35%) to the cumulative probability.</li><li>Adding "Dog" (25%) to the cumulative probability, totaling 60%.</li><li>Adding "Book" (15%) to the cumulative probability, now at 75%.</li></ul>At this point, the cumulative probability is 75%, exceeding the set top-p value of 70%. Therefore, the AI will randomly select one of the tokens from the list of "Cat," "Dog," and "Book" to complete the sentence because these tokens collectively account for approximately 75% of the likelihood. |
| cacheConfig | Enabling this option allows saving the cache output of the task. On enabling, you can provide the following parameters:<ul><li>ttlInSecond - Provide the time to live in seconds. You can also [pass this parameter as a variable](https://orkes.io/content/developer-guides/passing-inputs-to-task-in-conductor).</li><li>key - Provide the cache key, which is a string with parameter substitution based on the task input. You can also [pass this parameter as a variable](https://orkes.io/content/developer-guides/passing-inputs-to-task-in-conductor).</li></ul> |
| optional | Enabling this option renders the task optional. The workflow continues unaffected by the task's outcome, whether it fails or remains incomplete. |
| maxTokens<br/><br/>(Referred to as **_Token limit_** in the UI) | The maximum number of tokens to be generated by the LLM and returned as part of the result. A token is approximately four characters. |
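
Putting these parameters together, here is a minimal sketch of how an LLM Chat Complete task might be configured in a workflow definition. The provider, model, prompt, and variable names below are illustrative placeholders, not values prescribed by this change:

```json
{
  "name": "llm_chat_complete",
  "taskReferenceName": "llm_chat_complete_ref",
  "type": "LLM_CHAT_COMPLETE",
  "inputParameters": {
    "llmProvider": "azure_openai",
    "model": "text-davinci-003",
    "instructions": "your-prompt-template",
    "promptVariables": {
      "topic": "${workflow.input.topic}"
    },
    "messages": [
      {
        "role": "user",
        "message": "${workflow.input.question}"
      }
    ],
    "temperature": 0.1,
    "topP": 0.7,
    "maxTokens": 100,
    "stopWords": ["the", "and"]
  },
  "cacheConfig": {
    "ttlInSecond": 3600,
    "key": "chat-${workflow.input.question}"
  },
  "optional": false
}
```

In this sketch, `cacheConfig` and `optional` are task-level settings alongside `inputParameters`.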


## Output Parameters

13 changes: 7 additions & 6 deletions docs/reference-docs/ai-tasks/llm-generate-embeddings.md
@@ -29,16 +29,17 @@ A system task to generate embeddings from the input data provided. Embeddings ar

## Input Parameters

| Parameter | Description |
| --------- | ----------- |
| llmProvider | Select the required LLM provider. You can only choose providers to which you have access for at least one model from that provider.<br/><br/>**Note:** If you haven’t configured your AI / LLM provider on your Orkes console, navigate to the **Integrations** tab and set it up. Refer to the documentation for [integrating LLM providers with Orkes console and providing access to required groups](/content/category/integrations/ai-llm).|
| model | Choose from the available language models provided by the selected LLM provider. You can only choose models for which you have access.<br/><br/>For example, if your LLM provider is Azure OpenAI and you’ve configured *text-davinci-003* as the language model, you can select it here. |
| text | Provide the text to be converted and stored as a vector. The text can also be [passed as parameters](https://orkes.io/content/developer-guides/passing-inputs-to-task-in-conductor).|

## Output Parameters

| Parameter | Description |
| --------- | ----------- |
| result | A JSON array containing the vectors of the indexed data. |
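
For illustration, here is a minimal sketch of how this task might be configured; the provider and model names are hypothetical placeholders:

```json
{
  "name": "llm_generate_embeddings",
  "taskReferenceName": "llm_generate_embeddings_ref",
  "type": "LLM_GENERATE_EMBEDDINGS",
  "inputParameters": {
    "llmProvider": "azure_openai",
    "model": "text-davinci-003",
    "text": "${workflow.input.text}"
  }
}
```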

## Examples

6 changes: 3 additions & 3 deletions docs/reference-docs/ai-tasks/llm-get-document.md
@@ -24,10 +24,10 @@ A system task to retrieve the content of the document provided and use it for fu

## Input Parameters

| Parameter | Description |
| --------- | ----------- |
| url | Provide the URL of the document to be retrieved. This can also be [passed as variables](https://orkes.io/content/developer-guides/passing-inputs-to-task-in-conductor). |
| mediaType | Select the media type of the file to be retrieved. Currently, supported media types include:<ul><li>application/java-archive</li><li>application/EDI-X12</li><li>application/EDIFACT</li><li>application/javascript</li><li>application/octet-stream</li><li>application/ogg</li><li>application/pdf</li><li>application/xhtml+xml</li><li>application/x-shockwave-flash</li><li>application/json</li><li>application/ld+json</li><li>application/xml</li><li>application/zip</li><li>application/x-www-form-urlencoded</li><li>audio/mpeg</li><li>audio/x-ms-wma</li><li>audio/vnd.rn-realaudio</li><li>audio/x-wav</li><li>image/gif</li><li>image/jpeg</li><li>image/png</li><li>image/tiff</li><li>image/vnd.microsoft.icon</li><li>image/x-icon</li><li>image/vnd.djvu</li><li>image/svg+xml</li></ul> |
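
As an illustration, a minimal sketch of an LLM Get Document task configuration; the URL variable and media type below are placeholder choices:

```json
{
  "name": "llm_get_document",
  "taskReferenceName": "llm_get_document_ref",
  "type": "LLM_GET_DOCUMENT",
  "inputParameters": {
    "url": "${workflow.input.url}",
    "mediaType": "application/pdf"
  }
}
```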

## Examples

24 changes: 14 additions & 10 deletions docs/reference-docs/ai-tasks/llm-get-embeddings.md
@@ -6,7 +6,8 @@ import TabItem from '@theme/TabItem';

# LLM Get Embeddings

A system task to retrieve numerical vector representations of words, phrases, sentences, or documents that have been previously generated or learned by the model. Unlike the process of generating embeddings ([LLM Generate Embeddings task](https://orkes.io/content/reference-docs/ai-tasks/llm-generate-embeddings)), which involves creating vector representations from input data, this task focuses on efficiently accessing pre-existing embeddings. This is particularly useful when you have already computed and stored embeddings and want to utilize them without regeneration.


## Definitions

@@ -26,20 +27,23 @@ A system task to get the numerical vector representations of words, phrases, sen

## Input Parameters

| Parameter | Description |
| --------- | ----------- |
| vectorDB | Choose the vector database from which data is to be retrieved.<br/><br/>**Note:** If you haven’t configured the vector database on your Orkes console, navigate to the **Integrations** tab and configure your required provider. Refer to the documentation on [how to integrate Vector Databases with Orkes console](/content/category/integrations/vector-databases). |
| namespace | Choose from the available namespaces configured within the chosen vector database.<br/><br/>Namespaces are separate, isolated environments within the database used to manage and organize vector data effectively.<br/><br/>**Note:** The **_namespace_** field has different names and applicability depending on the integration:<ul><li>For Pinecone integration, the namespace field is applicable.</li><li>For Weaviate integration, the namespace field is not applicable.</li><li>For MongoDB integration, the namespace field is referred to as “Collection”.</li><li>For Postgres integration, the namespace field is referred to as “Table”.</li></ul>|
| index | Choose the index in your vector database where the indexed text or data was stored.<br/><br/>**Note:** For Weaviate integration, this field refers to the class name, while for other integrations, it denotes the index name.|
| embeddings | Select the embeddings from which the stored data is to be retrieved. This should be from the same embedding model used to create the embeddings stored in the specified index. |
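
For illustration, a minimal sketch of an LLM Get Embeddings task configuration; the database, namespace, and index names are hypothetical, and the embeddings are assumed to come from an upstream LLM Generate Embeddings task:

```json
{
  "name": "llm_get_embeddings",
  "taskReferenceName": "llm_get_embeddings_ref",
  "type": "LLM_GET_EMBEDDINGS",
  "inputParameters": {
    "vectorDB": "pineconedb",
    "namespace": "my_namespace",
    "index": "my_index",
    "embeddings": "${llm_generate_embeddings_ref.output.result}"
  }
}
```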

## Output Parameters

| Parameter | Description |
| --------- | ----------- |
| result | A JSON array containing the results of the query.|
| score | Represents a value that quantifies the degree of likeness between a specific item and a query vector, facilitating the ranking and ordering of results. Higher scores denote a stronger resemblance or relevance to the query vector. |
| metadata | An object containing additional metadata related to the retrieved document.|
| docId | Displays the unique identifier of the document queried.|
| parentDocId | An identifier denoting the parent document, where applicable, in hierarchical or relational data structures. |
| text | The actual content of the retrieved document. |
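
A sample output shaped by the parameters above, with illustrative values only:

```json
{
  "result": [
    {
      "score": 0.92,
      "metadata": {
        "source": "docs/getting-started.pdf"
      },
      "docId": "doc-123",
      "parentDocId": "doc-100",
      "text": "Sample text content of the matched document."
    }
  ]
}
```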

## Examples

