Commit
Improve docs for foundational topics (spring-projects#1279)
* Improve syntax and grammar
* Fix examples using latest APIs
* Add missing info about newer features

Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com>
ThomasVitale authored Aug 27, 2024
1 parent e1884d1 commit 37c3450
Showing 9 changed files with 130 additions and 133 deletions.
Original file line number Diff line number Diff line change
@@ -23,17 +23,16 @@ Here is the link:https://github.com/spring-projects/spring-ai/blob/main/spring-a
----
public interface ChatModel extends Model<Prompt, ChatResponse> {
default String call(String message) {// implementation omitted
}
default String call(String message) {...}
@Override
ChatResponse call(Prompt prompt);
}
----

The `call` method with a `String` parameter simplifies initial use, avoiding the complexities of the more sophisticated `Prompt` and `ChatResponse` classes.
In real-world applications, it is more common to use the `call` method that takes a `Prompt` instance and returns an `ChatResponse`.
The `call()` method with a `String` parameter simplifies initial use, avoiding the complexities of the more sophisticated `Prompt` and `ChatResponse` classes.
In real-world applications, it is more common to use the `call()` method that takes a `Prompt` instance and returns a `ChatResponse`.
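
The relationship between the two `call()` overloads can be sketched with a default method that delegates to the `Prompt`-based one. This is a minimal sketch under stated assumptions, not the Spring AI implementation: `SketchPrompt`, `SketchResponse`, `SketchChatModel`, and `EchoChatModel` are hypothetical stand-ins introduced only for illustration.

```java
import java.util.List;
import java.util.Locale;

// Simplified stand-ins for illustration only; these are NOT the Spring AI classes.
record SketchPrompt(List<String> messages) {}

record SketchResponse(String content) {}

interface SketchChatModel {

    // The full-featured entry point that every implementation must provide.
    SketchResponse call(SketchPrompt prompt);

    // Convenience overload: wraps the text in a prompt and unwraps the reply.
    default String call(String message) {
        SketchResponse response = call(new SketchPrompt(List.of(message)));
        return (response != null) ? response.content() : "";
    }
}

// A toy implementation that echoes the last message back in upper case.
class EchoChatModel implements SketchChatModel {

    @Override
    public SketchResponse call(SketchPrompt prompt) {
        List<String> messages = prompt.messages();
        String last = messages.get(messages.size() - 1);
        return new SketchResponse(last.toUpperCase(Locale.ROOT));
    }
}
```

With this shape, an implementation only has to supply the `Prompt`-style method; callers who just want text in and text out get the `String` overload for free.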

=== StreamingChatModel

@@ -42,17 +41,20 @@ Here is the link:https://github.com/spring-projects/spring-ai/blob/main/spring-a
[source,java]
----
public interface StreamingChatModel extends StreamingModel<Prompt, ChatResponse> {
default Flux<String> stream(String message) {...}
@Override
Flux<ChatResponse> stream(Prompt prompt);
}
----

The `stream` method takes a `Prompt` request similar to `ChatModel` but it streams the responses using the reactive Flux API.
The `stream()` method takes a `String` or `Prompt` parameter similar to `ChatModel` but it streams the responses using the reactive Flux API.

=== Prompt

The https://github.com/spring-projects/spring-ai/blob/main/spring-ai-core/src/main/java/org/springframework/ai/chat/prompt/Prompt.java[Prompt] is a `ModelRequest` that encapsulates a list of https://github.com/spring-projects/spring-ai/blob/main/spring-ai-core/src/main/java/org/springframework/ai/chat/messages/Message.java[Message] objects and optional model request options.
The following listing shows a truncated version of the Prompt class, excluding constructors and other utility methods:
The following listing shows a truncated version of the `Prompt` class, excluding constructors and other utility methods:

[source,java]
----
@@ -63,7 +65,7 @@ public class Prompt implements ModelRequest<List<Message>> {
private ChatOptions modelOptions;
@Override
public ChatOptions getOptions() {..}
public ChatOptions getOptions() {...}
@Override
public List<Message> getInstructions() {...}
@@ -74,7 +76,7 @@ public class Prompt implements ModelRequest<List<Message>> {

==== Message

The `Message` interface encapsulates a Prompt textual content and a collection of metadata attributes and a categorization known as `MessageType`.
The `Message` interface encapsulates a `Prompt` textual content, a collection of metadata attributes, and a categorization known as `MessageType`.

The interface is defined as follows:

@@ -108,7 +110,7 @@ image::spring-ai-message-api.jpg[Spring AI Message API, width=800, align="center

The chat completion endpoint, distinguish between message categories based on conversational roles, effectively mapped by the `MessageType`.

For instance, OpenAI recognizes message categories for distinct conversational roles such as `system`,`user`, `function` or `assistant`.
For instance, OpenAI recognizes message categories for distinct conversational roles such as `system`, `user`, `function`, or `assistant`.

While the term `MessageType` might imply a specific message format, in this context it effectively designates the role a message plays in the dialogue.
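
The idea that a message type designates a conversational role, rather than a format, can be sketched with a small enum. This is an illustrative sketch only: `SketchMessageType` and its `fromRole` helper are hypothetical names, and the role strings are simply the ones listed above, not a copy of the Spring AI source.

```java
import java.util.Locale;

// Hypothetical enum for illustration; it mirrors the idea of MessageType,
// not its exact definition in Spring AI.
enum SketchMessageType {
    USER("user"),
    ASSISTANT("assistant"),
    SYSTEM("system"),
    FUNCTION("function");

    private final String role;

    SketchMessageType(String role) {
        this.role = role;
    }

    // The role name as a chat completion endpoint would expect it.
    String role() {
        return role;
    }

    // Looks up the type for a provider role name, ignoring case.
    static SketchMessageType fromRole(String role) {
        String normalized = role.toLowerCase(Locale.ROOT);
        for (SketchMessageType type : values()) {
            if (type.role.equals(normalized)) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown role: " + role);
    }
}
```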

@@ -120,23 +122,26 @@ To understand the practical application and the relationship between `Prompt` an
Represents the options that can be passed to the AI model. The `ChatOptions` class is a subclass of `ModelOptions` and is used to define few portable options that can be passed to the AI model.
The `ChatOptions` class is defined as follows:


[source,java]
----
public interface ChatOptions extends ModelOptions {
String getModel();
Float getFrequencyPenalty();
Integer getMaxTokens();
Float getPresencePenalty();
List<String> getStopSequences();
Float getTemperature();
void setTemperature(Float temperature);
Float getTopP();
void setTopP(Float topP);
Integer getTopK();
void setTopK(Integer topK);
Float getTopP();
ChatOptions copy();
}
----

Additionally, every model specific ChatModel/StreamingChatModel implementation can have its own options that can be passed to the AI model. For example, the OpenAI Chat Completion model has its own options like `presencePenalty`, `frequencyPenalty`, `bestOf` etc.
Additionally, every model specific ChatModel/StreamingChatModel implementation can have its own options that can be passed to the AI model. For example, the OpenAI Chat Completion model has its own options like `logitBias`, `seed`, and `user`.

This is a powerful feature that allows developers to use model specific options when starting the application and then override them with at runtime using the Prompt request:
This is a powerful feature that allows developers to use model-specific options when starting the application and then override them at runtime using the `Prompt` request:

image::chat-options-flow.jpg[align="center", width="800px"]
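
The startup-defaults-then-runtime-override flow can be sketched as a merge in which any option set on the request wins over the startup default. This is a minimal sketch under stated assumptions: `SketchOptions` and `mergedWith` are hypothetical names, not part of Spring AI, and the actual merge logic in each model implementation may differ.

```java
// Hypothetical options holder for illustration; NOT a Spring AI class.
// A null component means "not set".
record SketchOptions(String model, Double temperature, Integer maxTokens) {

    // Returns a copy in which every non-null runtime value replaces the startup default.
    SketchOptions mergedWith(SketchOptions runtime) {
        if (runtime == null) {
            return this;
        }
        return new SketchOptions(
                runtime.model() != null ? runtime.model() : model,
                runtime.temperature() != null ? runtime.temperature() : temperature,
                runtime.maxTokens() != null ? runtime.maxTokens() : maxTokens);
    }
}
```

For example, defaults of `("model-a", 0.7, 256)` merged with a request that sets only a temperature of `0.2` keep the default model and token limit while using the request's temperature.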

@@ -169,13 +174,13 @@ The `ChatResponse` class also carries a `ChatResponseMetadata` metadata about th
[[Generation]]
=== Generation

Finally, the https://github.com/spring-projects/spring-ai/blob/main/spring-ai-core/src/main/java/org/springframework/ai/chat/model/Generation.java[Generation] class extends from the `ModelResult` to represent the output assistant message response and related metadata about this result:
Finally, the https://github.com/spring-projects/spring-ai/blob/main/spring-ai-core/src/main/java/org/springframework/ai/chat/model/Generation.java[Generation] class extends from the `ModelResult` to represent the model output (assistant message) and related metadata:

[source,java]
----
public class Generation implements ModelResult<AssistantMessage> {
private AssistantMessage assistantMessage;
private final AssistantMessage assistantMessage;
private ChatGenerationMetadata chatGenerationMetadata;
@Override
@@ -194,9 +199,9 @@ The `ChatModel` and `StreamingChatModel` implementations are provided for the fo

image::spring-ai-chat-completions-clients.jpg[align="center", width="800px"]

* xref:api/chat/openai-chat.adoc[OpenAI Chat Completion] (streaming & function-calling support)
* xref:api/chat/openai-chat.adoc[OpenAI Chat Completion] (streaming, multi-modality & function-calling support)
* xref:api/chat/azure-openai-chat.adoc[Microsoft Azure Open AI Chat Completion] (streaming & function-calling support)
* xref:api/chat/ollama-chat.adoc[Ollama Chat Completion]
* xref:api/chat/ollama-chat.adoc[Ollama Chat Completion] (streaming, multi-modality & function-calling support)
* xref:api/chat/huggingface.adoc[Hugging Face Chat Completion] (no streaming support)
* xref:api/chat/vertexai-palm2-chat.adoc[Google Vertex AI PaLM2 Chat Completion] (no streaming support)
* xref:api/chat/vertexai-gemini-chat.adoc[Google Vertex AI Gemini Chat Completion] (streaming, multi-modality & function-calling support)
@@ -207,11 +212,11 @@ image::spring-ai-chat-completions-clients.jpg[align="center", width="800px"]
** xref:api/chat/bedrock/bedrock-anthropic.adoc[Anthropic Chat Completion]
** xref:api/chat/bedrock/bedrock-jurassic2.adoc[Jurassic2 Chat Completion]
* xref:api/chat/mistralai-chat.adoc[Mistral AI Chat Completion] (streaming & function-calling support)
* xref:api/chat/anthropic-chat.adoc[Anthropic Chat Completion] (streaming)
* xref:api/chat/anthropic-chat.adoc[Anthropic Chat Completion] (streaming & function-calling support)

== Chat Model API

The Spring AI Chat Model API is build on top of the Spring AI `Generic Model API` providing Chat specific abstractions and implementations. Following class diagram illustrates the main classes and interfaces of the Spring AI Chat Model API.
The Spring AI Chat Model API is built on top of the Spring AI `Generic Model API` providing Chat specific abstractions and implementations. The following class diagram illustrates the main classes and interfaces of the Spring AI Chat Model API.

image::spring-ai-chat-api.jpg[align="center", width="900px"]

Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ The integration of function support in AI models, permits the model to request t

image::function-calling-basic-flow.jpg[Function calling, width=700, align="center"]

Spring AI currently supports Function invocation for the following AI Models
Spring AI currently supports function invocation for the following AI Models:

* Anthropic Claude: Refer to the xref:api/chat/functions/anthropic-chat-functions.adoc[Anthropic Claude function invocation docs].
* Azure OpenAI: Refer to the xref:api/chat/functions/azure-open-ai-chat-functions.adoc[Azure OpenAI function invocation docs].
@@ -14,5 +14,5 @@ Spring AI currently supports Function invocation for the following AI Models
* Mistral AI: Refer to the xref:api/chat/functions/mistralai-chat-functions.adoc[Mistral AI function invocation docs].
// * MiniMax : Refer to the xref:api/chat/functions/minimax-chat-functions.adoc[MiniMax function invocation docs].
* Ollama: Refer to the xref:api/chat/functions/ollama-chat-functions.adoc[Ollama function invocation docs] (streaming not supported yet).
* OpenAI: Refer to the xref:api/chat/functions/openai-chat-functions.adoc[Open AI function invocation docs].
* OpenAI: Refer to the xref:api/chat/functions/openai-chat-functions.adoc[OpenAI function invocation docs].
// * ZhiPu AI : Refer to the xref:api/chat/functions/zhipuai-chat-functions.adoc[ZhiPu AI function invocation docs].
Original file line number Diff line number Diff line change
@@ -11,7 +11,7 @@ image::spring-ai-generic-model-api.jpg[width=900, align="center"]

== Model

The Model interface provides a generic API for invoking AI models. It is designed to handle the interaction with various types of AI models by abstracting the process of sending requests and receiving responses. The interface uses Java generics to accommodate different types of requests and responses, enhancing flexibility and adaptability across different AI model implementations.
The `Model` interface provides a generic API for invoking AI models. It is designed to handle the interaction with various types of AI models by abstracting the process of sending requests and receiving responses. The interface uses Java generics to accommodate different types of requests and responses, enhancing flexibility and adaptability across different AI model implementations.

The interface is defined below:

@@ -31,7 +31,7 @@ public interface Model<TReq extends ModelRequest<?>, TRes extends ModelResponse<

== StreamingModel

The StreamingModel interface provides a generic API for invoking an AI model with streaming response. It abstracts the process of sending requests and receiving a streaming response. The interface uses Java generics to accommodate different types of requests and responses, enhancing flexibility and adaptability across different AI model implementations.
The `StreamingModel` interface provides a generic API for invoking an AI model with streaming response. It abstracts the process of sending requests and receiving a streaming response. The interface uses Java generics to accommodate different types of requests and responses, enhancing flexibility and adaptability across different AI model implementations.

[source,java]
----
@@ -50,7 +50,7 @@ public interface StreamingModel<TReq extends ModelRequest<?>, TResChunk extends

== ModelRequest

Interface representing a request to an AI model. This interface encapsulates the necessary information required to interact with an AI model, including instructions or inputs (of generic type T) and additional model options. It provides a standardized way to send requests to AI models, ensuring that all necessary details are included and can be easily managed.
The `ModelRequest` interface represents a request to an AI model. It encapsulates the necessary information required to interact with an AI model, including instructions or inputs (of generic type `T`) and additional model options. It provides a standardized way to send requests to AI models, ensuring that all necessary details are included and can be easily managed.

[source,java]
----
@@ -73,7 +73,7 @@ public interface ModelRequest<T> {

== ModelOptions

Interface representing the customizable options for AI model interactions. This marker interface allows for the specification of various settings and parameters that can influence the behavior and output of AI models. It is designed to provide flexibility and adaptability in different AI scenarios, ensuring that the AI models can be fine-tuned according to specific requirements.
The `ModelOptions` interface represents the customizable options for AI model interactions. This marker interface allows for the specification of various settings and parameters that can influence the behavior and output of AI models. It is designed to provide flexibility and adaptability in different AI scenarios, ensuring that the AI models can be fine-tuned according to specific requirements.

[source,java]
----
@@ -84,7 +84,7 @@ public interface ModelOptions {

== ModelResponse

Interface representing the response received from an AI model. This interface provides methods to access the main result or a list of results generated by the AI model, along with the response metadata. It serves as a standardized way to encapsulate and manage the output from AI models, ensuring easy retrieval and processing of the generated information.
The `ModelResponse` interface represents the response received from an AI model. This interface provides methods to access the main result or a list of results generated by the AI model, along with the response metadata. It serves as a standardized way to encapsulate and manage the output from AI models, ensuring easy retrieval and processing of the generated information.

[source,java]
----
@@ -111,10 +111,9 @@ public interface ModelResponse<T extends ModelResult<?>> {
}
----


== ModelResult

This interface provides methods to access the main output of the AI model and the metadata associated with this result. It is designed to offer a standardized and comprehensive way to handle and interpret the outputs generated by AI models.
The `ModelResult` interface provides methods to access the main output of the AI model and the metadata associated with this result. It is designed to offer a standardized and comprehensive way to handle and interpret the outputs generated by AI models.

[source,java]
----
Original file line number Diff line number Diff line change
@@ -27,28 +27,27 @@ The Spring AI Message API provides all necessary abstractions to support multimo

image::spring-ai-message-api.jpg[Spring AI Message API, width=800, align="center"]

The UserMessage’s `content` field is used as primarily text inputs, while the, optional, `media` field allows adding one or more additional content of different modalities such as images, audio and video.
The UserMessage’s `content` field is used primarily for text inputs, while the optional `media` field allows adding one or more additional content of different modalities such as images, audio and video.
The `MimeType` specifies the modality type.
Depending on the used LLMs the Media's data field can be either encoded raw media content or an URI to the content.
Depending on the used LLMs, the `Media` data field can be either the raw media content as a `Resource` object or a `URI` to the content.

NOTE: The media field is currently applicable only for user input messages (e.g., `UserMessage`). It does not hold significance for system messages. The `AssistantMessage`, which includes the LLM response, provides text content only. To generate non-text media outputs, you should utilize one of dedicated, single modality models.*
NOTE: The media field is currently applicable only for user input messages (e.g., `UserMessage`). It does not hold significance for system messages. The `AssistantMessage`, which includes the LLM response, provides text content only. To generate non-text media outputs, you should utilize one of the dedicated, single-modality models.*


For example we can take the following picture (*multimodal.test.png*) as an input and ask the LLM to explain what it sees in the picture.
For example, we can take the following picture (`multimodal.test.png`) as an input and ask the LLM to explain what it sees.

image::multimodal.test.png[Multimodal Test Image, 200, 200, align="left"]

From most of the multimodal LLMs, the Spring AI code would look something like this:
For most of the multimodal LLMs, the Spring AI code would look something like this:

[source,java]
----
byte[] imageData = new ClassPathResource("/multimodal.test.png").getContentAsByteArray();
var imageResource = new ClassPathResource("/multimodal.test.png");
var userMessage = new UserMessage(
"Explain what do you see in this picture?", // content
List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageData))); // media
new Media(MimeTypeUtils.IMAGE_PNG, imageResource)); // media
ChatResponse response = chatModel.call(new Prompt(List.of(userMessage)));
ChatResponse response = chatModel.call(new Prompt(userMessage));
----

or with the fluent xref::api/chatclient.adoc[ChatClient] API:
@@ -57,21 +56,20 @@ or with the fluent xref::api/chatclient.adoc[ChatClient] API:
----
String response = ChatClient.create(chatModel).prompt()
.user(u -> u.text("Explain what do you see on this picture?")
.media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("/multimodal.test.png")))
.media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("/multimodal.test.png")))
.call()
.content();
----


and produce a response like:

> This is an image of a fruit bowl with a simple design. The bowl is made of metal with curved wire edges that create an open structure, allowing the fruit to be visible from all angles. Inside the bowl, there are two yellow bananas resting on top of what appears to be a red apple. The bananas are slightly overripe, as indicated by the brown spots on their peels. The bowl has a metal ring at the top, likely to serve as a handle for carrying. The bowl is placed on a flat surface with a neutral-colored background that provides a clear view of the fruit inside.

Latest version of Spring AI provides multimodal support for the following Chat Clients:
Spring AI provides multimodal support for the following chat models:

* xref:api/chat/openai-chat.adoc#_multimodal[Open AI - (GPT-4-Vision and GPT-4o models)]
* xref:api/chat/ollama-chat.adoc#_multimodal[Ollama - (LlaVa and Baklava models)]
* xref:api/chat/vertexai-gemini-chat.adoc#_multimodal[Vertex AI Gemini - (gemini-1.5-pro-001, gemini-1.5-flash-001 models)]
* xref:api/chat/openai-chat.adoc#_multimodal[OpenAI (e.g. GPT-4 and GPT-4o models)]
* xref:api/chat/ollama-chat.adoc#_multimodal[Ollama (e.g. LlaVa and Baklava models)]
* xref:api/chat/vertexai-gemini-chat.adoc#_multimodal[Vertex AI Gemini (e.g. gemini-1.5-pro-001, gemini-1.5-flash-001 models)]
* xref:api/chat/anthropic-chat.adoc#_multimodal[Anthropic Claude 3]
* xref:api/chat/bedrock/bedrock-anthropic3.adoc#_multimodal[AWS Bedrock Anthropic Claude 3]
* xref:api/chat/azure-openai-chat.adoc#_multimodal[Azure Open AI - (GPT-4o models)]
* xref:api/chat/azure-openai-chat.adoc#_multimodal[Azure Open AI (e.g. GPT-4o models)]