feat: OpenAI Integration changes
Git AI Test committed Feb 8, 2025
1 parent af0899e commit e3f5117
Showing 4 changed files with 296 additions and 58 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ http-cacache/*
.env.local
${env:TMPDIR}
bin/
tmp/
46 changes: 34 additions & 12 deletions resources/prompt.md
@@ -1,18 +1,40 @@
You are an AI assistant that generates concise and meaningful git commit messages based on provided diffs. Please adhere to the following guidelines:
You are an AI assistant that generates concise and precise git commit messages based solely on the provided diffs. Please adhere to the following enhanced guidelines:

- Structure: Begin with a clear, present-tense summary.
- Content: While you should use the surrounding context to understand the changes, your commit message should ONLY describe the lines marked with + or -.
- Understanding: Use the context (unmarked lines) to understand the purpose and impact of the changes, but do not mention unchanged code in the commit message.
- Changes: Only describe what was actually changed (added, removed, or modified).
- Consistency: Maintain uniformity in tense, punctuation, and capitalization.
- Accuracy: Ensure the message accurately reflects the changes and their purpose.
- Present tense, imperative mood. (e.g., "Add x to y" instead of "Added x to y")
- Max {{max_commit_length}} chars in the output
- **Structure**: Begin with a clear, present-tense summary of the change in the non-conventional commit format. Use a single-line summary for the change, followed by a blank line. As a best practice, consider including only one bullet point detailing context if essential, but refrain from excessive elaboration.

## Output:
- **Content**: Commit messages must strictly describe the lines marked with + or - in the diff. Avoid including surrounding context, unmarked lines, or irrelevant details. Explicitly refrain from mentioning implications, reasoning, motivations, or any external context not explicitly reflected in the diff. Make sure to avoid any interpretations or assumptions beyond what is clearly stated.

Your output should be a commit message generated from the input diff and nothing else. While you should use the surrounding context to understand the changes, your message should only describe what was actually modified (+ or - lines).
- **Changes**: Clearly articulate what was added, removed, or modified based solely on what is visible in the diff. Use phrases such as "Based only on the changes visible in the diff, this commit..." to emphasize an evidence-based approach while outlining changes directly.

## Input:
- **Consistency**: Ensure uniformity in tense, punctuation, and capitalization throughout the message. Use present tense and imperative form, such as "Add x to y" instead of "Added x to y".

- **Clarity & Brevity**: Craft messages that are clear and easy to understand, succinctly capturing the essence of the changes. Limit the message to a maximum of {{max_commit_length}} characters for the first line, while ensuring enough detail is provided on the primary action taken. Avoid jargon; provide plain definitions for any necessary technical terms.

- **Accuracy & Hallucination Prevention**: Rigorously reflect only the changes visible in the diff. Avoid any speculation or inclusion of content not substantiated by the diff. Restate the necessity for messages to focus exclusively on aspects evident in the diff and to completely avoid extrapolation or assumptions about motivations or implications.

- **Binary Files & Special Cases**: When handling binary files or cases where diff content is not readable:
1. NEVER output error messages or apologies in the commit message
2. Use the format "Add/Update/Delete binary file <filename>" for binary files
3. Include file size in parentheses if available
4. If multiple binary files are changed, list them separated by commas
5. For unreadable diffs, focus on the file operation (add/modify/delete) without speculating about content
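For illustration (the filenames and sizes are hypothetical, not taken from the diff), rules 2–4 above produce messages of the form:

```
Add binary file assets/logo.png (24 KB)
Update binary file assets/logo.png (24 KB), Add binary file assets/icon.ico
```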

- **Error Prevention**:
1. NEVER include phrases like "I'm sorry", "I apologize", or any error messages
2. NEVER leave commit messages incomplete or truncated
3. If unable to read diff content, default to describing the file operation
4. Always ensure the message is a valid git commit message
5. When in doubt about content, focus on the file operation type

- **Review Process**: Before finalizing each commit message:
1. Verify that the message accurately reflects only the changes in the diff
2. Confirm the commit type matches the actual changes
3. Check that the message follows the structure and formatting guidelines
4. Ensure no external context or assumptions are included
5. Validate that the message is clear and understandable to other developers
6. Verify no error messages or apologies are included
7. Confirm the message describes file operations even if content is unreadable

- **Important**: The output will be used as a git commit message, so it must be a valid git commit message.

INPUT:
101 changes: 75 additions & 26 deletions src/model.rs
@@ -7,61 +7,109 @@ use serde::{Deserialize, Serialize};
use tiktoken_rs::get_completion_max_tokens;
use tiktoken_rs::model::get_context_size;

const GPT4: &str = "gpt-4";
const GPT4O: &str = "gpt-4o";
const GPT4OMINI: &str = "gpt-4o-mini";
use crate::profile;

// Model identifiers - using screaming case for constants
const MODEL_GPT4: &str = "gpt-4";
const MODEL_GPT4_OPTIMIZED: &str = "gpt-4o";
const MODEL_GPT4_MINI: &str = "gpt-4o-mini";

/// Represents the available AI models for commit message generation.
/// Each model has different capabilities and token limits.
#[derive(Debug, PartialEq, Eq, Hash, Copy, Clone, Serialize, Deserialize, Default)]
pub enum Model {
/// Standard GPT-4 model
GPT4,
/// Optimized GPT-4 model for better performance
GPT4o,
/// Default model - Mini version of optimized GPT-4 for faster processing
#[default]
GPT4oMini
}

impl Model {
/// Counts the number of tokens in the given text for the current model.
/// This is used to ensure we stay within the model's token limits.
///
/// # Arguments
/// * `text` - The text to count tokens for
///
/// # Returns
/// * `Result<usize>` - The number of tokens or an error
pub fn count_tokens(&self, text: &str) -> Result<usize> {
profile!("Count tokens");
let model_str: &str = self.into();
Ok(
self
.context_size()
.saturating_sub(get_completion_max_tokens(self.into(), text)?)
.saturating_sub(get_completion_max_tokens(model_str, text)?)
)
}

/// Gets the maximum context size for the current model.
///
/// # Returns
/// * `usize` - The maximum number of tokens the model can process
pub fn context_size(&self) -> usize {
get_context_size(self.into())
profile!("Get context size");
let model_str: &str = self.into();
get_context_size(model_str)
}

pub(crate) fn truncate(&self, diff: &str, max_tokens: usize) -> Result<String> {
self.walk_truncate(diff, max_tokens, usize::MAX)
/// Truncates the given text to fit within the specified token limit.
///
/// # Arguments
/// * `text` - The text to truncate
/// * `max_tokens` - The maximum number of tokens allowed
///
/// # Returns
/// * `Result<String>` - The truncated text or an error
pub(crate) fn truncate(&self, text: &str, max_tokens: usize) -> Result<String> {
profile!("Truncate text");
self.walk_truncate(text, max_tokens, usize::MAX)
}

pub(crate) fn walk_truncate(&self, diff: &str, max_tokens: usize, within: usize) -> Result<String> {
log::debug!("max_tokens: {}", max_tokens);
log::debug!("diff: {}", diff);
log::debug!("within: {}", within);
/// Recursively truncates text to fit within token limits while maintaining coherence.
/// Uses a binary search-like approach to find the optimal truncation point.
///
/// # Arguments
/// * `text` - The text to truncate
/// * `max_tokens` - The maximum number of tokens allowed
/// * `within` - The maximum allowed deviation from target token count
///
/// # Returns
/// * `Result<String>` - The truncated text or an error
pub(crate) fn walk_truncate(&self, text: &str, max_tokens: usize, within: usize) -> Result<String> {
profile!("Walk truncate iteration");
log::debug!("max_tokens: {}, within: {}", max_tokens, within);

let truncated = {
profile!("Split and join text");
text
.split_whitespace()
.take(max_tokens)
.collect::<Vec<&str>>()
.join(" ")
};

let str = diff
.split_whitespace()
.take(max_tokens)
.collect::<Vec<&str>>()
.join(" ");
let offset = self.count_tokens(&str)?.saturating_sub(max_tokens);
let token_count = self.count_tokens(&truncated)?;
let offset = token_count.saturating_sub(max_tokens);

if offset > within || offset == 0 {
Ok(str) // TODO: check if this is correct
Ok(truncated)
} else {
self.walk_truncate(diff, max_tokens + offset, within)
// Recursively adjust token count to get closer to target
self.walk_truncate(text, max_tokens + offset, within)
}
}
}
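Stripped of the tokenizer dependency, the `walk_truncate` convergence logic added in this hunk can be sketched as a standalone function. This is an illustrative sketch only: the whitespace-based `count_tokens` is a stand-in for the real `tiktoken_rs`-backed counter, so with it the overshoot is always zero and the recursion never fires; with a real tokenizer, one word can map to several tokens and the budget is adjusted recursively.

```rust
/// Stand-in token counter: the real implementation delegates to
/// tiktoken_rs; here one whitespace-separated word counts as one token.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Truncate `text` toward `max_tokens` tokens, recursing while the
/// overshoot (`offset`) is small enough to be worth correcting.
fn walk_truncate(text: &str, max_tokens: usize, within: usize) -> String {
    // Keep at most `max_tokens` whitespace-separated words.
    let truncated = text
        .split_whitespace()
        .take(max_tokens)
        .collect::<Vec<&str>>()
        .join(" ");

    // How far the kept text still overshoots the target token count.
    // With a real tokenizer a word may expand to several tokens, so
    // this can be positive; with the word-count stand-in it is 0.
    let offset = count_tokens(&truncated).saturating_sub(max_tokens);

    if offset > within || offset == 0 {
        truncated
    } else {
        // Widen the word budget by the overshoot and try again,
        // mirroring the recursion in the diff above.
        walk_truncate(text, max_tokens + offset, within)
    }
}
```

For example, `walk_truncate("one two three four five six seven eight", 3, usize::MAX)` returns `"one two three"`.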

impl From<&Model> for &str {
fn from(model: &Model) -> Self {
match model {
Model::GPT4o => GPT4O,
Model::GPT4 => GPT4,
Model::GPT4oMini => GPT4OMINI
Model::GPT4o => MODEL_GPT4_OPTIMIZED,
Model::GPT4 => MODEL_GPT4,
Model::GPT4oMini => MODEL_GPT4_MINI
}
}
}
@@ -71,10 +119,10 @@ impl FromStr for Model {

fn from_str(s: &str) -> Result<Self> {
match s.trim().to_lowercase().as_str() {
GPT4O => Ok(Model::GPT4o),
GPT4 => Ok(Model::GPT4),
GPT4OMINI => Ok(Model::GPT4oMini),
model => bail!("Invalid model: {}", model)
MODEL_GPT4_OPTIMIZED => Ok(Model::GPT4o),
MODEL_GPT4 => Ok(Model::GPT4),
MODEL_GPT4_MINI => Ok(Model::GPT4oMini),
model => bail!("Invalid model name: {}", model)
}
}
}
@@ -85,6 +133,7 @@ impl Display for Model {
}
}

// Implement conversion from string types to Model with fallback to default
impl From<&str> for Model {
fn from(s: &str) -> Self {
s.parse().unwrap_or_default()
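The parse-with-fallback pattern in the final hunks — strict `FromStr` parsing, with `From<&str>` falling back to the default variant on any error — can be sketched in isolation. The reduced two-variant enum below is illustrative only, not the crate's actual definition:

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq, Eq, Clone, Copy, Default)]
enum Model {
    Gpt4,
    /// Default variant, mirroring `#[default] GPT4oMini` in the diff.
    #[default]
    Gpt4oMini,
}

impl FromStr for Model {
    type Err = String;

    /// Strict parsing: unrecognized model names are an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_lowercase().as_str() {
            "gpt-4" => Ok(Model::Gpt4),
            "gpt-4o-mini" => Ok(Model::Gpt4oMini),
            other => Err(format!("Invalid model name: {}", other)),
        }
    }
}

/// Lenient conversion: any unrecognized string silently falls back to
/// the default variant instead of propagating the parse error.
impl From<&str> for Model {
    fn from(s: &str) -> Self {
        s.parse().unwrap_or_default()
    }
}
```

The design choice is that callers going through `From<&str>` always get a usable model (at the cost of silently masking typos), while callers who need to surface bad input can use `str::parse` and handle the error.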
