-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tech: Refactored action to support multiple Git providers (GitHub, Gi… #60
Conversation
…tLab), enhanced with provider-specific API interactions, streamlined setup, and improved dependency management; migrated to a modular architecture with interfaces for easier extensibility and maintainability, includes robust error handling.
cp -R corivai-main/src . | ||
cp corivai-main/requirements.txt . | ||
shell: bash | ||
# - name: Download Action Code and Requirements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: This section should be replaced with a more efficient and secure method of installing dependencies. Consider using a virtual environment and pip to install the requirements.txt file. Avoid using curl and unzip directly in the action if possible.
@@ -8,6 +8,12 @@ cryptography==44.0.0 | |||
Deprecated==1.2.17 | |||
distro==1.9.0 | |||
gitdb==4.0.12 | |||
google-ai-generativelanguage==0.6.15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Consider adding version pinning for all dependencies to ensure reproducibility and prevent unexpected behavior due to updates.
@@ -28,7 +34,9 @@ PyGithub==2.5.0 | |||
PyJWT==2.10.1 | |||
PyNaCl==1.5.0 | |||
pyparsing==3.2.1 | |||
python-gitlab==5.6.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Specify a version for python-gitlab to ensure consistent behavior across different environments.
requests==2.32.3 | ||
requests-toolbelt==1.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Adding a version number is good practice for dependency management. Consider using a more recent version if available and compatible.
@@ -0,0 +1,61 @@ | |||
import os | |||
from typing import Dict, List |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: The create_review_comment
function is not implemented. It should create a review comment on GitHub. Consider adding error handling and input validation.
from src.pr_reviewer import PRReviewer | ||
import logging | ||
from src.git_github import GitGithub |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Good practice to import specific classes instead of using a wildcard import.
|
||
logging.basicConfig(level=logging.INFO) | ||
logging.basicConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Consider adding a filename to the log format for easier debugging. Example: format='%(asctime)s - %(filename)s - %(lineno)d - %(levelname)s - %(message)s'
logger = logging.getLogger(__name__) | ||
|
||
def main(): | ||
try: | ||
reviewer = PRReviewer() | ||
reviewer.process_pr() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: The error handling is good, clearly indicating missing environment variables and catching ReviewError. Consider adding more specific error handling for other potential exceptions like network issues or GitHub API rate limits.
except Exception as e: | ||
logger.error(f"Failed to complete review: {str(e)}") | ||
exit(1) | ||
logger.error(f"Unexpected error: {str(e)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Good addition of a catch-all for unexpected errors. Consider logging the exception traceback for more detailed debugging information. Example: logger.exception(f"Unexpected error:")
sys.exit(main()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: This is a good way to exit the script with the appropriate return code from main().
from src.exceptions import ReviewError | ||
from src.generator_review_interface import AIReviewGenerator | ||
from src.models import ReviewResponse | ||
from .git_interface import GitInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Consider adding a docstring to explain the purpose of importing GitInterface.
src/pr_reviewer.py
Outdated
def __init__(self): | ||
self.github_token = os.getenv('GITHUB_TOKEN') | ||
self.repo_name = os.getenv('GITHUB_REPOSITORY') | ||
def __init__(self, git_interface: GitInterface): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Good use of type hinting. Consider adding a docstring to explain the purpose of the init method.
@@ -103,16 +73,16 @@ def extract_code_block(self, lines: List[str], start_idx: int, current_file: str | |||
|
|||
return '\n'.join(code_lines), i, changed_blocks | |||
|
|||
def create_structured_diff(self, pr: PullRequest, diff_content: str) -> Dict: | |||
def create_structured_diff(self, request, diff_content: str) -> Dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: The return type annotation Dict is too generic. Consider using a more specific type, such as typing.List[Dict[str, Any]] if the return value is a list of dictionaries.
structured_diff = {"diff": []} | ||
current_file = None | ||
diff_position = 0 | ||
|
||
comments = pr.get_review_comments() | ||
comments = self.git_interface.get_review_comments(request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: No issues found.
existing_paths = [comment.path for comment in comments] | ||
existing_changes = [self._normalize_code(comment.diff_hunk) for comment in comments] | ||
existing_line = [comment.position for comment in comments] | ||
existing_paths = [comment['path'] for comment in comments] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Consider adding error handling for cases where comments might be missing keys ('path', 'diff_hunk', 'position'). A try-except block would make the code more robust.
@@ -144,13 +114,15 @@ def create_structured_diff(self, pr: PullRequest, diff_content: str) -> Dict: | |||
if block['changes'].strip(): | |||
file_path = block['file_path'] | |||
changes = block['changes'] | |||
line = diff_position + (block['start_line'] - i) | |||
line_num = diff_position + (block['start_line'] - i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: This line calculates the line number. Consider adding a comment explaining the logic behind the calculation, especially the use of diff_position
, block['start_line']
, and i
.
|
||
if file_path not in existing_paths and self._normalize_code(changes) not in existing_changes and line not in existing_line: | ||
if (file_path not in existing_paths and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: This conditional statement checks for uniqueness. Consider extracting this logic into a separate, well-named function for better readability and maintainability. Also, add a comment explaining the purpose of each check.
structured_diff["diff"].append({ | ||
"file_path": file_path, | ||
"changes": changes, | ||
"line": line, | ||
"line": line_num, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: The line number is added here. Ensure that line_num
is always a valid integer and handle potential exceptions.
@@ -168,41 +140,38 @@ def chunk_diff_data(self, diff_data: Dict[str, List[Dict]]) -> Iterator[Dict[str | |||
chunk = diff_items[i:i + self.chunk_size] | |||
yield {"diff": chunk} | |||
|
|||
def process_chunk(self, chunk: Dict, pr, current_head_sha: str) -> None: | |||
def process_chunk(self, chunk: Dict, request, current_head_sha: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: The type hint Dict
is too general. Specify the expected keys and types within the chunk
dictionary for better clarity and maintainability. Also, consider adding a docstring explaining the purpose of this function and its parameters.
try: | ||
chunk_json = json.dumps(chunk, indent=2) | ||
review_response = self.generator.generate(chunk_json) | ||
|
||
github_comments = self.apply_review_comments(review_response, chunk, pr) | ||
comments = self.apply_review_comments(review_response, chunk, request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: This line applies review comments. Add a comment explaining how review_response
, chunk
, and request
are used in this function. Consider adding error handling for potential exceptions during comment application.
comments=github_comments | ||
) | ||
logger.info(f"Posted {len(github_comments)} comments for chunk") | ||
if comments: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: This section looks good. Consider adding a check to ensure that comments
is not None before iterating.
|
||
except Exception as e: | ||
logger.error(f"Error processing chunk: {str(e)}") | ||
return | ||
|
||
def validate_code_changes(self, pr, file_path: str, line_content: str, position: int) -> bool: | ||
def validate_code_changes(self, request, file_path: str, line_content: str, position: int) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: The type hints are helpful here. Good job!
try: | ||
existing_reviews = pr.get_review_comments() | ||
comments = self.git_interface.get_review_comments(request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Ensure that request
is properly handled and contains the necessary information for retrieving comments.
if (comment.path == file_path and | ||
comment.position == position and | ||
self._normalize_code(comment.diff_hunk) == normalized_content): | ||
for comment in comments: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: This is a good approach for comparing comments. Consider adding logging for debugging purposes.
return False | ||
return True | ||
except Exception as e: | ||
logger.error(f"Error validating code changes: {str(e)}") | ||
return True | ||
|
||
def apply_review_comments(self, review_response: ReviewResponse, diff_chunk: Dict, pr) -> List[dict]: | ||
github_comments = [] | ||
def apply_review_comments(self, review_response: ReviewResponse, diff_chunk: Dict, request) -> List[dict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: The function name is clear. Consider adding a docstring to explain the function's purpose and parameters.
@@ -214,56 +183,59 @@ def apply_review_comments(self, review_response: ReviewResponse, diff_chunk: Dic | |||
f"Skipping comment for {comment.file_path}: Invalid position {diff_entry['line']}") | |||
continue | |||
|
|||
if not self.validate_code_changes(pr, | |||
if not self.validate_code_changes(request, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Consider adding a more descriptive variable name than 'request'. For example, 'pull_request' or 'pr_data'.
diff_entry["file_path"], | ||
diff_entry["changes"], | ||
diff_entry["line"]): | ||
logger.info( | ||
f"Skipping duplicate comment for {diff_entry['file_path']} at position {diff_entry['line']}") | ||
continue | ||
|
||
github_comments.append({ | ||
comments.append({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: The structure of this comment is unclear. Please provide more context or detail. What information is being added to the comment?
"path": comment.file_path, | ||
"position": diff_entry["line"], | ||
"body": f"**Finding**: {comment.comment}" | ||
}) | ||
break | ||
|
||
return github_comments | ||
return comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Should this function return an empty list instead of None? This would be more consistent with the other functions.
|
||
def _normalize_code(self, code: str) -> str: | ||
return '\n'.join(line.strip() for line in code.split('\n') if line.strip()) | ||
if not code: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: This function could be simplified. The use of str(code).split('\n') is unnecessary. Consider using code.splitlines() instead.
|
||
def process_pr(self) -> None: | ||
def process_request(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: The function signature indicates that this function returns None, but it doesn't appear to return anything. Should it return a value?
pr_number = self.get_pr_number() | ||
pr = self.repo.get_pull(pr_number) | ||
current_head_sha = pr.head.sha | ||
request_number = self.git_interface.get_request_number() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Consider adding error handling for cases where get_request_number(), get_request(), or get_head_sha() might fail. For example, the request might not exist or there might be a network issue.
|
||
diff_content = self.get_pr_diff(pr) | ||
diff_content = self.git_interface.get_diff(request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Add error handling for potential exceptions during diff retrieval. What happens if the diff is unavailable?
if len(diff_content) > self.max_diff_size: | ||
logger.warning(f"Diff size ({len(diff_content)} bytes) exceeds limit") | ||
return | ||
|
||
structured_diff = self.create_structured_diff(pr, diff_content) | ||
structured_diff = self.create_structured_diff(request, diff_content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Ensure that create_structured_diff handles cases where diff_content is empty or invalid. Add logging for debugging purposes.
total_chunks = (len(structured_diff["diff"]) + self.chunk_size - 1) // self.chunk_size | ||
|
||
logger.info(f"Processing {len(structured_diff['diff'])} changes in {total_chunks} chunks") | ||
|
||
for i, chunk in enumerate(self.chunk_diff_data(structured_diff), 1): | ||
logger.info(f"Processing chunk {i}/{total_chunks}") | ||
self.process_chunk(chunk, pr, current_head_sha) | ||
self.process_chunk(chunk, request, current_head_sha) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Add logging to track the processing of each chunk. This will help in debugging.
|
||
if i < total_chunks: | ||
logger.debug(f"Waiting {self.chunk_delay} seconds before next chunk") | ||
time.sleep(self.chunk_delay) | ||
|
||
pr.create_issue_comment( | ||
self.git_interface.create_issue_comment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Error handling is needed here. The create_issue_comment method might fail due to network issues or rate limits. Consider adding retry logic.
f"@corivai-review Last Processed SHA: {current_head_sha}\n" | ||
f"Review completed at: {time.strftime('%Y-%m-%d %H:%M:%S UTC')}" | ||
) | ||
logger.info("Review completed successfully") | ||
|
||
except Exception as e: | ||
logger.error(f"Critical error in PR review: {str(e)}") | ||
raise | ||
logger.error(f"Critical error in request review: {str(e)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Consider adding more specific exception handling instead of a generic except Exception as e
. This will make debugging easier and improve the robustness of your code. For example, catch requests.exceptions.RequestException
for issues with the HTTP request.
@corivai-review Last Processed SHA: e83a5c9 |
…ored AI review generator for provider agnosticism, enhanced main entry points for both GitHub and GitLab, and modularized configurations.
…xecution on open, reopen, and update events, while skipping approvals; also included merge request comments; upgraded dependencies, and initiated main-gitlab.py with OpenAI API endpoint.
… to .gitlab-ci.yml, maintaining functionality.
…PI key, OpenAI URL, max diff size, and model name, ensuring flexibility and security; refined main-gitlab.py to use these variables for configuration, enhancing adaptability.
…r API key, OpenAI URL, max diff size, and model name, enhancing configuration flexibility and security.
…g secure API access using env vars for API key, OpenAI URL, max diff size, and model name.
…TOKEN for API access, ensuring secure authentication, with env vars for API key, OpenAI URL, max diff size, and model name.
…env vars, replacing CI_JOB_TOKEN, maintaining secure API access with other env vars for API key, OpenAI URL, max diff size, and model name.
…and CI_PROJECT_ID, enhancing API access and security, alongside other env vars for API key, OpenAI URL, max diff size, and model name.
…access, alongside project ID and other env vars for API key, OpenAI URL, max diff size, and model name.
…, and reviewer API key echos for debugging and verification, alongside utilizing env vars for secure API access and model configuration.
…N, and REVIEWER_API_KEY using '$' for env var access, ensuring proper debugging and secure API interaction, and streamlined script execution.
…n, and API key with clear labels for improved debugging, maintaining secure API access via env vars.
…s, utilizing 'env' command for comprehensive environment variable display, maintaining secure API access via env vars, and streamlined script execution.
…g 'env', while also showing API key and CI token values with labels for debugging; ensured secure API access via env vars and maintained script execution flow.
… labels using echo, instead of 'env', maintaining secure API access via env vars and streamlined script execution.
… 'REVIEWER_API_KEY' and 'GITLAB_CI_TOKEN', alongside existing env vars, improving debugging visibility and maintaining secure API access via env vars.
…tom variable echos for 'REVIEWER_API_KEY' and 'GITLAB_CI_TOKEN', maintaining secure API access via env vars.
…using env vars for 'GITLAB_CI_TOKEN', 'CI_PROJECT_ID', and 'REVIEWER_API_KEY', enhanced security by not hardcoding, and maintained API access through environment variables, while keeping OpenAI URL, max diff size, and model name configurations.
…ECT_ID', and 'REVIEWER_API_KEY' via env vars for enhanced security, while maintaining OpenAI URL, max diff size, and model name configurations.
…intaining secure API access via env vars, alongside existing configurations.
…SHAs, improving comment placement accuracy and debug visibility.
…ad SHAs, enhancing comment accuracy and debug information; updated main-gitlab.py to use env vars for OpenAI URL, model name, and max diff size.
…sition details; ensures accurate comment placement using base, start, and head SHAs, improving debug visibility; added logger for better traceability.
… normalized code content, position, and file path with existing review comments, adding diff hunk logging for improved accuracy and debug visibility.
…tailed logging and validation with normalized code content, position, and file path, ensuring accuracy and visibility; improved diff hunk validation.
…pagination support, ensuring comprehensive review data collection; improved comment filtering for DiffNote types, enhancing review accuracy.
… logging, and refined PR reviewer's duplicate comment detection by streamlining validation with normalized code content, position, and file path, ensuring accuracy and efficiency.
…tegrating OpenAI for AI responses, streamlined PR comment handling, and leveraging GitGithub for API interactions; enhanced configuration loading from environment variables.
@@ -1,96 +1,102 @@ | |||
import json | |||
import os | |||
from typing import List, Dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Consider adding a more descriptive comment explaining the purpose of these imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Explain more
|
||
from github import Github | ||
from openai import OpenAI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Ensure that the openai library is correctly installed and accessible. Add error handling for cases where the library is not found.
|
||
# OpenAI configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Using os.getenv is good for security. Consider adding validation to ensure that required environment variables are set. Also, provide default values for optional variables with clear documentation.
return int(pr_ref.split('/')[-2]) | ||
except (IndexError, ValueError) as e: | ||
raise ReviewError(f"Invalid PR reference format: {str(e)}") | ||
# Initialize OpenAI client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Add error handling for the case where the API key is invalid or missing.
|
||
# Initialize GitGithub |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Ensure that the GitGithub class is defined and that the token and repo_identifier are correctly handled. Add error handling for invalid tokens or repositories.
pr.create_review_comment_reply( | ||
comment_id=reply_to_id, | ||
body=response | ||
def generate_ai_response(self, messages: List[Dict]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Missing closing parenthesis )
) | ||
|
||
return response.choices[0].message.content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Consider adding error handling for cases where parent or comment is not found. Also, ensure that parent.diff_hunk, parent.body, reply['body'], parent.path, and parent.position are always defined before use.
|
||
def main(): | ||
get_review_comments() | ||
reviewer = CommentProcessor() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: This is a good starting point, but consider adding more robust error handling and logging to improve the reliability of the script.
|
||
from openai import OpenAI, BaseModel | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding: Good practice to import CorivaiConfig. Consider adding a comment explaining why this import is necessary.
@corivai-review Last Processed SHA: bec7a2a |
…tLab), enhanced with provider-specific API interactions, streamlined setup, and improved dependency management; migrated to a modular architecture with interfaces for easier extensibility and maintainability, includes robust error handling.