Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tech: Refactored action to support multiple Git providers (GitHub, Gi… #60

Merged
merged 30 commits into from
Jan 29, 2025

Conversation

utsmannn
Copy link
Owner

…tLab), enhanced with provider-specific API interactions, streamlined setup, and improved dependency management; migrated to a modular architecture with interfaces for easier extensibility and maintainability, includes robust error handling.

…tLab), enhanced with provider-specific API interactions, streamlined setup, and improved dependency management; migrated to a modular architecture with interfaces for easier extensibility and maintainability, includes robust error handling.
cp -R corivai-main/src .
cp corivai-main/requirements.txt .
shell: bash
# - name: Download Action Code and Requirements

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: This section should be replaced with a more efficient and secure method of installing dependencies. Consider using a virtual environment and pip to install the requirements.txt file. Avoid using curl and unzip directly in the action if possible.

@@ -8,6 +8,12 @@ cryptography==44.0.0
Deprecated==1.2.17
distro==1.9.0
gitdb==4.0.12
google-ai-generativelanguage==0.6.15

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Consider adding version pinning for all dependencies to ensure reproducibility and prevent unexpected behavior due to updates.

@@ -28,7 +34,9 @@ PyGithub==2.5.0
PyJWT==2.10.1
PyNaCl==1.5.0
pyparsing==3.2.1
python-gitlab==5.6.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Specify a version for python-gitlab to ensure consistent behavior across different environments.

requests==2.32.3
requests-toolbelt==1.0.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Adding a version number is good practice for dependency management. Consider using a more recent version if available and compatible.

@@ -0,0 +1,61 @@
import os
from typing import Dict, List

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: The create_review_comment function is not implemented. It should create a review comment on GitHub. Consider adding error handling and input validation.

from src.pr_reviewer import PRReviewer
import logging
from src.git_github import GitGithub

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Good practice to import specific classes instead of using a wildcard import.


logging.basicConfig(level=logging.INFO)
logging.basicConfig(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Consider adding a filename to the log format for easier debugging. Example: format='%(asctime)s - %(filename)s - %(lineno)d - %(levelname)s - %(message)s'

logger = logging.getLogger(__name__)

def main():
try:
reviewer = PRReviewer()
reviewer.process_pr()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: The error handling is good, clearly indicating missing environment variables and catching ReviewError. Consider adding more specific error handling for other potential exceptions like network issues or GitHub API rate limits.

except Exception as e:
logger.error(f"Failed to complete review: {str(e)}")
exit(1)
logger.error(f"Unexpected error: {str(e)}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Good addition of a catch-all for unexpected errors. Consider logging the exception traceback for more detailed debugging information. Example: logger.exception(f"Unexpected error:")

sys.exit(main())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: This is a good way to exit the script with the appropriate return code from main().

from src.exceptions import ReviewError
from src.generator_review_interface import AIReviewGenerator
from src.models import ReviewResponse
from .git_interface import GitInterface

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Consider adding a docstring to explain the purpose of importing GitInterface.

def __init__(self):
self.github_token = os.getenv('GITHUB_TOKEN')
self.repo_name = os.getenv('GITHUB_REPOSITORY')
def __init__(self, git_interface: GitInterface):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Good use of type hinting. Consider adding a docstring to explain the purpose of the init method.

@@ -103,16 +73,16 @@ def extract_code_block(self, lines: List[str], start_idx: int, current_file: str

return '\n'.join(code_lines), i, changed_blocks

def create_structured_diff(self, pr: PullRequest, diff_content: str) -> Dict:
def create_structured_diff(self, request, diff_content: str) -> Dict:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: The return type annotation Dict is too generic. Consider using a more specific type, such as typing.List[Dict[str, Any]] if the return value is a list of dictionaries.

structured_diff = {"diff": []}
current_file = None
diff_position = 0

comments = pr.get_review_comments()
comments = self.git_interface.get_review_comments(request)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: No issues found.

existing_paths = [comment.path for comment in comments]
existing_changes = [self._normalize_code(comment.diff_hunk) for comment in comments]
existing_line = [comment.position for comment in comments]
existing_paths = [comment['path'] for comment in comments]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Consider adding error handling for cases where comments might be missing keys ('path', 'diff_hunk', 'position'). A try-except block would make the code more robust.

@@ -144,13 +114,15 @@ def create_structured_diff(self, pr: PullRequest, diff_content: str) -> Dict:
if block['changes'].strip():
file_path = block['file_path']
changes = block['changes']
line = diff_position + (block['start_line'] - i)
line_num = diff_position + (block['start_line'] - i)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: This line calculates the line number. Consider adding a comment explaining the logic behind the calculation, especially the use of diff_position, block['start_line'], and i.


if file_path not in existing_paths and self._normalize_code(changes) not in existing_changes and line not in existing_line:
if (file_path not in existing_paths and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: This conditional statement checks for uniqueness. Consider extracting this logic into a separate, well-named function for better readability and maintainability. Also, add a comment explaining the purpose of each check.

structured_diff["diff"].append({
"file_path": file_path,
"changes": changes,
"line": line,
"line": line_num,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: The line number is added here. Ensure that line_num is always a valid integer and handle potential exceptions.

@@ -168,41 +140,38 @@ def chunk_diff_data(self, diff_data: Dict[str, List[Dict]]) -> Iterator[Dict[str
chunk = diff_items[i:i + self.chunk_size]
yield {"diff": chunk}

def process_chunk(self, chunk: Dict, pr, current_head_sha: str) -> None:
def process_chunk(self, chunk: Dict, request, current_head_sha: str) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: The type hint Dict is too general. Specify the expected keys and types within the chunk dictionary for better clarity and maintainability. Also, consider adding a docstring explaining the purpose of this function and its parameters.

try:
chunk_json = json.dumps(chunk, indent=2)
review_response = self.generator.generate(chunk_json)

github_comments = self.apply_review_comments(review_response, chunk, pr)
comments = self.apply_review_comments(review_response, chunk, request)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: This line applies review comments. Add a comment explaining how review_response, chunk, and request are used in this function. Consider adding error handling for potential exceptions during comment application.

comments=github_comments
)
logger.info(f"Posted {len(github_comments)} comments for chunk")
if comments:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: This section looks good. Consider adding a check to ensure that comments is not None before iterating.


except Exception as e:
logger.error(f"Error processing chunk: {str(e)}")
return

def validate_code_changes(self, pr, file_path: str, line_content: str, position: int) -> bool:
def validate_code_changes(self, request, file_path: str, line_content: str, position: int) -> bool:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: The type hints are helpful here. Good job!

try:
existing_reviews = pr.get_review_comments()
comments = self.git_interface.get_review_comments(request)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Ensure that request is properly handled and contains the necessary information for retrieving comments.

if (comment.path == file_path and
comment.position == position and
self._normalize_code(comment.diff_hunk) == normalized_content):
for comment in comments:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: This is a good approach for comparing comments. Consider adding logging for debugging purposes.

return False
return True
except Exception as e:
logger.error(f"Error validating code changes: {str(e)}")
return True

def apply_review_comments(self, review_response: ReviewResponse, diff_chunk: Dict, pr) -> List[dict]:
github_comments = []
def apply_review_comments(self, review_response: ReviewResponse, diff_chunk: Dict, request) -> List[dict]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: The function name is clear. Consider adding a docstring to explain the function's purpose and parameters.

@@ -214,56 +183,59 @@ def apply_review_comments(self, review_response: ReviewResponse, diff_chunk: Dic
f"Skipping comment for {comment.file_path}: Invalid position {diff_entry['line']}")
continue

if not self.validate_code_changes(pr,
if not self.validate_code_changes(request,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Consider adding a more descriptive variable name than 'request'. For example, 'pull_request' or 'pr_data'.

diff_entry["file_path"],
diff_entry["changes"],
diff_entry["line"]):
logger.info(
f"Skipping duplicate comment for {diff_entry['file_path']} at position {diff_entry['line']}")
continue

github_comments.append({
comments.append({

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: The structure of this comment is unclear. Please provide more context or detail. What information is being added to the comment?

"path": comment.file_path,
"position": diff_entry["line"],
"body": f"**Finding**: {comment.comment}"
})
break

return github_comments
return comments

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Should this function return an empty list instead of None? This would be more consistent with the other functions.


def _normalize_code(self, code: str) -> str:
return '\n'.join(line.strip() for line in code.split('\n') if line.strip())
if not code:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: This function could be simplified. The use of str(code).split('\n') is unnecessary. Consider using code.splitlines() instead.


def process_pr(self) -> None:
def process_request(self) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: The function signature indicates that this function returns None, but it doesn't appear to return anything. Should it return a value?

pr_number = self.get_pr_number()
pr = self.repo.get_pull(pr_number)
current_head_sha = pr.head.sha
request_number = self.git_interface.get_request_number()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Consider adding error handling for cases where get_request_number(), get_request(), or get_head_sha() might fail. For example, the request might not exist or there might be a network issue.


diff_content = self.get_pr_diff(pr)
diff_content = self.git_interface.get_diff(request)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Add error handling for potential exceptions during diff retrieval. What happens if the diff is unavailable?

if len(diff_content) > self.max_diff_size:
logger.warning(f"Diff size ({len(diff_content)} bytes) exceeds limit")
return

structured_diff = self.create_structured_diff(pr, diff_content)
structured_diff = self.create_structured_diff(request, diff_content)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Ensure that create_structured_diff handles cases where diff_content is empty or invalid. Add logging for debugging purposes.

total_chunks = (len(structured_diff["diff"]) + self.chunk_size - 1) // self.chunk_size

logger.info(f"Processing {len(structured_diff['diff'])} changes in {total_chunks} chunks")

for i, chunk in enumerate(self.chunk_diff_data(structured_diff), 1):
logger.info(f"Processing chunk {i}/{total_chunks}")
self.process_chunk(chunk, pr, current_head_sha)
self.process_chunk(chunk, request, current_head_sha)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Add logging to track the processing of each chunk. This will help in debugging.


if i < total_chunks:
logger.debug(f"Waiting {self.chunk_delay} seconds before next chunk")
time.sleep(self.chunk_delay)

pr.create_issue_comment(
self.git_interface.create_issue_comment(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Error handling is needed here. The create_issue_comment method might fail due to network issues or rate limits. Consider adding retry logic.

f"@corivai-review Last Processed SHA: {current_head_sha}\n"
f"Review completed at: {time.strftime('%Y-%m-%d %H:%M:%S UTC')}"
)
logger.info("Review completed successfully")

except Exception as e:
logger.error(f"Critical error in PR review: {str(e)}")
raise
logger.error(f"Critical error in request review: {str(e)}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Consider adding more specific exception handling instead of a generic except Exception as e. This will make debugging easier and improve the robustness of your code. For example, catch requests.exceptions.RequestException for issues with the HTTP request.

Copy link

@corivai-review Last Processed SHA: e83a5c9
Review completed at: 2025-01-29 11:02:28 UTC

…ored AI review generator for provider agnosticism, enhanced main entry points for both GitHub and GitLab, and modularized configurations.
…xecution on open, reopen, and update events, while skipping approvals; also included merge request comments; upgraded dependencies, and initiated main-gitlab.py with OpenAI API endpoint.
… to .gitlab-ci.yml, maintaining functionality.
…PI key, OpenAI URL, max diff size, and model name, ensuring flexibility and security; refined main-gitlab.py to use these variables for configuration, enhancing adaptability.
…r API key, OpenAI URL, max diff size, and model name, enhancing configuration flexibility and security.
…g secure API access using env vars for API key, OpenAI URL, max diff size, and model name.
…TOKEN for API access, ensuring secure authentication, with env vars for API key, OpenAI URL, max diff size, and model name.
…env vars, replacing CI_JOB_TOKEN, maintaining secure API access with other env vars for API key, OpenAI URL, max diff size, and model name.
…and CI_PROJECT_ID, enhancing API access and security, alongside other env vars for API key, OpenAI URL, max diff size, and model name.
…access, alongside project ID and other env vars for API key, OpenAI URL, max diff size, and model name.
…, and reviewer API key echos for debugging and verification, alongside utilizing env vars for secure API access and model configuration.
…N, and REVIEWER_API_KEY using '$' for env var access, ensuring proper debugging and secure API interaction, and streamlined script execution.
…n, and API key with clear labels for improved debugging, maintaining secure API access via env vars.
…s, utilizing 'env' command for comprehensive environment variable display, maintaining secure API access via env vars, and streamlined script execution.
…g 'env', while also showing API key and CI token values with labels for debugging; ensured secure API access via env vars and maintained script execution flow.
… labels using echo, instead of 'env', maintaining secure API access via env vars and streamlined script execution.
… 'REVIEWER_API_KEY' and 'GITLAB_CI_TOKEN', alongside existing env vars, improving debugging visibility and maintaining secure API access via env vars.
…tom variable echos for 'REVIEWER_API_KEY' and 'GITLAB_CI_TOKEN', maintaining secure API access via env vars.
…using env vars for 'GITLAB_CI_TOKEN', 'CI_PROJECT_ID', and 'REVIEWER_API_KEY', enhanced security by not hardcoding, and maintained API access through environment variables, while keeping OpenAI URL, max diff size, and model name configurations.
…ECT_ID', and 'REVIEWER_API_KEY' via env vars for enhanced security, while maintaining OpenAI URL, max diff size, and model name configurations.
…intaining secure API access via env vars, alongside existing configurations.
…SHAs, improving comment placement accuracy and debug visibility.
…ad SHAs, enhancing comment accuracy and debug information; updated main-gitlab.py to use env vars for OpenAI URL, model name, and max diff size.
…sition details; ensures accurate comment placement using base, start, and head SHAs, improving debug visibility; added logger for better traceability.
… normalized code content, position, and file path with existing review comments, adding diff hunk logging for improved accuracy and debug visibility.
…tailed logging and validation with normalized code content, position, and file path, ensuring accuracy and visibility; improved diff hunk validation.
…pagination support, ensuring comprehensive review data collection; improved comment filtering for DiffNote types, enhancing review accuracy.
… logging, and refined PR reviewer's duplicate comment detection by streamlining validation with normalized code content, position, and file path, ensuring accuracy and efficiency.
…tegrating OpenAI for AI responses, streamlined PR comment handling, and leveraging GitGithub for API interactions; enhanced configuration loading from environment variables.
@@ -1,96 +1,102 @@
import json
import os
from typing import List, Dict

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Consider adding a more descriptive comment explaining the purpose of these imports.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Explain more


from github import Github
from openai import OpenAI

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Ensure that the openai library is correctly installed and accessible. Add error handling for cases where the library is not found.


# OpenAI configuration

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Using os.getenv is good for security. Consider adding validation to ensure that required environment variables are set. Also, provide default values for optional variables with clear documentation.

return int(pr_ref.split('/')[-2])
except (IndexError, ValueError) as e:
raise ReviewError(f"Invalid PR reference format: {str(e)}")
# Initialize OpenAI client

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Add error handling for the case where the API key is invalid or missing.


# Initialize GitGithub

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Ensure that the GitGithub class is defined and that the token and repo_identifier are correctly handled. Add error handling for invalid tokens or repositories.

pr.create_review_comment_reply(
comment_id=reply_to_id,
body=response
def generate_ai_response(self, messages: List[Dict]) -> str:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Missing closing parenthesis )

)

return response.choices[0].message.content

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Consider adding error handling for cases where parent or comment is not found. Also, ensure that parent.diff_hunk, parent.body, reply['body'], parent.path, and parent.position are always defined before use.


def main():
get_review_comments()
reviewer = CommentProcessor()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: This is a good starting point, but consider adding more robust error handling and logging to improve the reliability of the script.


from openai import OpenAI, BaseModel

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding: Good practice to import CorivaiConfig. Consider adding a comment explaining why this import is necessary.

Copy link

@corivai-review Last Processed SHA: bec7a2a
Review completed at: 2025-01-29 13:59:36 UTC

@utsmannn utsmannn merged commit bec7a2a into main Jan 29, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant