6 changes: 6 additions & 0 deletions .gitignore
@@ -28,3 +28,9 @@ llvm-*
*.zip

*/tmp

src/config.yaml

src/kparser

commits/test.txt
144 changes: 144 additions & 0 deletions prompt_template/patch2semgrep.md
@@ -0,0 +1,144 @@
# Instruction

You will be provided with a patch in a software repository.
Please analyze the patch and identify the **bug pattern** it fixes.
A **bug pattern** is the root cause of the bug, meaning that programs containing this pattern are highly likely to have the same bug.
Note that the bug pattern should be specific and accurate, so that it can be used to identify the buggy code shown in the patch.

Then, please write a Semgrep rule to detect this specific bug pattern.
The rule should be written in YAML format and follow Semgrep syntax conventions.

**Please read the `Suggestions` section before writing the rule!**

# Examples

{{examples}}

# Target Patch

{{input_patch}}

# Suggestions

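1. Semgrep rules use YAML format. Each rule should have an `id`, a pattern operator (`pattern` on its own, or `patterns` when combining several operators such as `pattern-not` or `pattern-inside`), `languages`, `message`, and `severity`.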

2. Use `$VAR` for metavariables to match any variable name, `$FUNC` for function names, `$EXPR` for expressions.

3. Use `pattern-not` to exclude patterns that should not trigger the rule (especially the fixed version).

4. Use `pattern-either` for OR conditions when you need to match multiple variations.

5. Use `pattern-inside` to limit matches to specific contexts (e.g., inside a function or class).

6. Use `pattern-not-inside` to exclude specific contexts where the rule should not match.

7. Use `...` to match zero or more statements/expressions between patterns.

8. The `languages` field should specify the target programming language accurately (e.g., ["c"], ["cpp"], ["javascript"], ["python"]).

9. The `message` should be clear and actionable, explaining:
   - What the vulnerability is
   - Why it's dangerous
   - How to fix it

10. Use appropriate `severity` levels: INFO for style issues, WARNING for potential problems, ERROR for definite bugs.

11. For memory management issues in C/C++, be specific about pointer operations and null checks.

12. Consider edge cases and variations of the pattern that should also be caught.

13. Add relevant metadata like CWE numbers, OWASP categories, and source URLs.

14. Make patterns as specific as possible to minimize false positives while catching variations.

15. Use `metavariable-pattern` to add constraints on variables when needed.
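
As an illustration of suggestions 3, 7, and 11, the following is a minimal sketch of a C rule, loosely modeled on the common double-free pattern. The rule id, message text, and CWE label are illustrative assumptions and do not come from any target patch. Real code may need further exclusions, for example reassignment from allocators other than `malloc`.

```yaml
rules:
  - id: example-double-free          # hypothetical rule id, for illustration only
    languages: [c]
    severity: ERROR
    message: >-
      `$PTR` is passed to free() twice without being reassigned in between.
      A double free can corrupt the allocator state. Set the pointer to NULL
      after the first free() or remove the redundant call.
    patterns:
      # Buggy shape: two free() calls on the same pointer, any statements between.
      - pattern: |
          free($PTR);
          ...
          free($PTR);
      # Fixed shape 1: the pointer is reset to NULL before the second free().
      - pattern-not: |
          free($PTR);
          ...
          $PTR = NULL;
          ...
          free($PTR);
      # Fixed shape 2: the pointer is reallocated before the second free().
      - pattern-not: |
          free($PTR);
          ...
          $PTR = malloc(...);
          ...
          free($PTR);
    metadata:
      category: security
      cwe:
        - "CWE-415: Double Free"
```

Because several operators are combined here, they are listed under `patterns`, which requires all of them to hold for a match.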

# Semgrep Pattern Syntax Guide

- Use `$VARNAME` to match any expression or variable
- Use `...` to match any sequence of statements
- Use `pattern-inside` to limit matches to specific code blocks
- Use `pattern-not` to exclude specific patterns (like the fixed version)
- Use `pattern-either` to match multiple alternative patterns
- Use `metavariable-pattern` to add constraints on metavariables
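
To make the last bullet concrete, here is a small sketch of how `metavariable-pattern` can constrain what a metavariable is allowed to match. The rule id and message are hypothetical, and whether a `strlen`-sized `memcpy` is actually a bug depends on the surrounding code, so treat this as an illustration of the syntax rather than a production rule.

```yaml
rules:
  - id: example-memcpy-sized-by-strlen   # hypothetical rule id
    languages: [c]
    severity: WARNING
    message: >-
      The memcpy() length is derived from strlen(), which reflects the source
      string rather than the capacity of `$DST`. Bound the copy by the size of
      the destination buffer.
    patterns:
      # Match any three-argument memcpy() call and bind its arguments.
      - pattern: memcpy($DST, $SRC, $LEN)
      # Keep only matches whose length argument is built from strlen().
      - metavariable-pattern:
          metavariable: $LEN
          pattern-either:
            - pattern: strlen(...)
            - pattern: strlen(...) + 1
```

The outer `pattern` binds `$LEN` to an arbitrary expression, and the nested `pattern-either` then filters that expression, which keeps the main pattern short while still restricting the match.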

# Rule Template

```yaml
rules:
  - id: your-rule-id
    # Multiple pattern operators must be combined under `patterns`.
    patterns:
      - pattern: |
          your pattern here
      - pattern-not: |
          exclusion pattern here (fixed version)
      - pattern-inside: |
          context pattern here
    languages: ["target-language"]
    message: |
      Detailed description of:
      - What the vulnerability is
      - Why it's dangerous
      - How to fix it
    severity: ERROR
    metadata:
      category: security
      cwe:
        - "CWE-XXX"
      owasp:
        - "A1:2017-Injection"
      technology:
        - target-language
      references:
        - "https://example.com/documentation"
```

# Required Fields

1. **id**: Unique identifier (use lowercase, numbers, hyphens only)
2. **pattern**: Main pattern to match the vulnerable code
3. **languages**: Array of target programming languages
4. **message**: Clear, actionable description of the issue and fix
5. **severity**: One of [ERROR, WARNING, INFO]

# Recommended Fields

- **pattern-not**: Pattern for the fixed version (to avoid false positives)
- **pattern-inside**: Context where the rule should apply
- **pattern-not-inside**: Context where the rule should not apply
- **metadata**: Additional context including CWE, OWASP, references

# Important Guidelines

1. Make patterns specific enough to minimize false positives
2. Include `pattern-not` for the fixed version when possible
3. Add relevant metadata like CWE numbers and OWASP categories
4. Write clear, actionable messages explaining both problem and solution
5. Consider different variations of the vulnerable pattern
6. Test your pattern mentally against both positive cases (should match) and negative cases (should not match)

# Formatting

Please show me the completed Semgrep rule in proper YAML format.

Your response should be a single YAML document like:

```yaml
rules:
  - id: rule-name
    # Multiple pattern operators must be combined under `patterns`.
    patterns:
      - pattern: |
          pattern content
      - pattern-not: |
          fixed version pattern
    languages: ["language"]
    message: |
      Detailed description of the vulnerability and how to fix it.
    severity: ERROR
    metadata:
      category: security
      cwe:
        - "CWE-XXX"
      technology:
        - language
```

Remember to adapt the patterns to match the specific vulnerability while keeping them general enough to catch variations of the same issue.
72 changes: 72 additions & 0 deletions prompt_template/pattern2semplan.md
@@ -0,0 +1,72 @@
# Instruction

Please draw up a detailed plan to help write a Semgrep rule that detects the **bug pattern**.

You will be provided with a **bug pattern** description and the corresponding patch to help you understand this bug pattern.

**Please read the `Suggestions` section before writing the plan!**

# Examples

{{examples}}

# Target Patch

{{input_patch}}

# Target Pattern

{{input_pattern}}

{{failed_plan_examples}}

# Suggestions

1. Semgrep rules use pattern matching syntax. Use `$VAR` for metavariables to match any variable.

2. Use `pattern` for the main pattern to match, `pattern-not` to exclude certain patterns, and `pattern-either` for OR conditions.

3. For function calls, use `$FUNC(...)` to match any function call, or `$FUNC($ARG1, $ARG2)` for specific arguments.

4. Use `...` to match any number of statements or expressions between patterns.

5. The `languages` field should specify the target programming language (e.g., ["c"], ["javascript"], ["python"]).

6. The `message` should be **short** and clear, describing what the rule detects.

7. Use appropriate `severity` levels: INFO, WARNING, ERROR.

8. Consider using `pattern-inside` to limit matches to specific contexts (e.g., inside a function).

9. For pointer dereferences, memory management, and similar C/C++ issues, be specific about the context.

10. Use `metavariable-regex` when you need to match specific naming patterns.
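
For reference, suggestions 3 and 10 can be combined as in the sketch below, which matches calls by function name using `metavariable-regex`. The rule id, the list of function names, and the message are assumptions chosen for illustration only.

```yaml
rules:
  - id: example-unbounded-string-call   # hypothetical rule id
    languages: [c]
    severity: WARNING
    message: Call to `$FUNC` writes to `$DST` without a length bound.
    patterns:
      # Match any call with at least one argument and bind the callee name.
      - pattern: $FUNC($DST, ...)
      # Restrict the callee to an illustrative set of unbounded functions.
      - metavariable-regex:
          metavariable: $FUNC
          regex: ^(strcpy|strcat|sprintf|gets)$
```

A plan can describe this kind of rule in words: name the call shape, the metavariables to bind, and the naming constraint on the callee.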

# Formatting

Your plan should contain the following information:

1. Identify the main pattern that needs to be detected (the buggy code pattern).

2. Determine what variations of the pattern should be caught.

3. Specify what legitimate code patterns should be excluded (using `pattern-not`).

4. Choose appropriate metavariables for the rule.

5. Determine the context where the rule should apply (e.g., inside functions, specific file types).

6. Decide on the message and severity level.

You only need to tell me how to implement this Semgrep rule; extra information such as testing or documentation is unnecessary.

**Please use the simplest approach and as few patterns as possible to achieve your goal. For every step, however, your response should be as concrete as possible so that I can easily follow your guidance and write a correct Semgrep rule!**

# Plan

Your plan should follow the format of the example plans.
Note that your plan should be concise and clear. Do not include unnecessary information or example implementation code snippets.

```
Your plan here
```
151 changes: 151 additions & 0 deletions prompt_template/plan2semgrep.md
@@ -0,0 +1,151 @@
# Instruction

You are proficient in writing Semgrep rules.

Please help me write a Semgrep rule to detect a specific bug pattern.
You can refer to the `Target Bug Pattern` and `Target Patch` sections to help you understand the bug pattern.
Please make sure your rule can detect the bug shown in the buggy code pattern.
Please refer to the `Plan` section to implement the Semgrep rule.

**Please read the `Suggestions` section before writing the rule!**

# Examples

{{examples}}

# Target Bug Pattern

{{input_pattern}}

# Target Patch

{{input_patch}}

# Target Plan

{{input_plan}}

# Suggestions

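1. Semgrep rules use YAML format. Each rule should have an `id`, a pattern operator (`pattern` on its own, or `patterns` when combining several operators such as `pattern-not` or `pattern-inside`), `languages`, `message`, and `severity`.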

2. Use `$VAR` for metavariables to match any variable name, `$FUNC` for function names, `$EXPR` for expressions.

3. Use `pattern-not` to exclude patterns that should not trigger the rule (especially the fixed version).

4. Use `pattern-either` for OR conditions when you need to match multiple variations.

5. Use `pattern-inside` to limit matches to specific contexts (e.g., inside a function or class).

6. Use `pattern-not-inside` to exclude specific contexts where the rule should not match.

7. Use `...` to match zero or more statements/expressions between patterns.

8. The `languages` field should specify the target programming language accurately (e.g., ["c"], ["cpp"], ["javascript"], ["python"]).

9. The `message` should be clear and actionable, explaining:
   - What the vulnerability is
   - Why it's dangerous
   - How to fix it

10. Use appropriate `severity` levels: INFO for style issues, WARNING for potential problems, ERROR for definite bugs.

11. For memory management issues in C/C++, be specific about pointer operations and null checks.

12. Consider edge cases and variations of the pattern that should also be caught.

13. Add relevant metadata like CWE numbers, OWASP categories, and source URLs.

14. Make patterns as specific as possible to minimize false positives while catching variations.

15. Use `metavariable-pattern` to add constraints on variables when needed.
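
Suggestions 4 to 6 often work together. The sketch below, loosely modeled on a typical use-after-free check for C, uses `pattern-either` for the access variations, `pattern-inside` for the context, and `pattern-not-inside` for the safe cases. The rule id and message are hypothetical, and the exclusions shown are not exhaustive.

```yaml
rules:
  - id: example-use-after-free       # hypothetical rule id, for illustration only
    languages: [c]
    severity: WARNING
    message: >-
      `$PTR` is accessed after it was passed to free(). Accessing freed memory
      is undefined behavior. Move the access before free() or reassign the
      pointer first.
    patterns:
      # Variations of the buggy access.
      - pattern-either:
          - pattern: $PTR->$FIELD
          - pattern: (*$PTR).$FIELD
          - pattern: $PTR[$IDX]
      # Context: the access happens somewhere after free($PTR).
      - pattern-inside: |
          free($PTR);
          ...
      # Safe case: the pointer was reallocated before the access.
      - pattern-not-inside: |
          free($PTR);
          ...
          $PTR = malloc(...);
          ...
```

The main match is the access expression itself, so the finding points at the offending line rather than at the whole block following `free()`.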

# Semgrep Pattern Syntax Guide

- Use `$VARNAME` to match any expression or variable
- Use `...` to match any sequence of statements
- Use `pattern-inside` to limit matches to specific code blocks
- Use `pattern-not` to exclude specific patterns (like the fixed version)
- Use `pattern-either` to match multiple alternative patterns
- Use `metavariable-pattern` to add constraints on metavariables

# Rule Template

```yaml
rules:
  - id: your-rule-id
    # Multiple pattern operators must be combined under `patterns`.
    patterns:
      - pattern: |
          your pattern here
      - pattern-not: |
          exclusion pattern here (fixed version)
      - pattern-inside: |
          context pattern here
    languages: ["target-language"]
    message: |
      Detailed description of:
      - What the vulnerability is
      - Why it's dangerous
      - How to fix it
    severity: ERROR
    metadata:
      category: security
      cwe:
        - "CWE-XXX"
      owasp:
        - "A1:2017-Injection"
      technology:
        - target-language
      references:
        - "https://example.com/documentation"
```

# Required Fields

1. **id**: Unique identifier (use lowercase, numbers, hyphens only)
2. **pattern**: Main pattern to match the vulnerable code
3. **languages**: Array of target programming languages
4. **message**: Clear, actionable description of the issue and fix
5. **severity**: One of [ERROR, WARNING, INFO]

# Recommended Fields

- **pattern-not**: Pattern for the fixed version (to avoid false positives)
- **pattern-inside**: Context where the rule should apply
- **pattern-not-inside**: Context where the rule should not apply
- **metadata**: Additional context including CWE, OWASP, references

# Important Guidelines

1. Make patterns specific enough to minimize false positives
2. Include `pattern-not` for the fixed version when possible
3. Add relevant metadata like CWE numbers and OWASP categories
4. Write clear, actionable messages explaining both problem and solution
5. Consider different variations of the vulnerable pattern
6. Test your pattern mentally against both positive cases (should match) and negative cases (should not match)

# Formatting

Please show me the completed Semgrep rule in proper YAML format.

Your response should be a single YAML document like:

```yaml
rules:
  - id: rule-name
    # Multiple pattern operators must be combined under `patterns`.
    patterns:
      - pattern: |
          pattern content
      - pattern-not: |
          fixed version pattern
    languages: ["language"]
    message: |
      Detailed description of the vulnerability and how to fix it.
    severity: ERROR
    metadata:
      category: security
      cwe:
        - "CWE-XXX"
      technology:
        - language
```

Remember to adapt the patterns to match the specific vulnerability while keeping them general enough to catch variations of the same issue.