Name: Semgrep
Rating: 5 (2056 reviews)
Author: trailofbits

Semgrep

Semgrep is a fast static analysis tool for finding low-complexity bugs and locating specific code patterns. It is easy to use, does not require building the code, ships with many built-in rules, and makes custom rules simple to write, so it is usually the first tool to run on an audited codebase. It also integrates into CI/CD pipelines, which makes it a good choice for enforcing code quality.

Key benefits:

Prevents known bugs and security vulnerabilities from being reintroduced
Enables large-scale code refactoring, such as upgrading deprecated APIs
Easily added to CI/CD pipelines
Custom Semgrep rules look like the code they match, so they are easy to read and write
Allows for secure scanning without sharing code with third parties
Scanning usually takes minutes (not hours/days)
Easy to use and accessible for both developers and security professionals

When to Use

Use Semgrep when:

Looking for bugs with easy-to-identify patterns
Analyzing single files (intraprocedural analysis)
Detecting systemic bugs (multiple instances across codebase)
Enforcing secure defaults and code standards
Performing rapid initial security assessment
Scanning code without building it first

Consider alternatives when:

Multiple files are required for analysis → Consider Semgrep Pro Engine or CodeQL
Complex flow analysis is needed → Consider CodeQL
Advanced taint tracking across files → Consider CodeQL or Semgrep Pro
Custom in-house framework analysis → May need specialized tooling

Quick Reference

| Task | Command |
|------|---------|
| Scan with auto-detection | semgrep --config auto |
| Scan with specific ruleset | semgrep --config="p/trailofbits" |
| Scan with custom rules | semgrep -f /path/to/rules |
| Output to SARIF format | semgrep -c p/default --sarif --output scan.sarif |
| Test custom rules | semgrep --test |
| Disable metrics | semgrep --metrics=off --config p/default |
| Filter by severity | semgrep --config=auto --severity ERROR |
| Show dataflow traces | semgrep --dataflow-traces -f rule.yml |

Installation

Prerequisites

Python 3.7 or later (for pip installation)
macOS, Linux, or Windows
Homebrew (optional, for macOS/Linux)

Install Steps

Via Python Package Installer:

python3 -m pip install semgrep

Via Homebrew (macOS/Linux):

brew install semgrep

Via Docker:

docker pull returntocorp/semgrep

Keeping Semgrep Updated

# Check current version
semgrep --version

# Update via pip
python3 -m pip install --upgrade semgrep

# Update via Homebrew
brew upgrade semgrep

Verification

semgrep --version

Core Workflow

Step 1: Initial Scan

Start with an auto-configuration scan to evaluate Semgrep's effectiveness:

semgrep --config auto

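Important: Auto mode submits metrics online, and Semgrep may refuse to run --config auto when metrics are disabled. To scan with metrics off, select an explicit ruleset:

# Disable metrics for the whole shell session
export SEMGREP_SEND_METRICS=off
semgrep --config p/default
# OR disable metrics for a single run
semgrep --metrics=off --config p/default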

Step 2: Select Targeted Rulesets

Use the Semgrep Registry to select rulesets:

# Security-focused rulesets
semgrep --config="p/trailofbits"
semgrep --config="p/cwe-top-25"
semgrep --config="p/owasp-top-ten"

# Language-specific
semgrep --config="p/javascript"

# Multiple rulesets
semgrep --config="p/trailofbits" --config="p/r2c-security-audit"

Step 3: Review and Triage Results

Filter results by severity:

semgrep --config=auto --severity ERROR

Use output formats for easier analysis:

# SARIF for VS Code SARIF Explorer
semgrep -c p/default --sarif --output scan.sarif

# JSON for automation
semgrep -c p/default --json --output scan.json
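
For automation, the JSON report can be post-processed with a short script. The following is a minimal sketch, assuming the report has a top-level results list whose entries carry check_id, path, start.line, and extra.severity. The embedded sample and its file paths are hypothetical; a real run would load scan.json instead.

```python
import json
from collections import Counter

# Hypothetical sample shaped like a Semgrep JSON report. In practice, replace
# this with: report = json.load(open("scan.json"))
SAMPLE_REPORT = json.loads("""
{
  "results": [
    {"check_id": "requests-verify-false", "path": "app/client.py",
     "start": {"line": 30, "col": 5},
     "extra": {"severity": "WARNING", "message": "verify=False"}},
    {"check_id": "sql-injection", "path": "app/db.py",
     "start": {"line": 40, "col": 9},
     "extra": {"severity": "ERROR", "message": "tainted query"}},
    {"check_id": "requests-verify-false", "path": "app/client.py",
     "start": {"line": 12, "col": 5},
     "extra": {"severity": "WARNING", "message": "verify=False"}}
  ],
  "errors": []
}
""")


def summarize(report):
    """Return (findings per severity, 'path:line rule' strings sorted by location)."""
    results = report.get("results", [])
    by_severity = Counter(r["extra"]["severity"] for r in results)
    ordered = sorted(results, key=lambda r: (r["path"], r["start"]["line"]))
    locations = [f'{r["path"]}:{r["start"]["line"]} {r["check_id"]}' for r in ordered]
    return by_severity, locations


counts, locations = summarize(SAMPLE_REPORT)
for severity, count in sorted(counts.items()):
    print(f"{severity}: {count}")
for location in locations:
    print(location)
```

Sorting by path and line keeps findings from the same file together, which tends to make manual triage quicker.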

Step 4: Configure Ignored Files

Create a .semgrepignore file to exclude paths:

# Ignore specific files/directories
path/to/ignore/file.ext
path_to_ignore/

# Ignore by extension
*.ext

# Include .gitignore patterns
:include .gitignore

Note: When no .semgrepignore file is present, Semgrep applies a default ignore list that skips folders such as tests/, test/, and vendor/.

How to Customize

Writing Custom Rules

Semgrep rules are YAML files with pattern-matching syntax. Basic structure:

rules:
  - id: rule-id
    languages: [go]
    message: Some message
    severity: ERROR # INFO / WARNING / ERROR
    pattern: test(...)

Running Custom Rules

# Single file
semgrep --config custom_rule.yaml

# Directory of rules
semgrep --config path/to/rules/

Key Syntax Reference

| Syntax/Operator | Description | Example |
|-----------------|-------------|---------|
| ... | Match zero or more arguments/statements | func(..., arg=value, ...) |
| $X, $VAR | Metavariable (captures and tracks values) | $FUNC($INPUT) |
| <... ...> | Deep expression operator (nested matching) | if <... user.is_admin() ...>: |
| pattern-inside | Match only within context | Pattern inside a loop |
| pattern-not | Exclude specific patterns | Negative matching |
| pattern-either | Logical OR (any pattern matches) | Multiple alternatives |
| patterns | Logical AND (all patterns match) | Combined conditions |
| metavariable-pattern | Nested metavariable constraints | Constrain captured values |
| metavariable-comparison | Compare metavariable values | $X > 1337 |

Example: Detecting Insecure Request Verification

rules:
  - id: requests-verify-false
    languages: [python]
    message: requests.get with verify=False disables SSL verification
    severity: WARNING
    pattern: requests.get(..., verify=False, ...)

Example: Taint Mode for SQL Injection

rules:
  - id: sql-injection
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
    pattern-sanitizers:
      - pattern: int(...)
    message: Potential SQL injection with unsanitized user input
    languages: [python]
    severity: ERROR
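
A test target for the taint rule above might look like the following sketch. The request object is a hypothetical stand-in for a web framework's request, and the sqlite3 table is purely illustrative. The ruleid and ok annotations mark what the rule is expected to report; they are not verified scanner output.

```python
import sqlite3
from types import SimpleNamespace

# Hypothetical stand-in for a framework request object; Semgrep matches the
# expression request.args.get(...) syntactically, so a plain dict is enough here.
request = SimpleNamespace(args={"name": "alice", "id": "1"})

# Illustrative in-memory database so the file runs on its own.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cursor.execute("INSERT INTO users VALUES (1, 'alice')")


def find_by_name():
    # Source: user-controlled value flows into the query string unchanged.
    name = request.args.get("name")
    query = "SELECT id FROM users WHERE name = '" + name + "'"
    # ruleid: sql-injection
    cursor.execute(query)
    return cursor.fetchall()


def find_by_id():
    # Sanitizer: int(...) forces the value to be numeric before it is used.
    user_id = int(request.args.get("id"))
    query = "SELECT name FROM users WHERE id = " + str(user_id)
    # ok: sql-injection
    cursor.execute(query)
    return cursor.fetchall()


print(find_by_name())
print(find_by_id())
```

The first function concatenates the source into the sink argument, while the second passes it through the int(...) sanitizer first, so only the first flow should be flagged.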

Testing Custom Rules

Create test files with annotations:

# ruleid: requests-verify-false
requests.get(url, verify=False)

# ok: requests-verify-false
requests.get(url, verify=True)

Run tests:

semgrep --test ./path/to/rules/

For autofix testing, create .fixed files (e.g., test.py → test.fixed.py):

semgrep --test
# Output: 1/1: ✓ All tests passed
#         1/1: ✓ All fix tests passed

Configuration

Configuration File

Semgrep doesn't require a central config file. Configuration is done via:

Command-line flags
Environment variables
.semgrepignore for path exclusions

Ignore Patterns

Create .semgrepignore in the repository root:

# Ignore directories
tests/
vendor/
node_modules/

# Ignore file types
*.min.js
*.generated.go

# Include .gitignore patterns
:include .gitignore

Suppressing False Positives

Add inline comments to suppress specific findings:

# nosemgrep: rule-id
risky_function()

Best practices:

Specify the exact rule ID (not generic # nosemgrep)
Explain why the rule is disabled
Report false positives to improve rules
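
The first best practice can be checked mechanically. The following is a rough sketch, not a Semgrep feature: a hypothetical helper that reports lines whose nosemgrep comment names no rule ID. The regular expression and the sample source are assumptions made for illustration.

```python
import re

# Matches a nosemgrep marker and, when present, captures the text after the colon.
NOSEMGREP = re.compile(r"nosemgrep(?::\s*(?P<ids>\S+))?")


def find_generic_suppressions(source):
    """Return 1-based line numbers whose nosemgrep comment names no rule ID."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match and not match.group("ids"):
            flagged.append(number)
    return flagged


# Hypothetical source file contents used only to exercise the helper.
SAMPLE = """\
# nosemgrep
risky_function()
# Reviewed: the argument is a compile-time constant.
# nosemgrep: rule-id
other_call()
"""

print(find_generic_suppressions(SAMPLE))
```

A check like this could run alongside code review so that every suppression stays tied to a specific rule.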

Metadata in Custom Rules

Include metadata for better context:

rules:
  - id: example-rule
    metadata:
      cwe: "CWE-89"
      confidence: HIGH
      likelihood: MEDIUM
      impact: HIGH
      subcategory: vuln
    # ... rest of rule
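
Metadata coverage can also be checked before rules are merged. The sketch below assumes the rule file has already been parsed into Python dictionaries (a YAML parser such as PyYAML would be needed for that step and is not shown). The set of recommended keys is an assumption based on the fields this guide suggests, and the second rule ID is hypothetical.

```python
# Keys this guide recommends for triage; adjust to local policy.
RECOMMENDED_METADATA = ("cwe", "confidence", "impact")


def missing_metadata(rule):
    """Return the recommended metadata keys that a parsed rule does not define."""
    metadata = rule.get("metadata") or {}
    return [key for key in RECOMMENDED_METADATA if key not in metadata]


# Parsed rules as they might look after loading a rule file.
rules = [
    {
        "id": "example-rule",
        "metadata": {
            "cwe": "CWE-89",
            "confidence": "HIGH",
            "likelihood": "MEDIUM",
            "impact": "HIGH",
            "subcategory": "vuln",
        },
    },
    {"id": "hypothetical-bare-rule"},  # no metadata block at all
]

for rule in rules:
    gaps = missing_metadata(rule)
    status = "ok" if not gaps else "missing " + ", ".join(gaps)
    print(f'{rule["id"]}: {status}')
```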

Advanced Usage

Tips and Tricks

| Tip | Why It Helps |
|-----|--------------|
| Use --time flag | Identifies slow rules and files for optimization |
| Limit ellipsis usage | Reduces false positives and improves performance |
| Use pattern-inside for context | Creates clearer, more focused findings |
| Enable autocomplete | Speeds up command-line workflow |
| Use focus-metavariable | Highlights specific code locations in output |

Scanning Non-Standard Extensions

Force language interpretation for unusual file extensions:

semgrep --config=/path/to/config --lang python --scan-unknown-extensions /path/to/file.xyz

Dataflow Tracing

Use --dataflow-traces to understand how values flow to findings:

semgrep --dataflow-traces -f taint_rule.yml test.py

Example output:

Taint comes from:
  test.py
    2┆ data = get_user_input()

This is how taint reaches the sink:
  test.py
    3┆ return output(data)

Polyglot File Scanning

Scan embedded languages (e.g., JavaScript in HTML):

rules:
  - id: eval-in-html
    languages: [html]
    message: eval in JavaScript
    patterns:
      - pattern: <script ...>$Y</script>
      - metavariable-pattern:
          metavariable: $Y
          language: javascript
          patterns:
            - pattern: eval(...)
    severity: WARNING

Constant Propagation

Match instances where metavariables hold specific values:

rules:
  - id: high-value-check
    languages: [python]
    message: $X is higher than 1337
    patterns:
      - pattern: function($X)
      - metavariable-comparison:
          metavariable: $X
          comparison: $X > 1337
    severity: WARNING

Autofix Feature

Add automatic fixes to rules:

rules:
  - id: ioutil-readdir-deprecated
    languages: [golang]
    message: ioutil.ReadDir is deprecated. Use os.ReadDir instead.
    severity: WARNING
    pattern: ioutil.ReadDir($X)
    fix: os.ReadDir($X)

Preview fixes without applying:

semgrep -f rule.yaml --dryrun --autofix

Apply fixes:

semgrep -f rule.yaml --autofix

Performance Optimization

Analyze performance:

semgrep --config=auto --time

Optimize rules:

Use paths to narrow file scope
Minimize ellipsis usage
Use pattern-inside to establish context first
Remove unnecessary metavariables

Managing Third-Party Rules

Use semgrep-rules-manager to collect third-party rules:

pip install semgrep-rules-manager
mkdir -p $HOME/custom-semgrep-rules
semgrep-rules-manager --dir $HOME/custom-semgrep-rules download
semgrep -f $HOME/custom-semgrep-rules

CI/CD Integration

GitHub Actions

Recommended Approach

Full scan on main branch with broad rulesets (scheduled)
Diff-aware scanning for pull requests with focused rules
Block PRs with unresolved findings (once mature)
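
One way to phase in blocking is a small gate over a JSON report that fails only at or above a chosen severity. This is a sketch under assumptions: the INFO < WARNING < ERROR ordering follows the severities used in this guide, the function names and sample report are hypothetical, and semgrep ci has its own blocking behavior that this does not replace.

```python
# Severity ordering assumed from the INFO / WARNING / ERROR levels in this guide.
SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "ERROR": 2}


def blocking_findings(report, threshold="ERROR"):
    """Return findings whose severity is at or above the threshold."""
    minimum = SEVERITY_RANK[threshold]
    blocking = []
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity")
        # Unknown severity labels are treated as blocking, the conservative choice.
        if SEVERITY_RANK.get(severity, minimum) >= minimum:
            blocking.append(result)
    return blocking


def exit_code(report, threshold="ERROR"):
    """Return 1 when any blocking finding exists, otherwise 0."""
    return 1 if blocking_findings(report, threshold) else 0


# Hypothetical report with one WARNING and one ERROR finding.
SAMPLE_REPORT = {
    "results": [
        {"check_id": "hypothetical.warning-rule", "path": "a.py",
         "extra": {"severity": "WARNING"}},
        {"check_id": "hypothetical.error-rule", "path": "b.py",
         "extra": {"severity": "ERROR"}},
    ]
}

print(exit_code(SAMPLE_REPORT))
```

Starting with an ERROR threshold and lowering it later lets a team tighten the policy as the rule set matures.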

Example Workflow

name: Semgrep
on:
  pull_request: {}
  push:
    branches: ["master", "main"]
  schedule:
    - cron: '0 0 1 * *' # Monthly

jobs:
  semgrep-schedule:
    if: ((github.event_name == 'schedule' || github.event_name == 'push' || github.event.pull_request.merged == true)
        && github.actor != 'dependabot[bot]')
    name: Semgrep default scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - name: Checkout main repository
        uses: actions/checkout@v4
      - run: semgrep ci
        env:
          SEMGREP_RULES: p/default

  semgrep-pr:
    if: (github.event_name == 'pull_request' && github.actor != 'dependabot[bot]')
    name: Semgrep PR scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
      - run: semgrep ci
        env:
          SEMGREP_RULES: >
            p/cwe-top-25
            p/owasp-top-ten
            p/r2c-security-audit
            p/trailofbits

Adding Custom Rules in CI

Rules in same repository:

env:
  SEMGREP_RULES: p/default custom-semgrep-rules-dir/

Rules in private repository:

env:
  SEMGREP_PRIVATE_RULES_REPO: semgrep-private-rules
steps:
  - name: Checkout main repository
    uses: actions/checkout@v4
  - name: Checkout private custom Semgrep rules
    uses: actions/checkout@v4
    with:
      repository: ${{ github.repository_owner }}/${{ env.SEMGREP_PRIVATE_RULES_REPO }}
      token: ${{ secrets.SEMGREP_RULES_TOKEN }}
      path: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}
  - run: semgrep ci
    env:
      SEMGREP_RULES: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}

Testing Rules in CI

name: Test Semgrep rules

on: [push, pull_request]

jobs:
  semgrep-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
          cache: "pip"
      - run: python -m pip install -r requirements.txt
      - run: semgrep --test --test-ignore-todo ./path/to/rules/

Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
|---------|----------------|------------------|
| Using --config auto on private code | Sends metadata to Semgrep servers | Use specific rulesets with --metrics=off |
| Forgetting .semgrepignore | Scans directories that should be excluded, such as /vendor | Create .semgrepignore file |
| Not testing rules against false positives | Rules generate noise | Add # ok: test cases |
| Using generic # nosemgrep | Makes code review harder | Use # nosemgrep: rule-id with explanation |
| Overusing ellipsis ... | Degrades performance and accuracy | Use specific patterns when possible |
| Not including metadata in rules | Makes triage difficult | Add CWE, confidence, impact fields |

Limitations

Single-file analysis: Cannot track data flow across files without Semgrep Pro Engine
No build required: Cannot analyze compiled code or resolve dynamic dependencies
Pattern-based: May miss vulnerabilities requiring deep semantic understanding
Limited taint tracking: Complex taint analysis is still evolving
Custom frameworks: In-house proprietary frameworks may not be well-supported

Related Skills

| Skill | When to Use Together |
|-------|---------------------|
| codeql | For cross-file taint tracking and complex data flow analysis |
| sarif-parsing | For processing Semgrep SARIF output in pipelines |

Resources

Key External Resources

Trail of Bits public Semgrep rules: Community-contributed Semgrep rules for security audits, with contribution guidelines and quality standards.

Semgrep Registry: Official registry of Semgrep rules, searchable by language, framework, and security category.

Semgrep Playground: Interactive online tool for writing and testing Semgrep rules. Use "simple mode" for easy pattern combination.

Learn Semgrep Syntax: Comprehensive guide on Semgrep rule-writing fundamentals.

Trail of Bits Blog, "How to introduce Semgrep to your organization": Seven-step plan for organizational adoption of Semgrep, including pilot testing, evangelization, and CI/CD integration.

Trail of Bits Blog, "Discovering goroutine leaks with Semgrep": Real-world example of writing custom rules to detect Go-specific issues.
