Introduction to Semgrep

For those of you not already familiar with Semgrep, here’s the TL;DR - Semgrep is basically grep with AST awareness and some super useful primitives for decomposing concrete implementation errors into abstractable, pipelineable antipatterns. Semgrep’s tutorial documentation does a much better job of presenting this than I will, so if you’re not already familiar, go check it out. It’s open-source! It’s fast, multi-language, and doesn’t require compilation.

Ok, cool, so what’s the big idea?

Semgrep & community ruleset

In addition to being a fast tool with no additional runtime requirements, Semgrep has an active community constantly generating rules for all kinds of stuff, including things like reverse shells. Reverse shells rarely show up in committed source, though: in most organizations working on closed-source projects, anyone with write access to the source already has some level of trust. You are much more likely to see a reverse shell dropped during post-exploitation of an application vulnerability such as remote file inclusion (RFI). We would get much more mileage out of these rules if we could run them against production infrastructure.

Production exploitation

Many internal web-facing services don’t do much (if any) file I/O directly. Filesystem writes are likely to be interesting events, whether expected (user uploads) or unexpected (RFI/etc).

Filesystem Monitoring

Python has a simple module for monitoring filesystem changes: watchdog. Watchdog allows you to define handlers for all CRUD file events, but we’ll really only need write/modify.
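
As a rough sketch of how watchdog could be wired up for this, the snippet below filters events down to file creations and modifications and forwards them to a callback. The names should_scan, watch, and WriteHandler are made up for illustration and are not taken from the project; the watchdog import is deferred so the filter can be exercised without the library installed.

```python
import time

# Event types worth scanning; "created" and "modified" are the
# event_type values watchdog reports for new and changed files.
SCAN_EVENT_TYPES = {"created", "modified"}


def should_scan(event):
    """Return True for file (not directory) create/modify events."""
    return (not event.is_directory) and event.event_type in SCAN_EVENT_TYPES


def watch(path, on_change):
    """Watch path recursively and call on_change(event) until interrupted."""
    # Imported lazily so should_scan() stays usable without watchdog installed.
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class WriteHandler(FileSystemEventHandler):
        def on_created(self, event):
            if should_scan(event):
                on_change(event)

        def on_modified(self, event):
            if should_scan(event):
                on_change(event)

    observer = Observer()
    observer.schedule(WriteHandler(), path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```

Calling watch("/var/www/uploads", print) would print every qualifying event under that (hypothetical) directory until interrupted with Ctrl-C.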

Connecting the dots

So, we have a fast, compilation-free tool with an existing ruleset for identifying reverse shells, a fairly high-signal event (file writes) for detecting exploit attempts, and an event handler for filesystem events.

The event handler implementation is straightforward:

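def on_change(event):
    path = event.src_path
    # seconds since the Unix epoch (UTC)
    now_epoch = (datetime.utcnow() - datetime(1970, 1, 1)).total_seconds()
    result = run_semgrep_reverse_shell_check(path)
    if result:
        # a shell was detected: lock the file down, then notify
        fs_remediate(path, now_epoch)
        call_webhook(event, now_epoch)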

Invoking Semgrep is currently easiest with a subprocess call. For the POC, we simply check for a non-empty result (a shell was detected by Semgrep):

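import json
import subprocess
from json import JSONDecodeError

def run_semgrep_reverse_shell_check(path):
    found_shell = False
    # invoke semgrep against the changed file with the reverse-shell ruleset
    try:
        run_result = subprocess.run(
            ["semgrep", path, "--config", "p/reverse-shells", "--json"],
            capture_output=True)
        if len(run_result.stdout) > 0:
            result = json.loads(run_result.stdout)
        else:
            # no output, semgrep couldn't read
            return found_shell
    except JSONDecodeError:
        # output was not valid JSON; treat it as "nothing found"
        return found_shell
    if len(result['results']) > 0:
        found_shell = True
    return found_shell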
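
For anything beyond a yes/no answer, the JSON that Semgrep prints can be unpacked a little further. The sketch below pulls the rule ID, path, and starting line out of each entry in results. The function name summarize_findings and the sample payload are illustrative assumptions: the sample is hand-written to resemble the shape of Semgrep's JSON output and is not captured from a real run.

```python
import json

# Illustrative, hand-written sample shaped like Semgrep's --json output.
# The rule ID, path, and message are made up for this sketch.
SAMPLE_OUTPUT = json.dumps({
    "results": [
        {
            "check_id": "example.reverse-shell-rule",
            "path": "/var/www/uploads/shell.sh",
            "start": {"line": 3, "col": 1},
            "end": {"line": 3, "col": 40},
            "extra": {"message": "Possible reverse shell", "severity": "ERROR"},
        }
    ],
    "errors": [],
})


def summarize_findings(raw_json):
    """Return (rule_id, path, line) tuples; empty list on bad or empty input."""
    try:
        parsed = json.loads(raw_json)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(parsed, dict):
        return []
    return [
        (r.get("check_id"), r.get("path"), r.get("start", {}).get("line"))
        for r in parsed.get("results", [])
    ]
```

Carrying these fields along would let a later alert say which rule fired and where, rather than only that something matched.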

If Semgrep returns a result, we then chmod the detected file to 222 (write-only for user, group, and other). The file stays on disk, so its permissions can be changed back later for forensics, but it can no longer be read or executed, which prevents the shell from being loaded or run:

def fs_remediate(path, scan_epoch_timestamp):
    print(f"remediating {path}")
    if os.name == "posix":
        # 0o222: write-only for user, group, and other
        os.chmod(path, 0o222)
    else:
        print("non-posix systems not currently supported")
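
One possible variation on this step, sketched below, records the file's previous permission bits before locking it down so that they can be restored during forensics. The helper name quarantine is hypothetical, and raising on non-POSIX systems is a choice made for this sketch rather than the project's behavior.

```python
import os
import stat


def quarantine(path):
    """Set path to mode 222 and return its previous permission bits."""
    if os.name != "posix":
        raise NotImplementedError("non-posix systems not currently supported")
    previous_mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, 0o222)  # write-only for user, group, and other
    return previous_mode
```

The returned mode could be logged next to the scan timestamp, so an analyst can later run os.chmod(path, previous_mode) on a copy of the file.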

Finally, we can emit an event via webhook to a SIEM or SOAR platform to notify the security team that an exploitation attempt has occurred:

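def call_webhook(event, scan_timestamp):
    # not implemented
    # assumes the requests library; webhook is the configured URL (or None)
    if webhook is not None:
        requests.post(webhook, json={"remediated_path": event.src_path, "scan_time": scan_timestamp})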
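
For environments where adding an HTTP dependency is not desirable, the same notification could be sent with only the standard library's urllib. This is a sketch under assumptions: the helper names build_webhook_request and send_webhook are made up, and the payload keys simply mirror the ones used above. Building the request separately from sending it keeps the payload easy to inspect without touching the network.

```python
import json
import urllib.request


def build_webhook_request(webhook_url, remediated_path, scan_time):
    """Build a JSON POST request describing a remediated file."""
    payload = {"remediated_path": remediated_path, "scan_time": scan_time}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_webhook(request, timeout=5):
    """Send the request; return the HTTP status, or None on failure."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except OSError:  # URLError and HTTPError are OSError subclasses
        return None
```

Swallowing network errors here means a slow or unreachable SIEM endpoint cannot crash the watcher, at the cost of silently dropping that notification.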

The rest of the implementation is pretty standard Python CLI boilerplate code.

If you are interested in off-label usage of Semgrep like this, check out the project source here.

Future Work

Here’s an unsorted list of things that will make this project better:

  • Proper sidecar implementation. There’s a barebones Dockerfile implemented but an example Helm chart/Docker Compose implementation would be beneficial
  • Additional Semgrep rules for reverse shells. Currently there are rules for Java/POSIX shells. Rules for every shell on would be great.
  • Stability & performance