Fun With Semgrep
Introduction to Semgrep
For those of you not already familiar with Semgrep, here’s the TL;DR: Semgrep is basically grep with AST awareness and some super useful primitives for decomposing concrete implementation errors into abstractable, pipelineable antipatterns. Semgrep’s tutorial documentation does a much better job of presenting this than I will, so if you’re not already familiar, go check it out. It’s open-source, fast, multi-language, and doesn’t require compilation.
Ok, cool, so what’s the big idea?
Semgrep & community ruleset
In addition to being a fast tool with no additional runtime requirements, Semgrep has an active community constantly generating rules for all kinds of stuff, including things like reverse shells. Reverse shells rarely show up in source, though: for most organizations working on closed-source projects, people with write access to source have some level of trust. You are much more likely to see a reverse shell dropped as post-exploitation of an application vulnerability like remote file inclusion. We would get much more mileage out of these rules if we could run them against production infrastructure.
Production exploitation
Many internal web-facing services don’t do much (if any) file I/O directly. Filesystem writes are likely to be interesting events, whether expected (user uploads) or unexpected (RFI/etc).
Filesystem Monitoring
Python has a simple package for monitoring filesystem changes: watchdog. Watchdog lets you define handlers for file creation, modification, deletion, and move events, but we’ll really only need write/modify.
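As a rough sketch of what the wiring might look like, the snippet below filters events down to file creations and modifications and hands each one to a callback. The names `is_scan_candidate` and `watch`, and the `/var/www/uploads` path, are hypothetical and not taken from the project. `watch` assumes the third-party watchdog package is installed and imports it lazily.

```python
import time

# Event types worth scanning; watchdog reports these strings in event.event_type.
SCAN_EVENT_TYPES = {"created", "modified"}


def is_scan_candidate(event_type, is_directory):
    """Return True for file (not directory) creations and modifications."""
    return (not is_directory) and event_type in SCAN_EVENT_TYPES


def watch(root, on_change):
    """Block until interrupted, calling on_change(event) for candidate events under root."""
    # Imported lazily so the filter above stays usable without watchdog installed.
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class Handler(FileSystemEventHandler):
        def on_any_event(self, event):
            if is_scan_candidate(event.event_type, event.is_directory):
                on_change(event)

    observer = Observer()
    observer.schedule(Handler(), root, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()


# Hypothetical usage: watch("/var/www/uploads", on_change)
```

Keeping the filter separate from the observer makes it easy to change which events trigger a scan without touching the watchdog plumbing.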
Connecting the dots
So, we have a fast, compilation-free tool with an existing ruleset for identifying reverse shells, a fairly high-signal event (file writes) for detecting exploit attempts, and an event handler for filesystem events.
The event handler implementation is straightforward:
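import time

def on_change(event):
    path = event.src_path
    # time.time() is seconds since the Unix epoch; subtracting datetime(1970, 1, 1)
    # from a naive local datetime.now() is off by the local UTC offset
    now_epoch = time.time()
    result = run_semgrep_reverse_shell_check(path)
    if result:
        # a shell was detected: lock the file down, then alert
        fs_remediate(path, now_epoch)
        call_webhook(event, now_epoch)
    return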
Invoking Semgrep is currently easiest with a subprocess.run call. For the POC, we simply check for a non-empty result (meaning Semgrep detected a shell):
import json
import subprocess

def run_semgrep_reverse_shell_check(path):
    # invoke semgrep with the reverse-shell ruleset and ask for JSON output
    run_result = subprocess.run(
        ["semgrep", path, "--config", "p/reverse-shells", "--json"],
        capture_output=True,
    )
    if len(run_result.stdout) == 0:
        # no output, semgrep couldn't read
        return False
    try:
        result = json.loads(run_result.stdout)
    except json.JSONDecodeError:
        # output was not valid JSON; treat as no detection
        return False
    # a non-empty "results" list means at least one rule matched
    return len(result["results"]) > 0
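The POC only needs a yes/no answer, but the same JSON carries enough detail to enrich an alert. Here is a minimal sketch, assuming the usual shape of Semgrep’s `--json` output: a top-level `results` list whose entries carry `check_id`, `path`, and `start.line`. The `summarize_findings` helper and the sample payload are made up for illustration.

```python
import json


def summarize_findings(raw_stdout):
    """Return a list of (rule id, path, line) tuples from Semgrep JSON output."""
    if not raw_stdout:
        return []
    try:
        report = json.loads(raw_stdout)
    except json.JSONDecodeError:
        return []
    findings = []
    for item in report.get("results", []):
        # .get() keeps the sketch tolerant of fields that may be absent
        line = item.get("start", {}).get("line")
        findings.append((item.get("check_id"), item.get("path"), line))
    return findings


# Hand-written sample, not real Semgrep output.
SAMPLE = json.dumps({
    "results": [
        {"check_id": "example.reverse-shell", "path": "uploads/x.sh", "start": {"line": 3}}
    ],
    "errors": [],
})

print(summarize_findings(SAMPLE))
```

Passing the rule id and line number along to the webhook would give responders a head start on triage.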
If Semgrep returns a result, we then chmod the detected file to 222 (write-only for owner, group, and others). This prevents the shell from being loaded or executed, while still allowing us to alter permissions in the future for forensics.
import os

def fs_remediate(path, scan_epoch_timestamp):
    print(f"remediating {path}")
    if os.name == "posix":
        # 0o222: write-only for everyone, so the file can't be read or executed
        os.chmod(path, 0o222)
    else:
        print("non-posix systems not currently supported")
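To see what mode 222 does in practice, the sketch below applies it to a throwaway temp file and reads the mode back. It is illustrative only: `lock_down` is a hypothetical stand-in for the remediation step, and it assumes a POSIX system.

```python
import os
import stat
import tempfile


def lock_down(path):
    """Set write-only permissions (0o222) for owner, group, and others."""
    os.chmod(path, 0o222)
    # Read the mode back so the caller can confirm what was applied
    return stat.S_IMODE(os.stat(path).st_mode)


# Throwaway file standing in for a detected shell
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
try:
    mode = lock_down(tmp_path)
    # Mask covering every read and execute bit
    blocked = (stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
               | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    no_read_or_exec = (mode & blocked) == 0
    print(oct(mode), no_read_or_exec)
finally:
    os.remove(tmp_path)
```

Mode bits don’t bind root, which can still read the file, and the file’s owner can chmod it back, so this works best when the scanner runs as a different user than the web application.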
Finally, we can emit an event via webhook to a SIEM or SOAR platform to notify the security team that an exploitation attempt has occurred:
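import requests

def call_webhook(event, scan_timestamp):
    # `webhook` is a module-level URL; when it is None, no notification is sent
    if webhook is not None:
        requests.post(
            webhook,
            json={"remediated_path": event.src_path, "scan_time": scan_timestamp},
            timeout=10,  # don't let a slow endpoint stall the event handler
        )
    return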
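Here is a minimal sketch of the notification step with the HTTP call injected, so the payload can be inspected without a network. `build_alert`, `notify`, `fake_post`, and the example URL are hypothetical; the payload keys mirror the ones used in the POC.

```python
def build_alert(path, scan_time):
    """Build the JSON-serializable alert body for a remediated file."""
    return {"remediated_path": path, "scan_time": scan_time}


def notify(webhook_url, path, scan_time, post):
    """Send the alert via post(url, json=payload); skip if no webhook is configured."""
    if webhook_url is None:
        return None
    payload = build_alert(path, scan_time)
    post(webhook_url, json=payload)
    return payload


# Stand-in for an HTTP client that records calls instead of sending them
sent = []


def fake_post(url, json=None):
    sent.append((url, json))


result = notify("https://siem.example.invalid/hook", "/tmp/shell.php", 1700000000.0, fake_post)
print(result)
```

In real use, `requests.post` could be passed as `post`, since it accepts the same `url` and `json=` arguments.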
The rest of the implementation is pretty standard Python CLI boilerplate code.
If you are interested in off-label usage of Semgrep like this, check out the project source here.
Future Work
Here’s an unsorted list of things that will make this project better:
- Proper sidecar implementation. There’s a barebones Dockerfile implemented, but an example Helm chart/Docker Compose implementation would be beneficial.
- Additional Semgrep rules for reverse shells. Currently there are rules for Java/POSIX shells. Rules for every shell on https://www.revshells.com/ would be great.
- Stability & performance