Getting Started with Azure Notebooks and Microsoft Sentinel
Notebook Version: 1.0
Python Version: Python 3.6 (including Python 3.6 - AzureML)
Required Packages:
MSTICPy
Data Sources Required:
Log Analytics - SigninLogs (Optional)
VirusTotal
MaxMind
This notebook takes you through the basics needed to get started with Azure Notebooks and Microsoft Sentinel, and how to perform the basic actions of data acquisition, data enrichment, data analysis, and data visualization. These actions are the building blocks of threat hunting with notebooks and are useful to understand before running more complex notebooks. This notebook only lightly covers each topic but includes 'learn more' sections that point you to resources for a deeper dive into each of these topics.
This notebook assumes that you are running it in an Azure Notebooks environment; however, it will also work in other Jupyter environments.
Note: This notebook uses SigninLogs from your Microsoft Sentinel Workspace. If you are not yet collecting SigninLogs, configure this connector in the Microsoft Sentinel portal before running this notebook. This notebook also uses the VirusTotal API for data enrichment. For this you will require an API key, which can be obtained by signing up for a free VirusTotal community account.
How to use a Jupyter notebook?
To use a Jupyter notebook you need a Jupyter server that will render the notebook and execute the code within it. This can take the form of a local Jupyter installation, or a remotely hosted version such as Azure Notebooks. If you are reading this it is highly likely that you already have a Jupyter server that this notebook is using. You can learn more about installing and running your own Jupyter server here.
Using Azure Notebooks
If you accessed this notebook from Microsoft Sentinel, you are probably using Azure Notebooks to run this notebook. Azure Notebooks runs in the same way as a local Jupyter server, except with the additional features of integrated project management and file storage. When you open a notebook in Azure Notebooks the user interface is nearly identical to a standard Jupyter notebook experience.
Before you can start running code in a notebook you need to make sure that it is connected to a Jupyter server and that you have the correct type of kernel configured. For this notebook we are going to be using Python 3.6. Azure Notebooks has hopefully already loaded this kernel for you; you can check this by looking at the top left corner of the screen, where you should see the currently connected kernel.

If this does not read Python 3.6 you can select the correct kernel by selecting Kernel > Change kernel from the top menu and clicking Python 3.6.
Note: the notebook works with Python 3.6 or later. If you are using this notebook in Azure ML or another Jupyter environment you can choose any kernel that supports Python 3.6 or later.

Once you have done this you should be ready to move onto a code cell.
Tip: You can identify which cells are code by selecting them and looking at the drop down box at the center of the top menu. It will either read 'Code' (for interactive code cells), 'Markdown' (for Markdown text cells like this one), or RawNBConvert (these are just raw data and not interpreted by Jupyter - they can be used by tools that process notebook files, such as nbconvert to render the data into HTML or LaTeX).
If you click on the cell below you should see this box change to 'Code'.
Learn More:
More details on Azure Notebooks can be found in the Azure Notebooks documentation and the Microsoft Sentinel documentation.
Running code
Once you have selected a code cell you can run it by clicking the run button in the menu bar at the top, or by pressing Ctrl+Enter.
Variables set within a code cell persist between cells, meaning you can chain cells together.
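As a minimal illustration (the variable names are made up for this example), imagine the two commented sections below are separate code cells run one after the other. The second cell can use a value that was set in the first:

```python
# --- Cell 1: define a variable ---
failed_logons = 12

# --- Cell 2: reuse the variable set in the earlier cell ---
total_logons = 150
failure_rate = failed_logons / total_logons
print(f"Failure rate: {failure_rate:.1%}")
```

If you restart the kernel these variables are lost, so the earlier cell has to be run again before the later one will work.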
Learn More:
The Infosec Jupyter Book provides an infosec specific intro to Python.
Real Python is a comprehensive set of Python learnings and tutorials.
Now that you understand the basics we can move onto more complex code.
Setting up the environment
Code cells behave in the same way your code would in other environments, so common coding practices such as variable initialization and library imports still apply. Before we execute more complex code we need to make sure the required packages are installed and the libraries imported. At the top of many of the Microsoft Sentinel notebooks you will see large cells that check kernel versions and then install and import all the libraries used in the notebook. Make sure you run these cells before running other cells in the notebook. If you are running notebooks locally or via dedicated compute in Azure Notebooks, library installs will persist, but this is not the case with the Azure Notebooks free tier, so there you will need to install each time you run. Even in a static environment, imports are required for each run, so make sure you run this cell regardless.
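The kind of check such a setup cell performs can be sketched with the standard library alone. This is not the setup cell used by the Microsoft Sentinel notebooks; the function name `check_environment` and the package list are assumptions made for illustration:

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated at the top of this notebook
REQUIRED_PACKAGES = ["msticpy"]  # top-level import names to look for


def check_environment(min_python=MIN_PYTHON, packages=REQUIRED_PACKAGES):
    """Return (python_ok, missing_packages) for the current kernel."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when a top-level package cannot be imported.
    missing = [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} OK: {python_ok}")
print(f"Missing packages: {missing or 'none'}")
```

If a package is reported as missing, it can usually be installed from a notebook cell with `%pip install msticpy` before re-running the check.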
Configuration
Once we have set up our Jupyter environment with the libraries that we'll use in the notebook, we need to make sure we have some configuration in place. Some of the notebook components need additional configuration to connect to external services (e.g. API keys to retrieve Threat Intelligence data). This includes configuration for the connection to our Microsoft Sentinel workspace, as well as for some threat intelligence providers we will use later. The easiest way to handle the configuration for these services is to store it in a msticpyconfig file (msticpyconfig.yaml). More details on msticpyconfig can be found here: https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html
Learn more:
In this notebook we will set up the basic config we need to get started. If you need a more complete walk-through we have a separate notebook to help you: https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb
The Azure-Sentinel-Notebooks GitHub repo contains a template msticpyconfig file ready to be populated. If you have run this notebook before you may already have a populated msticpyconfig file; the cell below checks whether this file exists. If your config file does not contain details under Microsoft Sentinel > Workspaces or TIProviders, the following cells will populate these for you.
If you want to see an example of what a populated msticpyconfig file should look like, a sample is included in the repo as msticpyconfig-sample.yaml.
If you do not have a msticpyconfig file we can populate one for you. Before you do this you will need a few things.
The first is the Workspace ID and Tenant ID of the Microsoft Sentinel Workspace you wish to connect to.
You can get the workspace ID by opening Microsoft Sentinel in the Azure Portal and selecting Settings > Workspace Settings. Your Workspace ID is displayed near the top of this page.
You can get your tenant ID (also referred to as the organization or directory ID) via Azure Active Directory.
We are going to use VirusTotal to enrich our Microsoft Sentinel data. For this you will need a VirusTotal API key, which can be obtained for free (as a personal key) via the VirusTotal website. We are using VirusTotal for this notebook but we also support a range of other threat intelligence providers: https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html
In addition we are going to plot IP address locations on a map. To do this we are going to use MaxMind to geolocate IP addresses, which requires an API key. You can sign up for a free account and API key at https://www.maxmind.com/en/geolite2/signup.
Once you have these required items, run the cell below and you will be prompted to enter these elements:
The cell below will now populate a msticpyconfig file with these values:
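To give a feel for what ends up in the file, here is a rough sketch of the settings as a Python dictionary, with a simple check for the two sections mentioned earlier. The key names (`AzureSentinel`, `Workspaces`, `Args`, `AuthKey` and so on) are written from memory of the msticpyconfig documentation and should be treated as assumptions; msticpyconfig-sample.yaml in the repo is the authoritative reference. The helper `missing_sections` is hypothetical, and the sketch only prints the structure rather than writing a file:

```python
import json

# Placeholder values only: replace with your own IDs and keys.
settings = {
    "AzureSentinel": {
        "Workspaces": {
            "Default": {
                "WorkspaceId": "<your-workspace-id>",
                "TenantId": "<your-tenant-id>",
            }
        }
    },
    "TIProviders": {
        "VirusTotal": {
            "Args": {"AuthKey": "<your-virustotal-api-key>"},
            "Primary": True,
            "Provider": "VirusTotal",
        }
    },
    "OtherProviders": {
        "GeoIPLite": {
            "Args": {"AuthKey": "<your-maxmind-license-key>"},
            "Provider": "GeoLiteLookup",
        }
    },
}


def missing_sections(config):
    """List the sections this notebook relies on that are absent or empty."""
    problems = []
    if not config.get("AzureSentinel", {}).get("Workspaces"):
        problems.append("AzureSentinel > Workspaces")
    if not config.get("TIProviders"):
        problems.append("TIProviders")
    return problems


print(json.dumps(settings, indent=2))
print("Missing sections:", missing_sections(settings) or "none")
```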
We can now validate our configuration is correct.
Note: you may see warnings for missing providers when running this cell. This is not an issue, as we will not be using all providers in this notebook. So long as you get the message "No errors found." you are OK to proceed.
Getting Data
Now that we have configured the details necessary to connect to Microsoft Sentinel we can go ahead and get some data. We will do this with QueryProvider() from MSTICpy. You can use the QueryProvider class to connect to different data sources such as MDATP, the Security Graph API, and the one we will use here, Microsoft Sentinel.
Learn more:
More details on configuring and using QueryProviders can be found in the MSTICpy Documentation.
For now, we are going to set up a QueryProvider for Microsoft Sentinel, pass it the details for our workspace that we just stored in the msticpyconfig file, and connect. The connection process will ask us to authenticate to our Microsoft Sentinel workspace via device authorization with our Azure credentials. You can do this by clicking the device login code button that appears as the output of the next cell, or by navigating to https://microsoft.com/devicelogin and manually entering the code. Note that this authentication persists with the kernel you are using with the notebook, so if you restart the kernel you will need to re-authenticate.
Now that we have connected we can query Microsoft Sentinel for data, but before we do that we need to understand what data is available to query. The QueryProvider object provides a way to get a list of tables as well as the columns in each table:
MSTICpy includes a number of built in queries that you can run.
You can list available queries with .list_queries() and get specific details about a query by calling it with "?" as a parameter.
You can then run the query by calling it with the required parameters:
Another way to run queries is to pass a KQL query as a string to the query provider. This runs the query against the workspace connected to above and returns the data in a Pandas DataFrame. We will look at working with Pandas in a bit more detail later.
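As a small sketch of this approach, the helper below (a hypothetical function, not part of MSTICpy) builds a KQL string for the SigninLogs table. The final commented line shows how the string would be handed to a provider, assuming a connected QueryProvider object named `qry_prov`:

```python
def build_signin_query(lookback_hours=24, max_rows=100):
    """Build a KQL query string for recent sign-in events."""
    return (
        "SigninLogs\n"
        f"| where TimeGenerated > ago({lookback_hours}h)\n"
        "| project TimeGenerated, UserPrincipalName, IPAddress, ResultType\n"
        f"| take {max_rows}"
    )


query = build_signin_query(lookback_hours=12, max_rows=50)
print(query)

# With a connected QueryProvider (assumed here to be named qry_prov),
# the string would be passed to the provider, for example:
# signin_df = qry_prov.exec_query(query)
```

Keeping the time window and row limit as parameters makes it easy to start with a small, fast query and widen it once the results look right.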
Learn more:
You can learn more about the MSTICpy pre-defined queries in the MSTICpy Documentation
Pandas
Our query results are returned in the form of a Pandas DataFrame. DataFrames are a core component of the Microsoft Sentinel notebooks and of MSTICpy, and are used for both input and output formats. Pandas DataFrames are incredibly versatile data structures with a lot of useful features. We will cover a small number of them here, and we recommend that you check out the Learn more section for more on what Pandas can do.
Displaying a DataFrame:
The first thing we want to do is display our DataFrame. You can either just run it or explicitly display it by calling display(df).
Note: if the DataFrame variable (df in the example above) is the last statement in a code cell, Jupyter will automatically display it without using the display() function. However, if you want to display a DataFrame in the middle of other code in a cell, you must use the display() function.
You may not want to display the whole DataFrame and instead display only a selection of items. There are numerous ways to do this and the cell below shows some of the most widely used functions.
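As a minimal sketch on made-up data (the records below are invented and only borrow column names from SigninLogs), these are some of the most common ways to look at part of a DataFrame:

```python
import pandas as pd

# Made-up sign-in records standing in for real query results.
df = pd.DataFrame(
    {
        "TimeGenerated": pd.to_datetime(
            ["2021-01-01 08:00", "2021-01-01 09:30", "2021-01-02 14:15", "2021-01-03 23:45"]
        ),
        "UserPrincipalName": [
            "alice@contoso.com", "bob@contoso.com", "alice@contoso.com", "carol@contoso.com",
        ],
        "IPAddress": ["203.0.113.10", "198.51.100.7", "203.0.113.10", "192.0.2.44"],
    }
)

first_two = df.head(2)                        # first 2 rows
last_row = df.tail(1)                         # last row
users = df["UserPrincipalName"]               # a single column (a Series)
subset = df[["TimeGenerated", "IPAddress"]]   # several columns (a DataFrame)
third_row = df.iloc[2]                        # one row, selected by position

print(first_two)
print(df.shape)  # (number of rows, number of columns)
```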
We can also choose to select a subsection of our DataFrame based on the contents of the DataFrame:
Tip: the syntax in these examples uses a technique called boolean indexing.
df[<boolean expression>] returns all rows in the DataFrame where the boolean expression is True.
In the first example we are telling Pandas to return all rows where the value in the 'TargetUserName' column matches 'MSTICAdmin'.
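Boolean indexing can be sketched on a small made-up DataFrame as follows. The rows are invented for illustration; only the 'TargetUserName' column and the 'MSTICAdmin' value come from the text above:

```python
import pandas as pd

# Made-up logon events for illustration.
df = pd.DataFrame(
    {
        "TargetUserName": ["MSTICAdmin", "adm1nistrator", "MSTICAdmin", "ian"],
        "LogonType": [3, 10, 10, 2],
        "IpAddress": ["203.0.113.10", "198.51.100.7", "192.0.2.44", "203.0.113.10"],
    }
)

# The comparison produces a True/False Series, one value per row.
is_admin = df["TargetUserName"] == "MSTICAdmin"

# Indexing with that Series keeps only the rows marked True.
admin_logons = df[is_admin]

# Conditions can be combined with & (and) or | (or); wrap each in parentheses.
admin_remote = df[(df["TargetUserName"] == "MSTICAdmin") & (df["LogonType"] == 10)]

print(admin_logons)
print(admin_remote)
```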
Our DataFrame can also be extended to add new columns with additional data if required:
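Two common ways of adding a column are sketched below on made-up data: computing it from an existing column, and mapping values through a lookup table. The column names `IsRemoteInteractive` and `LogonTypeName` are chosen for this example only:

```python
import pandas as pd

# Made-up logon events for illustration.
df = pd.DataFrame(
    {
        "TargetUserName": ["MSTICAdmin", "ian", "MSTICAdmin"],
        "LogonType": [3, 2, 10],
    }
)

# New column computed from an existing one (vectorised comparison).
df["IsRemoteInteractive"] = df["LogonType"] == 10

# New column built by mapping values through a lookup table.
logon_type_names = {2: "Interactive", 3: "Network", 10: "RemoteInteractive"}
df["LogonTypeName"] = df["LogonType"].map(logon_type_names)

print(df)
```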
Learn more:
There is a lot more you can do with Pandas, the links below provide some useful resources:
Enriching data
Now that we have seen how to query for data, and do some basic manipulation we can look at enriching this data with additional data sources. For this we are going to use an external threat intelligence provider to give us some more details about an IP address we have in our dataset using the MSTICpy TIProvider feature.
Using the Pandas apply() feature we can get results for all the IP addresses in our data set and add the lookup severity score as a new column in our DataFrame for easier reference.
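The apply() pattern itself can be sketched without calling any external service. In the example below, `lookup_severity` and `FAKE_TI_RESULTS` are hypothetical stand-ins for a real threat intelligence lookup, and the severity labels are invented:

```python
import pandas as pd

# Hypothetical stand-in for a threat intelligence lookup. A real notebook
# would query a provider such as VirusTotal instead of this local dictionary.
FAKE_TI_RESULTS = {"203.0.113.10": "high", "198.51.100.7": "information"}


def lookup_severity(ip_address):
    """Return a severity label for an IP, or 'unknown' if there is no result."""
    return FAKE_TI_RESULTS.get(ip_address, "unknown")


df = pd.DataFrame({"IPAddress": ["203.0.113.10", "198.51.100.7", "192.0.2.44"]})

# apply() calls lookup_severity once per value in the column.
df["TISeverity"] = df["IPAddress"].apply(lookup_severity)
print(df)
```

When the lookup is a rate-limited API call, it is usually worth looking up only the unique IP addresses and joining the results back, rather than calling the service once per row.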
Learn more:
MSTICpy includes further threat intelligence capabilities as well as other data enrichment options. More details on these can be found in the documentation.
Analyzing data
With the data we have collected we may wish to perform some analysis on it in order to better understand it. MSTICpy includes a number of features to help with this, and there are a vast array of other data analysis capabilities available via Python ranging from simple processes to complex ML models. We will start here by keeping it simple and look at how we can decode some Base64 encoded command line strings we have in order to allow us to understand their content.
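MSTICpy has its own Base64 decoding feature; the sketch below uses only the Python standard library to show the underlying idea. It assumes the encoded text is UTF-16LE (the encoding PowerShell expects for -EncodedCommand), and the function name and the 16-character minimum are choices made for this example:

```python
import base64
import binascii
import re

# Runs of 16 or more Base64 characters, optionally padded with '='.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_b64_in_command(command_line, encoding="utf-16-le"):
    """Return decoded text for each Base64-looking token in a command line."""
    decoded = []
    for token in B64_PATTERN.findall(command_line):
        try:
            raw = base64.b64decode(token, validate=True)
            decoded.append(raw.decode(encoding))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not text in this encoding
    return decoded


# Build a sample command the way PowerShell's -EncodedCommand expects it
# (Base64 of UTF-16LE text), then decode it again.
payload = base64.b64encode("Write-Host 'hello'".encode("utf-16-le")).decode("ascii")
command = f"powershell.exe -EncodedCommand {payload}"
print(decode_b64_in_command(command))
```

Real command lines may use other encodings or nest several layers of encoding, so a single pass like this is only a starting point.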
We can also use MSTICpy to extract Indicators of Compromise (IoCs) from a dataset, which makes it easy to extract and match on a set of IoCs within our data. In the example below we take a US Cybersecurity & Infrastructure Security Agency (CISA) report and extract all domains listed in the report:
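To show the idea behind IoC extraction without depending on MSTICpy, here is a deliberately simplified regular-expression sketch. It is not the extractor MSTICpy uses; the pattern, the function `extract_domains`, and the report text are all invented for this example:

```python
import re

# Simplified domain pattern: dot-separated labels ending in an alphabetic TLD.
DOMAIN_PATTERN = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,24}\b",
    re.IGNORECASE,
)


def extract_domains(text):
    """Return the unique domains found in text, lower-cased and sorted."""
    # Reports often "defang" indicators as example[.]com; undo that first.
    refanged = text.replace("[.]", ".")
    return sorted({match.lower() for match in DOMAIN_PATTERN.findall(refanged)})


report = """
The actor used malicious-update[.]example and cdn.bad-site.example.org
for command and control. Payloads were also staged on Malicious-Update.example.
"""
print(extract_domains(report))
```

A pattern this loose will also match things like file names (for example `update.exe`), so results normally need to be checked against a list of valid top-level domains before use.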
Learn more:
There are a wide range of options when it comes to data analysis in notebooks using Python. Here are some useful resources to get you started:
Scikit-Learn is a popular Python ML and data analysis library, which has a useful tutorial.
Visualizing data
Visualizing data can provide an excellent way to analyse data and identify patterns and anomalies. Python has a wide range of data visualization capabilities, each of which has its own benefits and drawbacks. We will look at some basic capabilities as well as the built-in visualizations in MSTICpy.
Basic Graphs
Pandas and Matplotlib provide the easiest and simplest way to produce simple plots of data:
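As a minimal sketch on made-up timestamps, the example below counts events per day with Pandas and wraps the plotting call in a small helper. The helper name `plot_events` is chosen for this example, and it assumes Matplotlib is installed in the kernel:

```python
import pandas as pd

# Made-up sign-in timestamps standing in for real query results.
df = pd.DataFrame(
    {
        "TimeGenerated": pd.to_datetime(
            [
                "2021-01-01 08:00", "2021-01-01 09:30", "2021-01-02 14:15",
                "2021-01-02 16:00", "2021-01-02 18:20", "2021-01-03 23:45",
            ]
        )
    }
)

# Count events per calendar day.
events_per_day = df.groupby(df["TimeGenerated"].dt.date).size()
print(events_per_day)


def plot_events(counts):
    """Draw a bar chart of the counts; requires Matplotlib to be installed."""
    ax = counts.plot(kind="bar", title="Sign-in events per day")
    ax.set_xlabel("Date")
    ax.set_ylabel("Events")
    return ax
```

Calling `plot_events(events_per_day)` as the last line of a notebook cell should render the chart inline.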
Bokeh is a powerful visualization library that allows you to create complex, interactive visualizations. MSTICpy includes a number of pre-built visualizations using Bokeh including a timeline feature that can be used to represent events over time. You can interact with the timeline by zooming and panning, using the range selector, as well as hovering over data points to see more details.
MSTICpy also includes a feature that allows you to map locations. This can be particularly useful when looking at the distribution of remote network connections or other events. Below we plot the locations of remote logons observed in our Azure AD data.
Learn more:
The Infosec Jupyterbook includes a section on data visualization.
Conclusion
This notebook has shown you the basics of using notebooks and Microsoft Sentinel for security investigations. There are many more things possible using notebooks, and we strongly encourage you to read the material referenced in the learn more sections of this notebook. You can also explore the other Microsoft Sentinel notebooks to take advantage of the pre-built hunting logic and to understand other analysis techniques that are possible.