GitHub Repository: Azure/Azure-Sentinel-Notebooks
Path: blob/master/ConfiguringNotebookEnvironment.ipynb
Kernel: Python 3.10 - SDK v2

Notebook Environment Setup

This notebook takes you through detailed setup of your settings for Microsoft Sentinel Notebooks and the MSTICPy library. It covers:

  • Setting up your Python environment for notebooks

  • Creating and editing your msticpyconfig.yaml file

  • Understanding and managing your config.json file.

If you are using notebooks in the Microsoft Sentinel/Azure ML environment you can skip the first section "Configuring your Python Environment" entirely.

Warning. Due to rendering issues in Azure Machine Learning, we strongly recommend running this notebook in Jupyter Lab or VSCode.

To do this:
  • Click on the notebook toolbar menu - the ≣ symbol in top left of the notebook
  • Select the Editors option and choose either JupyterLab or VSCode
  • When prompted to stay in AML click the Continue button.
  • The notebook should open in another browser tab
The MSTICPy settings editor uses notebook widgets, which are not fully supported in AML notebooks.

The main part of this notebook involves setting up your msticpyconfig.yaml. While many of these settings are optional, if you do not configure them correctly you'll experience some loss of functionality. For example, using Threat Intelligence providers usually requires an API key. To save you having to type this in every time you look up an IP Address you should put this in a config file.

This section takes you through creating settings for

  • Microsoft Sentinel workspaces

  • Threat Intelligence providers

  • Geo-location providers

  • Other data providers (e.g. Azure APIs)

  • Key Vault

  • Auto-loading options.

You'll typically need the first three of these to use most of the notebooks fully.

Section 3, "The config.json file" can also be ignored if you are happy using msticpyconfig.yaml. It is included here for background.


Configuring your Python Environment

Python 3.6 or Later

If you are running in a JupyterHub environment such as Azure Notebooks, Python is already installed. When using any of the sample notebooks, or copies of them, you only need to ensure that the Python 3.8 (or later) kernel is selected.

If you are running the notebooks locally, you will need to install Python 3.8 or later. The Anaconda distribution is a good starting point since it comes with many of the required packages already installed.

Creating a virtual environment

If you are running these notebooks locally, it is a good idea to create a clean Python virtual environment before installing any of the packages. This will prevent installed packages conflicting with versions that you may need for other applications.

For standard Python, use the venv command. For Conda, use the conda env command. In both cases, be sure to activate the environment before running Jupyter, using venvpath/Scripts/activate or conda activate {my_env_name}.
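One quick way to confirm that an environment is active before launching Jupyter is to inspect the running interpreter. The following is a minimal sketch: the helper name in_virtual_env is illustrative (it is not part of any library), and treating a CONDA_DEFAULT_ENV value other than "base" as a sign of an activated Conda environment is an assumption.

```python
import os
import sys


def in_virtual_env() -> bool:
    """Return True if the interpreter appears to run inside a venv or Conda env."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    # Assumption: Conda sets CONDA_DEFAULT_ENV when an environment is activated.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV", "")
    in_conda = bool(conda_env) and conda_env != "base"
    return in_venv or in_conda


print("Virtual environment active:", in_virtual_env())
```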

# Run this cell to view requirements.txt
%pfile requirements.txt

Installing in a Conda Environment

Although you can use pip inside a conda environment it is usually better to try to install conda packages whenever possible.

conda activate {my_env_name}
conda config --append channels conda-forge
conda install package1 package2

See Managing packages in Anaconda.

For packages that are not available as conda packages, use pip from within a Conda prompt/shell to install the remaining packages.

Installing with --user option

If you are using a shared installation of Python (i.e. one installed by the administrator) you will need to add the --user option to your pip install commands. E.g.

pip install pkg_name --user --upgrade

This will avoid permission errors by installing into your user folder.

Note: the use of the --user option is usually not required in a Conda environment since the Python site packages are normally already installed in a per-user folder.

Install Packages from this Notebook

The first time this cell runs for a new Azure ML or Azure Notebooks notebook or other Python environment it will do the following things:

  1. Check the kernel version to ensure that a Python 3.6 or later kernel is running

  2. Check the msticpy version - if this is not installed or the version installed is less than the required version (in REQ_MSTICPY_VER) it will attempt to install a new version (you will be prompted whether you want to do this). The install can take several minutes depending on the versions of packages that you already have installed.

  3. Once msticpy is installed and imported, the init_notebook function is run. This:

    • imports common modules used in the notebook

    • installs additional packages

    • sets some global options

Note: In subsequent runs, this cell should run quickly since you will already have the required packages installed.

Warning: you may see some warnings about incompatibility with certain packages. This should not affect the functionality of this notebook but you may need to upgrade the packages producing the warnings to a more recent version.

from pathlib import Path
import os
import sys
import warnings
from IPython.display import display, HTML, Markdown

REQ_PYTHON_VER = "3.10"
REQ_MSTICPY_VER = "2.12.0"

# If not using Azure Notebooks, install msticpy with
# %pip install msticpy

import msticpy as mp

mp.init_notebook(
    namespace=globals(),
);
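The kernel and package version checks described in steps 1 and 2 can be sketched with the standard library alone. This is an illustration of the idea, not the code that MSTICPy itself runs; the helper names version_tuple, python_ok and package_needs_install are hypothetical.

```python
import sys
from importlib import metadata
from itertools import takewhile


def version_tuple(version: str) -> tuple:
    """Convert a dotted version such as '2.12.0' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits so suffixes like 'rc1' are ignored.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def python_ok(required: str) -> bool:
    """Return True if the running kernel is at least the required major.minor."""
    return sys.version_info[:2] >= version_tuple(required)[:2]


def package_needs_install(package: str, required: str) -> bool:
    """Return True if the package is missing or older than the required version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return True
    return version_tuple(installed) < version_tuple(required)


print("Python OK:", python_ok("3.10"))
print("msticpy install needed:", package_needs_install("msticpy", "2.12.0"))
```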

MSTICPy Configuration File - msticpyconfig.yaml

MSTICPy is a Python package used in most of the Jupyter notebooks on Azure-Sentinel-Notebooks. It provides a lot of functionality specific to threat hunting and investigations, including:

  • Data querying against Microsoft Sentinel tables (also MDE, Splunk and others)

  • Threat Intelligence lookups using multiple TI providers (VirusTotal, AlienVault OTX and others)

  • Common enrichment functions (GeoIP, IoC extraction, WhoIs, etc.)

  • Visualization using event timelines, process trees and Geo-mapping

  • Advanced analysis such as Time Series decomposition, Anomaly detection and clustering.

Note: the configuration actions in this section are an abbreviated version of the MPSettingsEditor notebook.
Use that notebook for a fuller guide on how to configure your settings.
Also, see these sections in the MSTICPy documentation:
MSTICPy Package Configuration
MSTICPy Settings Editor

config.json provides some basic configuration for connecting to your Microsoft Sentinel workspace. However, there are many features that require additional configuration information. Some examples are:

  • Threat Intelligence Provider connection information

  • GeoIP connection information

  • Keyvault configuration for storing secrets remotely

  • MDE and Azure API connection information.

  • Connection information for multiple Microsoft Sentinel workspaces.

Settings for these are stored in the msticpyconfig.yaml file. This file is read from the current directory or you can set an environment variable (MSTICPYCONFIG) pointing to its location. For more information about msticpy configuration see msticpy Package Configuration.
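The lookup described above can be sketched as follows. This is not MSTICPy's own implementation: the function name find_msticpyconfig is hypothetical, and checking the environment variable before the current directory is an assumption about precedence.

```python
import os
from pathlib import Path
from typing import Optional


def find_msticpyconfig(cwd: Optional[Path] = None) -> Optional[Path]:
    """Return the config path from MSTICPYCONFIG, else from the current folder."""
    env_path = os.environ.get("MSTICPYCONFIG")
    if env_path:
        # expanduser() resolves a leading "~" to the user's home folder.
        candidate = Path(env_path).expanduser()
        if candidate.is_file():
            return candidate
    candidate = (cwd or Path.cwd()) / "msticpyconfig.yaml"
    return candidate if candidate.is_file() else None


print("Config file found at:", find_msticpyconfig())
```

A result of None suggests that neither location holds a readable file, which is a useful first check when settings do not seem to load.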

The most commonly-used sections are described below.

Threat Intelligence Provider Setup

For more information on the msticpy Threat Intel lookup class see the documentation here.

Primary providers are used by default. Secondary providers are not run by default but can be invoked by using the providers parameter to lookup_ioc() or lookup_iocs(). Set the Primary config setting to True or False for each provider ID according to how you want to use them. The providers parameter should be a list of strings identifying the provider(s) to use.

  • The provider ID is given by the Provider: setting for each of the TI providers - do not alter this value.

  • Delete or comment out the section for any TI Providers that you do not wish to use.

  • For most providers you will usually need to supply an authorization (API) key and in some cases a user ID for each provider.

  • For the Microsoft Sentinel TI provider, you will need the workspace ID and tenant ID and will need to authenticate in order to access the data (although if you have an existing authenticated connection with the same workspace/tenant, this connection will be re-used).
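To make the Primary/Secondary distinction concrete, here is a small sketch that models a TI provider section as a Python dictionary and picks the providers to query. The Primary and Provider keys come from the description above; the Args/AuthKey nesting, the placeholder keys and the select_providers helper are assumptions for illustration and not MSTICPy code.

```python
from typing import Dict, List, Optional

# Hypothetical settings mirroring the layout of a TI providers section.
ti_providers: Dict[str, dict] = {
    "VirusTotal": {
        "Args": {"AuthKey": "<your-vt-key>"},
        "Primary": True,
        "Provider": "VirusTotal",  # provider ID - do not alter
    },
    "OTX": {
        "Args": {"AuthKey": "<your-otx-key>"},
        "Primary": False,
        "Provider": "OTX",
    },
}


def select_providers(
    settings: Dict[str, dict], requested: Optional[List[str]] = None
) -> List[str]:
    """Return the explicitly requested providers, else all primary providers."""
    if requested:
        return [name for name in requested if name in settings]
    return [name for name, conf in settings.items() if conf.get("Primary")]


print(select_providers(ti_providers))
print(select_providers(ti_providers, requested=["OTX"]))
```

The second call mirrors what passing a providers list to a lookup does: a secondary provider is used only when it is named explicitly.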

GeoIP Providers

Like the TI providers, these services normally need an API key to access. You can read more about configuring the supported providers here: msticpy GeoIP Providers

Browshot Setup

The functionality to screenshot a URL in msticpy.sectools.domain_utils relies on a service called BrowShot (https://browshot.com/). An API key is required to use this service and it needs to be defined in the msticpyconfig file as well. As this is not a threat intelligence provider, it does not fall under the TIProviders section of msticpyconfig but instead sits alone. See the cell below for example configuration.


Display your existing msticpyconfig.yaml

We'll be using some of the MSTICPy configuration tools, MpConfigEdit and MpConfigFile, so we'll import these first.

from msticpy import MpConfigFile, MpConfigEdit

Then run MpConfigFile to view your current settings.

mpconfig = MpConfigFile()
mpconfig.load_default()
mpconfig.view_settings()

If you see nothing but a pair of curly braces...

...in the settings view above, it means that you probably need to create a msticpyconfig.yaml file.

If you know that you have configured a msticpyconfig file, you can search for this file using MpConfigFile. Click on Load file. Once you've done that, go to the section Setting the path to your msticpyconfig.yaml


Import your Config.json and create a msticpyconfig.yaml [Microsoft Sentinel]

Follow these steps:

  1. Run MpConfigFile

  2. Locate your config.json

    • click Load file button

    • Browse - use the controls to navigate to find config.json

    • Search - set the starting directory to search and open the Search drop-down

    • When you see the file click on it and click Select File button (below the file browser)

    • optionally, click View Settings to confirm that this looks right

  3. Convert to msticpyconfig format

    • click View Settings

  4. Save your msticpyconfig.yaml file

    • type a path into the Current file text box

    • Click on Save file

  5. You can set this file to always load by assigning the path to an environment variable. See Setting the path to your msticpyconfig.yaml


Setting the path to your msticpyconfig.yaml

This is a good point to set up an environment variable so that you can keep a single configuration file in a known location and always load the same settings. (Of course, you're free to use multiple configs if you need to use different settings for each notebook folder)

  • decide on a location for your msticpyconfig.yaml - this could be in "~/.msticpyconfig.yaml" or "%userprofile%/msticpyconfig.yaml"

  • copy the msticpyconfig.yaml file that you just created to this location.

  • set the MSTICPYCONFIG environment variable to point to that location:

Windows

Linux

In your .bashrc (or somewhere else convenient) add:

export MSTICPYCONFIG=~/.msticpyconfig.yaml

Azure ML

In Azure ML, you need to decide whether to store your msticpyconfig.yaml in the AML file store or on the Compute file system. If you have any secret key material in the file, we recommend storing on the Compute instance, since the AML file store is shared storage, whereas the Compute instance is accessible only by the user who created it.

If you are happy to leave the file in the AML file store, you should be set. The init_notebook function run at the start of the notebook will find it there in your root folder and set the MSTICPYCONFIG environment variable to point to it.

Pointing to a path on a compute instance

  1. Open a terminal in AML

  2. Verify your msticpyconfig.yaml is accessible

    Your current directory should be your AML file store home directory (this is mounted in the Compute Linux system) and the prompt will look something like the example below.

    If you created a msticpyconfig.yaml in the previous step, this should be visible if you type ls.

    azureuser@ianhelle-azml7:~/cloudfiles/code/Users/ianhelle$ ls msti*
    msticpyconfig.yaml
  3. Move the file to your home folder

    mv msticpyconfig.yaml ~
  4. Add an environment variable. Because the Jupyter server is started before you connect, its process will not inherit any environment variables from your .bashrc. You can set it in one of two places:

    • The kernel.json file for your Python kernel (there are kernels for both Python 3.6 and Python 3.8)

    • Add a Python file nbuser_settings.py to the root of your user folder.

    These options are described in the following sections.

kernel.json

  • Python 3.8 location: /usr/local/share/jupyter/kernels/python38-azureml/kernel.json

  • Python 3.6 location: /usr/local/share/jupyter/kernels/python3-azureml/kernel.json

Make a copy of the file and open the original in an editor (you may need to use sudo to be able to overwrite this file). The file will look something like this:

{
    "argv": [
        "/anaconda/envs/azureml_py38/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
    ],
    "display_name": "Python 3.8 - AzureML",
    "language": "python"
}

Add the following line after the "language" item.

"env": { "MSTICPYCONFIG": "~/msticpyconfig.yaml" }

Your file should look like this (remember to add a comma at the end of the "language": "python" line):

{
    "argv": [
        "/anaconda/envs/azureml_py38/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
    ],
    "display_name": "Python 3.8 - AzureML",
    "language": "python",
    "env": { "MSTICPYCONFIG": "~/msticpyconfig.yaml" }
}

If you use both kernels you will need to edit both files.
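If you prefer to script the edit rather than make it by hand, the change can be sketched in Python. The helper add_kernel_env is hypothetical. It keeps a .bak copy beside the original, and it needs write permission on the kernel folder, which on a Compute instance may mean running with elevated privileges.

```python
import json
import shutil
from pathlib import Path


def add_kernel_env(kernel_json: Path, name: str, value: str) -> dict:
    """Back up kernel.json, then add one variable to its "env" block."""
    # Keep a copy of the original next to it, e.g. kernel.json.bak
    shutil.copy2(kernel_json, kernel_json.with_name(kernel_json.name + ".bak"))
    spec = json.loads(kernel_json.read_text(encoding="utf-8"))
    # Create the "env" block if it is missing, then set the variable.
    spec.setdefault("env", {})[name] = value
    kernel_json.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return spec


# Example call (not executed here), using the Python 3.8 location given above:
# add_kernel_env(
#     Path("/usr/local/share/jupyter/kernels/python38-azureml/kernel.json"),
#     "MSTICPYCONFIG",
#     "~/msticpyconfig.yaml",
# )
```

Because the file is parsed and rewritten as JSON, there is no need to remember the comma after the "language" item.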

nbuser_settings.py

Create this file (you can do this from the AML workspace) in the root of your user folder (i.e. inside the folder with your username) and add the following lines

import os
os.environ["MSTICPYCONFIG"] = "~/msticpyconfig.yaml"

This file, if it exists, is imported by the nb_check.check_versions function at the start of the notebook. It will set the environment variable at the start of each notebook before any configuration is read. This is simpler and less intrusive than editing the kernel.json. However, it only works if you run check_versions. If you load a notebook without running this, MSTICPy may not be able to find its configuration file.
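A variant of nbuser_settings.py that stores an absolute path is sketched below. It does not rely on anything expanding the "~" shorthand later, and it warns if the file is not where you expect. Whether MSTICPy expands "~" itself is not confirmed here, so treat this as a defensive alternative rather than a required change.

```python
import os
from pathlib import Path

# Build an absolute path so nothing downstream has to expand "~".
config_path = Path.home() / "msticpyconfig.yaml"
os.environ["MSTICPYCONFIG"] = str(config_path)

# Warn early if the file has not been moved to the home folder yet.
if not config_path.is_file():
    print(f"Warning: {config_path} does not exist yet")
```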


Verify (or add) Microsoft Sentinel Workspace settings

If you loaded a config.json file into your msticpyconfig.yaml, you should see your workspace displayed when you run the following cell. If not, you can add one or more workspaces here. The Name, WorkspaceId and TenantId are mandatory. The other fields are helpful but not essential.

Use the Help drop-down panel to find more information about adding workspaces and finding the correct values for your workspace.

If this is the workspace that you use frequently or all of the time, you may want to set this as the default. This creates a duplicate entry named "Default". This is used when you connect to AzureSentinel without needing to supply a workspace name. You can override this by specifying a workspace name at connect time, which you need to do if you are working with multiple workspaces.

When you've finished, type a file name (usually "msticpyconfig.yaml") into the Conf File text box and click Save File.

You can also try the Validate Settings button. This should show that you have a few missing sections (we'll fill these in later) but should show nothing under the "Type Validation Results".

mpedit = MpConfigEdit(settings=mpconfig)
mpedit.set_tab("AzureSentinel")
mpedit

Adding Threat Intel (TI) Providers

You will likely want to do lookups of IP Addresses, URLs and other items to check for any Threat Intelligence reports. To do that you need to add the providers that you want to use. Most TI providers require that you have an account with them and supply an API key or other authentication items when you connect.

Most providers have a free use tier (or, in cases like AlienVault OTX, are entirely free). Free tiers for paid providers usually limit the number of requests that you can make in a given time period.

For account creation, each provider does this slightly differently. Use the help links in the editor help to find where to go set each of these up.

Assuming that you have done this, we can configure a provider. Be sure to store any authentication keys somewhere safe (and memorable).

We are going to use VirusTotal (VT) as an example TI Provider. For this you will need a VirusTotal API key from the VirusTotal website.
We also support a range of other threat intelligence providers - you can read about this here MSTICPy TIProviders

Taking VirusTotal as our example.

  • Click on the TI Providers tab

  • Select "VirusTotal" from the New prov drop-down list

  • Click Add

This should show you the values that you need to provide:

  • a single item AuthKey (this is usually referred to as an "API Key")

You can paste the key into the Value field and click the Save button.

You can opt to store the VT AuthKey as an environment variable. This is a bit more secure than having it lying around in configuration files. Assuming that you have set your VT key as an environment variable:

set VT_KEY=VGhpcyBzaG91bGQgc2hvdyB5b3UgdGhlIHZhbHVlcyB (Windows)
export VT_KEY=VGhpcyBzaG91bGQgc2hvdyB5b3UgdGhlIHZhbHVlcyB (Linux/Mac)

Flip the Storage radio button to EnvironmentVar and type the name of the variable (VT_KEY in our example) into the value box.
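Conceptually, a setting stored this way holds the name of the variable rather than the secret itself, and the value is read from the environment when it is needed. The sketch below illustrates that idea. The resolve_secret helper is hypothetical, and the {"EnvironmentVar": name} shape is an assumption based on the name of the Storage option, not a statement of how MSTICPy reads it.

```python
import os
from typing import Union


def resolve_secret(setting: Union[str, dict]) -> str:
    """Return a literal value, or read it from the named environment variable."""
    if isinstance(setting, str):
        # The secret was stored directly as text in the settings.
        return setting
    var_name = setting.get("EnvironmentVar")
    if not var_name:
        raise ValueError("Expected a string or {'EnvironmentVar': name}")
    value = os.environ.get(var_name)
    if value is None:
        raise KeyError(f"Environment variable {var_name} is not set")
    return value


# Placeholder value for illustration only, not a real API key.
os.environ["VT_KEY"] = "example-key"
auth_key = resolve_secret({"EnvironmentVar": "VT_KEY"})
print("Key length:", len(auth_key))
```

Failing loudly when the variable is missing makes it obvious that the kernel was started without the variable set, rather than sending an empty key to the provider.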

You can also use Azure Key Vault to store secrets like these but we will need to set up the Key Vault settings before this will work.

Click the Save File button to save your changes.

mpedit.set_tab("TI Providers")
mpedit

Adding GeoIP Providers

MSTICPy supports two GeoIP providers: MaxMind GeoIPLite and IPStack. The main difference between the two is that MaxMind downloads and uses a local database, while IPStack is a purely online solution.

For either, you need an API key, either to download the free database from MaxMind or to access the IPStack online lookup.

We'll use GeoIPLite as our example. You can sign up for a free account and API key at https://www.maxmind.com/en/geolite2/signup. You'll need the API key for the following steps.

  • Select "GeoIPLite" from the New Prov drop-down list

  • Click Add

  • Paste your Maxmind key into the Value field

Set the maxmind data folder:

  • This defaults to "~/.msticpy"

    • On Windows this translates to the foldername %USERPROFILE%/.msticpy.

    • On Linux/Mac this translates to the folder .msticpy in your home folder.

  • This is where the downloaded GeoIP database will be stored.

  • Choose another folder name and location if you prefer.

Note: as with the TI providers you can opt to store your key as an environment variable or keep it in Key Vault.

mpedit.set_tab("GeoIP Providers")
mpedit

Important Security Note

You might not be too comfortable leaving API keys stored in text files. You can opt to have these settings stored either:

  • as Environment Variables

  • in Azure Key Vault

To see how to do this see these resources


Optional Settings 1 - Azure Data and Microsoft Sentinel APIs

Azure API and Microsoft Sentinel API

To access Azure APIs (such as the Sentinel APIs or Azure resource APIs) you need to be able to use Azure Authentication. The setting is named "AzureCLI" for historical reasons - don't let that confuse you. We currently support two ways of authenticating:

  1. Chained authentication (recommended)

  2. With a client app ID and secret

The former can try up to four methods of authentication:

  • Using creds set in environment variables

  • Using creds available in an AzureCLI logon

  • Using the Managed Service Identity (MSI) credentials of the machine you are running the notebook kernel on

  • Interactive browser logon

To use chained authentication methods, select the methods you want to use and leave the clientId/tenantId/clientSecret fields empty.
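The idea behind chained authentication is to try each selected method in turn and use the first one that succeeds. The sketch below shows that control flow only. The method names and handler functions are placeholders, and no Azure library is called; it is not how MSTICPy or the Azure SDK implement it.

```python
from typing import Callable, Dict, Iterable, Tuple


def chained_authenticate(
    methods: Iterable[str],
    handlers: Dict[str, Callable[[], str]],
) -> Tuple[str, str]:
    """Try each method in order; return (method, credential) for the first success."""
    errors = []
    for method in methods:
        try:
            return method, handlers[method]()
        except Exception as err:  # any failure means "try the next method"
            errors.append(f"{method}: {err}")
    raise RuntimeError("All authentication methods failed: " + "; ".join(errors))


def _env_creds() -> str:
    # Placeholder: pretend no credentials were found in environment variables.
    raise LookupError("no credentials in environment variables")


def _cli_creds() -> str:
    # Placeholder: pretend a command-line logon is available.
    return "token-from-cli"


handlers = {"env": _env_creds, "cli": _cli_creds}
method, credential = chained_authenticate(["env", "cli"], handlers)
print("Authenticated using:", method)
```

Ordering the methods from least to most interactive means a browser prompt appears only when nothing else works.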

mpedit.set_tab("Data Providers")
mpedit

Optional Settings 2 - Autoload QueryProviders

This section controls which, if any, query providers you want to load automatically when you run nbinit.init_notebook.

This can save a lot of time if you are frequently authoring new notebooks. It also allows the right providers to be loaded before other components that might use them such as

  • Pivot functions

  • Notebooklets (more about these in the next section)

There are two types of provider support:

  • Microsoft Sentinel - here you specify both the provider name and the workspace name that you want to connect to.

  • Other providers - for other query providers, just specify the name of the provider.

Available Microsoft Sentinel workspaces are taken from the items you configured in the Microsoft Sentinel tab. Other providers are taken from the list of available provider types in MSTICPy.

There are two options for each of these:

  • connect - if this is True (checked) MSTICPy will try to authenticate to the provider backend immediately after loading. This assumes that you've configured credentials for the provider in your settings. Note: if this is not set it defaults to True.

  • alias - when MSTICPy loads a provider it assigns it to a Python variable name. By default this is "qry_workspace_name" for Microsoft Sentinel providers and "qry_provider_name" for other providers. If you want to use something a bit shorter and easier to type/remember you can add an alias. The variable name created will be "qry_alias".
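The naming rule for the autoloaded variable can be sketched as a small function. This is an illustration of the rule as described above, not MSTICPy's code: the helper provider_var_name is hypothetical, and both keeping the original letter case and replacing characters that are invalid in identifiers with underscores are assumptions.

```python
import re
from typing import Optional


def provider_var_name(name: str, alias: Optional[str] = None) -> str:
    """Build the 'qry_...' variable name, preferring the alias when one is set."""
    label = alias or name
    # Assumption: characters that are not valid in identifiers become underscores.
    return "qry_" + re.sub(r"\W", "_", label)


print(provider_var_name("MyWorkspace"))
print(provider_var_name("MyWorkspace", alias="ws"))
```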

Note: if you lose track of which providers have been loaded by this mechanism, they are added to the current_providers attribute of msticpy.

import msticpy
msticpy.current_providers

mpedit.set_tab("Autoload QueryProvs")
mpedit

Optional Settings 3 - Autoloaded Components

This section controls which other components, if any, you want to load automatically when you run nbinit.init_notebook().

This includes

  • TILookup - the Threat Intel provider library

  • GeoIP - the GeoIP provider that you want to use

  • AzureData - the module used to query details about Azure resources

  • AzureSentinelAPI - the module used to query the Microsoft Sentinel API

  • Notebooklets - loads notebooklets from the msticnb package

  • Pivot - pivot functions

These are loaded in this order, since the Pivot component needs query and other providers loaded in order to find the pivot functions that it will attach to entities. For more information see pivot functions

Some components do not require any parameters (e.g. TILookup and Pivot). Others do support or require additional settings:

GeoIpLookup

You must type the name of the GeoIP provider that you want to use - either "GeoLiteLookup" or "IPStack"

AzureData and AzureSentinelAPI

  • auth_methods - override the default settings for AzureCLI and connect using the selected methods

  • connect - set to false to load but not connect

Notebooklets

This has a single parameter block, AzureSentinel. At minimum you should specify the workspace name. This needs to be in the following format:

workspace:WORKSPACENAME

WORKSPACENAME must be one of the workspaces defined in the Microsoft Sentinel tab.

You can also add additional parameters to send to the notebooklets init function. Specify these as additional key:value pairs, separated by newlines.

workspace:WORKSPACENAME
providers=["LocalData","geolitelookup"]

See the msticnb init documentation for more details

mpedit.set_tab("Autoload Components")
mpedit

Save your file and add the MSTICPYCONFIG environment variable

Save your file and, if you haven't yet done so, create an environment variable to point to it. See Setting the path to your msticpyconfig.yaml


Validating your msticpyconfig.yaml settings

MpConfigFile includes a validation function that can help you diagnose setup problems.

You can run this interactively or from Python.

The examples below assume that you have set MSTICPYCONFIG to point to your config file. If not, you will need to use the load_from_file() function (or Load File button) to load the file before validating.

mpconfig = MpConfigFile()
mpconfig.load_default()
mpconfig.validate_settings()

To validate interactively:

mpconfig = MpConfigFile()
mpconfig.load_default()
mpconfig
---
## The `config.json` file

When you start a notebook from Microsoft Sentinel for the first time it will create a `config.json` file in your notebooks folder. This should be populated with your workspace and tenant IDs needed to authenticate to Microsoft Sentinel. If you are using notebooks in a different environment you may need to create a `config.json` or `msticpyconfig.yaml` (see below) to supply this information to your notebook.

We recommend creating a `msticpyconfig.yaml` since this can hold a wide variety of settings for your notebook, including multiple Microsoft Sentinel workspace settings. The `config.json`, in contrast, only holds settings for a single Microsoft Sentinel workspace. For more information see this [msticpy Package Configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)

---
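To illustrate how the two files relate, the sketch below maps the keys of a `config.json` to workspace-style setting names. The `config.json` keys are the ones written by the cell later in this section, and WorkspaceId and TenantId are the mandatory workspace fields mentioned earlier. The remaining target names and the workspace_entry_from_config helper are assumptions for illustration, not the conversion that MpConfigFile performs.

```python
import json


def workspace_entry_from_config(config: dict) -> dict:
    """Map config.json keys to workspace-style setting names."""
    mapping = {
        "workspace_id": "WorkspaceId",
        "tenant_id": "TenantId",
        "subscription_id": "SubscriptionId",  # assumed target name
        "resource_group": "ResourceGroup",  # assumed target name
        "workspace_name": "WorkspaceName",  # assumed target name
    }
    # Copy only the values that are present and non-empty.
    entry = {new: config[old] for old, new in mapping.items() if config.get(old)}
    missing = {"WorkspaceId", "TenantId"} - entry.keys()
    if missing:
        raise ValueError(f"config.json is missing required values: {sorted(missing)}")
    return entry


# Placeholder values for illustration only.
sample = {
    "tenant_id": "<tenant-guid>",
    "workspace_id": "<workspace-guid>",
    "workspace_name": "MyWorkspace",
}
print(json.dumps(workspace_entry_from_config(sample), indent=2))
```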

If you need to create or modify your config.json you can run the following cell.

You will need the subscription and workspace IDs for your Microsoft Sentinel Workspace. These can be found here in the Microsoft Sentinel portal as shown below.


Copy the subscription and workspace IDs:

import requests
import json
import ipywidgets as widgets
from pathlib import Path
from datetime import datetime

config_dict = {}


def get_tenant_for_subscription(sub_id):
    """Look up the tenant ID from the 401 challenge returned for a subscription."""
    aad_url = (
        f"https://management.azure.com/subscriptions/{sub_id}?api-version=2016-01-01"
    )
    resp = requests.get(aad_url)
    if resp.status_code == 401:
        # The WWW-Authenticate header carries the authorization URI,
        # which contains the tenant ID as a path segment.
        hdr_list = resp.headers["WWW-Authenticate"].split(",")
        hdr_dict = {
            item.split("=")[0].strip(): item.split("=")[1].strip() for item in hdr_list
        }
        return hdr_dict["Bearer authorization_uri"].strip('"').split("/")[3]
    else:
        return None


def save_config_json(file_path, **kwargs):
    """Write the settings to file_path, backing up any existing file first."""
    if Path(file_path).exists():
        bk_file = (
            str(Path(file_path))
            + ".bak"
            + datetime.now().isoformat(timespec="seconds").replace(":", "-")
        )
        print(f"Existing config found. Saving current config.json to {bk_file}")
        Path(file_path).rename(bk_file)
    with open(file_path, "w") as fp:
        json.dump(kwargs, fp, indent=2)
    print(f"Settings saved to {file_path}")


def save_config(b):
    """Button handler: collect the widget values and save them."""
    tenant = input_wgt["tenant"].value
    if not tenant:
        # No tenant supplied, so look it up from the subscription ID.
        tenant = get_tenant_for_subscription(input_wgt["sub_id"].value)
        print(f"TenantID found: {tenant}")
    save_config_json(
        file_path=input_wgt["path"].value,
        tenant_id=tenant,
        subscription_id=input_wgt["sub_id"].value,
        workspace_id=input_wgt["ws_id"].value,
        workspace_name=input_wgt["workspace"].value,
        resource_group=input_wgt["res_grp"].value,
    )


DEFAULT_CONFIG = "./config.json"
WIDGET_DEFAULTS = {
    "layout": widgets.Layout(width="95%"),
    "style": {"description_width": "200px"},
}

input_wgt = {
    "path": widgets.Text(
        description="Path to config.json", value=DEFAULT_CONFIG, **WIDGET_DEFAULTS
    ),
    "workspace": widgets.Text(
        description="Workspace name", placeholder="Workspace name", **WIDGET_DEFAULTS
    ),
    "sub_id": widgets.Text(
        description="Microsoft Sentinel Subscription ID",
        placeholder="for example, ef28a760-8c61-41d7-8167-5c8e5d91268b",
        **WIDGET_DEFAULTS,
    ),
    "ws_id": widgets.Text(
        description="Microsoft Sentinel Workspace ID",
        placeholder="for example, ef28a760-8c61-41d7-8167-5c8e5d91268b",
        **WIDGET_DEFAULTS,
    ),
    "res_grp": widgets.Text(
        description="Resource group", placeholder="Resource group", **WIDGET_DEFAULTS
    ),
    "tenant": widgets.Text(
        description="TenantId", placeholder="Leave blank to look up", **WIDGET_DEFAULTS
    ),
}

# Pre-populate the widgets from an existing config.json, if there is one.
if Path(DEFAULT_CONFIG).exists():
    with open(DEFAULT_CONFIG, "r") as fp:
        config_dict = json.load(fp)
    input_wgt["path"].value = DEFAULT_CONFIG
    input_wgt["sub_id"].value = config_dict.get("subscription_id", "")
    input_wgt["ws_id"].value = config_dict.get("workspace_id", "")
    input_wgt["workspace"].value = config_dict.get("workspace_name", "")
    input_wgt["res_grp"].value = config_dict.get("resource_group", "")
    input_wgt["tenant"].value = config_dict.get("tenant_id", "")

save_button = widgets.Button(description="Save config.json file")
save_button.on_click(save_config)

display(widgets.VBox([*(input_wgt.values()), save_button]))