GitHub Repository: Azure/Azure-Sentinel-Notebooks
Path: blob/master/tutorials-and-examples/training-notebooks/Training - MSTICPy Training 1221.ipynb
Kernel: Python 3

MSTICPy - Microsoft Threat Intelligence Center Jupyter & Python Security Tools

msticpy is a library for InfoSec investigation and hunting in Jupyter Notebooks. It includes functionality to:

  • query log data from multiple sources

  • enrich the data with Threat Intelligence, geolocations and Azure resource data

  • extract Indicators of Activity (IoA) from logs and unpack encoded data

  • perform sophisticated analysis such as anomalous session detection and time series decomposition

  • visualize data using interactive timelines, process trees and multi-dimensional Morph Charts

It also includes some time-saving notebook tools such as widgets to set query time boundaries, select and display items from lists, and configure the notebook environment.

Source Code: https://github.com/microsoft/msticpy
Python Package: https://pypi.org/project/msticpy/
Docs: https://msticpy.readthedocs.io/en/latest/

Why use MSTICPy?

Libraries such as MSTICPy bundle a wide range of functionality that you might want to use in a notebook and make it available in an easy-to-access way. This saves you significant time writing code, working out how specific APIs behave, and converting data so that it works between functions and services. Whilst other libraries can do some of what MSTICPy does, MSTICPy provides all of these features in one place, with an integrated data model and configuration.

Note: This notebook has deliberate errors in it for the purpose of teaching how to troubleshoot them. Executing the notebook as is will fail.

Installing and importing packages in Python

To use any library in Python you first need to install the package and import it. There are several ways to do this depending on how you want to access the library, but the simplest is pip. Pip is the package installer for Python and makes finding and installing Python packages simple. You can use pip to install packages via the command line or, if you are using a notebook, directly in a notebook cell. Azure ML compute comes with pip installed already, but if you are running your notebook elsewhere you may need to install pip first.

To do this we use %pip followed by install and the package name, e.g.: %pip install requests

Note: `%pip` is what's called a magic function in Jupyter. This tells the notebook to use pip to install the package in the notebook's compute environment.
%pip install requests
%pip install requests==2.2

If you have a package installed but you want to update it to the latest version you can add the --upgrade parameter

%pip install requests --upgrade
Note: Once you have installed a package it's a good idea to restart the kernel. This ensures that when you import the package you are using the latest version.
Note: During installation of packages you may see some warnings related to package dependencies. Some packages require specific versions of other packages, and sometimes these requirements clash (e.g. package 1 requires package A version 1.1 but package 2 requires package A version 1.2). Often these warnings do not cause significant issues, so try running the notebook and see whether it executes correctly.
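
One way to confirm which version of a package the kernel actually sees after an install or upgrade is to ask Python's packaging metadata. The sketch below uses only the standard library; the helper name `installed_version` is made up for this example, and the package names are just samples.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string for a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report on a few packages; swap in whichever names you care about.
for name in ("requests", "msticpy", "pandas"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If the version printed here is older than the one pip just reported installing, the kernel has most likely not been restarted yet.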

Example error message

Importing

Once a package has been installed you need to import some or all of it.

This is done with the import statement.

Generally there are 2 ways to import things in Python:

  • import <package> - this imports everything in the package

  • from <package> import <item> - this imports a specific item from the package

You can also import packages and rename them for ease when calling them later: import <package> as <alias> e.g. import pandas as pd

import pandas as pd
import xyz
%pip list

Some packages do not use the same name for installation and import. You may need to check the package documentation to ensure you are importing correctly.

%pip install scikit-learn
import sklearn
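
Because the install name and the import name can differ, it can help to check the import name directly before running a cell that depends on it. This is a minimal standard-library sketch; `is_importable` is a hypothetical helper, and it only handles top-level module names.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(module_name) is not None


# scikit-learn is installed as "scikit-learn" but imported as "sklearn",
# so the check uses the import name, not the pip name.
for import_name in ("json", "sklearn", "msticpy"):
    status = "available" if is_importable(import_name) else "not found"
    print(f"{import_name}: {status}")
```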

Installing and Importing MSTICPy

Now that we have seen the fundamentals of installing and importing, let's install and import MSTICPy:

# Install the latest version of MSTICPy
%pip install msticpy --upgrade

Don't forget to restart that kernel!

Now, we could import MSTICPy as a whole with import msticpy. However, it's a big package with a lot of features, so to make things easier there is a module called nbinit whose init_notebook function runs a number of checks to make sure the environment is good and handles key imports and setup for us.

from msticpy.nbtools import nbinit
nbinit.init_notebook(namespace=globals())

Great! We are now ready to get going.

MSTICPy's config file

MSTICPy can handle connections to a variety of data sources and services, including Azure Sentinel.

To make it easier to manage and re-use the configuration and credentials for these things, MSTICPy has its own config file that holds these items: msticpyconfig.yaml

When you launched this notebook from Azure Sentinel it copied a basic configuration file - config.json - to your workspace folder.
You should be able to see this file in the file browser to the left.
This file contains details about your Azure Sentinel workspace but has no configuration settings for other external services that we need.

If you didn't have a msticpyconfig.yaml file in your workspace folder (which is likely if this is your first use of notebooks), the init_notebook function should have created one for you and populated it with the Azure Sentinel workspace data taken from your config.json.

Tip: If you do not see a "msticpyconfig.yaml" file in your user folder, click the refresh button
at the top of the file browser.

We can check this now by opening the settings editor and viewing the settings.

You should not have to change anything here unless you need to add one or more additional workspaces.

When you have verified that this looks OK, click Save Settings.

import os
from pathlib import Path

from msticpy.config import MpConfigEdit

mp_conf = "msticpyconfig.yaml"

# Check if MSTICPYCONFIG is already set as an environment variable
mp_env = os.environ.get("MSTICPYCONFIG")
mp_conf = mp_env if mp_env and Path(mp_env).is_file() else mp_conf

if not Path(mp_conf).is_file():
    print(
        "No msticpyconfig.yaml was found!",
        "Please check that there is a config.json file in your workspace folder.",
        "If this is not there, go back to the Azure Sentinel portal and launch",
        "this notebook from there.",
        sep="\n"
    )
else:
    # Open the settings editor on the Azure Sentinel tab
    mpedit = MpConfigEdit(mp_conf)
    mpedit.set_tab("AzureSentinel")
    display(mpedit)
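
The lookup order in the cell above (environment variable first, then the default file name) can be isolated into a small function, which makes it easier to see which file would be picked up. This is a standard-library sketch rather than MSTICPy code; `resolve_config_path` and the `DEMO_UNSET_VARIABLE` name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def resolve_config_path(default="msticpyconfig.yaml", env_var="MSTICPYCONFIG"):
    """Return the config file to use, or None if no candidate exists.

    The environment variable wins when it points at a real file;
    otherwise the default path is used if it exists.
    """
    env_value = os.environ.get(env_var)
    if env_value and Path(env_value).is_file():
        return Path(env_value)
    default_path = Path(default)
    return default_path if default_path.is_file() else None


# Demonstrate with a throwaway file so no real configuration is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "msticpyconfig.yaml"
    demo_file.write_text("AzureSentinel: {}\n")
    found = resolve_config_path(default=str(demo_file), env_var="DEMO_UNSET_VARIABLE")
    print(found.name)
```

Returning None instead of raising leaves the caller free to decide whether a missing file is an error or a cue to create one.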

We are going to use VirusTotal (VT) as an example of a popular threat intelligence source. To use VirusTotal threat intel lookups you will need a VirusTotal account and API key.

You can sign up for a free account at the VirusTotal getting started page.

If you are already a VirusTotal user, you can, of course, use your existing key.

Warning: If you are using a VT enterprise key we do not recommend storing it in the msticpyconfig.yaml file.
MSTICPy supports storage of secrets in Azure Key Vault. You can read more about this in the MSTICPy docs.
For the moment, you can sign up for a free account until you can take the time to set up Key Vault storage.

As well as VirusTotal, we also support a range of other threat intelligence providers: https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html

To add the VirusTotal details, run the following cell.

  1. Select "VirusTotal" from the Add prov drop-down

  2. Click the Add button

  3. In the left-side Details panel select Text as the Storage option.

  4. Paste the API key in the Value text box.

  5. Click the Update button to confirm your changes.

Your changes are not yet saved to your configuration file. To do this, click on the Save Settings button at the bottom of the dialog.

If you are unclear about what anything in the configuration editor means, use the Help drop-down. This has instructions and links to more detailed documentation.

mpedit.set_tab("TI Providers")
mpedit

Our notebooks commonly use IP geo-location information. In order to enable this we are going to set up MaxMind GeoLite2 to provide geolocation lookup services for IP addresses.

GeoLite2 uses a downloaded database, which requires an account key to download. You can sign up for a free account and a license key at the MaxMind signup page - https://www.maxmind.com/en/geolite2/signup.

Using IPStack as an alternative to GeoLite2...

For more details see the MSTICPy GeoIP Providers documentation


Once you have an account, run the following cell to add the MaxMind GeoIP Lite details to your configuration.

The procedure is similar to the one we used for VirusTotal:

  1. Select the "GeoIPLite" provider from the Add prov drop-down

  2. Click Add

  3. Select Text Storage and paste the license (API/Auth) key into the text box

  4. Click Update

  5. Click Save Settings to write your settings to your configuration.

mpedit.set_tab("GeoIP Providers")
mpedit

Validate your settings

  • click on the Validate settings button.

You may see some warnings about missing sections but not about the Azure Sentinel, TIProviders or GeoIP Providers settings.

Click on the Close button to hide the validation output.

If you need to make any changes as a result of the validation, remember to save your changes by clicking the Save Settings button.

msticpy.settings.refresh_config()

Getting Data From Azure Sentinel

Now that the setup is out of the way, we can focus on getting data from Azure Sentinel.

!az login

Querying data from Azure Sentinel is handled by MSTICPy's QueryProvider. The first step is to initialize a QueryProvider and tell it we want to use the Azure Sentinel query provider.

The other thing we want to give the QueryProvider is some details of the workspace we want to connect to. We could do this manually, but it's much easier to get the details from the configuration we set up earlier. We can do this with WorkspaceConfig:

from msticpy.nbtools import nbinit
nbinit.init_notebook(namespace=globals())
qry_prov = QueryProvider("AzureSentinel")
ws_config = WorkspaceConfig(workspace="CyberSecDemo")

What WorkspaceConfig is doing for us is creating the connection string used by the QueryProvider:

ws_config.code_connect_str
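
To make the idea of a connection string concrete, here is a rough sketch of how one could be assembled from a tenant ID and a workspace ID. The exact format shown is an assumption modelled on the Log Analytics style strings MSTICPy has used; treat the output of `ws_config.code_connect_str` as the authoritative value. `build_connect_str` is a hypothetical helper and the GUIDs are placeholders.

```python
def build_connect_str(tenant_id, workspace_id):
    """Assemble a Log Analytics style connection string (format is an assumption)."""
    if not tenant_id or not workspace_id:
        raise ValueError("Both tenant_id and workspace_id are required.")
    return f"loganalytics://code().tenant('{tenant_id}').workspace('{workspace_id}')"


# Placeholder GUIDs, not real identifiers.
settings = {
    "tenant_id": "00000000-0000-0000-0000-000000000001",
    "workspace_id": "00000000-0000-0000-0000-000000000002",
}
print(build_connect_str(settings["tenant_id"], settings["workspace_id"]))
```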

Once set up, we can tell the QueryProvider to connect, which kicks off the authentication process. There are a number of ways to handle that authentication.

#qry_prov.connect(ws_config)
qry_prov.connect(ws_config, mp_az_auth="cli")

Now that we are connected to Azure Sentinel we can start to look at running some queries to get some data.

MSTICPy comes with a number of built-in Azure Sentinel queries to get some common datasets into the notebook.

You can see a list of the available queries with .list_queries:

qry_prov.list_queries()

However, this output is of limited use on its own. To make these built-in queries more accessible and discoverable there is a query browser, which makes searching for, and learning about, these queries much easier.

qry_prov.browse_queries()

Now that we have found a query that we want to run, we simply pass its name to the QueryProvider, which in turn returns the results of the query in a Pandas DataFrame.

In addition to the stock query we can customize certain elements of the query.

#qry_prov.Azure.list_all_signins_geo()
#qry_prov.SecurityAlert.list_alerts('?')
qry_prov.SecurityAlert.list_alerts(add_query_items="| take 10")
#qry_prov.SecurityAlert.list_alerts(add_query_items="take 10")
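
Conceptually, the extra query items are appended to the end of the stock query text, which is why they need to begin with a pipe just like any other KQL operator. The helper below is a hypothetical illustration of that idea in plain Python; it is not how MSTICPy implements the parameter, and the up-front check is only there to show why the version without the pipe fails.

```python
def with_query_items(base_query, add_query_items=""):
    """Append extra KQL operators to a base query, one block per line."""
    extra = add_query_items.strip()
    if not extra:
        return base_query.strip()
    if not extra.startswith("|"):
        # Without the pipe the result would read "SecurityAlert take 10",
        # which is not a valid KQL pipeline.
        raise ValueError("Extra query items should start with a pipe, e.g. '| take 10'.")
    return f"{base_query.strip()}\n{extra}"


print(with_query_items("SecurityAlert", "| take 10"))
```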

We also don't need to use the built-in queries. We can write our own queries and have them executed using .exec_query

query = "SecurityAlert | take 10"
#qry_prov.exec_query(query)
alert_df = qry_prov.exec_query(query)
alert_df

Working with the data

Data returned by the QueryProvider comes back in a Pandas DataFrame. This provides us with a powerful and flexible way to access our data.

One of the core things we want to do is look at specific rows in our table. Each table has an index that can be used to select a row with .loc; alternatively, we can return a row by its position in the table with .iloc:

alert_df.loc[1]

We can also choose just to return specific columns by providing a list of them to the DataFrame:

alert_df.iloc[:5][["AlertName", "AlertSeverity", "Description"]]

We can also do things such as search for rows with specific data.

alert_df[alert_df["AlertName"].str.contains("credential theft")]

Pandas also has some features to allow you to visualize the data you have:

alert_df["AlertSeverity"].value_counts().plot(kind='pie')
alert_df["AlertSeverity"].value_counts().plot(kind='bar')

There are many, many more features in Pandas. When starting with MSTICPy it's a good idea to spend some time learning about the power of Pandas - https://pandas.pydata.org/docs/

Enriching data using external data sources

One of the powerful elements of notebooks is that you can combine data from Azure Sentinel with data from other sources. One of the most common sources of this data in security is Threat Intelligence (TI) data. MSTICPy has support for a number of Threat Intelligence data sources including:

  • VirusTotal

  • GreyNoise

  • AlienVault OTX

  • IBM XForce

  • Azure Sentinel TI data

  • OPR (for PageRank details)

  • ToR ExitNode information.

query = "SigninLogs | sample 100"
signin_df = qry_prov.exec_query(query)
signin_df.head()

The first step in using these TI sources is to create a TILookup object. This can then be used to perform lookups.

Lookups can be done against individual items via .lookup_ioc or against multiple items with .lookup_iocs.

ti = TILookup()
ti.lookup_iocs(signin_df, obs_col="IPAddress", providers=["GreyNoise"])

ti_hits = ti.lookup_iocs(signin_df, obs_col="IPAddress", providers=["GreyNoise"])
ti_hits[ti_hits["Result"]==True]

signin_df.set_index('IPAddress').join(ti_hits[ti_hits["Result"]==True].set_index('Ioc'), rsuffix="_", how="inner")[["TimeGenerated", "UserPrincipalName"]]

vt_df = ti.lookup_iocs(signin_df["IPAddress"].unique()[:4], providers=["VirusTotal"])
vt_df

ti.browse_results(vt_df)
ti.browse_results(ti.result_to_df(ti.lookup_ioc("87.97.178.92")))
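
The join cell above keeps only the sign-ins whose IP address came back with a positive TI result. The same logic is sketched below in plain Python so the steps are easier to follow. The rows are invented sample data using documentation-only IP ranges, and `signins_with_ti_hits` is a hypothetical helper, not part of MSTICPy.

```python
# Invented sample rows; the addresses are from reserved documentation ranges.
signins = [
    {"TimeGenerated": "2021-12-01T10:00:00Z", "UserPrincipalName": "alice@example.com", "IPAddress": "192.0.2.10"},
    {"TimeGenerated": "2021-12-01T10:05:00Z", "UserPrincipalName": "bob@example.com", "IPAddress": "198.51.100.7"},
    {"TimeGenerated": "2021-12-01T10:09:00Z", "UserPrincipalName": "carol@example.com", "IPAddress": "192.0.2.10"},
]
ti_results = [
    {"Ioc": "192.0.2.10", "Result": True},
    {"Ioc": "198.51.100.7", "Result": False},
]


def signins_with_ti_hits(signin_rows, ti_rows):
    """Keep sign-ins whose IP address has a positive TI result."""
    # Step 1: filter the TI results down to positive hits.
    hit_ips = {row["Ioc"] for row in ti_rows if row["Result"] is True}
    # Step 2: match sign-ins on IP address and keep the columns of interest.
    return [
        {"TimeGenerated": row["TimeGenerated"], "UserPrincipalName": row["UserPrincipalName"]}
        for row in signin_rows
        if row["IPAddress"] in hit_ips
    ]


for match in signins_with_ti_hits(signins, ti_results):
    print(match["TimeGenerated"], match["UserPrincipalName"])
```

This matches an inner join only while each IP appears once in the TI results; a true join would repeat a sign-in row for every matching TI row.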

Azure API access

MSTICPy also has integration with a range of Azure APIs that can be used to retrieve additional information or perform actions.

from msticpy.data.azure_sentinel import AzureSentinel
azs = AzureSentinel()
azs.connect()

subs = azs.get_subscriptions()
subs.head()

azs.get_subscription_info(subs.iloc[0]["Subscription ID"])

azs.get_incident(
    incident_id="7a4f5e0e-c202-4298-8cb6-e1278500fbc7",
    sub_id="d1d8779d-38d7-4f06-91db-9cbc8de0176f",
    res_grp="soc",
    ws_name="cybersecuritysoc"
)

Visualizations with MSTICPy

The ability to create complex, interactive visualizations is one of the key benefits of notebooks. Creating these visualizations from scratch can be quite complex and involve a lot of code.

To make the process easier, MSTICPy contains a number of common visualizations that can be called quickly and easily with minimal code.

Timelines

Understanding when events occurred and in what order is a key component of many security investigations. MSTICPy has the ability to plot various types of timelines.

user_df = qry_prov.Azure.list_aad_signins_for_account(account_name="[email protected]")
#timeline.display_timeline(user_df)
timeline.display_timeline(user_df, source_columns=["UserPrincipalName", "ResultType"])

user_df.columns

timeline.display_timeline(user_df, source_columns=["UserPrincipalName", "ResultDescription"])

ref_time = user_df["TimeGenerated"].iloc[5]
timeline.display_timeline(user_df, source_columns=["UserPrincipalName", "ResultDescription"], group_by="ResultType", ref_time=ref_time)

alert_df = qry_prov.SecurityAlert.list_alerts(add_query_items="| take 10")
alert_df

timeline_duration.display_timeline_duration(alert_df, group_by="AlertName", time_column="StartTimeUtc", end_time_column="EndTimeUtc")

#alert_df.mp_plot.timeline()
alert_df.mp_plot.timeline(group_by="Severity", source_columns=["AlertName", "TimeGenerated"])

MSTICPy also includes a number of interactive widgets that make it easier for users to interact with notebooks.

network_vendor_data_q = "CommonSecurityLog | summarize by DeviceVendor"
network_vendor_data = qry_prov.exec_query(network_vendor_data_q)
network_selector = nbwidgets.SelectItem(
    item_list=network_vendor_data["DeviceVendor"].to_list(),
    description='Select a vendor',
    action=print,
    auto_display=True
);

network_data_q = f"""CommonSecurityLog | where DeviceVendor == '{network_selector.value}' | take 50"""
network_data = qry_prov.exec_query(network_data_q)
network_data.head()
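
When a widget value is dropped straight into a query string, a vendor name containing a quote character would break the query. The sketch below builds the same CommonSecurityLog query with the value escaped first. It assumes KQL's backslash escaping for single-quoted string literals; `kql_string` and `vendor_query` are hypothetical helpers, and the vendor name is only an example.

```python
def kql_string(value):
    """Quote a Python string as a single-quoted KQL string literal."""
    # Escape backslashes first, then the enclosing quote character.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def vendor_query(vendor, limit=50):
    """Build the CommonSecurityLog query for one vendor."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return "\n".join([
        "CommonSecurityLog",
        f"| where DeviceVendor == {kql_string(vendor)}",
        f"| take {limit}",
    ])


print(vendor_query("Palo Alto Networks"))
```

In the notebook the result could be passed to `qry_prov.exec_query` in place of the f-string, with `network_selector.value` as the vendor argument.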

The Matrix Plot graph in MSTICPy allows you to plot the interactions between two elements in your data.

network_data.mp_plot.matrix(x="SourceIP", y="DestinationIP", title="IP Interaction")
q_times = nbwidgets.QueryTime(units='day', max_before=20, before=5, max_after=1)
q_times.display()
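
A time-range widget like this boils down to an origin time plus an offset before and after it. The arithmetic is sketched here with the standard library. `query_window` is a hypothetical helper, the origin date is arbitrary, and `after=1` is an assumed value for illustration; the widget's own start and end properties remain the source of truth.

```python
from datetime import datetime, timedelta, timezone


def query_window(origin, before, after, units="day"):
    """Return (start, end) for a window around an origin time."""
    unit_map = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}
    if units not in unit_map:
        raise ValueError(f"Unsupported units: {units}")
    # Build the offsets with the matching timedelta keyword, e.g. days=5.
    start = origin - timedelta(**{unit_map[units]: before})
    end = origin + timedelta(**{unit_map[units]: after})
    return start, end


origin = datetime(2021, 12, 15, 12, 0, tzinfo=timezone.utc)
start, end = query_window(origin, before=5, after=1)
print(start.isoformat(), end.isoformat())
```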
print(123)
security_alerts = qry_prov.SecurityAlert.list_alerts(add_query_items="| take 10")
alert_select = nbwidgets.SelectAlert(alerts=security_alerts, action=nbdisplay.display_alert)
display(Markdown('### Alert selector with action=DisplayAlert'))
display(HTML("<b> Alert selector with action=DisplayAlert </b>"))
alert_select.display()

What to do next:

Run the Getting Started Notebook in Azure Sentinel - This will help you get your config set up

Try the MSTICPy Lab – https://aka.ms/msticpy-demo

Go and read the docs – https://msticpy.readthedocs.io/en/latest/GettingStarted.html

Learn more about Pandas - https://pandas.pydata.org/docs/

Check out our other notebooks for ideas! - https://github.com/Azure/Azure-Sentinel-Notebooks