MSTICPy v1.0.0 Overview
This notebook demonstrates some of the functionality of MSTICPy. New functionality is being added all the time (and old functionality improved - or, at least, that is the plan), so be sure to check the latest documentation on MSTICPy Readthedocs.
Pre-requisites
Data
The first part of the notebook uses live data, so it must be run against a live Microsoft Sentinel subscription. The latter half uses local sample data, so it can be run without Microsoft Sentinel.
Threat Intelligence and Geo-location provider subscriptions
This notebook uses examples that assume that you have an account with one or more of:
VirusTotal
AlienVault OTX
IBM XForce
MaxMind GeoLite
These providers all have free account tiers.
You can also use Microsoft Sentinel TI as a threat intelligence provider but it is a good idea to have more than one provider available.
For more information on setting up accounts and configuring TI and GeoIP providers see the following instructions:
You may also want to use the MPConfigEdit tool to manage these settings.
Load and initialize MSTICPy and the Notebook environment
Configuration
You may get warnings about missing configuration from init_notebook. MSTICPy uses many external services in addition to Microsoft Sentinel, such as threat intelligence and IP geo-location providers. Each service typically needs an account (which you must create yourself), and MSTICPy needs access to that account information in order to use the service. We store these settings in a central configuration file, msticpyconfig.yaml.
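The "missing configuration" warnings usually mean MSTICPy could not find msticpyconfig.yaml. As a rough illustration, here is a minimal sketch of a config lookup that checks the current directory first and then falls back to a file path held in the MSTICPYCONFIG environment variable. The search order is our assumption about MSTICPy's behavior (check the configuration docs for the exact rules), and `find_config_file` is a hypothetical helper, not part of the MSTICPy API.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_NAME = "msticpyconfig.yaml"
ENV_VAR = "MSTICPYCONFIG"  # environment variable holding a path to a config file


def find_config_file(cwd: Path, env: Mapping[str, str]) -> Optional[Path]:
    """Return the first config file found, or None (hypothetical helper).

    Assumed search order: ./msticpyconfig.yaml, then the MSTICPYCONFIG path.
    """
    local = cwd / CONFIG_NAME
    if local.is_file():
        return local
    env_path = env.get(ENV_VAR)
    if env_path and Path(env_path).is_file():
        return Path(env_path)
    return None


if __name__ == "__main__":
    found = find_config_file(Path.cwd(), os.environ)
    print(found or "No msticpyconfig.yaml found - expect configuration warnings")
```

Keeping the file location in an environment variable lets several notebooks in different folders share one set of provider credentials.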
To learn more about setting this up, see these two notebooks:
MSTICPy imports
The init_notebook function imports a number of MSTICPy components and some other common modules such as pandas and numpy.
We can list the items that have been imported.
Data Queries
Data queries are the foundation of any analysis or investigation. If you can't query data you have nothing to analyze.
First we need to load and authenticate to the data provider. The example shown is for Microsoft Sentinel, but other data providers are supported, such as:
Microsoft Defender
Splunk
Microsoft Graph
What queries are available?
You can choose from a set of predefined queries (this list is usually up to date, but the code itself is the real authority since we add new queries frequently).
The easiest way to see the available queries is with the query browser. This also lets you view usage/parameter information for each query.
Command-line alternative
Command-line enthusiasts can use:
Or use Jupyter/IPython tab completion. Add a trailing "?" to a query function to see its syntax and required parameters.
Viewing help for a query function from the command line.
Signature: qry_prov.Azure.list_azure_activity_for_account(*args, **kwargs) -> Union[pandas.core.frame.DataFrame, Any]
Call signature: qry_prov.Azure.list_azure_activity_for_account(*args, **kwargs)
Type: partial
String form: functools.partial(<bound method QueryProvider._execute_query of <msticpy.data.data_providers.Quer <...> object at 0x0000021CB07EA348>>, query_path='Azure', query_name='list_azure_activity_for_account')
File: c:\users\ian\anaconda3\envs\condadev\lib\functools.py
Docstring:
Lists Azure Activity for Account

Parameters
----------
account_name: str
    The account name to find
add_query_items: str (optional)
    Additional query clauses
end: datetime (optional)
    Query end time
start: datetime (optional)
    Query start time
    (default value is: -5)
table: str (optional)
    Table name
    (default value is: AzureActivity)
Class docstring:
partial(func, *args, **keywords) - new function with partial application
of the given arguments and keywords.
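The help output shows that each query function is really a `functools.partial` wrapping one generic `_execute_query` method, with `query_path` and `query_name` pre-bound. The toy class below sketches that pattern so the output is easier to read. `ToyQueryProvider`, its in-memory query store and the `SimpleNamespace` containers are illustrative assumptions, not MSTICPy's implementation.

```python
from functools import partial
from types import SimpleNamespace


class ToyQueryProvider:
    """Toy model: expose stored query templates as partial functions."""

    def __init__(self, queries):
        # queries: {query_path: {query_name: template_string}}
        self._queries = queries
        for path, named in queries.items():
            group = SimpleNamespace()
            for name in named:
                # Bind path and name now; callers supply only query parameters.
                setattr(
                    group,
                    name,
                    partial(self._execute_query, query_path=path, query_name=name),
                )
            setattr(self, path, group)

    def _execute_query(self, *args, query_path, query_name, **kwargs):
        # A real provider would send the query to a data source; this toy
        # version just returns the filled-in query text.
        return self._queries[query_path][query_name].format(**kwargs)


qry = ToyQueryProvider(
    {"Azure": {"list_azure_activity_for_account":
               "AzureActivity | where Caller == '{account_name}'"}}
)
print(qry.Azure.list_azure_activity_for_account(account_name="alice"))
# AzureActivity | where Caller == 'alice'
```

Because one method serves every query, adding a new query only means adding a new template, which is consistent with how quickly the predefined query list grows.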
Timespans
Nearly all queries need a time range. You can specify this as parameters to the query function, but you can also use the QueryTime widget to set your desired time range and pass it to the query.
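The help output above lists a default of -5 for `start`. Here is a minimal sketch of how such a relative default could be resolved into absolute times, assuming that a bare number means an offset in days from the end time (our assumption; check the MSTICPy docs for the exact rule). `resolve_timespan` is a hypothetical helper; in the notebook you would normally pass the QueryTime widget instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple, Union


def resolve_timespan(
    start: Union[datetime, float] = -5, end: Optional[datetime] = None
) -> Tuple[datetime, datetime]:
    """Return absolute (start, end); a numeric start = days relative to end."""
    end = end or datetime.now(timezone.utc)
    if isinstance(start, (int, float)):
        # Assumed convention: -5 means "five days before end".
        start = end + timedelta(days=start)
    if start >= end:
        raise ValueError("start must be earlier than end")
    return start, end


end_time = datetime(2021, 6, 10, tzinfo=timezone.utc)
start_time, _ = resolve_timespan(end=end_time)
print(start_time.date())  # 2021-06-05
```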
Extend an existing query
Write your own query
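The `add_query_items` parameter in the help output hints at how a query can be extended: extra clauses are appended to the stored query text. A rough sketch, assuming a KQL template that uses Python `str.format` placeholders for `{table}`, `{start}`, `{end}` and `{add_query_items}`. The template text and `build_query` are illustrative, not taken from MSTICPy's query files.

```python
from datetime import datetime, timezone

# Hypothetical KQL template with str.format placeholders.
TEMPLATE = (
    "{table}"
    " | where TimeGenerated >= datetime({start})"
    " | where TimeGenerated <= datetime({end})"
    " | where Caller has '{account_name}'"
    " {add_query_items}"
)


def build_query(template: str, **params) -> str:
    """Fill the template; extra KQL clauses go in add_query_items."""
    params.setdefault("table", "AzureActivity")
    params.setdefault("add_query_items", "")
    for key in ("start", "end"):
        if isinstance(params.get(key), datetime):
            params[key] = params[key].isoformat()  # ISO 8601 for KQL datetime()
    return template.format(**params).strip()


query = build_query(
    TEMPLATE,
    start=datetime(2021, 6, 5, tzinfo=timezone.utc),
    end=datetime(2021, 6, 10, tzinfo=timezone.utc),
    account_name="alice",
    add_query_items="| summarize count() by OperationName",
)
print(query)
```

Writing your own query follows the same idea: a parameterized template plus the values to substitute into it.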
Visualize the data in a timeline
Note: if you are running this notebook without a Microsoft Sentinel subscription (or another log data source that you can load into a pandas DataFrame), you can do the following to run the first two visualizations in this section:
Run the cell "Retrieve sample data files" (towards the end of the notebook).
Run the following Python code:
Event Timelines
Process Trees
Viewing Alerts
Enrichment with Threat Intelligence, WhoIs and GeoIP
We're going to use Pivot functions here to focus on IP-specific operations.
Side note - discovering pivot functions
If what you want to do is entity-related, there is a good chance that the MSTICPy function you need is available as an entity pivot function.
What is an Entity?
An entity is essentially a "noun" in the CyberSec world - e.g. IP Address, host, URL. They are typically things that do things or have things done to them. Entities will always have one or more properties that identify the entity or provide additional context information. For example, an IpAddress entity has its primary Address property and it might also have contextual properties like geo-location or ASN data.
Pivot functions are verbs that perform investigative actions (like data queries) on the entity and return a result. Host, for example, has data queries that retrieve process or logon events logged for that host. IpAddress has functions to look up its geolocation or to query threat intelligence providers for information about the address.
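To make the noun/verb idea concrete, here is a toy sketch: an IpAddress-like entity with an identifying `Address` property and a slot for contextual properties, plus pivot functions registered against the class. `ToyIpAddress`, its `pivots` registry and the `ip_type` function are hypothetical illustrations built on Python's standard `ipaddress` module, not MSTICPy's Pivot implementation.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Callable, ClassVar, Dict


@dataclass
class ToyIpAddress:
    """Entity ("noun"): identified by Address, with optional context."""

    Address: str
    context: Dict[str, str] = field(default_factory=dict)  # e.g. geo/ASN data

    # Class-level registry of pivot functions ("verbs") for this entity type.
    pivots: ClassVar[Dict[str, Callable[["ToyIpAddress"], object]]] = {}

    def pivot(self, name: str):
        """Run a registered pivot function against this entity."""
        return self.pivots[name](self)


def register_pivot(name: str):
    """Decorator that attaches a function to ToyIpAddress as a pivot."""

    def decorator(func):
        ToyIpAddress.pivots[name] = func
        return func

    return decorator


@register_pivot("ip_type")
def ip_type(entity: ToyIpAddress) -> str:
    """Classify the address using the standard-library ipaddress module."""
    addr = ipaddress.ip_address(entity.Address)
    if addr.is_private:
        return "Private"
    if addr.is_global:
        return "Public"
    return "Other"


print(ToyIpAddress("10.0.0.1").pivot("ip_type"))  # Private
print(ToyIpAddress("8.8.8.8").pivot("ip_type"))  # Public
```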
The easiest way to view the entities, their pivot functions, and the help associated with each function is to use the Pivot browser.
Build a pipeline to do everything at once
Note: we join the results of each step to those of the previous step. We also add a call to mp_pivot.display() to show intermediate results.
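The join-each-step-to-the-previous pattern can be sketched without MSTICPy using plain Python records: each step takes the current rows, adds columns for each IP address, and passes the result on to the next step, with an optional print standing in for mp_pivot.display(). The step functions and `run_pipeline` are hypothetical placeholders using only the standard library, not MSTICPy pipeline syntax or real TI lookups.

```python
import ipaddress
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def add_ip_type(rows: List[Row]) -> List[Row]:
    """Join an IpType column (non-private is treated as Public here)."""
    return [
        {**row, "IpType": "Private"
         if ipaddress.ip_address(row["IpAddress"]).is_private else "Public"}
        for row in rows
    ]


def keep_public(rows: List[Row]) -> List[Row]:
    """Drop private addresses before any (slower) external lookups."""
    return [row for row in rows if row["IpType"] == "Public"]


def add_reverse_pointer(rows: List[Row]) -> List[Row]:
    """Join the reverse-DNS name (computed locally, no DNS query made)."""
    return [
        {**row, "ReversePointer": ipaddress.ip_address(row["IpAddress"]).reverse_pointer}
        for row in rows
    ]


def run_pipeline(rows: List[Row], steps: Sequence[Step], show: bool = False) -> List[Row]:
    """Feed each step's output (previous columns + new ones) to the next."""
    for step in steps:
        rows = step(rows)
        if show:  # stand-in for displaying intermediate results
            print(f"{step.__name__}: {rows}")
    return rows


result = run_pipeline(
    [{"IpAddress": "10.0.0.5"}, {"IpAddress": "8.8.8.8"}],
    [add_ip_type, keep_public, add_reverse_pointer],
    show=True,
)
```

Filtering out private addresses early means later, rate-limited enrichment steps only see the addresses worth looking up.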
Display the TI Results in a browsable format
Investigating Obfuscated commands
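One common form of command obfuscation is PowerShell's -EncodedCommand argument, which takes Base64-encoded UTF-16LE text. Here is a minimal standard-library sketch of decoding it. MSTICPy has its own, more thorough Base64 unpacking tools; this is not that code. The regular expression is a simplifying assumption that covers only a few spellings of the argument (-e, -ec, -enc, -EncodedCommand), and `decode_encoded_command` is a hypothetical helper.

```python
import base64
import re
from typing import Optional

# Simplified match for a few common spellings of the argument (assumption).
ENC_ARG = re.compile(
    r"-(?:e|ec|enc|encodedcommand)\s+([A-Za-z0-9+/=]+)", re.IGNORECASE
)


def decode_encoded_command(cmdline: str) -> Optional[str]:
    """Return the decoded script text, or None if nothing decodable is found."""
    match = ENC_ARG.search(cmdline)
    if not match:
        return None
    try:
        raw = base64.b64decode(match.group(1))
        return raw.decode("utf-16-le")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None


payload = base64.b64encode("Write-Output 'hello'".encode("utf-16-le")).decode("ascii")
print(decode_encoded_command(f"powershell.exe -NoProfile -enc {payload}"))
# Write-Output 'hello'
```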
Plot GeoLocation of our bad IP address(es)
Using advanced analysis (AKA simple machine learning)
Retrieve sample data files
Time Series Decomposition - Anomaly detection
Detecting anomalous sequences using Markov Chain
The MSTICPy anomalous_sequence subpackage uses Markov chain analysis to predict the probability
that a particular sequence of events will occur given what has happened in the past.
Here we're applying it to Office activity.
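To illustrate the core idea, here is a first-order Markov chain sketch: count command-to-command transitions across past sessions, apply add-one (Laplace) smoothing, and score a new session as the product of its transition probabilities. This is not the anomalous_sequence implementation, which also models command parameters and values; the start/end tokens, the smoothing choice and the cmdlet names used as sample commands are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Set, Tuple

START, END = "##START##", "##END##"


def train(sessions: Sequence[Sequence[str]]) -> Tuple[Dict[str, Counter], Set[str]]:
    """Count transitions prev -> next, including session start/end markers."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = {END}  # every state that can follow another state
    for session in sessions:
        seq = [START, *session, END]
        vocab.update(session)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def sequence_likelihood(session: Sequence[str], counts, vocab) -> float:
    """Product of Laplace-smoothed transition probabilities."""
    seq = [START, *session, END]
    prob = 1.0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts.get(prev, Counter())
        prob *= (row[nxt] + 1) / (sum(row.values()) + len(vocab))
    return prob


history = [["Get-Mailbox", "Set-Mailbox"]] * 20 + [["Get-Mailbox"]] * 10
counts, vocab = train(history)
normal = sequence_likelihood(["Get-Mailbox", "Set-Mailbox"], counts, vocab)
odd = sequence_likelihood(["New-InboxRule", "Set-Mailbox"], counts, vocab)
print(f"usual session: {normal:.3g}, unusual session: {odd:.3g}")
```

Smoothing keeps a never-seen transition from scoring exactly zero, so unusual sessions can still be ranked against each other rather than all collapsing to the same value.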
Query the data
Perform Anomalous Sequence analysis on the data
The analysis groups events into sessions (time-bounded and linked by a common account). It then
builds a probability model for the types of command (e.g. "SetMailboxProperty")
and the parameters and parameter values used for that command.
In other words: how likely is it that a given user would run this sequence of commands in a logon session?
Using this probability model, we can highlight sequences that have an extremely low probability, based
on prior behavior.
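The session-building step described above can be sketched like this: sort events by account and time, then start a new session whenever the account changes, the gap since the previous event exceeds a limit, or the session grows longer than a maximum length. The 20-minute gap and 2-hour maximum are illustrative assumptions, not MSTICPy's defaults, and `sessionize` is a hypothetical helper. Its output (one list of commands per session) is the kind of input a sequence model is trained on.

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import Iterable, List, Tuple

Event = Tuple[str, datetime, str]  # (account, timestamp, command)


def sessionize(
    events: Iterable[Event],
    max_gap: timedelta = timedelta(minutes=20),
    max_length: timedelta = timedelta(hours=2),
) -> List[List[str]]:
    """Group commands into per-account, time-bounded sessions."""
    sessions: List[List[str]] = []
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    for _account, account_events in groupby(ordered, key=lambda e: e[0]):
        current: List[str] = []
        session_start = last_time = None
        for _, ts, command in account_events:
            # Close the session on a long idle gap or when it runs too long.
            if current and (ts - last_time > max_gap or ts - session_start > max_length):
                sessions.append(current)
                current = []
            if not current:
                session_start = ts
            current.append(command)
            last_time = ts
        if current:
            sessions.append(current)
    return sessions


t0 = datetime(2021, 6, 1, 9, 0)
print(sessionize([
    ("alice", t0, "Get-Mailbox"),
    ("alice", t0 + timedelta(minutes=5), "Set-Mailbox"),
    ("alice", t0 + timedelta(hours=2), "New-InboxRule"),
    ("bob", t0, "Get-Mailbox"),
]))
```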