Microsoft Sentinel Notebooks and MSTICPy
Examples of machine learning techniques in Jupyter notebooks
Author: Ian Hellen
Co-Authors: Pete Bryan, Ashwin Patil
Released: 26 Oct 2020
Notebook Setup
Please ensure that MSTICPy is installed before continuing with this notebook.
The nbinit module loads required libraries and optionally installs required packages.
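The notebook's setup cell is not reproduced here. As a minimal sketch using only the standard library, you can check whether MSTICPy is importable (and which version is installed) before running the rest of the notebook. The helper name `package_status` is hypothetical, not part of MSTICPy. If it reports "not installed", `pip install msticpy` installs the package.

```python
import importlib.util
from importlib import metadata


def package_status(name: str) -> str:
    """Hypothetical helper: report whether `name` is installed, with its version."""
    # find_spec returns None when the top-level module cannot be found
    if importlib.util.find_spec(name) is None:
        return "not installed"
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable (e.g. stdlib or a source checkout) but no distribution metadata
        return "installed (version unknown)"


print("msticpy:", package_status("msticpy"))
```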
Retrieve sample data files
Time Series Analysis
Query network data
The starting point is ingesting data to analyze.
MSTICPy contains a number of query providers that let you query and return data from several different sources.
Below we use the LocalData query provider to return data from sample files.
Data is returned as a pandas DataFrame for easy manipulation and to provide a common interface for other features in MSTICPy.
Here we get a summary of our network traffic for the time period we are interested in.
This query fetches the total number of bytes sent outbound on the network, grouped by hour.
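The query itself is not shown here. The hourly summarization it performs can be sketched in plain Python, assuming each flow record is a (timestamp, bytes) pair; that record layout and the `bytes_per_hour` name are illustrative assumptions, not the sample data's schema. In the notebook the grouping is done by the query and the result arrives as a pandas DataFrame.

```python
from collections import defaultdict
from datetime import datetime


def bytes_per_hour(flows):
    """Hypothetical helper: sum outbound bytes per hour from (timestamp, bytes) records."""
    totals = defaultdict(int)
    for ts, n_bytes in flows:
        # Truncate each timestamp to the start of its hour
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[hour] += n_bytes
    # Return the buckets in chronological order
    return dict(sorted(totals.items()))


flows = [
    (datetime(2020, 10, 26, 9, 5), 1200),
    (datetime(2020, 10, 26, 9, 40), 800),
    (datetime(2020, 10, 26, 10, 15), 5000),
]
for hour, total in bytes_per_hour(flows).items():
    print(hour.isoformat(), total)
```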
The input to the time series analysis must be in the form of:
a datetime index (in a regular interval like an hour or day)
a scalar value used to determine anomalous values based on periodicity
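An hour with no traffic may simply be missing from the summary, which breaks the regular interval that periodicity detection relies on. A minimal sketch, assuming missing hours should count as zero bytes (whether that is appropriate depends on your data), pads the series so every interval is present. `to_regular_series` is a hypothetical name.

```python
from datetime import datetime, timedelta


def to_regular_series(totals, step=timedelta(hours=1)):
    """Hypothetical helper: return (timestamp, value) pairs at a fixed step, filling gaps with 0."""
    if not totals:
        return []
    start, end = min(totals), max(totals)
    series = []
    current = start
    while current <= end:
        # Intervals with no rows are recorded as 0 rather than omitted
        series.append((current, totals.get(current, 0)))
        current += step
    return series


hourly = {
    datetime(2020, 10, 26, 9): 2000,
    datetime(2020, 10, 26, 11): 5000,
}
for ts, value in to_regular_series(hourly):
    print(ts.isoformat(), value)
```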
Using time series decomposition to detect anomalous network activity
Below we use MSTICPy's machine learning time series analysis to identify anomalies in our network traffic for further investigation.
As well as computing anomalies, we visualize the data so that we can more easily see where these anomalies occur.
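The idea behind decomposition-based anomaly detection can be sketched with a deliberately simplified additive model: a baseline (overall median) plus a per-hour-of-day seasonal profile, with whatever is left over treated as the residual. Residuals far from zero, measured in standard deviations, are flagged. This is an illustrative stand-in under our own assumptions, not MSTICPy's implementation; in particular it has no trend component, and the function name and `threshold` default are hypothetical.

```python
from statistics import median, pstdev


def seasonal_anomalies(values, period=24, threshold=3.0):
    """Hypothetical helper: indices of values that deviate strongly from the seasonal pattern.

    values: evenly spaced observations (e.g. hourly byte counts)
    period: observations per seasonal cycle (24 for a daily cycle of hourly data)
    """
    if len(values) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    baseline = median(values)
    # Seasonal profile: typical offset from the baseline at each cycle position
    seasonal = [median(values[i::period]) - baseline for i in range(period)]
    residuals = [v - baseline - seasonal[i % period] for i, v in enumerate(values)]
    spread = pstdev(residuals)
    if spread == 0:
        return []
    # Flag residuals more than `threshold` standard deviations from zero
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# One week of synthetic hourly byte counts: a business-hours bump plus one spike
hourly_bytes = [150 if 9 <= h % 24 <= 17 else 100 for h in range(24 * 7)]
hourly_bytes[24 * 3 + 2] = 5000  # 02:00 on day 4
print(seasonal_anomalies(hourly_bytes))  # prints [74]
```

Medians are used for the baseline and seasonal profile so that a single large spike does not drag the "normal" pattern towards itself and hide its own anomaly.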
View the summary events marked as anomalous
Extract the anomaly period
We can extract the start and end times of anomalous events and use this more-focused time range to query for unusual activity in this period.
Note: if more than one anomalous period is indicated, we can use the
msticpy.analysis.timeseries.extract_anomaly_periods() function to isolate time blocks around the anomalous periods.
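The underlying idea of isolating time blocks can be sketched as follows, assuming you already have a list of anomalous timestamps: widen each one by a padding window and merge any windows that overlap. `anomaly_periods` and its `pad` parameter are hypothetical; this is not the implementation or signature of MSTICPy's `extract_anomaly_periods()`. With a single anomalous period, the one (start, end) tuple returned is the time range to use in a follow-up query.

```python
from datetime import datetime, timedelta


def anomaly_periods(anomaly_times, pad=timedelta(hours=1)):
    """Hypothetical helper: merge anomalous timestamps into (start, end) periods.

    Each timestamp is widened by `pad` on both sides; overlapping or
    touching windows are merged into one period.
    """
    periods = []
    for ts in sorted(anomaly_times):
        start, end = ts - pad, ts + pad
        if periods and start <= periods[-1][1]:
            # Overlaps the previous period: extend it
            periods[-1] = (periods[-1][0], max(periods[-1][1], end))
        else:
            periods.append((start, end))
    return periods


hits = [
    datetime(2020, 10, 26, 3),
    datetime(2020, 10, 26, 4),
    datetime(2020, 10, 27, 15),
]
for start, end in anomaly_periods(hits):
    print(start.isoformat(), "->", end.isoformat())
```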
Time Series Conclusion
We would use these start and end times to zero in on the machines responsible for the anomalous traffic. Once we find them, we can use other techniques to analyze what is happening on those hosts.
Other Applications
You can use the msticpy query function MultiDataSource.get_timeseries_anomalies on most Microsoft Sentinel tables to do this summarization directly.
Three examples are shown below.
Using Clustering
- Example: aggregating similar process patterns to highlight unusual logon sessions
Sifting through thousands of events from a host is tedious in the extreme. We want to find a better way of identifying suspicious clusters of activity.
Query the data and do some initial analysis of the results
Clustering motivation
We want to find atypical commands being run and see whether they are associated with the same user or time period.
It is tedious to run repeated queries that group on different attributes of events.
Instead, we can specify the features we are interested in grouping around and use clustering, a form of unsupervised learning, to group the data.
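Clustering can be illustrated with DBSCAN, a density-based algorithm that needs no preset number of clusters and labels isolated points as noise (-1), which suits the "find the odd ones out" goal here. The standard-library version below is a minimal sketch for small inputs (it is O(n²)); it is not MSTICPy's code, and in practice you would use a library implementation such as scikit-learn's `DBSCAN`. The `eps` and `min_samples` values in the example are arbitrary.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: label each point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                # j is a core point, so its neighbours join the cluster too
                queue.extend(j_neighbours)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))  # prints [0, 0, 0, 1, 1, 1, -1]
```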
A challenge with simple grouping is that commands (command lines) may vary slightly while being essentially repetitions of the same thing (e.g. they contain dynamically generated GUIDs or other temporary data).
We can extract features of the command line rather than using it in its raw form.
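One simple way to see the problem and a fix, sketched under the assumption that GUIDs, hex values, and bare numbers are the run-specific parts worth masking: normalize each command line so that repetitions compare equal. The patterns and the `normalize_cmdline` name are illustrative; MSTICPy's own feature extraction works differently, and this only shows the idea.

```python
import re

# Run-specific substrings to mask; these patterns are illustrative assumptions
GUID_RE = re.compile(
    r"\{?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\}?"
)
HEX_RE = re.compile(r"\b0x[0-9a-f]+\b")
NUM_RE = re.compile(r"\b\d+\b")


def normalize_cmdline(cmdline: str) -> str:
    """Hypothetical helper: mask volatile values so repeated commands match."""
    text = cmdline.lower()  # lowercase first so the patterns only need a-f
    text = GUID_RE.sub("<GUID>", text)
    text = HEX_RE.sub("<HEX>", text)
    return NUM_RE.sub("<NUM>", text)


print(normalize_cmdline("TaskHost.exe /pid 4412"))
print(normalize_cmdline("taskhost.exe /pid 980"))
```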
Using clustering we can add arbitrarily many features to group on. Here we are using the following features:
Account name
Process name
Command line structure
Whether the process runs in a system session or not
Note: A downside to clustering is that text features (usually) need to be transformed from a string into a numeric representation.
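A minimal sketch of turning the four features listed above into a numeric vector, under our own assumptions: account and process names are hashed with CRC32, command line structure is the count of delimiter characters, and the system-session flag becomes 0/1. `event_features` and the delimiter set are hypothetical. Hash values have no meaningful distance ("alice" is no closer to "alicf" than to "bob"), so hashed fields only support exact-match grouping, and their magnitude (up to about 4.3 billion) swamps the small count features unless you scale them.

```python
import zlib

# Characters treated as command-line delimiters (an illustrative choice)
DELIMITERS = frozenset(' \t-/\\.,:;="')


def event_features(account, process, cmdline, is_system_session):
    """Hypothetical helper: turn one process event into a numeric feature vector."""
    return (
        # CRC32 gives identical strings identical numbers (exact-match only)
        zlib.crc32(account.lower().encode("utf-8")),
        zlib.crc32(process.lower().encode("utf-8")),
        # Command line structure: how many delimiter characters it contains
        sum(ch in DELIMITERS for ch in cmdline),
        # System session flag as 0/1
        int(is_system_session),
    )


a = event_features("alice", "net.exe", "net user admin /add", False)
b = event_features("alice", "net.exe", "net user guest /add", False)
print(a == b)  # True: same account, process, and command structure
```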
Clustering conclusion
We have narrowed the task of sifting through more than 20,000 processes down to a few dozen, grouped into sessions and ordered by the relative rarity of their process patterns.
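Ordering by relative rarity can be as simple as sorting clusters by how many events they contain. A minimal sketch, assuming DBSCAN-style labels where -1 means noise; noise is kept here because unclustered events are often the most interesting. `clusters_by_rarity` is a hypothetical name.

```python
from collections import Counter


def clusters_by_rarity(labels):
    """Hypothetical helper: (cluster_id, size) pairs, smallest (rarest) clusters first."""
    counts = Counter(labels)
    # Sort by size, then by id so ties are ordered deterministically
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


labels = [0, 0, 0, 0, 1, 1, 2, -1]
print(clusters_by_rarity(labels))  # prints [(-1, 1), (2, 1), (1, 2), (0, 4)]
```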
Other Applications
You can use this technique on other datasets where you want to group by multiple features of the data.
The caveat is that you need to transform any non-numeric data field into a numeric form.
msticpy has a few built-in functions to help with this:
You can use a combination of these and other functions on the same fields to measure different aspects of the data. For example, the following takes a hash of the browser version of the UA (user agent) string and a structural count of the delimiters used.
Use the ua_pref_hash and ua_delims features to cluster on identical browser versions that have the same UA string.
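The text names the two features but not exactly how they are derived. A minimal sketch under our own assumptions: the browser name/version tokens are taken as the text after the last ')' and hashed with CRC32, and "delimiters" means the characters in `UA_DELIMITERS`. Only the names ua_pref_hash and ua_delims come from the notebook; `ua_features` and the delimiter set are hypothetical.

```python
import zlib

UA_DELIMITERS = frozenset(" /;(),.")  # illustrative delimiter set


def ua_features(user_agent: str) -> tuple:
    """Hypothetical helper: return (ua_pref_hash, ua_delims) for a user-agent string.

    Assumes the browser name/version tokens follow the last ')'.
    """
    browser_part = user_agent.rsplit(")", 1)[-1].strip()
    ua_pref_hash = zlib.crc32(browser_part.encode("utf-8"))
    # Structural fingerprint: total count of delimiter characters
    ua_delims = sum(ch in UA_DELIMITERS for ch in user_agent)
    return ua_pref_hash, ua_delims


ua_86 = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
         "(KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36")
ua_85 = ua_86.replace("Chrome/86", "Chrome/85")
print(ua_features(ua_86))
print(ua_features(ua_85))
```

The two strings share a structure (same delimiter count) but differ in browser version, so they land in the same ua_delims bucket with different ua_pref_hash values.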
Detecting anomalous sequences using Markov chains
The MSTICPy anomalous_sequence subpackage uses Markov chain analysis to predict the probability that a particular sequence of events will occur, given what has happened in the past.
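The core of the technique can be sketched as a first-order Markov chain over command names: count transitions between consecutive commands in past sessions, then score a new session as the product of its transition probabilities. This is a simplified illustration under our own assumptions (add-one smoothing, start/end sentinel tokens, command names only); MSTICPy's model also covers parameters and parameter values, and its internals are not reproduced here. The function names and example cmdlet names are illustrative.

```python
from collections import Counter, defaultdict

START, END = "##START##", "##END##"  # sentinel tokens marking session edges


def train_transitions(sessions):
    """Hypothetical helper: count command-to-command transitions across sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        tokens = [START, *session, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def session_likelihood(counts, session, n_states, alpha=1.0):
    """Probability of a session under a first-order Markov chain.

    Add-alpha (Laplace) smoothing gives unseen transitions a small,
    non-zero probability; n_states is the number of possible next tokens.
    """
    tokens = [START, *session, END]
    likelihood = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        row = counts.get(prev, Counter())
        likelihood *= (row[nxt] + alpha) / (sum(row.values()) + alpha * n_states)
    return likelihood


history = [["Get-Mailbox", "Set-Mailbox"]] * 20
model = train_transitions(history)
n_states = 4  # Get-Mailbox, Set-Mailbox, Add-RoleGroupMember, END
usual = session_likelihood(model, ["Get-Mailbox", "Set-Mailbox"], n_states)
rare = session_likelihood(model, ["Add-RoleGroupMember", "Set-Mailbox"], n_states)
print(f"usual={usual:.3f} rare={rare:.2e}")
```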
Here we're applying it to Office activity.
Query the data
Perform Anomalous Sequence analysis on the data
The analysis groups events into sessions (time-bounded and linked by a common account). It then
builds a probability model for the types of command (e.g. "SetMailboxProperty")
and for the parameters and parameter values used with each command.
In other words: how likely is it that a given user would run this sequence of commands in a logon session?
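Grouping events into sessions that are time-bounded and linked by a common account could look like the sketch below. The 30-minute inactivity gap, the 2-hour maximum session length, and the (account, timestamp, command) tuple layout are assumptions for illustration; `sessionize` is a hypothetical name, and MSTICPy's own sessionizing options may differ.

```python
from datetime import datetime, timedelta
from itertools import groupby


def sessionize(events, max_gap=timedelta(minutes=30), max_duration=timedelta(hours=2)):
    """Hypothetical helper: split (account, timestamp, command) events into sessions.

    A new session starts when the gap since the previous event exceeds
    max_gap, or the session would run longer than max_duration.
    """
    sessions = []
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    for account, group in groupby(ordered, key=lambda e: e[0]):
        current, start_ts, last_ts = [], None, None
        for _, ts, command in group:
            if current and (ts - last_ts > max_gap or ts - start_ts > max_duration):
                sessions.append((account, current))
                current = []
            if not current:
                start_ts = ts  # first event of a new session
            current.append(command)
            last_ts = ts
        sessions.append((account, current))
    return sessions


events = [
    ("alice", datetime(2020, 10, 26, 9, 0), "Get-Mailbox"),
    ("alice", datetime(2020, 10, 26, 9, 5), "Set-Mailbox"),
    ("alice", datetime(2020, 10, 26, 14, 0), "Add-RoleGroupMember"),
    ("bob", datetime(2020, 10, 26, 9, 1), "Get-Mailbox"),
]
for account, commands in sessionize(events):
    print(account, commands)
```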
Using this probability model, we can highlight sequences that have an extremely low probability based on prior behaviour.
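One way to highlight the lowest-probability sequences, sketched under stated assumptions: score each session by its least likely sliding window of consecutive commands, so that one rare step in a long session is not diluted by its ordinary neighbours, then sort ascending. The transition probabilities below are toy values, and the windowing scheme and `rarest_window` name are our illustration rather than a description of MSTICPy's scoring.

```python
from math import prod


def rarest_window(session, trans_prob, window=3, unseen=1e-6):
    """Hypothetical helper: likelihood of the least likely run of `window` commands.

    trans_prob maps (previous, next) command pairs to probabilities;
    pairs missing from it get the small `unseen` probability.
    """
    steps = [trans_prob.get(pair, unseen) for pair in zip(session, session[1:])]
    if not steps:
        return 1.0
    # A window of n commands spans n - 1 transitions
    width = max(1, min(window - 1, len(steps)))
    return min(prod(steps[i:i + width]) for i in range(len(steps) - width + 1))


trans_prob = {  # toy values for illustration only
    ("Get-Mailbox", "Set-Mailbox"): 0.6,
    ("Set-Mailbox", "Get-Mailbox"): 0.5,
    ("Get-Mailbox", "Add-RoleGroupMember"): 0.01,
    ("Add-RoleGroupMember", "Set-Mailbox"): 0.1,
}
sessions = {
    "s1": ["Get-Mailbox", "Set-Mailbox", "Get-Mailbox"],
    "s2": ["Get-Mailbox", "Add-RoleGroupMember", "Set-Mailbox", "Get-Mailbox"],
}
scores = {name: rarest_window(cmds, trans_prob) for name, cmds in sessions.items()}
# Rarest sessions first: these are the ones to review
for name in sorted(scores, key=scores.get):
    print(name, f"{scores[name]:.4f}")
```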
The events are shown in descending order of likelihood (vertically), so the events at the bottom of the chart are the ones most interesting to us.
Looking at these rare events, we can see potentially suspicious activity changing role memberships.
Print the content of the selected events/commands in a more readable format
Note: for many events, the output will be long.
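If the selected events are available as dictionaries (for example from pandas `DataFrame.to_dict(orient="records")`), a minimal sketch for a more readable dump is to render each as indented JSON. The `format_event` name and the field names and values in the example event are hypothetical.

```python
import json
from datetime import datetime


def format_event(event: dict) -> str:
    """Hypothetical helper: render one event as indented, key-sorted JSON."""
    # default=str converts values json cannot serialize (e.g. datetimes)
    return json.dumps(event, indent=2, sort_keys=True, default=str)


selected = [
    {
        "TimeGenerated": datetime(2020, 10, 26, 14, 0),
        "UserId": "alice@contoso.com",
        "Operation": "Add-RoleGroupMember",
        "Parameters": [{"Name": "Identity", "Value": "Organization Management"}],
    },
]
for event in selected:
    print(format_event(event))
```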
Resources
MSTICpy:
msticpy GitHub https://github.com/Microsoft/msticpy
msticpy Docs https://msticpy.readthedocs.io/en/latest/
msticpy Release Blog https://medium.com/@msticmed
MSTICpy maintainers:
Ian Hellen @ianhellen
Pete Bryan @MSSPete
Ashwin Patil @ashwinpatil
Microsoft Sentinel Notebooks:
Microsoft Sentinel GitHub Notebooks https://github.com/Azure/Azure-Sentinel-Notebooks/
(samples with data are in the Sample-Notebooks folder)
Microsoft Sentinel Tech Community Blogs https://aka.ms/AzureSentinelBlog