An introduction to Cybersec notebook features
Contents
Introduction
Setting up the notebook environment
Querying data from Microsoft Sentinel
Visualizing data
Enriching data
Analyzing data
Using Pivot functions
Appendices
Additional resources
A brief introduction to pandas DataFrames
Introduction
This notebook takes you through some of the features of Microsoft Sentinel Notebooks and MSTICPy.
If you are new to notebooks, we strongly recommend starting with A Getting Started Guide For Microsoft Sentinel ML Notebooks.
After you've finished running this notebook, we also recommend:
Configuring your environment - this covers all of the configuration options for accessing external cybersec resources
Each topic includes a 'Learn more' section with resources for a deeper dive into that topic. We encourage you to work through the notebook from start to finish.
- This notebook assumes that you are running it in an Azure Notebooks environment.
- This notebook uses SigninLogs from your Microsoft Sentinel workspace. If you are not yet collecting SigninLogs, configure this connector in the Microsoft Sentinel portal before running this notebook.
- This notebook uses the following components and assumes that you have configuration
set up for them as described in the **A Getting Started Guide For Microsoft Sentinel ML** notebook:
- The VirusTotal Threat intelligence provider
- The Maxmind GeoLite2 geolocation provider.
Note: Please run the code cells in sequence. Skipping cells will result in errors.
Setting up the notebook environment
MSTICPy initialization
This cell installs/updates and initializes the MSTICPy package. It should complete without errors.
If you see errors or warnings about missing configuration, please return to the A Getting Started Guide For Microsoft Sentinel ML notebook or the Configuring your environment notebook to correct this.
Querying Data from Microsoft Sentinel
We will use MSTICPy's QueryProvider() class to query data.
Some of the next section is a review of material contained in the A Getting Started Guide For Microsoft Sentinel ML notebook.
The QueryProvider class
The query provider class has one main function:
querying data from a data source to make it available to view and analyze in the notebook.
Query results are always returned as pandas DataFrames. If you are new to pandas, look at the Introduction to Pandas section at the end of this notebook.
Learn more:
More details on configuring and using QueryProviders can be found in the MSTICPy Documentation.
Choose whether to use demonstration data or live Microsoft Sentinel data
You can use this notebook with either live data queried from Microsoft Sentinel or with sample data downloaded from the Azure-Sentinel-Notebooks GitHub.
Run the following cell and use the option buttons to select which of these you want to use. The option buttons use a timeout. After 15 seconds the default of "Demo data" will be automatically selected.
Doing this will re-initialize the data providers correctly.
Most of the code in the cell below handles download of demo data.
1. The demo data is still downloaded even if you choose Microsoft Sentinel (although it is
cached after the first download). The demo data
is used as a backup if the queries to the Microsoft Sentinel workspace return
no data.
2. If you see a warning "Runtime dependency of PyGObject is missing" when loading the
Microsoft Sentinel driver please see the FAQ section at the end of the
A Getting Started Guide For Microsoft Sentinel ML Notebooks notebook.
Microsoft Sentinel data schema
Now that we have connected we can query Microsoft Sentinel for data.
Before we do that there are a couple of things that help us understand what data is available to query.
The AzureSentinel QueryProvider has a "schema_tables" property that lets us get a list of tables as well as the schema (column names and data types) for each table.
After that we'll look at the queries available.
Note: For local data this will just appear as a list of files.
MSTICPy Query browser
MSTICPy includes a number of built in queries. Most require additional parameters such as the time range and often an identifying parameter such as the host name, account name or IP address that you are querying for.
You can also list available queries from Python code with:
Get specific details about a query by calling it with "?" as a parameter:
Query browser
The query browser combines both of these functions in a scrollable and filterable list.
Most queries require time parameters!
Datetime strings are painful to type in and keep track of.
Fortunately MSTICPy has an easier way to specify time parameters for queries:
you can use the built-in query_time widget to set the default time range for queries
alternatively, you can use the MSTICPy nbwidgets.QueryTime class to set a custom time range and pass it as a parameter.
Example of using standalone nbwidgets.QueryTime instance
Customizable queries
Most built-in queries support the "add_query_items" parameter. You can use this to append additional filters or other operations to the built-in queries.
1. For local data this query is emulated.
2. If using Microsoft Sentinel and you have no alerts for this period, no data will display.
Try extending the time range from the default of 2 to a larger number of days
in the code below.
E.g.
start=datetime.utcnow() - timedelta(20),
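As a rough sketch of how a wider time range and "add_query_items" fit together, the snippet below builds the keyword arguments for a built-in query. The 20-day window and the summarize clause are illustrative choices, and the commented-out call assumes a connected provider named qry_prov; use the query browser to confirm the query name in your environment.

```python
from datetime import datetime, timedelta, timezone

# Query window: the last 20 days, ending now (UTC).
end = datetime.now(timezone.utc)
start = end - timedelta(days=20)

# KQL appended to the end of the built-in query (illustrative clause).
extra_kql = "| summarize NumAlerts=count() by AlertName"

query_params = {
    "start": start,
    "end": end,
    "add_query_items": extra_kql,
}

# With a connected provider (assumed here to be `qry_prov`), the call
# would look something like:
# alerts_df = qry_prov.SecurityAlert.list_alerts(**query_params)
print(query_params["start"].isoformat(), "->", query_params["end"].isoformat())
```

Keeping the parameters in a dictionary makes it easy to rerun the same query with a different window by changing only the start value.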
Custom queries
Another way to run queries is to pass a full KQL query string to the query provider.
This will run the query against the workspace connected to above, and will return the data in a Pandas DataFrame. We will look at working with Pandas in a bit more detail later.
Note: exec_query is not supported for local data.
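For illustration, a custom query string might be assembled as below. This is a minimal sketch: the look-back window, row limit and projected columns are assumptions chosen for the example, and the commented-out call assumes a connected provider named qry_prov.

```python
# Build a KQL query string against the SigninLogs table.
days_back = 7   # illustrative look-back window
max_rows = 10   # illustrative row limit

kql_query = f"""
SigninLogs
| where TimeGenerated > ago({days_back}d)
| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName
| take {max_rows}
""".strip()

print(kql_query)

# With a connected provider (assumed here to be `qry_prov`):
# signin_df = qry_prov.exec_query(kql_query)
```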
Learn more:
You can learn more about the MSTICPy pre-defined queries in the MSTICPy Documentation.
Visualizing data
1 - Using pandas and matplotlib
Visualizing data is an excellent way to analyze it and to identify patterns and anomalies.
Python has a wide range of data visualization packages, each with its own benefits and drawbacks. We will look at some basic capabilities as well as one of the visualizations in MSTICPy.
Basic Graphs
Pandas and Matplotlib provide the simplest way to produce basic plots of data:
2 - MSTICPy Event Timeline
Much like the built-in pandas "plot" function, MSTICPy adds an event timeline plotting function to DataFrames.
Using the mp_plot.timeline() method on a DataFrame, you can visualize the relative timing of events much more easily than from a data table.
Unlike the previous Matplotlib charts, the Event Timeline uses Bokeh plots making it interactive.
Using the toolbar buttons (to the left of the chart)
Pan from left to right (select the arrows) by dragging with the mouse
Zoom in on a selected area (magnifier tool) and draw a selection box with the mouse
Zoom with the mouse wheel (mouse + magnifier tool)
Show or hide details about the individual events as you hover the mouse cursor over them
Note: you may see data for multiple events if more than one event is overlaid
You can also use the Range Tool (the small graphic beneath the main timeline)
Drag the selection area to left or right
Grab the left or right edge of the selection area to change the selection size.
1. Most Microsoft Sentinel data uses the common "TimeGenerated" timestamp column.
If your data uses a different timestamp column, specify this using the time_column parameter of the mp_plot.timeline() function. E.g.
df.mp_plot.timeline(time_column="EventStartTimeUTC", ...)
2. If there are a lot of logons in your query result the timeline may appear
to be a bar rather than individual events. You can use one of the zoom tools
described above to zoom in on individual events.
from msticpy.vis.timeline import display_timeline, display_timeline_values
from msticpy.vis.timeline_duration import display_timeline_duration
display_timeline(data, ...[other params])
display_timeline - shows events as discrete diamonds
display_timeline_values - lets you display scalar values for each event
display_timeline_duration - shows bars of start/end of activity for a group of events
Use the group_by parameter to partition the data
Learn more:
The Infosec Jupyterbook includes a section on data visualization.
Enriching data
Now that we have seen how to query for data and do some basic manipulation, we can look at enriching this data with additional data sources.
For this we are going to use an external threat intelligence provider to give us some more details about an IP address in our dataset using the MSTICPy TIProvider feature.
To learn more about adding TI sources, see the TI Provider setup in the A Getting Started Guide For Microsoft Sentinel ML Notebooks notebook.
Learn more:
MSTICPy includes further threat intelligence capabilities as well as other data enrichment options. More details on these can be found in the documentation.
Analyzing data
With the data we have collected we may wish to perform some analysis on it in order to better understand it.
MSTICPy includes a number of features to help with this, and there is a vast array of other data analysis capabilities available via Python, ranging from simple processes to complex ML models.
We will start simply and look at how we can decode some obfuscated command lines, so that we understand their content.
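To make the idea concrete, here is a minimal standard-library sketch of decoding Base64 segments in a command line. It is not MSTICPy's decoder: the helper name decode_b64_segments, the 20-character threshold and the UTF-16LE default (the encoding PowerShell uses for -EncodedCommand) are assumptions made for this example.

```python
import base64
import re

# Matches runs of 20+ Base64 characters with optional padding.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")


def decode_b64_segments(command_line: str, encoding: str = "utf-16-le") -> str:
    """Replace each Base64-looking segment with its decoded text."""

    def _decode(match):
        candidate = match.group(0)
        try:
            raw = base64.b64decode(candidate, validate=True)
            return raw.decode(encoding)
        except ValueError:
            # Not valid Base64, or not text in this encoding: leave as-is.
            return candidate

    return B64_PATTERN.sub(_decode, command_line)


# Build a sample command line so the example is reproducible.
payload = base64.b64encode("Write-Host hello".encode("utf-16-le")).decode("ascii")
sample = f"powershell.exe -EncodedCommand {payload}"

print(decode_b64_segments(sample))
```

Real command lines often nest several layers of encoding or compression, so a single pass like this is only a starting point.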
We can also use MSTICPy to extract Indicators of Compromise (IoCs) from a dataset.
The IoCExtract class makes it easy to extract and match on a set of IoCs within our data.
In the example below we take a US Cybersecurity & Infrastructure Security Agency (CISA) report and extract all domains listed in the report.
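The general technique behind this kind of extraction is pattern matching over free text. The sketch below pulls domains out of a made-up report snippet with a simplified regular expression. It does not reproduce the IoCExtract class or its patterns; the function name extract_domains, the pattern and the sample text are all assumptions for illustration.

```python
import re
from typing import List

# Simplified domain pattern: dot-separated labels ending in an alphabetic TLD.
DOMAIN_PATTERN = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def extract_domains(text: str) -> List[str]:
    """Return the unique domains found in text, lower-cased and sorted."""
    # Reports often "defang" indicators, e.g. example[.]com.
    refanged = text.replace("[.]", ".")
    return sorted({match.lower() for match in DOMAIN_PATTERN.findall(refanged)})


report_text = """
The actor used login.example[.]com for credential phishing and
staged payloads on cdn.example.net. Traffic to Example.org was benign.
"""

print(extract_domains(report_text))
```

A pattern this loose will also match things like file names with alphabetic extensions, so production extractors typically validate the top-level domain as well.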
Learn more:
There are a wide range of options when it comes to data analysis in notebooks using Python. Here are some useful resources to get you started:
Scikit-Learn is a popular Python ML data analysis library, which has a useful tutorial
Pivot Functions
Pivot functions use the concept of Cyber Entities to group MSTICPy functionality logically.
An entity is something like an Account, IP Address or Host, and has one or more identifying properties.
Pivot functions are methods of Entities that provide quick access to:
data queries related to an entity
enrichment functions relevant to that entity
Pivot functions are dynamically attached to entities - so we need to load the Pivot library to initialize this
Motivations for Pivot functions
We had built a lot of functionality in MSTICPy for querying and enrichment
A lot of the functions had inconsistent type/parameter signatures
There was no easy discovery mechanism for these functions - you had to know!
Using entities as pivot points is a "natural" investigation pattern
1. You may see a warning/error about not being able to load the IPStack geo-ip provider. You can safely ignore this.
2. From MSTICPy v2.0.0 you do not need the "from msticpy.datamodel.entities import *" since these are imported in "init_notebook".
Use the pivot browser to see what functions are available for different entities.
The Help drop-down panels show you more detail about the selected function.
1. If you are using Local data (rather than data from Microsoft Sentinel) you will see fewer entities and pivot functions in the browser. This is because a lot of the pivot functions are data queries and the local data provider that we are using only has a limited number of queries defined.
2. The function-specific help shows the parameters and usage for the original function
that is wrapped by the Pivot interface. Use the parameter guidance in the generic help when calling pivot functions.
You can pass a single value to a pivot function.
The result is returned as a pandas DataFrame.
Here are five examples with the output shown below.
You can also pass a DataFrame as a parameter. You also need to provide the column name that contains the data that you want to process.
When using a DataFrame as input, you can also join the output data to the input data.
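The pattern of passing a DataFrame plus a column name, and optionally joining the output back to the input, can be sketched with plain pandas. This is not a pivot function: enrich and ip_type are hypothetical stand-ins, and the parameter names are chosen for illustration only.

```python
import ipaddress

import pandas as pd


def ip_type(ip: str) -> str:
    """Stand-in enrichment: classify an address as Private or Public."""
    return "Private" if ipaddress.ip_address(ip).is_private else "Public"


def enrich(data: pd.DataFrame, column: str, join: str = "none") -> pd.DataFrame:
    """Run ip_type over one column; optionally join the result to the input."""
    values = data[column].drop_duplicates()
    result = pd.DataFrame({column: values, "ip_type": values.map(ip_type)})
    if join == "none":
        return result.reset_index(drop=True)
    # Join the enrichment output back onto the input rows.
    return data.merge(result, on=column, how=join)


logons = pd.DataFrame(
    {
        "Account": ["alice", "bob", "alice"],
        "IPAddress": ["10.0.0.5", "8.8.8.8", "10.0.0.5"],
    }
)

print(enrich(logons, column="IPAddress"))               # enrichment output only
print(enrich(logons, column="IPAddress", join="left"))  # joined to the input rows
```

Joining keeps the original context (here, the Account column) next to the enrichment result, which is usually what you want when triaging.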
And because pivot functions always return DataFrames, you can easily use the output as input to MSTICPy functions.
The first example shows sending the results from the WhoIs pivot function to a timeline plot.
The second example shows using the tilookup_url Url pivot function to check
Threat intelligence reports for a URL and using the output as input to the TIBrowser
Conclusion
This notebook has shown some basic components of MSTICPy and how to use them in notebooks for Microsoft Sentinel for security investigations.
There are many more things possible using notebooks. We strongly encourage you to read the material referenced in the "Learn More" sections in this notebook.
You can also explore the other Microsoft Sentinel notebooks in order to take advantage of the pre-built hunting logic, and understand other analysis techniques that are possible.
Appendices
Further resources
Introduction to Pandas
If you are working with data in the notebook a lot, you will want to learn about pandas. Our query results are returned in the form of a pandas DataFrame. DataFrames are pivotal to Microsoft Sentinel notebooks and MSTICPy and are used for both input and output formats.
Pandas DataFrames are incredibly versatile data structures with a lot of useful features. You might think of them as programmable Excel worksheets.
We will cover a small number of them here and we recommend that you check out the Learn more section to learn more about Pandas features.
Displaying a DataFrame:
The first thing we want to do is display our DataFrame. If the DataFrame is the last item in a code cell, Jupyter will automatically display it without using the `display()` function.
However, if you want to display a DataFrame in the middle of other code in a cell, you must use the `display()` function explicitly, e.g. display(df).
You may not want to display the whole DataFrame and instead display only a subset of items.
There are numerous ways to do this and the cell below shows some of the most widely used functions.
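As an illustration, the snippet below shows a few common ways to view part of a DataFrame. The sample rows are made up for this example; in the notebook you would apply the same calls to your query results.

```python
import pandas as pd

# Small made-up dataset standing in for query results.
data = pd.DataFrame(
    {
        "TimeGenerated": pd.to_datetime(
            ["2024-01-01 08:00", "2024-01-01 09:30", "2024-01-02 11:15", "2024-01-03 16:45"]
        ),
        "TargetUserName": ["MSTICAdmin", "adm1nistrator", "MSTICAdmin", "ian"],
        "LogonType": [3, 10, 3, 2],
    }
)

print(data.head(2))                           # first two rows
print(data.tail(1))                           # last row
print(data[["TargetUserName", "LogonType"]])  # selected columns only
print(data.iloc[1:3])                         # rows by position (rows 1 and 2)
print(data.shape)                             # (number of rows, number of columns)
```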
We can also choose to select a subset of our DataFrame by filtering the contents of the DataFrame.
data[<boolean expression>] returns all rows in the DataFrame where the boolean expression is True.
In the first example we are telling pandas to return all rows where the column value of TargetUserName matches 'MSTICAdmin'.
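A short sketch of this kind of filtering, using made-up rows (only the TargetUserName column and the 'MSTICAdmin' value come from the text above; the rest is assumed for the example):

```python
import pandas as pd

data = pd.DataFrame(
    {
        "TargetUserName": ["MSTICAdmin", "ian", "MSTICAdmin", "svc_backup"],
        "LogonType": [3, 2, 10, 5],
    }
)

# The comparison produces a True/False Series, one value per row.
is_admin = data["TargetUserName"] == "MSTICAdmin"
print(data[is_admin])

# Combine conditions with & (and) or | (or); wrap each one in parentheses.
print(data[is_admin & (data["LogonType"] == 10)])

# String matching and membership tests also return boolean Series.
print(data[data["TargetUserName"].str.contains("admin", case=False)])
print(data[data["LogonType"].isin([2, 10])])
```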
Grouping and calculating aggregate totals on the groups is done using the groupby function.
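For example, with a small made-up dataset, grouping by account and computing aggregates might look like this (the output column names are arbitrary choices for the sketch):

```python
import pandas as pd

data = pd.DataFrame(
    {
        "TargetUserName": ["MSTICAdmin", "ian", "MSTICAdmin", "ian", "MSTICAdmin"],
        "LogonType": [3, 2, 10, 2, 3],
    }
)

# Number of rows per account.
logons_per_user = data.groupby("TargetUserName").size()
print(logons_per_user)

# Several aggregates at once, using named aggregation.
summary = data.groupby("TargetUserName").agg(
    logon_count=("LogonType", "count"),
    distinct_logon_types=("LogonType", "nunique"),
)
print(summary)
```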
Our DataFrame can also be extended to add new columns with additional data if required. The new column data can be static or calculated, as shown in these examples.
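A minimal sketch of both kinds of column, using made-up rows. The column names Source, FullName and IsRemote are assumptions for this example; logon type 10 is the Windows RemoteInteractive logon type.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "TargetUserName": ["MSTICAdmin", "ian"],
        "TargetDomainName": ["CONTOSO", "CONTOSO"],
        "LogonType": [3, 10],
    }
)

# Static value: every row gets the same value.
data["Source"] = "demo"

# Calculated from other columns (string concatenation, row by row).
data["FullName"] = data["TargetDomainName"] + "\\" + data["TargetUserName"]

# Calculated from a comparison: True where the logon was RemoteInteractive.
data["IsRemote"] = data["LogonType"] == 10

print(data)
```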
Learn more:
There is a lot more you can do with pandas. The links below provide some useful resources: