GitHub Repository: Azure/Azure-Sentinel-Notebooks
Path: blob/master/tutorials-and-examples/training-notebooks/Training - MSTICPy Training 3 - 2022-01-13.ipynb
Kernel: Python 3.8 - AzureML

MSTICPy - Intermediate/Advanced Use

Notebooks and Microsoft Sentinel Training #3

msticpy is a library for InfoSec investigation and hunting in Jupyter Notebooks. It includes functionality to:

  • query log data from multiple sources

  • enrich the data with Threat Intelligence, geolocations and Azure resource data

  • extract Indicators of Activity (IoA) from logs and unpack encoded data

  • perform sophisticated analysis such as anomalous session detection and time series decomposition

  • visualize data using interactive timelines, process trees and multi-dimensional Morph Charts

It also includes some time-saving notebook tools such as widgets to set query time boundaries, select and display items from lists, and configure the notebook environment.


Contents

1. Introduction

  • 1.1 Background

  • 1.2 What's new

  • 1.3 Extras - installing optional dependencies

2. Data Queries

  • 2.1 Recap

  • 2.2 Parameters

  • 2.3 Query time ranges

  • 2.4 Querying other data sources - Microsoft Defender 365

  • 2.5 Creating/saving your own queries

3. Incident Triage

4. Enriching data with Threat Intelligence (and others)

  • 4.1 Introduction to Pivot functions

  • 4.2 Pivot on individual values

  • 4.3 Pivot from DataFrames

  • 4.4 Joining input to your output

  • 4.5 Pivoting with RiskIQ

5. Visualization

  • 5.1 Timelines and timeline values

  • 5.2 Matrix plots for large data sets

  • 5.3 Process Trees

  • 5.4 Time series for temporal pattern anomalies

6. Extras

  • 6.1 Open Threat Research Security Data sets

7. Conclusion and Resources


1. Introduction

1.1 Background

1.2 What's new

Single sign-on with Managed Identities

  • Your sign-in credentials from Azure Machine Learning are used automatically for MS Sentinel

  • You can override this (in the query provider) with mp_az_auth=False (see later)

The MSTICPy Configuration tool now works in non-geological timescales in Azure Machine Learning!

import msticpy
msticpy.init_notebook(globals())

mpconf = msticpy.MpConfigEdit()
mpconf.set_tab("TI Providers")
mpconf

1.3 Extras - installing optional dependencies

Notes:
1. This doesn't affect the MSTICPy code that's installed - only the dependencies
2. Often, you won't need this unless you want the specific *extra* functionality
3. Use %pip within the notebook, not !pip.

MSTICPy is a library with a broad range of functionality and a lot of dependencies. As such, installing all of the dependencies can take a lot of time.

MSTICPy has implemented a series of Extras that allow for subsets of these dependencies. These Extras are grouped around core technologies that you might want to use with MSTICPy.

Extra       Functionality
--none--    Most functionality (approx 75%): Kqlmagic Jupyter basic
keyvault    Key Vault and keyring storage of settings secrets
azure       Azure API data retrieval, Azure storage APIs, Sentinel APIs
kql         Kqlmagic Jupyter extended functionality
azsentinel  Combination of core install + "azure", "keyvault", "kql"
ml          Timeseries analysis, Event clustering, Outlier analysis
splunk      Splunk data queries
vt3         VirusTotal V3 graph API
riskiq      RiskIQ Illuminate threat intel provider & pivot functions
all         Includes all of above packages
dev         Development tools plus "base"
test        "dev" plus "all"

To install a specific Extra, use the following syntax: %pip install msticpy[extra]

You can also install multiple extras at once: %pip install msticpy[extra1,extra2,...]

%pip install --upgrade msticpy[vt3,riskiq]

If you see this kind of exception, install the extra it mentions:

from pathlib import Path
from IPython.display import Image

fname = "/images/extra_exception.png"
gh_path = "https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master"
# use the local copy if present, otherwise fall back to the GitHub copy
img_path = f"..{fname}" if Path(f"..{fname}").is_file() else f"{gh_path}{fname}"
Image(img_path, width=1000)
Image in a Jupyter notebook

2. Data Queries

2.1 Recap

In the last training session we covered:

  • Authenticating to Microsoft Sentinel

  • Browsing and listing queries

  • Running queries

qry_prov = QueryProvider("AzureSentinel")
ws_config = WorkspaceConfig(workspace="CyberSecuritySOC")
Please wait. Loading Kqlmagic extension...done

New and Improved!

Once set up, we can tell the QueryProvider to connect, which kicks off the authentication process.

Old way

!az login
qry_prov.connect(ws_config, mp_az_auth="cli")

Integrated auth with MSI

qry_prov.connect(ws_config)
Connecting...
connected
qry_prov.browse()
qry_prov.WindowsSecurity.list_host_logons("?")
Query: list_host_logons
Data source: AzureSentinel
Retrieves the logon events on the host

Parameters
----------
add_query_items: str (optional)
    Additional query clauses
end: datetime
    Query end time
event_filter: str (optional)
    Event subset (default value is: | where EventID == 4624)
host_name: str
    Name of host
query_project: str (optional)
    Column project statement (default value is: | project TenantId, Account, EventID, TimeGenerat...)
start: datetime
    Query start time
subscription_filter: str (optional)
    Optional subscription/tenant filter expression (default value is: true)
table: str (optional)
    Table name (default value is: SecurityEvent)

Query:
{table} {event_filter} {query_project}
| where {subscription_filter}
| where Computer has "{host_name}"
| where TimeGenerated >= datetime({start})
| where TimeGenerated <= datetime({end})
{add_query_items}

2.2 Query parameters

logons_df = qry_prov.WindowsSecurity.list_host_logons(host_name="WORKSTATION6")
print(f"Total records: {len(logons_df)} - First record {logons_df.TimeGenerated.min()}, Last record {logons_df.TimeGenerated.max()}")
logons_df.head(5)
Total records: 866 - First record 2022-01-12 04:05:20.217000+00:00, Last record 2022-01-12 17:25:47.887000+00:00

Where did the start and end parameters come from?

2.3 Query Times

qry_prov.query_time
logons_df = qry_prov.WindowsSecurity.list_host_logons(host_name="WORKSTATION6")
print(f"Total records: {len(logons_df)} - First record {logons_df.TimeGenerated.min()}, Last record {logons_df.TimeGenerated.max()}")
logons_df.head(5)
Total records: 562 - First record 2022-01-11 21:06:59.207000+00:00, Last record 2022-01-12 05:02:09.813000+00:00

You can set the start and end parameters manually, using:

  • Python datetimes

  • datetime strings

  • integers/floats (days, relative to now)
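As an illustration of how these three forms can map to concrete datetimes, here is a minimal sketch of such a normalization helper. This is a hypothetical stand-in for what the query parameter substitution does internally, not msticpy's actual implementation:

```python
from datetime import datetime, timedelta, timezone

def to_query_time(value) -> datetime:
    """Normalize a start/end parameter to a timezone-aware datetime.

    Accepts a datetime, an ISO-format string, or an int/float
    interpreted as days relative to now (negative = in the past).
    """
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        # ISO 8601 strings such as "2022-01-11 16:32:05.323000+00:00"
        return datetime.fromisoformat(value)
    if isinstance(value, (int, float)):
        # days relative to now, so -1 means 24 hours ago
        return datetime.now(timezone.utc) + timedelta(days=value)
    raise TypeError(f"Unsupported time parameter: {value!r}")
```

With this, `start="2022-01-11 16:32:05.323000+00:00"` parses to a fixed datetime, while `end=-1` resolves to 24 hours before the current time.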

logons_df = qry_prov.WindowsSecurity.list_host_logons(
    host_name="WORKSTATION6",
    start="2022-01-11 16:32:05.323000+00:00",
    end=-1,
)
print(f"Total records: {len(logons_df)} - First record {logons_df.TimeGenerated.min()}, Last record {logons_df.TimeGenerated.max()}")
logons_df.head(5)
Total records: 817 - First record 2022-01-11 16:32:05.323000+00:00, Last record 2022-01-12 04:07:09.470000+00:00

2.4 Querying other data sources - Microsoft Defender 365

qry_m365 = QueryProvider("MDE")  # Use "M365D" for MS Defender APIs
qry_m365.connect()
Connected.
list(filter(lambda x: "host" in x.lower(), qry_m365.list_queries()))
['MDATP.host_alerts', 'MDATP.host_connections', 'MDATP.list_host_processes']
qry_m365.list_queries("host") # Not yet published!
['MDATP.ip_alerts', 'MDATP.ip_connections']
qry_m365.MDATP.host_connections("?")
Query: host_connections
Data source: MDE
Lists connections for a specified hostname

Parameters
----------
add_query_items: str (optional)
    Additional query clauses
end: datetime
    Query end time
host_name: str
    Name of host
    Aliases: 'hostname'
start: datetime
    Query start time
table: str (optional)
    Table name (default value is: DeviceNetworkEvents)

Query:
{table}
| where Timestamp >= datetime({start})
| where Timestamp <= datetime({end})
| where DeviceName has "{host_name}"
{add_query_items}
qry_m365.exec_query(
    """
    DeviceProcessEvents
    | where Timestamp > ago(1d)
    | summarize count() by DeviceName
    | limit 5
    """
)
qry_m365.MDATP.list_host_processes(
    host_name="atevet06cl003.defenderatevet06.onmicrosoft.com",
    start=-0.1,
    end=0,
).head(5)
qry_m365.list_queries()
['MDATP.file_path', 'MDATP.host_alerts', 'MDATP.host_connections', 'MDATP.ip_alerts', 'MDATP.ip_connections', 'MDATP.list_alerts', 'MDATP.list_connections', 'MDATP.list_filehash', 'MDATP.list_files', 'MDATP.list_host_processes', 'MDATP.process_cmd_line', 'MDATP.process_creations', 'MDATP.process_paths', 'MDATP.protocol_connections', 'MDATP.sha1_alerts', 'MDATP.url_alerts', 'MDATP.url_connections', 'MDATP.user_files', 'MDATP.user_logons', 'MDATP.user_network', 'MDATP.user_processes', 'MDATPHunting.accessibility_persistence', 'MDATPHunting.av_sites', 'MDATPHunting.b64_pe', 'MDATPHunting.brute_force', 'MDATPHunting.cve_2018_1000006l', 'MDATPHunting.cve_2018_1111', 'MDATPHunting.cve_2018_4878', 'MDATPHunting.doc_with_link', 'MDATPHunting.dropbox_link', 'MDATPHunting.email_link', 'MDATPHunting.email_smartscreen', 'MDATPHunting.malware_recycle', 'MDATPHunting.network_scans', 'MDATPHunting.powershell_downloads', 'MDATPHunting.service_account_powershell', 'MDATPHunting.smartscreen_ignored', 'MDATPHunting.smb_discovery', 'MDATPHunting.tor', 'MDATPHunting.uncommon_powershell', 'MDATPHunting.user_enumeration']

2.5 Creating/saving your own queries

A template query looks like this

sources:
  ...
  list_ip_connections:
    description: Lists alerts associated with a specified remote IP
    metadata:
    args:
      query: '
        {table}
        | where Timestamp >= datetime({start})
        | where Timestamp <= datetime({end})
        | where RemoteIP has "{ip_address}" or LocalIP has "{ip_address}"
        {add_query_items}'
    parameters:
      ip_address:
        description: Remote IP Address
        type: str
  • It is query language-agnostic

  • Parameters are substituted using Python format strings

    • you might need to quote the parameter

    • or invoke a conversion function in the target language
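The substitution mechanism itself can be sketched with plain Python str.format, which is all the template semantics require. The template text below is a hypothetical fragment in the style of the query above; note that quoting of string parameters is baked into the template itself:

```python
# Hypothetical KQL template in the style of the query definitions above
template = (
    '{table} '
    '| where Timestamp >= datetime({start}) '
    '| where Timestamp <= datetime({end}) '
    '| where RemoteIP has "{ip_address}" '
    '{add_query_items}'
)

# Parameter values are substituted by name; the {ip_address} placeholder
# sits inside double quotes in the template, so the value arrives quoted
kql = template.format(
    table="DeviceNetworkEvents",
    start="2022-01-11T00:00:00Z",
    end="2022-01-12T00:00:00Z",
    ip_address="38.75.137.9",
    add_query_items="| limit 10",
)
print(kql)
```

The resulting string is ready to hand to the query backend; for a non-KQL target language you would instead wrap the parameter in that language's quoting or conversion syntax inside the template.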

query_yaml = """
metadata:
    version: 1
    description: MDATP Queries
    data_environments: [MDATP, MDE, M365D]
    data_families: [MDATP]
    tags: ["network"]
defaults:
    metadata:
        data_source: "network_events"
    parameters:
        table:
            description: Table name
            type: str
            default: "DeviceNetworkEvents"
        start:
            description: Query start time
            type: datetime
        end:
            description: Query end time
            type: datetime
        add_query_items:
            description: Additional query clauses
            type: str
            default: ""
sources:
    list_ip_connections:
        description: Lists alerts associated with a specified remote IP
        metadata:
        args:
            query: '
                {table}
                | where Timestamp >= datetime({start})
                | where Timestamp <= datetime({end})
                | where RemoteIP has "{ip_address}" or LocalIP has "{ip_address}"
                {add_query_items}'
        parameters:
            ip_address:
                description: Remote IP Address
                type: str
"""

Steps

  1. Create your query file(s)

  2. Save them to a folder

  3. Add this to your msticpyconfig.yaml or specify at runtime as param to QueryProvider

Config

QueryDefinitions:
  Custom:
    - C:\queries
    - /home/user/custom_queries

Runtime parameter

qry_prov = QueryProvider("M365", query_paths=["/home/user/custom_queries"])

See Creating Custom Queries for more details
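The three steps can be sketched end to end: write the query YAML to a folder, then point a provider at that folder at load time. The folder location and file name here are illustrative:

```python
import tempfile
from pathlib import Path

# minimal stand-in for the query_yaml template string shown earlier
query_yaml = "metadata:\n    version: 1\n"

# Steps 1-2: create the query file and save it to a folder
query_dir = Path(tempfile.mkdtemp()) / "custom_queries"
query_dir.mkdir()
query_file = query_dir / "mde_queries.yaml"
query_file.write_text(query_yaml)

# Step 3: pass the folder to QueryProvider at runtime
# (or add it under QueryDefinitions/Custom in msticpyconfig.yaml)
# qry_prov = QueryProvider("M365", query_paths=[str(query_dir)])
print(query_file.exists())
```

On the next `QueryProvider` load, queries in the file appear alongside the built-in ones under their data family.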


3. Incident Triage

See forthcoming Guided Investigation - Incident Triage notebook

import pandas as pd
from msticpy.data.azure_sentinel import AzureSentinel as Sentinel

# instantiate the Sentinel class and connect
sent_api = Sentinel()
sent_api.connect()

# Define our sentinel workspace
workspace_id = "/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/soc/providers/Microsoft.OperationalInsights/workspaces/cybersecuritysoc"

# set a timespan for incidents to display (last 24 hours)
start = pd.Timestamp.utcnow() - pd.Timedelta("1D")
end = pd.Timestamp.utcnow()

# Get current incidents
incidents = sent_api.get_incidents(workspace_id)

# Make sure that we have a timestamp of datetime type
# and filter incidents to our desired time range
incidents["timestamp"] = pd.to_datetime(incidents["properties.createdTimeUtc"], utc=True)
in_range = incidents[incidents["timestamp"].between(start, end)]
filtered_incidents = in_range if not in_range.empty else incidents

# plot a timeline of incidents
filtered_incidents.mp_plot.timeline(
    source_columns=["properties.title", "properties.status"],
    title="Incidents over time - grouped by severity",
    height=300,
    group_by="properties.severity",
    time_column="timestamp",
)
# pick an incident ID and get full details
incident_uuid = "fac7e091-b7cb-4d27-88e6-61336ea63a36"
incident_id = f"{workspace_id}/providers/Microsoft.SecurityInsights/Incidents/{incident_uuid}"
incident_details = sent_api.get_incident(incident_id, entities=True, alerts=True)
pd.DataFrame(incident_details.iloc[0])
from msticpy.vis.entity_graph_tools import EntityGraph

# shortcut for time reasons - read saved incident details from file
incident_df = pd.read_pickle("../data/training_incident.pkl")

# Plot graph of incident
incident_graph = EntityGraph(incident_df.iloc[0])
incident_graph.plot()

4. Enriching data with Threat Intelligence (and others)

  • 4.1 Introduction to Pivot functions

  • 4.2 Pivot on individual values

  • 4.3 Pivot from DataFrames

  • 4.4 Joining input to your output

  • 4.5 Pivoting with RiskIQ

Threat intelligence enrichment recap

# First we create our provider
ti_lookup = TILookup()

# Then we lookup results
ti_results = ti_lookup.lookup_ioc("91.211.89.33")

# Convert results to a DataFrame for ease of viewing
ti_results = ti_lookup.result_to_df(ti_results)
ti_results
Using Open PageRank. See https://www.domcop.com/openpagerank/what-is-openpagerank
# We can also display them in a browser
TILookup.browse_results(ti_results)

Azure Data Enrichment

MSTICPy also includes a number of Azure API integrations that can be used to enrich your data with additional data about Azure Resources. These are available in two formats, via the AzureData feature of MSTICPy and also via the new Azure Resource Graph data connector.

See https://msticpy.readthedocs.io/data_acquisition/AzureData.html

from msticpy.data.azure_data import AzureData

# Create our Azure Data instance and connect
az_data = AzureData()
az_data.connect()

# get the host resource ID from Heartbeat table
host_res_id = qry_prov.Heartbeat.get_heartbeat_for_host(host_name="WORKSTATION12").iloc[0].ResourceId
host_res_id
'/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/SOC-AttackIQ/providers/Microsoft.Compute/virtualMachines/WORKSTATION12'
az_data.get_resource_details(sub_id="d1d8779d-38d7-4f06-91db-9cbc8de0176f", resource_id=host_res_id)
{'resource_id': '/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/SOC-AttackIQ/providers/Microsoft.Compute/virtualMachines/WORKSTATION12', 'name': 'WORKSTATION12', 'resource_type': 'Microsoft.Compute/virtualMachines', 'location': 'eastus', 'tags': None, 'plan': None, 'properties': {'vmId': '090cf642-37f3-4a51-bcfd-89f5f1c9bbb0', 'hardwareProfile': {'vmSize': 'Standard_B2s'}, 'storageProfile': {'imageReference': {'publisher': 'microsoftwindowsdesktop', 'offer': 'windows-11', 'sku': 'win11-21h2-pro', 'version': 'latest', 'exactVersion': '22000.318.2111041236'}, 'osDisk': {'osType': 'Windows', 'name': 'WORKSTATION12_OsDisk_1_c39845fdbadb4f0cbf86c9420803ae5a', 'createOption': 'FromImage', 'caching': 'ReadWrite', 'managedDisk': {'storageAccountType': 'StandardSSD_LRS', 'id': '/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/SOC-AttackIQ/providers/Microsoft.Compute/disks/WORKSTATION12_OsDisk_1_c39845fdbadb4f0cbf86c9420803ae5a'}, 'deleteOption': 'Detach', 'diskSizeGB': 127}, 'dataDisks': []}, 'osProfile': {'computerName': 'WORKSTATION12', 'adminUsername': 'ContosoAdmin', 'windowsConfiguration': {'provisionVMAgent': True, 'enableAutomaticUpdates': True, 'patchSettings': {'patchMode': 'AutomaticByOS', 'assessmentMode': 'ImageDefault', 'enableHotpatching': False}}, 'secrets': [], 'allowExtensionOperations': True, 'requireGuestProvisionSignal': True}, 'networkProfile': {'networkInterfaces': [{'id': '/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/SOC-AttackIQ/providers/Microsoft.Network/networkInterfaces/workstation12417'}]}, 'diagnosticsProfile': {'bootDiagnostics': {'enabled': True}}, 'licenseType': 'Windows_Client', 'provisioningState': 'Succeeded', 'timeCreated': '2021-12-09T18:43:43.1596839+00:00'}, 'kind': None, 'managed_by': None, 'sku': None, 'identity': <azure.mgmt.resource.resources.v2021_04_01.models._models_py3.Identity at 0x1ca458e5100>, 'state': 
<azure.mgmt.compute.v2020_06_01.models._models_py3.VirtualMachineInstanceView at 0x1ca4558f1f0>}

We can also use the AzureSentinel (soon to be renamed) class to get details about specific Microsoft Sentinel elements.

See https://msticpy.readthedocs.io/data_acquisition/AzureSentinel.html

4.1 Introduction to Pivot Functions

Pivot functions are methods of entities that provide:

  • data queries related to an entity

  • enrichment functions relevant to that entity

Pivot functions are dynamically attached to entities. We created this framework to make it easier to find which functions you can use for which entity type.

Motivation

  • We had built a lot of functionality in MSTICPy for querying and enrichment

  • A lot of the functions had inconsistent type/parameter signatures

  • There was no easy discovery mechanism for these functions - you had to know

  • Using entities as pivot points is a "natural" investigation pattern

Access functionality from entities

# Initialize Pivots (this will soon happen by default in init_notebook)
pivot = Pivot(namespace=globals())
pivot.browse()

4.2 Pivot on individual values

from msticpy.datamodel.entities import IpAddress, Host, Account, Dns

IpAddress.pivots()
['AzureSentinel.VMComputer_vmcomputer', 'AzureSentinel.aad_signins', 'AzureSentinel.az_activity', 'AzureSentinel.az_storage_ops', 'AzureSentinel.aznet_interface', 'AzureSentinel.aznet_net_flows', 'AzureSentinel.azsent_bookmarks', 'AzureSentinel.azti_list_indicators_by_ip', 'AzureSentinel.dns_queries', 'AzureSentinel.dns_queries_from_ip', 'AzureSentinel.hb_heartbeat', 'AzureSentinel.hb_heartbeat_for_ip_depr', 'AzureSentinel.list_alerts_for_ip', 'AzureSentinel.lxsys_logon_failures', 'AzureSentinel.lxsys_logons', 'AzureSentinel.o365_activity', 'MDE.DeviceAlertEvents_ip_alerts', 'MDE.DeviceNetworkEvents_ip_connections', 'RiskIQ.articles', 'RiskIQ.artifacts', 'RiskIQ.certificates', 'RiskIQ.components', 'RiskIQ.cookies', 'RiskIQ.hostpair_children', 'RiskIQ.hostpair_parents', 'RiskIQ.malware', 'RiskIQ.projects', 'RiskIQ.reputation', 'RiskIQ.resolutions', 'RiskIQ.services', 'RiskIQ.summary', 'RiskIQ.trackers', 'RiskIQ.whois', 'geoloc', 'ip_type', 'qry_aad_signins', 'qry_az_activity', 'qry_aznet_interface', 'qry_aznet_net_flows', 'qry_azsent_bookmarks', 'qry_dns_queries', 'qry_dns_queries_from_ip', 'qry_hb_heartbeat', 'qry_lxsys_logon_failures', 'qry_lxsys_logons', 'qry_o365_activity', 'ti.lookup_ip', 'ti.lookup_ipv4', 'ti.lookup_ipv4_OTX', 'ti.lookup_ipv4_RiskIQ', 'ti.lookup_ipv4_Tor', 'ti.lookup_ipv4_VirusTotal', 'ti.lookup_ipv4_XForce', 'ti.lookup_ipv6', 'ti.lookup_ipv6_OTX', 'tilookup_ip', 'tilookup_ipv4', 'tilookup_ipv6', 'util.geoloc', 'util.geoloc_ips', 'util.ip_rev_resolve', 'util.ip_type', 'util.whois', 'whois']
from msticpy.datamodel.entities import IpAddress, Host, Account

display(IpAddress.whois("38.75.137.9"))
display(IpAddress.geoloc("38.75.137.9"))

Queries are pivot functions too

Host.AzureSentinel.alerts(host_name="AdminHost", add_query_items="| limit 5")

4.3 Pivot on DataFrames and Lists

entity.pivot_func(list_or_iterable)

entity.pivot_func(dataframe, column="col_name")

%%ioc --out ip_list
   SourceIP       DestinationIP   TotalBytesSent
0  10.0.3.5       40.124.45.19    621
1  10.16.12.1     40.124.45.19    1004
2  10.4.5.12      13.71.172.130   247
3  10.4.5.12      40.77.232.95    189
4  10.4.5.16      13.71.172.130   46
5  10.4.5.16      65.55.44.109    120
6  10.90.78.142   104.43.212.12   12
7  10.90.78.71    104.43.212.12   4
8  20.185.182.48  38.75.137.9     8328
[('ipv4', ['10.4.5.16', '10.90.78.71', '10.4.5.12', '65.55.44.109', '10.0.3.5', '10.90.78.142', '10.16.12.1', '38.75.137.9', '13.71.172.130', '104.43.212.12', '40.124.45.19', '40.77.232.95', '20.185.182.48'])]
IpAddress.whois(ip_list["ipv4"])

4.4 Joining input to output

IpAddress.whois(ip_list["ipv4"], join="left")

Creating Pivot pipelines - DataFrames as input and output

list(ip_list["ipv4"])[:4]
['10.4.5.16', '10.90.78.71', '10.4.5.12', '65.55.44.109']
(
    IpAddress.whois(list(ip_list["ipv4"])[:4], join="left")
    .mp_pivot.run(IpAddress.geoloc, input_col="ip_column", join="left")
    .mp_pivot.run(IpAddress.tilookup_ipv4, input_col="ip_column", join="left")
)

4.5 RiskIQ Pivots

display(IpAddress.RiskIQ.whois("185.191.34.209"))
display(IpAddress.RiskIQ.articles("185.191.34.209"))
(
    Dns.RiskIQ.resolutions("teamworks455.com")
    .query("recordtype == 'A'")
    .mp_pivot.run(IpAddress.util.geoloc, column="resolve", join="left")
    .mp_pivot.display()
    .mp_pivot.run(IpAddress.RiskIQ.resolutions, column="resolve", join="left")
)

5. Visualization

  • 5.1 Timelines and timeline values

  • 5.2 Matrix plots for large data sets

  • 5.3 Process Trees

  • 5.4 Time series for temporal pattern anomalies

Most visualization functionality is available through DataFrame.mp_plot.vis_func()

5.1 Timelines and timeline values

type(logons_df)
pandas.core.frame.DataFrame
logons_df.mp_plot.timeline(group_by="Account", time_column="TimeGenerated")
logon_count_df = (
    logons_df[["Account", "TimeGenerated", "EventID"]]
    .groupby(["Account", pd.Grouper(key="TimeGenerated", freq="10min")])
    .count()
    .reset_index()
)
logon_count_df.mp_plot.timeline_values(
    group_by="Account", y="EventID", kind=["circle", "vbar"], source_columns=["Account"]
)

5.2 Matrix plots

Often these are useful at large scale for showing patterns of behavior and highlighting significant changes.

norm_failed_df = pd.read_pickle("../data/failed_logons_det_df.pkl")
comb_failed_df = pd.read_pickle("../data/combined_df.pkl")

norm_failed_df.mp_plot.matrix(y="IPAddress", x="Location", title="Normal failed logons", height=300)
comb_failed_df.mp_plot.matrix(y="IPAddress", x="Location", title="Suspect spray attack", height=300)

5.3 Process Trees

Schema-dependent

Works with:

  • MS Sentinel/WEVT Windows process events

  • Linux Auditd logs

  • MDE DeviceProcess events

  • Sysmon - thanks Nicholas Bareil!

Custom schemas can be used.

qry_m365.connect()
proc_df = qry_m365.MDATP.list_host_processes(
    host_name="atevet06cl003.defenderatevet06.onmicrosoft.com", start=-0.5, end=0
)
Connected.
proc_df.mp_plot.process_tree(legend_col="InitiatingProcessAccountName")

5.4 Time series analysis

Note: your data set must:

  1. Be at least 1 week long

  2. Be grouped/aggregated by a time interval (e.g. 1 hour)

  3. Have a scalar value column (number of logons, bytes transmitted, etc.)

logons_by_hour_df = (
    comb_failed_df[["TimeGenerated", "OperationName"]]
    .groupby(pd.Grouper(key="TimeGenerated", freq="1h"))
    .count()
    .fillna(0)
    .sort_index()
)
logons_by_hour_df.head(5)
from msticpy.analysis.timeseries import timeseries_anomalies_stl
from msticpy.nbtools.timeseries import display_timeseries_anomalies

logons_by_hour_df = pd.read_pickle("../data/failed_logons_hourly.pkl")
ts_analysis = timeseries_anomalies_stl(logons_by_hour_df)
display_timeseries_anomalies(ts_analysis, y="count", period=7, height=600)
from msticpy.analysis.timeseries import find_anomaly_periods
from msticpy.common.timespan import TimeSpan

periods = find_anomaly_periods(ts_analysis.sort_values("TimeGenerated"))
periods
[TimeSpan(start=2021-11-08 07:00:00+00:00, end=2021-11-08 09:00:00+00:00, period=0 days 02:00:00), TimeSpan(start=2021-11-15 06:00:00+00:00, end=2021-11-15 14:00:00+00:00, period=0 days 08:00:00), TimeSpan(start=2021-11-22 07:00:00+00:00, end=2021-11-22 09:00:00+00:00, period=0 days 02:00:00)]
anomaly_time = nbwidgets.QueryTime(timespan=periods[1])
anomaly_time
aad_logins = qry_prov.Azure.list_all_signins_geo(anomaly_time)
(
    aad_logins.query("ResultType != '0'")
    [["ResultDescription", "UserPrincipalName", "TimeGenerated"]]
    .groupby(["ResultDescription", "UserPrincipalName"])
    .count()
)

6. Extras

6.1 OTRF Security Datasets

from msticpy.data.browsers.mordor_browser import MordorBrowser

mordor = MordorBrowser()
mordor
Retrieving Mitre data... Retrieving Mordor data...
Downloading Mordor metadata: 100%|██████████| 96/96 [00:00<00:00, 6000.43 files/s]
mordor.current_dataset.mp_plot.timeline(group_by="EventID", time_column="EventTime")

7. Conclusion and Resources

Conclusion

  1. Notebooks give the kind of flexibility not found in any SIEM (including Sentinel)

    • Create your own reusable analysis flows

    • Capture progress as it happens

    • Automate complex detections and observation patterns

  2. There is a learning curve - be prepared to invest some time

  3. But the payoff in capability is worth it

  4. Lots of functionality in MSTICPy and more being added all the time

Actions

Watch the training videos 📺

Watch InfoSec Jupyterthon workshops 📺

Play around with template and sample notebooks

Visit MSTICPy GitHub repo and leave us a star ⭐

Read the MSTICPy docs 📓

MSTICPy Hackathon - On Now!

Contacts