GitHub Repository: Azure/Azure-Sentinel-Notebooks
Path: blob/master/tutorials-and-examples/training-notebooks/Training - MSTICPy Training 3 - 2022-01-13.ipynb
Kernel: Python 3.8 - AzureML

MSTICPy - Intermediate/Advanced Use

Notebooks and Microsoft Sentinel Training #3

msticpy is a library for InfoSec investigation and hunting in Jupyter Notebooks. It includes functionality to:

  • query log data from multiple sources

  • enrich the data with Threat Intelligence, geolocations and Azure resource data

  • extract Indicators of Activity (IoA) from logs and unpack encoded data

  • perform sophisticated analysis such as anomalous session detection and time series decomposition

  • visualize data using interactive timelines, process trees and multi-dimensional Morph Charts

It also includes some time-saving notebook tools such as widgets to set query time boundaries, select and display items from lists, and configure the notebook environment.


Contents

1. Introduction

  • 1.1 Background

  • 1.2 What's new

  • 1.3 Extras - installing optional dependencies

2. Data Queries

  • 2.1 Recap

  • 2.2 Parameters

  • 2.3 Query time ranges

  • 2.4 Querying other data sources - Microsoft Defender 365

  • 2.5 Creating/saving your own queries

3. Incident Triage

4. Enriching data with Threat Intelligence (and others)

  • 4.1 Introduction to Pivot functions

  • 4.2 Pivot on individual values

  • 4.3 Pivot from DataFrames

  • 4.4 Joining input to your output

  • 4.5 Pivoting with RiskIQ

5. Visualization

  • 5.1 Timelines and timeline values

  • 5.2 Matrix plots for large data sets

  • 5.3 Process Trees

  • 5.4 Time series for temporal pattern anomalies

6. Extras

  • 6.1 Open Threat Research Security Data sets

7. Conclusion and Resources


1. Introduction

1.1 Background

1.2 What's new

Single sign-on with Managed Identities

  • Your sign-in credentials from Azure Machine Learning are used automatically for MS Sentinel

  • You can override this (in the query provider) with mp_az_auth=False (see later)

The MSTICPy Configuration tool now works in non-geological timescales in Azure Machine Learning!

import msticpy
msticpy.init_notebook(globals())

mpconf = msticpy.MpConfigEdit()
mpconf.set_tab("TI Providers")
mpconf

1.3 Extras - installing optional dependencies

Notes:
1. This doesn't affect the MSTICPy code that's installed - only the dependencies
2. Often, you won't need this unless you want the specific *extra* functionality
3. Use %pip within the notebook, not !pip.

MSTICPy is a library with a broad range of functionality and a lot of dependencies. As such, installing all of the dependencies can take a lot of time.

MSTICPy has implemented a series of Extras that allow for subsets of these dependencies. These Extras are grouped around core technologies that you might want to use with MSTICPy.

Extra       Functionality
--none--    Most functionality (approx 75%): Kqlmagic Jupyter basic
keyvault    Key Vault and keyring storage of settings secrets
azure       Azure API data retrieval, Azure storage APIs, Sentinel APIs
kql         Kqlmagic Jupyter extended functionality
azsentinel  Combination of core install + "azure", "keyvault", "kql"
ml          Timeseries analysis, Event clustering, Outlier analysis
splunk      Splunk data queries
vt3         VirusTotal V3 graph API
riskiq      RiskIQ Illuminate threat intel provider & pivot functions
all         Includes all of above packages
dev         Development tools plus "base"
test        "dev" plus "all"

To install a specific Extra, use the following syntax: %pip install msticpy[extra]

You can also install multiple extras at once: %pip install msticpy[extra1,extra2,...]

%pip install --upgrade msticpy[vt3,riskiq]

If you see this kind of exception, install the extra it mentions:

from pathlib import Path
from IPython.display import Image

fname = "/images/extra_exception.png"
gh_path = "https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master"
# use the local copy if present, otherwise fall back to the GitHub copy
img_path = f"..{fname}" if Path(f"..{fname}").is_file() else f"{gh_path}{fname}"
Image(img_path, width=1000)
Image in a Jupyter notebook

2. Data Queries

2.1 Recap

In the last training session we covered:

  • Authenticating to Microsoft Sentinel

  • Browsing and listing queries

  • Running queries

qry_prov = QueryProvider("AzureSentinel")
ws_config = WorkspaceConfig(workspace="CyberSecuritySOC")
Please wait. Loading Kqlmagic extension...done

New and Improved!

Once set up, we can tell the QueryProvider to connect, which kicks off the authentication process.

Old way

!az login
qry_prov.connect(ws_config, mp_az_auth="cli")

Integrated auth with MSI

qry_prov.connect(ws_config)
Connecting...
connected
qry_prov.browse()
qry_prov.WindowsSecurity.list_host_logons("?")
Query: list_host_logons
Data source: AzureSentinel
Retrieves the logon events on the host

Parameters
----------
add_query_items: str (optional)
    Additional query clauses
end: datetime
    Query end time
event_filter: str (optional)
    Event subset (default value is: | where EventID == 4624)
host_name: str
    Name of host
query_project: str (optional)
    Column project statement (default value is: | project TenantId, Account, EventID, TimeGenerat...)
start: datetime
    Query start time
subscription_filter: str (optional)
    Optional subscription/tenant filter expression (default value is: true)
table: str (optional)
    Table name (default value is: SecurityEvent)

Query:
{table} {event_filter} {query_project}
| where {subscription_filter}
| where Computer has "{host_name}"
| where TimeGenerated >= datetime({start})
| where TimeGenerated <= datetime({end})
{add_query_items}

2.2 Query parameters

logons_df = qry_prov.WindowsSecurity.list_host_logons(host_name="WORKSTATION6")
print(f"Total records: {len(logons_df)} - First record {logons_df.TimeGenerated.min()}, Last record {logons_df.TimeGenerated.max()}")
logons_df.head(5)
Total records: 866 - First record 2022-01-12 04:05:20.217000+00:00, Last record 2022-01-12 17:25:47.887000+00:00

Where did the start and end parameters come from?

2.3 Query Times

qry_prov.query_time
logons_df = qry_prov.WindowsSecurity.list_host_logons(host_name="WORKSTATION6")
print(f"Total records: {len(logons_df)} - First record {logons_df.TimeGenerated.min()}, Last record {logons_df.TimeGenerated.max()}")
logons_df.head(5)
Total records: 562 - First record 2022-01-11 21:06:59.207000+00:00, Last record 2022-01-12 05:02:09.813000+00:00

You can set the start and end parameters manually, using:

  • Python datetimes

  • datetime strings

  • integers/floats (days, relative to now)
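As an illustration of how these three forms can map to concrete datetimes, here is a minimal sketch of such a normalization helper. This is a hypothetical stand-in for what the query parameter substitution does internally, not msticpy's actual implementation:

```python
from datetime import datetime, timedelta, timezone

def to_query_time(value) -> datetime:
    """Normalize a start/end parameter to a timezone-aware datetime.

    Accepts a datetime, an ISO-format string, or an int/float
    interpreted as days relative to now (negative = in the past).
    """
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        # ISO 8601 strings such as "2022-01-11 16:32:05.323000+00:00"
        return datetime.fromisoformat(value)
    if isinstance(value, (int, float)):
        # days relative to now, so -1 means 24 hours ago
        return datetime.now(timezone.utc) + timedelta(days=value)
    raise TypeError(f"Unsupported time parameter: {value!r}")
```

With this, `start="2022-01-11 16:32:05.323000+00:00"` parses to a fixed datetime, while `end=-1` resolves to 24 hours before the current time.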

logons_df = qry_prov.WindowsSecurity.list_host_logons(
    host_name="WORKSTATION6",
    start="2022-01-11 16:32:05.323000+00:00",
    end=-1,
)
print(f"Total records: {len(logons_df)} - First record {logons_df.TimeGenerated.min()}, Last record {logons_df.TimeGenerated.max()}")
logons_df.head(5)
Total records: 817 - First record 2022-01-11 16:32:05.323000+00:00, Last record 2022-01-12 04:07:09.470000+00:00

2.4 Querying other data sources - Microsoft Defender 365

qry_m365 = QueryProvider("MDE")  # Use "M365D" for MS Defender APIs
qry_m365.connect()
Connected.
list(filter(lambda x: "host" in x.lower(), qry_m365.list_queries()))
['MDATP.host_alerts', 'MDATP.host_connections', 'MDATP.list_host_processes']
qry_m365.list_queries("host") # Not yet published!
['MDATP.ip_alerts', 'MDATP.ip_connections']
qry_m365.MDATP.host_connections("?")
Query: host_connections
Data source: MDE
Lists connections for a specified hostname

Parameters
----------
add_query_items: str (optional)
    Additional query clauses
end: datetime
    Query end time
host_name: str
    Name of host
    Aliases: 'hostname'
start: datetime
    Query start time
table: str (optional)
    Table name (default value is: DeviceNetworkEvents)

Query:
{table}
| where Timestamp >= datetime({start})
| where Timestamp <= datetime({end})
| where DeviceName has "{host_name}"
{add_query_items}
qry_m365.exec_query(
    """
    DeviceProcessEvents
    | where Timestamp > ago(1d)
    | summarize count() by DeviceName
    | limit 5
    """
)
qry_m365.MDATP.list_host_processes(
    host_name="atevet06cl003.defenderatevet06.onmicrosoft.com",
    start=-0.1,
    end=0,
).head(5)
qry_m365.list_queries()
['MDATP.file_path', 'MDATP.host_alerts', 'MDATP.host_connections', 'MDATP.ip_alerts', 'MDATP.ip_connections', 'MDATP.list_alerts', 'MDATP.list_connections', 'MDATP.list_filehash', 'MDATP.list_files', 'MDATP.list_host_processes', 'MDATP.process_cmd_line', 'MDATP.process_creations', 'MDATP.process_paths', 'MDATP.protocol_connections', 'MDATP.sha1_alerts', 'MDATP.url_alerts', 'MDATP.url_connections', 'MDATP.user_files', 'MDATP.user_logons', 'MDATP.user_network', 'MDATP.user_processes', 'MDATPHunting.accessibility_persistence', 'MDATPHunting.av_sites', 'MDATPHunting.b64_pe', 'MDATPHunting.brute_force', 'MDATPHunting.cve_2018_1000006l', 'MDATPHunting.cve_2018_1111', 'MDATPHunting.cve_2018_4878', 'MDATPHunting.doc_with_link', 'MDATPHunting.dropbox_link', 'MDATPHunting.email_link', 'MDATPHunting.email_smartscreen', 'MDATPHunting.malware_recycle', 'MDATPHunting.network_scans', 'MDATPHunting.powershell_downloads', 'MDATPHunting.service_account_powershell', 'MDATPHunting.smartscreen_ignored', 'MDATPHunting.smb_discovery', 'MDATPHunting.tor', 'MDATPHunting.uncommon_powershell', 'MDATPHunting.user_enumeration']

2.5 Creating/saving your own queries

A template query looks like this

sources:
  ...
  list_ip_connections:
    description: Lists alerts associated with a specified remote IP
    metadata:
    args:
      query: '
        {table}
        | where Timestamp >= datetime({start})
        | where Timestamp <= datetime({end})
        | where RemoteIP has "{ip_address}" or LocalIP has "{ip_address}"
        {add_query_items}'
    parameters:
      ip_address:
        description: Remote IP Address
        type: str
  • It is query language-agnostic

  • Parameters are substituted using Python format strings

    • you might need to quote the parameter

    • or invoke a conversion function in the target language
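The substitution mechanism itself can be sketched with plain Python str.format, which is all the template semantics require. The template text below is a hypothetical fragment in the style of the query above; note that quoting of string parameters is baked into the template itself:

```python
# Hypothetical KQL template in the style of the query definitions above
template = (
    '{table} '
    '| where Timestamp >= datetime({start}) '
    '| where Timestamp <= datetime({end}) '
    '| where RemoteIP has "{ip_address}" '
    '{add_query_items}'
)

# Parameter values are substituted by name; the {ip_address} placeholder
# sits inside double quotes in the template, so the value arrives quoted
kql = template.format(
    table="DeviceNetworkEvents",
    start="2022-01-11T00:00:00Z",
    end="2022-01-12T00:00:00Z",
    ip_address="38.75.137.9",
    add_query_items="| limit 10",
)
print(kql)
```

The resulting string is ready to hand to the query backend; for a non-KQL target language you would instead wrap the parameter in that language's quoting or conversion syntax inside the template.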

query_yaml = """
metadata:
    version: 1
    description: MDATP Queries
    data_environments: [MDATP, MDE, M365D]
    data_families: [MDATP]
    tags: ["network"]
defaults:
    metadata:
        data_source: "network_events"
    parameters:
        table:
            description: Table name
            type: str
            default: "DeviceNetworkEvents"
        start:
            description: Query start time
            type: datetime
        end:
            description: Query end time
            type: datetime
        add_query_items:
            description: Additional query clauses
            type: str
            default: ""
sources:
    list_ip_connections:
        description: Lists alerts associated with a specified remote IP
        metadata:
        args:
            query: '
                {table}
                | where Timestamp >= datetime({start})
                | where Timestamp <= datetime({end})
                | where RemoteIP has "{ip_address}" or LocalIP has "{ip_address}"
                {add_query_items}'
        parameters:
            ip_address:
                description: Remote IP Address
                type: str
"""

Steps

  1. Create your query file(s)

  2. Save them to a folder

  3. Add this to your msticpyconfig.yaml or specify at runtime as param to QueryProvider

Config

QueryDefinitions:
  Custom:
    - C:\queries
    - /home/user/custom_queries

Runtime parameter

qry_prov = QueryProvider("M365", query_paths=["/home/user/custom_queries"])

See Creating Custom Queries for more details
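The three steps can be sketched end to end: write the query YAML to a folder, then point a provider at that folder at load time. The folder location and file name here are illustrative:

```python
import tempfile
from pathlib import Path

# minimal stand-in for the query_yaml template string shown earlier
query_yaml = "metadata:\n    version: 1\n"

# Steps 1-2: create the query file and save it to a folder
query_dir = Path(tempfile.mkdtemp()) / "custom_queries"
query_dir.mkdir()
query_file = query_dir / "mde_queries.yaml"
query_file.write_text(query_yaml)

# Step 3: pass the folder to QueryProvider at runtime
# (or add it under QueryDefinitions/Custom in msticpyconfig.yaml)
# qry_prov = QueryProvider("M365", query_paths=[str(query_dir)])
print(query_file.exists())
```

On the next `QueryProvider` load, queries in the file appear alongside the built-in ones under their data family.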


3. Incident Triage

See forthcoming Guided Investigation - Incident Triage notebook

import pandas as pd
from msticpy.data.azure_sentinel import AzureSentinel as Sentinel

# instantiate the Sentinel class and connect
sent_api = Sentinel()
sent_api.connect()

# Define our sentinel workspace
workspace_id = "/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/soc/providers/Microsoft.OperationalInsights/workspaces/cybersecuritysoc"

# set a timespan for incidents to display (last 24 hours)
start = pd.Timestamp.utcnow() - pd.Timedelta("1D")
end = pd.Timestamp.utcnow()

# Get current incidents
incidents = sent_api.get_incidents(workspace_id)

# Make sure that we have a timestamp of datetime type
# and filter incidents to our desired time range
incidents["timestamp"] = pd.to_datetime(incidents["properties.createdTimeUtc"], utc=True)
in_range = incidents[incidents["timestamp"].between(start, end)]
filtered_incidents = in_range if not in_range.empty else incidents

# plot a timeline of incidents
filtered_incidents.mp_plot.timeline(
    source_columns=["properties.title", "properties.status"],
    title="Incidents over time - grouped by severity",
    height=300,
    group_by="properties.severity",
    time_column="timestamp",
)
# pick an incident ID and get full details
incident_uuid = "fac7e091-b7cb-4d27-88e6-61336ea63a36"
incident_id = f"{workspace_id}/providers/Microsoft.SecurityInsights/Incidents/{incident_uuid}"
incident_details = sent_api.get_incident(incident_id, entities=True, alerts=True)
pd.DataFrame(incident_details.iloc[0])
from msticpy.vis.entity_graph_tools import EntityGraph

# shortcut for time reasons - read saved incident details from file
incident_df = pd.read_pickle("../data/training_incident.pkl")

# Plot graph of incident
incident_graph = EntityGraph(incident_df.iloc[0])
incident_graph.plot()

4. Enriching data with Threat Intelligence (and others)

  • 4.1 Introduction to Pivot functions

  • 4.2 Pivot on individual values

  • 4.3 Pivot from DataFrames

  • 4.4 Joining input to your output

  • 4.5 Pivoting with RiskIQ

Threat intelligence enrichment recap

# First we create our provider
ti_lookup = TILookup()

# Then we lookup results
ti_results = ti_lookup.lookup_ioc("91.211.89.33")

# Convert results to a DataFrame for ease of viewing
ti_results = ti_lookup.result_to_df(ti_results)
ti_results
Using Open PageRank. See https://www.domcop.com/openpagerank/what-is-openpagerank
# We can also display them in a browser
TILookup.browse_results(ti_results)

Azure Data Enrichment

MSTICPy also includes a number of Azure API integrations that can be used to enrich your data with additional data about Azure Resources. These are available in two formats, via the AzureData feature of MSTICPy and also via the new Azure Resource Graph data connector.

See https://msticpy.readthedocs.io/data_acquisition/AzureData.html

from msticpy.data.azure_data import AzureData

# Create our Azure Data instance and connect
az_data = AzureData()
az_data.connect()

# get the host resource ID from Heartbeat table
host_res_id = qry_prov.Heartbeat.get_heartbeat_for_host(host_name="WORKSTATION12").iloc[0].ResourceId
host_res_id
'/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/SOC-AttackIQ/providers/Microsoft.Compute/virtualMachines/WORKSTATION12'
az_data.get_resource_details(sub_id="d1d8779d-38d7-4f06-91db-9cbc8de0176f", resource_id=host_res_id)
{'resource_id': '/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/SOC-AttackIQ/providers/Microsoft.Compute/virtualMachines/WORKSTATION12', 'name': 'WORKSTATION12', 'resource_type': 'Microsoft.Compute/virtualMachines', 'location': 'eastus', 'tags': None, 'plan': None, 'properties': {'vmId': '090cf642-37f3-4a51-bcfd-89f5f1c9bbb0', 'hardwareProfile': {'vmSize': 'Standard_B2s'}, 'storageProfile': {'imageReference': {'publisher': 'microsoftwindowsdesktop', 'offer': 'windows-11', 'sku': 'win11-21h2-pro', 'version': 'latest', 'exactVersion': '22000.318.2111041236'}, 'osDisk': {'osType': 'Windows', 'name': 'WORKSTATION12_OsDisk_1_c39845fdbadb4f0cbf86c9420803ae5a', 'createOption': 'FromImage', 'caching': 'ReadWrite', 'managedDisk': {'storageAccountType': 'StandardSSD_LRS', 'id': '/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/SOC-AttackIQ/providers/Microsoft.Compute/disks/WORKSTATION12_OsDisk_1_c39845fdbadb4f0cbf86c9420803ae5a'}, 'deleteOption': 'Detach', 'diskSizeGB': 127}, 'dataDisks': []}, 'osProfile': {'computerName': 'WORKSTATION12', 'adminUsername': 'ContosoAdmin', 'windowsConfiguration': {'provisionVMAgent': True, 'enableAutomaticUpdates': True, 'patchSettings': {'patchMode': 'AutomaticByOS', 'assessmentMode': 'ImageDefault', 'enableHotpatching': False}}, 'secrets': [], 'allowExtensionOperations': True, 'requireGuestProvisionSignal': True}, 'networkProfile': {'networkInterfaces': [{'id': '/subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourceGroups/SOC-AttackIQ/providers/Microsoft.Network/networkInterfaces/workstation12417'}]}, 'diagnosticsProfile': {'bootDiagnostics': {'enabled': True}}, 'licenseType': 'Windows_Client', 'provisioningState': 'Succeeded', 'timeCreated': '2021-12-09T18:43:43.1596839+00:00'}, 'kind': None, 'managed_by': None, 'sku': None, 'identity': <azure.mgmt.resource.resources.v2021_04_01.models._models_py3.Identity at 0x1ca458e5100>, 'state': 
<azure.mgmt.compute.v2020_06_01.models._models_py3.VirtualMachineInstanceView at 0x1ca4558f1f0>}

We can also use the AzureSentinel (soon to be renamed) class to get details about specific Microsoft Sentinel elements.

See https://msticpy.readthedocs.io/data_acquisition/AzureSentinel.html

4.1 Introduction to Pivot Functions

Pivot functions are methods of entities that provide:

  • data queries related to an entity

  • enrichment functions relevant to that entity

Pivot functions are dynamically attached to entities. We created this framework to make it easier to find which functions you can use for which entity type.

Motivation

  • We had built a lot of functionality in MSTICPy for querying and enrichment

  • A lot of the functions had inconsistent type/parameter signatures

  • There was no easy discovery mechanism for these functions - you had to know

  • Using entities as pivot points is a "natural" investigation pattern

Access functionality from entities

# Initialize Pivots (this will soon happen by default in init_notebook)
pivot = Pivot(namespace=globals())
pivot.browse()

4.2 Pivot on individual values

from msticpy.datamodel.entities import IpAddress, Host, Account, Dns

IpAddress.pivots()
['AzureSentinel.VMComputer_vmcomputer', 'AzureSentinel.aad_signins', 'AzureSentinel.az_activity', 'AzureSentinel.az_storage_ops', 'AzureSentinel.aznet_interface', 'AzureSentinel.aznet_net_flows', 'AzureSentinel.azsent_bookmarks', 'AzureSentinel.azti_list_indicators_by_ip', 'AzureSentinel.dns_queries', 'AzureSentinel.dns_queries_from_ip', 'AzureSentinel.hb_heartbeat', 'AzureSentinel.hb_heartbeat_for_ip_depr', 'AzureSentinel.list_alerts_for_ip', 'AzureSentinel.lxsys_logon_failures', 'AzureSentinel.lxsys_logons', 'AzureSentinel.o365_activity', 'MDE.DeviceAlertEvents_ip_alerts', 'MDE.DeviceNetworkEvents_ip_connections', 'RiskIQ.articles', 'RiskIQ.artifacts', 'RiskIQ.certificates', 'RiskIQ.components', 'RiskIQ.cookies', 'RiskIQ.hostpair_children', 'RiskIQ.hostpair_parents', 'RiskIQ.malware', 'RiskIQ.projects', 'RiskIQ.reputation', 'RiskIQ.resolutions', 'RiskIQ.services', 'RiskIQ.summary', 'RiskIQ.trackers', 'RiskIQ.whois', 'geoloc', 'ip_type', 'qry_aad_signins', 'qry_az_activity', 'qry_aznet_interface', 'qry_aznet_net_flows', 'qry_azsent_bookmarks', 'qry_dns_queries', 'qry_dns_queries_from_ip', 'qry_hb_heartbeat', 'qry_lxsys_logon_failures', 'qry_lxsys_logons', 'qry_o365_activity', 'ti.lookup_ip', 'ti.lookup_ipv4', 'ti.lookup_ipv4_OTX', 'ti.lookup_ipv4_RiskIQ', 'ti.lookup_ipv4_Tor', 'ti.lookup_ipv4_VirusTotal', 'ti.lookup_ipv4_XForce', 'ti.lookup_ipv6', 'ti.lookup_ipv6_OTX', 'tilookup_ip', 'tilookup_ipv4', 'tilookup_ipv6', 'util.geoloc', 'util.geoloc_ips', 'util.ip_rev_resolve', 'util.ip_type', 'util.whois', 'whois']
from msticpy.datamodel.entities import IpAddress, Host, Account

display(IpAddress.whois("38.75.137.9"))
display(IpAddress.geoloc("38.75.137.9"))

Queries are pivot functions too

Host.AzureSentinel.alerts(host_name="AdminHost", add_query_items="| limit 5")

4.3 Pivot on DataFrames and Lists

entity.pivot_func(list_or_iterable)

entity.pivot_func(dataframe, column="col_name")

%%ioc --out ip_list
   SourceIP       DestinationIP   TotalBytesSent
0  10.0.3.5       40.124.45.19    621
1  10.16.12.1     40.124.45.19    1004
2  10.4.5.12      13.71.172.130   247
3  10.4.5.12      40.77.232.95    189
4  10.4.5.16      13.71.172.130   46
5  10.4.5.16      65.55.44.109    120
6  10.90.78.142   104.43.212.12   12
7  10.90.78.71    104.43.212.12   4
8  20.185.182.48  38.75.137.9     8328
[('ipv4', ['10.4.5.16', '10.90.78.71', '10.4.5.12', '65.55.44.109', '10.0.3.5', '10.90.78.142', '10.16.12.1', '38.75.137.9', '13.71.172.130', '104.43.212.12', '40.124.45.19', '40.77.232.95', '20.185.182.48'])]
IpAddress.whois(ip_list["ipv4"])

4.4 Joining input to output

IpAddress.whois(ip_list["ipv4"], join="left")

Creating Pivot pipelines - DataFrames as input and output

list(ip_list["ipv4"])[:4]
['10.4.5.16', '10.90.78.71', '10.4.5.12', '65.55.44.109']
(
    IpAddress.whois(list(ip_list["ipv4"])[:4], join="left")
    .mp_pivot.run(IpAddress.geoloc, input_col="ip_column", join="left")
    .mp_pivot.run(IpAddress.tilookup_ipv4, input_col="ip_column", join="left")
)

4.5 RiskIQ Pivots

display(IpAddress.RiskIQ.whois("185.191.34.209"))
display(IpAddress.RiskIQ.articles("185.191.34.209"))
(
    Dns.RiskIQ.resolutions("teamworks455.com")
    .query("recordtype == 'A'")
    .mp_pivot.run(IpAddress.util.geoloc, column="resolve", join="left")
    .mp_pivot.display()
    .mp_pivot.run(IpAddress.RiskIQ.resolutions, column="resolve", join="left")
)

5. Visualization

  • 5.1 Timelines and timeline values

  • 5.2 Matrix plots for large data sets

  • 5.3 Process Trees

  • 5.4 Time series for temporal pattern anomalies

Most visualization functionality is available through DataFrame.mp_plot.vis_func()

5.1 Timelines and timeline values

type(logons_df)
pandas.core.frame.DataFrame
logons_df.mp_plot.timeline(group_by="Account", time_column="TimeGenerated")
logon_count_df = (
    logons_df[["Account", "TimeGenerated", "EventID"]]
    .groupby(["Account", pd.Grouper(key="TimeGenerated", freq="10min")])
    .count()
    .reset_index()
)
logon_count_df.mp_plot.timeline_values(
    group_by="Account", y="EventID", kind=["circle", "vbar"], source_columns=["Account"]
)

5.2 Matrix plots

Often these are useful at large scale for showing patterns of behavior and highlighting significant changes.

norm_failed_df = pd.read_pickle("../data/failed_logons_det_df.pkl")
comb_failed_df = pd.read_pickle("../data/combined_df.pkl")

norm_failed_df.mp_plot.matrix(y="IPAddress", x="Location", title="Normal failed logons", height=300)
comb_failed_df.mp_plot.matrix(y="IPAddress", x="Location", title="Suspect spray attack", height=300)

5.3 Process Trees

Schema-dependent

Works with:

  • MS Sentinel/WEVT Windows process events

  • Linux Auditd logs

  • MDE DeviceProcess events

  • Sysmon - thanks Nicholas Bareil!

Custom schemas can be used.

qry_m365.connect()
proc_df = qry_m365.MDATP.list_host_processes(
    host_name="atevet06cl003.defenderatevet06.onmicrosoft.com", start=-0.5, end=0
)
Connected.
proc_df.mp_plot.process_tree(legend_col="InitiatingProcessAccountName")

5.4 Time series analysis

Note: your data set must:

  1. Be at least 1 week long

  2. Be grouped/aggregated by a time interval (e.g. 1 hour)

  3. Have a scalar value column (number of logons, bytes transmitted, etc.)

logons_by_hour_df = (
    comb_failed_df[["TimeGenerated", "OperationName"]]
    .groupby(pd.Grouper(key="TimeGenerated", freq="1h"))
    .count()
    .fillna(0)
    .sort_index()
)
logons_by_hour_df.head(5)
from msticpy.analysis.timeseries import timeseries_anomalies_stl
from msticpy.nbtools.timeseries import display_timeseries_anomalies

logons_by_hour_df = pd.read_pickle("../data/failed_logons_hourly.pkl")
ts_analysis = timeseries_anomalies_stl(logons_by_hour_df)
display_timeseries_anomalies(ts_analysis, y="count", period=7, height=600)
from msticpy.analysis.timeseries import find_anomaly_periods
from msticpy.common.timespan import TimeSpan

periods = find_anomaly_periods(ts_analysis.sort_values("TimeGenerated"))
periods
[TimeSpan(start=2021-11-08 07:00:00+00:00, end=2021-11-08 09:00:00+00:00, period=0 days 02:00:00), TimeSpan(start=2021-11-15 06:00:00+00:00, end=2021-11-15 14:00:00+00:00, period=0 days 08:00:00), TimeSpan(start=2021-11-22 07:00:00+00:00, end=2021-11-22 09:00:00+00:00, period=0 days 02:00:00)]
anomaly_time = nbwidgets.QueryTime(timespan=periods[1])
anomaly_time
aad_logins = qry_prov.Azure.list_all_signins_geo(anomaly_time)
(
    aad_logins.query("ResultType != '0'")
    [["ResultDescription", "UserPrincipalName", "TimeGenerated"]]
    .groupby(["ResultDescription", "UserPrincipalName"])
    .count()
)

6. Extras

6.1 OTRF Security Datasets

from msticpy.data.browsers.mordor_browser import MordorBrowser

mordor = MordorBrowser()
mordor
Retrieving Mitre data... Retrieving Mordor data...
Downloading Mordor metadata: 100%|██████████| 96/96 [00:00<00:00, 6000.43 files/s]
mordor.current_dataset.mp_plot.timeline(group_by="EventID", time_column="EventTime")

7. Conclusion and Resources

Conclusion

  1. Notebooks give the kind of flexibility not found in any SIEM (including Sentinel)

    • Create your own reusable analysis flows

    • Capture progress as it happens

    • Automate complex detections and observation patterns

  2. There is a learning curve - be prepared to invest some time

  3. But the payoff in capability is worth it

  4. Lots of functionality in MSTICPy and more being added all the time

Actions

Watch the training videos 📺

Watch InfoSec Jupyterthon workshops 📺

Play around with template and sample notebooks

Visit MSTICPy GitHub repo and leave us a star ⭐

Read the MSTICPy docs 📓

MSTICPy Hackathon - On Now!

Contacts