Windows Host Explorer
Notebook Version: 1.0
Python Version: Python 3.6 (including Python 3.6 - AzureML)
Required Packages: kqlmagic, msticpy, pandas, numpy, matplotlib, bokeh, networkx, ipywidgets, ipython, scikit_learn, dnspython, ipwhois, folium, maxminddb_geolite2
Data Sources Required:
Log Analytics - SecurityAlert, SecurityEvent (EventIDs 4688 and 4624/25), AzureNetworkAnalytics_CL, Heartbeat
(Optional) - VirusTotal, AlienVault OTX, IBM XForce, Open Page Rank, (all require accounts and API keys)
Brings together a series of queries and visualizations to help you determine the security state of the Windows host or virtual machine that you are investigating.
Contents
- 1 Windows Host Explorer
- 2 Search for a Host name and query host properties
- 3 Related Alerts
- 4 Host Logons
- 5 Other Security Events
- 6 Examine Logon Sessions
- 7 Check for IOCs in Commandline for selected session
- 8 Network Check Communications with Other Hosts
- 9 Appendices
Notebook initialization
The next cell:
- Checks for the correct Python version
- Checks versions and optionally installs required packages
- Imports the required packages into the notebook
- Sets a number of configuration options
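The first two checks can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual initialization cell: the function names and the module list are assumptions, and the list uses import names (for example `sklearn` for scikit_learn), which can differ from the package names shown in the header.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the notebook header

# Import names (assumed); these can differ from the pip package names.
REQUIRED_MODULES = ["msticpy", "pandas", "numpy", "matplotlib", "bokeh", "sklearn"]


def python_version_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if not python_version_ok():
    print("Python %d.%d or later is required" % MIN_PYTHON)

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
```

Any name reported as missing would need to be installed before the rest of the notebook can run.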
This should complete without errors. If you encounter errors or warnings look at the following two notebooks:
If you are running in the Microsoft Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:
You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. There are more details about this in the ConfiguringNotebookEnvironment notebook and in these documents:
Get WorkspaceId and Authenticate to Microsoft Sentinel
Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:
instead of
Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
On successful authentication you should see a popup schema button. To find your Workspace Id go to Log Analytics. Look at the workspace properties to find the ID.
Authentication and Configuration Problems
Click for details about configuring your authentication parameters
The notebook is expecting your Microsoft Sentinel Tenant ID and Workspace ID to be configured in one of the following places:
- config.json in the current folder
- msticpyconfig.yaml in the current folder or the location specified by the MSTICPYCONFIG environment variable.
For help with setting up your config.json file (if this hasn't been done automatically) see the ConfiguringNotebookEnvironment notebook in the root folder of your Azure-Sentinel-Notebooks project. This shows you how to obtain your Workspace and Subscription IDs from the Microsoft Sentinel Portal. (You can use the Subscription ID to find your Tenant ID.) To view the current config.json, run the following in a code cell.
%pfile config.json
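If you prefer to read the values programmatically, a sketch like the following can help. The helper name is hypothetical, and the key names `tenant_id` and `workspace_id` are assumptions that you should check against your own config.json.

```python
import json
from pathlib import Path


def load_workspace_config(folder="."):
    """Return (tenant_id, workspace_id) from config.json, or None if the file is absent."""
    config_path = Path(folder) / "config.json"
    if not config_path.is_file():
        return None
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    # Key names are assumptions; a missing key yields None for that value.
    return config.get("tenant_id"), config.get("workspace_id")


print(load_workspace_config())
```

A result of `None` means no config.json was found in the folder, in which case the notebook would need the values from msticpyconfig.yaml instead.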
For help with setting up your msticpyconfig.yaml, see the Setup section at the end of this notebook and the ConfiguringNotebookEnvironment notebook.
Search for a Host name and query host properties
Browse List of Related Alerts
Select an Alert to view details
Successful Logons - Timeline and LogonType breakdown
Failed Logons
Accounts With Failed And Successful Logons
This query joins failed and successful logons for the same account name. Multiple logon failures followed by a successful logon might indicate attempts to guess or probe the user password.
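The same join can be approximated in pandas once both result sets are loaded. The sketch below uses made-up sample rows and assumed column names (`Account`, `TimeGenerated`); it is not the notebook's query, only an illustration of the pattern of failures followed by a success.

```python
import pandas as pd

# Hypothetical sample rows; real data would come from SecurityEvent
# queries for EventID 4625 (failed) and 4624 (successful).
failed_logons = pd.DataFrame({
    "Account": ["CONTOSO\\alice", "CONTOSO\\alice", "CONTOSO\\bob"],
    "TimeGenerated": pd.to_datetime(
        ["2019-02-12 10:01", "2019-02-12 10:02", "2019-02-12 11:00"]
    ),
})
successful_logons = pd.DataFrame({
    "Account": ["CONTOSO\\alice", "CONTOSO\\carol"],
    "TimeGenerated": pd.to_datetime(["2019-02-12 10:05", "2019-02-12 09:00"]),
})

# Summarize failures per account: how many, and when the last one occurred.
failed_summary = (
    failed_logons.groupby("Account")
    .agg(FailedCount=("TimeGenerated", "size"), LastFailure=("TimeGenerated", "max"))
    .reset_index()
)

# Inner join keeps only accounts that have both failed and successful logons.
combined = failed_summary.merge(successful_logons, on="Account", how="inner")

# Keep successes that happened after the most recent failure.
suspicious = combined[combined["TimeGenerated"] > combined["LastFailure"]]
print(suspicious)
```

In this sample only the first account has failures followed by a success; the other two accounts drop out of the inner join.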
Other Security Events
It's often useful to look at what other events were being logged at the time of the attack.
We show events here grouped by Account. Things to look for are:
- Unexpected events that change system security, such as the addition of accounts or services
- Event types that occur for only a single account, especially if there are a lot of event types only executed by a single account.
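The second check can be sketched as a small grouping step. The data and the function name here are hypothetical; the idea is simply to find event IDs that were logged by exactly one account.

```python
from collections import defaultdict

# Hypothetical (Account, EventID) pairs taken from the event list.
events = [
    ("CONTOSO\\alice", 4688),
    ("CONTOSO\\alice", 4624),
    ("CONTOSO\\bob", 4688),
    ("CONTOSO\\bob", 4720),  # a user account was created
    ("CONTOSO\\bob", 4697),  # a service was installed
    ("CONTOSO\\bob", 4720),
]


def single_account_events(event_rows):
    """Map each account to the sorted event IDs that only that account produced."""
    accounts_by_event = defaultdict(set)
    for account, event_id in event_rows:
        accounts_by_event[event_id].add(account)

    unique_events = defaultdict(list)
    for event_id, accounts in sorted(accounts_by_event.items()):
        if len(accounts) == 1:
            unique_events[next(iter(accounts))].append(event_id)
    return dict(unique_events)


print(single_account_events(events))
```

An account with several event IDs that no other account produced is a reasonable candidate for closer inspection.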
Parse Event Data for Selected Events
For events that you want to look at in more detail you can parse out the full EventData field (containing all fields of the original event). The parse_event_data function below does that, transforming the EventData XML into a dictionary of property/value pairs. The expand_event_properties function takes this dictionary and transforms it into columns in the output DataFrame.
You can do this for multiple event types in a single pass but, depending on the schema of each event, you may end up with a lot of sparsely populated columns. E.g. suppose EventID 1 has EventData fields A, B and C and EventID 2 has fields A, D and E. If you parse both IDs you will end up with a DataFrame with columns A, B, C, D and E, with contents populated only for the rows that have corresponding data.
We recommend that you process batches of related event types (e.g. all user account change events) that have similar sets of fields to keep the output DataFrame manageable.
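The sparse-column effect is easy to reproduce. The sketch below is not the notebook's parse_event_data or expand_event_properties; the two functions are hypothetical stand-ins that use plain dictionaries instead of a DataFrame, and the XML layout (`Data` elements with a `Name` attribute) is an assumption about the EventData format.

```python
import xml.etree.ElementTree as ET


def parse_event_data_xml(event_data):
    """Convert an EventData XML string into a {name: value} dictionary."""
    root = ET.fromstring(event_data)
    properties = {}
    for element in root.iter():
        # Strip any XML namespace, e.g. "{http://...}Data" -> "Data".
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "Data" and "Name" in element.attrib:
            properties[element.attrib["Name"]] = element.text or ""
    return properties


def expand_to_columns(rows):
    """Give every row the union of all property names; absent values become None."""
    columns = sorted({name for row in rows for name in row})
    return [{name: row.get(name) for name in columns} for row in rows]


event_1 = '<EventData><Data Name="A">1</Data><Data Name="B">2</Data><Data Name="C">3</Data></EventData>'
event_2 = '<EventData><Data Name="A">4</Data><Data Name="D">5</Data><Data Name="E">6</Data></EventData>'

table = expand_to_columns([parse_event_data_xml(event_1), parse_event_data_xml(event_2)])
for row in table:
    print(row)
```

Each row ends up with all five keys, and the fields that its event type does not define are empty, which is the sparsity described above.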
Account Change Events - Timeline
Here we want to focus on some specific subcategories of events. Attackers commonly try to add or change user accounts and group memberships. We also include events related to the addition or change of scheduled tasks and Windows services.
Show Details of Selected Events
From the above data - pick which event types you want to view (by default, all are selected). The second cell will display the event types selected.
Examine Logon Sessions
Looking at the characteristics and activity of individual logon sessions is an effective way of spotting clusters of attacker activity.
The biggest problem is deciding which logon sessions are the ones to look at. We may already have some indicators of sessions that we want to examine from earlier sections:
- Accounts that experienced a series of failed logons followed by successful logons (see Accounts With Failed And Successful Logons)
- Accounts that triggered unexpected events (see Other Security Events)
In this section we use clustering to collapse repetitive logons and show details of the distinct logon patterns.
Browse logon account details
View distinct host logon patterns
Analyze Processes Patterns for logon sessions
In this section we look at the types of processes run in each logon session. For each process (and process characteristics such as command line structure) we measure its rarity compared to other processes on the same host. We then calculate the mean rarity of all processes in a logon session and display the results ordered by rarity. One is the highest possible score and would indicate all processes in the session have a unique execution pattern.
Note: The next section retrieves processes for the time period around the logons for the user ID selected in the previous section. If you want to view a broader time range, adjust the query time boundaries below.
Compute the relative rarity of processes in each session
This should be a good guide to which sessions are the more interesting to look at.
Note: Clustering lots (1000s) of events will take a little time to compute.
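To make the scoring idea concrete, here is a simplified sketch. It swaps the notebook's clustering for exact matching on a process pattern string and defines rarity as 1 divided by the number of times that pattern appears on the host; both choices are assumptions made for illustration, and the data is made up.

```python
from collections import Counter, defaultdict

# Hypothetical (logon session ID, process pattern) pairs for one host.
process_events = [
    ("0x1a2b", "svchost.exe -k"),
    ("0x1a2b", "svchost.exe -k"),
    ("0x3c4d", "svchost.exe -k"),
    ("0x3c4d", "svchost.exe -k"),
    ("0x5e6f", "powershell.exe -enc"),
    ("0x5e6f", "certutil.exe -urlcache"),
]


def session_rarity(events):
    """Return (session, mean rarity) pairs, highest rarity first."""
    # A pattern seen once on the host has rarity 1.0; common patterns approach 0.
    pattern_counts = Counter(pattern for _, pattern in events)

    rarities = defaultdict(list)
    for session_id, pattern in events:
        rarities[session_id].append(1.0 / pattern_counts[pattern])

    scores = {sid: sum(vals) / len(vals) for sid, vals in rarities.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for session_id, score in session_rarity(process_events):
    print(session_id, round(score, 2))
```

With this definition a session scores 1 only when every process pattern in it is unique on the host, which matches the interpretation of the maximum score given earlier.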
Overview of session timelines for sessions with higher rarity score
View the processes for these Sessions
Browse All Sessions (Optional)
If the previous section did not reveal anything interesting you can opt to browse all logon sessions.
Otherwise, skip to the Check for IOCs in Commandline for selected session section.
To do this you need to first pick an account + logon type (in the following cell) then pick a particular session that you want to view in the subsequent cell. Use the rarity score from the previous graph to guide you.
Step 1 - Select a logon ID and Type
Step 2 - Pick a logon session to view its processes
Check for IOCs in Commandline for selected session
This section looks for Indicators of Compromise (IoC) within the data sets passed to it.
The input data for this comes from the session you picked in the following sections:
To change which section is used - go back to the desired section and pick a session or re-run the entire cell.
Extract IoCs
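As a rough illustration of pattern-based extraction, the sketch below searches a command line with a few regular expressions. The patterns, the function name and the sample command line are assumptions for this example; they are much simpler than a production extractor and will miss or over-match some cases.

```python
import re

# Simplified, assumed patterns for a few common IoC types.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(command_line):
    """Return {ioc_type: sorted unique matches} for the types that matched."""
    found = {}
    for ioc_type, pattern in IOC_PATTERNS.items():
        matches = sorted(set(pattern.findall(command_line)))
        if matches:
            found[ioc_type] = matches
    return found


sample = "cmd.exe /c certutil -urlcache -f http://203.0.113.10/tools/a.exe C:\\Temp\\a.exe"
print(extract_iocs(sample))
```

Note that the same IP address is reported both on its own and as part of the URL, so the results usually need de-duplicating or review before lookup.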
If there are any Base64-encoded strings, decode them and search for IoCs in the results.
For simple strings the Base64 decoded output is straightforward. However for nested encodings this can get a little complex and difficult to represent in a tabular format.
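A minimal sketch of nested decoding is shown below. It is not the notebook's decoder: the candidate pattern, the `<decoded>` markers and the function names are assumptions, and it only handles text that decodes as UTF-8.

```python
import base64
import binascii
import re

# Assumed heuristic: runs of 16+ Base64 characters with optional padding.
BASE64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_text(raw_bytes):
    """Decode bytes as UTF-8, or return None when that is not possible."""
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return None


def expand_base64(text, max_depth=5):
    """Replace each decodable Base64 substring with its decoded text, recursively."""
    if max_depth == 0:
        return text

    def replace(match):
        candidate = match.group(0)
        try:
            raw = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            return candidate  # not valid Base64, leave unchanged
        decoded = decode_text(raw)
        if decoded is None:
            return candidate  # binary content, leave unchanged
        return "<decoded>" + expand_base64(decoded, max_depth - 1) + "</decoded>"

    return BASE64_CANDIDATE.sub(replace, text)


# Build a doubly encoded sample so the nesting is visible.
inner = base64.b64encode(b"ping 203.0.113.10 -n 1").decode("ascii")
outer = base64.b64encode(("cmd /c echo " + inner).encode("utf-8")).decode("ascii")
print(expand_base64("run.exe " + outer))
```

The nested markers in the output show why a flat table struggles with this data: each decoded fragment can itself contain further encoded fragments.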
Columns
reference - The index of the row item in dotted notation in depth.seq pairs (e.g. 1.2.2.3 would be the 3rd item at depth 3 that is a child of the 2nd item found at depth 1). This may not always be an accurate notation. It is mainly used to allow you to associate an individual row with the reference value contained in the full_decoded_string column of the topmost item.
original_string - the original string before decoding.
file_name - filename, if any (only if this is an item in zip or tar file).
file_type - a guess at the file type (this is currently elementary and only includes a few file types).
input_bytes - the decoded bytes as a Python bytes string.
decoded_string - the decoded string if it can be decoded as a UTF-8 or UTF-16 string. Note: binary sequences may often successfully decode as UTF-16 strings but, in these cases, the decodings are meaningless.
encoding_type - encoding type (UTF-8 or UTF-16) if a decoding was possible, otherwise 'binary'.
file_hashes - collection of file hashes for any decoded item.
md5 - md5 hash as a separate column.
sha1 - sha1 hash as a separate column.
sha256 - sha256 hash as a separate column.
printable_bytes - printable version of input_bytes as a string of \xNN values
src_index - the index of the row in the input dataframe from which the data came.
full_decoded_string - the full decoded string with any decoded replacements. This is only really useful for top-level items, since nested items will only show the 'full' string representing the child fragment.
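Several of these columns can be derived from the decoded bytes alone. The following sketch shows one way to do that with the standard library; the function name is hypothetical and the encoding labels are Python codec names rather than the exact values the notebook emits.

```python
import hashlib


def describe_decoded_bytes(input_bytes):
    """Build a dictionary of column values for one decoded item."""
    decoded_string, encoding_type = None, "binary"
    # Try UTF-8 first, then UTF-16; fall back to 'binary' if neither works.
    for encoding in ("utf-8", "utf-16"):
        try:
            decoded_string = input_bytes.decode(encoding)
            encoding_type = encoding
            break
        except UnicodeDecodeError:
            continue

    return {
        "input_bytes": input_bytes,
        "decoded_string": decoded_string,
        "encoding_type": encoding_type,
        "md5": hashlib.md5(input_bytes).hexdigest(),
        "sha1": hashlib.sha1(input_bytes).hexdigest(),
        "sha256": hashlib.sha256(input_bytes).hexdigest(),
        # Every byte rendered as \xNN, including printable ones.
        "printable_bytes": "".join("\\x%02x" % byte for byte in input_bytes),
    }


row = describe_decoded_bytes("hi".encode("utf-16"))
print(row["encoding_type"], repr(row["decoded_string"]), row["printable_bytes"])
```

Because UTF-16 accepts many arbitrary byte sequences, a successful UTF-16 decode here does not by itself mean the content is text.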
Threat Intel Lookup
This section takes the output from the IoC Extraction from commandlines and submits it to Threat Intelligence services to see if any of the IoCs are known threats.
Please take a moment to review the Selected list below and remove any items that are obviously not items that you want to look up (e.g. myadmintool.ps is almost certainly a PowerShell script but is also a valid match for a legal DNS domain).
If you have not used msticpy threat intelligence lookups before you will need to supply API keys for the TI Providers that you want to use. See the documentation on Configuring TI Providers.
Then reload provider settings:
Network Check Communications with Other Hosts
Query Flows by Host IP Addresses
Flow Summary
Choose ASNs/IPs to Check for Threat Intel Reports
Choose from the list of Selected ASNs for the IPs you wish to check on. The Source list has been pre-populated with all ASNs found in the network flow summary.
As an example, we've populated the Selected list with the ASNs that have the lowest number of flows to and from the host. We also remove the ASN that matches the ASN of the host we are investigating.
Please edit this list, using the flow summary data above as a guide and leaving only ASNs that you are suspicious about. Typically these would be ones with relatively low TotalAllowedFlows and possibly with unusual L7Protocols.
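The default selection described above (lowest flow counts, excluding the host's own ASN) can be sketched as follows. The rows, ASN names and function name are hypothetical; only the TotalAllowedFlows column name comes from the flow summary.

```python
# Hypothetical flow summary rows; ASN names are placeholders.
flow_summary = [
    {"ASN": "EXAMPLE-HOST-AS", "TotalAllowedFlows": 5000},
    {"ASN": "EXAMPLE-ISP-AS", "TotalAllowedFlows": 3},
    {"ASN": "EXAMPLE-HOSTING-AS", "TotalAllowedFlows": 1},
    {"ASN": "EXAMPLE-CDN-AS", "TotalAllowedFlows": 800},
    {"ASN": "EXAMPLE-ISP-AS", "TotalAllowedFlows": 2},
]


def least_seen_asns(rows, host_asn, limit=2):
    """Return up to `limit` ASNs with the fewest allowed flows, excluding host_asn."""
    totals = {}
    for row in rows:
        totals[row["ASN"]] = totals.get(row["ASN"], 0) + row["TotalAllowedFlows"]
    totals.pop(host_asn, None)  # the host's own ASN is not interesting here

    # Sort by flow count, then by name so ties are ordered predictably.
    ranked = sorted(totals.items(), key=lambda item: (item[1], item[0]))
    return [asn for asn, _ in ranked[:limit]]


print(least_seen_asns(flow_summary, host_asn="EXAMPLE-HOST-AS"))
```

A low flow count is only a starting heuristic, so the resulting list still benefits from the manual review recommended above.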
If you have not used msticpy threat intelligence lookups before you will need to supply API keys for the TI Providers that you want to use. Please see the section on Configuring TI Providers.
Then reload provider settings:
GeoIP Map of External IPs
Appendices
Available DataFrames
Saving Data to Excel
To save the contents of a pandas DataFrame to an Excel spreadsheet use the following syntax
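One way to do this with current pandas versions is shown below. The helper name, sheet name and sample DataFrame are illustrative assumptions, and writing .xlsx files requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd


def save_frames_to_excel(frames, path):
    """Write each DataFrame in `frames` (sheet name -> DataFrame) to one workbook."""
    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Hypothetical example data.
logons = pd.DataFrame({"Account": ["CONTOSO\\alice"], "LogonType": [3]})
```

For example, `save_frames_to_excel({"Logons": logons}, "host_investigation.xlsx")` would write the sample DataFrame to a single sheet. For a single DataFrame, `logons.to_excel("host_investigation.xlsx")` is enough.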
Configuration
msticpyconfig.yaml configuration File
You can configure primary and secondary TI providers and any required parameters in the msticpyconfig.yaml file. This is read from the current directory or you can set an environment variable (MSTICPYCONFIG) pointing to its location.
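The lookup of the file location can be sketched as below. This is not msticpy's own resolution code: the function name is hypothetical, and checking the environment variable before the current directory is an assumed order.

```python
import os
from pathlib import Path


def find_msticpy_config(current_dir="."):
    """Return the path to msticpyconfig.yaml, or None if it cannot be found."""
    # Assumed precedence: MSTICPYCONFIG first, then the current directory.
    env_path = os.environ.get("MSTICPYCONFIG")
    if env_path and Path(env_path).is_file():
        return Path(env_path)

    local_path = Path(current_dir) / "msticpyconfig.yaml"
    if local_path.is_file():
        return local_path
    return None


print(find_msticpy_config())
```

A result of `None` suggests that the file has not been created yet or that MSTICPYCONFIG points to a location that does not exist.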
To configure this file see the ConfiguringNotebookEnvironment notebook.