msticpy - ProcessTree
This notebook demonstrates the use of the process tree data and visualization modules. These modules can be used with either Windows process creation events (ID 4688) or Linux auditd logs.
You must have msticpy installed to run this notebook (for example, with pip install msticpy).
There are two main components:
Process Tree creation - this takes a standard log from a single host and builds the parent-child relationships between processes in the data set. There is also a set of utility functions to extract individual and partial trees from the processed data set.
Process Tree visualization - this takes the processed output from the previous component and displays the process tree using Bokeh plots.
Note: The expected schema for the Linux audit data is the one produced by the auditdextract.py module in msticpy. This module combines the related process exec messages into a single message that emulates the Windows 4688 event. The combined message retains the audit schema, apart from the following additions:
cmdline: a concatenation of the a0, a1, etc. argument fields.
EventType: the audit message type (SYSCALL, EXECVE, CWD, etc.). The combined SYSCALL_EXECVE type created by auditdextract is the only type currently supported.
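To make the cmdline field concrete, here is a minimal plain-Python sketch of joining the a0, a1, ... argument fields into one command line. It is not the auditdextract implementation. build_cmdline and the sample record are hypothetical. The sketch also assumes the argument values are already plain strings; real auditd output hex-encodes arguments that contain spaces or other special characters, and this code does not decode them.

```python
import re


def build_cmdline(audit_fields: dict) -> str:
    """Join EXECVE argument fields (a0, a1, ...) into a single command line.

    Keys are sorted by their numeric suffix so that a10 follows a9 rather
    than a1; non-argument fields such as argc are ignored.
    """
    arg_keys = [key for key in audit_fields if re.fullmatch(r"a\d+", key)]
    arg_keys.sort(key=lambda key: int(key[1:]))
    return " ".join(str(audit_fields[key]) for key in arg_keys)


# Hypothetical EXECVE record fields
execve = {"argc": 3, "a0": "ls", "a1": "-la", "a2": "/tmp"}
print(build_cmdline(execve))  # ls -la /tmp
```

Sorting numerically matters for long command lines: a plain string sort would place a10 between a1 and a2.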
Support for other formats, such as Sysmon and Microsoft Defender, is planned but not yet included.
Extracting Process Trees from logs
The input can be either Windows 4688 events or Linux audit events (with the above caveats).
Import libraries and read in test data. Then call build_process_tree to extract the parent-child relationships between processes.
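The core problem that build_process_tree solves is matching each process to its parent, given that operating systems reuse PIDs. The following is a simplified plain-Python sketch of one way to do this, not msticpy's implementation. It assumes hypothetical 4688-style records from a single host, with fields name, pid, parent_pid and time. Each process is linked to the most recent earlier process that had the parent's PID.

```python
from datetime import datetime


def build_parent_links(events):
    """Map each process key to its parent's key (None if not in the data).

    A key is (process name, PID, start time), because PIDs are reused:
    the parent is the most recent process with the parent PID that started
    before the child.
    """
    latest_by_pid = {}  # PID -> key of the latest process seen with that PID
    links = {}
    for event in sorted(events, key=lambda e: e["time"]):
        key = (event["name"], event["pid"], event["time"])
        # Events are handled in time order, so the latest entry for the
        # parent PID is the process that was running when this one started.
        links[key] = latest_by_pid.get(event["parent_pid"])
        latest_by_pid[event["pid"]] = key
    return links


# Hypothetical events from one host; PID 0x100 is reused by notepad.exe.
events = [
    {"name": "cmd.exe", "pid": 0x100, "parent_pid": 0x4, "time": datetime(2024, 1, 1, 9, 0, 0)},
    {"name": "whoami.exe", "pid": 0x200, "parent_pid": 0x100, "time": datetime(2024, 1, 1, 9, 0, 5)},
    {"name": "notepad.exe", "pid": 0x100, "parent_pid": 0x4, "time": datetime(2024, 1, 1, 9, 5, 0)},
    {"name": "calc.exe", "pid": 0x300, "parent_pid": 0x100, "time": datetime(2024, 1, 1, 9, 6, 0)},
]
for child, parent in build_parent_links(events).items():
    print(child[0], "<-", parent[0] if parent else None)
```

Here calc.exe is linked to notepad.exe rather than cmd.exe, even though both used PID 0x100. This is why the start time has to be part of the key.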
Process Tree utils module
The module is imported as follows:
or explicitly
The module contains functions for building the parent-child relations as well as a number of utility functions for manipulating and extracting the trees. Most of these are described in the later section Process Tree utility functions.
Plotting a Process Tree
Plotting Syntax
nbdisplay.plot_process_tree(data, schema=None, output_var=None, legend_col=None, show_table=False)
Parameter descriptions
data : pd.DataFrame
DataFrame containing one or more Process Trees. This should be the output of build_process_tree, described above.
schema : ProcSchema, optional
The data schema to use for the data set, by default None. A schema object maps generic field names (e.g. process_name) on to data-specific names (e.g. exe in the case of Linux audit data). This is usually not required: if it is None, the function infers the schema from the fields in the input DataFrame.
output_var : str, optional
Output variable for selected items in the tree, by default None. Setting this lets you return the keys of any items selected in the Bokeh plot. For example, if you supply the string "my_results" and then select one or more processes in the tree, the Python variable my_results will be populated with a list of keys (index items) of the corresponding rows in the input DataFrame.
legend_col : str, optional
The column used to color the tree items, by default None. If this column contains strings, the values are treated as categorical data: each unique value is mapped to a different color, and a legend of the mapping is displayed. If this column contains numeric or datetime values, the values are treated as continuous, and a color gradient bar is displayed showing how values map onto the gradient.
show_table : bool, optional
Set to True to show the data table, by default False. Shows the source values as a data table beneath the process tree.
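The legend_col rules above distinguish categorical values from continuous ones. Here is a plain-Python sketch of that distinction. It is not how msticpy or Bokeh actually map colors. map_colors and its four-color palette are hypothetical. Categorical values get palette entries, while numbers and datetimes become positions between 0.0 and 1.0 on a gradient.

```python
from datetime import datetime, timezone


def map_colors(values, palette=("red", "green", "blue", "orange")):
    """Assign a color (categorical) or gradient position (continuous).

    Strings: each unique value gets its own palette entry, cycling if
    there are more values than colors. Numbers and datetimes: each value
    becomes a 0.0-1.0 position between the column's minimum and maximum.
    """
    if all(isinstance(v, str) for v in values):
        categories = list(dict.fromkeys(values))  # unique values, in order
        lookup = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
        return [lookup[v] for v in values]
    nums = [v.timestamp() if isinstance(v, datetime) else float(v) for v in values]
    low, high = min(nums), max(nums)
    span = (high - low) or 1.0  # avoid dividing by zero when all values match
    return [(n - low) / span for n in nums]


print(map_colors(["bash", "sshd", "bash"]))  # ['red', 'green', 'red']
print(map_colors([datetime(2024, 1, d, tzinfo=timezone.utc) for d in (1, 2, 3)]))  # [0.0, 0.5, 1.0]
```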
Caveats
Large data sets (more than a few hundred processes)
The Bokeh plot normally handles these well (up to several tens of thousands of processes or more), but navigating the tree becomes difficult. In particular, the range tool (on the right of the main plot) will be hard to manipulate. Split the input data into smaller chunks before plotting.
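When splitting a data set, it helps to keep each tree whole so that no parent-child link is cut. One way to do this is sketched here in plain Python. It groups processes by the root of their tree, then packs whole trees into chunks. The sketch works on a hypothetical dict that maps each process key to its parent key, rather than on the DataFrame msticpy produces. The names split_by_tree and max_chunk are hypothetical.

```python
def split_by_tree(parents, max_chunk=500):
    """Pack whole process trees into chunks of roughly max_chunk processes.

    parents maps process key -> parent key (None or an absent key means
    the process is a root). A tree larger than max_chunk gets its own chunk.
    """
    def root_of(key):
        # Climb while the parent is present in the data set.
        while parents.get(key) in parents:
            key = parents[key]
        return key

    trees = {}
    for key in parents:
        trees.setdefault(root_of(key), []).append(key)

    chunks, current = [], []
    for members in trees.values():
        if current and len(current) + len(members) > max_chunk:
            chunks.append(current)
            current = []
        current.extend(members)
    if current:
        chunks.append(current)
    return chunks


parents = {"explorer": None, "cmd": "explorer", "whoami": "cmd",
           "services": None, "svchost": "services"}
print(split_by_tree(parents, max_chunk=3))
```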
Font Size
The font size does not scale based on how much data is shown. If you use the range tool to select too large a subset of the data in the main plot, the font will become unreadable. If this happens, use the reset tool to set the plot back to its defaults.
Linux Process Tree
Note: This assumes that the Linux audit log has been read from a file using msticpy.sectools.auditdextract.read_from_file(), or read from Microsoft Sentinel/Log Analytics using the LinuxAudit.auditd_all query and processed using the msticpy.sectools.auditdextract.extract_events_to_df() function.
Using either of these, the process exec events related to a single process start are merged into a single row.
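Audit records that belong to one event share the same audit(timestamp:serial) identifier, which is what makes this merge possible. Below is a minimal standard-library sketch of the grouping step, assuming plain-text auditd log lines. merge_audit_events is a hypothetical helper, not part of auditdextract. It does not decode hex-encoded values or handle node= prefixes. Fields are kept per record type, because both SYSCALL and EXECVE records use a0, a1, ... field names.

```python
import re
from collections import defaultdict

# Record type plus the shared "audit(<time>:<serial>)" event identifier.
HEADER = re.compile(r"type=(\w+) msg=audit\(([\d.]+):(\d+)\):")
# key=value pairs; values may be double-quoted.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def merge_audit_events(lines):
    """Group audit records by event ID: {(time, serial): {type: fields}}."""
    events = defaultdict(dict)
    for line in lines:
        header = HEADER.match(line)
        if header is None:
            continue  # not an audit record line
        rec_type, timestamp, serial = header.groups()
        fields = {k: v.strip('"') for k, v in FIELD.findall(line[header.end():])}
        events[(timestamp, serial)][rec_type] = fields
    return dict(events)


log = [  # hypothetical records for a single exec event
    'type=SYSCALL msg=audit(1700000000.123:42): syscall=59 pid=4242 ppid=4000 exe="/usr/bin/ls"',
    'type=EXECVE msg=audit(1700000000.123:42): argc=2 a0="ls" a1="-la"',
    'type=CWD msg=audit(1700000000.123:42): cwd="/tmp"',
]
event = merge_audit_events(log)[("1700000000.123", "42")]
print(sorted(event))  # ['CWD', 'EXECVE', 'SYSCALL']
print(event["SYSCALL"]["exe"], event["EXECVE"]["a1"])  # /usr/bin/ls -la
```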
Plotting Using a Color Gradient
Process Tree utility functions
The process_tree_utils module has a number of functions that may be useful in extracting or manipulating process trees or tree relationships.
Functions
build_process_key
build_process_tree
get_ancestors
get_children
get_descendents
get_parent
get_process
get_process_key
get_root
get_root_tree
get_roots
get_siblings
get_summary_info
get_tree_depth
infer_schema
get_summary_info
Get summary information.
get_roots
Get roots of all trees in the data set.
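Conceptually, a root is a process whose parent is not present in the data set. Here is a plain-Python sketch that uses a toy parent map. The sketch borrows msticpy's function name for readability, but the signature is hypothetical: it takes a dict, not msticpy's DataFrame.

```python
# Hypothetical toy data: process key -> parent key (None = no known parent).
PARENTS = {
    "explorer.exe": None,
    "cmd.exe": "explorer.exe",
    "powershell.exe": "cmd.exe",
    "whoami.exe": "powershell.exe",
    "notepad.exe": "explorer.exe",
    "svchost.exe": "services.exe",  # parent is not in the data set
}


def get_roots(parents):
    """Return every process whose parent is absent from the data set."""
    return [key for key, parent in parents.items() if parent not in parents]


print(get_roots(PARENTS))  # ['explorer.exe', 'svchost.exe']
```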
get_descendents
Get the full tree beneath a process.
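One way to collect the full tree beneath a process is a breadth-first walk over child links. The plain-Python sketch below uses a hypothetical toy parent map and hypothetical signatures, not msticpy's. It finds children by scanning the whole map, which is O(n) per lookup and fine for small data.

```python
from collections import deque

# Hypothetical toy data: process key -> parent key.
PARENTS = {
    "explorer.exe": None,
    "cmd.exe": "explorer.exe",
    "powershell.exe": "cmd.exe",
    "whoami.exe": "powershell.exe",
    "notepad.exe": "explorer.exe",
}


def children_of(parents, key):
    """Immediate children of key, by scanning the parent map."""
    return [child for child, parent in parents.items() if parent == key]


def get_descendents(parents, key):
    """Breadth-first walk collecting every process beneath key."""
    found = []
    queue = deque(children_of(parents, key))
    while queue:
        child = queue.popleft()
        found.append(child)
        queue.extend(children_of(parents, child))
    return found


print(get_descendents(PARENTS, "explorer.exe"))
```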
get_children
Get the immediate children of a process.
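If children are looked up repeatedly, it can be cheaper to invert the parent links once into a parent-to-children index. The following is a plain-Python sketch of that design choice over a hypothetical toy parent map. It is not how msticpy stores its trees.

```python
from collections import defaultdict

# Hypothetical toy data: process key -> parent key.
PARENTS = {
    "explorer.exe": None,
    "cmd.exe": "explorer.exe",
    "powershell.exe": "cmd.exe",
    "whoami.exe": "powershell.exe",
    "notepad.exe": "explorer.exe",
}


def build_child_index(parents):
    """Invert the parent map once so each children lookup is a dict access."""
    index = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            index[parent].append(child)
    return index


CHILDREN = build_child_index(PARENTS)
print(CHILDREN["explorer.exe"])        # ['cmd.exe', 'notepad.exe']
print(CHILDREN.get("whoami.exe", []))  # [] (.get avoids inserting a key)
```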
get_tree_depth
Get the depth of a tree.
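Depth can be computed by counting the ancestors above each process. This plain-Python sketch counts levels, so a lone root has depth 1. That convention is an assumption, and msticpy's may differ. The toy data and function signatures are hypothetical.

```python
# Hypothetical toy data: process key -> parent key.
PARENTS = {
    "explorer.exe": None,
    "cmd.exe": "explorer.exe",
    "powershell.exe": "cmd.exe",
    "whoami.exe": "powershell.exe",
    "notepad.exe": "explorer.exe",
}


def get_level(parents, key):
    """Number of ancestors above key that are in the data (a root is 0)."""
    level = 0
    while parents.get(key) in parents:
        key = parents[key]
        level += 1
    return level


def get_tree_depth(parents):
    """Number of levels in the deepest tree (0 for an empty data set)."""
    return max((get_level(parents, key) for key in parents), default=-1) + 1


print(get_tree_depth(PARENTS))  # 4
```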
get_parent and get_ancestors
Get the parent process or all ancestors.
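Finding ancestors is a repeated parent lookup that stops when the parent is not in the data set. Here is a plain-Python sketch over a hypothetical toy parent map. The names mirror msticpy's, but the signatures are hypothetical.

```python
# Hypothetical toy data: process key -> parent key.
PARENTS = {
    "explorer.exe": None,
    "cmd.exe": "explorer.exe",
    "powershell.exe": "cmd.exe",
    "whoami.exe": "powershell.exe",
    "svchost.exe": "services.exe",  # parent is not in the data set
}


def get_parent(parents, key):
    """Return the parent key, or None if the parent is not in the data."""
    parent = parents.get(key)
    return parent if parent in parents else None


def get_ancestors(parents, key):
    """Walk upward from key, nearest ancestor first."""
    ancestors = []
    parent = get_parent(parents, key)
    while parent is not None:
        ancestors.append(parent)
        parent = get_parent(parents, parent)
    return ancestors


print(get_ancestors(PARENTS, "whoami.exe"))  # ['powershell.exe', 'cmd.exe', 'explorer.exe']
```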
get_process and build_process_key
Get a process record by its key. Build a key from a process object (pandas Series).
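A process key has to stay unique even when a PID is reused, so it typically combines more than the PID. The following sketch assumes a hypothetical "name|pid|start time" format and a dict-based index. msticpy's actual key format and lookup may differ.

```python
from datetime import datetime


def build_process_key(proc):
    """Hypothetical key format: name|pid|start time.

    The start time distinguishes two processes that reused the same PID.
    """
    return f"{proc['name'].lower()}|{proc['pid']:#x}|{proc['time'].isoformat()}"


def get_process(index, key):
    """Look up a process record by key; None if it is not present."""
    return index.get(key)


# Hypothetical records: the same PID is used by two different processes.
PROCS = [
    {"name": "cmd.exe", "pid": 0x100, "time": datetime(2024, 1, 1, 9, 0, 0)},
    {"name": "notepad.exe", "pid": 0x100, "time": datetime(2024, 1, 1, 9, 5, 0)},
]
BY_KEY = {build_process_key(p): p for p in PROCS}

key = build_process_key(PROCS[0])
print(key)  # cmd.exe|0x100|2024-01-01T09:00:00
print(get_process(BY_KEY, key)["name"])  # cmd.exe
```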
get_siblings
Get the siblings of a process.
Some functions take an include_source parameter. Setting this to True returns the source process with the result set.
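Siblings are the other processes that share a process's parent. This plain-Python sketch also shows how an include_source flag can add the source process to the result. The toy data, the signature, and the decision to put the source first are all hypothetical and may not match msticpy's behavior. In this sketch, roots have no siblings.

```python
# Hypothetical toy data: process key -> parent key.
PARENTS = {
    "explorer.exe": None,
    "cmd.exe": "explorer.exe",
    "notepad.exe": "explorer.exe",
    "powershell.exe": "cmd.exe",
}


def get_siblings(parents, key, include_source=False):
    """Other processes with the same parent; optionally include key itself."""
    parent = parents.get(key)
    if parent not in parents:  # a root has no siblings in this sketch
        siblings = []
    else:
        siblings = [p for p, par in parents.items() if par == parent and p != key]
    return [key, *siblings] if include_source else siblings


print(get_siblings(PARENTS, "cmd.exe"))                       # ['notepad.exe']
print(get_siblings(PARENTS, "cmd.exe", include_source=True))  # ['cmd.exe', 'notepad.exe']
```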