Table of Contents
- msticpy - Time Series Analysis and anomalies Visualization
- Generating Time Series Data
- Time Series Analysis and discovering Anomalies
- Time Series Anomalies Visualization
msticpy - Time Series Analysis and anomalies Visualization
This notebook demonstrates time series analysis and anomaly visualization, using both the Bokeh library and built-in native KQL operators.
You must have msticpy installed along with the timeseries dependencies to run this notebook.
To run the Microsoft Sentinel time series queries you will also need the "azsentinel" dependencies.
Time series analysis generally involves the following steps:
- Generate time series data
- Use time series analysis functions to discover anomalies
- Visualize the time series anomalies
For more detail on time series analysis, see the reference Microsoft Tech Community blog posts.
Reference Blog Posts:
Generating Time Series Data
A time series is a series of data points indexed (or listed or graphed) in time order.
The data points are often discrete numeric values, such as counts of occurrences aggregated against a timestamp column of the dataset.
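As an illustration of the definition above, the following is a minimal sketch that turns raw events into an hourly count series with pandas. The event data and the column names (TimeGenerated, Account, EventCount) are assumptions made up for the example, not data from this notebook.

```python
import pandas as pd

# Hypothetical raw events: one row per logon, each with a timestamp.
events = pd.DataFrame(
    {
        "TimeGenerated": pd.to_datetime(
            [
                "2024-01-01 00:05",
                "2024-01-01 00:40",
                "2024-01-01 01:15",
                "2024-01-01 03:20",
                "2024-01-01 03:25",
                "2024-01-01 03:59",
            ]
        ),
        "Account": ["alice", "bob", "alice", "carol", "bob", "alice"],
    }
)

# Count events per hourly bin. Hours with no events are kept with a count
# of 0, so the resulting series is evenly spaced in time.
hourly_counts = (
    events.set_index("TimeGenerated")
    .resample(pd.Timedelta(hours=1))
    .size()
    .rename("EventCount")
)
print(hourly_counts)
```

Keeping the empty hour as a zero rather than dropping it matters for later steps, because decomposition methods generally expect regularly spaced observations.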
Using LogAnalytics Query Provider
msticpy provides a QueryProvider class through which you can connect to the LogAnalytics data environment, via QueryProvider(data_environment="LogAnalytics").
Once you are connected to the data environment (qry_prov.connect()), you can list the queries available for it (qry_prov.list_queries()); in this case the environment is LogAnalytics.
Displaying available timeseries queries
For this notebook, we are interested in time series queries only, so we will filter and display only those.
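The filtering step can be sketched in plain Python. The list below is a stand-in for whatever qry_prov.list_queries() returns in your environment: the three time series query names are the ones mentioned in this notebook, while the other two entries are hypothetical and exist only to show the filter doing something.

```python
# Stand-in for the result of qry_prov.list_queries(). The "Example.*"
# names are hypothetical; the MultiDataSource names appear in this notebook.
available_queries = [
    "Example.list_logons",  # hypothetical
    "MultiDataSource.get_timeseries_alerts",
    "MultiDataSource.get_timeseries_data",
    "MultiDataSource.get_timeseries_decompose",
    "Example.list_alerts",  # hypothetical
]

# Keep only the queries whose name mentions "timeseries".
timeseries_queries = [q for q in available_queries if "timeseries" in q.lower()]

for name in timeseries_queries:
    print(name)
```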
Get TimeSeries Data from LogAnalytics Table
You can get more details about an individual query by executing qry_prov.MultiDataSource.get_timeseries_data('?'), which displays the query description, data source, parameters and the parameterized raw KQL query.
Sample Python code using the KQL query looks like this:
Time Series Analysis and discovering Anomalies
By analyzing time series data over an extended period, we can identify time-based patterns (e.g. seasonality, trend etc.) in the data and extract meaningful statistics which can help in flagging outliers. A particular example in a security context is user logon patterns over a period of time exhibiting different behavior after hours and on weekends: computing deviations from these changing patterns is rather difficult in traditional atomic detections with static thresholds. KQL built-in functions can automatically identify such seasonality and trend from the input data and take it into consideration when flagging anomalies.
Using Built-in KQL to generate TimeSeries decomposition
In this case, we will use the built-in KQL function series_decompose() to decompose the time series and generate additional data points such as baseline, seasonal and trend components.
KQL Reference Documentation:
You can use the available query qry_prov.MultiDataSource.get_timeseries_decompose() to get similar details.
Sample Python code using the KQL query looks like this:
Using MSTICPY - Seasonal-Trend decomposition using LOESS (STL)
In this case, we will use the msticpy function timeseries_anomalies_stl, which leverages the STL method from the statsmodels API to decompose a time series into three components: trend, seasonal and residual. STL uses LOESS (locally estimated scatterplot smoothing) to extract smooth estimates of the three components. The key inputs into STL are:
- season - The length of the seasonal smoother. Must be odd.
- trend - The length of the trend smoother, usually around 150% of season. Must be odd and larger than season.
- low_pass - The length of the low-pass estimation window, usually the smallest odd number larger than the periodicity of the data.
More info : https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html#statsmodels.tsa.seasonal.STL
Documentation of timeseries_anomalies_stl function
timeseries_anomalies_stl(data: pandas.core.frame.DataFrame, **kwargs) -> pandas.core.frame.DataFrame
Discover anomalies in Timeseries data using STL (Seasonal-Trend Decomposition using LOESS) method using statsmodels package.
Discover anomalies using timeseries_anomalies_stl function
We will run the msticpy function timeseries_anomalies_stl on the input data to discover anomalies.
Displaying Anomalies using STL
We will keep only the anomalous rows (those with value 1 in the anomalies column) of the dataframe returned by the msticpy function timeseries_anomalies_stl.
Read From External Sources
If you have time series data in other locations, you can read it with pandas or with the API of the data store that holds it. The pandas I/O API is a set of top-level reader functions, such as pandas.read_csv(), that generally return a pandas object.
Read More at Pandas Documentation:
The following example uses pandas read_csv to read a local CSV file containing a time series demo dataset. Additional columns in the CSV, such as baseline, score and anomalies, were generated using built-in KQL time series functions such as series_decompose_anomalies().
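A minimal sketch of such a read is shown below. Because the demo file itself is not reproduced here, an inline string stands in for it. The TimeGenerated and TotalBytesSent column names and all of the values are assumptions; only baseline, score and anomalies are named in the text.

```python
import io

import pandas as pd

# Inline stand-in for a local CSV file. Column names other than baseline,
# score and anomalies are assumed, and the values are made up.
csv_text = """TimeGenerated,TotalBytesSent,baseline,score,anomalies
2024-01-01 00:00:00,1200,1150,0.4,0
2024-01-01 01:00:00,1180,1160,0.2,0
2024-01-01 02:00:00,9800,1170,7.9,1
2024-01-01 03:00:00,1210,1165,0.3,0
"""

# For a real file, pass its path instead of the StringIO object.
# parse_dates converts the timestamp column so it can serve as a time index.
ts_df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["TimeGenerated"],
    index_col="TimeGenerated",
)
print(ts_df.dtypes)
print(ts_df.head())
```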
Displaying Anomalies Separately
We will filter only the anomalies shown in the above plot and display them below along with the associated aggregated hourly time window. You can later query within those time windows for additional alerts or other suspicious activity from other data sources.
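One way to derive those time windows is sketched below, assuming hourly aggregation so that each anomalous row covers the hour starting at its timestamp. The dataframe, its column names and the WindowStart and WindowEnd columns are hypothetical names introduced for the example.

```python
import pandas as pd

# Hypothetical output of an anomaly detection step, aggregated hourly.
ts_df = pd.DataFrame(
    {
        "TimeGenerated": pd.date_range(
            "2024-01-01", periods=5, freq=pd.Timedelta(hours=1)
        ),
        "Total": [120, 118, 980, 121, 640],
        "anomalies": [0, 0, 1, 0, 1],
    }
)

# Keep only the rows flagged as anomalous.
anomalies = ts_df.loc[ts_df["anomalies"] == 1, ["TimeGenerated", "Total"]].copy()

# Each row covers one hourly bin, so the window ends one hour after it starts.
anomalies["WindowStart"] = anomalies["TimeGenerated"]
anomalies["WindowEnd"] = anomalies["TimeGenerated"] + pd.Timedelta(hours=1)

for row in anomalies.itertuples(index=False):
    print(f"{row.WindowStart} to {row.WindowEnd}: Total={row.Total}")
```

The start and end values can then be passed as the time range of a follow-up query against other data sources.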
Displaying Time Series anomaly alerts
You can also use series_decompose_anomalies(), which runs anomaly detection based on series decomposition. It takes an expression containing a series (a dynamic numerical array) as input and extracts anomalous points with scores.
KQL Reference Documentation:
You can use the available query qry_prov.MultiDataSource.get_timeseries_alerts() to get similar details.
Sample Python code using the KQL query looks like this:
Time Series Anomalies Visualization
Once time series anomalies are discovered, you can visualize them on a line chart that highlights the outliers. Below we look at two ways to do this: the msticpy function display_timeseries_anomalies(), which uses the Bokeh library, and the built-in KQL render operator.
Using Bokeh Visualization Library
Documentation for display_timeseries_anomalies
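To show the general idea behind this kind of plot, the following sketch uses Bokeh directly: a line for the observed values with markers over the anomalous points. It is not the msticpy display_timeseries_anomalies function and does not reproduce its features. The data and column names are assumptions.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical hourly series with an anomalies flag column.
data = pd.DataFrame(
    {
        "TimeGenerated": pd.date_range(
            "2024-01-01", periods=6, freq=pd.Timedelta(hours=1)
        ),
        "Total": [120, 118, 980, 121, 125, 640],
        "anomalies": [0, 0, 1, 0, 0, 1],
    }
)
outliers = data[data["anomalies"] == 1]

plot = figure(x_axis_type="datetime", title="Time series with anomalies")

# Line for every observation.
plot.line(
    x="TimeGenerated",
    y="Total",
    source=ColumnDataSource(data),
    legend_label="observed",
)

# Red markers only on the rows flagged as anomalies.
plot.scatter(
    x="TimeGenerated",
    y="Total",
    source=ColumnDataSource(outliers),
    color="red",
    size=10,
    legend_label="anomalies",
)

# To display the plot: from bokeh.plotting import show; show(plot)
```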
Exporting Plots as PNGs
To use bokeh.io image export functions you need selenium, phantomjs and pillow installed:
conda install -c bokeh selenium phantomjs pillow
or
pip install selenium pillow
npm install -g phantomjs-prebuilt
For phantomjs see https://phantomjs.org/download.html.
Once the prerequisites are installed, you can create a plot and save the return value to a variable, then export the plot using the export_png function.
Sample code to export png
Using Built-in KQL render operator
The render operator instructs the user agent to render the results of the query in a particular way. In this case, we use timechart, which displays a line graph.
KQL Reference Documentation:
Sample Python code with a KQL query using the render operator on time series data looks like this:
The rendered output for the above code looks like the image below.