Path: blob/master/tutorials-and-examples/feature-tutorials/Data_Queries.ipynb
Title: msticpy - Data
Description:
This package provides functions for defining data sources, connectors to them, and queries for them, as well as the ability to call these elements to return query results from the defined data sources. The package currently supports connections to Log Analytics/Microsoft Sentinel/Azure Security Center and the Microsoft Security Graph.
The first step in using this package is to install the msticpy package.
Instantiating a Query Provider
In order to connect to and query a data source we need to define what sort of Data Environment we want to connect to and query (in this Notebook we will use Log Analytics as an example). To view the options available you can call QueryProvider.list_data_environments() which will return a list of all the available options.
After selecting a Data Environment we can initialize our Query Provider by calling QueryProvider(DATA_ENVIRONMENT). This will load the relevant driver for connecting to the data environment we have selected, provision a query store for us, and add queries from our default query directory.
There are two other optional parameters we can pass when initializing our Query Provider to further customize it:
We can choose to initialize our Query Provider with a driver other than the default one with
QueryProvider(data_environment=DATA_ENVIRONMENT, driver=QUERY_DRIVER)
We can choose to import queries from a custom query directory (see Creating a new set of queries for more details) with
QueryProvider(data_environment=DATA_ENVIRONMENT, driver=QUERY_DRIVER, query_path=QUERY_DIRECTORY_PATH)
For now we will simply create a Query Provider with default values.
Connecting to a Data Environment
Once we have instantiated the query provider and loaded the relevant driver we can connect to the Data Environment. This is done by calling the connect() function of the Query Provider we just initialized and passing it a connection string to use.
For Log Analytics/Microsoft Sentinel the connection string is in the format of loganalytics://code().tenant("TENANT_ID").workspace("WORKSPACE_ID"). Other Data Environments will have different connection string formats.
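As a minimal sketch, the connection string format above can be assembled from the two IDs with a small helper. The function name build_la_connection_string is hypothetical and not part of msticpy; only the string format comes from the text above.

```python
def build_la_connection_string(tenant_id: str, workspace_id: str) -> str:
    """Build a Log Analytics connection string in the format shown above.

    Hypothetical helper for illustration; it is not part of msticpy.
    """
    if not tenant_id or not workspace_id:
        raise ValueError("Both tenant_id and workspace_id are required")
    return (
        f'loganalytics://code().tenant("{tenant_id}")'
        f'.workspace("{workspace_id}")'
    )


# Placeholder IDs, not real values
conn_str = build_la_connection_string("TENANT_ID", "WORKSPACE_ID")
print(conn_str)
```

The resulting string is what we would then pass to the connect() function of our Query Provider.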
Reviewing available queries
Upon connecting to the relevant Data Environment we need to look at what query options we have available to us. In order to do this we can call QUERY_PROVIDER.list_queries(). This will return a generator with the names of all the queries in our store.
The results returned show the data family the query belongs to and the name of the specific query.
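To get an overview of a long list of queries it can help to group the names by data family. The sketch below assumes each name takes the dotted form DATA_FAMILY.QUERY_NAME; the sample names are illustrative stand-ins for whatever list_queries() returns in your environment, and group_by_family is a hypothetical helper.

```python
from collections import defaultdict


def group_by_family(query_names):
    """Group dotted 'DATA_FAMILY.QUERY_NAME' strings by data family.

    Accepts any iterable of names, including a generator.
    """
    grouped = defaultdict(list)
    for full_name in query_names:
        # Split on the first dot only: family on the left, query on the right
        family, _, query = full_name.partition(".")
        grouped[family].append(query)
    return dict(grouped)


# Illustrative names only; substitute the output of list_queries()
sample_names = [
    "SecurityAlert.list_alerts",
    "WindowsSecurity.list_host_logons",
    "WindowsSecurity.list_host_processes",
]
families = group_by_family(sample_names)
for family, queries in families.items():
    print(family, queries)
```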
To get further details on a specific query call QUERY_PROVIDER.DATA_FAMILY.QUERY_NAME('?') or QUERY_PROVIDER.DATA_FAMILY.QUERY_NAME('help')
This will display:
Query Name
What Data Environment it is designed for
Short description of what the query does
Which parameters the query accepts
The raw query that will be run
Running a pre-defined query
To run a query from our query store we again call QUERY_PROVIDER.DATA_FAMILY.QUERY_NAME(**kwargs), but this time we simply pass the required parameters for that query as keyword arguments.
This will return a Pandas DataFrame of the results with the columns determined by the query parameters. Should the query fail for some reason an exception will be raised.
It is also possible to pass query objects as arguments before defining keyword arguments. For example, if we wanted to define query times as an object rather than defining a start and end via keyword arguments, we could simply pass a querytimes object to the pre-defined query.
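The following sketch shows one way parameter values could be resolved for a query. It is an illustration, not msticpy's implementation. It assumes the query text uses {name} placeholders and that values are looked up first in the keyword arguments, then as attributes on any positional objects, then in the defaults. TimeRange, resolve_params and the query template are all hypothetical.

```python
from datetime import datetime
from string import Formatter


class TimeRange:
    """Hypothetical stand-in for a query-times object with start and end."""

    def __init__(self, start, end):
        self.start = start
        self.end = end


def resolve_params(template, defaults, *objs, **kwargs):
    """Find a value for every {placeholder} in the template.

    Assumed lookup order: keyword arguments, then attributes of the
    positional objects, then the defaults.
    """
    names = [field for _, field, _, _ in Formatter().parse(template) if field]
    resolved = {}
    for name in names:
        if name in kwargs:
            resolved[name] = kwargs[name]
            continue
        for obj in objs:
            if hasattr(obj, name):
                resolved[name] = getattr(obj, name)
                break
        else:
            # No positional object supplied the value, so try the defaults
            if name not in defaults:
                raise ValueError(f"Missing required parameter: {name}")
            resolved[name] = defaults[name]
    return resolved


TEMPLATE = (
    "SecurityAlert | where TimeGenerated between "
    "(datetime({start}) .. datetime({end})) | take {limit}"
)
defaults = {"limit": 10}
time_range = TimeRange(datetime(2024, 1, 1), datetime(2024, 1, 2))

# start and end come from the object, limit from the defaults
params = resolve_params(TEMPLATE, defaults, time_range)
query_from_object = TEMPLATE.format(**params)
print(query_from_object)
```

Passing limit=5 as a keyword argument alongside the object would override the default in this sketch.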
Running an ad-hoc query
It is also possible to run ad-hoc queries via a similar method. Rather than calling a named query from the Query Provider query store, we can pass a query directly to our Query Provider with QUERY_PROVIDER.exec_query(query=QUERY_STRING). This will execute the query string passed in the parameters with the driver contained in the Query Provider and return data in a Pandas DataFrame. As with predefined queries an exception will be raised should the query fail to execute.
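Because a failed query raises an exception, it can be convenient to wrap ad-hoc calls in a small helper. The sketch below works with any object that exposes exec_query(query=...) as described above. StubProvider is a hypothetical stand-in so the example runs without a live connection, and it returns a list where a real provider would return a DataFrame.

```python
class StubProvider:
    """Hypothetical stand-in for a connected Query Provider."""

    def exec_query(self, query):
        if not query.strip():
            raise ValueError("empty query")
        # A real provider would return a Pandas DataFrame here
        return [{"query": query}]


def run_adhoc(provider, query):
    """Run an ad-hoc query and return None instead of raising on failure."""
    try:
        return provider.exec_query(query=query)
    except Exception as err:  # broad on purpose: driver errors vary
        print(f"Ad-hoc query failed: {err}")
        return None


provider = StubProvider()
good_result = run_adhoc(provider, "SecurityAlert | take 5")
bad_result = run_adhoc(provider, "   ")
print(good_result, bad_result)
```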
Creating a new set of queries
msticpy provides a number of pre-defined queries to call using the data package. You can also add additional queries to be imported and used by your Query Provider. These are defined in YAML format files, and examples of these files can be found at the msticpy GitHub site https://github.com/microsoft/msticpy/tree/master/msticpy/data/queries.
The required structure of these query definition files is as follows:
metadata
version: The version number of the definition file
description: A description of the purpose of this collection of query definitions
data_environments[]: A list of the Data Environments that the defined queries can be run against (1 or more)
data_families[]: A list of Data Families the defined queries relate to; these families are defined as part of msticpy.data.query_defns
tags[]: A list of tags to help manage definition files
defaults: A set of defaults that apply to all queries in the file
metadata: Metadata regarding a query
data_source: The data source to be used for the query
parameters: Parameters to be passed to the query
name: The parameter name
description: A description of what the parameter is
type: The data type of the parameter
default: The default value for that parameter
sources: A set of queries
name: The name of the query
description: A description of the query's function
metadata: Any metadata associated with the query
args: The arguments of the query
query: The query to be executed
uri: A URI associated with the query
parameters: Any parameters required by the query not covered by defaults
name: The parameter name
description: A description of what the parameter is
type: The data type of the parameter
default: The default value for that parameter
There are also a number of tools within the package to assist in validating new query definition files once created.
validate_query_defs() does not perform comprehensive checks on the file but does check key elements required in the file are present.
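To illustrate the kind of presence check described above, here is a minimal sketch that inspects a definition file after it has been parsed into a Python dict. It is not the code behind validate_query_defs(). The helper name check_query_defn, the set of keys treated as required, and the assumption that each source holds its query under args are all choices made for this example.

```python
REQUIRED_TOP_LEVEL = ("metadata", "defaults", "sources")
REQUIRED_METADATA = ("version", "description", "data_environments", "data_families")


def check_query_defn(defn):
    """Return a list of problems found in a parsed query definition dict."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in defn:
            problems.append(f"missing section: {key}")
    metadata = defn.get("metadata", {})
    for key in REQUIRED_METADATA:
        if key not in metadata:
            problems.append(f"missing metadata field: {key}")
    # Assumed layout: sources maps each query name to its definition
    for name, source in defn.get("sources", {}).items():
        if "query" not in source.get("args", {}):
            problems.append(f"source {name} has no query")
    return problems


# Illustrative definition mirroring the structure listed above
sample_defn = {
    "metadata": {
        "version": 1,
        "description": "Example alert queries",
        "data_environments": ["LogAnalytics"],
        "data_families": ["SecurityAlert"],
        "tags": ["example"],
    },
    "defaults": {"parameters": {"limit": {"type": "int", "default": 10}}},
    "sources": {
        "list_recent_alerts": {
            "description": "Return the most recent alerts",
            "args": {"query": "SecurityAlert | take {limit}"},
        }
    },
}
print(check_query_defn(sample_defn))
```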
Once imported the queries in the files appear in the Query Provider's Query Store alongside the others and can be called in the same manner as pre-defined queries.
If you have created a large number of query definition files and you want to have them automatically imported into a Query Provider's query store at initialization, you can specify a directory containing these queries in the msticpyconfig.yaml file under QueryDefinitions: Custom:
For example if I have a folder at C:\queries I will set the config file to:
QueryDefinitions:
  Default: "queries"
  Custom:
    - 'C:\queries'
    - 'C:\queries2'
Having the Custom field populated will mean the Query Provider will automatically enumerate all the YAML files in the directory provided and automatically import the relevant queries into the query store at initialization alongside the default queries. Custom queries with the same name as default queries will overwrite default queries.
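The enumerate-then-overwrite behaviour described above can be pictured with the following sketch. It is an illustration only, not msticpy's loader: find_query_files and merge_queries are hypothetical helpers, the sketch assumes both .yaml and .yml extensions count as YAML files, and the queries are represented as plain name-to-text dicts.

```python
import tempfile
from pathlib import Path


def find_query_files(directory):
    """Return the YAML files directly inside a directory, sorted by path."""
    folder = Path(directory)
    return sorted(
        p for p in folder.iterdir() if p.suffix.lower() in {".yaml", ".yml"}
    )


def merge_queries(default_queries, custom_queries):
    """Combine two name-to-query dicts; custom entries win on a name clash."""
    merged = dict(default_queries)
    merged.update(custom_queries)
    return merged


# Demonstrate file discovery in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "my_queries.yaml").write_text("metadata: {}\n")
    (Path(tmp) / "notes.txt").write_text("not a query file\n")
    found = [p.name for p in find_query_files(tmp)]

default_queries = {"list_alerts": "default version", "list_logons": "default version"}
custom_queries = {"list_alerts": "custom version"}
merged = merge_queries(default_queries, custom_queries)
print(found, merged)
```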
If you are having difficulties with a defined query and it is not producing the expected results it can be useful to see the raw query exactly as it is passed to the Data Environment. If you call a query with 'print' and the parameters required by that query it will construct and print out the query string to be run.
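As a rough picture of how a query call can behave differently depending on a special positional argument, the sketch below returns help text for '?' or 'help' and the constructed query string for 'print'. This is a simplified illustration, not msticpy's code; call_query and the example query are hypothetical, and the sketch returns the string rather than printing it.

```python
def call_query(name, description, template, *args, **kwargs):
    """Dispatch on special positional arguments before building the query."""
    if "?" in args or "help" in args:
        return f"Query: {name}\nDescription: {description}\nRaw query: {template}"
    # Fill the {placeholders} in the template with the supplied parameters
    query_str = template.format(**kwargs)
    if "print" in args:
        return query_str
    raise NotImplementedError("a real provider would execute the query here")


TEMPLATE = "SecurityAlert | where AlertName == '{alert_name}' | take {limit}"

help_text = call_query("list_alerts", "List alerts by name", TEMPLATE, "?")
debug_query = call_query(
    "list_alerts", "List alerts by name", TEMPLATE, "print",
    alert_name="Suspicious logon", limit=5,
)
print(help_text)
print(debug_query)
```

Comparing the printed string with what you expected the query to be is often the quickest way to spot a wrongly named or wrongly typed parameter.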