Rule Engine API Guide

The Rule Engine dynamically changes how logs are processed based on the provided context. The user configures the Rule Engine through the attributes described below, which dictate the actions it takes. One example scenario: an incident occurs, and the user wants to stop log aggregation for a service and backfill the data that was already aggregated over a period of time.

The Rule Engine has the following configuration options:

Variable      Description
triggers      Map of trigger ID to trigger object. Each ID is a unique name for the trigger.
conditions    Map of condition ID to condition object. Each ID is a unique name for the condition.

Example definition of a Rule Engine as a vertex in the Grepr job graph:

{
    "name": "my-rule-engine",
    "type": "rule-engine",
    "triggers": {
        "trigger1": {
            "type": "event-predicate",
            "predicate": {
                "type": "datadog-query",
                "query": "service:my-service"
            },
            "conditionIds": ["condition1"],
            "duration": "5m",
            "variables": {
                "__host": "host" // Extracts host tag from the matching event
            }
        }
    },
    "conditions": {
        "condition1": {
            "actionRules": [
                {
                    "type": "event-rule",
                    "actionPredicate": {
                        "type": "datadog-query",
                        "query": "host:__host" // Matches events with host tag value extracted from the event that fired trigger1
                    },
                    "actions": [
                        {
                            "type": "tag-action",
                            "order": 0,
                            "modification": "ADD",
                            "tagKey": "new-tag",
                            "values": ["new-value"] // Adds {new-tag: [new-value]} to tags
                        }
                    ]
                },
                {
                    "type": "job-rule",
                    "actions": [
                        {
                            "type": "backfill-job-action",
                            "name": "my_backfill_job",
                            "order": 1,
                            "backfillTimespan": "PT10M",
                            "jobTags": {
                                "job-type": "backfill"
                            },
                            "backfillQuery": { // Setting the query to define the context for the backfill
                                "type": "datadog-query",
                                "query": "host:__host"
                            },
                            "rawTableOperationName": "my_rawlogs_sink",
                            "dedupTableOperationName": "my_deduplogs_sink",
                            "sinkOperationName": "sink",
                            "limit": 50000
                        }
                    ]
                }
            ]
        }
    }
}

Condition

Conditions describe the state of the environment. When a condition is "triggered", Grepr can take actions, and those actions are described by the condition itself. A condition has a duration. Depending on the type of the action, an action is taken either once when the condition starts or on every message while the condition is ongoing.

The actions are described in the condition as action rules. They can be of the following types:

  1. Event Action Rule: This takes actions on events that match a certain predicate while the condition is active.
  2. Job Action Rule: This executes a set of associated job actions when a condition is first triggered.

Event Action Rule

This is executed on events that match a certain predicate, for the entire duration that the associated condition is active. The event action rule is configured by specifying a predicate for matching events and a set of actions to take on matched events.
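As a rough illustration of the behavior described above, the following Python sketch applies a tag-action with modification ADD to events that match a predicate while a condition is active. The function names, the event shape (a dict with a tags map from key to a list of values), the ascending interpretation of order, and the de-duplication of added values are all assumptions made for illustration; this is not Grepr's implementation.

```python
from typing import Any, Callable, Dict, List

Tags = Dict[str, List[str]]
Event = Dict[str, Any]


def apply_tag_add(tags: Tags, tag_key: str, values: List[str]) -> Tags:
    """Return a copy of tags with values added under tag_key.

    Assumption: values already present are not added a second time.
    """
    updated = {key: list(vals) for key, vals in tags.items()}
    existing = updated.setdefault(tag_key, [])
    for value in values:
        if value not in existing:
            existing.append(value)
    return updated


def run_event_rule(
    event: Event,
    predicate: Callable[[Event], bool],
    actions: List[Dict[str, Any]],
    condition_active: bool,
) -> Event:
    """Apply the rule's actions if the condition is active and the event matches."""
    if not condition_active or not predicate(event):
        return event  # untouched: rule does not apply
    tags = event["tags"]
    # Assumption: actions run in ascending "order".
    for action in sorted(actions, key=lambda a: a["order"]):
        if action["type"] == "tag-action" and action["modification"] == "ADD":
            tags = apply_tag_add(tags, action["tagKey"], action["values"])
    return {**event, "tags": tags}


# Hypothetical event and rule, mirroring the tag-action in the example definition.
event = {"message": "disk full", "tags": {"host": ["web-1"]}}
actions = [
    {
        "type": "tag-action",
        "order": 0,
        "modification": "ADD",
        "tagKey": "new-tag",
        "values": ["new-value"],
    }
]
tagged = run_event_rule(
    event,
    predicate=lambda e: "web-1" in e["tags"].get("host", []),
    actions=actions,
    condition_active=True,
)
```

The sketch returns a new event instead of mutating the input, which keeps the original event available for comparison.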

As an example, the UI configures the rule engine and conditions such that, when there's an abnormal event or an external API call hits Grepr, events that are related to the incident are tagged with a special tag that tells the Log Reducer to skip aggregating them.

Job Action Rule

A set of job actions that are executed once per associated condition. This means, if an already active condition is extended by the (re-)firing of a trigger, the job actions will not be executed again.
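The once-per-condition behavior can be pictured with a small state sketch. The class and method names are hypothetical, and two details are assumptions rather than documented behavior: a re-fire moves the expiry to the fire time plus the duration, and a trigger that fires after the condition has expired starts a fresh activation that runs the job actions again.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ConditionState:
    """Tracks whether a condition is active (hypothetical model)."""

    duration_seconds: float
    active_until: Optional[float] = None

    def is_active(self, now: float) -> bool:
        return self.active_until is not None and now < self.active_until

    def on_trigger(self, now: float, run_job_actions: Callable[[], None]) -> None:
        was_active = self.is_active(now)
        # Assumption: each fire extends the condition to fire time + duration.
        self.active_until = now + self.duration_seconds
        if not was_active:
            # Job actions run only when the condition becomes active,
            # not when an already active condition is extended.
            run_job_actions()


started: List[str] = []
state = ConditionState(duration_seconds=300)  # e.g. a 5 minute duration
state.on_trigger(0, lambda: started.append("backfill"))
state.on_trigger(120, lambda: started.append("backfill"))  # re-fire: extends only
```

After the two fires above, the job action has run once and the condition stays active until 300 seconds after the second fire.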

As an example, the UI sets up the Rule Engine to kick off a backfill job when a trigger fires.

Trigger

A trigger kicks off one or more conditions. A trigger may continue to fire, and each firing extends the activation time of the associated conditions. Each trigger has a unique identifier and is one of these types:

  1. Event Trigger: This trigger is activated by a matching predicate on an event. The predicate is provided by the user as part of the Rule Engine configuration.
  2. External Trigger: This trigger is activated by an external source, for example, an alert from an observability tool. These will allow users to feed external stimuli into the Rule Engine to take appropriate actions.
  3. Trace Sampling Trigger: This trigger allows users to make sure that complete sets of log messages are available in the observability tool for a sample set of traces. A trace can be a user ID, a request ID, an actual trace ID, or anything that the user wants to consider an "execution identifier". This trace ID can be an attribute or a tag.

You can find the object model specification for the various trigger types here.

Variables

Variables are defined as part of the trigger configuration. These are used to extract values from the event that fired the trigger. The extracted values could be used as context for the conditions that are activated by the trigger. For example, if the trigger is activated by an event with a service tag, the service tag value could be extracted and used in the conditions to perform actions on the events that have the same service tag value.

The variables field is a map from variable names to the paths of the values to extract from a matching event. To reference an attribute, prefix the path with @. For example, @syslog.appname extracts the value of the attribute syslog.appname from the event. If the path does not start with @, the value is extracted from the tags of the event. For example, the path app extracts the value of the tag app.

Variable names must start with '__' and be unique. Note that __timestamp (and, for logs, __severity) is automatically extracted from the respective top-level field of the matching event and is available for use in conditions out of the box.

Note: If the path provided does not exist for a variable, the path itself will be taken as a literal value for the variable.
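The path rules above can be summarized in a short Python sketch. Everything here is illustrative: the function names, the event layout (top-level timestamp and severity, a tags map from key to a list of values, nested attributes), taking the first value of a multi-valued tag, and plain text substitution into the query are assumptions, not Grepr's actual implementation.

```python
from typing import Any, Dict

Event = Dict[str, Any]


def resolve_path(event: Event, path: str) -> str:
    """Resolve one variable path against an event.

    "@a.b" reads attribute a.b; any other path reads the tag of that name.
    If nothing is found, the path itself is returned as a literal value.
    """
    if path.startswith("@"):
        node: Any = event.get("attributes", {})
        for part in path[1:].split("."):
            if not isinstance(node, dict) or part not in node:
                return path  # missing attribute: fall back to the literal path
            node = node[part]
        return str(node)
    values = event.get("tags", {}).get(path)
    if not values:
        return path  # missing tag: fall back to the literal path
    return values[0]  # assumption: use the first value of the tag


def extract_variables(event: Event, variables: Dict[str, str]) -> Dict[str, str]:
    """Build the variable map for the event that fired a trigger."""
    extracted: Dict[str, str] = {}
    for name, path in variables.items():
        if not name.startswith("__"):
            raise ValueError(f"variable name must start with '__': {name}")
        extracted[name] = resolve_path(event, path)
    # Built-in variables taken from top-level fields when present.
    if "timestamp" in event:
        extracted["__timestamp"] = str(event["timestamp"])
    if "severity" in event:
        extracted["__severity"] = str(event["severity"])
    return extracted


def substitute(query: str, values: Dict[str, str]) -> str:
    """Replace variable names in a query, longest names first so that
    __host does not clobber a longer name such as __hostname."""
    for name in sorted(values, key=len, reverse=True):
        query = query.replace(name, values[name])
    return query
```

With a hypothetical event whose host tag is web-1, extract_variables(event, {"__host": "host"}) would yield a map in which substitute("host:__host", ...) produces the query host:web-1, matching the intent of the example definition earlier in this guide.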