Creating and managing pipelines
After you start a pipeline, you will be redirected to the pipeline details view.
This page will allow you to manage your pipeline job graph.
As soon as you start editing, you enter edit mode. A black "Changes pending" bar will appear at the top, as in the following screen:
When you create a new pipeline, you will already start in edit mode.
Pipeline job graph
The pipeline job graph is a visual representation of your pipeline. It is composed of the following elements:
- Sources
- Filters
- Parser
- Data warehouse
- Exceptions (Rule Engine)
- Log reducer
- Sinks
Sources
Sources are the entry point of your pipeline. They are where your logs come from. You can have multiple sources in a pipeline.
A source can be an integration like Datadog.
If you don't have any integrations at this point, you will see the following screen:
Click on the create integration button to create a new integration.
Once you have created your integration, you can select it as a source for your pipeline by clicking on the add source button.
Once added, you can see the source in the pipeline job graph. For Datadog agents, you will have to add the ingestion URL to your agent configuration.
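As a rough sketch, pointing a Datadog Agent at Grepr typically means overriding the Agent's log intake endpoint in datadog.yaml. The endpoint value below is a placeholder; use the ingestion URL shown in the Grepr UI for your source, and confirm the configuration keys against your Agent version.

```yaml
# datadog.yaml -- sketch only; replace the placeholder with the ingestion URL
# shown in the Grepr UI, and verify these keys against your Agent version.
logs_enabled: true

logs_config:
  use_http: true
  # Placeholder for the Grepr-provided ingestion endpoint for this source
  logs_dd_url: "<GREPR_INGESTION_URL>:443"
```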
Filters
Filters are used to filter out logs that you don't want to keep. You can have multiple filters in a pipeline.
There are three places where you can add filters if you want to drop logs at a certain point in the pipeline:
- After the source
- After the parser
- After the log reducer
To add a filter, click on the add filter button.
You will then see an input where you can put your query. This is a Datadog-syntax query for the logs you would like to keep. Everything else is dropped.
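For example, a keep-query like the following (the service and status values are hypothetical) would keep only warning and error logs from a checkout service and drop everything else:

```
service:checkout AND status:(error OR warn)
```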
Parser
Grepr allows you to add parsing rules to your logs using Grok patterns. To add a parser, click on the add parser button.
You will then see an input where you can put your parsing rules.
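For instance, a rule built from standard Grok patterns might split a plain-text line into a timestamp, a level, and a message. The field names here are only illustrative; refer to the parser documentation for the exact rule format Grepr expects.

```
%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}
```

Applied to a line like `2024-05-01T12:34:56Z ERROR connection refused`, this would extract the level as `ERROR` and put the remainder into `message`.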
Data warehouse
The data warehouse is where your raw logs will be stored before being reduced. To add a data warehouse, click on the add data warehouse button. This will allow you to select a data warehouse integration.
Exceptions
You can specify exceptions as part of the Rule Engine that will dynamically affect the processing of logs.
You can add multiple exceptions to your pipeline, and there are multiple types that you may choose from. See more details here.
External trigger exceptions
In some situations, you might want to trigger an exception from an external callback. For example, you might have an alert configured in your observability tool that might indicate an incident. You might want to make sure that all relevant logs for that incident are not aggregated, so that when an engineer goes to troubleshoot, those logs are already available.
This feature enables a webhook in Grepr to trigger an "exception". When this webhook is called, Grepr can stop aggregating data for a customizable time period, and can also load some historical data. To configure this trigger, there are a few parts to set up:
- Creating an API key to call the hook externally
- Defining relevant data via scoping keys
- Backfill timespan
- How long to stop aggregation
- Generating callback payload and URL
Creating an API key
If you haven't already, create an API key by going to the API key page at https://[ORG_ID].app.grepr.ai/api-keys. Copy and save the API key so you can use it in your call when you configure it.
Defining relevant data
When a trigger call arrives, it needs to tell Grepr what the "scope" of the exception is. The scope defines the tags and attributes, what we call "scope keys", that should be used for backfills and for selecting messages to pass through unaggregated. For tags, just use the tag key, such as `host`. For attributes, prepend a `@` and use a dotted JSON path, such as `@url.host`.
The API call payload will need to provide these keys as part of the call. One wrinkle here is that you'll need to prepend `__` (a double underscore) to the names of all these scoping keys in the payload.
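For example, assuming the scope keys are the `host` tag and the `@url.host` attribute (the values shown are placeholders), the corresponding payload entries would look like this:

```json
{
  "__host": "i-0abc123",
  "__@url.host": "api.example.com"
}
```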
Backfill timespan
This defines the length of time before the current time for which to load unaggregated data back to your observability tool. Note that the backfill usually takes a couple of minutes before it actually starts, to allow all the data to make it to the data lake and become available for querying.
How long to stop aggregation
You can also optionally define how long to stop aggregating relevant, in-scope data. After this time period ends, data will be aggregated again.
Generating the callback payload and URL
The API call is a POST request to a URL like https://[ORG_ID].app.grepr.ai/api/v1/triggers. The payload is in JSON and consists of the following:

```
{
  "__@<attribute path>": "<attribute value>",
  "__<tag key>": "<tag value>",
  ...
}
```
To make this easier, we provide a cURL command example that you can copy and use, based on the configuration of the trigger so far. You will need to update the values to match your needs.
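The command below is only a hand-written sketch of what such a call might look like; the generated command is authoritative. It assumes the API key is passed as a bearer token and reuses the hypothetical `__host` and `__@url.host` scope keys from above.

```sh
# Sketch only -- copy the generated cURL command from the UI for the exact
# header name and payload keys. [ORG_ID], the API key, and all values are placeholders.
curl -X POST "https://[ORG_ID].app.grepr.ai/api/v1/triggers" \
  -H "Authorization: Bearer $GREPR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "__host": "i-0abc123",
        "__@url.host": "api.example.com"
      }'
```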
If you are creating a new pipeline, you will see the Generate cURL command button disabled, because the pipeline needs to be created first.
Once your pipeline is successfully created, you will be able to see the button enabled and the following dialog:
From here you can copy the cURL command and run it in your terminal to trigger the exception as you need.
If you don't have an API key yet, you will see the following dialog instead:
You can click on the API key link, which redirects you to the API key page where you can create a new API key.
Once your API key is created, you can go back to the pipeline page and copy the cURL command.
Log reducer
The log reducer is used to reduce the volume of logs that get stored.
In the log reducer, you can choose how logs are grouped and the aggregation threshold at which reduction starts.
Sinks
Sinks are the exit point of your pipeline. They are where your reduced logs are sent.
You can choose your sink by clicking on the add sink button.
You can also add any additional tags you want to apply to your logs.
By default, we add `processor:grepr` and `pipeline:{YOUR_PIPELINE_NAME}`.
Once you have added all the elements to your pipeline, you can click on the save button to save your pipeline!