Quick Start

This tutorial walks you through a simple setup with Datadog so you can watch Grepr dynamically reduce your shipped log volume without losing any data.

Prerequisites

The tutorial assumes you already have a Grepr account and can log in to your organization's Grepr UI. If not, sign up at https://app.grepr.ai/signup.

Grepr sits between an agent sending log data and an observability vendor's service, so you'll need an agent that you can configure to emit logs to Grepr and an account with an observability vendor that can receive logs from Grepr. This tutorial uses Datadog, but Grepr supports many other vendors as well. If you don't already have a Datadog account, sign up for free.

Finally, we're going to use Docker to run the Datadog agents, so you'll need to have Docker installed.
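
You can quickly confirm that Docker is installed and the daemon is running:

docker info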

Step 1: Deploy a single Datadog agent

Follow Datadog's instructions to get an API key for your account. You'll need this next.

We're going to simulate multiple machines by deploying a few Datadog agents on your machine so you can see log reduction in action. But first, let's get one agent going without Grepr to verify that you can see logs in Datadog. Make sure the DD_API_KEY environment variable is set to your API key before running the command below. If you're using a Datadog site other than US1, you'll also need to set the DD_SITE environment variable to match your site (it defaults to datadoghq.com).
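
For convenience, you can export both values as environment variables so the command below picks them up directly (the values shown are placeholders; substitute your own):

export DD_API_KEY="<your Datadog API key>"
export DD_SITE="datadoghq.com"   # for example, datadoghq.eu if your account is on the EU site

With those variables set, start the agent: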

docker run --rm --name dd-agent \
-e DD_API_KEY="${DD_API_KEY}" \
-e DD_SITE="${DD_SITE:-datadoghq.com}" \
-e DD_LOGS_ENABLED=true \
-e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true \
-e DD_HOSTNAME=my-test-host \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
gcr.io/datadoghq/agent:latest

Note that in the above command we have explicitly set the hostname being collected to "my-test-host". Now you can go to the Datadog Log Explorer, turn on "Live Tail", filter by host:my-test-host, and watch the logs come in.

Datadog Live Tail
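
If Live Tail stays empty, keep in mind that the agent only collects logs from other running containers. A quick way to generate some test output is to start a simple log-producing container alongside the agent and then check the agent's own status (the log-gen container name and message below are just an illustration):

docker run -d --rm --name log-gen busybox sh -c 'while true; do echo "hello from log-gen"; sleep 2; done'

docker exec dd-agent agent status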

If you're still not seeing logs arriving, review the Datadog agent documentation to make sure you have the agent configured correctly. If all is well, you can now turn off the agent by running:

docker stop dd-agent

Step 2: Set up an integration for Grepr with Datadog

Now we'll start the process of integrating with Grepr. As a first step, you'll need to tell Grepr about Datadog and give it the API key you used above. An integration connects Grepr to an external system like Datadog. Another integration you'll add later is an S3 bucket that will be used to store your raw logs.

At the top navigation bar in the Grepr UI, click on the "Integrations" link.

Integrations page

Next to "Observability Vendors", click on the "Add new" button. Select "Datadog" from the dropdown, set the name to something you like, select the Datadog site, and fill in the form with the API key you used above. You don't need the Application Key key for this quick tutorial.

Add new integration

Finally, click "Create". Grepr will validate the key with Datadog and let you know if there are any issues. The key is stored securely in AWS Secrets Manager. You'll see a success message when the integration is created.

Integration created
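
Grepr does this validation for you, but if you ever want to sanity-check an API key yourself, Datadog exposes a key-validation endpoint you can call directly. A minimal sketch, assuming your account is on the US1 site (other sites use a different API host):

curl -s -X GET "https://api.datadoghq.com/api/v1/validate" \
  -H "DD-API-KEY: ${DD_API_KEY}"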

Step 3: Create a storage integration

Next, you'll need to tell Grepr where to store the raw logs. For this tutorial, we're going to use the Grepr-hosted S3 bucket to get going quickly.

Next to "Data warehouses" on the Integrations page, click on the "Add new" button. Select "Grepr-hosted" from the dropdown, set the name to something you like, and click "Create".

Add new data warehouse

You'll see a success message when the data warehouse is created.

Data warehouse created

Step 4: Create a pipeline in Grepr

A pipeline defines the steps taken when processing log data. The pipelines you can create in the Grepr UI have a set structure focused on log data reduction:

  • add one or more sources
  • filter the data through a series of filters at various stages
  • parse the log messages using Grok patterns
  • store the data into a data warehouse
  • reduce the data through the log reducer
  • send the data to one or more sinks.

For more control, you can use the API to create more complex pipelines.

On the top navigation bar, click on the "Pipelines" link.

Pipelines page

Then click on "Create Pipeline" and give your new pipeline a name.

Create pipeline

You will now be in the pipeline editing view. You'll notice what a generic pipeline looks like on the left panel.

Pipeline editor

Add a source

Click "Continue" to go to adding a source, then click on the "Add" button. Select the Datadog integration you created earlier, a name will be automatically added for you.

Add source form

Click "Add". You'll notice that you're now in "Edit mode" for the pipeline. This pipeline will now expose an endpoint where it will be able to collect logs being sent to it from a Datadog agent.

Pipeline edit mode

Add a data warehouse

Next, click on "Data Warehouse" on the left to go to the data warehouse section, then click "Add" and select the data warehouse you created earlier.

Add data warehouse

Click "Add" and you'll see the data warehouse added to the pipeline.

Data warehouse added

Add a sink

Next, we'll want to add a sink so that all the processed logs can be sent to Datadog. Click on "Sinks" on the left to go to the sink section and click "Add". Select the Datadog integration you created earlier; a name will be filled in automatically.

Add sink

You'll notice that there are some tags already populated that Grepr will append to messages being sent to Datadog. We suggest keeping those there so you can easily distinguish logs processed by Grepr from other logs. Click "Add" to add the sink to the pipeline.

Sink added

Step 5: Start the pipeline

Now click on "Create pipeline" at the top of the page and confirm the creation of the pipeline when the dialog pops up. Your pipeline is now starting; behind the scenes, Grepr is setting up all the pieces needed to start processing logs.

Pipeline starting

You will see the pipeline go from "Starting" to "Running" in about 30 seconds. There's not much interesting data being reported yet, and there won't be until some logs are flowing through, so let's do that next.

Pipeline running

Step 6: Simulate multiple agents sending logs through Grepr

Let's make sure that the agent Docker container is stopped. In a terminal, run the following command:

docker stop dd-agent

Next, let's get the URL that you'll need to point your agents to. Click on "Sources" on the left to go to your sources. You'll see the source you added earlier in a table, with its ingest URL under the "Ingest URL" column. Copy that URL; we'll use it below.

Ingest URL
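
It can be handy to keep that URL in an environment variable so the command below can reference it (the value here is a placeholder; paste the URL you copied from the Sources table):

export INGEST_URL="<your pipeline's Ingest URL>"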

Now, let's start 5 agents that will send logs through Grepr. Run the following command, making sure the INGEST_URL environment variable is set to the ingest URL you copied above and DD_API_KEY is set to the Datadog API key you used earlier:

for i in $(seq 1 5); do docker run -d --rm --name dd-agent-$i \
-e DD_API_KEY=${DD_API_KEY} \
-e DD_SITE="${DD_SITE:-datadoghq.com}" \
-e DD_LOGS_ENABLED=true \
-e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true \
-e DD_HOSTNAME=my-test-host-$i \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e DD_LOGS_CONFIG_LOGS_DD_URL=${INGEST_URL} \
-e DD_LOGS_CONFIG_USE_HTTP="true" \
gcr.io/datadoghq/agent:latest; done

The above command is a modified version of the one we used to start a single agent. Here, we start 5 agents, each with a different hostname. We also added two new environment variables: DD_LOGS_CONFIG_LOGS_DD_URL and DD_LOGS_CONFIG_USE_HTTP. The first tells the agent where to send logs; the second tells the agent to use HTTPS instead of TCP when sending logs to Grepr.
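
To double-check that everything started, you can list the running agents and glance at one agent's output for errors (the container names match the ones used in the loop above):

docker ps --filter "name=dd-agent"

docker logs dd-agent-1 2>&1 | tail -n 20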

NOTE: Make sure at least one pipeline with a Datadog source corresponding to the Datadog integration you created exists before sending logs for ingestion. Otherwise, the ingestion requests will not be accepted.

See The Results

You should now see logs coming into Grepr. Go back to the pipeline detail view in the Grepr UI; if you're not already there, click on the "Overview" step in the left panel to see statistics on the data passing through.

Pipeline overview

Go to Datadog and query for your logs using host:my-test-host*. You will see logs coming in from all the agents you started.

Datadog logs

You'll notice that instead of getting every log from every agent all the time, you now get repeated logs aggregated together. This is Grepr's log reduction in action, already saving you money.

Grepr works by bucketing similar logs together into a "pattern". It lets logs pass through as-is until a pattern crosses a "duplication threshold". Once that happens, Grepr starts aggregating messages that belong to that pattern into a single message. Every two minutes, Grepr sends the summary messages to the sink and starts the whole cycle again.
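
As a purely illustrative, made-up example, suppose an application logs the same kind of message thousands of times in a two-minute window. The first few pass through unchanged, and the rest are rolled up into a single summary message:

Processed request 4311 in 45ms
Processed request 4312 in 12ms
Repeats 4821 times: Processed request <number> in <number>ms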

This aggregation makes sure that low-frequency messages, which are often the most important for debugging or security, are not lost in the noise of high-frequency messages. It also ensures that you can always see a sample of all aggregated messages at the sink (Datadog).

Grepr has many settings to tune the log reduction behavior to your needs, such as adding exceptions, triggering backfills, and various rules to make it do exactly what you like.

More details on how the Log Reducer works can be found here.

Troubleshooting Example

Now let's say you have an incident and you need to see the logs that have been aggregated into a particular summary message. In Datadog, open one of the summary messages; these start with "Repeats XX times...". Grepr replaces parameters that change between aggregated messages with a placeholder such as <number>, <timestamp>, or <any>.

Datadog log details

In this specific message we've selected, Grepr has identified a timestamp parameter and a number parameter that change between log messages. You'll also notice some Grepr-specific fields in the details. Here is a description of the most important ones:

  • firstTimestamp and lastTimestamp are the timestamps of the first and last log message that was aggregated into this summary message.
  • patternId is the ID of the pattern that was matched.
  • rawLogsUrl is a URL that you can click on to see all the raw logs that were aggregated into this summary message.
  • repeatCount is the number of log messages that were aggregated into this summary message.

In our example, these messages don't have any attributes as they arrived from the agent. If they did, Grepr would have aggregated them and kept a few unique examples in the details.

Next, let's try to find the other messages that belong to this summary's pattern. Hover over patternId and click Filter by @grepr.patternId:xxxxx.

Datadog log filter

This will filter the logs to only show logs that belong to the same pattern as the one you selected. You may see multiple summary messages with the same pattern ID; this is expected, and the logs are still being aggregated correctly. We sometimes emit multiple summaries to account for late-arriving data and to make sure you get all the data. This can also happen when a single summary contains so many aggregated messages that we need to spill over into another summary message so Datadog can handle the message size or the number of aggregated tags.

Datadog log filtered

Let's open the summary message again, and this time click on the URL in the rawLogsUrl field.

Datadog log url

This will open a new tab that will execute a search in the Grepr UI for all the raw messages with the same hosts and service, within the time period of the summarized messages, and highlight all the messages with the same pattern ID.

Grepr raw logs

Clicking on one of the messages will open a side panel similar to Datadog's where you can see the log message's details. The Grepr search UI loads quickly and provides an intuitive interface for examining your logs.

Grepr raw log details

Now let's say that you'd like to actually load all these raw logs back into Datadog so you can search on them along with the other logs. On the top right of the Grepr UI, click on the dropdown next to the "Search" button and select "Backfill".

Grepr backfill

When you click on "Backfill", Grepr will start a backfill job that will load all the raw logs that have been searched back into Datadog. After a few seconds, the job will start and then complete. You can see the status of the job in the "Jobs" dropdown on the top right of the Grepr UI next to your profile picture.

Grepr backfill job

Clicking on the job, now marked "Finished", takes you back to Datadog, where you can see all the logs that were just backfilled. Note that Datadog takes a few seconds to index the logs, so you may not see them immediately after the backfill job completes.

Datadog backfilled logs

Grepr automatically deduplicates logs on backfill so you don't end up with multiple copies of the same log messages as you backfill across multiple searches. For example, if you run the same backfill again without changing the search parameters, you won't see any new logs in Datadog.

Conclusion

Congratulations, you've successfully set up Grepr to reduce and backfill logs in Datadog. You can now shut down the agents with docker stop dd-agent-1 dd-agent-2 dd-agent-3 dd-agent-4 dd-agent-5, and you can also stop your pipeline from the Grepr UI.
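
Equivalently, you can stop the agents in a loop:

for i in $(seq 1 5); do docker stop dd-agent-$i; done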