Quick Start

This tutorial walks you through a simple setup with Datadog so you can watch Grepr dynamically reduce your shipped log volume without losing any data.

Prerequisites

The tutorial assumes you already have a Grepr account and can log in to your organization's Grepr UI. If not, sign up at https://app.grepr.ai/signup.

Grepr sits between an agent sending log data and an observability vendor's service, so you'll need an agent that you can configure to emit logs to Grepr and an account with an observability vendor that can receive logs from Grepr. This tutorial uses Datadog, but Grepr supports many other vendors as well. If you don't already have a Datadog account, sign up for free.

Finally, we're going to use Docker to run the Datadog agents, so you'll need to have Docker installed.
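
You can quickly confirm that Docker is installed and the daemon is running:

docker info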

Step 1: Deploy a single Datadog agent

Follow Datadog's instructions to get an API key for your account. You'll need this next.

We're going to simulate multiple machines by deploying a few Datadog agents on your machine so you can see log reduction in action. But first, let's get one agent going without Grepr to verify that you can see logs in Datadog. Make sure the DD_API_KEY environment variable is set to your API key before running the command below. If you're using a Datadog site other than US1, you'll also need to set the DD_SITE environment variable to match your site (it defaults to datadoghq.com).
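
For convenience, you can export both values as environment variables so the command below picks them up directly (the values shown are placeholders; substitute your own):

export DD_API_KEY="<your Datadog API key>"
export DD_SITE="datadoghq.com"   # for example, datadoghq.eu if your account is on the EU site

With those variables set, start the agent: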

docker run --rm --name dd-agent \
-e DD_API_KEY="${DD_API_KEY}" \
-e DD_SITE="${DD_SITE:-datadoghq.com}" \
-e DD_LOGS_ENABLED=true \
-e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true \
-e DD_HOSTNAME=my-test-host \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
gcr.io/datadoghq/agent:latest

Note that in the above command we have explicitly set the hostname being collected to "my-test-host". Now you can go to the Datadog Log Explorer, turn on "Live Tail", filter by host:my-test-host, and watch the logs come in.

Datadog Live Tail
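
If Live Tail stays empty, keep in mind that the agent only collects logs from other running containers. A quick way to generate some test output is to start a simple log-producing container alongside the agent and then check the agent's own status (the log-gen container name and message below are just an illustration):

docker run -d --rm --name log-gen busybox sh -c 'while true; do echo "hello from log-gen"; sleep 2; done'

docker exec dd-agent agent status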

If you're still not seeing logs arriving, review the Datadog agent documentation to make sure you have the agent configured correctly. If all is well, you can now turn off the agent by running:

docker stop dd-agent

Step 2: Set up an integration for Grepr with Datadog

Now we'll start the process of integrating with Grepr. As a first step, you'll need to tell Grepr about Datadog and give it the API key you used above. An integration connects Grepr to an external system like Datadog. Another integration you'll add later is an S3 bucket that will be used to store your raw logs.

At the top navigation bar in the Grepr UI, click on the "Integrations" link.

Integrations page

Next to "Observability Vendors", click on the "Add new" button. Select "Datadog" from the dropdown, set the name to something you like, select the Datadog site, and fill in the form with the API key you used above. You don't need the Application Key key for this quick tutorial.

Add new integration

Finally, click "Create". Grepr will validate the key with Datadog and let you know if there are any issues. The key is stored securely in AWS Secrets Manager. You'll see a success message when the integration is created.

Integration created
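
Grepr does this validation for you, but if you ever want to sanity-check an API key yourself, Datadog exposes a key-validation endpoint you can call directly. A minimal sketch, assuming your account is on the US1 site (other sites use a different API host):

curl -s -X GET "https://api.datadoghq.com/api/v1/validate" \
  -H "DD-API-KEY: ${DD_API_KEY}"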

Step 3: Create a storage integration

Next, you'll need to tell Grepr where to store the raw logs. For this tutorial, we're going to use the Grepr-hosted S3 bucket to get going quickly.

Next to "Data warehouses" on the Integrations page, click on the "Add new" button. Select "Grepr-hosted" from the dropdown, set the name to something you like, and click "Create".

Add new data warehouse

You'll see a success message when the data warehouse is created.

Data warehouse created

Step 4: Create a pipeline in Grepr

A pipeline defines the steps taken when processing log data. The pipelines you can create in the Grepr UI have a set structure focused on log data reduction:

  • add one or more sources
  • filter the data through a series of filters at various stages
  • parse the log messages using Grok patterns
  • store the data into a data warehouse
  • reduce the data through the log reducer
  • send the data to one or more sinks.

For more control, you can use the API to create more complex pipelines.

On the top navigation bar, click on the "Pipelines" link.

Pipelines page

Then click on "Create Pipeline" and give your new pipeline a name.

Create pipeline

You will now be in the pipeline editing view. You'll notice what a generic pipeline looks like on the left panel.

Pipeline editor

Add a source

Click "Continue" to go to adding a source, then click on the "Add" button. Select the Datadog integration you created earlier, a name will be automatically added for you.

Add source form

Click "Add". You'll notice that you're now in "Edit mode" for the pipeline. This pipeline will now expose an endpoint where it will be able to collect logs being sent to it from a Datadog agent.

Pipeline edit mode

Add a data warehouse

Next, click on "Data Warehouse" on the left to go to the data warehouse section, then click "Add" and select the data warehouse you created earlier.

Add data warehouse

Click "Add" and you'll see the data warehouse added to the pipeline.

Data warehouse added

Add a sink

Next, we'll want to add a sink so that all the processed logs can be sent to Datadog. Click on "Sinks" on the left to go to the sink section and click "Add". Select the Datadog integration you created earlier; a name will be filled in automatically.

Add sink

You'll notice that there are some tags already populated that Grepr will append to messages being sent to Datadog. We suggest keeping those there so you can easily distinguish logs processed by Grepr from other logs. Click "Add" to add the sink to the pipeline.

Sink added

Step 5: Start the pipeline

Now click on "Create pipeline" at the top of the page and confirm the creation of the pipeline when the dialog pops up. Your pipeline is now starting; behind the scenes, Grepr is setting up all the pieces needed to start processing logs.

Pipeline starting

You will see the pipeline go from "Starting" to "Running" in about 30 seconds. There's not much interesting data being reported yet, and there won't be until some logs are flowing through, so let's do that next.

Pipeline running

Step 6: Simulate multiple agents sending logs through Grepr

Let's make sure that the agent Docker container is stopped. In a terminal, run the following command:

docker stop dd-agent

Next, let's get the URL that you'll need to point your agents to. Click on "Sources" on the left to go to your sources. You'll see the source you added earlier in a table, with its ingest URL under the "Ingest URL" column. Copy that URL; we'll use it below.

Ingest URL
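
It can be handy to keep that URL in an environment variable so the command below can reference it (the value here is a placeholder; paste the URL you copied from the Sources table):

export INGEST_URL="<your pipeline's Ingest URL>"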

Now, let's start 5 agents that will send logs through Grepr. Run the following command, making sure the INGEST_URL environment variable is set to the ingest URL you copied above and DD_API_KEY is set to the Datadog API key you used earlier:

for i in $(seq 1 5); do docker run -d --rm --name dd-agent-$i \
-e DD_API_KEY=${DD_API_KEY} \
-e DD_SITE="${DD_SITE:-datadoghq.com}" \
-e DD_LOGS_ENABLED=true \
-e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true \
-e DD_HOSTNAME=my-test-host-$i \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e DD_LOGS_CONFIG_LOGS_DD_URL=${INGEST_URL} \
-e DD_LOGS_CONFIG_USE_HTTP="true" \
gcr.io/datadoghq/agent:latest; done

The above command is a modified version of the one we used to start a single agent. Here, we start 5 agents, each with a different hostname. We also added two new environment variables: DD_LOGS_CONFIG_LOGS_DD_URL and DD_LOGS_CONFIG_USE_HTTP. The first tells the agent where to send logs; the second tells the agent to use HTTPS instead of TCP when sending logs to Grepr.
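
To double-check that everything started, you can list the running agents and glance at one agent's output for errors (the container names match the ones used in the loop above):

docker ps --filter "name=dd-agent"

docker logs dd-agent-1 2>&1 | tail -n 20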

NOTE: Make sure at least one pipeline with a Datadog source corresponding to the Datadog integration you created exists before sending logs for ingestion. Otherwise, the ingestion requests will not be accepted.

See The Results

You should now see logs coming into Grepr. Go back to the pipeline detail view in the Grepr UI; if you're not already there, click on the "Overview" step in the left panel to see statistics on the data passing through.

Pipeline overview

Go to Datadog and query for your logs using host:my-test-host*. You will see logs coming in from all the agents you started.

Datadog logs

You'll notice that instead of getting every log from every agent all the time, you now get repeated logs aggregated together. This is Grepr's log reduction in action, already saving you money.

Grepr works by bucketing similar logs together into a "pattern". It lets logs pass through as-is until a pattern crosses a "duplication threshold". Once that happens, Grepr starts aggregating messages that belong to that pattern into a single message. Every two minutes, Grepr sends the summary messages to the sink and starts the whole cycle again.
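
As a purely illustrative, made-up example, suppose an application logs the same kind of message thousands of times in a two-minute window. The first few pass through unchanged, and the rest are rolled up into a single summary message:

Processed request 4311 in 45ms
Processed request 4312 in 12ms
Repeats 4821 times: Processed request <number> in <number>ms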

This aggregation makes sure that low-frequency messages, which are often the most important for debugging or security, are not lost in the noise of high-frequency messages. It also ensures that you can always see a sample of all aggregated messages at the sink (Datadog).

Grepr has many settings to tune the log reduction behavior to your needs, such as adding exceptions, triggering backfills, and various rules to make it do exactly what you like.

More details on how the Log Reducer works can be found here.

Troubleshooting Example

Now let's say you have an incident and you need to see the logs that have been aggregated into a particular summary message. In Datadog, open one of the summary messages; these start with "Repeats XX times...". Grepr replaces parameters that change between aggregated messages with a placeholder such as <number>, <timestamp>, or <any>.

Datadog log details

In this specific message we've selected, Grepr has identified a timestamp parameter and a number parameter that change between log messages. You'll also notice some Grepr-specific fields in the details. Here is a description of the most important ones:

  • firstTimestamp and lastTimestamp are the timestamps of the first and last log message that was aggregated into this summary message.
  • patternId is the ID of the pattern that was matched.
  • rawLogsUrl is a URL that you can click on to see all the raw logs that were aggregated into this summary message.
  • repeatCount is the number of log messages that were aggregated into this summary message.

In our example, these messages don't have any attributes as they arrived from the agent. If they did, Grepr would have aggregated them and kept a few unique examples in the details.

Next, let's try to find the other messages that belong to this summary's pattern. Hover over patternId and click Filter by @grepr.patternId:xxxxx.

Datadog log filter

This will filter the logs to only show logs that belong to the same pattern as the one you selected. You may see multiple summary messages with the same pattern ID; this is expected, and the logs are still being aggregated correctly. We sometimes emit multiple summaries to account for late-arriving data and to make sure you get all the data. This can also happen when a single summary contains so many aggregated messages that we need to spill over into another summary message so Datadog can handle the message size or the number of aggregated tags.

Datadog log filtered

Let's open the summary message again, and this time click on the URL in the rawLogsUrl field.

Datadog log url

This will open a new tab that will execute a search in the Grepr UI for all the raw messages with the same hosts and service, within the time period of the summarized messages, and highlight all the messages with the same pattern ID.

Grepr raw logs

Clicking on one of the messages will open a side panel similar to Datadog's where you can see the log message's details. The Grepr search UI loads quickly and provides an intuitive interface for examining your logs.

Grepr raw log details

Now let's say that you'd like to actually load all these raw logs back into Datadog so you can search on them along with the other logs. On the top right of the Grepr UI, click on the dropdown next to the "Search" button and select "Backfill".

Grepr backfill

When you click on "Backfill", Grepr will start a backfill job that will load all the raw logs that have been searched back into Datadog. After a few seconds, the job will start and then complete. You can see the status of the job in the "Jobs" dropdown on the top right of the Grepr UI next to your profile picture.

Grepr backfill job

Clicking on the job, now marked "Finished", takes you back to Datadog, where you can see all the logs that were just backfilled. Note that Datadog takes a few seconds to index the logs, so you may not see them immediately after the backfill job completes.

Datadog backfilled logs

Grepr automatically deduplicates logs on backfill so you don't end up with multiple copies of the same log messages as you backfill across multiple searches. For example, if you run the same backfill again without changing the search parameters, you won't see any new logs in Datadog.

Conclusion

Congratulations, you've successfully set up Grepr to reduce and backfill logs in Datadog. You can now shut down the agents with docker stop dd-agent-1 dd-agent-2 dd-agent-3 dd-agent-4 dd-agent-5, and you can also stop your pipeline from the Grepr UI.
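
Equivalently, you can stop the agents in a loop:

for i in $(seq 1 5); do docker stop dd-agent-$i; done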