Introduction

Welcome to Grepr's documentation. Grepr is a next-generation observability data engine that lets you collect and query observability data (logs only for now; metrics, traces, and events are coming soon) and transform, analyze, search, alert on, and route it in real time. Grepr is built on Apache Flink, a battle-tested stateful stream processing engine, which enables complex real-time processing and alerting on the data.

[Figure: Pipeline summary]

Our first use case is reducing logging costs by aligning the volume of logs in your observability tool (such as Splunk, Elastic, Datadog, or New Relic) with the current state of your infrastructure. When everything is running smoothly, Grepr reduces the volume of data sent to your observability platform. No data is lost as part of this reduction; it is diverted to lower-cost bulk storage. When there are incidents or anomalies, Grepr increases the granularity of the data sent to your observability platform, ensuring that enough data is available for troubleshooting. Additionally, previously diverted data relevant to the incident can be backfilled into your observability platform for continuity.
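The behavior described above can be illustrated with a toy model. This is a minimal sketch, not Grepr's implementation: the `LogRouter` class, the `sample_every` parameter, and the in-memory lists standing in for bulk storage and the observability platform are all hypothetical, and fixed-rate sampling is used only as a simple stand-in for Grepr's actual reduction logic.

```python
from dataclasses import dataclass, field


@dataclass
class LogRouter:
    """Toy model of volume reduction with incident-driven backfill."""

    sample_every: int = 10  # hypothetical: forward 1 in N logs when healthy
    incident: bool = False  # True while an incident is active
    raw_store: list = field(default_factory=list)  # stands in for bulk storage
    forwarded: list = field(default_factory=list)  # stands in for the platform
    _forwarded_ids: set = field(default_factory=set)

    def ingest(self, log: str) -> None:
        index = len(self.raw_store)
        # Every log is kept in the raw store, so nothing is lost.
        self.raw_store.append(log)
        # During an incident everything is forwarded; otherwise only a sample.
        if self.incident or index % self.sample_every == 0:
            self.forwarded.append(log)
            self._forwarded_ids.add(index)

    def start_incident(self) -> list:
        """Switch to full granularity and backfill previously diverted logs."""
        self.incident = True
        backfill = [
            log
            for index, log in enumerate(self.raw_store)
            if index not in self._forwarded_ids
        ]
        self._forwarded_ids.update(range(len(self.raw_store)))
        self.forwarded.extend(backfill)
        return backfill
```

In this sketch, a router that has ingested 25 logs with `sample_every=10` has forwarded 3 of them; calling `start_incident()` returns the other 22 for backfill, and every log ingested afterwards is forwarded in full.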

A few capabilities work together to make sure that Grepr delivers on this promise without impacting the developer experience and your MTTR:

  1. Dynamic aggregation: Grepr uses unsupervised machine learning to automatically learn the patterns in your logs and aggregates similar messages together with zero configuration. This capability can reduce log volumes by 80-99% right out of the box, and many settings are available to tune this behavior to your needs. As a result of this aggregation, summaries are sent to your existing vendor in addition to samples. With summaries in the logs, engineers can see exactly what is happening in their systems without having to sift through thousands of log lines. They can then use the summaries to drill down into the relevant logs in the Grepr UI.

  2. Raw data storage: All the original raw logs are stored in low-cost object storage (S3 for now; GCS, Azure later) for later retrieval and debugging. No data is dropped unless you explicitly configure it to be. This store could be a bucket that we host, or a bucket that you own.

  3. Raw data query: Logs are stored in Apache Parquet files using the Apache Iceberg table format, which allows them to be queried efficiently with our system or with any other standard query engine, such as Spark or Trino. Our APIs and UI allow users to query using a Lucene-based language similar to Datadog's query language, with support for other languages planned.

  4. Automated granularity adjustment: When an incident occurs or when there are alerts in your infrastructure, Grepr can automatically ensure that a developer has a complete set of logs to debug the issue. Grepr does this by (1) temporarily increasing the granularity of related logs passing through and (2) backfilling relevant logs from the raw store. This capability can be triggered either manually or automatically, based on alerts from your monitoring system or on certain matches in the log data.

  5. REST APIs and UI: Grepr provides a web-based user interface that allows you to create and manage pipelines and to search and manage log data. The same capabilities are available through REST APIs, which allow you to automate your observability pipelines, build much more complex pipelines, and integrate with other systems.

  6. Standard observability pipeline capabilities: Grepr is built on a general-purpose stream processing engine, which enables all the standard observability pipeline capabilities, such as filtering, parsing, remapping, sampling, and routing.

  7. Security and scalability: Grepr is SOC2 Type 2 compliant, and is built with security and scalability as top concerns. Grepr automatically scales to handle any volume of logs. Performance and health of pipelines are monitored and managed by the Grepr team, so you can rest assured that you will always have the logs you need when you need them.
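To make the idea of dynamic aggregation (capability 1) more concrete, here is a minimal sketch of grouping similar messages and emitting samples plus summaries. It is an illustration only: Grepr uses unsupervised machine learning to find patterns, whereas this sketch swaps in simple regex masking of variable tokens. The function names, the masking rules, and the summary fields are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical masking rules: replace tokens that usually vary between
# otherwise identical messages with a placeholder.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8,}\b"), "<hex>"),
    (re.compile(r"\d+"), "<num>"),
]


def pattern_of(message: str) -> str:
    """Reduce a message to a pattern by masking its variable tokens."""
    for regex, placeholder in _MASKS:
        message = regex.sub(placeholder, message)
    return message


def aggregate(messages, max_samples=2):
    """Group messages by pattern; return (samples, summaries).

    Up to `max_samples` messages per pattern pass through unchanged.
    Patterns with more messages than that also get one summary record.
    """
    groups = defaultdict(list)
    for message in messages:
        groups[pattern_of(message)].append(message)

    samples, summaries = [], []
    for pattern, members in groups.items():
        samples.extend(members[:max_samples])
        if len(members) > max_samples:
            summaries.append(
                {
                    "pattern": pattern,
                    "count": len(members),
                    "suppressed": len(members) - max_samples,
                }
            )
    return samples, summaries
```

In this sketch, three "user N logged in" messages with `max_samples=2` yield two samples and one summary with a count of 3, while a message that appears only once passes through untouched. The suppressed messages are not discarded in Grepr's model; they remain available in the raw store.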

SaaS or Private Cloud

Grepr is mainly available as a SaaS product. For larger deployments and for customers with strict compliance requirements, Grepr also offers a Private Cloud deployment model. Both models currently run only in AWS; support for GCP and Azure is planned.

User Interface and API

Grepr offers two powerful ways to interact with the platform:

Web User Interface

We recommend that new users start with our intuitive web UI. It provides:

  • A streamlined, task-oriented experience for creating and managing pipelines
  • Visual tools for searching and exploring log data
  • Dashboards that visualize pipeline performance and data flows
  • Guided workflows for common observability tasks

The UI abstracts away many underlying complexities, making it easy to accomplish common tasks without deep technical knowledge of the system internals.

RESTful API

For advanced users and automation scenarios, our comprehensive RESTful API provides:

  • Complete programmatic control over all Grepr capabilities
  • The ability to integrate Grepr into your existing workflows and tools
  • Powerful customization options for complex observability pipelines
  • Support for infrastructure-as-code and GitOps approaches

As you become more familiar with Grepr, you can gradually transition to API usage for automating repetitive tasks and creating sophisticated custom integrations.