Skip to Content

SQL transform supported data types

This page provides details on the three data types supported as input to and output from a SQL transform: LOG_EVENT, VARIANT, and COMPLETE_SPAN. These are not scalar data types. Instead, they are structured or semi-structured data formats.

The LOG_EVENT data type

The LOG_EVENT type is a structured format representing a log event, and has the following schema:

ROW< id STRING, eventtimestamp TIMESTAMP_LTZ(3), receivedtimestamp TIMESTAMP_LTZ(3), message STRING, severity INT, tags MAP<STRING, ARRAY<STRING>>, attributes STRING >
ColumnTypeDescription
idSTRINGGlobally unique identifier for the log event.
eventtimestampTIMESTAMP_LTZ(3)When the event occurred, formatted as TIMESTAMP(3) WITH LOCAL TIME ZONE. This field is parsed from the input log event.
receivedtimestampTIMESTAMP_LTZ(3)When Grepr received the event, formatted as TIMESTAMP(3) WITH LOCAL TIME ZONE.
messageSTRINGThe log message content.
severityINTThe severity level of the event. This is a value between 1 and 24, following the OpenTelemetry convention.
tagsMAP<STRING, ARRAY<STRING>>Key-value pairs for filtering and routing.
attributesSTRINGJSON string containing structured data associated with the event.

If you’re transforming a field like message or severity, the transformed field must appear before the * wildcard in your SELECT clause.

Add tags to a LOG_EVENT using a special column syntax

The SQL transform provides a special column-naming syntax to add tags to output log events. When you define a column name that starts with the tags. prefix, such as tags.service or tags.environment, the SQL transform automatically adds that tag to the output log event’s tags field.

The SQL statement in the following example uses the column-naming syntax to add these tags to the output log event:

  • A tag named environment with the value production.
  • A service tag set to the first value from the input log event service tag.
  • A priority tag based on the severity level of the input event.
SELECT *, 'production' as `tags.environment`, tags['service'][1] as `tags.service`, CASE WHEN severity >= 17 THEN 'high' ELSE 'normal' END as `tags.priority` FROM logs

The following can be used to add tags with the column-naming syntax:

  • String values can be assigned directly. For example 'production' as `tags.environment`.
  • Any SQL expression that returns a string or array.
  • To add multiple tag values, you can use a SQL array. For example, ARRAY['web', 'api'] as `tags.service`. The values in the SQL array are converted to multiple tag values.
  • To add values to an existing tag, use ARRAY_APPEND. For example, ARRAY_APPEND(tags['service'], 'billing') as `tags.service` .
  • Column names must be enclosed in backticks when using the dot notation. For example, `tags.service`.
  • Re-using a tag name will overwrite the tag’s existing values. To add additional values to an existing tag, you must use the ARRAY_APPEND() function. The arguments to ARRAY_APPEND() must include the existing array values and the new values.
  • Values that are NULL or empty strings are automatically filtered out and not added as tags.

For more examples of using the tags column syntax, see Add tags to a log event.

Create custom log events

You can use a transformation to construct new LOG_EVENT records with specific fields. For example:

-- Explicitly construct a new LogEvent with specific fields. SELECT UPPER(message) as message, CASE WHEN severity > 16 THEN 21 ELSE severity END as severity, MAP['transformed', ARRAY['true']] as tags, JSON_OBJECT('original_message' VALUE message) as attributes FROM logs

Grepr automatically adds default values for fields that are not specified:

  • id: When not specified or NULL, defaults to a new random UUID.
  • eventtimestamp: When not specified or NULL, defaults to the current timestamp in milliseconds.
  • receivedtimestamp: When not specified or NULL, defaults to the current timestamp in milliseconds.
  • severity: When not specified or NULL, defaults to 9 (INFO level).
  • message: When not specified or NULL, defaults to an empty string.
  • tags: When not specified or NULL, defaults to an empty map.
  • attributes: When not specified or NULL, defaults to an empty JSON object.

The following shows an example input event to the above transformation and the resulting output event:

Input:

{ "id": "evt1", "eventtimestamp": 1724214074062, "receivedtimestamp": 1724214074188, "severity": 17, "message": "user login successful", "tags": {"service": ["auth"]}, "attributes": {} }

Output:

{ "id": "12345678-1234-5678-1234-567812345678", // A new generated UUID "eventtimestamp": 174000000000, // Current timestamp in milliseconds "receivedtimestamp": 174000000002, // Current timestamp in milliseconds "severity": 21, // Transformed severity "message": "USER LOGIN SUCCESSFUL", // Transformed message "tags": {"transformed": ["true"]}, // New tags "attributes": { // New attributes "original_message": "user login successful" } }

The VARIANT data type

A VARIANT data type contains timestamped, semi-structured data with flexible JSON content:

ROW< receivedtimestamp TIMESTAMP_LTZ(3), eventtimestamp TIMESTAMP_LTZ(3), data STRING >

The following is an example of a simple VARIANT:

{ "receivedtimestamp": 174000000000, "eventtimestamp": 174000000000, "data": { "count": 10 } }

The COMPLETE_SPAN data type

The COMPLETE_SPAN data type represents an OpenTelemetry span with complete resource and scope context. See Spans  in the OpenTelemetry documentation.

Complex nested ROW structure with resource, scope, and span fields
Last updated on