SQL transform supported data types
This page provides details on the three data types supported as input to and output from a SQL transform: LOG_EVENT, VARIANT, and COMPLETE_SPAN. These are not scalar data types. Instead, they are structured or semi-structured data formats.
The LOG_EVENT data type
The LOG_EVENT type is a structured format representing a log event, and has the following schema:
ROW<
id STRING,
eventtimestamp TIMESTAMP_LTZ(3),
receivedtimestamp TIMESTAMP_LTZ(3),
message STRING,
severity INT,
tags MAP<STRING, ARRAY<STRING>>,
attributes STRING
>| Column | Type | Description |
|---|---|---|
id | STRING | Globally unique identifier for the log event. |
eventtimestamp | TIMESTAMP_LTZ(3) | When the event occurred, formatted as TIMESTAMP(3) WITH LOCAL TIME ZONE. This field is parsed from the input log event. |
receivedtimestamp | TIMESTAMP_LTZ(3) | When Grepr received the event, formatted as TIMESTAMP(3) WITH LOCAL TIME ZONE. |
message | STRING | The log message content. |
severity | INT | The severity level of the event. This is a value between 1 and 24, following the OpenTelemetry convention. |
tags | MAP<STRING, ARRAY<STRING>> | Key-value pairs for filtering and routing. |
attributes | STRING | JSON string containing structured data associated with the event. |
If you’re transforming a field like message or severity, the transformed field must appear before the * wildcard in your SELECT clause.
Add tags to a LOG_EVENT using a special column syntax
The SQL transform provides a special column-naming syntax to add tags to output log events. When you define a column name that starts with the tags. prefix, such as tags.service or tags.environment, the SQL transform automatically adds that tag to the output log event’s tags field.
The SQL statement in the following example uses the column-naming syntax to add these tags to the output log event:
- A tag named
environmentwith the valueproduction. - A
servicetag set to the first value from the input log eventservicetag. - A
prioritytag based on the severity level of the input event.
SELECT *,
'production' as `tags.environment`,
tags['service'][1] as `tags.service`,
CASE WHEN severity >= 17 THEN 'high' ELSE 'normal' END as `tags.priority`
FROM logsThe following can be used to add tags with the column-naming syntax:
- String values can be assigned directly. For example
'production' as `tags.environment`. - Any SQL expression that returns a string or array.
- To add multiple tag values, you can use a SQL array. For example,
ARRAY['web', 'api'] as `tags.service`. The values in the SQL array are converted to multiple tag values. - To add values to an existing tag, use
ARRAY_APPEND. For example,ARRAY_APPEND(tags['service'], 'billing') as `tags.service`.
- Column names must be enclosed in backticks when using the dot notation. For example,
`tags.service`. - Re-using a tag name will overwrite the tag’s existing values. To add additional values to an existing tag, you must use the
ARRAY_APPEND()function. The arguments toARRAY_APPEND()must include the existing array values and the new values. - Values that are
NULLor empty strings are automatically filtered out and not added as tags.
For more examples of using the tags column syntax, see Add tags to a log event.
Create custom log events
You can use a transformation to construct new LOG_EVENT records with specific fields. For example:
-- Explicitly construct a new LogEvent with specific fields.
SELECT
UPPER(message) as message,
CASE WHEN severity > 16 THEN 21 ELSE severity END as severity,
MAP['transformed', ARRAY['true']] as tags,
JSON_OBJECT('original_message' VALUE message) as attributes
FROM logsGrepr automatically adds default values for fields that are not specified:
id: When not specified orNULL, defaults to a new random UUID.eventtimestamp: When not specified orNULL, defaults to the current timestamp in milliseconds.receivedtimestamp: When not specified orNULL, defaults to the current timestamp in milliseconds.severity: When not specified orNULL, defaults to 9 (INFO level).message: When not specified orNULL, defaults to an empty string.tags: When not specified orNULL, defaults to an empty map.attributes: When not specified orNULL, defaults to an empty JSON object.
The following shows an example input event to the above transformation and the resulting output event:
Input:
{
"id": "evt1",
"eventtimestamp": 1724214074062,
"receivedtimestamp": 1724214074188,
"severity": 17,
"message": "user login successful",
"tags": {"service": ["auth"]},
"attributes": {}
}Output:
{
"id": "12345678-1234-5678-1234-567812345678", // A new generated UUID
"eventtimestamp": 174000000000, // Current timestamp in milliseconds
"receivedtimestamp": 174000000002, // Current timestamp in milliseconds
"severity": 21, // Transformed severity
"message": "USER LOGIN SUCCESSFUL", // Transformed message
"tags": {"transformed": ["true"]}, // New tags
"attributes": { // New attributes
"original_message": "user login successful"
}
}The VARIANT data type
A VARIANT data type contains timestamped, semi-structured data with flexible JSON content:
ROW<
receivedtimestamp TIMESTAMP_LTZ(3),
eventtimestamp TIMESTAMP_LTZ(3),
data STRING
>The following is an example of a simple VARIANT:
{
"receivedtimestamp": 174000000000,
"eventtimestamp": 174000000000,
"data": {
"count": 10
}
}The COMPLETE_SPAN data type
The COMPLETE_SPAN data type represents an OpenTelemetry span with complete resource and scope context. See Spans in the OpenTelemetry documentation.
Complex nested ROW structure with resource, scope, and span fields