Using a storage location as a source

Grepr can read and process files that exist in a storage location. These files can be read in either STREAMING or BATCH mode (see Execution).

In BATCH mode, Grepr reads all the files that exist at job creation time and processes them in one batch. In STREAMING mode, Grepr monitors the location for new files and reads and processes them as they appear. Grepr keeps track of the last read location, so if a job restarts, it continues from there.
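The difference between the two modes can be illustrated with a minimal Python sketch. It is not Grepr's implementation: the function names are hypothetical, a local directory stands in for the storage location, and a set of file names stands in for the read position that Grepr tracks across restarts.

```python
import os


def list_files(location):
    """Return the sorted names of the files currently present in the location."""
    return sorted(
        name for name in os.listdir(location)
        if os.path.isfile(os.path.join(location, name))
    )


def read_batch(location):
    """BATCH (illustrative): take one snapshot of the existing files."""
    return list_files(location)


def poll_streaming(location, already_read):
    """STREAMING (illustrative): return only files not read yet.

    `already_read` is a hypothetical stand-in for persisted read state;
    keeping it across restarts is what lets a job resume where it stopped.
    """
    new_files = [f for f in list_files(location) if f not in already_read]
    already_read.update(new_files)
    return new_files
```

Calling `poll_streaming` repeatedly with the same `already_read` set returns each file once, whereas `read_batch` is called a single time and never sees files that arrive later.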

As entries in each file are read, they are converted into Grepr's internal log event model. Those events may need additional processing with the available parsing operators.

Formats

Grepr currently supports two formats: Parquet and newline-delimited files.

Parquet

Reading Parquet files is currently only supported via the API. When reading Parquet files you need to specify the schema for the files and how columns map to the Grepr log event model. More details are available in the API docs.
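As a rough illustration of what a column mapping does, the Python sketch below turns one row (represented as a plain dict) into an event dict. The function name, the column names, and the `timestamp` field are assumptions made for this example; the actual schema and mapping syntax are defined in the API docs.

```python
def map_row(row, column_mapping):
    """Build an event dict from one row using a column -> event field mapping.

    Columns that are not listed in the mapping are ignored in this sketch.
    """
    event = {}
    for column, field in column_mapping.items():
        if column in row:
            event[field] = row[column]
    return event


# Hypothetical mapping: the "ts" column feeds a timestamp field,
# and the "body" column feeds the message field.
mapping = {"ts": "timestamp", "body": "message"}
event = map_row({"ts": 1700000000, "body": "disk full", "host": "web-1"}, mapping)
```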

Newline-delimited files

Newline-delimited files are currently only supported via the API. Each line's contents are read into the log event's message field. If the entries are JSON, later operations in the pipeline can deserialize and process them as needed. See the details in the API docs.
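The two-step flow described here (read each line into the message field, deserialize JSON later) can be sketched in Python as follows. This is only an illustration: the function names and the `attributes` key are hypothetical, and skipping blank lines is an assumption rather than documented Grepr behavior.

```python
import json


def lines_to_events(text):
    """Wrap each non-blank line in a dict with a `message` field."""
    return [{"message": line} for line in text.splitlines() if line.strip()]


def parse_json_message(event):
    """Later pipeline step (illustrative): deserialize a JSON message.

    Events whose message is not a JSON object are returned unchanged.
    """
    try:
        parsed = json.loads(event["message"])
    except json.JSONDecodeError:
        return event
    if isinstance(parsed, dict):
        # `attributes` is a hypothetical key chosen for this sketch.
        return {**event, "attributes": parsed}
    return event
```

For example, `parse_json_message` applied to the events from `lines_to_events('{"level": "info"}\nplain text')` adds parsed attributes to the first event and leaves the second as it was.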
