Grepr Job States

The Grepr Job APIs return a state field that you can use to track the lifecycle of a job. The following sections describe the lifecycle of each type of job.

Asynchronous Streaming Jobs

Asynchronous Streaming jobs are long-running jobs, also known as pipelines.

Create Job lifecycle

The following states describe the lifecycle of an Asynchronous Streaming job submitted for creation:

[Figure: Create Job lifecycle]

  1. PENDING: The job has been submitted for creation but has not yet reached its desired state. While the job is in this state, Grepr allocates the appropriate resources, starts the job, and ensures that it is healthy.
  2. RUNNING: The job is running and processing data.
  3. STOPPED: The job has been stopped and is not processing data. The job remains suspended until its desired state is updated to RUNNING. When started again, the job resumes processing data from where it left off.
  4. FAILED: The job failed to start. This can happen because of an invalid job graph configuration or because of errors while processing data. In this case, correct the problem and then either resubmit or update the job.
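
The create lifecycle above can be followed from a client by polling the job's state until it leaves PENDING. The following is a minimal Python sketch rather than Grepr client code: `fetch_state` is a hypothetical callable that you would implement against the Job APIs, and the function name, timeout, and polling interval are illustrative assumptions.

```python
import time
from enum import Enum
from typing import Callable


class JobState(str, Enum):
    """The four create-lifecycle states listed above."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    FAILED = "FAILED"


def wait_while_pending(
    fetch_state: Callable[[], str],  # hypothetical: returns the job's current state string
    timeout_s: float = 60.0,         # assumed client-side limit, not a Grepr setting
    interval_s: float = 2.0,         # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> JobState:
    """Poll until the job is no longer PENDING and return the state it settled in."""
    deadline = clock() + timeout_s
    while True:
        state = JobState(fetch_state())
        if state is not JobState.PENDING:
            return state
        if clock() >= deadline:
            raise TimeoutError("job is still PENDING after the client-side timeout")
        sleep(interval_s)
```

A caller would typically treat a returned `JobState.FAILED` as the signal to correct the job graph and resubmit or update the job.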

Update Job lifecycle

Job updates in Grepr are versioned. An update therefore changes the state of two job versions concurrently: the version being retired and the version being deployed. Grepr also lets you specify a rollbackEnabled parameter when updating a job. If it is set to true and the new version fails, Grepr rolls the job back to the previous version, which may have been running before the update. Use this to minimize downtime in data processing when an update contains a misconfiguration.

A successful update job request returns a 202 Accepted response containing the updated job details, including an incremented job version number.

The following states describe the lifecycle of an Asynchronous Streaming job submitted for an update. They represent the concurrent state transitions of the old and new job versions.

[Figure: Update Job lifecycle]

If the new job version fails and rollbackEnabled is set to true, Grepr rolls back to the old job version. For example, if version 0 is updated to version 1 and version 1 fails, Grepr creates a new job version (2) with the same job graph as version 0. Version 2 follows the same state transitions as a newly created job.

The states in this case will be:

  • Old job version: 0, job state: FINISHED, desiredState: FINISHED
  • New job version: 1, job state: FAILED, desiredState: RUNNING
  • Rollback job version: 2, job state: PENDING, desiredState: RUNNING
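
One way to read the three records above is to compare each version's state with its desiredState. The sketch below is illustrative only: the dictionary keys `version` and `state` are assumed field names (only desiredState is named in this page), and the labels returned by `classify` are made up for this example rather than Grepr terminology.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[int, str]]


def classify(record: Record) -> str:
    """Label a job version by comparing its state with its desired state."""
    state, desired = record["state"], record["desiredState"]
    if state == "FAILED":
        return "failed"        # did not reach its desired state
    if state == desired:
        return "settled"       # already where it is meant to be
    return "converging"        # still moving toward the desired state


# The rollback scenario listed above, expressed as data.
versions: List[Record] = [
    {"version": 0, "state": "FINISHED", "desiredState": "FINISHED"},
    {"version": 1, "state": "FAILED", "desiredState": "RUNNING"},
    {"version": 2, "state": "PENDING", "desiredState": "RUNNING"},
]

summary = {v["version"]: classify(v) for v in versions}
```

Here `summary` maps version 0 to "settled", version 1 to "failed", and version 2 to "converging", which matches the rollback sequence described above.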

Delete Job lifecycle

The following states describe the lifecycle of an Asynchronous Streaming job submitted for deletion:

[Figure: Delete Job lifecycle]

Batch Jobs

Batch jobs are potentially short-lived jobs that support queries on user data. Like streaming jobs, they are defined as job graphs, but they operate on scoped (limited) data.

Asynchronous Batch Jobs

Asynchronous Batch Jobs support asynchronous operations on user data. Grepr exposes APIs to submit these jobs and track their lifecycle. Because they are short-lived, they do not support updates or deletes. Once a job is submitted, you can use the GET /jobs API to track its state attribute. The following state transitions are possible:

[Figure: Asynchronous Batch Job state transitions]

One use case for Asynchronous Batch Jobs is backfilling log data, that is, loading data from the raw store back into the observability tool.
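
Tracking an Asynchronous Batch Job amounts to reading its state repeatedly and noting each change. The sketch below records the sequence of distinct states observed. It rests on assumptions: `fetch_state` is a hypothetical callable that would wrap a GET /jobs call and extract the state attribute, and the terminal state names are assumed, since this page does not list the batch job states in text.

```python
import time
from typing import Callable, FrozenSet, List

# Assumption: state names after which a batch job no longer changes.
ASSUMED_TERMINAL_STATES: FrozenSet[str] = frozenset({"FINISHED", "FAILED"})


def record_transitions(
    fetch_state: Callable[[], str],  # hypothetical wrapper around GET /jobs
    terminal_states: FrozenSet[str] = ASSUMED_TERMINAL_STATES,
    interval_s: float = 1.0,         # assumed polling interval
    max_polls: int = 120,            # assumed client-side cap on polling
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll the job state and return each distinct state in the order it was seen."""
    seen: List[str] = []
    for _ in range(max_polls):
        state = fetch_state()
        if not seen or seen[-1] != state:
            seen.append(state)  # record only actual changes of state
        if state in terminal_states:
            return seen
        sleep(interval_s)
    raise TimeoutError("job did not reach a terminal state within max_polls")
```

Because polling samples the state at intervals, a short-lived intermediate state can be missed; the returned list is a lower bound on the transitions that occurred.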

Synchronous Batch Jobs

Synchronous Batch Jobs support synchronous queries on user data. Grepr exposes APIs to submit these jobs and stream the results back to the client. Their lifecycle is simpler than that of streaming jobs: they are submitted, processed, and then completed. A job that runs longer than the maximum allowed time (30 seconds by default) is cancelled.
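
A client that consumes streamed results may want its own deadline that mirrors the server-side limit. The following sketch wraps any iterable of results and stops once a time budget is exceeded. It is a client-side illustration only: `with_deadline` and its parameters are hypothetical names, and the check runs between items, so it cannot interrupt a read that is blocked waiting for the next item.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

DEFAULT_MAX_SECONDS = 30.0  # mirrors the default limit described above


def with_deadline(
    rows: Iterable[T],
    max_seconds: float = DEFAULT_MAX_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items from rows, raising TimeoutError once the time budget is spent."""
    deadline = clock() + max_seconds
    for row in rows:
        if clock() > deadline:
            raise TimeoutError("result stream exceeded the client-side time budget")
        yield row
```

For example, `list(with_deadline(results))` collects all streamed results, where `results` stands for whatever iterable your HTTP client returns for the query.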