Track the lifecycle of jobs with the Grepr REST API
Grepr provides APIs for tracking the lifecycle of jobs through the state
field returned by the Job APIs. The following sections describe the lifecycle of each job type.
Asynchronous streaming jobs
These are long-running jobs that are also known as pipelines.
Create job lifecycle
The following states describe the lifecycle of an Asynchronous Streaming job submitted for creation:
- PENDING: The job has been submitted for creation but has not reached its desired state yet. Grepr allocates appropriate resources, starts the job, and ensures it’s in a healthy state.
- RUNNING: The job is running and processing data.
- STOPPED: The job has been stopped and is not processing data. When it starts again, the job will resume processing data from where it left off.
- FAILED: The job failed to start. This failure could happen due to some invalid configuration of the job graph or errors in processing data. In this case, you can either resubmit or update the job after correction.
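As a minimal sketch of how a client might react to these states, the snippet below maps each create-lifecycle state to a suggested next step. The `JobState` enum, the `next_action` helper, and the action labels are hypothetical names introduced for illustration; only the four state values come from the list above. Other states, such as FINISHED in the update lifecycle, are deliberately not covered and raise a `ValueError`.

```python
from enum import Enum


class JobState(str, Enum):
    """States named in the create lifecycle (illustrative, not an official client type)."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    FAILED = "FAILED"


def next_action(state: str) -> str:
    """Map a reported state to a suggested client-side action (labels are hypothetical)."""
    s = JobState(state)  # raises ValueError for states outside the create lifecycle
    if s is JobState.PENDING:
        return "wait"  # Grepr is still allocating resources and starting the job
    if s is JobState.RUNNING:
        return "none"  # the job is processing data
    if s is JobState.STOPPED:
        return "restart-when-ready"  # processing resumes from where it left off
    return "fix-and-resubmit"  # FAILED: correct the job, then resubmit or update it
```

For example, `next_action("PENDING")` returns `"wait"`, which a monitoring script could use to decide whether to keep polling.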
Update job lifecycle
Job updates in Grepr are versioned. During an update, Grepr therefore updates the state of both the version being retired and the version being deployed, concurrently. Grepr also lets you specify a rollbackEnabled parameter when updating the job. If it is set to true, Grepr rolls the job back to a previous, potentially running version when the update fails. You can use this parameter to minimize downtime in data processing in case of a misconfiguration.
A successful update job request will return a 202 Accepted response with the updated job details and a bump in the job’s version number.
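A client can verify both conditions before it starts tracking the new version. The helper below is a sketch under stated assumptions: it takes the HTTP status code and the parsed response body as plain values, and it assumes the version number is exposed in a field named `version`, which this page does not confirm.

```python
def check_update_accepted(status_code: int, body: dict, previous_version: int) -> int:
    """Return the new job version if the update was accepted, otherwise raise.

    Assumption: the response body carries the version in a "version" field.
    """
    if status_code != 202:
        raise RuntimeError(f"update not accepted: HTTP {status_code}")
    new_version = body["version"]  # hypothetical field name
    if new_version <= previous_version:
        raise RuntimeError(
            f"expected a version bump, got {new_version} (was {previous_version})"
        )
    return new_version
```

Keeping the check free of any HTTP library means it works with whichever client you already use to call the API.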
The following states describe the lifecycle of an Asynchronous Streaming job submitted for an update. It represents concurrent state transitions of the old and new job versions.
If the new job version fails and rollbackEnabled is set to true, Grepr rolls back to the old job version. To do so, it creates a new job version (2) using the same job graph as job version 0. This version follows the same state transitions as a newly created job.
The states in this case will be:
- Old job version: 0, job state: FINISHED, desiredState: FINISHED
- New job version: 1, job state: FAILED, desiredState: RUNNING
- Rollback job version: 2, job state: PENDING, desiredState: RUNNING
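The three records above can be read by comparing each version's state with its desiredState. The sketch below does this for the rollback example. The `classify` helper, its labels, and the `version` key are hypothetical; `state` and `desiredState` are the names used on this page.

```python
def classify(record: dict) -> str:
    """Label one job version by comparing its state with its desiredState."""
    if record["state"] == "FAILED":
        return "failed"
    if record["state"] == record["desiredState"]:
        return "settled"  # the version has reached the state it was asked to reach
    return "in_progress"  # still transitioning toward desiredState


# The rollback example from the list above, written as plain dictionaries.
versions = [
    {"version": 0, "state": "FINISHED", "desiredState": "FINISHED"},
    {"version": 1, "state": "FAILED", "desiredState": "RUNNING"},
    {"version": 2, "state": "PENDING", "desiredState": "RUNNING"},
]

summary = {v["version"]: classify(v) for v in versions}
```

Here `summary` is `{0: "settled", 1: "failed", 2: "in_progress"}`: the old version is retired, the new version failed, and the rollback version is still starting.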
Delete job lifecycle
The following states describe the lifecycle of an Asynchronous Streaming job submitted for deletion:
Batch jobs
These are potentially short-lived jobs that support querying user data. Like streaming jobs, they are defined as job graphs, but they work on scoped (limited) data.
Asynchronous batch jobs
These support asynchronous operations on user data. Grepr exposes APIs to submit and track the lifecycle of these jobs. Since they’re short-lived, updates and deletes are not supported on them. Once a job is submitted, you can use the GET /jobs API to track the state attribute of the job. The following state transitions are possible:
One use case for asynchronous batch jobs is backfilling log data, that is, loading data from the raw store back into the observability tool.
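Tracking an asynchronous batch job, such as a backfill, usually comes down to polling the state attribute until the job stops changing. The loop below is a rough sketch of that pattern, not a Grepr client: `fetch_state` is a caller-supplied function (for example, one that calls GET /jobs and extracts the state), and the set of terminal states is passed in because this page does not list them for batch jobs. The function name, intervals, and timeout are all assumptions.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    fetch_state: Callable[[], str],
    terminal_states: Iterable[str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state() until it returns a terminal state or the timeout elapses."""
    terminal = set(terminal_states)
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_state()
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout_s}s")
        sleep(interval_s)  # wait before asking again
```

Passing `sleep` as a parameter lets you substitute a no-op in tests, so the loop can be exercised without real delays.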
Synchronous batch jobs
These support synchronous queries on user data. Grepr exposes APIs to submit these jobs and stream the results back to the client. Their lifecycle is simpler than that of streaming jobs: they are submitted, processed, and then completed. If a job runs longer than the maximum allowed time (30s by default), it is cancelled.