Grepr Job States
Grepr provides APIs to track the lifecycle of jobs through the state field returned by the Job APIs.
The following sections describe the lifecycle of each type of job.
Asynchronous Streaming Jobs
These are long-running jobs that are also known as pipelines.
Create Job lifecycle
The following states describe the lifecycle of an Asynchronous Streaming job submitted for creation:
PENDING
: The job has been submitted for creation but has not yet reached its desired state. Grepr allocates the appropriate resources, starts the job, and ensures it is healthy.
RUNNING
: The job is running and processing data.
STOPPED
: The job has been stopped and is not processing data. The job stays suspended until its desired state is updated to RUNNING. When started again, the job resumes processing data from where it left off.
FAILED
: The job failed to start. This can happen because of an invalid job graph configuration or errors in processing data. In this case, correct the problem and then either resubmit or update the job.
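The create lifecycle above can be followed from a client by polling until the job leaves PENDING. The following is a minimal sketch, not an official Grepr client: the get_state callable is a hypothetical stand-in for whatever call returns the job's state field, and the polling interval and budget are arbitrary assumptions.

```python
import time
from enum import Enum
from typing import Callable


class JobState(str, Enum):
    """Create-lifecycle states listed in this section."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    FAILED = "FAILED"


def wait_until_settled(
    get_state: Callable[[], str],  # hypothetical: returns the job's current state string
    poll_seconds: float = 5.0,     # assumed interval, not a Grepr recommendation
    max_polls: int = 60,           # assumed polling budget
) -> JobState:
    """Poll until the job leaves PENDING and return the state it settled in."""
    for _ in range(max_polls):
        state = JobState(get_state())
        if state is not JobState.PENDING:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("job was still PENDING after the polling budget")
```

A caller would typically treat RUNNING as success and FAILED as a signal to correct the job graph before resubmitting.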
Update Job lifecycle
Job updates in Grepr are versioned. An update therefore changes the state of two versions concurrently: the version being retired and the version being deployed.
Grepr also lets you specify a rollbackEnabled parameter when updating a job. If it is set to true, Grepr rolls the job back to the previous version (which may have been running) when the update fails. This helps minimize downtime in data processing after a misconfiguration.
A successful update job request returns a 202 Accepted response containing the updated job details, including an incremented job version number.
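A client can check both conditions described above: the 202 Accepted status and the version bump. This is a small sketch under assumptions: the response body is treated as a parsed JSON object, and the version field name is hypothetical because the response schema is not shown here.

```python
from typing import Any, Mapping


def update_was_accepted(
    status_code: int,
    previous_version: int,
    body: Mapping[str, Any],
) -> bool:
    """Return True when the response is 202 Accepted and reports a higher version."""
    new_version = body.get("version")  # "version" is an assumed field name
    return (
        status_code == 202
        and isinstance(new_version, int)
        and new_version > previous_version
    )
```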
The following states describe the lifecycle of an Asynchronous Streaming job submitted for an update. They represent concurrent state transitions of the old and new job versions.
If the new job version fails, Grepr rolls back to the old job version when rollbackEnabled is set to true.
On rollback, a new job version (2) is created with the same job graph as job version 0. It follows the same state transitions as a newly created job.
The states in this case will be:
- Old job version: 0, job state: FINISHED, desiredState: FINISHED
- New job version: 1, job state: FAILED, desiredState: RUNNING
- Rollback job version: 2, job state: PENDING, desiredState: RUNNING
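The rollback scenario above can be modelled as three version records. The sketch below is illustrative only: the JobVersion class and the active_version helper are hypothetical names, and the rule for picking the active version is an assumption drawn from the listed states rather than documented Grepr behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class JobVersion:
    version: int
    state: str
    desired_state: str


# The three versions from the rollback scenario described above.
ROLLBACK_EXAMPLE = [
    JobVersion(0, "FINISHED", "FINISHED"),  # old version, retired
    JobVersion(1, "FAILED", "RUNNING"),     # new version, failed to start
    JobVersion(2, "PENDING", "RUNNING"),    # rollback version, same graph as 0
]


def active_version(versions: List[JobVersion]) -> Optional[JobVersion]:
    """Return the newest version that is meant to run and has not failed."""
    candidates = [
        v for v in versions
        if v.desired_state == "RUNNING" and v.state != "FAILED"
    ]
    return max(candidates, key=lambda v: v.version, default=None)
```

In this scenario the helper selects version 2, which is expected to move from PENDING to RUNNING like any newly created job.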
Delete Job lifecycle
The following states describe the lifecycle of an Asynchronous Streaming job submitted for deletion:
Batch Jobs
These are potentially short-lived jobs that support querying user data. Like streaming jobs, they are defined as job graphs, but they work on scoped (limited) data.
Asynchronous Batch Jobs
These support asynchronous operations on user data. Grepr exposes APIs to submit and track the lifecycle of these jobs.
Since they're short-lived, updates and deletes are not supported on them. Once a job is submitted, you can use the GET /jobs API to track the state attribute of the job. The following state transitions are possible:
One use-case for Async Batch Jobs is backfilling log data, i.e. loading data from the raw store back to the observability tool.
Synchronous Batch Jobs
These support synchronous queries on user data. Grepr exposes APIs to submit these jobs and stream the results back to the client. The lifecycle of these jobs is simpler than that of streaming jobs: they are submitted, processed, and then completed. A job that runs longer than the maximum allowed time (30s by default) is cancelled.
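The deadline behavior described above can be illustrated with a short asyncio sketch. This is not Grepr's implementation: run_with_deadline and the returned outcome labels are hypothetical, and only the 30-second default comes from the text.

```python
import asyncio
from typing import Any, Awaitable, Optional, Tuple

DEFAULT_MAX_SECONDS = 30.0  # default limit stated above


async def run_with_deadline(
    query: Awaitable[Any],
    max_seconds: float = DEFAULT_MAX_SECONDS,
) -> Tuple[str, Optional[Any]]:
    """Await a query, cancelling it if it runs past max_seconds."""
    try:
        result = await asyncio.wait_for(query, timeout=max_seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the underlying work at this point.
        return ("cancelled", None)
    return ("completed", result)
```

A caller that sees the cancelled outcome could narrow the query's scope or submit the work as an Asynchronous Batch Job instead.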