ovo.core.scheduler.healthomics_scheduler

Module Contents

Classes

API

class ovo.core.scheduler.healthomics_scheduler.HealthOmicsScheduler(name: str, workdir: str, reference_files_dir: str, aws: ovo.core.aws.AWSSessionManager, allow_submit: bool = True, submission_args: dict = None)

Bases: ovo.core.scheduler.base_scheduler.Scheduler

submit(pipeline_name: str, params: dict = None, submission_args: dict = None) → str

Submits a workflow asynchronously and returns the scheduler job ID.

Parameters:
  • pipeline_name – Workflow name and revision (e.g. ovo.rfdiffusion-end-to-end, or a GitHub URL with @version)

  • params – Dictionary of parameters to pass to the workflow.

  • submission_args – Submission arguments for the scheduler; these override values in self.submission_args

Returns:

Scheduler job ID
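
The override behaviour described for submission_args can be pictured as a dictionary merge. The sketch below is an assumption about the semantics (a shallow, key-by-key override), not the library's actual implementation; the helper name merge_submission_args and the example keys are hypothetical.

```python
from typing import Optional


def merge_submission_args(defaults: Optional[dict], overrides: Optional[dict]) -> dict:
    """Return a new dict in which per-call overrides win over scheduler-level defaults.

    Hypothetical helper: assumes a shallow, key-by-key override.
    """
    merged = dict(defaults or {})   # copy so the scheduler-level defaults stay untouched
    merged.update(overrides or {})  # per-call values replace defaults with the same key
    return merged


# Hypothetical keys, used only for illustration
scheduler_defaults = {"queue": "default", "priority": 1}
per_call = {"priority": 5}
effective = merge_submission_args(scheduler_defaults, per_call)
print(effective)
```

With a merge like this, a call that passes no submission_args falls back entirely to the values given when the scheduler was constructed.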

get_status_label(job_id: str) → str

Get human-readable job status label. Should NOT be used to determine job status.

Can return one of: “Cancelled”, “Completed”, “Deleted”, “Failed”, “Pending”, “Running”, “Starting”, “Stopping”.

get_result(job_id: str) → str | bool | None

Get job result: True if successful, False if failed, None if running.
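
Because get_result returns None while a job is still running, a caller can poll it until a final value appears. The following is a minimal sketch under that assumption; wait_for_result and FakeScheduler are hypothetical names introduced here for illustration, and the stub stands in for a real scheduler so the example runs without AWS access. Any non-None value is treated as final.

```python
import time


def wait_for_result(scheduler, job_id: str, poll_seconds: float = 30.0,
                    timeout_seconds: float = 3600.0):
    """Poll scheduler.get_result(job_id) until it stops returning None.

    Returns the final result, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = scheduler.get_result(job_id)
        if result is not None:
            return result  # True on success, False on failure
        if time.monotonic() >= deadline:
            return None  # still running when the timeout expired
        time.sleep(poll_seconds)


class FakeScheduler:
    """Hypothetical stand-in that reports 'running' twice, then success."""

    def __init__(self):
        self.calls = 0

    def get_result(self, job_id: str):
        self.calls += 1
        return None if self.calls < 3 else True


fake = FakeScheduler()
outcome = wait_for_result(fake, "job-123", poll_seconds=0.01, timeout_seconds=5)
print(outcome, fake.calls)  # True 3
```

Polling get_result rather than comparing the string from get_status_label follows the note above that the label is meant for display only.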

_get_log_events(job_id: str, task_id: str = None, preview: bool = False) → list[mypy_boto3_logs.type_defs.OutputLogEventTypeDef]

get_log(job_id: str, task_id: str = None, preview: bool = False) → str | None

Get job execution log

Parameters:
  • job_id – Scheduler job ID (DesignJob.job_id or DescriptorJob.job_id)

  • task_id – ID of an individual task (workdir for Nextflow, task ID for AWS Omics); None for the entire job log

  • preview – Whether to return only the last 10 lines

Returns:

Log string or None if not available
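
The effect of preview on the returned string can be illustrated with a small tail helper. This is only a sketch of the documented behaviour (last 10 lines, None when no log is available); preview_log is a hypothetical function and says nothing about how the scheduler actually fetches or truncates log events.

```python
from typing import Optional


def preview_log(log: Optional[str], max_lines: int = 10) -> Optional[str]:
    """Return the last max_lines lines of a log, or None if no log is available."""
    if log is None:
        return None
    return "\n".join(log.splitlines()[-max_lines:])


# Build a 25-line log and keep only its tail
full_log = "\n".join(f"line {i}" for i in range(1, 26))
tail = preview_log(full_log)
print(tail.splitlines()[0], "...", tail.splitlines()[-1])  # line 16 ... line 25
```

Callers should check for None before splitting or printing the result, since a log may not be available yet.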

abstractmethod cancel(job_id)

Cancel job execution

get_output_dir(job_id: str) → str

Get job output path

get_job_start_time(job_id: str) → datetime.datetime | None

Get job start time

get_job_stop_time(job_id: str) → datetime.datetime | None

Get job end time
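
Both get_job_start_time and get_job_stop_time may return None, so code that derives a run duration from them needs to handle the missing case. A minimal sketch, assuming both timestamps are comparable datetime objects; job_duration is a hypothetical helper and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def job_duration(start: Optional[datetime], stop: Optional[datetime]) -> Optional[timedelta]:
    """Return stop - start, or None if either timestamp is not available yet."""
    if start is None or stop is None:
        return None
    return stop - start


# Made-up timestamps standing in for the values returned by the scheduler
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stop = datetime(2024, 1, 1, 12, 45, 30, tzinfo=timezone.utc)
elapsed = job_duration(start, stop)
print(elapsed)  # 0:45:30
```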

get_startup_time_minutes()

Get startup time of a task (in minutes)

get_tasks(job_id: str) → pandas.DataFrame | None

Get job tasks as a DataFrame with columns: task_id, name, status, duration_seconds + custom columns from the scheduler