ovo.core.logic.descriptor_logic

Module Contents

Functions

get_available_descriptors

Return all descriptor keys found in DB for the given design ids.

get_available_descriptors_per_job

Return all descriptor keys found in DB for the given design ids and the specified descriptor job.

get_wide_descriptor_table

Return a wide descriptor table for the given design ids. The table will have design ids as index and human-readable descriptor names (or keys if human_readable=False) as columns.

submit_descriptor_workflow

Submit a descriptor workflow to the scheduler and create a DescriptorJob in the DB.

prepare_design_structures

Prepare a txt file with PDB file paths (or a single PDB file if there is only one design), all stored in the provided workdir.

prepare_design_sequences

Prepare a CSV file with a design_id column and a sequence column for each chain, and store it in the provided workdir.

prepare_refolding_params

prepare_foldseek_clustering_workflow_params

get_log

Get the log of a descriptor job from the scheduler.

process_results

Process results of a successful workflow and save DescriptorValues to the database.

read_descriptor_file_values

Process CSV/JSONL files containing descriptor values (one file per batch, one row per design), return list of DescriptorValues to be inserted into DB.

read_per_design_files

Process per-design output files such as predicted PDB structures, return DescriptorValues (containing the stored file path as DescriptorValue.value) to be inserted into DB.

_store_design_file_descriptor

save_descriptor_job_for_design_job

find_id_column

generate_descriptor_values_for_design

Generate DescriptorValue objects for a given design based on descriptor tables.

update_and_process_descriptors

Update descriptor jobs and process finished descriptor jobs.

export_proteinqc_excel

export_design_descriptors_excel

Export Excel file with design descriptors for the given design ids.

tool_supports_scheduler

get_available_schedulers

Get available schedulers based on the clustering tools selected by the user.

get_descriptor_metadata_table

Get a table with metadata for the given descriptor keys.

API

ovo.core.logic.descriptor_logic.get_available_descriptors(design_ids: list[str]) -> dict[str, ovo.core.database.models.Descriptor]

Return all descriptor keys found in DB for the given design ids.

ovo.core.logic.descriptor_logic.get_available_descriptors_per_job(design_ids: list[str], descriptor_job_id: str) -> dict[str, ovo.core.database.models.Descriptor]

Return all descriptor keys found in DB for the given design ids and the specified descriptor job.

ovo.core.logic.descriptor_logic.get_wide_descriptor_table(*, pool_ids: Collection[str] = None, design_ids: Collection[str] = None, descriptor_keys: Collection[str] = None, human_readable=True, nested=False, descriptor_job_id: str = None, **filters) -> pandas.DataFrame

Return a wide descriptor table for the given design ids. The table will have design ids as index and human-readable descriptor names (or keys if human_readable=False) as columns.
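To make the "wide" layout concrete, the following is a minimal sketch of pivoting long-format records into one row per design. It is not the library's implementation: the record layout, the pivot_wide helper, and the example descriptor keys and names are all hypothetical, and the real function returns a pandas.DataFrame rather than nested dicts.

```python
from collections import defaultdict

# Hypothetical long-format records: (design_id, descriptor_key, value).
records = [
    ("design_1", "tool|length", 120),
    ("design_1", "tool|charge", -3.0),
    ("design_2", "tool|length", 98),
]

# Hypothetical mapping from descriptor key to human-readable name.
names = {"tool|length": "Length", "tool|charge": "Charge"}


def pivot_wide(records, names, human_readable=True):
    """Return {design_id: {column: value}} with one entry per design."""
    wide = defaultdict(dict)
    for design_id, key, value in records:
        # Use the readable name when requested, falling back to the raw key.
        column = names.get(key, key) if human_readable else key
        wide[design_id][column] = value
    return dict(wide)


table = pivot_wide(records, names)
```

With human_readable=False the same helper keeps the raw descriptor keys as column names, which mirrors the behaviour described above.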

ovo.core.logic.descriptor_logic.submit_descriptor_workflow(workflow: ovo.core.database.models.DescriptorWorkflow, scheduler_key: str, project_id: str, pipeline_name: str = None, submission_args: dict = None)

Submit a descriptor workflow to the scheduler and create a DescriptorJob in the DB.

Parameters:
  • workflow – DescriptorWorkflow object to submit

  • scheduler_key – Key of the scheduler to use

  • project_id – ID of the Project to associate the DescriptorJob with

  • pipeline_name – Override the pipeline name to submit, e.g. ovo.proteinqc or a github url with @version

  • submission_args – Extra submission arguments to override in the scheduler, e.g. {"profile": "conda"} or {"stub": True}

Returns:

DescriptorJob
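The override behaviour of submission_args can be pictured as a dictionary merge in which the caller's values win. This is only a sketch under that assumption: merge_submission_args and the scheduler_defaults values are hypothetical, and this reference does not show how the scheduler actually combines its arguments.

```python
# Hypothetical scheduler defaults; the real scheduler configuration is not shown here.
scheduler_defaults = {"profile": "docker", "stub": False}


def merge_submission_args(defaults, submission_args=None):
    """Return a new dict in which submission_args take precedence over defaults."""
    merged = dict(defaults)  # copy so the defaults are left untouched
    merged.update(submission_args or {})
    return merged


args = merge_submission_args(scheduler_defaults, {"profile": "conda"})
```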

ovo.core.logic.descriptor_logic.prepare_design_structures(designs: list[ovo.core.database.models.Design], workdir: str)

Prepare a txt file with PDB file paths (or a single PDB file if there is only one design), all stored in the provided workdir.

ovo.core.logic.descriptor_logic.prepare_design_sequences(designs: list[ovo.core.database.models.Design], workdir: str)

Prepare a CSV file with a design_id column and a sequence column for each chain, and store it in the provided workdir.
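As an illustration of the file layout described above, here is a minimal sketch that writes one row per design with a design_id column and one sequence column per chain. The write_sequences_csv helper, the file name, the use of chain IDs as column names, and the dict standing in for Design objects are assumptions, not the library's actual behaviour.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for Design objects: design id -> {chain id: sequence}.
designs = {
    "design_1": {"A": "MKTAYIAK", "B": "GSHMLE"},
    "design_2": {"A": "MKTWYIAK", "B": "GSHMLD"},
}


def write_sequences_csv(designs, workdir, filename="sequences.csv"):
    """Write one row per design with a design_id column and one column per chain."""
    chains = sorted({chain for seqs in designs.values() for chain in seqs})
    path = os.path.join(workdir, filename)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["design_id", *chains])
        writer.writeheader()
        for design_id, seqs in designs.items():
            writer.writerow({"design_id": design_id, **seqs})
    return path


with tempfile.TemporaryDirectory() as workdir:
    csv_path = write_sequences_csv(designs, workdir)
    # Read the file back to inspect what was written.
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
```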

ovo.core.logic.descriptor_logic.prepare_refolding_params(workflow: ovo.core.database.models_refolding.RefoldingWorkflow, workdir: str) -> dict

ovo.core.logic.descriptor_logic.prepare_foldseek_clustering_workflow_params(workflow: ovo.core.database.models_clustering.FoldseekClusteringWorkflow, workdir: str) -> dict

ovo.core.logic.descriptor_logic.get_log(descriptor_job: ovo.core.database.models.DescriptorJob, tail: int = None) -> str

Get the log of a descriptor job from the scheduler.

ovo.core.logic.descriptor_logic.process_results(descriptor_job: ovo.core.database.models.DescriptorJob, callback: Callable = None, wait: bool = True)

Process results of a successful workflow and save DescriptorValues to the database.

ovo.core.logic.descriptor_logic.read_descriptor_file_values(descriptor_job: ovo.core.database.models.DescriptorJob, design_id_mapping: dict[str, str | tuple], filenames: dict[str, str] = None, descriptor_tables: dict[str, pandas.DataFrame] = None) -> list[ovo.core.database.models.DescriptorValue]

Process CSV/JSONL files containing descriptor values (one file per batch, one row per design), return list of DescriptorValues to be inserted into DB.

Parameters:
  • descriptor_job – DescriptorJob object

  • design_id_mapping – Mapping from design.id to table_id (basename of PDB file = id column in descriptor output file)

  • filenames – Dict of "pipeline_name|tool_key" -> filename in output directory (with or without file extension; will look for .csv or .jsonl)

  • descriptor_tables – Optional dictionary of pre-loaded descriptor tables (tool_key -> pd.DataFrame).

Returns:

List of DescriptorValue objects to be saved to DB
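The "with or without file extension" lookup for filenames could work roughly as in the sketch below, which tries the name as given and then the .csv and .jsonl variants before reading one row per design. The helpers resolve_descriptor_file and read_rows, the lookup order, and the example file are hypothetical; the library's own resolution logic may differ.

```python
import csv
import json
import os
import tempfile


def resolve_descriptor_file(output_dir, filename):
    """Return the path of filename, trying .csv and .jsonl if no extension is given."""
    candidates = [filename, filename + ".csv", filename + ".jsonl"]
    for candidate in candidates:
        path = os.path.join(output_dir, candidate)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(f"No descriptor file for {filename!r} in {output_dir}")


def read_rows(path):
    """Read one dict per design from a CSV or JSONL file."""
    with open(path, newline="") as handle:
        if path.endswith(".jsonl"):
            return [json.loads(line) for line in handle if line.strip()]
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as output_dir:
    # Create a small JSONL file so the lookup has something to find.
    with open(os.path.join(output_dir, "metrics.jsonl"), "w") as handle:
        handle.write(json.dumps({"id": "design_1", "length": 120}) + "\n")
    rows = read_rows(resolve_descriptor_file(output_dir, "metrics"))
```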

ovo.core.logic.descriptor_logic.read_per_design_files(descriptor_job: ovo.core.database.models.DescriptorJob, design_id_mapping: dict[str, str | tuple], design_files: dict[str, tuple[str, str, str]], callback: Callable = None) -> list[ovo.core.database.models.DescriptorValue]

Process per-design output files such as predicted PDB structures, return DescriptorValues (containing the stored file path as DescriptorValue.value) to be inserted into DB.

Parameters:
  • descriptor_job – DescriptorJob object

  • design_id_mapping – Mapping from design.id to table_id (basename of PDB file = id column in descriptor output file)

  • design_files – Optional dict of filename produced by pipeline -> (subdir in storage, file suffix in storage, descriptor_key), for individual design files (e.g. per-design PDB output files or CSV files with more detailed descriptors, to be stored as storage path descriptors). For example, pssm/{}.csv (with {} replaced by the value from design_id_mapping) -> (descriptors, _pssm.csv, some|descriptor|key). The descriptor files will be stored under pool/{pool_id}/descriptors/{descriptor_job_id}/{design_id}{file_suffix}.

  • callback – Optional callback function for reporting progress, will be called with value between 0 and 1 and text description of the current step

Returns:

List of DescriptorValue objects to be saved to DB
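The design_files mapping and the progress callback described above can be sketched as follows. The sketch only computes source file names and storage paths from the documented patterns; it does not copy files or create DescriptorValue objects. The plan_design_files helper, the pool and job IDs, and the assumption that each mapping value is a single string (the real mapping also allows tuples) are hypothetical.

```python
def plan_design_files(design_id_mapping, design_files, pool_id, descriptor_job_id, callback=None):
    """Return (source file, storage path, descriptor_key) for every design and file pattern."""
    plan = []
    total = len(design_id_mapping) * len(design_files)
    for design_id, table_id in design_id_mapping.items():
        for pattern, (subdir, suffix, descriptor_key) in design_files.items():
            # Fill the {} placeholder with the design's table id.
            source = pattern.format(table_id)
            storage = f"pool/{pool_id}/{subdir}/{descriptor_job_id}/{design_id}{suffix}"
            plan.append((source, storage, descriptor_key))
            if callback is not None:
                # Report progress as a fraction between 0 and 1 plus a description.
                callback(len(plan) / total, f"Planned {source}")
    return plan


progress = []
plan = plan_design_files(
    design_id_mapping={"design_1": "design_1_model"},
    design_files={"pssm/{}.csv": ("descriptors", "_pssm.csv", "some|descriptor|key")},
    pool_id="pool_a",
    descriptor_job_id="job_1",
    callback=lambda value, text: progress.append((value, text)),
)
```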

ovo.core.logic.descriptor_logic._store_design_file_descriptor(source_path: str, storage_path: str, descriptor_key: str, descriptor_job_id: str, design_id: str, chains: list[str]) -> ovo.core.database.models.DescriptorValue

ovo.core.logic.descriptor_logic.save_descriptor_job_for_design_job(design_job: ovo.core.database.models.DesignJob, project_id: str, chains: list[str], design_ids: list[str]) -> ovo.core.database.models.DescriptorJob

ovo.core.logic.descriptor_logic.find_id_column(df: pandas.DataFrame, df_name: str)

ovo.core.logic.descriptor_logic.generate_descriptor_values_for_design(design_id: str, table_ids: str | tuple, descriptor_job_id: str | None, descriptor_tables, chains: list[str]) -> list[ovo.core.database.models.DescriptorValue]

Generate DescriptorValue objects for a given design based on descriptor tables.

Requirements (otherwise an error is raised, leading to interrupted processing of the job):

  • Each descriptor table must contain at least one recognized descriptor (defined by Descriptor objects)

  • One of the design’s table_ids must be found in each descriptor table (each design should be present)

  • Descriptor table index must be unique

Parameters:
  • design_id – ID of the design

  • table_ids – ID(s) corresponding to the design in the dataframes (can be a single string or a tuple of strings)

  • descriptor_job_id – ID of the DescriptorJob

  • descriptor_tables – Dictionary of descriptor tables (tool_key -> pd.DataFrame, indexed by table_id)

  • chains – List of chain IDs the descriptors apply to

Returns:

List of DescriptorValue objects
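The three requirements listed above can be illustrated with a small validation sketch. It uses a list of (table_id, row) pairs as a stand-in for a DataFrame indexed by table_id. The select_design_row helper, the exception types, and the example data are hypothetical; the reference only states that an error is raised, not which one.

```python
def select_design_row(design_id, table_ids, table, recognized_keys, tool_key):
    """Return the recognized descriptor values for one design from one table.

    `table` is a list of (table_id, row dict) pairs standing in for a DataFrame.
    """
    if isinstance(table_ids, str):
        table_ids = (table_ids,)
    # Requirement: the table index must be unique.
    index = [table_id for table_id, _ in table]
    if len(index) != len(set(index)):
        raise ValueError(f"Table {tool_key!r} has a non-unique index")
    # Requirement: the table must contain at least one recognized descriptor.
    columns = {column for _, row in table for column in row}
    if not columns & set(recognized_keys):
        raise ValueError(f"Table {tool_key!r} contains no recognized descriptor")
    # Requirement: one of the design's table ids must be present.
    rows = dict(table)
    matches = [table_id for table_id in table_ids if table_id in rows]
    if not matches:
        raise KeyError(f"Design {design_id!r} not found in table {tool_key!r}")
    row = rows[matches[0]]
    return {key: value for key, value in row.items() if key in recognized_keys}


table = [("design_1_model", {"length": 120, "note": "ok"}), ("design_2_model", {"length": 98})]
values = select_design_row("design_1", ("design_1", "design_1_model"), table, {"length"}, "tool")
```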

ovo.core.logic.descriptor_logic.update_and_process_descriptors(descriptor_jobs: List[ovo.core.database.models.DescriptorJob], error_callback: Callable)

Update descriptor jobs and process finished descriptor jobs.

ovo.core.logic.descriptor_logic.export_proteinqc_excel(design_ids: list[str], output_path: str = None)

ovo.core.logic.descriptor_logic.export_design_descriptors_excel(df: pandas.DataFrame, output_path=None) -> io.BytesIO | None

Export Excel file with design descriptors for the given design ids.

Parameters:
  • df – Dataframe with any values, descriptor columns should be descriptor keys

  • output_path – If given, write the Excel file to this path. If None, return the bytes of the Excel file.

ovo.core.logic.descriptor_logic.tool_supports_scheduler(tool: ovo.core.database.models_clustering.ProteinClusteringTool, scheduler: ovo.core.scheduler.base_scheduler.Scheduler) -> bool

ovo.core.logic.descriptor_logic.get_available_schedulers(tools: List[ovo.core.database.models_clustering.ProteinClusteringTool]) -> List[ovo.core.scheduler.base_scheduler.Scheduler]

Get available schedulers based on the clustering tools selected by the user.

ovo.core.logic.descriptor_logic.get_descriptor_metadata_table(descriptor_keys: Collection[str]) -> pandas.DataFrame

Get a table with metadata for the given descriptor keys.