OVO Database¶
Data model¶
In the OVO data model, a Design corresponds to one designed molecule defined by its sequence.
Designs are organized into Pools, where each Pool is created by a single submitted workflow
or uploaded by the user.
Submission parameters and job status metadata are stored in a separate DesignJob model
using a workflow field storing a JSON-serialized instance of a DesignWorkflow dataclass.
The DesignWorkflow dataclasses define workflow parameters and encapsulate use-case-specific logic.
New use-cases can be added by OVO plugins that implement their own DesignWorkflow subclasses.
Each Pool is identified by a randomly generated three-letter code (such as qki) that is
unique within the current database instance. This ID is used as a prefix for the ID of each Design
along with the backbone and sequence identifier (such as ovo_qki_0045_cycle01), ensuring that each
design can be uniquely tracked to its job of origin (and the corresponding workflow parameters).
Pools can be grouped into Rounds, corresponding to the iterative design-make-test-analyze loop,
and further into Projects, enabling sustainable organization in a multi-user setting.
Each design can be annotated with descriptors using the DescriptorValue model, which
is a key-value “tall” representation where all descriptor values are stored in a single
value column and associated with the descriptor using a descriptor_key column
(as opposed to a “wide” representation which would use a separate column for each descriptor).
Each DescriptorValue row is a single value associated with a single descriptor_key,
Design and DescriptorJob, storing workflow parameters in a DescriptorWorkflow dataclass, analogous to design jobs.
Different types of Descriptors (such as AlphaFold2 pLDDT, Rosetta ddG, or total positive patch area)
are uniquely defined by their descriptor_key and provide additional metadata such as their
human-readable name and description. The Descriptor type objects are not defined in the database
but inside OVO codebase and extended by plugins. Protein structures and other files are not stored
directly in the database but referenced by their storage path as managed by the Storage class.
The structure path string can be stored as a field of the Design model or as a special type of Descriptor.
Interacting with the database¶
Database rows can be retrieved from the database using the db object using select and get methods:
from ovo import db
# List all projects
db.Project.select()
# [Project(author='username', created_date_utc=datetime.datetime(2025, 11, 13, 9, 58, 36, 339174), id='b8e657bb-b0b0-423e-9235-383a6d8f74e5', name='OVO Publication Examples 1', public=True)]
# Get project by name
project = db.Project.get(name="OVO Publication Examples 1")
print(project.id)
# b8e657bb-b0b0-423e-9235-383a6d8f74e5
# Get Pool object
pool = db.Pool.get(id="qki")
print(pool.id)
print(pool.name)
# qki
# My first design pool
# Get list of Design objects in the pool
all_designs = db.Design.select(pool_id="qki")
print(all_designs[0].id)
# 'ovo_qki_0001_cycle01'
# Get only accepted designs
accepted_designs = db.Design.select(pool_id="qki", accepted=True)
print(accepted_designs[0].id)
# 'ovo_qki_0125_cycle02'
Use the project_logic module to retrieve projects, rounds and pools:
from ovo import db, project_logic
project, round = project_logic.get_or_create_project_round(
"OVO Publication Examples 1",
"Round 1",
)
print(round.id)
Use the design_logic module for convenience functions to retrieve designs and pools:
from ovo import db, design_logic
project = db.Project.get(name="OVO Publication Examples 1")
pools = design_logic.get_pools_table(project_id=project.id)
pools.head()
# Round ID Name ...
# 0 Binder design bbc 4ZXB 1000 designs default weights ligandmpnn ...
# 1 Binder design qki Top designs diversification ...
# 2 Motif scaffolding jov 1A41 6*100*8 designs default weights ...
# 3 Interface scaffolding zuk 5IUS 500*5 designs PD1 interface ...
# 4 Binder design mmo 4ZXB 1000 designs beta sheet ...
Use the descriptor_logic module for convenience functions to retrieve descriptor values:
from ovo import db, descriptor_logic
# Get designs as a pandas DataFrame
designs = db.Design.select_dataframe(pool_id="qki", accepted=True)
designs.head()
# pool_id structure_path ...
# id ...
# ovo_qki_0045_cycle01 qki project/fd3f1fcc-1dda-4076-ad4e-7b885ec90032/p... ...
# ovo_qki_0045_cycle02 qki project/fd3f1fcc-1dda-4076-ad4e-7b885ec90032/p... ...
# ovo_qki_0045_cycle04 qki project/fd3f1fcc-1dda-4076-ad4e-7b885ec90032/p... ...
# ovo_qki_0126_cycle01 qki project/fd3f1fcc-1dda-4076-ad4e-7b885ec90032/p... ...
# ovo_qki_0193_cycle01 qki project/fd3f1fcc-1dda-4076-ad4e-7b885ec90032/p... ...
# Get wide descriptor table for the designs
values = descriptor_logic.get_wide_descriptor_table(design_ids=designs.index)
values = designs.join(values) # join with metadata from designs table
values.head()
# pool_id ... Radius of gyration Rosetta ddG
# id ...
# ovo_qki_0045_cycle01 qki ... 75.842162 -47.569473
# ovo_qki_0045_cycle02 qki ... 75.842162 -46.606552
# ovo_qki_0045_cycle04 qki ... 75.842162 -48.614517
# ovo_qki_0126_cycle01 qki ... 65.733242 -33.502827
# ovo_qki_0193_cycle01 qki ... 51.979126 -20.721975
# Show available descriptors
print(values.columns.tolist())
# ['pool_id', 'structure_path', 'structure_descriptor_key', 'accepted', 'spec', 'contig_index', 'Sequence A', 'AF2 iPAE', ...]
# Paths to PDB files are also stored as descriptors:
paths = values[[
"RFdiffusion backbone design",
"ProteinMPNN FastRelax sequence design",
"AlphaFold2 Initial Guess prediction"
]]
paths.head()
# RFdiffusion backbone design ...
# id
# ovo_qki_01_01_cycle01 project/b8e657bb-b0b0-423e-9235-383a6d8f74e5/p ...
# ovo_qki_01_11_cycle02 project/b8e657bb-b0b0-423e-9235-383a6d8f74e5/p ...
# ovo_qki_01_14_cycle01 project/b8e657bb-b0b0-423e-9235-383a6d8f74e5/p ...
# ovo_qki_01_15_cycle02 project/b8e657bb-b0b0-423e-9235-383a6d8f74e5/p ...
# ovo_qki_01_21_cycle03 project/b8e657bb-b0b0-423e-9235-383a6d8f74e5/p ...
# ovo_qki_01_23_cycle02 project/b8e657bb-b0b0-423e-9235-383a6d8f74e5/p ...
See Jupyter notebooks examples for more detailed examples of interacting with the database.