ovo.core.utils.pdb

Module Contents

Classes

Functions

fix_contigs

get_pdb

detect_glycosylation_sites

Find glycosylation sites in the PDB file

calculate_coords_from_transformed_displacements

Calculate the coordinates of the glycan atoms based on the transformed displacements

add_glycan_to_pdb

Add glycan atoms to the PDB string

get_atom_coordinates

_align_sequences_and_get_indices

Aligns multiple protein sequences. Returns a tuple containing a list of aligned sequences and, for each sequence, a list of indices that correspond to the aligned residues.

align_multiple_proteins_pdb

Aligns multiple protein structures based on their atoms (CA or all).

get_aligned_structure_as_string

Returns the PDB representation of the given structure as a string.

pad_line

Helper function to pad line to 80 characters in case it is shorter

pdb_to_mmcif_iter

Adapted from pdb-tools pdb_tocif, with an added section that converts the B-factor to the pLDDT metric

pdb_to_mmcif

get_sequences_from_pdb_str

Get the sequence of a structure from the pdb file.

get_remark_header

Get the REMARK header from the PDB file.

get_standardized_remarks_from_pdb_str

Parse “standardized” remarks from the PDB file

trim_pdb_str

filter_pdb_str

Filter a PDB string to only include specified segments.

check_rfdiffusion_input

Make sure that the structure does not contain chain breaks without a gap in residue numbering (required in the refolding step)

Data

API

ovo.core.utils.pdb.aa3to1

None

ovo.core.utils.pdb.transformed_displacements

‘array(…)’

ovo.core.utils.pdb.fix_contigs(contigs, parsed_pdb)
ovo.core.utils.pdb.get_pdb(pdb_code: str) → bytes
class ovo.core.utils.pdb.PDBSegmentSelector(segments: list[str])

Bases: Bio.PDB.Select

accept_model(model)
accept_chain(chain)
accept_residue(residue)
accept_atom(model)
ovo.core.utils.pdb.detect_glycosylation_sites(atom_ppdb: pandas.DataFrame, chains: list[str] | str | None = None, query_atoms: list[str] | None = None, cyclic: bool = False) → dict | None

Find glycosylation sites in the PDB file

Parameters:
  • atom_ppdb – pandas DataFrame with the ATOM records of the PDB file

  • chains – str or list of str with the chain IDs to search for glycosylation sites

  • query_atoms – list of str with the atom names to search for glycosylation sites

Returns:

glycosylation_dict: dictionary with the coordinates of the query glycosylated atoms

ovo.core.utils.pdb.calculate_coords_from_transformed_displacements(P1, P2) → numpy.ndarray

Calculate the coordinates of the glycan atoms based on the transformed displacements

It uses the pre-calculated transformed displacements to calculate the coordinates of the glycan atoms

Parameters:

P1, P2 – 3D coordinates in the PDB reference frame of the ND2 and CB atoms of the glycosylated residue

Returns:

3D coordinates in the PDB reference frame of the glycan atoms

ovo.core.utils.pdb.add_glycan_to_pdb(pdb_str: str) → tuple[str, list[str] | None]

Add glycan atoms to the PDB string

ovo.core.utils.pdb.get_atom_coordinates(structure: Bio.PDB.Structure.Structure, chain_id: str | None, residues: list[int] | None, all_atom: bool = False, model_index=0) → tuple[list[dict[str, numpy.ndarray]], list[Bio.PDB.Residue.Residue]]
ovo.core.utils.pdb._align_sequences_and_get_indices(seqs: list[str])

Aligns multiple protein sequences. Returns a tuple containing a list of aligned sequences and, for each sequence, a list of indices that correspond to the aligned residues.

ovo.core.utils.pdb.align_multiple_proteins_pdb(pdb_strs: list[str], chain_residue_mappings: list[list[tuple[str, list[int] | None]] | None], force_sequence_alignment: bool = False, all_atom: bool = False, verbose: bool = False) → tuple[list[str], float]

Aligns multiple protein structures based on their atoms (CA or all).

Parameters:
  • pdb_strs – list of PDB strings

  • chain_residue_mappings – list of lists of tuples with chain ID and residues to align, if None provided, then whole chain/structure is aligned

  • force_sequence_alignment – if True, always align based on sequence even if lengths match

  • all_atom – if True, align using all atoms from matched residues (not just CA atoms)

  • verbose – if True, print information about the alignment process

ovo.core.utils.pdb.get_aligned_structure_as_string(structure) → str

Returns the PDB representation of the given structure as a string.

ovo.core.utils.pdb.pad_line(line)

Helper function to pad line to 80 characters in case it is shorter
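The padding described above can be sketched in a few lines. This is only an illustration: `pad_line_sketch` is a hypothetical stand-in, and keeping the trailing newline outside the 80-character width is an assumption, not confirmed behaviour of `pad_line`.

```python
def pad_line_sketch(line: str, width: int = 80) -> str:
    """Pad a PDB line with spaces up to `width` characters (hypothetical helper)."""
    # Separate the newline so that it does not count towards the fixed width.
    body = line.rstrip("\n")
    newline = "\n" if line.endswith("\n") else ""
    # str.ljust leaves lines that are already `width` characters or longer unchanged.
    return body.ljust(width) + newline


padded = pad_line_sketch("END\n")
print(len(padded.rstrip("\n")))  # 80
```

Fixed-width padding matters because downstream column-based parsers slice PDB records by position.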

ovo.core.utils.pdb.pdb_to_mmcif_iter(pdb_data: str, structure_id: str, bfactor_to_plddt=False, fractional_plddt=False)

Adapted from pdb-tools pdb_tocif, with an added section that converts the B-factor to the pLDDT metric

Convert a structure in PDB format to mmCIF format.

This function is a generator.

Parameters:
  • pdb_data – string with PDB data

  • structure_id – entry ID

  • bfactor_to_plddt – convert the B-factor to the AlphaFold pLDDT metric

  • fractional_plddt – multiply pLDDT by 100 to get the 0-100 range; used for ESMFold PDB files, which store values in the 0-1 range

Yields:

str, the structure in mmCIF format, line by line
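To illustrate what the `fractional_plddt` option implies, here is a minimal sketch that reads the B-factor field of a single ATOM record and rescales it. The function name `plddt_from_atom_line` is hypothetical and is not part of this module; the only fact relied on is that the temperature factor occupies columns 61-66 of a PDB ATOM record.

```python
def plddt_from_atom_line(atom_line: str, fractional_plddt: bool = False) -> float:
    """Read the B-factor field of an ATOM line and interpret it as pLDDT (0-100)."""
    # Columns 61-66 (1-based) of a PDB ATOM record hold the temperature factor.
    value = float(atom_line[60:66])
    # Files that store pLDDT as a 0-1 fraction need rescaling to 0-100.
    return value * 100 if fractional_plddt else value


# Build a fixed-width ATOM record whose B-factor field holds 0.75.
example_line = (
    f"{'ATOM':<6}{1:>5} {' CA ':4}{'':1}{'ALA':>3} {'A':1}{1:>4}{'':4}"
    f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}{1.0:6.2f}{0.75:6.2f}"
)
print(plddt_from_atom_line(example_line, fractional_plddt=True))  # 75.0
```

Because `pdb_to_mmcif_iter` is a generator, its output would typically be consumed in a loop or joined into a single string by the caller.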

ovo.core.utils.pdb.pdb_to_mmcif(pdb_data: str, structure_id: str, bfactor_to_plddt=False, fractional_plddt=False)
exception ovo.core.utils.pdb.ChainNotFoundError

Bases: Exception

ovo.core.utils.pdb.get_sequences_from_pdb_str(pdb_str: str, chains: list[str] = None, by_residue_number: bool = False) → dict[str, str] | dict[str, dict[str, str]]

Get the sequence of a structure from the pdb file.

Parameters:
  • pdb_str – str, PDB file contents as string

  • chains – list of str, chain IDs to extract sequences from, if None, all chains are extracted

  • by_residue_number – if True, return a dict with residue numbers (strings) as keys and amino acids as values

Sequence is extracted based on CA atoms. If a residue has CA atom, it is included in the sequence, either as the amino acid letter (if one of 20 standard amino acids) or as X (if non-standard amino acid). If a residue does not have CA atom (e.g. a ligand), it is ignored.

Chain breaks (for example a jump from 123 to 134) are NOT filled with X but ignored.

Assumes that residue IDs within each chain are sorted in the PDB file. (BioPython also returns residues in their original file order.)

Only the first model in the PDB file is parsed (everything after ENDMDL is ignored).
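The extraction rules above can be sketched with plain string slicing. This is an illustrative approximation, not the module's implementation: `sequences_from_ca_sketch`, `AA3TO1` and `ca_line` are hypothetical names, and matching the atom name field exactly against " CA " (to avoid mistaking calcium ions for alpha carbons) is an assumption.

```python
AA3TO1 = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_ca_sketch(pdb_str: str) -> dict[str, str]:
    """Collect one letter per residue that has a CA atom, grouped by chain."""
    sequences: dict[str, str] = {}
    seen: set[tuple[str, str]] = set()
    for line in pdb_str.splitlines():
        if line.startswith("ENDMDL"):
            break  # only the first model is read
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[12:16] != " CA ":
            continue  # residues without an alpha carbon (e.g. ligands) are skipped
        chain, res_id = line[21], line[22:27]  # residue number plus insertion code
        if (chain, res_id) in seen:
            continue  # alternate locations of an already counted residue
        seen.add((chain, res_id))
        # Non-standard residues become X; numbering gaps are simply not filled.
        sequences[chain] = sequences.get(chain, "") + AA3TO1.get(line[17:20], "X")
    return sequences


def ca_line(chain: str, resname: str, resnum: int) -> str:
    """Build a fixed-width CA ATOM record for the example below."""
    return (
        f"{'ATOM':<6}{1:>5} {' CA ':4}{'':1}{resname:>3} {chain:1}{resnum:>4}{'':4}"
        + f"{0.0:8.3f}" * 3
    )


example = "\n".join([
    ca_line("A", "ALA", 1), ca_line("A", "GLY", 2), ca_line("A", "MSE", 3),
    ca_line("A", "SER", 10), "ENDMDL", ca_line("A", "TRP", 11),
])
print(sequences_from_ca_sketch(example))  # {'A': 'AGXS'}
```

In the example, the jump from residue 3 to residue 10 leaves no X placeholders, and the record after ENDMDL is ignored.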

ovo.core.utils.pdb.get_remark_header(pdb_path: str) → tuple[str, list[str]]

Get the REMARK header from the PDB file.

Parameters:

pdb_path – str, path to the PDB file

Returns:

str, REMARK header

ovo.core.utils.pdb.REMARK_KEYS

['Input contig', 'Standardized contig', 'Chains', 'Input hotspots', 'Standardized hotspots']

ovo.core.utils.pdb.get_standardized_remarks_from_pdb_str(pdb_str: str) → dict[str, str]

Parse “standardized” remarks from the PDB file

Parameters:

pdb_str – PDB file contents as string

Returns:

dict, parsed remarks, e.g. {"Input contig": "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 C10-20", …}

Example header to read:

REMARK 1 Input contig: "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 "
REMARK 1 Input contig: "C10-20"
REMARK 1 Standardized contig: "A45-46/13-13/A45-46/0 B24-26/5-5/B24-26/"
REMARK 1 Standardized contig: "0 C10-20/0"
REMARK 1 Chains: "A B C"
REMARK 1 Input hotspots:
REMARK 1 Standardized hotspots:
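The example header suggests that a value split across several REMARK lines is joined back together under its key. A minimal sketch of that idea follows; `parse_remarks_sketch` is a hypothetical name, and both the straight double quotes and the choice to skip keys without a quoted value are assumptions rather than documented behaviour.

```python
import re

# One REMARK line: record name, remark number, key, then a double-quoted value.
REMARK_PATTERN = re.compile(r'^REMARK\s+1\s+([^:]+):\s*"(.*)"\s*$')


def parse_remarks_sketch(pdb_str: str) -> dict[str, str]:
    """Join quoted REMARK values that share a key, in order of appearance."""
    remarks: dict[str, str] = {}
    for line in pdb_str.splitlines():
        match = REMARK_PATTERN.match(line)
        if match is None:
            continue  # not a quoted remark (assumption: such lines are skipped)
        key, value = match.groups()
        # Repeated keys are continuation lines, so their values are concatenated.
        remarks[key] = remarks.get(key, "") + value
    return remarks


header = "\n".join([
    'REMARK 1 Input contig: "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 "',
    'REMARK 1 Input contig: "C10-20"',
    'REMARK 1 Chains: "A B C"',
    "REMARK 1 Input hotspots:",
])
print(parse_remarks_sketch(header)["Input contig"])
```

With this input, the two "Input contig" lines combine into the single contig string shown in the return value description.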

ovo.core.utils.pdb.trim_pdb_str(pdb_input_string: str, target_chain: str, start_res: int, end_res: int) → str
ovo.core.utils.pdb.filter_pdb_str(pdb_input_string: str, segments: list[str], add_ter=False) → str

Filter a PDB string to only include specified segments.

Parameters:
  • pdb_input_string – str, input PDB string

  • segments – list of str, segments to include, e.g. ["A5", "A10-20", "B30-40", "C"]

  • add_ter – if True, order ATOM records in the PDB according to the order of the selected segments and add TER and END records in between

Returns:

str, filtered PDB string
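The segment notation can be read as a chain ID followed by an optional residue number or range. The sketch below shows one way to interpret it; `parse_segment_sketch`, `filter_atoms_sketch` and `atom_line` are hypothetical names, single-letter chain IDs are an assumption, and the `add_ter` behaviour is not reproduced.

```python
import re
from typing import Optional


def parse_segment_sketch(segment: str) -> tuple[str, Optional[int], Optional[int]]:
    """Split 'A5', 'A10-20' or 'C' into (chain, first_residue, last_residue)."""
    match = re.fullmatch(r"([A-Za-z])(?:(\d+)(?:-(\d+))?)?", segment)
    if match is None:
        raise ValueError(f"Unrecognised segment: {segment!r}")
    chain, start, end = match.groups()
    if start is None:
        return chain, None, None  # bare chain ID selects the whole chain
    return chain, int(start), int(end or start)


def filter_atoms_sketch(pdb_str: str, segments: list[str]) -> str:
    """Keep only ATOM/HETATM lines whose chain and residue number match a segment."""
    parsed = [parse_segment_sketch(s) for s in segments]
    kept = []
    for line in pdb_str.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain, resnum = line[21], int(line[22:26])
        for seg_chain, start, end in parsed:
            if chain == seg_chain and (start is None or start <= resnum <= end):
                kept.append(line)
                break
    return "\n".join(kept)


def atom_line(chain: str, resnum: int) -> str:
    """Build the fixed-width prefix of an ATOM record for the example below."""
    return f"{'ATOM':<6}{1:>5} {' CA ':4}{'':1}{'GLY':>3} {chain:1}{resnum:>4}{'':4}"


pdb = "\n".join(atom_line(c, n) for c, n in [("A", 4), ("A", 5), ("B", 7)])
filtered = filter_atoms_sketch(pdb, ["A5", "B"])
print(len(filtered.splitlines()))  # 2
```

Here residue A4 is dropped, A5 matches the single-residue segment, and B7 is kept because "B" selects the whole chain.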

ovo.core.utils.pdb.check_rfdiffusion_input(pdb_input_string: str, target_chain: str, start_res_trimmed_chain: int, end_res_trimmed_chain: int, max_amide_distance: float = 3.0)

Make sure that the structure does not contain chain breaks without a gap in residue numbering (required in the refolding step)

Also make sure that no insertion codes are present.

Raises ValueError if chain breaks are found.
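The check described above can be sketched as a distance test between the backbone C atom of one residue and the N atom of the next whenever their numbers are consecutive. This is an assumption about how `max_amide_distance` is applied; `check_chain_breaks_sketch` and its input layout are hypothetical, and the insertion code check is not covered.

```python
import math

# Each residue is (residue_number, N coordinates, C coordinates); hypothetical layout.
Residue = tuple[int, tuple[float, float, float], tuple[float, float, float]]


def check_chain_breaks_sketch(residues: list[Residue], max_amide_distance: float = 3.0) -> None:
    """Raise ValueError if consecutively numbered residues are not bonded."""
    for (num_a, _, c_xyz), (num_b, n_xyz, _) in zip(residues, residues[1:]):
        if num_b - num_a != 1:
            continue  # a numbering gap already marks the break explicitly
        # A peptide bond is about 1.33 angstroms, so a much larger C-N distance
        # between residues i and i+1 indicates an unnumbered chain break.
        distance = math.dist(c_xyz, n_xyz)
        if distance > max_amide_distance:
            raise ValueError(
                f"Chain break between residues {num_a} and {num_b}: "
                f"C-N distance {distance:.2f} exceeds {max_amide_distance}"
            )


bonded = [(1, (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)), (2, (2.8, 0.0, 0.0), (4.3, 0.0, 0.0))]
check_chain_breaks_sketch(bonded)  # passes silently

broken = [(1, (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)), (2, (9.0, 0.0, 0.0), (10.5, 0.0, 0.0))]
try:
    check_chain_breaks_sketch(broken)
except ValueError as error:
    print(error)
```

Residues separated by a numbering gap are skipped on purpose, since the documented concern is a break that the numbering does not reveal.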