ovo.core.utils.pdb

Module Contents

Classes

Functions

parse_pdb

Extract xyz coordinates for all heavy atoms

parse_pdb_lines

fix_contigs

get_pdb

detect_glycosylation_sites

Find glycosylation sites in the PDB file

calculate_coords_from_transformed_displacements

Calculate the coordinates of the glycan atoms based on the transformed displacements

add_glycan_to_pdb

Add glycan atoms to the PDB string

get_atom_coordinates

_align_sequences_and_get_indices

Aligns multiple protein sequences based on their amino acid sequences. Returns a tuple of two items: a list of aligned sequences and, for each sequence, a list of indices of the aligned residues.

align_multiple_proteins_pdb

Aligns multiple protein structures based on their atoms (CA or all).

get_aligned_structure_as_string

Returns the PDB representation of the given structure as a string.

pad_line

Helper function to pad line to 80 characters in case it is shorter

pdb_to_mmcif_iter

Adapted from pdb-tools pdb_tocif, with an added section that converts the B-factor to the pLDDT metric

pdb_to_mmcif

get_sequences_from_pdb_str

Get the sequences of a structure from PDB file contents.

get_remark_header

Get the REMARK header from the PDB file.

get_standardized_remarks_from_pdb_str

Parse “standardized” remarks from the PDB file

trim_pdb_str

filter_pdb_str

Filter a PDB string to only include specified segments.

Data

API

ovo.core.utils.pdb.num2aa

['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', '…

ovo.core.utils.pdb.aa2num

None

ovo.core.utils.pdb.aa2long

[(' N ', ' CA ', ' C ', ' O ', ' CB ', None, None, None, None, None, None, None, None, None, ' H …

ovo.core.utils.pdb.aa3to1

None

ovo.core.utils.pdb.transformed_displacements

‘array(…)’

ovo.core.utils.pdb.parse_pdb(filename, **kwargs)

Extract xyz coordinates for all heavy atoms

ovo.core.utils.pdb.parse_pdb_lines(lines, parse_hetatom=False, ignore_het_h=True)
ovo.core.utils.pdb.fix_contigs(contigs, parsed_pdb)
ovo.core.utils.pdb.get_pdb(pdb_code: str) → bytes
class ovo.core.utils.pdb.PDBSegmentSelector(segments: list[str])

Bases: Bio.PDB.Select

accept_model(model)
accept_chain(chain)
accept_residue(residue)
accept_atom(model)
ovo.core.utils.pdb.detect_glycosylation_sites(atom_ppdb: pandas.DataFrame, chains: list[str] | str | None = None, query_atoms: list[str] | None = None, cyclic: bool = False) → dict | None

Find glycosylation sites in the PDB file

Parameters:
  • atom_ppdb – pandas DataFrame with the ATOM records of the PDB file

  • chains – str or list of str with the chain IDs to search for glycosylation sites

  • query_atoms – list of str with the atom names to search for glycosylation sites

Returns:

glycosylation_dict: dictionary with the coordinates of the query glycosylated atoms

ovo.core.utils.pdb.calculate_coords_from_transformed_displacements(P1, P2) → numpy.ndarray

Calculate the coordinates of the glycan atoms based on the transformed displacements

It uses the pre-calculated transformed displacements to calculate the coordinates of the glycan atoms

Parameters:

P1, P2 – 3D coordinates in the PDB reference frame of the ND2 and CB atoms of the glycosylated residue

Returns:

3D coordinates in the PDB reference frame of the glycan atoms

ovo.core.utils.pdb.add_glycan_to_pdb(pdb_str: str) → tuple[str, list[str] | None]

Add glycan atoms to the PDB string

ovo.core.utils.pdb.get_atom_coordinates(structure: Bio.PDB.Structure.Structure, chain_id: str | None, residues: list[int] | None, all_atom: bool = False, model_index=0) → tuple[list[dict[str, numpy.ndarray]], list[Bio.PDB.Residue.Residue]]
ovo.core.utils.pdb._align_sequences_and_get_indices(seqs: list[str])

Aligns multiple protein sequences based on their amino acid sequences. Returns a tuple of two items: a list of aligned sequences and, for each sequence, a list of indices of the aligned residues.
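The idea of per-sequence index lists for aligned residues can be illustrated with a minimal pairwise sketch. It uses the standard library's difflib.SequenceMatcher as a stand-in, which is an assumption made for illustration and not the alignment method this module uses. The name matched_indices is hypothetical.

```python
from difflib import SequenceMatcher


def matched_indices(seq_a: str, seq_b: str) -> tuple[list[int], list[int]]:
    """Return the positions in seq_a and seq_b that pair up in identical blocks."""
    # autojunk=False keeps the heuristic from discarding frequent residues.
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    idx_a: list[int] = []
    idx_b: list[int] = []
    for block in matcher.get_matching_blocks():
        # The final block always has size 0 and contributes nothing.
        idx_a.extend(range(block.a, block.a + block.size))
        idx_b.extend(range(block.b, block.b + block.size))
    return idx_a, idx_b


# The second sequence lacks the "D" residue of the first one.
idx_a, idx_b = matched_indices("ACDEFG", "ACEFG")
```

Both index lists have the same length, so position i in one list is paired with position i in the other. This is the property a structural superposition needs when the input chains differ in length.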

ovo.core.utils.pdb.align_multiple_proteins_pdb(pdb_strs: list[str], chain_residue_mappings: list[list[tuple[str, list[int] | None]] | None], force_sequence_alignment: bool = False, all_atom: bool = False, verbose: bool = False) → tuple[list[str], float]

Aligns multiple protein structures based on their atoms (CA or all).

Parameters:
  • pdb_strs – list of PDB strings

  • chain_residue_mappings – list of lists of tuples with chain ID and residues to align, if None provided, then whole chain/structure is aligned

  • force_sequence_alignment – if True, always align based on sequence even if lengths match

  • all_atom – if True, align using all atoms from matched residues (not just CA atoms)

  • verbose – if True, print information about the alignment process
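The nested type of chain_residue_mappings is easier to read with a concrete value. The sketch below builds one under the assumption, based only on the type annotation and the parameter description above, that there is one entry per PDB string, that None means the whole structure, and that a None residue list means the whole chain. The helper describe and the type aliases are hypothetical and exist only for illustration.

```python
from typing import Optional

# Hypothetical aliases mirroring the annotation in the signature above.
ChainSelection = tuple[str, Optional[list[int]]]
StructureMapping = Optional[list[ChainSelection]]

# One entry per PDB string, in the same order as pdb_strs (assumed).
chain_residue_mappings: list[StructureMapping] = [
    [("A", [10, 11, 12, 13]), ("B", None)],  # residues 10-13 of chain A, all of chain B
    None,  # no selection: use the whole second structure
]


def describe(mapping: StructureMapping) -> str:
    """Summarize one mapping entry in words."""
    if mapping is None:
        return "whole structure"
    parts = []
    for chain_id, residues in mapping:
        if residues is None:
            parts.append(f"{chain_id}: whole chain")
        else:
            parts.append(f"{chain_id}: {len(residues)} residues")
    return "; ".join(parts)


summaries = [describe(mapping) for mapping in chain_residue_mappings]
```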

ovo.core.utils.pdb.get_aligned_structure_as_string(structure) → str

Returns the PDB representation of the given structure as a string.

ovo.core.utils.pdb.pad_line(line)

Helper function to pad line to 80 characters in case it is shorter
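A minimal sketch of this kind of padding is shown below. It is not the module's implementation: the name pad_to_80 is hypothetical, and stripping the trailing newline is an assumption, since the behaviour for newlines is not documented here.

```python
def pad_to_80(line: str) -> str:
    """Pad a PDB line with spaces to 80 columns; longer lines are left unchanged."""
    body = line.rstrip("\n")  # assumption: the newline is not counted or kept
    return body.ljust(80)


padded = pad_to_80("END\n")
```

Fixed-width padding matters because PDB is a column-based format, and parsers that slice by column position can fail on short lines.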

ovo.core.utils.pdb.pdb_to_mmcif_iter(pdb_data: str, structure_id: str, bfactor_to_plddt=False, fractional_plddt=False)

Adapted from pdb-tools pdb_tocif, with an added section that converts the B-factor to the pLDDT metric

Convert a structure in PDB format to mmCIF format.

This function is a generator.

Parameters:
  • pdb_data – string with PDB data

  • structure_id – entry ID

  • bfactor_to_plddt – if True, convert the B-factor to the AlphaFold pLDDT metric

  • fractional_plddt – if True, multiply pLDDT by 100 to get the 0-100 range; used for ESMFold PDB files, which store values in the 0-1 range

Yields:

str, the structure in mmCIF format, line by line
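The B-factor to pLDDT step can be sketched on a single ATOM record. This is an illustration under stated assumptions, not the code of pdb_to_mmcif_iter: the helper names format_atom_line and plddt_from_atom_line are hypothetical, and the only facts relied on are that the B-factor occupies columns 61-66 of a PDB ATOM record and that fractional values are scaled by 100.

```python
def format_atom_line(name: str, res_name: str, chain: str, res_seq: int, b_factor: float) -> str:
    """Build a fixed-width PDB ATOM record with dummy coordinates (illustration only)."""
    return (
        f"ATOM  {1:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{b_factor:>6.2f}"
        f"          {'C':>2}"
    )


def plddt_from_atom_line(line: str, fractional_plddt: bool = False) -> float:
    """Read the B-factor (columns 61-66) and return it on the 0-100 pLDDT scale."""
    b_factor = float(line[60:66])
    # ESMFold-style files store pLDDT in the 0-1 range, so scale those by 100.
    return b_factor * 100.0 if fractional_plddt else b_factor


esmfold_line = format_atom_line(" CA ", "ALA", "A", 1, 0.87)
alphafold_line = format_atom_line(" CA ", "ALA", "A", 1, 91.25)
esmfold_plddt = plddt_from_atom_line(esmfold_line, fractional_plddt=True)
alphafold_plddt = plddt_from_atom_line(alphafold_line)
```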

ovo.core.utils.pdb.pdb_to_mmcif(pdb_data: str, structure_id: str, bfactor_to_plddt=False, fractional_plddt=False)
exception ovo.core.utils.pdb.ChainNotFoundError

Bases: Exception

ovo.core.utils.pdb.get_sequences_from_pdb_str(pdb_str: str, chains: list[str] = None, by_residue_number: bool = False) → dict[str, str] | dict[str, dict[str, str]]

Get the sequences of a structure from PDB file contents.

Parameters:
  • pdb_str – str, PDB file contents as string

  • chains – list of str, chain IDs to extract sequences from, if None, all chains are extracted

  • by_residue_number – if True, return a dict with residue numbers (strings) as keys and amino acids as values

Chain breaks (for example a jump from 123 to 134) are NOT filled with X but ignored.
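The chain-break behaviour can be illustrated with a small standalone sketch. It is not this module's implementation: the names ca_line and sequences_from_pdb_str are hypothetical, and reading one residue per CA atom, keying the result by chain ID, and mapping unknown residue names to X are assumptions made for the example.

```python
AA3TO1 = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def ca_line(chain: str, res_seq: int, res_name: str) -> str:
    """Build the first 26 columns of a CA ATOM record (illustration only)."""
    return "ATOM  " + f"{1:>5}" + " " + " CA " + " " + res_name + " " + chain + f"{res_seq:>4}"


def sequences_from_pdb_str(pdb_str: str, by_residue_number: bool = False):
    """Collect one letter per CA atom; gaps in numbering are skipped, not filled."""
    residues_by_chain: dict[str, dict[str, str]] = {}
    for line in pdb_str.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        res_num = line[22:26].strip()
        # Assumption: residue names outside the 20 standard ones become "X".
        residues_by_chain.setdefault(chain, {})[res_num] = AA3TO1.get(line[17:20], "X")
    if by_residue_number:
        return residues_by_chain
    return {chain: "".join(res.values()) for chain, res in residues_by_chain.items()}


# Residue numbering jumps from 123 to 134, which is a chain break.
pdb_str = "\n".join([ca_line("A", 122, "GLY"), ca_line("A", 123, "ALA"), ca_line("A", 134, "LEU")])
plain = sequences_from_pdb_str(pdb_str)
numbered = sequences_from_pdb_str(pdb_str, by_residue_number=True)
```

Because the gap is ignored, the plain sequence has three letters. The numbered form keeps the original residue numbers, which makes the break visible to the caller.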

ovo.core.utils.pdb.get_remark_header(pdb_path: str) → tuple[str, list[str]]

Get the REMARK header from the PDB file.

Parameters:

pdb_path – str, path to the PDB file

Returns:

str, REMARK header

ovo.core.utils.pdb.REMARK_KEYS

['Input contig', 'Standardized contig', 'Chains', 'Input hotspots', 'Standardized hotspots']

ovo.core.utils.pdb.get_standardized_remarks_from_pdb_str(pdb_str: str) → dict[str, str]

Parse “standardized” remarks from the PDB file

Parameters:

pdb_str – PDB file contents as string

Returns:

dict, parsed remarks {"Input contig": "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 C10-20", …}

A value that is split across several REMARK lines with the same key is concatenated.

Example header to read:

    REMARK 1 Input contig: "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 "
    REMARK 1 Input contig: "C10-20"
    REMARK 1 Standardized contig: "A45-46/13-13/A45-46/0 B24-26/5-5/B24-26/"
    REMARK 1 Standardized contig: "0 C10-20/0"
    REMARK 1 Chains: "A B C"
    REMARK 1 Input hotspots:
    REMARK 1 Standardized hotspots:
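A minimal sketch of parsing such a header is shown below. It is an illustration, not the code of get_standardized_remarks_from_pdb_str. The function name parse_standardized_remarks and the regular expression are hypothetical. It also assumes that values are wrapped in straight double quotes and that a key without a value maps to an empty string.

```python
import re

REMARK_KEYS = ["Input contig", "Standardized contig", "Chains", "Input hotspots", "Standardized hotspots"]
# Matches e.g.  REMARK 1 Chains: "A B C"  and also a key with no quoted value.
REMARK_RE = re.compile(r'^REMARK\s+1\s+([^:]+):\s*(?:"(.*)")?\s*$')


def parse_standardized_remarks(pdb_str: str) -> dict[str, str]:
    """Collect known REMARK 1 keys, concatenating values split over several lines."""
    remarks: dict[str, str] = {}
    for line in pdb_str.splitlines():
        match = REMARK_RE.match(line)
        if match is None or match.group(1) not in REMARK_KEYS:
            continue
        key, value = match.group(1), match.group(2) or ""
        remarks[key] = remarks.get(key, "") + value
    return remarks


header = "\n".join([
    'REMARK 1 Input contig: "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 "',
    'REMARK 1 Input contig: "C10-20"',
    'REMARK 1 Standardized contig: "A45-46/13-13/A45-46/0 B24-26/5-5/B24-26/"',
    'REMARK 1 Standardized contig: "0 C10-20/0"',
    'REMARK 1 Chains: "A B C"',
    "REMARK 1 Input hotspots:",
    "REMARK 1 Standardized hotspots:",
])
remarks = parse_standardized_remarks(header)
```

The quotes preserve a trailing space at the end of a wrapped line. Plain concatenation therefore restores the original contig string without guessing where separators belong.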

ovo.core.utils.pdb.trim_pdb_str(pdb_input_string: str, target_chain: str, start_res: int, end_res: int) → str
ovo.core.utils.pdb.filter_pdb_str(pdb_input_string: str, segments: list[str], add_ter=False) → str

Filter a PDB string to only include specified segments.

Parameters:
  • pdb_input_string – str, input PDB string

  • segments – list of str, segments to include, e.g. ["A5", "A10-20", "B30-40", "C"]

  • add_ter – if True, order ATOM records following the order of the given segments and add TER and END records in between

Returns:

str, filtered PDB string
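The segment syntax from the example above ("A5", "A10-20", "C") can be sketched as a small parser. This is an illustration, not the code of filter_pdb_str: parse_segment and is_selected are hypothetical names, and the sketch assumes single-letter chain IDs and non-negative residue numbers.

```python
import re
from typing import Optional

# Chain letter, optionally followed by one residue number or a start-end range.
SEGMENT_RE = re.compile(r"^([A-Za-z])(?:(\d+)(?:-(\d+))?)?$")


def parse_segment(segment: str) -> tuple[str, Optional[int], Optional[int]]:
    """Split 'A10-20' into ('A', 10, 20); a bare chain gives (chain, None, None)."""
    match = SEGMENT_RE.match(segment)
    if match is None:
        raise ValueError(f"Unrecognized segment: {segment!r}")
    chain, start, end = match.groups()
    if start is None:
        return chain, None, None  # whole chain
    first = int(start)
    return chain, first, int(end) if end is not None else first


def is_selected(chain: str, res_num: int, segments: list[str]) -> bool:
    """Return True if the residue falls inside any of the segments."""
    for segment in segments:
        seg_chain, start, end = parse_segment(segment)
        if chain == seg_chain and (start is None or start <= res_num <= end):
            return True
    return False


segments = ["A5", "A10-20", "B30-40", "C"]
parsed = [parse_segment(segment) for segment in segments]
```

With these segments, residue 15 of chain A and any residue of chain C are selected, while residue 6 of chain A is not.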