ovo.core.utils.pdb¶
Module Contents¶
Classes¶
Functions¶
parse_pdb | Extract XYZ coordinates for all heavy atoms.
detect_glycosylation_sites | Find glycosylation sites in the PDB file
calculate_coords_from_transformed_displacements | Calculate the coordinates of the glycan atoms based on the transformed displacements
add_glycan_to_pdb | Add glycan atoms to the PDB string
_align_sequences_and_get_indices | Aligns multiple protein sequences based on their amino acid sequences. Returns a tuple: a list of aligned sequences and a list of indices for each sequence that correspond to the aligned residues.
align_multiple_proteins_pdb | Aligns multiple protein structures based on their atoms (CA or all).
get_aligned_structure_as_string | Returns the PDB representation of the given structure as a string.
pad_line | Helper function to pad line to 80 characters in case it is shorter
pdb_to_mmcif_iter | Adapted from pdb-tools pdb_tocif, with an added section that converts the B-factor to the pLDDT metric.
get_sequences_from_pdb_str | Get the sequence of a structure from the pdb file.
get_remark_header | Get the REMARK header from the PDB file.
get_standardized_remarks_from_pdb_str | Parse “standardized” remarks from the PDB file
filter_pdb_str | Filter a PDB string to only include specified segments.
Data¶
API¶
- ovo.core.utils.pdb.num2aa¶
[‘ALA’, ‘ARG’, ‘ASN’, ‘ASP’, ‘CYS’, ‘GLN’, ‘GLU’, ‘GLY’, ‘HIS’, ‘ILE’, ‘LEU’, ‘LYS’, ‘MET’, ‘PHE’, ‘…
- ovo.core.utils.pdb.aa2num¶
None
- ovo.core.utils.pdb.aa2long¶
[(' N ', ' CA ', ' C ', ' O ', ' CB ', None, None, None, None, None, None, None, None, None, ' H …
- ovo.core.utils.pdb.aa3to1¶
None
- ovo.core.utils.pdb.transformed_displacements¶
‘array(…)’
- ovo.core.utils.pdb.parse_pdb(filename, **kwargs)¶
Extract XYZ coordinates for all heavy atoms.
- ovo.core.utils.pdb.parse_pdb_lines(lines, parse_hetatom=False, ignore_het_h=True)¶
- ovo.core.utils.pdb.fix_contigs(contigs, parsed_pdb)¶
- ovo.core.utils.pdb.get_pdb(pdb_code: str) → bytes¶
- class ovo.core.utils.pdb.PDBSegmentSelector(segments: list[str])¶
Bases:
Bio.PDB.Select
- accept_model(model)¶
- accept_chain(chain)¶
- accept_residue(residue)¶
- accept_atom(model)¶
- ovo.core.utils.pdb.detect_glycosylation_sites(atom_ppdb: pandas.DataFrame, chains: list[str] | str | None = None, query_atoms: list[str] | None = None, cyclic: bool = False) → dict | None¶
Find glycosylation sites in the PDB file
- Parameters:
atom_ppdb – Pandas DataFrame with the ATOM records of the PDB file
chains – str or list of str with the chain IDs to search for glycosylation sites
query_atoms – list of str with the atom names to search for glycosylation sites
- Returns:
glycosylation_dict: dictionary with the coordinates of the query glycosylated atoms
- ovo.core.utils.pdb.calculate_coords_from_transformed_displacements(P1, P2) → numpy.ndarray¶
Calculate the coordinates of the glycan atoms based on the transformed displacements
It uses the pre-calculated transformed displacements to calculate the coordinates of the glycan atoms
- Parameters:
P1, P2 – 3D coordinates in the PDB reference frame of the ND2 and CB atoms of the glycosylated residue
- Returns:
3D coordinates in the PDB reference frame of the glycan atoms
- ovo.core.utils.pdb.add_glycan_to_pdb(pdb_str: str) → tuple[str, list[str] | None]¶
Add glycan atoms to the PDB string
- ovo.core.utils.pdb.get_atom_coordinates(structure: Bio.PDB.Structure.Structure, chain_id: str | None, residues: list[int] | None, all_atom: bool = False, model_index=0) → tuple[list[dict[str, numpy.ndarray]], list[Bio.PDB.Residue.Residue]]¶
- ovo.core.utils.pdb._align_sequences_and_get_indices(seqs: list[str])¶
Aligns multiple protein sequences based on their amino acid sequences. Returns a tuple: a list of aligned sequences and a list of indices for each sequence that correspond to the aligned residues.
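To make the shape of the returned indices concrete, the sketch below starts from sequences that are already aligned (gaps written as "-") and derives, for each sequence, the 0-based residue indices that fall in gap-free columns. The helper name `indices_of_aligned_residues` and the gap character are assumptions made for illustration; the alignment step itself is not shown, and the actual function may select columns differently.

```python
def indices_of_aligned_residues(aligned: list, gap: str = "-") -> list:
    """For each aligned sequence, return 0-based residue indices in gap-free columns."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    counters = [0] * len(aligned)        # index of the next residue in each sequence
    indices = [[] for _ in aligned]
    for column in zip(*aligned):
        # A column counts as aligned only if no sequence has a gap in it.
        if all(char != gap for char in column):
            for i, position in enumerate(counters):
                indices[i].append(position)
        # Advance the counter of every sequence that has a residue in this column.
        for i, char in enumerate(column):
            if char != gap:
                counters[i] += 1
    return indices


aligned = ["MKT-AY", "-KTGAY"]
result = indices_of_aligned_residues(aligned)
print(result)
```

Indices of this kind let a caller pick matching residues from each structure even when the sequences differ in length.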
- ovo.core.utils.pdb.align_multiple_proteins_pdb(pdb_strs: list[str], chain_residue_mappings: list[list[tuple[str, list[int] | None]] | None], force_sequence_alignment: bool = False, all_atom: bool = False, verbose: bool = False) → tuple[list[str], float]¶
Aligns multiple protein structures based on their atoms (CA or all).
- Parameters:
pdb_strs – list of PDB strings
chain_residue_mappings – list of lists of tuples with chain ID and residues to align, if None provided, then whole chain/structure is aligned
force_sequence_alignment – if True, always align based on sequence even if lengths match
all_atom – if True, align using all atoms from matched residues (not just CA atoms)
verbose – if True, print information about the alignment process
- ovo.core.utils.pdb.get_aligned_structure_as_string(structure) → str¶
Returns the PDB representation of the given structure as a string.
- ovo.core.utils.pdb.pad_line(line)¶
Helper function to pad line to 80 characters in case it is shorter
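PDB is a fixed-width format with 80-column records, which is why short lines may need padding. A minimal sketch of such a helper follows; the name `pad_line_sketch` and the choice to keep a trailing newline are assumptions, not the module's actual implementation.

```python
def pad_line_sketch(line: str, width: int = 80) -> str:
    """Pad a PDB record with spaces to `width` columns; longer lines are unchanged."""
    newline = "\n" if line.endswith("\n") else ""
    body = line.rstrip("\r\n")
    # str.ljust never truncates, so records longer than `width` pass through as-is.
    return body.ljust(width) + newline


padded = pad_line_sketch("END\n")
print(repr(padded))
```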
- ovo.core.utils.pdb.pdb_to_mmcif_iter(pdb_data: str, structure_id: str, bfactor_to_plddt=False, fractional_plddt=False)¶
Adapted from pdb-tools pdb_tocif, with an added section that converts the B-factor to the pLDDT metric.
Convert a structure in PDB format to mmCIF format.
This function is a generator.
- Parameters:
pdb_data – string with PDB data
structure_id – entry ID
bfactor_to_plddt – convert the B-factor to the AlphaFold pLDDT metric
fractional_plddt – multiply pLDDT by 100 to get the 0-100 range; used for ESMFold PDB files, which have values in the 0-1 range
- Yields:
str (line by line), the structure in mmCIF format.
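The pLDDT handling described by the two flags can be illustrated with a small sketch that reads the B-factor field (columns 61-66 of an ATOM record) and rescales it when the input is fractional. The names `plddt_from_atom_line` and `make_atom_line` are hypothetical helpers for this example only; the mmCIF output that the real generator produces is not reproduced here.

```python
def plddt_from_atom_line(line: str, fractional: bool = False) -> float:
    """Read the B-factor field (columns 61-66) of an ATOM record as a pLDDT value."""
    value = float(line[60:66])
    # ESMFold-style files store pLDDT in the 0-1 range; scale it to 0-100.
    return value * 100.0 if fractional else value


def make_atom_line(serial: int, name: str, res_name: str, chain: str,
                   res_seq: int, bfactor: float) -> str:
    """Build a fixed-width ATOM record for the example (coordinates set to zero)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:3s} {chain:1s}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


line = make_atom_line(1, " CA", "ALA", "A", 1, 0.87)
print(plddt_from_atom_line(line))                   # value as stored in the file
print(plddt_from_atom_line(line, fractional=True))  # rescaled to the 0-100 range
```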
- ovo.core.utils.pdb.pdb_to_mmcif(pdb_data: str, structure_id: str, bfactor_to_plddt=False, fractional_plddt=False)¶
- exception ovo.core.utils.pdb.ChainNotFoundError¶
Bases:
Exception
- ovo.core.utils.pdb.get_sequences_from_pdb_str(pdb_str: str, chains: list[str] = None, by_residue_number: bool = False) → dict[str, str] | dict[str, dict[str, str]]¶
Get the sequence of a structure from the pdb file.
- Parameters:
pdb_str – str, PDB file contents as string
chains – list of str, chain IDs to extract sequences from, if None, all chains are extracted
by_residue_number – if True, return a dict with residue numbers (strings) as keys and amino acids as values
Chain breaks (for example a jump from 123 to 134) are NOT filled with X but ignored.
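The chain-break behaviour can be seen in a stdlib-only sketch that collects one letter per residue from ATOM records, using the fixed PDB columns for residue name, chain ID and residue number. The function name `sequences_by_chain_sketch`, the test-data helper `ca_line`, and the use of "X" for unknown residue names are assumptions for illustration and may not match the real function.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_by_chain_sketch(pdb_str: str) -> dict:
    """Return {chain ID: one-letter sequence}; numbering gaps are not filled."""
    seen = set()
    residues = {}
    for line in pdb_str.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]
        residue_id = line[22:27]          # residue number plus insertion code
        if (chain, residue_id) in seen:   # count each residue once
            continue
        seen.add((chain, residue_id))
        residues.setdefault(chain, []).append(THREE_TO_ONE.get(line[17:20], "X"))
    return {chain: "".join(letters) for chain, letters in residues.items()}


def ca_line(chain: str, res_seq: int, res_name: str) -> str:
    """Build a minimal fixed-width CA ATOM record (test data only)."""
    return (f"ATOM  {1:5d}  CA  {res_name:3s} {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


example = "\n".join([ca_line("A", 123, "GLY"), ca_line("A", 134, "ALA"),
                     ca_line("B", 1, "SER")])
sequences = sequences_by_chain_sketch(example)
print(sequences)
```

In this example chain A jumps from residue 123 to 134, and the resulting sequence simply contains the two residues side by side.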
- ovo.core.utils.pdb.get_remark_header(pdb_path: str) → tuple[str, list[str]]¶
Get the REMARK header from the PDB file.
- Parameters:
pdb_path – str, path to the PDB file
- Returns:
str, REMARK header
- ovo.core.utils.pdb.REMARK_KEYS¶
[‘Input contig’, ‘Standardized contig’, ‘Chains’, ‘Input hotspots’, ‘Standardized hotspots’]
- ovo.core.utils.pdb.get_standardized_remarks_from_pdb_str(pdb_str: str) → dict[str, str]¶
Parse “standardized” remarks from the PDB file
- Parameters:
pdb_str – PDB file contents as string
- Returns:
dict, parsed remarks {"Input contig": "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 C10-20", …}
Example header to read:
REMARK 1 Input contig: "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 "
REMARK 1 Input contig: "C10-20"
REMARK 1 Standardized contig: "A45-46/13-13/A45-46/0 B24-26/5-5/B24-26/"
REMARK 1 Standardized contig: "0 C10-20/0"
REMARK 1 Chains: "A B C"
REMARK 1 Input hotspots:
REMARK 1 Standardized hotspots:
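The documented example implies that a value split across several REMARK lines with the same key is joined back together, with whitespace inside the quotes preserved. A rough sketch of that parsing is shown below; `parse_remarks_sketch` is a hypothetical name, and mapping keys without a value to an empty string is an assumption rather than confirmed behaviour.

```python
REMARK_KEYS = ["Input contig", "Standardized contig", "Chains",
               "Input hotspots", "Standardized hotspots"]


def parse_remarks_sketch(pdb_str: str) -> dict:
    """Collect 'REMARK <n> <key>: "<value>"' lines, joining repeated keys."""
    remarks = {}
    for line in pdb_str.splitlines():
        if not line.startswith("REMARK"):
            continue
        parts = line.split(None, 2)       # "REMARK", remark number, remainder
        if len(parts) < 3 or ":" not in parts[2]:
            continue
        key, _, raw = parts[2].partition(":")
        key = key.strip()
        if key not in REMARK_KEYS:
            continue
        raw = raw.strip()
        # Remove one pair of surrounding quotes but keep whitespace inside them.
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            raw = raw[1:-1]
        remarks[key] = remarks.get(key, "") + raw
    return remarks


header = "\n".join([
    'REMARK 1 Input contig: "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 "',
    'REMARK 1 Input contig: "C10-20"',
    'REMARK 1 Standardized contig: "A45-46/13-13/A45-46/0 B24-26/5-5/B24-26/"',
    'REMARK 1 Standardized contig: "0 C10-20/0"',
    'REMARK 1 Chains: "A B C"',
    'REMARK 1 Input hotspots:',
    'REMARK 1 Standardized hotspots:',
])
parsed = parse_remarks_sketch(header)
print(parsed["Input contig"])
```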
- ovo.core.utils.pdb.trim_pdb_str(pdb_input_string: str, target_chain: str, start_res: int, end_res: int) → str¶
- ovo.core.utils.pdb.filter_pdb_str(pdb_input_string: str, segments: list[str], add_ter=False) → str¶
Filter a PDB string to only include specified segments.
- Parameters:
pdb_input_string – str, input PDB string
segments – list of str, segments to include, e.g. [“A5”, “A10-20”, “B30-40”, “C”]
add_ter – if True, order ATOM records in the PDB following the order of the given segments, and add TER and END records in between
- Returns:
str, filtered PDB string
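The segment syntax from the example ("A5", "A10-20", "C") can be read as a chain ID with an optional residue number or inclusive range. The sketch below parses that syntax and keeps matching ATOM/HETATM records in file order; `parse_segment` and `filter_atoms_sketch` are hypothetical names, single-letter chain IDs are assumed, and the `add_ter` reordering is not covered.

```python
import re
from typing import Optional, Tuple


def parse_segment(segment: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Parse "A5", "A10-20" or "C" into (chain, start, end); a bare chain has no bounds."""
    match = re.fullmatch(r"([A-Za-z])(?:(\d+)(?:-(\d+))?)?", segment)
    if match is None:
        raise ValueError(f"unsupported segment: {segment!r}")
    chain, start, end = match.groups()
    if start is None:
        return chain, None, None
    first = int(start)
    return chain, first, int(end) if end is not None else first


def filter_atoms_sketch(pdb_str: str, segments: list) -> str:
    """Keep ATOM/HETATM lines whose chain and residue number match any segment."""
    selections = [parse_segment(segment) for segment in segments]
    kept = []
    for line in pdb_str.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain, res_seq = line[21], int(line[22:26])
        for sel_chain, start, end in selections:
            if chain == sel_chain and (start is None or start <= res_seq <= end):
                kept.append(line)
                break
    return "\n".join(kept)


def atom_line(chain: str, res_seq: int) -> str:
    """Build a minimal fixed-width CA ATOM record (test data only)."""
    return (f"ATOM  {1:5d}  CA  ALA {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


residues = [("A", 5), ("A", 7), ("A", 12), ("B", 29), ("B", 30), ("C", 1)]
pdb_text = "\n".join(atom_line(chain, number) for chain, number in residues)
filtered = filter_atoms_sketch(pdb_text, ["A5", "A10-20", "B30-40", "C"])
kept_ids = [(line[21], int(line[22:26])) for line in filtered.splitlines()]
print(kept_ids)
```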