ovo.core.utils.pdb
Module Contents
Classes
Functions
detect_glycosylation_sites | Find glycosylation sites in the PDB file.
calculate_coords_from_transformed_displacements | Calculate the coordinates of the glycan atoms based on the transformed displacements.
add_glycan_to_pdb | Add glycan atoms to the PDB string.
_align_sequences_and_get_indices | Aligns multiple protein sequences based on their amino acid sequences. Returns a tuple of a list of aligned sequences and a list of indices for each sequence that correspond to the aligned residues.
align_multiple_proteins_pdb | Aligns multiple protein structures based on their atoms (CA or all).
get_aligned_structure_as_string | Returns the PDB representation of the given structure as a string.
pad_line | Helper function to pad a line to 80 characters if it is shorter.
pdb_to_mmcif_iter | Adapted from pdb-tools pdb_tocif, with an added section that converts the B-factor to the pLDDT metric.
get_sequences_from_pdb_str | Get the sequence of a structure from the PDB file.
get_remark_header | Get the REMARK header from the PDB file.
get_standardized_remarks_from_pdb_str | Parse "standardized" remarks from the PDB file.
filter_pdb_str | Filter a PDB string to only include specified segments.
check_rfdiffusion_input | Make sure that the structure does not contain chain breaks without a gap in residue numbering (required in the refolding step).
Data
API
- ovo.core.utils.pdb.aa3to1
None
- ovo.core.utils.pdb.transformed_displacements
'array(…)'
- ovo.core.utils.pdb.fix_contigs(contigs, parsed_pdb)
- ovo.core.utils.pdb.get_pdb(pdb_code: str) -> bytes
- class ovo.core.utils.pdb.PDBSegmentSelector(segments: list[str])
Bases: Bio.PDB.Select
- accept_model(model)
- accept_chain(chain)
- accept_residue(residue)
- accept_atom(model)
- ovo.core.utils.pdb.detect_glycosylation_sites(atom_ppdb: pandas.DataFrame, chains: list[str] | str | None = None, query_atoms: list[str] | None = None, cyclic: bool = False) -> dict | None
Find glycosylation sites in the PDB file.
- Parameters:
atom_ppdb – pandas DataFrame with the ATOM records of the PDB file
chains – str or list of str with the chain IDs to search for glycosylation sites
query_atoms – list of str with the atom names to search for glycosylation sites
- Returns:
glycosylation_dict – dictionary with the coordinates of the query glycosylated atoms
- ovo.core.utils.pdb.calculate_coords_from_transformed_displacements(P1, P2) -> numpy.ndarray
Calculate the coordinates of the glycan atoms from the pre-calculated transformed displacements.
- Parameters:
P1, P2 – 3D coordinates, in the PDB reference frame, of the ND2 and CB atoms of the glycosylated residue
- Returns:
3D coordinates of the glycan atoms in the PDB reference frame
- ovo.core.utils.pdb.add_glycan_to_pdb(pdb_str: str) -> tuple[str, list[str] | None]
Add glycan atoms to the PDB string.
- ovo.core.utils.pdb.get_atom_coordinates(structure: Bio.PDB.Structure.Structure, chain_id: str | None, residues: list[int] | None, all_atom: bool = False, model_index=0) -> tuple[list[dict[str, numpy.ndarray]], list[Bio.PDB.Residue.Residue]]
- ovo.core.utils.pdb._align_sequences_and_get_indices(seqs: list[str])
Aligns multiple protein sequences based on their amino acid sequences. Returns a tuple of two items: a list of aligned sequences, and a list of indices for each sequence that correspond to the aligned residues.
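The second item of the returned tuple can be illustrated with a small sketch. It assumes the sequences have already been aligned with `-` as the gap character and treats a column as "aligned" only when no sequence has a gap there; both points are assumptions, since the alignment method and the exact index convention are not described here. `aligned_indices_sketch` is a hypothetical name, not part of `ovo`.

```python
def aligned_indices_sketch(aligned: list[str]) -> list[list[int]]:
    """For each gapped sequence, return the ungapped positions that sit in gap-free columns."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("Aligned sequences must be non-empty and of equal length")
    counters = [0] * len(aligned)          # next ungapped index per sequence
    indices = [[] for _ in aligned]
    for column in zip(*aligned):
        gap_free = "-" not in column
        for i, char in enumerate(column):
            if char == "-":
                continue                   # a gap does not consume a residue
            if gap_free:
                indices[i].append(counters[i])
            counters[i] += 1
    return indices


# Assumed example: the second sequence lacks the first residue.
print(aligned_indices_sketch(["MKTAY", "-KTAY"]))  # [[1, 2, 3, 4], [0, 1, 2, 3]]
```

Indices of this kind are what a structural superposition needs in order to pair up equivalent residues between chains of different lengths.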
- ovo.core.utils.pdb.align_multiple_proteins_pdb(pdb_strs: list[str], chain_residue_mappings: list[list[tuple[str, list[int] | None]] | None], force_sequence_alignment: bool = False, all_atom: bool = False, verbose: bool = False) -> tuple[list[str], float]
Aligns multiple protein structures based on their atoms (CA or all).
- Parameters:
pdb_strs – list of PDB strings
chain_residue_mappings – list of lists of tuples with chain ID and residues to align; if None is provided, the whole chain/structure is aligned
force_sequence_alignment – if True, always align based on sequence even if lengths match
all_atom – if True, align using all atoms from matched residues (not just CA atoms)
verbose – if True, print information about the alignment process
- ovo.core.utils.pdb.get_aligned_structure_as_string(structure) -> str
Returns the PDB representation of the given structure as a string.
- ovo.core.utils.pdb.pad_line(line)
Helper function to pad a line to 80 characters if it is shorter.
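A minimal sketch of this kind of padding is shown below. It assumes the line terminator is stripped before padding and that longer lines are returned unchanged; neither detail is stated above. `pad_line_sketch` is a hypothetical stand-in, not the library function.

```python
def pad_line_sketch(line: str, width: int = 80) -> str:
    """Pad a PDB record with trailing spaces up to `width` columns."""
    # Assumption: drop the newline so it does not count towards the width.
    stripped = line.rstrip("\r\n")
    # str.ljust leaves strings that are already long enough untouched.
    return stripped.ljust(width)


print(len(pad_line_sketch("END\n")))  # 80
```

Fixed-width PDB parsers slice records by column, so padding short records avoids index errors when reading fields near the end of the line.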
- ovo.core.utils.pdb.pdb_to_mmcif_iter(pdb_data: str, structure_id: str, bfactor_to_plddt=False, fractional_plddt=False)
Adapted from pdb-tools pdb_tocif, with an added section that converts the B-factor to the pLDDT metric.
Convert a structure in PDB format to mmCIF format.
This function is a generator.
- Parameters:
pdb_data – string with PDB data
structure_id – entry ID
bfactor_to_plddt – convert the B-factor to the AlphaFold pLDDT metric
fractional_plddt – multiply pLDDT by 100 to get the 0-100 range; used for ESMFold PDB files, which have values in the 0-1 range
- Yields:
str – the structure in mmCIF format, line by line
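The effect of `fractional_plddt` can be sketched on a single ATOM record. The example below only reads the B-factor field (PDB columns 61-66) and rescales it; it is an illustration of the rescaling described above, not the converter itself, and `plddt_from_atom_line` is a hypothetical helper.

```python
def plddt_from_atom_line(line: str, fractional_plddt: bool = False) -> float:
    """Read the B-factor field of an ATOM/HETATM record as a pLDDT value."""
    b_factor = float(line[60:66])  # PDB columns 61-66 hold the temperature factor
    # ESMFold-style files store pLDDT in the 0-1 range, so rescale to 0-100.
    return b_factor * 100.0 if fractional_plddt else b_factor


# Made-up record with occupancy 1.00 and a fractional pLDDT of 0.75 in the B-factor field.
esmfold_like = (
    f"ATOM  {1:>5}  CA  GLY A{1:>4}    "
    f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.75:6.2f}"
)
print(plddt_from_atom_line(esmfold_like, fractional_plddt=True))  # 75.0
```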
- ovo.core.utils.pdb.pdb_to_mmcif(pdb_data: str, structure_id: str, bfactor_to_plddt=False, fractional_plddt=False)
- exception ovo.core.utils.pdb.ChainNotFoundError
Bases: Exception
- ovo.core.utils.pdb.get_sequences_from_pdb_str(pdb_str: str, chains: list[str] = None, by_residue_number: bool = False) -> dict[str, str] | dict[str, dict[str, str]]
Get the sequence of a structure from the pdb file.
- Parameters:
pdb_str – str, PDB file contents as string
chains – list of str, chain IDs to extract sequences from, if None, all chains are extracted
by_residue_number – if True, return a dict with residue numbers (strings) as keys and amino acids as values
The sequence is extracted based on CA atoms. If a residue has a CA atom, it is included in the sequence, either as the amino acid letter (if it is one of the 20 standard amino acids) or as X (if it is a non-standard amino acid). If a residue does not have a CA atom (e.g. a ligand), it is ignored.
Chain breaks (for example a jump from 123 to 134) are NOT filled with X but ignored.
Assumes that residue IDs are sorted within each chain of the PDB file. (BioPython also returns residues in their original order.)
Only the first model in the PDB file is parsed (everything after ENDMDL is ignored).
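The extraction rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the library implementation: it reads fixed PDB columns directly, matches the alpha carbon as the exact atom name field `" CA "`, and skips repeated records of the same residue. `atom_line` and `sequences_from_pdb_sketch` are hypothetical helpers, and the coordinates are dummy values.

```python
AA3TO1 = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def atom_line(record: str, serial: int, name: str, res_name: str, chain: str, res_seq: int) -> str:
    """Build a fixed-width ATOM/HETATM record with dummy coordinates (test helper)."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def sequences_from_pdb_sketch(pdb_str: str) -> dict[str, str]:
    sequences: dict[str, list[str]] = {}
    seen = set()
    for line in pdb_str.splitlines():
        if line.startswith("ENDMDL"):
            break                              # only the first model is read
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[12:16] != " CA ":
            continue                           # residues without CA never contribute
        chain, res_id = line[21], line[22:27].strip()
        if (chain, res_id) in seen:
            continue                           # e.g. alternate locations
        seen.add((chain, res_id))
        # Non-standard residues become X; numbering gaps are not filled.
        sequences.setdefault(chain, []).append(AA3TO1.get(line[17:20], "X"))
    return {chain: "".join(letters) for chain, letters in sequences.items()}


pdb = "\n".join([
    atom_line("ATOM", 1, " N  ", "GLY", "A", 1),
    atom_line("ATOM", 2, " CA ", "GLY", "A", 1),
    atom_line("ATOM", 3, " CA ", "ALA", "A", 2),
    atom_line("HETATM", 4, " CA ", "MSE", "A", 10),  # non-standard, after a gap
    atom_line("HETATM", 5, " O  ", "HOH", "A", 11),  # no CA atom, ignored
    "ENDMDL",
    atom_line("ATOM", 6, " CA ", "TRP", "A", 1),     # second model, ignored
])
print(sequences_from_pdb_sketch(pdb))  # {'A': 'GAX'}
```

Note that the jump from residue 2 to residue 10 leaves no placeholder in the output, matching the chain-break rule described above.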
- ovo.core.utils.pdb.get_remark_header(pdb_path: str) -> tuple[str, list[str]]
Get the REMARK header from the PDB file.
- Parameters:
pdb_path – str, path to the PDB file
- Returns:
str, REMARK header
- ovo.core.utils.pdb.REMARK_KEYS
['Input contig', 'Standardized contig', 'Chains', 'Input hotspots', 'Standardized hotspots']
- ovo.core.utils.pdb.get_standardized_remarks_from_pdb_str(pdb_str: str) -> dict[str, str]
Parse “standardized” remarks from the PDB file
- Parameters:
pdb_str – PDB file contents as string
- Returns:
dict, parsed remarks {"Input contig": "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 C10-20", …}
Example header to read:
REMARK 1 Input contig: "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 "
REMARK 1 Input contig: "C10-20"
REMARK 1 Standardized contig: "A45-46/13-13/A45-46/0 B24-26/5-5/B24-26/"
REMARK 1 Standardized contig: "0 C10-20/0"
REMARK 1 Chains: "A B C"
REMARK 1 Input hotspots:
REMARK 1 Standardized hotspots:
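The example header suggests that a value split across several REMARK lines with the same key is joined back together. A minimal sketch of that behaviour follows; the line format (`REMARK 1 Key: "value"`, with the quoted value optional) is inferred from the example only, and how the real function treats keys without a value is an assumption. `parse_remarks_sketch` is a hypothetical name.

```python
import re

REMARK_KEYS = ["Input contig", "Standardized contig", "Chains",
               "Input hotspots", "Standardized hotspots"]

# "REMARK", record number 1, then `Key: "value"`; the quoted value may be absent.
REMARK_RE = re.compile(r'^REMARK\s+1\s+([^:]+):\s*(?:"(.*)")?\s*$')


def parse_remarks_sketch(pdb_str: str) -> dict[str, str]:
    remarks: dict[str, str] = {}
    for line in pdb_str.splitlines():
        match = REMARK_RE.match(line)
        if match is None:
            continue
        key, value = match.group(1).strip(), match.group(2) or ""
        if key in REMARK_KEYS:
            # A repeated key is a continuation line: concatenate in file order.
            remarks[key] = remarks.get(key, "") + value
    return remarks


header = "\n".join([
    'REMARK 1 Input contig: "A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 "',
    'REMARK 1 Input contig: "C10-20"',
    'REMARK 1 Standardized contig: "A45-46/13-13/A45-46/0 B24-26/5-5/B24-26/"',
    'REMARK 1 Standardized contig: "0 C10-20/0"',
    'REMARK 1 Chains: "A B C"',
    'REMARK 1 Input hotspots:',
    'REMARK 1 Standardized hotspots:',
])
remarks = parse_remarks_sketch(header)
print(remarks["Input contig"])  # A45-46/10-15/A45-46/0 B24-26/5/B24-26/0 C10-20
```

The quotes matter here: the trailing space inside the first quoted value is what separates the two contig groups once the lines are joined.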
- ovo.core.utils.pdb.trim_pdb_str(pdb_input_string: str, target_chain: str, start_res: int, end_res: int) -> str
- ovo.core.utils.pdb.filter_pdb_str(pdb_input_string: str, segments: list[str], add_ter=False) -> str
Filter a PDB string to only include specified segments.
- Parameters:
pdb_input_string – str, input PDB string
segments – list of str, segments to include, e.g. ["A5", "A10-20", "B30-40", "C"]
add_ter – if True, order the ATOM records according to the order of the segments, and add TER and END records in between
- Returns:
str, filtered PDB string
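The segment syntax in the example (`"A5"`, `"A10-20"`, `"C"`) can be read as a single residue, an inclusive range, and a whole chain. The sketch below parses segments under that reading and decides whether a residue is selected. It assumes single-letter chain IDs and non-negative residue numbers without insertion codes; the helper names are hypothetical and not part of `ovo`.

```python
import re

# Chain letter, optionally followed by a residue number or a start-end range.
SEGMENT_RE = re.compile(r"^([A-Za-z])(?:(\d+)(?:-(\d+))?)?$")


def parse_segment_sketch(segment: str):
    """Return (chain, first, last); first and last are None for a whole chain."""
    match = SEGMENT_RE.match(segment)
    if match is None:
        raise ValueError(f"Unrecognized segment: {segment!r}")
    chain, start, end = match.groups()
    if start is None:
        return chain, None, None
    first = int(start)
    last = int(end) if end is not None else first
    return chain, first, last


def residue_selected_sketch(chain: str, res_num: int, segments: list[str]) -> bool:
    for segment in segments:
        seg_chain, first, last = parse_segment_sketch(segment)
        if seg_chain != chain:
            continue
        if first is None or first <= res_num <= last:
            return True
    return False


segments = ["A5", "A10-20", "B30-40", "C"]
print(residue_selected_sketch("A", 15, segments))  # True
print(residue_selected_sketch("A", 7, segments))   # False
```

A predicate like this maps naturally onto the `accept_residue` hook of a `Bio.PDB.Select` subclass such as `PDBSegmentSelector`, although how that class implements it is not shown here.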
- ovo.core.utils.pdb.check_rfdiffusion_input(pdb_input_string: str, target_chain: str, start_res_trimmed_chain: int, end_res_trimmed_chain: int, max_amide_distance: float = 3.0)
Make sure that structure does not contain chain breaks without a gap in residue numbering (required in refolding step)
Also make sure that no insertion codes are present.
Raises ValueError if chain breaks are found.
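One way to picture the check is as a distance test on consecutive residues: a normal peptide bond has a C to N distance of roughly 1.3 Å, so a pair of consecutively numbered residues whose C and N atoms are further apart than `max_amide_distance` indicates a break that the numbering does not reveal. The sketch below works on pre-extracted coordinates and assumes this interpretation of the parameter; it is not the library implementation, and both function names are hypothetical.

```python
import math


def find_hidden_chain_breaks(residues, max_amide_distance: float = 3.0):
    """residues: list of (res_num, n_xyz, c_xyz) tuples, sorted by residue number."""
    breaks = []
    for (num_a, _, c_a), (num_b, n_b, _) in zip(residues, residues[1:]):
        if num_b - num_a != 1:
            continue  # a numbering gap already marks this break
        distance = math.dist(c_a, n_b)  # C(i) to N(i+1)
        if distance > max_amide_distance:
            breaks.append((num_a, num_b, distance))
    return breaks


def check_chain_breaks_sketch(residues, max_amide_distance: float = 3.0) -> None:
    breaks = find_hidden_chain_breaks(residues, max_amide_distance)
    if breaks:
        pairs = ", ".join(f"{a}-{b} ({d:.1f} A)" for a, b, d in breaks)
        raise ValueError(f"Chain break without numbering gap between residues: {pairs}")


# Made-up coordinates: 1-2 are bonded, 2-3 are far apart, 3-10 is a numbered gap.
residues = [
    (1, (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)),
    (2, (2.8, 0.0, 0.0), (4.3, 0.0, 0.0)),
    (3, (10.0, 0.0, 0.0), (11.5, 0.0, 0.0)),
    (10, (30.0, 0.0, 0.0), (31.5, 0.0, 0.0)),
]
print([(a, b) for a, b, _ in find_hidden_chain_breaks(residues)])  # [(2, 3)]
```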