ovo.core.storage
Module Contents
Classes
Storage | Class for managing file storage in the local filesystem or in an S3 bucket
API
- class ovo.core.storage.Storage(storage_root: str, aws: ovo.core.aws.AWSSessionManager | None, verbose: bool = False, memory_cache_limit_bytes=50 * 1024 * 1024, disk_cache_limit_bytes=200 * 1024 * 1024, memory_cache_limit_per_file_bytes=5 * 1024 * 1024)
Class for managing file storage in the local filesystem or in an S3 bucket.
Initialization
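The constructor takes a total in-memory cache limit (memory_cache_limit_bytes) and a per-file limit (memory_cache_limit_per_file_bytes). A minimal sketch of how two such limits could interact is a least-recently-used cache that refuses oversized entries and evicts the oldest entries once the total is exceeded. The class ByteLimitedLRUCache and the LRU eviction policy are assumptions for illustration, not ovo's implementation; the disk cache is not modelled.

```python
from __future__ import annotations

from collections import OrderedDict


class ByteLimitedLRUCache:
    """Hypothetical in-memory cache bounded by total size and per-entry size."""

    def __init__(self, limit_bytes: int, limit_per_file_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.limit_per_file_bytes = limit_per_file_bytes
        self._entries: OrderedDict[str, bytes] = OrderedDict()
        self._size = 0

    def get(self, key: str) -> bytes | None:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: str, content: bytes) -> bool:
        """Store content; return False if it exceeds the per-file limit."""
        if len(content) > self.limit_per_file_bytes:
            return False
        if key in self._entries:
            self._size -= len(self._entries.pop(key))
        self._entries[key] = content
        self._size += len(content)
        # Evict least recently used entries until the total fits the limit.
        while self._size > self.limit_bytes:
            _, evicted = self._entries.popitem(last=False)
            self._size -= len(evicted)
        return True
```

Rejecting oversized files up front keeps one large read from flushing every smaller cached entry.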
- _clear_temp_dir()
- _cache_read(file_path)
- _cache_store(file_path, content: str | bytes)
- static parse_path(path: str) → tuple[str, str, str]
- list_dir(abs_path: str, only_dir=False, recursive=False) → list[str]
List all files in the directory and return a list of paths relative to the provided path.
- Parameters:
abs_path – path to the source directory
only_dir – if True, list only directories; if False, list all files
recursive – if True, list the directory contents recursively (paths stay relative to the provided path)
- Returns:
list of files in the directory
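A local-filesystem sketch of the list_dir contract, using pathlib. The function list_dir_local is hypothetical, and treating only_dir=False as "return files and directories alike" is an assumption; S3 listing is not covered.

```python
from pathlib import Path


def list_dir_local(abs_path: str, only_dir: bool = False, recursive: bool = False) -> list[str]:
    """Hypothetical local-only analogue of Storage.list_dir."""
    root = Path(abs_path)
    # rglob("*") walks the whole tree; iterdir() lists only the top level.
    entries = root.rglob("*") if recursive else root.iterdir()
    # Paths are returned relative to abs_path, sorted for stable output.
    return sorted(
        str(p.relative_to(root))
        for p in entries
        if not only_dir or p.is_dir()
    )
```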
- file_exists(storage_path) → bool
Check whether the file exists in the local filesystem or in the S3 bucket.
- Parameters:
storage_path – relative (within storage root) or absolute path to the source file
- Returns:
True if the file exists, False otherwise
- read_file_bytes(storage_path, cache_store: bool = True) → bytes
- read_file_str(storage_path: str, cache_store: bool = True) → str
Read the file from the local filesystem or from the S3 bucket and return its contents as a string.
- read_file_pickle(storage_path: str)
Read the file from the local filesystem or from the S3 bucket and unpickle its contents.
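The three read methods can plausibly be layered on one bytes reader, as in this sketch. ReadOnlyStore is a hypothetical stand-in whose dict plays the role of the filesystem or bucket; the UTF-8 decoding and the read-through cache keyed by storage path are assumptions.

```python
import pickle
from typing import Any


class ReadOnlyStore:
    """Hypothetical store: str and pickle readers built on a bytes reader."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files  # stands in for the filesystem or S3 bucket
        self._cache: dict[str, bytes] = {}

    def read_file_bytes(self, storage_path: str, cache_store: bool = True) -> bytes:
        if storage_path in self._cache:
            return self._cache[storage_path]
        content = self._files[storage_path]
        if cache_store:  # remember the content only when the caller allows it
            self._cache[storage_path] = content
        return content

    def read_file_str(self, storage_path: str, cache_store: bool = True) -> str:
        # UTF-8 is an assumption; the docs do not state an encoding.
        return self.read_file_bytes(storage_path, cache_store).decode("utf-8")

    def read_file_pickle(self, storage_path: str) -> Any:
        # pickle.loads can execute arbitrary code: only read trusted files.
        return pickle.loads(self.read_file_bytes(storage_path))
```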
- resolve_path(storage_path: str) → str
Resolve the storage path to an absolute path in the local filesystem or S3 bucket.
- Parameters:
storage_path – relative (within storage root) or absolute path to the source file
- Returns:
absolute path to the file in the local filesystem or S3 bucket
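The relative-versus-absolute rule above can be sketched as a small standalone function. resolve_storage_path is hypothetical, and recognising S3 locations by the s3:// prefix is an assumption.

```python
import os


def resolve_storage_path(storage_root: str, storage_path: str) -> str:
    """Hypothetical resolver: relative paths are joined onto the storage root."""
    # S3 URIs and absolute local paths are returned unchanged.
    if storage_path.startswith("s3://") or os.path.isabs(storage_path):
        return storage_path
    if storage_root.startswith("s3://"):
        # S3 keys use "/" separators regardless of the local OS.
        return storage_root.rstrip("/") + "/" + storage_path
    return os.path.join(storage_root, storage_path)
```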
- store_file_path(source_abs_path: str, storage_rel_path: str, overwrite: bool = True) → str
Copy the source file into storage in the local filesystem or in the S3 bucket.
- Parameters:
source_abs_path – absolute path (or S3 URI) to the source file
storage_rel_path – destination path, relative to the storage root
overwrite – if True, overwrite the file if it exists
- Returns:
relative path to the stored file
- store_file_bytes(file_bytes: bytes, storage_rel_path: str, overwrite: bool = True) → str
Store the file bytes in the local filesystem or in the S3 bucket.
- Parameters:
file_bytes – file bytes to store
storage_rel_path – destination path, relative to the storage root
overwrite – if True, overwrite the file if it exists
- Returns:
relative path to the stored file
- store_file_str(file_str: str, storage_rel_path: str, overwrite: bool = True) → str
Store the file string in the local filesystem or in the S3 bucket.
- Parameters:
file_str – file string to store
storage_rel_path – destination path, relative to the storage root
overwrite – if True, overwrite the file if it exists
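A minimal local-only sketch of the two store methods: write the bytes under the storage root, creating parent directories, and return the relative path. The helper names, the UTF-8 encoding for strings, and raising FileExistsError when overwrite=False are assumptions; the docs do not say what happens when the file exists and overwrite is False.

```python
import os


def store_file_bytes_local(storage_root: str, file_bytes: bytes,
                           storage_rel_path: str, overwrite: bool = True) -> str:
    """Hypothetical local-only analogue of Storage.store_file_bytes."""
    destination = os.path.join(storage_root, storage_rel_path)
    if not overwrite and os.path.exists(destination):
        raise FileExistsError(destination)  # assumed behaviour
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    with open(destination, "wb") as f:
        f.write(file_bytes)
    return storage_rel_path


def store_file_str_local(storage_root: str, file_str: str,
                         storage_rel_path: str, overwrite: bool = True) -> str:
    # Encoding as UTF-8 and delegating to the bytes variant is an assumption.
    return store_file_bytes_local(storage_root, file_str.encode("utf-8"),
                                  storage_rel_path, overwrite)
```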
- store_input(project_id: str, file_path: str = None, file_bytes: bytes = None, file_str: str = None, filename: str = None, overwrite: bool = True) → str
Store an input file for the given project in the local filesystem or in the S3 bucket, loading it from a path, from bytes, or from a string.
- Parameters:
project_id – project ID
file_path – path to the source file (when loading from path)
file_bytes – file bytes to store (when loading from bytes)
file_str – file string to store (when loading from string)
filename – filename to use when storing the file (required when loading from bytes or string)
overwrite – if True, overwrite the file if it exists
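store_input accepts its content from one of three sources. A hypothetical validation helper might look like this; requiring exactly one source and falling back to the source file's basename when filename is omitted are assumptions, and where ovo places the file within the project is not shown.

```python
from __future__ import annotations

import os


def resolve_input(file_path: str | None = None, file_bytes: bytes | None = None,
                  file_str: str | None = None, filename: str | None = None) -> tuple[str, bytes]:
    """Hypothetical helper: pick the single provided source, return (filename, content)."""
    provided = [x is not None for x in (file_path, file_bytes, file_str)]
    if sum(provided) != 1:  # assumed: exactly one source is allowed
        raise ValueError("provide exactly one of file_path, file_bytes, file_str")
    if file_path is not None:
        with open(file_path, "rb") as f:
            content = f.read()
        # Falling back to the source file's basename is an assumption.
        return filename or os.path.basename(file_path), content
    if filename is None:
        raise ValueError("filename is required when loading from bytes or string")
    content = file_bytes if file_bytes is not None else file_str.encode("utf-8")
    return filename, content
```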
- create_zip(storage_paths_by_dir: dict[str, list[str]]) → bytes
Create a zip archive from stored files.
- Parameters:
storage_paths_by_dir – dictionary mapping directory names to lists of storage paths
- Returns:
contents of the zip archive containing the stored files, as bytes
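A sketch of create_zip built on the standard zipfile module and an in-memory buffer. The read_bytes callable stands in for read_file_bytes, and placing each file at <directory>/<basename> inside the archive is an assumed layout.

```python
import io
import posixpath
import zipfile
from typing import Callable


def create_zip(storage_paths_by_dir: dict[str, list[str]],
               read_bytes: Callable[[str], bytes]) -> bytes:
    """Hypothetical sketch: each file goes under its directory key in the archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dir_name, paths in storage_paths_by_dir.items():
            for path in paths:
                # Archive member names always use "/" separators.
                arcname = posixpath.join(dir_name, posixpath.basename(path))
                zf.writestr(arcname, read_bytes(path))
    return buffer.getvalue()
```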
- sync_file(storage_path: str, local_destination_path: str, link: bool = False)
Download or copy a stored file to a local path.
- Parameters:
storage_path – path to the stored file
local_destination_path – path to the destination file
link – if True and if supported, create a symbolic link to the file instead of copying it
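The link option can be sketched for local sources as "try a symbolic link, otherwise copy". sync_local_file is hypothetical and handles local paths only; a stored file in S3 would have to be downloaded instead.

```python
import os
import shutil


def sync_local_file(source_path: str, local_destination_path: str, link: bool = False) -> None:
    """Hypothetical local-only sketch of copy-or-symlink behaviour."""
    os.makedirs(os.path.dirname(os.path.abspath(local_destination_path)), exist_ok=True)
    if link:
        try:
            os.symlink(os.path.abspath(source_path), local_destination_path)
            return
        except (OSError, NotImplementedError):
            pass  # symlink not possible here; fall back to copying
    shutil.copy2(source_path, local_destination_path)  # copy with metadata
```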
- sync_files(storage_paths: list[str], local_destination_dir: str, preserve_subdirs=False)
Download or copy a list of stored files to a local directory.
- Parameters:
storage_paths – list of stored files to download to the destination directory
local_destination_dir – path to the destination directory (created if it does not exist)
preserve_subdirs – if True (or if recursive=True), preserve subdirectories in the local destination directory
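With preserve_subdirs=False, files are presumably flattened into the destination directory by basename, which can collide. This hypothetical helper computes destination paths under that assumption and rejects duplicates.

```python
import os
import posixpath


def destination_paths(storage_paths: list[str], local_destination_dir: str,
                      preserve_subdirs: bool = False) -> list[str]:
    """Hypothetical helper: where each stored file would land locally."""
    result = []
    for storage_path in storage_paths:
        # Either keep the stored path's subdirectories or flatten to the basename.
        rel = storage_path if preserve_subdirs else posixpath.basename(storage_path)
        result.append(os.path.join(local_destination_dir, *rel.split("/")))
    if len(set(result)) != len(result):
        # e.g. flattening mapped two stored files onto the same local name
        raise ValueError("duplicate destination paths")
    return result
```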
- is_local_path(path)
- sync_directory(source_dir: str, local_destination_dir: str, link=False)
Download or copy a directory to a local directory.
- Parameters:
source_dir – source directory; the path should be absolute or relative to the storage root
local_destination_dir – path to the destination directory (created if it does not exist)
link – if True, use symbolic links instead of copying files when possible
- prepare_workflow_input(storage_path: str, workdir: str, input_bytes: bytes = None, name: str = None, custom_hash: str = None) → str
Prepare a workflow input file for submission by storing it in the provided scheduler workdir under a unique hash based on its contents.
- Parameters:
storage_path – path to the stored file
workdir – destination (working directory of the scheduler)
input_bytes – optional; use the provided bytes instead of reading storage_path
name – optional; rename the file to the provided name (without extension; the original extension is appended)
custom_hash – optional; use a custom hash as the subdirectory name instead of a hash generated from the file contents
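Content-addressed storage of one workflow input can be sketched as follows. SHA-256 is an assumed choice of hash (the docs only mention a hash of the file contents), and prepare_workflow_input_local is a hypothetical local-only helper that takes the content directly.

```python
from __future__ import annotations

import hashlib
import os


def prepare_workflow_input_local(content: bytes, original_filename: str, workdir: str,
                                 name: str | None = None, custom_hash: str | None = None) -> str:
    """Hypothetical sketch: store content at workdir/<hash>/<filename>."""
    # SHA-256 is an assumption; a custom hash overrides it.
    digest = custom_hash or hashlib.sha256(content).hexdigest()
    stem, ext = os.path.splitext(os.path.basename(original_filename))
    filename = (name if name is not None else stem) + ext  # keep original extension
    target_dir = os.path.join(workdir, digest)
    target = os.path.join(target_dir, filename)
    if not os.path.exists(target):  # identical content is written only once
        os.makedirs(target_dir, exist_ok=True)
        with open(target, "wb") as f:
            f.write(content)
    return target
```

Because the directory name depends only on the content, submitting the same file twice reuses the existing copy.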
- prepare_workflow_inputs(storage_paths: list[str], workdir: str, names: List[str] = None, return_paths=False, single_directory=False) → str | list[str]
Prepare workflow input files for submission by storing them in the provided scheduler workdir under unique hashes based on each file’s contents.
Each file is stored in a separate subdirectory named after a hash of its contents. This is useful when the same file can be submitted multiple times in different sets of files but should only be stored once.
When return_paths=False (default), returns the path to a .txt file listing the final file paths (or the file path itself when there is a single input file). When return_paths=True, returns a list of file paths. When single_directory=True, creates a single directory for all files and returns its path; this requires filenames to be unique.
- Parameters:
storage_paths – list of stored files to copy into the destination directory
workdir – destination (working directory of the scheduler)
names – optional list of new names, one per entry in storage_paths, without file extensions (the original extensions are appended)
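For return_paths=False, the docs describe returning a .txt file of final paths, or the path itself for a single input. A minimal sketch of that step, assuming one path per line and a hypothetical manifest name inputs.txt (write_input_manifest is likewise hypothetical):

```python
import os


def write_input_manifest(prepared_paths: list[str], workdir: str,
                         manifest_name: str = "inputs.txt") -> str:
    """Hypothetical sketch of the return_paths=False behaviour."""
    if len(prepared_paths) == 1:
        return prepared_paths[0]  # single input: return the file path directly
    manifest = os.path.join(workdir, manifest_name)
    with open(manifest, "w", encoding="utf-8") as f:
        f.write("\n".join(prepared_paths) + "\n")  # one path per line
    return manifest
```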