Dataset
- class text_renderer.dataset.Dataset(data_dir: str, jpg_quality: int = 95)
Abstract base class for dataset storage and retrieval.
This class provides a common interface for storing generated text images and their corresponding labels. It supports both image file storage and database storage formats.
- Parameters:
data_dir (str) – Directory path for storing dataset files
jpg_quality (int) – JPEG compression quality (1-100, default: 95)
- encode_param() → list
Get JPEG encoding parameters for image compression.
- Returns:
OpenCV JPEG encoding parameters
- Return type:
list
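As a rough illustration of what such a parameter list looks like, the sketch below builds the flag/value pair that OpenCV's JPEG writer accepts. The helper name `jpeg_encode_param` is hypothetical and is not part of text_renderer. The fallback constant is an assumption added only so the snippet runs when OpenCV is not installed.

```python
try:
    import cv2
    JPEG_QUALITY_FLAG = int(cv2.IMWRITE_JPEG_QUALITY)
except ImportError:
    # Assumption: fall back to the numeric value of cv2.IMWRITE_JPEG_QUALITY
    # so the sketch still runs without OpenCV.
    JPEG_QUALITY_FLAG = 1


def jpeg_encode_param(jpg_quality: int = 95) -> list:
    """Hypothetical helper: build an OpenCV-style JPEG parameter list."""
    if not 1 <= jpg_quality <= 100:
        raise ValueError("jpg_quality must be in the range 1-100")
    # OpenCV expects a flat list of alternating flag and value entries.
    return [JPEG_QUALITY_FLAG, jpg_quality]


params = jpeg_encode_param(95)
```

With OpenCV available, a list like this would typically be passed as the third argument of `cv2.imencode(".jpg", image, params)`.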
- read(name: str) → Dict
Read an image and its metadata from the dataset.
- Parameters:
name (str) – Unique identifier for the image
- Returns:
- Dictionary containing:
"image": Image data as numpy array
"label": Text label for the image
"size": [width, height] of the image
- Return type:
Dict
- read_count() → int
Get the total number of samples in the dataset.
- Returns:
Number of samples in the dataset
- Return type:
int
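The interface described above can be sketched as a small abstract base class with an in-memory implementation. This is only an illustration under stated assumptions: `DatasetSketch`, `InMemoryDataset`, and the `add` method are hypothetical names, not the library's own classes, and a nested list stands in for the numpy image array so the snippet needs nothing beyond the standard library.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DatasetSketch(ABC):
    """Hypothetical stand-in mirroring the documented read interface."""

    def __init__(self, data_dir: str, jpg_quality: int = 95):
        self.data_dir = data_dir
        self.jpg_quality = jpg_quality

    @abstractmethod
    def read(self, name: str) -> Dict:
        """Return {"image": ..., "label": ..., "size": [width, height]}."""

    @abstractmethod
    def read_count(self) -> int:
        """Return the number of stored samples."""


class InMemoryDataset(DatasetSketch):
    """Toy backend that keeps samples in a dict (illustration only)."""

    def __init__(self, data_dir: str = "", jpg_quality: int = 95):
        super().__init__(data_dir, jpg_quality)
        self._samples: Dict[str, Dict] = {}

    def add(self, name: str, image: List, label: str, size: List[int]) -> None:
        # Hypothetical write method; the documented API above lists only readers.
        self._samples[name] = {"image": image, "label": label, "size": size}

    def read(self, name: str) -> Dict:
        # Return a shallow copy so callers cannot mutate the stored record.
        return dict(self._samples[name])

    def read_count(self) -> int:
        return len(self._samples)


dataset = InMemoryDataset()
dataset.add("000000000", [[0, 0], [0, 0]], "test", [2, 2])
sample = dataset.read("000000000")
```

Keeping `read` and `read_count` abstract lets file-based and database-based backends share the same calling code.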
- class text_renderer.dataset.LmdbDataset(data_dir: str)
Save generated images into LMDB database format.
This dataset implementation stores images in LMDB (Lightning Memory-Mapped Database) format, which is compatible with PaddleOCR and provides efficient storage and retrieval for large datasets.
- LMDB Keys format:
image-{name}: Image raw bytes (JPEG encoded)
label-{name}: Text label as string
size-{name}: Image dimensions as “width,height” string
num-samples: Total number of samples in the dataset
- Parameters:
data_dir (str) – Directory path for the LMDB database
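To make the key layout concrete, here is a minimal sketch that builds and parses records following the documented key format. It uses a plain dict in place of a real LMDB transaction. The helper names `make_lmdb_records` and `parse_size` are hypothetical, and storing keys and values as UTF-8 bytes is an assumption rather than something this page states.

```python
from typing import Dict, Tuple


def make_lmdb_records(name: str, jpeg_bytes: bytes, label: str,
                      width: int, height: int) -> Dict[bytes, bytes]:
    """Hypothetical helper: build the three per-sample entries."""
    return {
        f"image-{name}".encode("utf-8"): jpeg_bytes,
        f"label-{name}".encode("utf-8"): label.encode("utf-8"),
        f"size-{name}".encode("utf-8"): f"{width},{height}".encode("utf-8"),
    }


def parse_size(value: bytes) -> Tuple[int, int]:
    """Turn a stored "width,height" value back into integers."""
    width, height = value.decode("utf-8").split(",")
    return int(width), int(height)


# A dict stands in for the database; a real store would write these
# key/value pairs inside an LMDB write transaction.
store: Dict[bytes, bytes] = {}
store.update(make_lmdb_records("000000000", b"\xff\xd8placeholder", "test", 128, 32))
store[b"num-samples"] = b"1"

size = parse_size(store[b"size-000000000"])
```

Because each sample's fields live under separate keys, a reader can fetch a label or size without decoding the image bytes.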
- class text_renderer.dataset.ImgDataset(data_dir: str)
Save generated images as JPEG files with labels and metadata in JSON.
This dataset implementation stores images as individual JPEG files in an ‘images’ subdirectory and maintains a JSON file with labels and metadata.
- JSON file format:
    {
      "labels": {
        "000000000": "test",
        "000000001": "text2"
      },
      "sizes": {
        "000000000": [width, height],
        "000000001": [width, height]
      },
      "num-samples": 2
    }
- Parameters:
data_dir (str) – Directory path for storing dataset files
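The JSON layout above can be written and read back with the standard `json` module, as in the following sketch. The file name `labels.json` and the helpers `save_labels` and `load_sample` are assumptions made for illustration; this page does not name the actual file or the library's internal functions.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_labels(path: str, labels: Dict[str, str],
                sizes: Dict[str, List[int]]) -> None:
    """Hypothetical helper: write labels and sizes in the documented layout."""
    data = {"labels": labels, "sizes": sizes, "num-samples": len(labels)}
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII label text readable in the file.
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_sample(path: str, name: str) -> Dict:
    """Hypothetical helper: look up one sample's label and size by name."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {"label": data["labels"][name], "size": data["sizes"][name]}


with tempfile.TemporaryDirectory() as data_dir:
    json_path = os.path.join(data_dir, "labels.json")  # assumed file name
    save_labels(
        json_path,
        labels={"000000000": "test", "000000001": "text2"},
        sizes={"000000000": [128, 32], "000000001": [64, 32]},
    )
    sample = load_sample(json_path, "000000001")
```

Zero-padded string names keep the keys sortable and map directly onto image file names in the 'images' subdirectory.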