Dataset

class text_renderer.dataset.Dataset(data_dir: str, jpg_quality: int = 95)

Abstract base class for dataset storage and retrieval.

This class defines a common interface for storing generated text images and their corresponding labels. Concrete subclasses implement it for image-file storage (ImgDataset) and database storage (LmdbDataset).

Parameters:
  • data_dir (str) – Directory path for storing dataset files

  • jpg_quality (int) – JPEG compression quality (1-100, default: 95)

close()

Close the dataset and release any resources.

encode_param() -> list

Get JPEG encoding parameters for image compression.

Returns:

OpenCV JPEG encoding parameters

Return type:

list
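
As a rough illustration of what such a parameter list looks like: OpenCV's image writers accept a flat list of flag/value pairs, and JPEG quality is selected with cv2.IMWRITE_JPEG_QUALITY. The helper name jpeg_encode_param below is hypothetical and is not the library's implementation. The fallback constant 1 is the numeric value of that OpenCV flag and is there only so the sketch runs without OpenCV installed.

```python
try:
    import cv2
    JPEG_QUALITY_FLAG = int(cv2.IMWRITE_JPEG_QUALITY)
except ImportError:
    # Numeric value of cv2.IMWRITE_JPEG_QUALITY, used only as a fallback
    # so this sketch still runs when OpenCV is not installed.
    JPEG_QUALITY_FLAG = 1


def jpeg_encode_param(jpg_quality: int = 95) -> list:
    """Hypothetical helper: build an OpenCV-style [flag, value] list."""
    return [JPEG_QUALITY_FLAG, jpg_quality]


params = jpeg_encode_param(95)
```

A list of this shape is what cv2.imencode(".jpg", image, params) expects as its third argument.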

read(name: str) -> Dict

Read an image and its metadata from the dataset.

Parameters:

name (str) – Unique identifier for the image

Returns:

Dictionary containing:
  • "image": Image data as a NumPy array

  • "label": Text label for the image

  • "size": [width, height] of the image

Return type:

Dict

read_count() -> int

Get the total number of samples in the dataset.

Returns:

Number of samples in the dataset

Return type:

int

write(name: str, image: ndarray, label: str)

Write an image and its label to the dataset.

Parameters:
  • name (str) – Unique identifier for the image

  • image (np.ndarray) – Image data as numpy array

  • label (str) – Text label corresponding to the image

write_count(count: int)

Write the total count of samples to the dataset.

Parameters:

count (int) – Total number of samples
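
The write/read contract described above can be sketched with a small in-memory stand-in. Both DatasetSketch and InMemoryDataset are hypothetical names introduced for illustration; they mirror the documented method signatures but are not the library's classes. To keep the example dependency-free, the image is a nested list of pixel rows rather than a NumPy array.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DatasetSketch(ABC):
    """Hypothetical mirror of the documented interface (not the library class)."""

    @abstractmethod
    def write(self, name: str, image: List[List[int]], label: str) -> None: ...

    @abstractmethod
    def read(self, name: str) -> Dict: ...

    @abstractmethod
    def write_count(self, count: int) -> None: ...

    @abstractmethod
    def read_count(self) -> int: ...

    def close(self) -> None:
        """Nothing to release in this sketch."""


class InMemoryDataset(DatasetSketch):
    """Keeps samples in a dict; a stand-in for file or database storage."""

    def __init__(self) -> None:
        self._samples: Dict[str, Dict] = {}
        self._count = 0

    def write(self, name, image, label):
        height = len(image)
        width = len(image[0]) if height else 0
        # Same keys as the documented return value of read().
        self._samples[name] = {"image": image, "label": label, "size": [width, height]}

    def read(self, name):
        return self._samples[name]

    def write_count(self, count):
        self._count = count

    def read_count(self):
        return self._count


dataset = InMemoryDataset()
image = [[0, 0, 0], [255, 255, 255]]  # 2 rows x 3 columns
dataset.write("000000000", image, "test")
dataset.write_count(1)
sample = dataset.read("000000000")
total = dataset.read_count()
dataset.close()
```

Note that the sample count is written explicitly with write_count() rather than derived from the number of write() calls, which matches the separate write_count/read_count methods in the interface.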

class text_renderer.dataset.LmdbDataset(data_dir: str)

Save generated images into LMDB database format.

This dataset implementation stores images in LMDB (Lightning Memory-Mapped Database) format, which is compatible with PaddleOCR and provides efficient storage and retrieval for large datasets.

LMDB Keys format:
  • image-{name}: Image raw bytes (JPEG encoded)

  • label-{name}: Text label as string

  • size-{name}: Image dimensions as “width,height” string

  • num-samples: Total number of samples in the dataset
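
The key layout above can be sketched without an LMDB installation by using a plain dict in place of a database transaction. The helper lmdb_entries is hypothetical, and two details are assumptions rather than documented behavior: keys and string values are UTF-8 encoded, and num-samples is stored as a decimal string.

```python
from typing import Dict


def lmdb_entries(name: str, jpeg_bytes: bytes, label: str,
                 width: int, height: int) -> Dict[bytes, bytes]:
    """Hypothetical helper: build the three per-sample key/value pairs."""
    return {
        f"image-{name}".encode("utf-8"): jpeg_bytes,
        f"label-{name}".encode("utf-8"): label.encode("utf-8"),
        f"size-{name}".encode("utf-8"): f"{width},{height}".encode("utf-8"),
    }


# A dict stands in for the LMDB store; the bytes below are a placeholder,
# not a real JPEG image.
store: Dict[bytes, bytes] = {}
store.update(lmdb_entries("000000000", b"\xff\xd8 placeholder \xff\xd9", "test", 128, 32))
store[b"num-samples"] = b"1"  # assumed encoding: decimal string

# Reading a size back means splitting the "width,height" string.
width, height = (int(v) for v in store[b"size-000000000"].decode("utf-8").split(","))
```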

Parameters:

data_dir (str) – Directory path for the LMDB database

class text_renderer.dataset.ImgDataset(data_dir: str)

Save generated images as JPEG files with labels and metadata in JSON.

This dataset implementation stores images as individual JPEG files in an ‘images’ subdirectory and maintains a JSON file with labels and metadata.

JSON file format:
{
    "labels": {
        "000000000": "test",
        "000000001": "text2"
    },
    "sizes": {
        "000000000": [width, height],
        "000000001": [width, height]
    },
    "num-samples": 2
}

Parameters:

data_dir (str) – Directory path for storing dataset files
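
A minimal sketch of consuming a file in the JSON format shown above, using only the standard library. The file name labels.json, the concrete sizes, and the images/{name}.jpg naming are assumptions made for illustration; the documentation only states that images live in an 'images' subdirectory.

```python
import json
import tempfile
from pathlib import Path

# Example metadata in the documented layout; the sizes are made-up values.
metadata = {
    "labels": {"000000000": "test", "000000001": "text2"},
    "sizes": {"000000000": [128, 32], "000000001": [96, 32]},
    "num-samples": 2,
}

with tempfile.TemporaryDirectory() as data_dir:
    json_path = Path(data_dir) / "labels.json"  # hypothetical file name
    json_path.write_text(json.dumps(metadata, ensure_ascii=False), encoding="utf-8")
    loaded = json.loads(json_path.read_text(encoding="utf-8"))

# Look up one sample by its name.
name = "000000001"
label = loaded["labels"][name]
width, height = loaded["sizes"][name]
image_path = Path("images") / f"{name}.jpg"  # assumed file naming
```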