Dataflow handling class

tefla.dataset.dataflow.Dataflow (dataset, num_readers=1, shuffle=True, num_epochs=None, min_queue_examples=1024, capacity=2048)

Args

  • dataset: an instance of the dataset class
  • num_readers: number of readers used to read the dataset
  • shuffle: a bool; whether to shuffle the dataset
  • num_epochs: total number of epochs for training or validation
  • min_queue_examples: minimum number of items that remain in the queue after a dequeue
  • capacity: total queue capacity
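
The interplay between min_queue_examples and capacity can be illustrated with a toy shuffle buffer. This is only a sketch in plain Python, assuming the usual shuffling-queue semantics (an item may leave only if at least the minimum number of items remains afterwards); `ShuffleBuffer` and `shuffled_stream` are hypothetical names and are not part of tefla.

```python
import random


class ShuffleBuffer:
    """Toy stand-in for a shuffling queue (hypothetical, not tefla code)."""

    def __init__(self, capacity=2048, min_after_dequeue=1024, seed=None):
        if min_after_dequeue >= capacity:
            raise ValueError("min_after_dequeue must be smaller than capacity")
        self.capacity = capacity
        self.min_after_dequeue = min_after_dequeue
        self._items = []
        self._rng = random.Random(seed)

    def can_enqueue(self):
        return len(self._items) < self.capacity

    def can_dequeue(self):
        # An item may leave only if min_after_dequeue items remain afterwards.
        return len(self._items) > self.min_after_dequeue

    def enqueue(self, item):
        if not self.can_enqueue():
            raise OverflowError("buffer is full")
        self._items.append(item)

    def dequeue(self):
        if not self.can_dequeue():
            raise LookupError("too few items buffered to keep mixing")
        index = self._rng.randrange(len(self._items))
        # Swap the chosen item to the end so that removing it is O(1).
        self._items[index], self._items[-1] = self._items[-1], self._items[index]
        return self._items.pop()


def shuffled_stream(source, capacity=8, min_after_dequeue=4, seed=0):
    """Yield the items of `source` in a locally shuffled order."""
    buffer = ShuffleBuffer(capacity, min_after_dequeue, seed)
    for item in source:
        if not buffer.can_enqueue():
            # The buffer is full, so hand one random item to the consumer.
            yield buffer.dequeue()
        buffer.enqueue(item)
    # The source is exhausted: relax the minimum and drain what is left.
    buffer.min_after_dequeue = 0
    while buffer.can_dequeue():
        yield buffer.dequeue()
```

A larger minimum gives better mixing at the cost of more memory and a longer warm-up before the first item is available.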

Methods

batch_inputs (batch_size, train, tfrecords_image_size, crop_size, im_size=None, bbox=None, image_preprocessing=None, num_preprocess_threads=16)

Args
  • dataset: instance of the Dataset class specifying the dataset. See dataset.py for details.
  • batch_size: integer
  • train: boolean
  • crop_size: training-time image size; an int or tuple
  • tfrecords_image_size: a list with the original image size used to encode images in the tfrecords, e.g.: [width, height, channel]
  • image_preprocessing: a function to process the image
  • num_preprocess_threads: integer, total number of preprocessing threads
Returns

  • images: 4-D float Tensor of a batch of images
  • labels: 1-D integer Tensor of shape [batch_size]

get (items, image_size, resize_size=None)

Args
  • items: a list of items to get from the dataset, e.g.: ['image', 'label']
  • image_size: a list with the original image size, e.g.: [width, height, channel]
  • resize_size: if the image must be resized, a list with the target width and height, e.g.: [width, height]

get_batch (batch_size, target_probs, image_size, resize_size=None, crop_size=[32, 32, 3], image_preprocessing=None, num_preprocess_threads=32, init_probs=None, enqueue_many=True, queue_capacity=2048, threads_per_queue=4, name='balancing_op', data_balancing=True)

Stochastically creates batches based on per-class probabilities. This method discards examples. Internally, it creates one queue to amortize the cost of disk reads, and one queue to hold the properly-proportioned batch.

Args
  • batch_size: an int, the batch size
  • target_probs: target probability of each class being present in the batch
  • image_size: a list with the original image size, e.g.: [width, height, channel]
  • resize_size: if the image must be resized, a list with the target width and height, e.g.: [width, height]
  • init_probs: initial probabilities of data samples in the first batch
  • enqueue_many: bool, if true, interpret input tensors as having a batch dimension.
  • queue_capacity: Capacity of the large queue that holds input examples.
  • threads_per_queue: Number of threads for the large queue that holds input examples and for the final queue with the proper class proportions.
  • name: an optional scope/name of the op
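
The reason this method discards examples can be seen in a small rejection-sampling sketch. It is written in plain Python and assumes one common scheme: each class is accepted with probability proportional to target_prob / init_prob, scaled so that the most under-represented class is always kept. The helper names `acceptance_probs` and `balanced_batches` are hypothetical, and this is not necessarily how tefla implements the op.

```python
import random


def acceptance_probs(init_probs, target_probs):
    """Per-class probability of keeping an example.

    Assumes every entry of init_probs is greater than zero.
    """
    ratios = [t / i for t, i in zip(target_probs, init_probs)]
    largest = max(ratios)
    # The class with the largest ratio is always accepted.
    return [r / largest for r in ratios]


def balanced_batches(examples, init_probs, target_probs, batch_size, seed=0):
    """Yield lists of (features, label) pairs with roughly the target class mix.

    `examples` is an iterable of (features, label) pairs, where label is an
    integer class index. Rejected examples are dropped, and so is a trailing
    partial batch.
    """
    rng = random.Random(seed)
    accept = acceptance_probs(init_probs, target_probs)
    batch = []
    for features, label in examples:
        if rng.random() < accept[label]:
            batch.append((features, label))
            if len(batch) == batch_size:
                yield batch
                batch = []
```

For example, with init_probs of [0.9, 0.1] and target_probs of [0.5, 0.5], roughly one in nine examples of the majority class is kept, so most of the input is thrown away to reach a balanced batch.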

prefetch (tensor_dict, capacity)

Creates a FIFO queue to asynchronously enqueue tensor_dicts and returns a dequeue op that evaluates to a tensor_dict. This function is useful in prefetching preprocessed tensors so that the data is readily available for consumers.

Args
  • tensor_dict: a dictionary of tensors to prefetch.
  • capacity: the size of the prefetch queue.
Returns

a FIFO prefetcher queue
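
The prefetching idea can be sketched without TensorFlow, using a bounded FIFO queue filled by a background thread. This is an analogy in plain Python under the assumption that "tensor_dict" is any dictionary of preprocessed values; the generator below is hypothetical and does not reproduce the signature or return type of the method documented above.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def prefetch(tensor_dicts, capacity):
    """Yield dicts from `tensor_dicts`, produced ahead of time by a worker thread."""
    buffer = queue.Queue(maxsize=capacity)

    def producer():
        try:
            for tensor_dict in tensor_dicts:
                # put() blocks while the queue already holds `capacity` items.
                buffer.put(tensor_dict)
        finally:
            # Always signal the end so the consumer cannot wait forever.
            buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # FIFO: items come out in production order
        if item is _DONE:
            return
        yield item
```

The consumer only waits when the queue is empty, so up to `capacity` preprocessed items can be prepared while a training step is running.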