Spatial Transformer Layer

tefla.core.special_layers.spatialtransformer (U, theta, batch_size=64, downsample_factor=1.0, num_transform=1, name='SpatialTransformer', **kwargs)

Implements a spatial transformer layer as described in [1]. It is based on the Lasagne implementation in [2], modified by Mrinal Haloi.

Args

  • U: float. The output of a convolutional network, with shape [batch_size, height, width, num_channels].
  • theta: float. The output of the localisation network, with shape [batch_size, num_transform, 6], or [batch_size, 6] if num_transform=1. To start from the identity transform, initialise theta as follows:

        identity = np.array([[1., 0., 0.],
                             [0., 1., 0.]])
        identity = identity.flatten()
        theta = tf.Variable(initial_value=identity)

  • downsample_factor: float. Determines the output shape: the input's spatial dimensions are downsampled by downsample_factor.

Returns

The spatially transformed output of the network.
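To illustrate what the six values of theta encode, the following plain-Python sketch applies a flattened 2x3 affine matrix to a grid of normalized coordinates. The helper name `affine_grid` and the [-1, 1] coordinate convention are assumptions made for this example; they are not taken from tefla's source.

```python
def affine_grid(theta, height, width):
    """Map each output pixel to a source coordinate with a flattened 2x3 affine matrix.

    Coordinates are normalized to [-1, 1] (an assumed convention for this sketch).
    """
    a, b, tx, c, d, ty = theta
    grid = []
    for i in range(height):
        # Normalized y coordinate of row i.
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            # Normalized x coordinate of column j.
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            row.append((a * x + b * y + tx, c * x + d * y + ty))
        grid.append(row)
    return grid


# The flattened identity matrix leaves every coordinate where it is.
identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
grid = affine_grid(identity, 3, 3)
print(grid[0][0], grid[2][2])

# A non-zero third entry shifts the sampling grid along x.
shifted = affine_grid([1.0, 0.0, 0.5, 0.0, 1.0, 0.0], 3, 3)
print(shifted[1][1])
```

With the identity values the sampling grid coincides with the input grid, which is why initialising theta this way makes the layer start as a pass-through.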


Subsamples the input along the spatial dimensions

tefla.core.special_layers.subsample (inputs, factor, name=None)

Args

  • inputs: A Tensor of size [batch, height_in, width_in, channels].
  • factor: The subsampling factor.
  • name: Optional variable_scope.

Returns

output: A Tensor of size [batch, height_out, width_out, channels] with the input, either intact (if factor == 1) or subsampled (if factor > 1).
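The effect of the subsampling factor can be sketched on a plain 2-D list. This assumes that subsampling keeps every factor-th row and column starting at index 0, which is how the equivalent helper in the TF-Slim ResNet utilities behaves; `subsample_2d` is a hypothetical name used only here.

```python
def subsample_2d(grid, factor):
    """Keep every `factor`-th row and column of a 2-D list, starting at index 0."""
    if factor == 1:
        return grid  # input is returned intact
    return [row[::factor] for row in grid[::factor]]


grid = [[r * 5 + c for c in range(5)] for r in range(5)]
out = subsample_2d(grid, 2)
print(out)  # rows 0, 2, 4 and columns 0, 2, 4 of the 5x5 input
```

Under this assumption a 5x5 input with factor 2 yields a 3x3 output.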


Strided 2-D convolution with 'SAME' padding

tefla.core.special_layers.conv2d_same (inputs, num_outputs, kernel_size, stride, rate=1, name=None, **kwargs)

When stride > 1, explicit zero-padding is applied first, followed by conv2d with 'VALID' padding.

Note that

net = conv2d_same(inputs, num_outputs, 3, stride=stride)

is equivalent to

net = conv2d(inputs, num_outputs, 3, stride=1, padding='SAME')
net = subsample(net, factor=stride)

whereas

net = conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME')

is different when the input's height or width is even, which is why we add the current function. For more details, see ResnetUtilsTest.testConv2DSameEven().
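The difference for even input sizes can be checked with a little arithmetic. The sketch below computes, for a 1-D input, which input positions the centres of the convolution windows land on. It assumes an odd kernel size, symmetric explicit padding of (kernel_size - 1) in total, and TensorFlow's usual 'SAME' rule (output size ceil(n / stride), with any odd padding placed at the end). All function names are hypothetical.

```python
import math


def explicit_same_padding(kernel_size, rate=1):
    """Padding applied before a 'VALID' convolution, independent of the input size."""
    k_eff = kernel_size + (kernel_size - 1) * (rate - 1)  # effective atrous kernel size
    pad_total = k_eff - 1
    pad_beg = pad_total // 2
    return pad_beg, pad_total - pad_beg


def centers_explicit(size, kernel_size, stride):
    """Window centres for explicit padding followed by a strided 'VALID' convolution."""
    pad_beg, pad_end = explicit_same_padding(kernel_size)
    out = (size + pad_beg + pad_end - kernel_size) // stride + 1
    return [i * stride - pad_beg + kernel_size // 2 for i in range(out)]


def centers_tf_same(size, kernel_size, stride):
    """Window centres for a strided convolution with 'SAME' padding."""
    out = math.ceil(size / stride)
    pad_total = max((out - 1) * stride + kernel_size - size, 0)
    pad_beg = pad_total // 2  # the extra pixel, if any, goes at the end
    return [i * stride - pad_beg + kernel_size // 2 for i in range(out)]


print(centers_explicit(4, 3, 2), centers_tf_same(4, 3, 2))  # even size: [0, 2] vs [1, 3]
print(centers_explicit(5, 3, 2), centers_tf_same(5, 3, 2))  # odd size: both [0, 2, 4]
```

For an even size the strided 'SAME' convolution samples positions 1 and 3, whereas explicit padding samples positions 0 and 2, the same positions that a stride-1 convolution followed by subsampling would keep.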

Args

  • inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
  • num_outputs: An integer, the number of output filters.
  • kernel_size: An int with the kernel_size of the filters.
  • stride: An integer, the output stride.
  • rate: An integer, rate for atrous convolution.
  • name: Optional variable_scope.

Returns

output: A 4-D tensor of size [batch, height_out, width_out, channels] with the convolution output.


Bottleneck residual unit variant with BN before convolutions

tefla.core.special_layers.bottleneck_v1 (inputs, depth, depth_bottleneck, stride, rate=1, name=None, **kwargs)

This is the full preactivation residual unit variant proposed in [2]. See Fig. 1(b) of [2] for its definition. Note that we use here the bottleneck variant which has an extra bottleneck layer.

When putting together two consecutive ResNet blocks that use this unit, one should use stride = 2 in the last unit of the first block.

Args

  • inputs: A tensor of size [batch, height, width, channels].
  • depth: The depth of the ResNet unit output.
  • depth_bottleneck: The depth of the bottleneck layers.
  • stride: The ResNet unit's stride. Determines the amount of downsampling of the unit's output compared to its input.
  • rate: An integer, rate for atrous convolution.
  • outputs_collections: Collection to add the ResNet unit output.
  • name: Optional variable_scope.

Returns

The ResNet unit's output.
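The stride convention noted above (stride 2 only in the last unit of a block) can be sketched by listing the arguments each unit would receive. `block_units` is a hypothetical helper and the depths are example values; nothing here calls tefla itself.

```python
def block_units(depth, depth_bottleneck, num_units, stride):
    """List per-unit arguments for one block; only the last unit applies `stride`."""
    units = [
        {'depth': depth, 'depth_bottleneck': depth_bottleneck, 'stride': 1}
        for _ in range(num_units - 1)
    ]
    units.append({'depth': depth, 'depth_bottleneck': depth_bottleneck, 'stride': stride})
    return units


# Two consecutive blocks: the first one downsamples in its final unit.
block1 = block_units(depth=256, depth_bottleneck=64, num_units=3, stride=2)
block2 = block_units(depth=512, depth_bottleneck=128, num_units=4, stride=2)
strides = [u['stride'] for u in block1]
print(strides)  # [1, 1, 2]
```

Each dictionary could then be unpacked into a call such as bottleneck_v1(net, rate=1, **unit), assuming the keyword names match the signature shown above.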


Bottleneck residual unit variant with BN before convolutions

tefla.core.special_layers.bottleneck_v2 (inputs, depth, depth_bottleneck, stride, rate=1, name=None, **kwargs)

This is the full preactivation residual unit variant proposed in [2]. See Fig. 1(b) of [2] for its definition. Note that we use here the bottleneck variant which has an extra bottleneck layer.

When putting together two consecutive ResNet blocks that use this unit, one should use stride = 2 in the last unit of the first block.

Args

  • inputs: A tensor of size [batch, height, width, channels].
  • depth: The depth of the ResNet unit output.
  • depth_bottleneck: The depth of the bottleneck layers.
  • stride: The ResNet unit's stride. Determines the amount of downsampling of the unit's output compared to its input.
  • rate: An integer, rate for atrous convolution.
  • outputs_collections: Collection to add the ResNet unit output.
  • name: Optional variable_scope.

Returns

The ResNet unit's output.


DenseCRF over unnormalised predictions

tefla.core.special_layers.dense_crf (probs, img=None, n_classes=21, n_iters=10, sxy_gaussian=(1, 1), compat_gaussian=4, kernel_gaussian=, normalisation_gaussian=, sxy_bilateral=(49, 49), compat_bilateral=2, srgb_bilateral=(13, 13, 13), kernel_bilateral=, normalisation_bilateral=)

More details on the arguments at https://github.com/lucasb-eyer/pydensecrf.

Args

  • probs: class probabilities per pixel.
  • img: if given, the pairwise bilateral potential on raw RGB values will be computed.
  • n_iters: number of iterations of MAP inference.
  • sxy_gaussian: standard deviations for the location component of the colour-independent term.
  • compat_gaussian: label compatibilities for the colour-independent term (can be a number, a 1D array, or a 2D array).
  • kernel_gaussian: kernel precision matrix for the colour-independent term (can take values CONST_KERNEL, DIAG_KERNEL, or FULL_KERNEL).
  • normalisation_gaussian: normalisation for the colour-independent term (possible values are NO_NORMALIZATION, NORMALIZE_BEFORE, NORMALIZE_AFTER, NORMALIZE_SYMMETRIC).
  • sxy_bilateral: standard deviations for the location component of the colour-dependent term.
  • compat_bilateral: label compatibilities for the colour-dependent term (can be a number, a 1D array, or a 2D array).
  • srgb_bilateral: standard deviations for the colour component of the colour-dependent term.
  • kernel_bilateral: kernel precision matrix for the colour-dependent term (can take values CONST_KERNEL, DIAG_KERNEL, or FULL_KERNEL).
  • normalisation_bilateral: normalisation for the colour-dependent term (possible values are NO_NORMALIZATION, NORMALIZE_BEFORE, NORMALIZE_AFTER, NORMALIZE_SYMMETRIC).

Returns

Refined predictions after MAP inference.
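A dense CRF combines per-pixel unary energies with the pairwise terms configured above. As a minimal sketch of the unary part only, the code below assumes the unary energy is the negative log of the class probability, as in pydensecrf's own examples; whether dense_crf computes it exactly this way is an assumption. The helper names are hypothetical and no pairwise refinement is performed.

```python
import math


def unary_from_probs(probs, eps=1e-12):
    """Convert per-pixel class probabilities into negative log-likelihood energies."""
    # eps guards against log(0) for classes with zero probability.
    return [[-math.log(max(p, eps)) for p in pixel] for pixel in probs]


def lowest_energy_labels(energies):
    """Pick the label with the lowest unary energy for each pixel."""
    return [min(range(len(pixel)), key=pixel.__getitem__) for pixel in energies]


# Two pixels, three classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
energies = unary_from_probs(probs)
labels = lowest_energy_labels(energies)
print(labels)  # [0, 2]
```

Without the pairwise terms this reduces to a per-pixel argmax; the Gaussian and bilateral terms are what allow MAP inference to smooth and sharpen these labels.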


ResNeXt Block

tefla.core.special_layers.resnext_block (inputs, nb_blocks, out_channels, is_training, reuse, cardinality, downsample=False, downsample_strides=2, activation=, batch_norm=None, batch_norm_args=None, name='ResNeXtBlock', **kwargs)

See the ResNeXt paper: https://arxiv.org/pdf/1611.05431.pdf

Args

  • inputs: Tensor. Inputs 4-D Layer.
  • nb_blocks: int. Number of layer blocks.
  • out_channels: int. The number of convolutional filters of the layers surrounding the bottleneck layer.
  • cardinality: int. Number of aggregated residual transformations.
  • downsample: bool. If True, apply downsampling using 'downsample_strides' for strides.
  • downsample_strides: int. The strides to use when downsampling.
  • activation: function (returning a Tensor).
  • batch_norm: bool. If True, apply batch normalization.
  • use_bias: bool. If True, a bias is used.
  • w_init: function, Weights initialization.
  • b_init: tf.Tensor. Bias initialization.
  • w_regularizer: function. Adds a regularizer to this layer's weights.
  • weight_decay: float. Regularizer decay parameter. Default: 0.001.
  • trainable: bool. If True, weights will be trainable.
  • reuse: bool. If True and 'scope' is provided, this layer's variables will be reused (shared). Overrides name.
  • name: A name for this layer (optional). Default: 'ResNeXtBlock'.

Returns

4-D Tensor [batch, new height, new width, out_channels].
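To give a feel for what cardinality changes, the sketch below counts the weights of a convolution whose channels are split into independent groups, one common way to realise aggregated residual transformations. Whether resnext_block is implemented as a grouped convolution is an assumption; `conv_params` is a hypothetical helper and biases are ignored.

```python
def conv_params(kernel_size, in_channels, out_channels, groups=1):
    """Weight count of a square convolution whose channels are split into `groups`."""
    if in_channels % groups or out_channels % groups:
        raise ValueError('channel counts must be divisible by groups')
    per_group = kernel_size * kernel_size * (in_channels // groups) * (out_channels // groups)
    return per_group * groups


dense = conv_params(3, 128, 128)               # one wide 3x3 convolution
grouped = conv_params(3, 128, 128, groups=32)  # 32 parallel paths of 4 channels each
print(dense, grouped)  # 147456 4608
```

With the same channel width, a cardinality of 32 reduces the 3x3 weights by a factor of 32, which is what allows wider layers at a comparable parameter budget.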


Embedding

tefla.core.special_layers.embedding (inputs, vocab_dim, embedding_dim, reuse, validate_indices=False, w_init=, trainable=True, normalize=False, vocab_freqs=None, name='Embedding')

Embedding layer for a sequence of integer ids or floats.

Args

  • inputs: a 2-D Tensor [samples, ids].
  • vocab_dim: list of int. Vocabulary size (number of ids).
  • embedding_dim: list of int. Embedding size.
  • validate_indices: bool. Whether or not to validate gather indices.
  • w_init: Weights initialization.
  • trainable: bool. If True, weights will be trainable.
  • reuse: bool. If True and 'scope' is provided, this layer's variables will be reused (shared).
  • name: A name for this layer (optional). Default: 'Embedding'.

Returns

3-D Tensor [samples, embedded_ids, features].
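The shape change from [samples, ids] to [samples, embedded_ids, features] follows from a simple table lookup, sketched here with nested lists. The table values and the helper name `embed` are made up for illustration.

```python
def embed(inputs, table):
    """Replace every integer id with its row from the embedding table."""
    return [[table[i] for i in sample] for sample in inputs]


# vocab_dim = 4 rows, embedding_dim = 2 columns (example values).
table = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
inputs = [[1, 3, 0], [2, 2, 1]]  # 2 samples, 3 ids each
out = embed(inputs, table)
shape = (len(out), len(out[0]), len(out[0][0]))
print(shape)  # (2, 3, 2)
```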


Gated unit for language modelling

tefla.core.special_layers.gated_layer (inputs, layer, num_units, is_training, reuse, name='gated_layer', **kwargs)

Args

  • inputs: a 3-D/4-D Tensor, input [samples, timesteps, input_dim]
  • layer: a layer function to pass the inputs through, e.g. one from tefla.core.layers
  • num_units: an int, number of units for each layer
  • is_training: a boolean, True when training
  • reuse: bool. If True and 'scope' is provided, this layer's variables will be reused (shared).
  • name: A name for this layer (optional). Default: 'gated_layer'.

Returns

a 3-D/4-D Tensor, output of the gated unit
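Gated units for language modelling are commonly written as an element-wise product of a linear path and a sigmoid gate. The sketch below shows that product on plain lists. It assumes both paths have already been computed (for example by two applications of the layer argument), which is an assumption about how gated_layer works rather than a description of its source.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_unit(h, g):
    """Element-wise gate: each value of h is scaled by the sigmoid of the matching g."""
    return [hi * sigmoid(gi) for hi, gi in zip(h, g)]


# A gate input of 0 gives sigmoid(0) = 0.5, so every value is halved.
out = gated_unit([2.0, -1.0, 4.0], [0.0, 0.0, 0.0])
print(out)  # [1.0, -0.5, 2.0]
```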


Returns glimpses at the locations

tefla.core.special_layers.glimpseSensor (img, normLoc, minRadius=4, depth=1, sensorBandwidth=12)

Args

  • img: a 4-D Tensor, [batch_size, width, height, channels]
  • normLoc: a float, normalized location in [0, 1]
  • minRadius: an int, minimum radius for zooming
  • depth: an int, number of zooms
  • sensorBandwidth: an int, output glimpse size (width and height)

Returns

a 5-D tensor of glimpses
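How minRadius and depth interact can be sketched by listing the side length of the patch cropped at each zoom level. This assumes the radius doubles with every zoom level and that each square crop is afterwards resized to sensorBandwidth x sensorBandwidth, as in common recurrent-attention implementations; the helper name is hypothetical.

```python
def glimpse_crop_sizes(min_radius=4, depth=1):
    """Side length of the square crop at each zoom level, assuming the radius doubles."""
    return [2 * min_radius * (2 ** level) for level in range(depth)]


sizes = glimpse_crop_sizes(min_radius=4, depth=3)
print(sizes)  # [8, 16, 32]
```

Under this assumption, deeper zoom levels cover more context at lower resolution, since every crop ends up at the same output size.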


Adds a PVA block layer

tefla.core.special_layers.pva_block_v1 (x, num_units, name='pva_block_v1', **kwargs)

Convolution followed by CReLU and scaling.

Args

  • x: A 4-D Tensor with a static value for the last dimension, i.e. [batch_size, in_height, in_width, depth].
  • is_training: Bool, training or testing
  • num_units: Integer or long, the number of output units in the layer.
  • reuse: whether or not the layer and its variables should be reused. To reuse the layer, a scope must be given.
  • filter_size: an int or list/tuple of 2 positive integers specifying the spatial dimensions of the filters.
  • stride: an int or tuple/list of 2 positive integers specifying the stride at which to compute output.
  • padding: one of "VALID" or "SAME".
  • activation: activation function, set to None to skip it and maintain a linear activation.
  • batch_norm: normalization function to use. If batch_norm is True, Google's original implementation is used; if another function is provided, that function is applied. Defaults to None, meaning no normalization.
  • batch_norm_args: normalization function parameters.
  • w_init: An initializer for the weights.
  • w_regularizer: Optional regularizer for the weights.
  • untie_biases: whether to use separate biases for each spatial position.
  • b_init: An initializer for the biases. If None skip biases.
  • trainable: If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
  • name: Optional name or scope for variable_scope/name_scope.
  • use_bias: Whether to add bias or not

Returns

The 4-D Tensor variable representing the result of the series of operations. e.g.: 4-D Tensor [batch, new_height, new_width, n_output].
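CReLU (concatenated ReLU) keeps both the positive and the negative part of each activation, which doubles the channel count. The sketch below applies it to the channel vector of a single pixel; it illustrates the activation only, not the scaling step or tefla's implementation.

```python
def crelu(channels):
    """Concatenate relu(x) and relu(-x) along the channel axis."""
    pos = [max(v, 0.0) for v in channels]
    neg = [max(-v, 0.0) for v in channels]
    return pos + neg


out = crelu([1.5, -2.0, 0.5])
print(out)  # [1.5, 0.0, 0.5, 0.0, 2.0, 0.0]
```

Three input channels become six output channels, so a convolution feeding CReLU needs only half as many filters for the same output depth.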


Adds a PVA block v2 layer

tefla.core.special_layers.pva_block_v2 (x, num_units, name='pva_block_v2', **kwargs)

Batch normalization is applied first, followed by CReLU and scaling; the convolution is applied after scaling.

Args

  • x: A 4-D Tensor with a static value for the last dimension, i.e. [batch_size, in_height, in_width, depth].
  • is_training: Bool, training or testing
  • num_units: Integer or long, the number of output units in the layer.
  • reuse: whether or not the layer and its variables should be reused. To reuse the layer, a scope must be given.
  • filter_size: an int or list/tuple of 2 positive integers specifying the spatial dimensions of the filters.
  • stride: an int or tuple/list of 2 positive integers specifying the stride at which to compute output.
  • padding: one of "VALID" or "SAME".
  • activation: activation function, set to None to skip it and maintain a linear activation.
  • batch_norm: normalization function to use. If batch_norm is True, Google's original implementation is used; if another function is provided, that function is applied. Defaults to None, meaning no normalization.
  • batch_norm_args: normalization function parameters.
  • w_init: An initializer for the weights.
  • w_regularizer: Optional regularizer for the weights.
  • untie_biases: whether to use separate biases for each spatial position.
  • b_init: An initializer for the biases. If None skip biases.
  • trainable: If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
  • name: Optional name or scope for variable_scope/name_scope.
  • use_bias: Whether to add bias or not

Returns

The 4-D Tensor variable representing the result of the series of operations. e.g.: 4-D Tensor [batch, new_height, new_width, n_output].


Performs a pooling operation that results in a fixed size: output_size x output_size

tefla.core.special_layers.max_pool_2d_nxn_regions (inputs, output_size, mode='max')

Used by spatial_pyramid_pool. Refer to appendix A in [1].

Args

  • inputs: A 4D Tensor (B, H, W, C)
  • output_size: The output size of the pooling operation.
  • mode: The pooling mode {max, avg}

Returns

A list of tensors, one for each output bin. The list contains output_size * output_size elements, where each element is a Tensor (N, C).
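The fixed-size output comes from dividing the input into output_size x output_size bins and reducing each bin to one value. The sketch below does this for a single-channel 2-D list, assuming bin edges at floor(row / n * h) and ceil((row + 1) / n * h), so neighbouring bins may overlap when the size is not divisible by n. The helper name is hypothetical.

```python
import math


def pool_nxn_regions(grid, output_size, mode='max'):
    """Reduce a 2-D list to output_size * output_size values, one per bin."""
    h, w = len(grid), len(grid[0])
    n = output_size
    reduce_fn = max if mode == 'max' else (lambda vals: sum(vals) / len(vals))
    results = []
    for row in range(n):
        top, bottom = math.floor(row * h / n), math.ceil((row + 1) * h / n)
        for col in range(n):
            left, right = math.floor(col * w / n), math.ceil((col + 1) * w / n)
            region = [grid[r][c] for r in range(top, bottom) for c in range(left, right)]
            results.append(reduce_fn(region))
    return results


grid = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15]]
bins = pool_nxn_regions(grid, 2)
print(bins)  # [8, 10, 13, 15]
```

The 3x5 input is not divisible by 2 in either direction, yet it still produces exactly four values, which is the property spatial_pyramid_pool relies on.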


Performs spatial pyramid pooling (SPP) over the input

tefla.core.special_layers.spatial_pyramid_pool (inputs, dimensions=[2, 1], mode='max', implementation='kaiming')

It turns a 2D input of arbitrary size into an output of fixed dimension. Hence, the convolutional part of a DNN can be connected to a dense part with a fixed number of nodes even if the dimensions of the input image are unknown.

The pooling is performed over l pooling levels. Each pooling level i creates M_i output features, where M_i = n_i * n_i and n_i is the number of pooling operations per dimension at level i.

The length of the parameter dimensions is the level of the spatial pyramid.

Args

  • inputs: A 4D Tensor (B, H, W, C).
  • dimensions: The list of n_i values that define the output dimension of each pooling level i. The length of dimensions is the level of the spatial pyramid.
  • mode: Pooling mode 'max' or 'avg'.
  • implementation: The implementation to use, either 'kaiming' or 'fast'. 'kaiming' is the original implementation from the paper and supports variable sizes of input vectors, which 'fast' does not support.

Returns

A fixed length vector representing the inputs.
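The length of the fixed vector follows directly from the level sizes: each level i contributes n_i * n_i bins per channel. A small sketch of that arithmetic, with a hypothetical helper name and an example channel count of 256:

```python
def spp_output_length(dimensions, channels):
    """Number of features produced by pooling each level into n x n bins per channel."""
    return channels * sum(n * n for n in dimensions)


length = spp_output_length([2, 1], 256)
print(length)  # 1280
```

With the default dimensions=[2, 1], a feature map with 256 channels always yields 256 * (4 + 1) = 1280 values, regardless of its height and width.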