Spatial Transformer Layer
tefla.core.special_layers.spatialtransformer (U, theta, batch_size=64, downsample_factor=1.0, num_transform=1, name='SpatialTransformer', **kwargs)
Implements a spatial transformer layer as described in [1]. It is based on the Lasagne implementation in [2], modified by Mrinal Haloi.
Args
- U: float. The output of a convolutional net; should have the shape [batch_size, height, width, num_channels].
- theta: float. The output of the localisation network; should be [batch_size, num_transform, 6], or [batch_size, 6] if num_transform=1.
To initialize `theta` to the identity transform:
```python
identity = np.array([[1., 0., 0.],
                     [0., 1., 0.]])
identity = identity.flatten()
theta = tf.Variable(initial_value=identity)
```
- downsample_factor: a float; determines the output shape by downsampling the input shape by downsample_factor.
Returns
The spatially transformed output of the network.
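As a quick orientation, here is a minimal usage sketch in a TF1-style graph; the placeholder shape and the tiling of the identity `theta` across the batch are illustrative assumptions, not prescribed by the API:
```python
import numpy as np
import tensorflow as tf
from tefla.core.special_layers import spatialtransformer

batch_size = 64
# hypothetical feature map from a convolutional net
U = tf.placeholder(tf.float32, [batch_size, 28, 28, 3])

# start from the identity transform and tile it to [batch_size, 6]
identity = np.array([[1., 0., 0.],
                     [0., 1., 0.]], dtype=np.float32).flatten()
theta = tf.tile(identity[np.newaxis, :], [batch_size, 1])

out = spatialtransformer(U, theta, batch_size=batch_size)  # same transform per image
```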
Subsamples the input along the spatial dimensions
tefla.core.special_layers.subsample (inputs, factor, name=None)
Args
- inputs: A `Tensor` of size [batch, height_in, width_in, channels].
- factor: The subsampling factor.
- name: Optional variable_scope.
Returns
output: A `Tensor` of size [batch, height_out, width_out, channels] with the input, either intact (if factor == 1) or subsampled (if factor > 1).
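Subsampling like this is commonly implemented as a 1x1 max pool with the given stride, as in TF-Slim's `resnet_utils.subsample`; treat the following as a sketch of that pattern rather than tefla's exact code:
```python
import tensorflow as tf

def subsample_sketch(inputs, factor, name=None):
    # factor == 1: return the input untouched; otherwise keep every
    # `factor`-th pixel via a 1x1 max pool with stride `factor`.
    if factor == 1:
        return inputs
    return tf.nn.max_pool2d(inputs, ksize=1, strides=factor, padding='SAME', name=name)
```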
Strided 2-D convolution with 'SAME' padding
tefla.core.special_layers.conv2d_same (inputs, num_outputs, kernel_size, stride, rate=1, name=None, **kwargs)
When stride > 1, we do explicit zero-padding, followed by conv2d with 'VALID' padding.
Note that
```python
net = conv2d_same(inputs, num_outputs, 3, stride=stride)
```
is equivalent to
```python
net = conv2d(inputs, num_outputs, 3, stride=1, padding='SAME')
net = subsample(net, factor=stride)
```
whereas
```python
net = conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME')
```
is different when the input's height or width is even, which is why we add the current function. For more details, see ResnetUtilsTest.testConv2DSameEven().
Args
- inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
- num_outputs: An integer, the number of output filters.
- kernel_size: An integer, the kernel size of the filters.
- stride: An integer, the output stride.
- rate: An integer, rate for atrous convolution.
- name: Optional name.
Returns
output: A 4-D tensor of size [batch, height_out, width_out, channels] with the convolution output.
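The note above translates into a short sketch: pad explicitly so the output grid does not depend on input parity, then convolve with 'VALID' padding. This mirrors TF-Slim's `resnet_utils.conv2d_same`; the Keras `Conv2D` here stands in for tefla's own convolution op:
```python
import tensorflow as tf

def conv2d_same_sketch(inputs, num_outputs, kernel_size, stride, rate=1):
    if stride == 1:
        return tf.keras.layers.Conv2D(num_outputs, kernel_size, strides=1,
                                      dilation_rate=rate, padding='same')(inputs)
    # stride > 1 (this sketch assumes rate == 1 here): pad explicitly so the
    # output grid does not depend on whether height/width are even.
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
    return tf.keras.layers.Conv2D(num_outputs, kernel_size, strides=stride,
                                  padding='valid')(inputs)
```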
Bottleneck residual unit variant with BN before convolutions
tefla.core.special_layers.bottleneck_v1 (inputs, depth, depth_bottleneck, stride, rate=1, name=None, **kwargs)
This is the full preactivation residual unit variant proposed in [2]. See Fig. 1(b) of [2] for its definition. Note that we use here the bottleneck variant which has an extra bottleneck layer.
When putting together two consecutive ResNet blocks that use this unit, one should use stride = 2 in the last unit of the first block.
Args
- inputs: A tensor of size [batch, height, width, channels].
- depth: The depth of the ResNet unit output.
- depth_bottleneck: The depth of the bottleneck layers.
- stride: The ResNet unit's stride. Determines the amount of downsampling of the unit's output compared to its input.
- rate: An integer, rate for atrous convolution.
- outputs_collections: Collection to add the ResNet unit output.
- name: Optional variable_scope.
Returns
The ResNet unit's output.
Bottleneck residual unit variant with BN before convolutions
tefla.core.special_layers.bottleneck_v2 (inputs, depth, depth_bottleneck, stride, rate=1, name=None, **kwargs)
This is the full preactivation residual unit variant proposed in [2]. See Fig. 1(b) of [2] for its definition. Note that we use here the bottleneck variant which has an extra bottleneck layer.
When putting together two consecutive ResNet blocks that use this unit, one should use stride = 2 in the last unit of the first block.
Args
- inputs: A tensor of size [batch, height, width, channels].
- depth: The depth of the ResNet unit output.
- depth_bottleneck: The depth of the bottleneck layers.
- stride: The ResNet unit's stride. Determines the amount of downsampling of the unit's output compared to its input.
- rate: An integer, rate for atrous convolution.
- outputs_collections: Collection to add the ResNet unit output.
- name: Optional variable_scope.
Returns
The ResNet unit's output.
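Both entries describe the full preactivation layout (BN and ReLU before each convolution). A minimal sketch of that layout, using Keras ops as stand-ins for tefla's layers; the shortcut handling follows the usual ResNet-v2 convention and is an assumption here:
```python
import tensorflow as tf

def bottleneck_preact_sketch(inputs, depth, depth_bottleneck, stride, training=True):
    def bn_relu(x):
        # preactivation: batch norm then ReLU, before the convolution
        return tf.nn.relu(tf.keras.layers.BatchNormalization()(x, training=training))

    preact = bn_relu(inputs)
    if inputs.shape[-1] == depth:
        # identity shortcut, subsampled when stride > 1
        shortcut = inputs if stride == 1 else tf.nn.max_pool2d(inputs, 1, stride, 'SAME')
    else:
        shortcut = tf.keras.layers.Conv2D(depth, 1, strides=stride)(preact)

    residual = tf.keras.layers.Conv2D(depth_bottleneck, 1, strides=1)(preact)
    residual = tf.keras.layers.Conv2D(depth_bottleneck, 3, strides=stride,
                                      padding='same')(bn_relu(residual))
    residual = tf.keras.layers.Conv2D(depth, 1, strides=1)(bn_relu(residual))
    return shortcut + residual  # no activation after the addition
```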
DenseCRF over unnormalised predictions
tefla.core.special_layers.dense_crf (probs, img=None, n_classes=21, n_iters=10, sxy_gaussian=(1, 1), compat_gaussian=4, kernel_gaussian=…)
Args
- probs: class probabilities per pixel.
- img: if given, the pairwise bilateral potential on raw RGB values will be computed.
- n_iters: number of iterations of MAP inference.
- sxy_gaussian: standard deviations for the location component of the colour-independent term.
- compat_gaussian: label compatibilities for the colour-independent term (can be a number, a 1D array, or a 2D array).
- kernel_gaussian: kernel precision matrix for the colour-independent term (can take values CONST_KERNEL, DIAG_KERNEL, or FULL_KERNEL).
- normalisation_gaussian: normalisation for the colour-independent term (possible values are NO_NORMALIZATION, NORMALIZE_BEFORE, NORMALIZE_AFTER, NORMALIZE_SYMMETRIC).
- sxy_bilateral: standard deviations for the location component of the colour-dependent term.
- compat_bilateral: label compatibilities for the colour-dependent term (can be a number, a 1D array, or a 2D array).
- srgb_bilateral: standard deviations for the colour component of the colour-dependent term.
- kernel_bilateral: kernel precision matrix for the colour-dependent term (can take values CONST_KERNEL, DIAG_KERNEL, or FULL_KERNEL).
- normalisation_bilateral: normalisation for the colour-dependent term (possible values are NO_NORMALIZATION, NORMALIZE_BEFORE, NORMALIZE_AFTER, NORMALIZE_SYMMETRIC).
Returns
Refined predictions after MAP inference.
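The argument names above match pydensecrf's vocabulary, so presumably the layer wraps that package. A hedged sketch under that assumption; the pairwise parameters shown are illustrative, not tefla's defaults:
```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def dense_crf_sketch(probs, img=None, n_classes=21, n_iters=10):
    # probs: (H, W, n_classes) per-pixel class probabilities
    # img:   optional (H, W, 3) uint8 RGB image for the bilateral term
    h, w = probs.shape[:2]
    d = dcrf.DenseCRF2D(w, h, n_classes)
    unary = unary_from_softmax(probs.transpose(2, 0, 1))  # (n_classes, H*W) neg. log-probs
    d.setUnaryEnergy(np.ascontiguousarray(unary))
    # colour-independent term (location only)
    d.addPairwiseGaussian(sxy=1, compat=4, kernel=dcrf.DIAG_KERNEL,
                          normalization=dcrf.NORMALIZE_SYMMETRIC)
    if img is not None:
        # colour-dependent term on raw RGB values
        d.addPairwiseBilateral(sxy=49, srgb=13, rgbim=img, compat=5,
                               kernel=dcrf.DIAG_KERNEL,
                               normalization=dcrf.NORMALIZE_SYMMETRIC)
    q = d.inference(n_iters)  # MAP inference
    return np.argmax(np.array(q).reshape(n_classes, h, w), axis=0)
```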
ResNeXt Block
tefla.core.special_layers.resnext_block (inputs, nb_blocks, out_channels, is_training, reuse, cardinality, downsample=False, downsample_strides=2, activation=…)
Args
- inputs: `Tensor`. 4-D input layer.
- nb_blocks: `int`. Number of layer blocks.
- out_channels: `int`. The number of convolutional filters of the layers surrounding the bottleneck layer.
- cardinality: `int`. Number of aggregated residual transformations.
- downsample: `bool`. If True, apply downsampling using 'downsample_strides' for strides.
- downsample_strides: `int`. The strides to use when downsampling.
- activation: `function` (returning a `Tensor`).
- batch_norm: `bool`. If True, apply batch normalization.
- use_bias: `bool`. If True, a bias is used.
- w_init: `function`. Weights initialization.
- b_init: `tf.Tensor`. Bias initialization.
- w_regularizer: `function`. Add a regularizer to the weights.
- weight_decay: `float`. Regularizer decay parameter. Default: 0.001.
- trainable: `bool`. If True, weights will be trainable.
- reuse: `bool`. If True and 'scope' is provided, this layer's variables will be reused (shared).
- name: A name for this layer (optional). Default: 'ResNeXtBlock'.
Returns
4-D Tensor [batch, new height, new width, out_channels].
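For intuition, one aggregated-residual unit can be sketched as `cardinality` parallel bottleneck paths whose outputs are summed into the shortcut (Fig. 3(a) of the ResNeXt paper); the `bottleneck_width` below is an illustrative assumption, not a tefla parameter:
```python
import tensorflow as tf

def resnext_unit_sketch(inputs, out_channels, cardinality=8, bottleneck_width=4):
    # cardinality parallel low-dimensional bottleneck paths, summed together
    paths = []
    for _ in range(cardinality):
        p = tf.keras.layers.Conv2D(bottleneck_width, 1, padding='same',
                                   activation='relu')(inputs)
        p = tf.keras.layers.Conv2D(bottleneck_width, 3, padding='same',
                                   activation='relu')(p)
        p = tf.keras.layers.Conv2D(out_channels, 1, padding='same')(p)
        paths.append(p)
    shortcut = inputs
    if inputs.shape[-1] != out_channels:
        # 1x1 projection so the shortcut matches the output depth
        shortcut = tf.keras.layers.Conv2D(out_channels, 1, padding='same')(inputs)
    return tf.nn.relu(tf.add_n(paths) + shortcut)
```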
Embedding
tefla.core.special_layers.embedding (inputs, vocab_dim, embedding_dim, reuse, validate_indices=False, w_init=…)
Args
- inputs: a 2-D `Tensor` [samples, ids].
- vocab_dim: list of `int`. Vocabulary size (number of ids).
- embedding_dim: list of `int`. Embedding size.
- validate_indices: `bool`. Whether or not to validate gather indices.
- w_init: Weights initialization.
- trainable: `bool`. If True, weights will be trainable.
- reuse: `bool`. If True and 'scope' is provided, this layer's variables will be reused (shared).
- name: A name for this layer (optional). Default: 'Embedding'.
Returns
3-D Tensor [samples, embedded_ids, features].
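A minimal lookup sketch of what such a layer does, assuming a single embedding matrix and integer ids (sizes and initializer are illustrative):
```python
import tensorflow as tf

vocab_dim, embedding_dim = 10000, 128                      # assumed sizes
ids = tf.constant([[3, 17, 42], [7, 0, 9]])                # (samples, ids)
W = tf.Variable(tf.random.uniform([vocab_dim, embedding_dim], -0.1, 0.1))
embedded = tf.nn.embedding_lookup(W, ids)                  # (samples, ids, embedding_dim)
```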
Gated unit for language modelling
tefla.core.special_layers.gated_layer (inputs, layer, num_units, is_training, reuse, name='gated_layer', **kwargs)
Args
- inputs: a 3-D/4-D `Tensor`, input [samples, timesteps, input_dim].
- layer: a `layer`, the layer to pass the inputs through, e.g. from `tefla.core.layers`.
- num_units: an `int`, number of units for each layer.
- is_training: a `bool`, True if training.
- reuse: `bool`. If True and 'scope' is provided, this layer's variables will be reused (shared).
- name: A name for this layer (optional). Default: 'gated_layer'.
Returns
a 3-D/4-D `Tensor`, the output of the gated unit.
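The usual gated-unit form for language modelling (as in Dauphin et al.'s gated linear units) runs the wrapped layer twice, once for the content and once for a sigmoid gate; a sketch under that assumption:
```python
import tensorflow as tf

def gated_layer_sketch(inputs, layer, num_units, **kwargs):
    # content path and gate path, both produced by the wrapped `layer`
    h = layer(inputs, num_units, **kwargs)
    g = layer(inputs, num_units, **kwargs)
    return h * tf.sigmoid(g)  # elementwise gating
```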
Returns glimpses at the locations
tefla.core.special_layers.glimpseSensor (img, normLoc, minRadius=4, depth=1, sensorBandwidth=12)
Args
- img: a 4-D `Tensor`, [batch_size, width, height, channels].
- normLoc: a `float`, [0, 1] normalized location.
- minRadius: an `int`, minimum radius for zooming.
- depth: an `int`, number of zooms.
- sensorBandwidth: an `int`, output glimpse size (width/height).
Returns
a 5-D `Tensor` of glimpses.
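A sketch of the usual retina-like glimpse extraction, built on `tf.image.extract_glimpse`: patches of growing radius around the location, each resized to the sensor bandwidth. The doubling of the radius per zoom is an assumption:
```python
import tensorflow as tf

def glimpse_sketch(img, normLoc, minRadius=4, depth=1, sensorBandwidth=12):
    glimpses = []
    for d in range(depth):
        radius = minRadius * (2 ** d)            # assumed: radius doubles per zoom
        patch = tf.image.extract_glimpse(img, size=[2 * radius, 2 * radius],
                                         offsets=normLoc, centered=True,
                                         normalized=True)
        patch = tf.image.resize(patch, [sensorBandwidth, sensorBandwidth])
        glimpses.append(patch)
    # (batch_size, depth, sensorBandwidth, sensorBandwidth, channels)
    return tf.stack(glimpses, axis=1)
```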
Adds a PVA block layer
tefla.core.special_layers.pva_block_v1 (x, num_units, name='pva_block_v1', **kwargs)
Convolution followed by CReLU and scaling.
Args
- x: A 4-D `Tensor` with at least rank 2 and a known value for the last dimension, i.e. [batch_size, in_height, in_width, depth].
- is_training: Bool, training or testing.
- num_units: Integer or long, the number of output units in the layer.
- reuse: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
- filter_size: an int or list/tuple of 2 positive integers specifying the spatial dimensions of the filters.
- stride: an int or tuple/list of 2 positive integers specifying the stride at which to compute output.
- padding: one of "VALID" or "SAME".
- activation: activation function; set to None to skip it and maintain a linear activation.
- batch_norm: normalization function to use. If `batch_norm` is `True`, the Google original implementation is used; if another function is provided, it is applied. Defaults to None for no normalizer function.
- batch_norm_args: normalization function parameters.
- w_init: An initializer for the weights.
- w_regularizer: Optional regularizer for the weights.
- untie_biases: spatial dimension-wise biases.
- b_init: An initializer for the biases. If None, skip biases.
- trainable: If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
- name: Optional name or scope for variable_scope/name_scope.
- use_bias: Whether to add a bias or not.
Returns
The 4-D `Tensor` variable representing the result of the series of operations, e.g. a 4-D `Tensor` [batch, new_height, new_width, n_output].
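A sketch of the described order (convolution, then CReLU, then scaling); the learned per-channel scale and shift variables are an assumption about what "scaling" means here:
```python
import tensorflow as tf

def pva_block_v1_sketch(x, num_units, filter_size=3, stride=1):
    y = tf.keras.layers.Conv2D(num_units, filter_size, strides=stride, padding='same')(x)
    y = tf.nn.crelu(y)  # concat(relu(y), relu(-y)): doubles the channel count
    scale = tf.Variable(tf.ones([2 * num_units]))
    shift = tf.Variable(tf.zeros([2 * num_units]))
    return y * scale + shift
```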
Adds a PVA block v2 layer
tefla.core.special_layers.pva_block_v2 (x, num_units, name='pva_block_v2', **kwargs)
Batch normalization first, followed by CReLU and scaling; convolution is applied after scaling.
Args
- x: A 4-D `Tensor` with at least rank 2 and a known value for the last dimension, i.e. [batch_size, in_height, in_width, depth].
- is_training: Bool, training or testing.
- num_units: Integer or long, the number of output units in the layer.
- reuse: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
- filter_size: an int or list/tuple of 2 positive integers specifying the spatial dimensions of the filters.
- stride: an int or tuple/list of 2 positive integers specifying the stride at which to compute output.
- padding: one of "VALID" or "SAME".
- activation: activation function; set to None to skip it and maintain a linear activation.
- batch_norm: normalization function to use. If `batch_norm` is `True`, the Google original implementation is used; if another function is provided, it is applied. Defaults to None for no normalizer function.
- batch_norm_args: normalization function parameters.
- w_init: An initializer for the weights.
- w_regularizer: Optional regularizer for the weights.
- untie_biases: spatial dimension-wise biases.
- b_init: An initializer for the biases. If None, skip biases.
- trainable: If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
- name: Optional name or scope for variable_scope/name_scope.
- use_bias: Whether to add a bias or not.
Returns
The 4-D `Tensor` variable representing the result of the series of operations, e.g. a 4-D `Tensor` [batch, new_height, new_width, n_output].
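The v2 ordering from the description, sketched the same way (BN, then CReLU and scaling, then the convolution); same caveats as for the v1 sketch:
```python
import tensorflow as tf

def pva_block_v2_sketch(x, num_units, filter_size=3, stride=1, training=True):
    y = tf.keras.layers.BatchNormalization()(x, training=training)
    y = tf.nn.crelu(y)                       # doubles the channel count
    channels = 2 * x.shape[-1]
    scale = tf.Variable(tf.ones([channels]))
    shift = tf.Variable(tf.zeros([channels]))
    y = y * scale + shift
    return tf.keras.layers.Conv2D(num_units, filter_size, strides=stride,
                                  padding='same')(y)
```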
Performs a pooling operation that results in a fixed size: output_size x output_size
tefla.core.special_layers.max_pool_2d_nxn_regions (inputs, output_size, mode='max')
Used by spatial_pyramid_pool. Refer to appendix A in [1].
Args
- inputs: A 4D Tensor (B, H, W, C)
- output_size: The output size of the pooling operation.
- mode: The pooling mode {max, avg}
Returns
A list of tensors, one for each output bin. The list contains output_size * output_size elements, where each element is a Tensor of shape (B, C).
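One common reading of the appendix-A binning is floor/ceil bin boundaries, which lets adjacent bins overlap so every pixel is covered; a tiny illustration (the helper name is ours):
```python
import math

# Bin boundaries for one spatial dimension:
# bin i of n spans [floor(i * H / n), ceil((i + 1) * H / n)).
def bin_bounds(H, n):
    return [(math.floor(i * H / n), math.ceil((i + 1) * H / n)) for i in range(n)]

print(bin_bounds(13, 4))  # [(0, 4), (3, 7), (6, 10), (9, 13)] -> overlapping bins
```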
Performs spatial pyramid pooling (SPP) over the input
tefla.core.special_layers.spatial_pyramid_pool (inputs, dimensions=[2, 1], mode='max', implementation='kaiming')
It will turn a 2D input of arbitrary size into an output of fixed dimension. Hence, the convolutional part of a DNN can be connected to a dense part with a fixed number of nodes even if the dimensions of the input image are unknown.
The pooling is performed over :math:`l` pooling levels. Each pooling level :math:`i` will create :math:`M_i` output features. :math:`M_i` is given by :math:`n_i * n_i`, with :math:`n_i` as the number of pooling operations per dimension at level :math:`i`.
The length of the parameter dimensions is the level of the spatial pyramid.
Args
- inputs: A 4D Tensor (B, H, W, C).
- dimensions: The list of :math:`n_i`'s that define the output dimension of each pooling level :math:`i`. The length of dimensions is the level of the spatial pyramid.
- mode: Pooling mode, 'max' or 'avg'.
- implementation: The implementation to use, either 'kaiming' or 'fast'. 'kaiming' is the original implementation from the paper and supports variable sizes of input vectors, which 'fast' does not.
Returns
A fixed length vector representing the inputs.
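Putting the pieces together, a self-contained sketch of the pooling using the simpler 'fast'-style integer binning (not necessarily tefla's exact code): for dimensions [2, 1] and C channels it returns a (B, 5*C) vector.
```python
import tensorflow as tf

def spp_sketch(inputs, dimensions=(2, 1), mode='max'):
    # inputs: (B, H, W, C) with statically known H and W
    _, h, w, _ = inputs.shape
    pool = tf.reduce_max if mode == 'max' else tf.reduce_mean
    outputs = []
    for n in dimensions:                        # one pyramid level per n_i
        for i in range(n):
            for j in range(n):
                # integer bin boundaries ('fast' variant, no overlap)
                r0, r1 = (h * i) // n, (h * (i + 1)) // n
                c0, c1 = (w * j) // n, (w * (j + 1)) // n
                outputs.append(pool(inputs[:, r0:r1, c0:c1, :], axis=[1, 2]))
    return tf.concat(outputs, axis=1)           # (B, sum(n_i * n_i) * C)
```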