basecls.models.swin#

Swin Transformer Series

Swin Transformer: “Hierarchical Vision Transformer using Shifted Windows”

References

microsoft/Swin-Transformer

basecls.models.swin.window_partition(x, window_size)[source]#
Parameters
  • x – (B, H, W, C)

  • window_size (int) – window size

Returns

(num_windows*B, window_size, window_size, C)

Return type

windows

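The partition is pure shape bookkeeping. A minimal NumPy sketch of the same reshaping (illustrative only, not the basecls implementation):

import numpy as np

def window_partition_np(x, window_size):
    """(B, H, W, C) -> (num_windows*B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.reshape(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

x = np.zeros((2, 56, 56, 96))                 # B=2, H=W=56, C=96
print(window_partition_np(x, 7).shape)        # (128, 7, 7, 96): 2 * (56 // 7) ** 2 windows
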
basecls.models.swin.window_reverse(windows, window_size, H, W)[source]#
Parameters
  • windows – (num_windows*B, window_size, window_size, C)

  • window_size (int) – Window size

  • H (int) – Height of image

  • W (int) – Width of image

Returns

(B, H, W, C)

Return type

x

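window_reverse is the inverse reshape of window_partition. A matching NumPy sketch (again illustrative, not the library code):

import numpy as np

def window_reverse_np(windows, window_size, H, W):
    """(num_windows*B, window_size, window_size, C) -> (B, H, W, C)."""
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.reshape(B, H // window_size, W // window_size, window_size, window_size, -1)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

windows = np.zeros((128, 7, 7, 96))
print(window_reverse_np(windows, 7, 56, 56).shape)   # (2, 56, 56, 96)
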
class basecls.models.swin.WindowAttention(dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]#

Bases: Module

Window based multi-head self attention (W-MSA) module with relative position bias. It supports both shifted and non-shifted windows.

Parameters
  • dim (int) – Number of input channels.

  • window_size (int) – The height and width of the window.

  • num_heads (int) – Number of attention heads.

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set.

  • attn_drop (float) – Dropout ratio of attention weight. Default: 0.0

  • proj_drop (float) – Dropout ratio of output. Default: 0.0

forward(x, mask=None)[source]#
Parameters
  • x – input features with shape of (num_windows*B, N, C)

  • mask – (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None

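The relative position bias is gathered from a learnable table using a precomputed index over all pairs of positions inside one window. A NumPy sketch of the standard Swin construction of that index (an assumption; the basecls code may differ in detail):

import numpy as np

def relative_position_index(window_size):
    ws = window_size
    coords = np.stack(np.meshgrid(np.arange(ws), np.arange(ws), indexing="ij"))  # (2, ws, ws)
    coords = coords.reshape(2, -1)                    # (2, ws*ws)
    rel = coords[:, :, None] - coords[:, None, :]     # pairwise offsets, (2, ws*ws, ws*ws)
    rel = rel.transpose(1, 2, 0)                      # (ws*ws, ws*ws, 2)
    rel[..., 0] += ws - 1                             # shift offsets to start at 0
    rel[..., 1] += ws - 1
    rel[..., 0] *= 2 * ws - 1                         # flatten the 2D offset into a single index
    return rel.sum(-1)                                # (ws*ws, ws*ws)

idx = relative_position_index(7)
print(idx.shape, idx.min(), idx.max())        # (49, 49) 0 168, i.e. (2*7-1)**2 distinct offsets
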
class basecls.models.swin.PatchMerging(dim, input_resolution, norm_name='LN')[source]#

Bases: Module

Patch Merging Layer.

Parameters
  • dim (int) – Number of input channels.

  • input_resolution (Tuple[int, int]) – Resolution of input feature.

  • norm_name (str) – Normalization layer. Default: "LN"

forward(x)[source]#

x – input features with shape of (B, H*W, C)

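Conceptually, patch merging concatenates every 2x2 group of neighbouring tokens along the channel axis, turning (B, H*W, C) into (B, H/2 * W/2, 4*C), and then projects 4*C down to 2*C with a linear layer. A NumPy sketch of the reshape part (projection omitted; illustrative only):

import numpy as np

def patch_merge_np(x, H, W):
    B, L, C = x.shape
    assert L == H * W
    x = x.reshape(B, H, W, C)
    x0 = x[:, 0::2, 0::2, :]                  # top-left of each 2x2 block
    x1 = x[:, 1::2, 0::2, :]                  # bottom-left
    x2 = x[:, 0::2, 1::2, :]                  # top-right
    x3 = x[:, 1::2, 1::2, :]                  # bottom-right
    x = np.concatenate([x0, x1, x2, x3], axis=-1)   # (B, H/2, W/2, 4*C)
    return x.reshape(B, (H // 2) * (W // 2), 4 * C)

x = np.zeros((2, 56 * 56, 96))
print(patch_merge_np(x, 56, 56).shape)        # (2, 784, 384)
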
class basecls.models.swin.SwinBlock(dim, input_resolution, num_heads, window_size=7, shift_size=0, ffn_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_name='LN', act_name='gelu')[source]#

Bases: Module

Swin Transformer Block.

Parameters
  • dim (int) – Number of input channels.

  • input_resolution (Tuple[int, int]) – Input resolution.

  • num_heads (int) – Number of attention heads.

  • window_size (int) – Window size. Default: 7

  • shift_size (int) – Shift size for SW-MSA. Default: 0

  • ffn_ratio (float) – Ratio of ffn hidden dim to embedding dim. Default: 4.0

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set.

  • drop (float) – Dropout ratio of non-attention weight. Default: 0.0

  • attn_drop (float) – Dropout ratio of attention weight. Default: 0.0

  • drop_path (float) – Stochastic depth rate. Default: 0.0

  • norm_name (str) – Normalization layer. Default: "LN"

  • act_name (str) – Activation layer. Default: "gelu"

forward(x)[source]#
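
When shift_size > 0 the block uses SW-MSA: the feature map is cyclically shifted before the window partition and shifted back afterwards, with an attention mask hiding tokens that were wrapped across the border. A sketch of the shift itself (illustrative; the mask construction is omitted):

import numpy as np

H = W = 56
window_size = 7
shift_size = window_size // 2                 # 3 in the shifted blocks

x = np.arange(H * W).reshape(1, H, W, 1)
shifted = np.roll(x, shift=(-shift_size, -shift_size), axis=(1, 2))       # before W-MSA
restored = np.roll(shifted, shift=(shift_size, shift_size), axis=(1, 2))  # after W-MSA
print(np.array_equal(x, restored))            # True: the cyclic shift is exactly undone
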
class basecls.models.swin.SwinBasicLayer(dim, input_resolution, depth, num_heads, window_size, ffn_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, downsample=None, norm_name='LN', act_name='gelu')[source]#

Bases: Module

A basic Swin Transformer layer for one stage.

Parameters
  • dim (int) – Number of input channels.

  • input_resolution (Tuple[int, int]) – Input resolution.

  • depth (int) – Number of blocks.

  • num_heads (int) – Number of attention heads.

  • window_size (int) – Local window size.

  • ffn_ratio (float) – Ratio of ffn hidden dim to embedding dim.

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set.

  • drop (float) – Dropout rate. Default: 0.0

  • attn_drop (float) – Attention dropout rate. Default: 0.0

  • drop_path (float) – Stochastic depth rate. Default: 0.0

  • norm_name (str) – Normalization layer. Default: "LN"

  • act_name (str) – Activation layer. Default: "gelu"

  • downsample (Optional[Module]) – Downsample layer at the end of the layer. Default: None

forward(x)[source]#
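
Within a stage, Swin alternates W-MSA and SW-MSA blocks: shift_size is 0 for even-indexed blocks and window_size // 2 for odd-indexed ones. The sketch below only reproduces that schedule as described in the paper; it is not read out of basecls:

depth, window_size = 6, 7                     # e.g. the third stage of the default config
shift_sizes = [0 if i % 2 == 0 else window_size // 2 for i in range(depth)]
print(shift_sizes)                            # [0, 3, 0, 3, 0, 3]
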
class basecls.models.swin.SwinTransformer(img_size=224, patch_size=4, in_chans=3, embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7, ffn_ratio=4.0, qkv_bias=True, qk_scale=None, ape=False, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, embed_layer=PatchEmbed, norm_name='LN', act_name='gelu', num_classes=1000, **kwargs)[source]#

Bases: Module

Swin Transformer

An implementation of: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows - https://arxiv.org/pdf/2103.14030

Parameters
  • img_size (int) – Input image size. Default: 224

  • patch_size (int) – Patch size. Default: 4

  • in_chans (int) – Number of input image channels. Default: 3

  • embed_dim (int) – Patch embedding dimension. Default: 96

  • depths (Sequence[int]) – Depth of each Swin Transformer layer.

  • num_heads (Sequence[int]) – Number of attention heads in different layers.

  • window_size (int) – Window size. Default: 7

  • ffn_ratio (float) – Ratio of ffn hidden dim to embedding dim. Default: 4.0

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set. Default: None

  • ape (bool) – If True, add absolute position embedding to the patch embedding. Default: False

  • patch_norm (bool) – If True, add normalization after patch embedding. Default: True

  • drop_rate (float) – Dropout rate. Default: 0.0

  • attn_drop_rate (float) – Attention dropout rate. Default: 0.0

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.1

  • embed_layer (Module) – Patch embedding layer. Default: PatchEmbed

  • norm_name (str) – Normalization layer. Default: "LN"

  • act_name (str) – Activation layer. Default: "gelu"

  • num_classes (int) – Number of classes for classification head. Default: 1000

forward(x)[source]#
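
With the default configuration, patch embedding produces a 56x56 token grid of dimension 96, and each PatchMerging between stages halves the resolution while doubling the channels. A small arithmetic sketch of the resulting stage geometry (derived from the documented defaults):

img_size, patch_size, embed_dim = 224, 4, 96
depths = [2, 2, 6, 2]

res = img_size // patch_size                  # 56: tokens per side after patch embedding
for i, depth in enumerate(depths):
    dim = embed_dim * 2 ** i
    print(f"stage {i}: {depth} blocks, {res}x{res} tokens, dim {dim}")
    if i < len(depths) - 1:
        res //= 2                             # PatchMerging after every stage but the last
# stage 0: 2 blocks, 56x56 tokens, dim 96
# stage 1: 2 blocks, 28x28 tokens, dim 192
# stage 2: 6 blocks, 14x14 tokens, dim 384
# stage 3: 2 blocks, 7x7 tokens, dim 768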