basecls.models.swin#

Swin Transformer Series

Swin Transformer: “Hierarchical Vision Transformer using Shifted Windows”

References

microsoft/Swin-Transformer

basecls.models.swin.window_partition(x, window_size)[source]#
Parameters
  • x – (B, H, W, C)

  • window_size (int) – window size

Returns

(num_windows*B, window_size, window_size, C)

Return type

windows

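The partition is pure shape bookkeeping. A minimal NumPy sketch of the same reshaping (illustrative only, not the basecls implementation):

import numpy as np

def window_partition_np(x, window_size):
    """(B, H, W, C) -> (num_windows*B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.reshape(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

x = np.zeros((2, 56, 56, 96))                 # B=2, H=W=56, C=96
print(window_partition_np(x, 7).shape)        # (128, 7, 7, 96): 2 * (56 // 7) ** 2 windows
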
basecls.models.swin.window_reverse(windows, window_size, H, W)[source]#
Parameters
  • windows – (num_windows*B, window_size, window_size, C)

  • window_size (int) – Window size

  • H (int) – Height of image

  • W (int) – Width of image

Returns

(B, H, W, C)

Return type

x

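window_reverse is the inverse reshape of window_partition. A matching NumPy sketch (again illustrative, not the library code):

import numpy as np

def window_reverse_np(windows, window_size, H, W):
    """(num_windows*B, window_size, window_size, C) -> (B, H, W, C)."""
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.reshape(B, H // window_size, W // window_size, window_size, window_size, -1)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

windows = np.zeros((128, 7, 7, 96))
print(window_reverse_np(windows, 7, 56, 56).shape)   # (2, 56, 56, 96)
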
class basecls.models.swin.WindowAttention(dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]#

Bases: Module

Window based multi-head self attention (W-MSA) module with relative position bias. It supports both shifted and non-shifted windows.

Parameters
  • dim (int) – Number of input channels.

  • window_size (int) – The height and width of the window.

  • num_heads (int) – Number of attention heads.

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set.

  • attn_drop (float) – Dropout ratio of attention weight. Default: 0.0

  • proj_drop (float) – Dropout ratio of output. Default: 0.0

forward(x, mask=None)[source]#
Parameters
  • x – input features with shape of (num_windows*B, N, C)

  • mask – (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None

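The relative position bias is gathered from a learnable table using a precomputed index over all pairs of positions inside one window. A NumPy sketch of the standard Swin construction of that index (an assumption; the basecls code may differ in detail):

import numpy as np

def relative_position_index(window_size):
    ws = window_size
    coords = np.stack(np.meshgrid(np.arange(ws), np.arange(ws), indexing="ij"))  # (2, ws, ws)
    coords = coords.reshape(2, -1)                    # (2, ws*ws)
    rel = coords[:, :, None] - coords[:, None, :]     # pairwise offsets, (2, ws*ws, ws*ws)
    rel = rel.transpose(1, 2, 0)                      # (ws*ws, ws*ws, 2)
    rel[..., 0] += ws - 1                             # shift offsets to start at 0
    rel[..., 1] += ws - 1
    rel[..., 0] *= 2 * ws - 1                         # flatten the 2D offset into a single index
    return rel.sum(-1)                                # (ws*ws, ws*ws)

idx = relative_position_index(7)
print(idx.shape, idx.min(), idx.max())        # (49, 49) 0 168, i.e. (2*7-1)**2 distinct offsets
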
class basecls.models.swin.PatchMerging(dim, input_resolution, norm_name='LN')[source]#

Bases: Module

Patch Merging Layer.

Parameters
  • dim (int) – Number of input channels.

  • input_resolution (Tuple[int, int]) – Resolution of input feature.

  • norm_name (str) – Normalization layer. Default: "LN"

forward(x)[source]#

x – input features with shape of (B, H*W, C)

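Conceptually, patch merging concatenates every 2x2 group of neighbouring tokens along the channel axis, turning (B, H*W, C) into (B, H/2 * W/2, 4*C), and then projects 4*C down to 2*C with a linear layer. A NumPy sketch of the reshape part (projection omitted; illustrative only):

import numpy as np

def patch_merge_np(x, H, W):
    B, L, C = x.shape
    assert L == H * W
    x = x.reshape(B, H, W, C)
    x0 = x[:, 0::2, 0::2, :]                  # top-left of each 2x2 block
    x1 = x[:, 1::2, 0::2, :]                  # bottom-left
    x2 = x[:, 0::2, 1::2, :]                  # top-right
    x3 = x[:, 1::2, 1::2, :]                  # bottom-right
    x = np.concatenate([x0, x1, x2, x3], axis=-1)   # (B, H/2, W/2, 4*C)
    return x.reshape(B, (H // 2) * (W // 2), 4 * C)

x = np.zeros((2, 56 * 56, 96))
print(patch_merge_np(x, 56, 56).shape)        # (2, 784, 384)
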
class basecls.models.swin.SwinBlock(dim, input_resolution, num_heads, window_size=7, shift_size=0, ffn_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_name='LN', act_name='gelu')[source]#

Bases: Module

Swin Transformer Block.

Parameters
  • dim (int) – Number of input channels.

  • input_resolution (Tuple[int, int]) – Input resolution.

  • num_heads (int) – Number of attention heads.

  • window_size (int) – Window size. Default: 7

  • shift_size (int) – Shift size for SW-MSA. Default: 0

  • ffn_ratio (float) – Ratio of ffn hidden dim to embedding dim. Default: 4.0

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set.

  • drop (float) – Dropout ratio of non-attention weight. Default: 0.0

  • attn_drop (float) – Dropout ratio of attention weight. Default: 0.0

  • drop_path (float) – Stochastic depth rate. Default: 0.0

  • norm_name (str) – Normalization layer. Default: "LN"

  • act_name (str) – Activation layer. Default: "gelu"

forward(x)[source]#
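
When shift_size > 0 the block uses SW-MSA: the feature map is cyclically shifted before the window partition and shifted back afterwards, with an attention mask hiding tokens that were wrapped across the border. A sketch of the shift itself (illustrative; the mask construction is omitted):

import numpy as np

H = W = 56
window_size = 7
shift_size = window_size // 2                 # 3 in the shifted blocks

x = np.arange(H * W).reshape(1, H, W, 1)
shifted = np.roll(x, shift=(-shift_size, -shift_size), axis=(1, 2))       # before W-MSA
restored = np.roll(shifted, shift=(shift_size, shift_size), axis=(1, 2))  # after W-MSA
print(np.array_equal(x, restored))            # True: the cyclic shift is exactly undone
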
class basecls.models.swin.SwinBasicLayer(dim, input_resolution, depth, num_heads, window_size, ffn_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, downsample=None, norm_name='LN', act_name='gelu')[source]#

Bases: Module

A basic Swin Transformer layer for one stage.

Parameters
  • dim (int) – Number of input channels.

  • input_resolution (Tuple[int, int]) – Input resolution.

  • depth (int) – Number of blocks.

  • num_heads (int) – Number of attention heads.

  • window_size (int) – Local window size.

  • ffn_ratio (float) – Ratio of ffn hidden dim to embedding dim.

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set.

  • drop (float) – Dropout rate. Default: 0.0

  • attn_drop (float) – Attention dropout rate. Default: 0.0

  • drop_path (float) – Stochastic depth rate. Default: 0.0

  • norm_name (str) – Normalization layer. Default: "LN"

  • act_name (str) – Activation layer. Default: "gelu"

  • downsample (Optional[Module]) – Downsample layer at the end of the layer. Default: None

forward(x)[source]#
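
Within a stage, Swin alternates W-MSA and SW-MSA blocks: shift_size is 0 for even-indexed blocks and window_size // 2 for odd-indexed ones. The sketch below only reproduces that schedule as described in the paper; it is not read out of basecls:

depth, window_size = 6, 7                     # e.g. the third stage of the default config
shift_sizes = [0 if i % 2 == 0 else window_size // 2 for i in range(depth)]
print(shift_sizes)                            # [0, 3, 0, 3, 0, 3]
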
class basecls.models.swin.SwinTransformer(img_size=224, patch_size=4, in_chans=3, embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7, ffn_ratio=4.0, qkv_bias=True, qk_scale=None, ape=False, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, embed_layer=PatchEmbed, norm_name='LN', act_name='gelu', num_classes=1000, **kwargs)[source]#

Bases: Module

Swin Transformer

An implementation of: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows - https://arxiv.org/pdf/2103.14030

Parameters
  • img_size (int) – Input image size. Default: 224

  • patch_size (int) – Patch size. Default: 4

  • in_chans (int) – Number of input image channels. Default: 3

  • embed_dim (int) – Patch embedding dimension. Default: 96

  • depths (Sequence[int]) – Depth of each Swin Transformer layer.

  • num_heads (Sequence[int]) – Number of attention heads in different layers.

  • window_size (int) – Window size. Default: 7

  • ffn_ratio (float) – Ratio of ffn hidden dim to embedding dim. Default: 4.0

  • qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set. Default: None

  • ape (bool) – If True, add absolute position embedding to the patch embedding. Default: False

  • patch_norm (bool) – If True, add normalization after patch embedding. Default: True

  • drop_rate (float) – Dropout rate. Default: 0.0

  • attn_drop_rate (float) – Attention dropout rate. Default: 0.0

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.1

  • embed_layer (Module) – Patch embedding layer. Default: PatchEmbed

  • norm_name (str) – Normalization layer. Default: "LN"

  • act_name (str) – Activation layer. Default: "gelu"

  • num_classes (int) – Number of classes for classification head. Default: 1000

forward(x)[source]#
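
With the default configuration, patch embedding produces a 56x56 token grid of dimension 96, and each PatchMerging between stages halves the resolution while doubling the channels. A small arithmetic sketch of the resulting stage geometry (derived from the documented defaults):

img_size, patch_size, embed_dim = 224, 4, 96
depths = [2, 2, 6, 2]

res = img_size // patch_size                  # 56: tokens per side after patch embedding
for i, depth in enumerate(depths):
    dim = embed_dim * 2 ** i
    print(f"stage {i}: {depth} blocks, {res}x{res} tokens, dim {dim}")
    if i < len(depths) - 1:
        res //= 2                             # PatchMerging after every stage but the last
# stage 0: 2 blocks, 56x56 tokens, dim 96
# stage 1: 2 blocks, 28x28 tokens, dim 192
# stage 2: 6 blocks, 14x14 tokens, dim 384
# stage 3: 2 blocks, 7x7 tokens, dim 768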