basecls.models.swin
Swin Transformer Series
Swin Transformer: “Hierarchical Vision Transformer using Shifted Windows”
- basecls.models.swin.window_partition(x, window_size)[source]
- Parameters
  - x – (B, H, W, C)
  - window_size (int) – window size
- Returns
  (num_windows*B, window_size, window_size, C)
- Return type
  windows
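A minimal usage sketch of window_partition; basecls builds on MegEngine, so the input is created as a MegEngine tensor. The concrete shapes are just an illustration and assume H and W are divisible by window_size, as the documented output shape implies:

```python
import numpy as np
import megengine as mge

from basecls.models.swin import window_partition

# A (B, H, W, C) feature map: batch 2, 56x56 resolution, 96 channels.
x = mge.tensor(np.random.rand(2, 56, 56, 96).astype("float32"))

# 56 / 7 = 8 windows per side, so 8 * 8 = 64 windows per image.
windows = window_partition(x, window_size=7)
print(windows.shape)  # expected: (2 * 64, 7, 7, 96) = (128, 7, 7, 96)
```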
- class basecls.models.swin.WindowAttention(dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]
- Bases: Module
- Window-based multi-head self-attention (W-MSA) module with relative position bias. It supports both shifted and non-shifted windows.
- Parameters
  - dim (int) – Number of input channels.
  - window_size (int) – The height and width of the window.
  - num_heads (int) – Number of attention heads.
  - qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True
  - qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set.
  - attn_drop (float) – Dropout ratio of attention weight. Default: 0.0
  - proj_drop (float) – Dropout ratio of output. Default: 0.0
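A hedged instantiation sketch. Only the constructor signature comes from the docs above; the forward-input layout of (num_windows*B, window_size**2, dim) is an assumption carried over from the reference Swin implementation:

```python
import numpy as np
import megengine as mge

from basecls.models.swin import WindowAttention

attn = WindowAttention(dim=96, window_size=7, num_heads=3)

# 7 * 7 = 49 tokens per window; 128 windows in total.
# Layout (num_windows * B, window_size**2, dim) is assumed, not documented.
tokens = mge.tensor(np.random.rand(128, 49, 96).astype("float32"))
out = attn(tokens)
print(out.shape)  # expected: (128, 49, 96), attention preserves the shape
```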
- class basecls.models.swin.PatchMerging(dim, input_resolution, norm_name='LN')[source]
- Bases: Module
- Patch Merging Layer.
- Parameters
  - dim (int) – Number of input channels.
  - input_resolution (Tuple[int, int]) – Resolution of the input feature.
  - norm_name (str) – Normalization layer. Default: "LN"
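A minimal PatchMerging sketch. The (B, H*W, C) token layout and the (H, W) tuple for input_resolution follow the reference Swin implementation and are assumptions here; in Swin, merging halves the spatial resolution and doubles the channel width:

```python
import numpy as np
import megengine as mge

from basecls.models.swin import PatchMerging

merge = PatchMerging(dim=96, input_resolution=(56, 56))

# Flattened tokens: (B, H * W, C) = (1, 3136, 96), an assumed layout.
x = mge.tensor(np.random.rand(1, 56 * 56, 96).astype("float32"))
out = merge(x)
print(out.shape)  # expected: (1, 28 * 28, 192) = (1, 784, 192)
```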
- class basecls.models.swin.SwinBlock(dim, input_resolution, num_heads, window_size=7, shift_size=0, ffn_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_name='LN', act_name='gelu')[source]
- Bases: Module
- Swin Transformer Block.
- Parameters
  - dim (int) – Number of input channels.
  - input_resolution (Tuple[int, int]) – Resolution of the input feature.
  - num_heads (int) – Number of attention heads.
  - window_size (int) – Window size. Default: 7
  - shift_size (int) – Shift size for SW-MSA. Default: 0
  - ffn_ratio (float) – Ratio of ffn hidden dim to embedding dim. Default: 4.0
  - qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True
  - qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set.
  - drop (float) – Dropout ratio of non-attention weight. Default: 0.0
  - attn_drop (float) – Dropout ratio of attention weight. Default: 0.0
  - drop_path (float) – Stochastic depth rate. Default: 0.0
  - norm_name (str) – Normalization layer. Default: "LN"
  - act_name (str) – Activation layer. Default: "gelu"
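A hedged SwinBlock sketch. The (B, H*W, C) forward layout is again assumed from the reference implementation; shift_size=0 gives a plain W-MSA block, while shift_size=window_size // 2 is the usual choice for SW-MSA blocks:

```python
import numpy as np
import megengine as mge

from basecls.models.swin import SwinBlock

# A shifted-window block: shift_size = 7 // 2 = 3.
block = SwinBlock(dim=96, input_resolution=(56, 56), num_heads=3,
                  window_size=7, shift_size=3)

x = mge.tensor(np.random.rand(1, 56 * 56, 96).astype("float32"))
out = block(x)
print(out.shape)  # expected: (1, 3136, 96); the block preserves the shape
```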
- class basecls.models.swin.SwinBasicLayer(dim, input_resolution, depth, num_heads, window_size, ffn_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, downsample=None, norm_name='LN', act_name='gelu')[source]
- Bases: Module
- A basic Swin Transformer layer for one stage.
- Parameters
  - dim (int) – Number of input channels.
  - input_resolution (Tuple[int, int]) – Resolution of the input feature.
  - depth (int) – Number of blocks.
  - num_heads (int) – Number of attention heads.
  - window_size (int) – Local window size.
  - ffn_ratio (float) – Ratio of ffn hidden dim to embedding dim.
  - qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True
  - qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set.
  - drop (float) – Dropout rate. Default: 0.0
  - attn_drop (float) – Attention dropout rate. Default: 0.0
  - drop_path (float) – Stochastic depth rate. Default: 0.0
  - norm_name (str) – Normalization layer. Default: "LN"
  - act_name (str) – Activation layer. Default: "gelu"
  - downsample (Optional[Module]) – Downsample layer at the end of the layer. Default: None
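A hedged sketch of one stage. With downsample=None the stage only stacks `depth` SwinBlocks and keeps the token shape; how a downsample layer is passed (instance vs. class) is not specified above, so it is left out here:

```python
import numpy as np
import megengine as mge

from basecls.models.swin import SwinBasicLayer

# Stage of two blocks at 56x56 resolution, no downsampling at the end.
layer = SwinBasicLayer(dim=96, input_resolution=(56, 56), depth=2,
                       num_heads=3, window_size=7, downsample=None)

x = mge.tensor(np.random.rand(1, 56 * 56, 96).astype("float32"))
out = layer(x)
print(out.shape)  # expected: (1, 3136, 96); a downsample layer would
                  # halve the resolution and double the channels
```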
- class basecls.models.swin.SwinTransformer(img_size=224, patch_size=4, in_chans=3, embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7, ffn_ratio=4.0, qkv_bias=True, qk_scale=None, ape=False, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, embed_layer=PatchEmbed, norm_name='LN', act_name='gelu', num_classes=1000, **kwargs)[source]
- Bases: Module
- Swin Transformer
- A MegEngine implementation of: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows - https://arxiv.org/pdf/2103.14030
- Parameters
  - img_size (int) – Input image size. Default: 224
  - patch_size (int) – Patch size. Default: 4
  - in_chans (int) – Number of input image channels. Default: 3
  - embed_dim (int) – Patch embedding dimension. Default: 96
  - depths (Sequence[int]) – Depth of each Swin Transformer layer.
  - num_heads (Sequence[int]) – Number of attention heads in different layers.
  - window_size (int) – Window size. Default: 7
  - ffn_ratio (float) – Ratio of ffn hidden dim to embedding dim. Default: 4.0
  - qkv_bias (bool) – If True, add a learnable bias to query, key, value. Default: True
  - qk_scale (Optional[float]) – Override default qk scale of head_dim ** -0.5 if set. Default: None
  - ape (bool) – If True, add absolute position embedding to the patch embedding. Default: False
  - patch_norm (bool) – If True, add normalization after patch embedding. Default: True
  - drop_rate (float) – Dropout rate. Default: 0.0
  - attn_drop_rate (float) – Attention dropout rate. Default: 0.0
  - drop_path_rate (float) – Stochastic depth rate. Default: 0.1
  - embed_layer (Module) – Patch embedding layer. Default: PatchEmbed
  - norm_name (str) – Normalization layer. Default: "LN"
  - act_name (str) – Activation layer. Default: "gelu"
  - num_classes (int) – Number of classes for classification head. Default: 1000
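An end-to-end sketch using the documented defaults, which correspond to a Swin-T-like configuration. The NCHW input layout is an assumption based on standard image classifiers, not something stated above:

```python
import numpy as np
import megengine as mge

from basecls.models.swin import SwinTransformer

# All values below are the documented defaults, spelled out for clarity.
model = SwinTransformer(
    img_size=224, patch_size=4, in_chans=3, embed_dim=96,
    depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],
    window_size=7, drop_path_rate=0.1, num_classes=1000,
)
model.eval()  # disable dropout / stochastic depth for inference

# One 224x224 RGB image; NCHW layout assumed.
x = mge.tensor(np.random.rand(1, 3, 224, 224).astype("float32"))
logits = model(x)
print(logits.shape)  # expected: (1, 1000)
```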