Configurations
Config
- class textpruner.configurations.Config[source]
Base class for GeneralConfig, VocabularyPruningConfig and TransformerPruningConfig.
GeneralConfig
- class textpruner.configurations.GeneralConfig(use_device: str = 'auto', output_dir: str = './pruned_models', config_class: str = 'GeneralConfig')[source]
Configurations for the device and the output directory.
- Parameters
use_device – 'cpu', 'cuda', 'cuda:0', etc. Specifies which device to use. If it is set to 'auto', TextPruner will try to use the CUDA device if there is one; otherwise it uses the CPU.
output_dir – The directory to save the pruned models.
config_class – Type of the configurations. Users should not change its value.
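The 'auto' behavior described above can be sketched in plain Python. The helper name resolve_device is illustrative only and is not part of the TextPruner API:

```python
def resolve_device(use_device: str = "auto") -> str:
    """Hypothetical sketch of how an 'auto' device setting may be resolved."""
    if use_device != "auto":
        # An explicit setting such as 'cpu', 'cuda' or 'cuda:0' is used as-is.
        return use_device
    try:
        import torch
        # With 'auto', prefer the CUDA device when one is available.
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # Without torch, fall back to the CPU.
        return "cpu"
```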
VocabularyPruningConfig
- class textpruner.configurations.VocabularyPruningConfig(min_count: int = 1, prune_lm_head: Union[bool, str] = 'auto', config_class: str = 'VocabularyPruningConfig')[source]
Configurations for vocabulary pruning.
- Parameters
min_count – The threshold that decides whether a token should be removed. A token is removed from the vocabulary if it appears fewer than min_count times in the corpus.
prune_lm_head – Whether to prune the lm_head if the model has one. If prune_lm_head==False, TextPruner will not prune the lm_head; if prune_lm_head==True, TextPruner will prune the lm_head and raise an error if the model does not have an lm_head; if prune_lm_head=='auto', TextPruner will try to prune the lm_head and will continue if the model does not have one.
config_class – Type of the configurations. Users should not change its value.
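The min_count rule above can be illustrated with a short sketch. The helper tokens_to_prune is hypothetical, not a TextPruner function:

```python
from collections import Counter

def tokens_to_prune(corpus_tokens, vocabulary, min_count=1):
    """Return the vocabulary tokens that fall below the min_count threshold."""
    counts = Counter(corpus_tokens)
    # A token is removed when it appears fewer than min_count times in the corpus.
    return {token for token in vocabulary if counts[token] < min_count}
```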
TransformerPruningConfig
- class textpruner.configurations.TransformerPruningConfig(target_ffn_size: Optional[int] = None, target_num_of_heads: Optional[int] = None, pruning_method: str = 'masks', ffn_even_masking: Optional[bool] = True, head_even_masking: Optional[bool] = True, n_iters: Optional[int] = 1, multiple_of: int = 1, pruning_order: Optional[str] = None, use_logits: bool = False, config_class: str = 'TransformerPruningConfig')[source]
Configurations for transformer pruning.
- Parameters
target_ffn_size – The target average FFN size per layer.
target_num_of_heads – The target average number of attention heads per layer.
pruning_method – 'masks' or 'iterative'. If set to 'masks', the pruner prunes the model with the given masks (head_mask and ffn_mask). If set to 'iterative', the pruner calculates the importance scores of the neurons based on the data provided by the dataloader and then prunes the model based on the scores.
ffn_even_masking – Whether the FFN size of each layer should be the same.
head_even_masking – Whether the number of attention heads of each layer should be the same.
n_iters – If pruning_method is set to 'iterative', n_iters is the number of pruning iterations used to prune the model progressively.
multiple_of – If ffn_even_masking is False, restrict the target FFN size of each layer to be a multiple of multiple_of.
pruning_order – None, 'head-first' or 'ffn-first'. If None, the attention heads and the FFN layers are pruned simultaneously; if set to 'head-first' or 'ffn-first', the actual number of iterations is 2*n_iters.
use_logits – If True, performs self-supervised pruning, where the logits are treated as the soft labels.
config_class – Type of the configurations. Users should not change its value.
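To make the multiple_of constraint concrete, a per-layer FFN size can be snapped to a multiple as sketched below. The helper snap_to_multiple is illustrative only and not part of TextPruner:

```python
def snap_to_multiple(ffn_size: int, multiple_of: int) -> int:
    # Round a per-layer FFN size down to the nearest multiple of multiple_of,
    # but never below multiple_of itself.
    return max(multiple_of, (ffn_size // multiple_of) * multiple_of)
```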
Warning
If ffn_even_masking is False, the pruned model cannot be saved normally (the saved weights cannot be loaded with the transformers library). So make sure to set save_model=False when calling TransformerPruner.prune() or PipelinePruner.prune(). There are two ways to avoid this:
Save the model in TorchScript format manually;
Set keep_shape=True when calling TransformerPruner.prune() or PipelinePruner.prune(), so the full model can be saved. Then save the ffn_masks and head_masks. When loading the model, load the full model and then prune it with the masks.