Softmax Prompt Diffusion
Softmax-based PyTorch implementation of an attention-head LSTM.
- Input
  - 6515-dim embedding
- Encoder
  - 24 x Diffusion with 50 heads
- Output
  - AUC-ROC projection
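The sizes listed above can be sanity-checked with a short sketch. It assumes the encoder blocks use standard multi-head attention, which splits the embedding width evenly across heads (PyTorch's `nn.MultiheadAttention`, for example, requires `embed_dim` to be divisible by `num_heads`). The helper names `head_dim` and `nearest_divisible` are hypothetical and do not come from this repository.

```python
# Values taken from the architecture list.
EMBED_DIM = 6515
NUM_HEADS = 50
NUM_LAYERS = 24


def head_dim(embed_dim: int, num_heads: int) -> int:
    """Return the per-head width, or raise if the width does not split evenly."""
    if embed_dim % num_heads != 0:
        raise ValueError(
            f"embed_dim={embed_dim} is not divisible by num_heads={num_heads} "
            f"(remainder {embed_dim % num_heads})"
        )
    return embed_dim // num_heads


def nearest_divisible(embed_dim: int, num_heads: int) -> int:
    """Largest width <= embed_dim that splits evenly across the heads."""
    return embed_dim - embed_dim % num_heads


if __name__ == "__main__":
    try:
        print(head_dim(EMBED_DIM, NUM_HEADS))
    except ValueError as err:
        # Under the standard-attention assumption, the listed sizes do not split evenly.
        print(err)
        width = nearest_divisible(EMBED_DIM, NUM_HEADS)
        print(width, head_dim(width, NUM_HEADS))
```

If the assumption holds, a 6515-dim embedding leaves a remainder of 15 across 50 heads, so the model would need either a projection to a divisible width before the encoder or an attention variant that does not split the width evenly.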
Training config
optimizer=Adagrad, lr=0.557, scheduler=polynomial, warmup=1438
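One way to read this config is a linear warmup over the first 1438 steps followed by polynomial decay, sketched below. The total step count and the polynomial power are not given in the config, so `TOTAL_STEPS` and `POWER` are placeholder assumptions, as is the choice of a linear warmup and a decay to zero. The function names are hypothetical.

```python
# From the training config.
BASE_LR = 0.557
WARMUP_STEPS = 1438

# Assumptions: neither value appears in the config.
TOTAL_STEPS = 10_000  # assumed total number of optimizer steps
POWER = 1.0           # assumed polynomial power (1.0 gives linear decay)


def lr_factor(step: int) -> float:
    """Multiplier applied to BASE_LR at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return step / WARMUP_STEPS
    # Polynomial decay from 1 down to 0 over the remaining steps.
    remaining = max(TOTAL_STEPS - step, 0)
    return (remaining / (TOTAL_STEPS - WARMUP_STEPS)) ** POWER


def lr_at(step: int) -> float:
    """Absolute learning rate at a given optimizer step."""
    return BASE_LR * lr_factor(step)


if __name__ == "__main__":
    for step in (0, 719, 1438, 5000, 10_000):
        print(step, lr_at(step))
```

In PyTorch, `lr_factor` could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, wrapping a `torch.optim.Adagrad` optimizer created with `lr=0.557`.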