//=time() ?>
New preprint by @Ale_Raganato et al. shows that NMT models with manually engineered, fixed (i.e. position-based) attention patterns perform as well as models that learn how to attend. Super cool! https://t.co/keoF3PPvhd