Here's a video of a conditional model I trained last March. The text at the bottom is encoded into a vector and fed into StyleGAN's mapping network. The right side is a heatmap of the dlatents (the w vectors).
I guess I can say I was working on text-to-image synthesis before it was cool. 😉
A journey through conditional StyleGAN2 label space.
The z vector is fixed and only the labels change: each frame is generated from a weighted average of the w vectors obtained by passing the doc2vec encodings of the text at the bottom of the video through the mapping network.
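
For anyone curious, the blending step can be sketched roughly like this. This is not the original code, just a minimal illustration under loud assumptions: `toy_mapping` is a made-up stand-in for the trained mapping network, the random label vectors stand in for doc2vec encodings, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, LABEL_DIM, W_DIM = 8, 4, 8  # arbitrary toy sizes (assumption)

# Fixed random weights for the toy mapping function below.
_M = rng.standard_normal((Z_DIM + LABEL_DIM, W_DIM))


def toy_mapping(z, label):
    """Hypothetical stand-in for a mapping network: (z, label) -> w."""
    return np.tanh(np.concatenate([z, label]) @ _M)


def blend_w(w_vectors, weights):
    """Weighted average of w vectors; weights are normalised to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ np.stack(w_vectors)


def interpolation_frames(z, label_a, label_b, n_frames):
    """Keep z fixed and slide the blend weight from label_a to label_b."""
    w_a = toy_mapping(z, label_a)
    w_b = toy_mapping(z, label_b)
    steps = np.linspace(0.0, 1.0, n_frames)
    return [blend_w([w_a, w_b], [1.0 - t, t]) for t in steps]


z = rng.standard_normal(Z_DIM)            # the fixed z vector
label_a = rng.standard_normal(LABEL_DIM)  # placeholder for one text encoding
label_b = rng.standard_normal(LABEL_DIM)  # placeholder for another
frames = interpolation_frames(z, label_a, label_b, n_frames=5)
```

Each entry in `frames` would then go to the synthesis network to render one video frame. The first and last entries are the pure w vectors for the two labels, and everything in between is a mix.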