Neural player is a collection of unique NFTs generated by a neural network. Our task was to create a multimodal neural network that learns concepts in several modalities, namely verbal and visual, in order to build a better understanding of the world. The transformer is trained to autoregressively model text and image tokens as a single data stream. On the Christofari cluster, the model was trained for 37 days on 512 Tesla V100 GPUs and then for another 11 days on 128 GPUs, a total of 20,352 GPU-days (37 × 512 + 11 × 128).
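
To make the "single data stream" idea concrete, here is a minimal Python sketch of how text and image tokens could be merged into one sequence for autoregressive training. It is an illustration under stated assumptions, not the model's actual code: the vocabulary sizes and the helper names `build_stream` and `next_token_pairs` are hypothetical, and the real tokenizers are not described in this text.

```python
from typing import List, Tuple

# Hypothetical vocabulary sizes, chosen only for illustration.
TEXT_VOCAB_SIZE = 16
IMAGE_VOCAB_SIZE = 32


def build_stream(text_tokens: List[int], image_tokens: List[int]) -> List[int]:
    """Concatenate text and image tokens into one sequence.

    Image token ids are shifted by TEXT_VOCAB_SIZE so that both
    modalities share a single id space without collisions.
    """
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_tokens):
        raise ValueError("text token id out of range")
    if any(not 0 <= t < IMAGE_VOCAB_SIZE for t in image_tokens):
        raise ValueError("image token id out of range")
    shifted_image = [TEXT_VOCAB_SIZE + t for t in image_tokens]
    return list(text_tokens) + shifted_image


def next_token_pairs(stream: List[int]) -> List[Tuple[List[int], int]]:
    """Return (prefix, target) pairs: each token is predicted from all tokens before it."""
    return [(stream[:i], stream[i]) for i in range(1, len(stream))]


stream = build_stream([3, 7], [0, 5])
pairs = next_token_pairs(stream)
```

Because the text tokens come first, every image token in such a stream is predicted from the full caption plus the image tokens already generated, which is what lets a single transformer produce an image from a text prompt.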