I have $30. What I want is unusual: it is in the realm of generative adversarial networks (GANs), may change the future of computer science, and may earn you additional contracts, though not from me. What I want is a complete setup for a specialized neural network, running in TensorFlow, that can be trained on commodity CPUs rented through vast.ai.
Let me be clear: what I want is a folder with some files and instructions. Assume I am running stock Ubuntu 19.10, have a [login to view URL] account and a credit card, and know nothing else besides how to execute bash scripts, use SSH, and install packages. I want everything scripted so that all I have to do is follow the instructions and then train the network. You are required to test this network yourself in order to verify that it improves over time. The network must be configurable both to train and to generate results, final and intermediate. For it to function in a useful manner, I must later be able to configure it to dump intermediate products, and to accept an intermediate product as a command-line input.
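As a rough illustration of the command-line shape this implies, here is a minimal sketch using Python's argparse. The subcommand and flag names (`train`, `generate`, `--dump-intermediate`, `--intermediate`) are placeholders I am assuming, not a spec:

```python
import argparse

def build_parser():
    # Hypothetical CLI shape; flag names are illustrative placeholders.
    p = argparse.ArgumentParser(description="Train or run the compression network")
    sub = p.add_subparsers(dest="mode", required=True)

    train = sub.add_parser("train", help="train the encoder/generator pair")
    train.add_argument("--corpus", required=True, help="directory of ASCII text files")
    train.add_argument("--dump-intermediate", metavar="DIR",
                       help="if set, write intermediate products here during training")

    gen = sub.add_parser("generate", help="reconstitute text from an intermediate product")
    gen.add_argument("--intermediate", required=True,
                     help="path to a previously dumped intermediate product")
    gen.add_argument("--out", required=True, help="path for the reconstituted text")
    return p

# Example invocation, parsed in-process instead of from sys.argv:
args = build_parser().parse_args(["train", "--corpus", "corpus/", "--dump-intermediate", "dumps/"])
```

Keeping both directions behind one parser means the same script can dump intermediates during training and later accept one back on the command line, as required above.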
[login to view URL]
[login to view URL]
The outcome I want is a novel trained model that achieves the maximum possible compression for a specific data set (ASCII text files) with the minimum symbol library necessary to reconstitute it, by virtue of having intuitively learned the mechanics required to compress text.
A compression engine only achieves compression by saving bits overall. If the product of this engine is a model that can create compressed forms of text and reconstitute them, but the model's files, added to the compressed text, weigh as much as the original file did, then it has not learned to compress the data. For this reason we must train the engine on a large amount of input.
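The accounting above can be written down directly. This is a trivial sketch, with made-up example sizes, of the "model weight counts against the savings" rule:

```python
def net_savings_bits(original_bytes, compressed_bytes, model_bytes):
    """Bits actually saved once the decoder/model size is charged against
    the output. A scheme only truly compresses if this is positive: the
    model files are part of the payload needed to reconstitute the text."""
    return 8 * (original_bytes - (compressed_bytes + model_bytes))

# Made-up sizes: a 10 MB corpus squeezed to 2 MB, but the trained model
# itself weighs 9 MB -> a net loss, i.e. no real compression achieved.
print(net_savings_bits(10_000_000, 2_000_000, 9_000_000))
```

Training on a large corpus amortizes the fixed model cost over more input, which is why the corpus size matters for this criterion.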
I want to take a very large text corpus of pure ASCII, pure English sentences - the Calgary corpus, some article text from Wikipedia, some long books from Project Gutenberg, [login to view URL] content - convert it (temporarily) to PNG or another suitable image format, feed the image to a CycleGAN encoder, extract an intermediate product, measure its size, feed the intermediate product to the generator, and then use a comparator to determine whether the output matches the input.
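The text-to-image step in this pipeline amounts to laying the corpus bytes out as a 2D grid of pixel values. A minimal, stdlib-only sketch of that mapping and its inverse (actual PNG serialization, e.g. via Pillow's `Image.fromarray`, is left out here and is my assumption about how it would be done):

```python
import math

def ascii_to_grid(text, width=256, pad_byte=0):
    """Map an ASCII string to a width x height grid of byte values, one byte
    per grayscale 'pixel', zero-padding the last row. This grid is what would
    be serialized to PNG and fed to the image encoder."""
    data = text.encode("ascii")
    height = math.ceil(len(data) / width)
    grid = [[pad_byte] * width for _ in range(height)]
    for i, b in enumerate(data):
        grid[i // width][i % width] = b
    return grid

def grid_to_ascii(grid, length):
    """Inverse mapping: flatten the grid and drop the padding bytes."""
    flat = bytes(b for row in grid for b in row)
    return flat[:length].decode("ascii")
```

The original text length must be carried alongside the image (or encoded into it) so the padding can be stripped on reconstitution.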
The network is "rewarded" when the number of bits in the intermediate result is reduced.
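One way to make "amount of bits in the intermediate result" concrete is a Shannon-entropy estimate over the quantized latent symbols. This is a rough sketch of that measurement; a real training loop would need a differentiable proxy (e.g. an entropy penalty on the latent code), which is beyond this illustration:

```python
import math
from collections import Counter

def estimated_bits(symbols):
    """Shannon-entropy estimate of the bits needed to store a sequence of
    discrete symbols - a stand-in for the size of the intermediate product."""
    n = len(symbols)
    counts = Counter(symbols)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy * n

def reward(prev_bits, new_bits):
    """Positive when the intermediate result shrank relative to the last step."""
    return prev_bits - new_bits
```

Under this measure a uniform two-symbol sequence costs one bit per symbol, while a constant sequence costs zero, matching the intuition that more predictable intermediates are cheaper to store.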
The encoder and the generator, in this case, both play a role in "learning": the encoder learns to compress, the generator to decompress. The discriminator is not necessarily a functional part of this mechanism, since its only role is to verify that the reconstructed text is faithful; we are not trying to fool a discriminator here. We are using the intermediate image product in an attempt to naively discover vector relationships with semantic structures in the text, relationships that can produce arbitrary shapes of novel dimensional complexity, and to discover the most common such shapes, analogous to symbol libraries, and thus learn the optimal way to compress the input.