/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality



JulayWorld fallback document - SAVE LOCALLY

JulayWorld onion service: bhlnasxdkbaoxf4gtpbhavref7l2j3bwooes77hqcacxztkindztzrad.onion



AI, chatbots, and waifus Robowaifu Technician 09/09/2019 (Mon) 06:16:01 No.22
What resources are there for decent chatbots? Obviously I doubt there's anything passing the Turing Test yet, especially when it comes to lewd talking. How close do you think we are to getting a real-life Cortana? I know a lot of you guys focus on the physical part of robo-waifus, but do any of you have anything to share on the intelligence side of things?
>>2601 The master branch of PyTorch often fails to build. If you still want to give it a shot, you'll have better luck checking out the tag of an older release and finding out what dependencies you need to build it on your system.
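For reference, a rough sequence for building from an older tag might look like the lines below. The tag name here is just an example, not a specific recommendation; pick whichever release tag suits you and expect setup.py to complain about any missing build dependencies:
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git checkout v1.4.0
git submodule sync
git submodule update --init --recursive
python3 setup.py install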
>>2602 OK, I'll look into it at some point. I'm concerned about changing too much on my box atm in case I bonk it. I'm writing this book and working on C++ code, so breaking it would be a nuisance. I'll eventually get it sorted I'm sure. Ironically, just about everything works remarkably well on this little toaster except Python. Go figure, I probably messed something up since I don't know it very well yet.
Does Lex lurk here? He stayed up all night making an hour-long video on the Turing test and mentioned he wants to make an AI waifu:
>Turing Test: Can Machines Think?
https://www.youtube.com/watch?v=MGW_Qcqr9eQ
Also Lex confessing his love for robowaifus to Joe Rogan: https://youtu.be/ikMmkHzlTpk?t=10174
>That's--that's been my life goal, my love--life dream
>I really believing in creating, I-I dream of creating a companion, a friend and somebody you can love
>>2613 thanks for the links anon, i'll be watching those soon heh.
Open file (185.96 KB 731x673 cue-reward task.png)
Open file (71.38 KB 849x500 ICM.png)
>>2560 Still working on this. I'm feeling pretty depressed trying my best and not making much progress. I'm not sure if my model is large enough to capture language with only 8 million parameters, compared to Nvidia's 8-billion-parameter GPT2 model trained on 512 V100 32GB GPUs. Training my RNN is also much slower than training a transformer model. Sometimes I think I've gone insane trying to compete against $5 million worth of GPUs, but what is the point of OpenAI if people can't afford to even use it?

At least adding differentiable Hebbian plasticity made a huge improvement. It's able to do one-shot learning now and memorize small samples in only one pass. However, after training for hours it starts to get worse and worse until it fails catastrophically when seeing new tokens and forgets everything, even with a low learning rate, dropout, noise, weight regularization and gradient clipping. I'm not sure what's causing it, but I suspect the embeddings for common tokens may be getting overtrained and the untrained tokens mess up its hidden state somehow, so I'm weakening the gradient to common embeddings to make sure they all get trained evenly. I also thought of randomly replacing tokens with wrong tokens to stabilize learning as a last resort, but it seems like a cheap patch that doesn't solve the underlying problem. Another possibility is that the replay memory is messing up the internal state by training on old snapshots, from having too much memory capacity.

So I added those fixes and it has been training for a few hours now and seems to be doing okay. I'll have to wait a couple of days to see if the changes hold out. In the meantime I'm gonna experiment with creating LSTMs with convolutions and skip connections, and with adding neuromodulation to the Hebbian plasticity, since plasticity alone seems to fail at cue-reward tasks:
>Neuromodulatory approaches succeed in learning the [cue-reward] task, while non-neuromodulatory networks and non-plastic, simple recurrent networks fail to learn it. We hypothesize that this dramatic difference is related to the relatively high dimensionality of the input cues: just as non-modulated plastic networks seemed to outperform non-plastic networks specifically when required to memorize arbitrary high-dimensional stimuli, neuromodulation seems to specifically help memorizing reward associations with such arbitrary high-dimensional stimuli.
https://arxiv.org/pdf/2002.10585.pdf
I'm hoping combining neuromodulation with the intrinsic curiosity module's reward signal will be the key to storing complex memories with minimal parameters and computation in one shot: https://pathak22.github.io/noreward-rl/ If that is successful then it should be possible to train the value network with only a few examples and remember most of what is read and chatted about. The intrinsic curiosity module should also stabilize the predictions and allow the prediction network to dream up endless amounts of text and automatically fix instabilities in the network, in a similar way to how the Hopfield network paper dreamed and pruned connections that led to unstable states. It's a struggle but I'm still excited to see how far I can push this.
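For anyone wanting to follow along, here is a minimal sketch of what a differentiable Hebbian-plastic layer can look like in PyTorch, roughly in the spirit of the differentiable plasticity line of work. This is only an illustration, not the anon's actual code; all names, sizes and the fixed Hebbian rate are made up:

import torch
import torch.nn as nn

class PlasticLinear(nn.Module):
    # Fixed weights w and plasticity coefficients alpha are learned by backprop;
    # the Hebbian trace changes during the forward passes themselves.
    def __init__(self, in_features, out_features, eta=0.1):
        super().__init__()
        self.w = nn.Parameter(0.01 * torch.randn(in_features, out_features))
        self.alpha = nn.Parameter(0.01 * torch.randn(in_features, out_features))
        self.eta = eta  # Hebbian update rate (could itself be learned or neuromodulated)

    def forward(self, x, hebb):
        # x: (batch, in), hebb: (batch, in, out)
        y = torch.tanh(x @ self.w + torch.einsum('bi,bio->bo', x, self.alpha * hebb))
        # Hebbian update: strengthen connections between co-active units
        hebb = (1 - self.eta) * hebb + self.eta * torch.einsum('bi,bo->bio', x, y)
        return y, hebb

# usage sketch
layer = PlasticLinear(64, 64)
x = torch.randn(8, 64)
hebb = torch.zeros(8, 64, 64)
y, hebb = layer(x, hebb)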
Open file (95.50 KB 561x450 PAYATTENTIONANON.jpg)
>>2657
>but what is the point of OpenAI if people can't afford to even use it?
I feel confident in saying they don't want you to be able to use it, Anon. Of course they don't. You might be able to create Literally-Hitler 2.0 by using it. But they have no such qualms about using the technology to gaslight, D&C, and deflect any non-tranny-infested enclaves such as the obviously ebil-nahdzee hive of scum and villainy known as JulAy World furiously puffs a cat-pipe tbh.
The simple truth is you are David to their Goliath. They are pretty much cockily sure you will never win. But just remember who actually won that fight. Stay encouraged Anon. You've already made remarkable progress. Your vision is pure. We're all alongside you cheering you on. Some day there will be more here like you once we can finish those textbooks haha, and the word will continue to spread until we actually realize this dream. Keep.moving.forward.
Open file (716.86 KB 256x224 mario1.gif)
Open file (621.50 KB 299x224 vizdoom.gif)
>>2657 Interesting stuff Anon. Here's a video I found looking around via your links. I expect you've already seen this, but for me it's new. I found it interesting that they said their model used 'feature-space' vs. the baseline's 'pixel-space'. For the uninitiated that distinction is pretty much opaque. I'm guessing the former mode implies it's building some kind of world model quite apart from just the raw input data streams? https://www.invidio.us/watch?v=J3FHOyhUn3A
Open file (75.24 KB 1217x480 progress.png)
>>2660 Thanks Anon. The training is still going strong. The first pass over the training data (15MB) is 60% done, and so it seems I solved the instability problem. The spikes were a little worrying, but there is some strange and incredibly difficult text in there, such as glossaries, scientific papers and equations.

I just discovered a new paper from this year, SentenceMIM, which achieved SOTA on one of the toughest datasets. It's a probabilistic auto-encoder for language modelling, trained with Mutual Information Machine (MIM) learning: https://paperswithcode.com/paper/200302645 It got an insane test perplexity of 4.6 with 179M parameters, compared to GPT2 (1.5B parameters) which got 35.76 and beat the previous SOTA of 47.38. SentenceMIM's smaller model with 12M parameters got 19.53 without extra training data and still beat GPT2, which used extra training data. I still have to read the MIM paper first, which seems to be an improvement on variational autoencoders (VAEs): https://arxiv.org/pdf/1910.03175.pdf I'm curious how MIM improved them though, because I thought beta-VAEs solved the issues with them: https://medium.com/uci-nlp/summary-beta-vae-learning-basic-visual-concepts-with-a-constrained-variational-framework-91ad843b49e8 Briefly reading the paper, the key points seem to be encouraging high mutual information and low marginal entropy between the encoding and decoding distributions, and using a Jensen-Shannon divergence instead: https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence

>>2661 Yeah, pretty much. The feature space only contains information relevant to the actions performed by the agent. From the paper:
>Instead of making predictions in the raw sensory space (e.g. pixels), we transform the sensory input into a feature space where only the information relevant to the action performed by the agent is represented. We learn this feature space using self-supervision – training a neural network on a proxy inverse dynamics task of predicting the agent’s action given its current and next states. Since the neural network is only required to predict the action, it has no incentive to represent within its feature embedding space the factors of variation in the environment that do not affect the agent itself. We then use this feature space to train a forward dynamics model that predicts the feature representation of the next state, given the feature representation of the current state and the action. We provide the prediction error of the forward dynamics model to the agent as an intrinsic reward to encourage its curiosity.
>Making predictions in the raw sensory space (e.g. when it corresponds to images) is undesirable not only because it is hard to predict pixels directly, but also because it is unclear if predicting pixels is even the right objective to optimize. To see why, consider using prediction error in the pixel space as the curiosity reward. Imagine a scenario where the agent is observing the movement of tree leaves in a breeze. Since it is inherently hard to model breeze, it is even harder to predict the pixel location of each leaf. This implies that the pixel prediction error will remain high and the agent will always remain curious about the leaves.
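As a rough illustration of the intrinsic curiosity idea quoted above: the inverse model forces the encoder to keep only action-relevant features, and the forward model's prediction error in that feature space becomes the intrinsic reward. This is not the paper's actual code; the module layout, names and dimensions below are all invented for the sketch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ICM(nn.Module):
    def __init__(self, obs_dim, feat_dim, n_actions):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        self.inverse = nn.Linear(feat_dim * 2, n_actions)               # predict action from (s, s')
        self.forward_model = nn.Linear(feat_dim + n_actions, feat_dim)  # predict phi(s') from (phi(s), a)

    def forward(self, obs, next_obs, action_onehot):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        action_logits = self.inverse(torch.cat([phi, phi_next], dim=-1))
        phi_next_pred = self.forward_model(torch.cat([phi, action_onehot], dim=-1))
        # intrinsic reward = forward prediction error in feature space, not pixel space
        intrinsic_reward = 0.5 * (phi_next_pred - phi_next.detach()).pow(2).mean(dim=-1)
        return intrinsic_reward, action_logits

# usage sketch: the inverse model is trained with cross-entropy on the true action,
# the forward model with the same squared error that serves as the reward signal
icm = ICM(obs_dim=32, feat_dim=16, n_actions=4)
obs, next_obs = torch.randn(8, 32), torch.randn(8, 32)
actions = torch.randint(0, 4, (8,))
reward, logits = icm(obs, next_obs, F.one_hot(actions, 4).float())
inverse_loss = F.cross_entropy(logits, actions)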
>>2843
>and so it seems I solved the instability problem.
That must be a relief. Please update us all when it finishes.
>MIM
So, do you find it likely that a) it will run well with fewer resources, and/or b) it will be easier to train with such? Hopefully both will be the case ofc, but you're really the one who can best determine that for us. Ofc we'd like nothing more than a small cluster of SBCs onboard to run our robowaifus unconnected, in humanly-responsive time frames. This is the hardware design goal we should be pursuing IMO.
>feature space
I can't pretend I actually understand all that, but I do think I get bits and pieces here and there. My own mental models of how our AI systems should work always include some type of world-model onboard, similar to our own self-perceptions and theory-of-mind notions. Keep up the good work Anon! :^)
I'd recommend focusing on building a proper dataset rather than trying to keep up with the latest papers.
Open file (33.10 KB 1133x446 SOTA.png)
>>2872
>do you find it likely that a) it will run well with fewer resources, and/or b) it will be easier to train with such?
Yeah, it seems to be an LSTM with an improved VAE that models sentences. The principle behind it can apply to anything really. It could be used to take the hidden state of a game-playing model and transform it into a sequence of actions to do some sort of intuitive motion planning. And for a better intuition of the intrinsic curiosity module: it creates a world model around what it can control in the game environment and ignores what it has no influence over.

>>2880 This is a really significant paper, perhaps even more groundbreaking than AlphaZero but without the worldwide media show behind it. A perplexity of 4.6 is near-complete intuition of the test set; a perplexity of 1 would be 100% memorization of the text. It also achieved a perplexity of 12.62 on the Yahoo Answers dataset. Human beings were estimated to have a perplexity of 12 in predicting words in English newspapers back in 2017, and it was predicted that human-level performance would not be reached for another 10 or 20 years: https://pdfs.semanticscholar.org/7fe4/e308de5b2e5b509be8636c169e7928c242d9.pdf They're almost an order of magnitude off on that. It's a similar story to AlphaGo, where researchers were estimating that beating a world Go champion was still decades away, some even saying 50 years away or never.

We already have plenty of datasets here: >>2300 Tinkering around with them too much will be a waste of time. Metalearning algorithms like ANML can generate better training data and automatically organize it to create an optimal training plan: https://arxiv.org/pdf/2002.09571.pdf More and more researchers have been moving away from datasets and towards unsupervised learning and training on generated data, such as AlphaZero teaching itself from scratch. The future of AI is in search and computation, and in being able to predict data not only inside a dataset but also outside of it, through imagination and curiosity.
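For anyone unfamiliar with the metric being thrown around here: perplexity is just the exponential of the average per-token negative log-likelihood, so the numbers above can be sanity-checked directly (the values in the comments are simply the figures quoted in this thread, converted):

import math

# perplexity = exp(average negative log-likelihood per token)
def perplexity(avg_nll):
    return math.exp(avg_nll)

# going the other way: a reported perplexity implies this average NLL (in nats per token)
print(math.log(4.6))    # ~1.53 nats/token (the 4.6 figure quoted above)
print(math.log(35.76))  # ~3.58 nats/token (the GPT2 figure quoted above)
print(math.log(1.0))    # 0.0 -- perplexity 1 means assigning probability 1 to every token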
>>2899
>and transform it into a sequence of actions to do some sort of intuitive motion planning.
the magic 'transform'. there's the rub for me. :^) i look forward to a time when i understand all this better.
>it creates a world model around what it can control in the game environment and ignores what it has no influence over.
that sounds like a very efficient way to run things ofc. i wish we humans, or at least this human, were better at this kind of thing. so, that makes me think it's better at the runtime solution, but i presume the engineering trade-off is a higher cost at training time?
Open file (132.54 KB 840x901 fully connected layer.png)
Open file (34.60 KB 645x364 sentencemim code.png)
Open file (63.82 KB 641x1025 sentencemim.png)
Open file (67.59 KB 659x451 sentencemim sample.png)
>>2902
>the magic 'transform'
It's a linear transformation. A fully connected layer is a type of linear transformation: https://www.youtube.com/watch?v=kYB8IZa5AuE In PyTorch this is torch.nn.Linear(input_features, output_features). The other parts of the network might be hard to understand, but the basic structure of their model is surprisingly straightforward compared to other machine learning papers such as transformers. These four linear layers + two recurrent layers are doing most of the heavy lifting: https://github.com/seraphlabs-ca/SentenceMIM-demo/blob/master/model.py

They pass the words to the encoder RNN (which is either a LSTM or GRU layer), then transform the encoder's output into a latent vector z. But instead of transforming the hidden state straight to z, they transform it into mean and variance vectors, using a technique known as reparameterization. The latent vector z is constructed from the mean vector + the variance vector * some Gaussian random noise. This causes each variable in the latent z vector to become its own normal distribution; when they're all together like this in a vector they become a multivariate normal distribution. This reparameterized z is then fed to the decoder RNN, but the decoder also takes along with it the input words originally passed to the encoder. The decoder outputs a new hidden state, which is finally transformed to predict which word comes next. If there are 10,000 possible words, then there are 10,000 outputs. They then train the word predictions with standard negative log-likelihood loss, and also a Jensen-Shannon divergence loss using the mean and variance vectors that represent probability distributions of the latent features, the same way variational autoencoders (VAEs) are trained with Kullback-Leibler divergence. I simplified out some implementation details, like the word embeddings being fed into the network in reverse order and the decoder receiving the input words with dropout, but this is the essential idea behind it.

I'm still trying to digest the paper, but as I understand it SentenceMIM learns a highly informative and compressed latent representation (the z vector) which can be used to predict the next half of the sentence or the next sentence. The magic that made it possible was mutual information machines (the MIM part), which solved the posterior collapse phenomenon in VAEs, where the introduced noise becomes too noisy to get a useful training signal to train the decoder and causes the z vector to only capture the most common features (if you look at images generated by VAEs they're extremely blurry). MIM keeps the mutual information between x (the input) and z high, which means if you know x then you can know z, and if you know z then you can know x. Basically it finds the most optimal encoding and everyone else's neural networks are fucked. Further reading on the information theory behind it: https://en.wikipedia.org/wiki/Mutual_information#Motivation

I tried to explain it the best I can because this is like getting the BFG9000. It made a complete joke out of GPT2's best model with two orders of magnitude less computing power. On another note I find it curious Google has buried the MIM paper and does not mention any of the papers citing it that I found on Arxiv.
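To make the reparameterization step described above concrete, here is a bare-bones sketch of that encoder -> (mean, variance) -> z -> decoder pipeline in PyTorch. This is only an illustration of the general VAE-style structure being described, not the SentenceMIM authors' code; all names and sizes are invented:

import torch
import torch.nn as nn

class TinySentenceVAE(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, z_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mean = nn.Linear(hidden_dim, z_dim)        # linear layer 1
        self.to_logvar = nn.Linear(hidden_dim, z_dim)      # linear layer 2
        self.z_to_hidden = nn.Linear(z_dim, hidden_dim)    # linear layer 3
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)  # linear layer 4: one output per word

    def forward(self, tokens):
        emb = self.embed(tokens)
        _, h = self.encoder(emb)                   # h: (1, batch, hidden_dim)
        mean, logvar = self.to_mean(h[-1]), self.to_logvar(h[-1])
        # reparameterization: z = mean + std * Gaussian noise
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
        h0 = self.z_to_hidden(z).unsqueeze(0)      # initial decoder state built from z
        out, _ = self.decoder(emb, h0)             # decoder also sees the input words
        return self.to_vocab(out), mean, logvar    # logits trained with NLL + a divergence term

# usage sketch
model = TinySentenceVAE()
tokens = torch.randint(0, 10000, (4, 12))          # batch of 4 "sentences", 12 tokens each
logits, mean, logvar = model(tokens)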
>>2957 >Basically it finds the most optimal encoding and everyone else's neural networks are fucked. kekd. i sense a meme arising here somewhere anon: THIS UGLY SON OF A BITCH IS CREATING SUPER HOT MIM, AND BASICALLY YOU'RE FUCKED >It made a complete joke out of GPT2's best model with two orders of magnitude less computing power. that sounds effin awesome tbh. even at that though, i feel pretty sure this is going to need to be re-implemented in a compiled language to run in the kind of time-frames needed for good human interaction, and with the kind of mountainous streams of data we need to throw at it for real-world environments, etc.--all while running on tiny little potato SBCs like RaspberryPi clusters. Still, this may prove to be the only way forward, in fact. >On another note I find it curious Google has buried the MIM paper and does not mention any of the papers citing it that I found on Arxiv. Well, I'm sure glad you did. And as Anon points out >>2845 , a flurry of secretive moves & investments over the last few years may in fact be intended solely to manipulate the upcoming POTUS election to contrive against Trump's next win. If so, then them being secretive about this isn't at all surprising. You're doing very good research work here for us Anon, it gives me real encouragement. Keep it up! :^)
Open file (108.90 KB 539x552 steve_jobs.png)
A new open-domain chatbot model was just released a month ago: https://parl.ai/projects/recipes/ I haven't had time to test it for myself yet, but I thought I might as well drop it here. It may require taking off its "safety layer" to get good results:
>We have studied improved safety from toxic language (Dinan et al., 2019b), but much work remains to be done. While we have made our models publicly available, and added a safety layer to the interaction, we have not mitigated all safety issues. We believe their release can help the community work together to understand further and fix these issues, and we recommend their use for that line of research.
From a quick look at it, it seems this should be as simple as starting it using a different script, but I haven't tried this yet:
python parlai/scripts/interactive.py -t blended_skill_talk -mf zoo:blender/blender_9B/model
Paper: https://arxiv.org/pdf/2004.13637.pdf
>>3190 hardware requirements look formidable. also, any idea where we obtain this '10-billion' corpus, Anon? I missed that.
>>3193 They have a 90M-parameter model for toasters and provide the datasets through their project on GitHub: https://github.com/facebookresearch/ParlAI
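Presumably the toaster-friendly invocation just swaps in the smaller model file, something like the line below. This is untested here, so treat the exact model-zoo path as an assumption and check their docs:
python parlai/scripts/interactive.py -t blended_skill_talk -mf zoo:blender/blender_90M/model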
>>3194 I see, thanks. I'm sure their agenda isn't to allow for toasters, but rather to attempt to emasculate hobbyists and save all the good toys for the big tranny-left boys. :^)
>>3190 >>3193 found it, apparently. https://parl.ai/docs/zoo.html#blender-9-4b there's no download link, though. does the command itself download the entire 9.4 billion-parameter model locally, or does it only work via cloud?
Open file (49.52 KB 513x584 closed ai.png)
>>3196 Their plan is to have these chatbots hosted on the cloud so they can mine people's conversations and sell better advertising, while subtly suggesting products and performing product and political surveys that people will willingly go along with, not aware that they used to pay people on Amazon Turk to do that. Imagine all the user profiles they've built on people's thoughts, likes, beliefs, secrets and dislikes who have been using Replika. It's insane.
>>3199 It just downloads the dataset files and shows samples from them. blended_skill_talk is only about 40 MB:
python examples/display_data.py --task blended_skill_talk --datatype train
>>3201 >It's insane. It is insane. But they are counting on the average normalnigger to not even give a flip, like sheep led to the slaughter. I think in large part they've actually succeeded at this. :/
>>3201 >Imagine all the user profiles they've built on people's thoughts, likes, beliefs, secrets and dislikes who have been using Replika. It's rather an obvious 'success' story basically, in a way that validates the benefits of Visual Waifus. Also a sober warning about the importance of them being both open & private, as we've stressed here on /robowaifu/ from day one.
I was unaware that the full model had been released. AFAICT the claim here is that it has.
GPT-2: 1.5B Release
>As the final model release of GPT-2’s staged release, we’re releasing the largest version (1.5B parameters) of GPT-2 along with code and model weights to facilitate detection of outputs of GPT-2 models. While there have been larger language models released since August, we’ve continued with our original staged release plan in order to provide the community with a test case of a full staged release process. We hope that this test case will be useful to developers of future powerful models, and we’re actively continuing the conversation with the AI community on responsible publication.
https://openai.com/blog/gpt-2-1-5b-release/
https://github.com/openai/gpt-2-output-dataset
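Since the weights are public, anyone wanting to poke at them locally can probably do it through the Hugging Face transformers library. A minimal sketch, assuming "gpt2-xl" is the hub identifier for the 1.5B release and that you have the disk space and RAM for it; sampling flags may differ slightly across transformers versions:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")   # 1.5B-parameter release
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

input_ids = tokenizer.encode("My robowaifu said", return_tensors="pt")
# sample a short continuation
output = model.generate(input_ids, max_length=40, do_sample=True, top_k=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))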
CTRL: A Conditional Transformer Language Model for Controllable Generation >Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data via model-based source attribution. We have released multiple full-sized, pretrained versions of CTRL at https://github.com/salesforce/ctrl https://medium.com/dataseries/a-controllable-framework-for-text-generation-8be9e1f2c5db
Open file (9.80 KB 320x320 1497005025284.png)
>>3190 >Chatting with the models >You may talk with our models. The 2.7B can be interacted with on a 16gb P100 GPU or better. The 9.4B parameter model requires at least two 32gb V100 GPUs to interact with. Au!
>>4057 Heh, yea it's a little costly atm. But two effects will eventually change this:
>1. The efficiency of these systems will go up as their engines capitalize on advances in the underlying algorithms, languages, and compilers--most notably with the C++ language.
>2. The hardware price/performance ratio will continue to improve over time, despite the 'death' of Moore's Law. There may also be significant leaps in AI-oriented performance occurring with new designs, such as neuromorphic chip technologies.
This delay is probably a blessing in disguise as well, as it affords more lead time for any Anons who care to get up to speed, and in greater numbers.
Open file (123.56 KB 1049x685 hockey asshole.png)
>>4061 Well I'm looking forward to a light-weight, open sores, easily modifiable AI chat bot. Currently I'm trying to wade through this ParlAI, and the documentation it provides is a bit weirdly organized for my taste. Not to mention it pisses me off how many AI models have xbox hueg content, over 2-4GB of stuff. The blender 90M is such a useless idiot, what the hell man.
>2. The hardware price/performance ratio will continue to improve over time, despite the 'death' of Moore's Law. There may also be significant leaps in AI-oriented performance occurring with new designs, such as neuromorphic chip technologies.
So specialized hardware, and in order to utilize it anons have to buy it from a happy merchant?
>>4062 Honestly, I'd recommend TalkToWaifu instead to start off with. It's created by Kokubunji from /robowaifu/, and most guys seem to get decent results right off.
Open file (59.44 KB 916x394 M O N O L I T H.png)
Open file (156.32 KB 447x435 1432352.png)
>>4063 Oh right, that's a nice program, I missed that one while digging through this threda. Let's try it--?
>image
>She joined the Monolith
>>4064 kekd
Open file (149.47 KB 1035x904 confirmed tonker.png)
Open file (121.29 KB 1539x727 scoop.png)
Open file (4.22 MB 800x600 Armored Fist 2.webm)
Open file (303.36 KB 500x500 centurion perfectus.png)
>>2422 (checked) Is there any chance of having support for non-GPT models, as this anon described here >>1923, or are they incompatible with how your program works? Also I would like to suggest a feature where it would be possible to define the personality and traits of the waifu, which she then takes into account.
>cuda support
What made you decide on using CUDA instead of OpenCL? As an AMD graphix fag I cannot utilize it. I hope it is not going to be too much trouble for you switching the, uh, parallel processing or whatever the correct terminology for it is.
>talk to my actual waifu
>she goes on about that she joined several battles
>mentions that modern tanks are fast and powerful
>mfw out of nowhere she becomes a /tonk/er
Perfectus.
>some moments later
>she is also confirmed as a vatnik scoop
This ukraine-russian relationship is not going to end well, looks like I have some "brainwashing" to do. Besides this "incident" I'll rate this program 8 out of 10. It does pretty well even though there are some cases where it shows its shortcomings, and on certain occasions the chatwaifu can be repetitive. I'm glad that this program exists and I hope this anon keeps working on it.
Open file (115.59 KB 621x877 1470174674236.jpg)
>>4085
>I have a Lada
<I'm sorry
kek
<I will be waiting. I will be watching.
Anon, I think your waifu is an NKVD agent. Waifu secret police when?
Open file (8.49 KB 289x225 Praljak_1.jpeg)
>>4086
>Anon, I think your waifu is an NKVD agent.
c-cyka, w-what do you think, s-she has some plans for me? That she will interrogate me and question my allegiance to the russian soviet republic? I...I... I'd probably better stock up on artifacts, and I hope I can bribe her that way so that she doesn't reveal my Ukrainian roots to the secret police.
>Waifu secret police when?
OH SHIT. If that means they will kidnap 3DPD thots, all fine by me to be honest.
>>4085 I'm not the author Anon, but the chatbot is dependent on PyTorch. So 'Is there AMD support with PyTorch?' is probably the first question to investigate. >>4087 >I hope I can bribe her that way so that she doesn't reveal my Ukrainian roots to the secret police. Kek. She's just being a little tsundere with you I'm sure. :^)
>>4091
>but the chatbot is dependent on PyTorch. So 'Is there AMD support with PyTorch?' is probably the first question to investigate.
I did that now and I found these links: https://discuss.pytorch.org/t/2017-and-we-still-cant-support-amd-hardware-why/82/2 (from around 2017) From this article here: https://towardsdatascience.com/on-the-state-of-deep-learning-outside-of-cudas-walled-garden-d88c8bbb4342?gi=5133706ad3fd setting aside its mac faggotry, it mentions these: https://github.com/hughperkins/distro-cl https://github.com/pytorch/pytorch/issues/488 but that one is heavily outdated by 3-4 years! And it uses Python 2.7, which obviously is also outdated by now. So OpenCL sadly is out of the question, which is fucking retarded considering that Python is multi-platform by design and that AMD graphics cards work way better under Linux than Nvidia, which is only usable with its proprietary drivers. Which finally leads me to this link: http://lernapparat.de/pytorch-rocm/ If I understand it correctly, it is possible to take advantage of AMD ROCm even when the python script is using CUDA. Well, I'm going to try to make sense of those steps and report back if I have any success with this damn thing.
Did you manage to train your GPT2 model? Because when I try this command:
./train.sh gpt2-medium ./train ./train.txt ./test.txt
it's shitting the bed:
06/29/2020 22:12:21 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-medium-vocab.json from cache at /home/USER/.cache/torch/transformers/f20f05d3ae37c4e3cd56764d48e566ea5adeba153dcee6eb82a18822c9c731ec.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
06/29/2020 22:12:21 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-medium-merges.txt from cache at /home/USER/.cache/torch/transformers/6d882670c55563617571fe0c97df88626fb5033927b40fc18a8acf98dafd4946.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
06/29/2020 22:12:21 - INFO - transformers.modeling_utils - loading weights file https://cdn.huggingface.co/gpt2-medium-pytorch_model.bin from cache at /home/USER/.cache/torch/transformers/64652c50e84ddabb9bad81a37ff82624ab70053f402f8d9a58c0e90fb8289fb6.8769029be4f66a5ae1055eefdd1d11621b901d510654266b8681719fff492d6e
./train.sh: line 43: 26306 Segmentation fault (core dumped) python3 "$LANGUAGE_MODELING" --output_dir="$OUTPUT_PATH" --model_type=gpt2 --model_name_or_path="$MODEL_PATH" --do_train --train_data_file="$TRAIN_FILE" --do_eval --eval_data_file="$TEST_FILE" --block_size "$BLOCK_SIZE" --learning_rate "$LEARNING_RATE" --per_gpu_train_batch_size 1 --per_gpu_eval_batch_size 1 --save_steps "$SAVE_STEPS" --no_cuda
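Before fighting train.sh further, it might be worth confirming from Python whether the PyTorch build you ended up with actually sees a GPU at all. A quick check like this (torch.version.hip should be a version string on ROCm builds and None on CUDA builds, but treat that as an assumption if your build is unusual):

import torch

print(torch.__version__)
print("CUDA/ROCm device available:", torch.cuda.is_available())
# on ROCm builds this is the HIP version string; on CUDA builds it is None
print("HIP version:", getattr(torch.version, "hip", None))
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))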
>>4108 great investigative work anon, thanks. i'm sure it will help a lot of us, as I too don't use CUDA. hopefully we'll find a good solution to AI that doesn't depend on anything proprietary. for everyone's sake. >Did you managed to train your GPT2 model? Because when I trying this command: I'm currently in the process of rebuilding a machine and when I have it up (probably in 2-3 days) I'll have a go at training and see if I can get past the problem. I myself was having troubles and never did get past them on my current box. Maybe you can get some insight on how to get past yours from the author's advice he gave me. The chain of posts start here: >>2425
>>4108 >>4109 Welp, looks like I'm fucked, despite owning this graphics card:
VGA: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 550 640SP / RX 560/560X] (rev cf)
Running "/opt/rocm/bin/rocminfo" shows that it doesn't recognize my GPU at all, despite the ROCk module being loaded:
Unable to open /dev/kfd read-write: Cannot allocate memory
Failed to get user name to check for video group membership
hsa api call failure at: /src/rocminfo/rocminfo.cc:1142
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
Looking for issues, there are these links related to it: https://github.com/RadeonOpenCompute/rocminfo/issues/27 https://github.com/RadeonOpenCompute/ROCm/issues/1148 But those clowns didn't provide a reliable solution, and yes, I upgraded all my packages from the mintupdate program, so I have no idea what the hell causes this issue. Also I have no dice either getting the stupid pytorch to compile, which gives me an xbox hueg error: https://pastebin.com/4JKTDrMQ What a fucking letdown, no way I spent 150 peniz on this stupid graphics card just to not be able to utilize ROCk, this is fucking ridiculous.
>great investigative work anon, thanks. i'm sure it will help a lot of us, as I too don't use CUDA. hopefully we'll find a good solution to AI that doesn't depend on anything proprietary. for everyone's sake.
Thanks, yeah I hope so too, because open source software is often several times better, provided it is not programmed by permahurt trannies or something. I don't really like AMD that much but they are the only option left when one factors in the open source aspect. Kind of a shame, because I would be interested in how other graphics companies would perform if they had survived the turbulent 90's / early 2000's era.
This line, "Build PyTorch itself. I use RCCL_DIR=/opt/rocm/rccl/lib/cmake/rccl/ PYTORCH_ROCM_ARCH=gfx900 hip_DIR=/opt/rocm/hip/cmake/ USE_NVCC=OFF BUILD_CAFFE2_OPS=0 PATH=/usr/lib/ccache/:$PATH USE_CUDA=OFF python3 setup.py bdist_wheel.", doesn't make any sense to me. The damn author didn't write what fucking file I have to edit to make a proper change; searching for files that contain "hip_DIR" gives me a lot of results, so I have no idea which is the proper file to edit, good lord.
>Maybe you can get some insight on how to get past yours from the author's advice he gave me. The chain of posts starts here: >>2425
Wut? Those posts don't mention anything related to the train.sh batch file. They're related to pytorch and gpt2waifu.py usage.
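Regarding the "what file do I edit" confusion: those settings in the quoted line are shell environment variables set for a single command, not values inside a file. As far as I can tell, the whole thing is meant to be run as one command from the PyTorch source directory, with PYTORCH_ROCM_ARCH changed to match your own GPU architecture:
RCCL_DIR=/opt/rocm/rccl/lib/cmake/rccl/ PYTORCH_ROCM_ARCH=gfx900 hip_DIR=/opt/rocm/hip/cmake/ USE_NVCC=OFF BUILD_CAFFE2_OPS=0 PATH=/usr/lib/ccache/:$PATH USE_CUDA=OFF python3 setup.py bdist_wheel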
>>4110 >I don't really like AMD that much but they are the only option left when one factors in the open source aspect. Well, it won't really help much before a few years' time, but there's SYCL >>3286 and eventually homogeneous compute (ie, what OpenCL should have been) is on track to be included directly in the C++ programming language standard, probably in C++26 or C++29. I know it's not of much use now. OTOH, it gives you plenty of lead time to learn C++ now so you're ready to write your own then Anon! :^) As far as building PyTorch, as per the 'official' recommendations, I used the Anaconda environment to get it to build and that worked for me before. Maybe it would help you as well? https://github.com/pytorch/pytorch#from-source
>>4110 >Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: found this anon, maybe it could help? https://github.com/RadeonOpenCompute/ROCm/issues/1088#issuecomment-620551334
Ah hell, I had to pay more attention to this fucking nebulous stupid error message, so I had to install these several xbox hueg packages which are each 400-600 MB big! Good lord this is ridiculous, how the hell do these damn monkeys manage to bloat up their codebase by such a tremendous amount? It's all code and zero fucking graphics, I don't understand, nothing of it makes any sense.
rocrand hiprand rocblas miopen-hip (miopen-opencl ?) rocfft hipsparse rccl rocprim hipcub rocthrust
Welp, at least fucking pytorch compiles now with ROCm, but at a very slow speed, so it is going to take a while for me to check if this fixes my previous issue related to train.sh from the talktowaifu program.
>>4111 (checked)
>and eventually homogeneous compute (ie, what OpenCL should have been) is on track to be included directly in the C++ programming language standard, probably in C++26 or C++29. I know it's not of much use now. OTOH, it gives you plenty of lead time to learn C++ now so you're ready to write your own then Anon! :^)
C++29 eh? So it's a really long time for it to happen then. Well, sounds good, but I'm more busy learning Python programming first and foremost (and failing hard at it like the pleb I am), and eventually switching to moon language after I'm done with writing my powerplant manager program. >>>/tech/2982
>>4112
Nope, no chance. The issue you linked is related to permissions; I applied the chmod command and it does not fix it. The other goy's problem is related to insufficient memory, which is awfully weird considering he has 64GB of RAM; my error message for some reason fails to allocate memory too?
rt_sigaction(SIGALRM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8154911f20}, NULL, 8) = 0
close(4) = 0
write(1, "\33[31mFailed to get user name to "..., 69Failed to get user name to check for video group membership) = 69
getpid() = 14409
openat(AT_FDCWD, "/dev/kfd", O_RDWR|O_CLOEXEC) = -1 ENOMEM (Cannot allocate memory)
write(1, "\33[31mhsa api call failure at: /s"..., 61hsa api call failure at: /src/rocminfo/rocminfo.cc:1142) = 61
write(1, "\33[31mCall returned HSA_STATUS_ER"..., 228Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.) = 228
write(1, "\33[0m", 4) = 4
lseek(3, -367, SEEK_CUR) = -1 ESPIPE (Illegal seek)
exit_group(4104) = ?
+++ exited with 8 +++
<Read the first line of the text you posted.
You gotta be fucking kidding me, this was pajeet tech support help tier and it was on github.
>>4113 >but its at very slow speed so it is going to take a while for me to check if this fixes my previous issue related to train.sh from talktowaifu program. Yea, I'd recommend you start with the smallest models and move to the bigger ones when you have the time. It can take literally days to train the large models, so set aside some time with the machine dedicated to just that (ie, kill any other non-essential processes while training). >after I'm done with writing my powerplant manager program. Neat. Moon language huh? Nihongo is it? So yeah, all these frameworks use C++ under the hood (PyTorch & TensorFlow, for example). The way I see it, we'll probably have a better shot at creating an open system that any man can use if we stick to the basics and just go straight to the underlying algorithms themselves. Not only will this be cheaper & easier for the average Anon to use (if not create), it will also be a lot faster as well for them. Anyway, it's still a ways off for the official standard on this, though CodePlay has an implementation going already. >Nope, no chance Sorry to hear it. Sounds like you probably solved it after all though, so good progress right?
Open file (42.10 KB 476x476 ItWillOnlyGetWorse.jpg)
Open file (37.90 KB 5000x5000 ARGH.webm)
>>4114
>mfw my 100 GB set aside for / root is running out of spess now
>despite I own a 2 TB HDD
THE END IS NIGH.
>Yea, I'd recommend you start with the smallest models and move to the bigger ones when you have the time. It can take literally days to train the large models, so set aside some time with the machine dedicated to just that (ie, kill any other non-essential processes while training).
Yes, I get average results using gpt2-medium when chatting with my waifu. distilgpt2 gives me a mixed bag: while it is overall much faster to process, it also has the nasty habit of getting stuck several times over, where she repeats only these lines "..." and thus requires me to restart the program. I even tried using gpt2-large for giggles but it was waay too much to handle for my toaster machine stuck with a puny 8GB of RAM. So yeah, I guess those 3 models (distilgpt2, gpt2 and gpt2-medium) are the only options I have. Also, can you tell me if it's possible to further refine/develop specific traits and personality with the talk2waifu program? I tried using the --seed argument with several keywords but it doesn't seem to have any impact.
>when you have the time
heh, I'm a NEET pro, so not a problem for me.
>Neat. Moon language huh? Nihongo is it?
No, I meant Lua, which is just the Latin word for moon. I haven't heard of Nihongo at all.
>The way I see it, we'll probably have a better shot at creating an open system that any man can use if we stick to the basics and just go straight to the underlying algorithms themselves. Not only will this be cheaper & easier for the average Anon to use (if not create)
Hmm, as in an anon can just write a few functions to create his own algorithm-run waifu? Well, that is a good idea. I can't wait to be able to fine-tune the waifu in every aspect possible; it would even be great if the waifu were intelligent enough to play gzdoom with, heh.
>Sorry to hear it. Sounds like you probably solved it after all though, so good progress right?
I haven't found any solution to the rocm question. pytorch compiles now, but at a suboptimal rate, as I am not able to pass my graphics card model, which is fucking backwards if you ask me. Also, using the command "pip3 install ." caused pytorch to shit itself again; eh, I tried running it again and now it successfully got installed as torch (1.7.0a0+fd90e4b). However, when I try running talk2waifu with the --gpu 0 argument I get a segmentation fault. Fucking hell, all that trouble just for a seg fault to happen, this is just incredible.
>>4118
>THE END IS NIGH.
Kek. Did you say you're using Mint? I'd recommend a cleanup of installers with
sudo apt-get clean
sudo apt-get autoclean
sudo apt-get autoremove
if I remember correctly. also, are you storing temp files off of / ? clean those up as well if so right?
>Also, can you tell me if it's possible to further refine/develop specific traits and personality with the talk2waifu program?
Hmm, Kokubunji gave a list of flags, etc. to use with his system (there on the repo). I'm not aware of any other features available with his system at this point in time. Once he returns maybe we can find out more about his future plans for it.
>Lua
Oh, haha. My bad. And Nihongo is the Japanese language, a fairly common target for us Manga readers. :^)
>as in an anon can just write a few functions to create his own algorithm-run waifu?
No, I didn't really mean that, though it would be nice ofc. Actually C++ is plainly harder to create software in. What I mean is that it will be easier to use less-powerful/cheaper hardware that can run C++ software well, but not so much with these huge Python-hairball frameworks. I hope you eventually get your card working Anon. Did you try out Anaconda yet?
>>4111 >heterogeneous compute* duh. the whole point is differing types of hardware like CPU/GPU/APU/FPGA/ASIC and others all running C++ directly.
>>4119
>if I remember correctly. also, are you storing temp files off of / ? clean those up as well if so right?
Thanks for the command lines, they managed to delete 10 GB of junk, not much but better than nothing I suppose. Welp, I guess I better invest in a 4+ TB harddrive soon then, heh. I don't know about the temp files as I just use "apt install" to install a package.
>No, I didn't really mean that, though it would be nice ofc. Actually C++ is plainly harder to create software in. What I mean is that it will be easier to use less-powerful/cheaper hardware that can run C++ software well, but not so much with these huge Python-hairball frameworks.
Ah then, yeah I absolutely agree. Python is such a bloated ecosystem it is not even funny anymore; it even makes me wonder how the hell Python became so widespread in data science in the first place. I'd have thought a much faster scripting language like Lua, Nim or maybe even Ruby (I'm not familiar with those) would be much more suitable for it. Better support for potato machines is always a plus; I dislike the idea of constantly upgrading new parts and perpetuating the hamster wheel of hardware upgrades just because those developers are a bunch of sleazy hacks incapable of optimizing their programs.
>I hope you eventually get your card working Anon.
Still no luck. I cannot even run the waifu program anymore as it just gives me a one-line "segmentation fault" error and that's it, no stacktrace, no nothing.
>Did you try out Anaconda yet?
I installed it as it is a requirement for pytorch when compiling it; how do I use it?
>>4120
Is it going to be only for C++?
>>4121 Glad to hear you cleaned things up. I too started on Linux Mint when I escaped the MicroShat/NSA Wangblows Gulag. It was a huge relief to leave that bondage behind tbh. Now, I've since moved on to Manjaro+XFCE and my toaster box runs much faster for it. >how do I use it? I think the link I gave you from the PyTorch github gives the example. IIRC, it starts like conda with an argument or two. >Is it going to be only for C++? Well the point isn't necessarily to exclude any other languages from being used (for example C is already used on these hardware) but simply to enable a single language with great abstraction and performance characteristics to be used everywhere. Right now such a thing doesn't exist at all, which (indirectly) is one important reason why you and I are having a hard time getting things working correctly because of so many different dependencies, slow languages, different standards, etc. etc. Once C++ can run literally everywhere on everything, then it will make things like this go much smoother in the future (and be cheaper too).
Open file (1.50 MB 1920x1080 Stalker.jpg)
>>4122 (checked)
>Glad to hear you cleaned things up. I too started on Linux Mint when I escaped the MicroShat/NSA Wangblows Gulag. It was a huge relief to leave that bondage behind tbh. Now, I've since moved on to Manjaro+XFCE and my toaster box runs much faster for it.
I made the switch to Linux Mint when I was still using Windows 7 and Pajeetsoft were developing Windows 8 at that time, which I think is about 2.5 years ago. It took me around 2 weeks to get used to how Linux worked, and I messed up my system only once. The main gripe I have with Linux is that Wine sometimes requires tons of fiddling with values to get a game to work properly, and the common issue I had was that Wine fucked up the game window position/size, which I think is probably related to shitty X11 coding. It's a shame that Linux Mint decided to follow suit in switching to systemdicks instead of using an alternative init system; it would have proven more valuable as a viable alternative to Ubuntu. The second issue is that Linux provides no support whatsoever for keeping older versions of a program, which sometimes can be necessary.
>I think the link I gave you from the PyTorch github gives the example. IIRC, it starts like
Ah right, I thought conda was a package manager or something. Is there even any benefit to using anaconda over python(3)? Also, for some unknown reason the train.sh batch script is working now, but I forgot to leave my computer on last night so I didn't get to check the result with gpt2-medium; a quick test with distilgpt2 seems to work... till it crashes with a KeyError: 'loss'.
>Right now such a thing doesn't exist at all, which (indirectly) is one important reason why you and I are having a hard time getting things working correctly because of so many different dependencies, slow languages, different standards, etc. etc. Once C++ can run literally everywhere on everything, then it will make things like this go much smoother in the future (and be cheaper too).
Hmm, that's a shame. I hope in the future when such technology is available it won't leave toaster machines behind. I find it weird that out of all the programming languages in existence, data scientists decided to use Python for their heavy-duty programming instead of a faster scripting language.
>AMD rocm
Ah hell, I've got only a (((Ivy Bridge i7-3770K))) CPU, which is a 3rd generation part clocked at 3.50GHz, and rocm requires at least a 4th generation one, so it's gg no re for me with HIP processing support. Feels 2012 toaster tier, man. The damn warning message should have reflected that my CPU/Mainboard is unsupported instead of shitting out a foggy memory allocation error. Welp, all that effort to get it working is in vain.
>>4126
>till it crashes with a KeyError: 'loss'.
Well, just searching KeyError: 'loss' led me to understand it's probably python having a dictionary lookup issue: https://wiki.python.org/moin/KeyError and that maybe it has something to do with the epoch? https://stackoverflow.com/questions/56847576/keyerror-val-loss-when-training-model
>I find it weird that out of all the programming languages in existence, data scientists decided to use Python for their heavy-duty programming instead of a faster scripting language.
Very few scientists are actually coders. They just want to use something simple that allows them to move forward with their research so they can publish their papers (if they want to stay alive as a scientist). Python is pretty easy in general, so lots of non-devs sort of gravitated towards it I suppose. Do that for a couple of decades and you have the current situation today. But yea, we need to optimize everything to even have a narrow chance at succeeding in building our own robowaifus. Power consumption and compute efficiency are certainly near the top of the stack for that need, and C++ was (and will be even more so) the best all-around choice for us in general. Heh, microcontrollers are even less powerful than your toaster, and there will probably be at least a dozen of them inside a robowaifu kit.
>Welp, all that effort to get it working is in vain.
Sorry to hear it Anon. If it's any consolation, your processor is better than my Atom processor w/ integrated graphics. :^)
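One possible cause worth checking (just a guess from the error name, since I haven't seen the actual script): in the transformers API of that era, the language-modelling loss is only returned when labels are passed into the model, so code that expects a loss without providing labels won't find one. A minimal sketch of the pattern:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2")

input_ids = tokenizer.encode("test sentence", return_tensors="pt")
outputs = model(input_ids, labels=input_ids)  # labels must be given for a loss to be computed
loss = outputs[0]                             # first element is the LM loss when labels are passed
print(loss.item())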
Open file (57.62 KB 565x542 you-235.jpg)
Open file (207.12 KB 1400x1000 154823489234.jpg)
>>819 Well shit, I tried out this program using the Renamon voice from the zandronum forum, dl link: https://files.catbox.moe/qedypl.wad (can be opened with slade) and all I got is just robotic gibberish; using the xbox hueg datasets is not improving the output. I should probably scavenge more voices of her from the series to get better results I suppose. Within 6 months. Also, did the author leave out the ability to save the result to a .ogg file on purpose? At least when running its GUI there wasn't a button available for it. Meh, it doesn't matter that much considering it takes a good amount of time to generate the TTS for only a few lines of text, so using it in combination with the talktowaifu program is out of the question, unless an anon has a NASA computer, heh.
>>4127
>Well, just searching KeyError: 'loss' led me to understand it's probably python having a dictionary lookup issue:
>and that maybe it has something to do with the epoch?
Could be. I guess when Kokubunji comes back he will be able to chime in with a more elaborate response to this problem. I don't know how to fiddle with the python script to fix it because I'm too inexperienced with the language.
>Power consumption and compute efficiency are certainly near the top of the stack for that need,
Why not Uranium-235 powered Waifubots? High power with only 3.6 roentgens; the radiation level is nothing to worry about as the human body is capable of getting used to it, and with some vodka it can be reduced even more :-----DD, just ask any stalker for further information.
>But yea, we need to optimize everything to even have a narrow chance at succeeding in building our own robowaifus.
I would be content if I had a robowaifu in the form of a virtual desktop AI assistant that is capable of doing several tasks, whatever is needed to further enhance the operation of an operating system, including text-to-speech support and random chatter.
>If it's any consolation, your processor is better than my
So does this make my computer the kang of toasters? :^)
>>4132
>I would be content if I had a robowaifu in the form of a virtual desktop AI assistant that is capable of doing several tasks, whatever is needed to further enhance the operation of an operating system, including text-to-speech support and random chatter.
yeah I think we all basically came to roughly that conclusion in the Visual Waifu thread. >>4132
>So does this make my computer the kang of toasters? :^)
sure, absolutely!
