/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

Build Back Better

More updates on the way. -r



Open file (2.21 MB 1825x1229 chobit.png)
Robowaifu@home: Together We Are Powerful Robowaifu Technician 03/14/2021 (Sun) 09:30:29 No.8958
The biggest hurdle to making quick progress in AI is the lack of compute to train our own original models, yet there are millions of gamers with GPUs sitting around barely getting used, potentially an order of magnitude more compute than Google and Amazon combined. I've figured out a way though we can connect hundreds of computers together to train AI models by using gradient accumulation. How it works is by doing several training steps and accumulating the loss of each step, then dividing by the amount of accumulation steps taken before the optimizer step. If you have a batch size of 4 and do 256 training steps before an optimizer step, it's like training with a batch size of 1024. The larger the batch size and gradient accumulation steps are, the faster the model converges and the higher final accuracy it achieves. It's the most effective way to use a limited computing budget: https://www.youtube.com/watch?v=YX8LLYdQ-cA These training steps don't need to be calculated by a single computer but can be distributed across a network. A decent amount of bandwidth will be required to send the gradients each optimizer step and the training data. Deep gradient compression achieves a gradient compression ratio from 270x to 600x without losing accuracy, but it's still going to be using about 0.5 MB download and upload to train something like GPT2-medium each optimizer step, or about 4-6 mbps on a Tesla T4. However, we can reduce this bandwidth by doing several training steps before contributing gradients to the server. Taking 25 would reduce it to about 0.2 mbps. Both slow and fast computers can contribute so long as they have the memory to hold the model. A slower computer might only send one training step whereas a fast one might contribute ten to the accumulated gradient. Some research needs to be done if a variable accumulation step size impacts training, but it could be adjusted as people join and leave the network. All that's needed to do this is a VPS. Contributors wanting anonymity can use proxies or TOR, but project owners will need to use VPNs with sufficient bandwidth and dedicated IPs if they wish that much anonymity. The VPS doesn't need an expensive GPU rental either. The fastest computer in the group could be chosen to calculate the optimizer steps. The server would just need to collect the gradients, decompress them, add them together, compress again and send the accumulated gradient to the computer calculating the optimizer step. Or if the optimizing computer has sufficient bandwidth, it could download all the compressed gradients from the server and calculate the accumulated gradient itself. My internet has 200 mbps download so it could potentially handle up to 1000 computers by keeping the bandwidth to 0.2 mbps. Attacks on the network could be mitigated by analyzing the gradients, discarding nonsensical ones and banning clients that send junk, or possibly by using PGP keys to create a pseudo-anonymous web of trust. Libraries for distributed training implementing DGC already exist, although not as advanced as I'm envisioning yet: https://github.com/synxlin/deep-gradient-compression I think this will also be a good way to get more people involved. Most people don't know enough about AI or robotics enough to help but if they can contribute their GPU to someone's robowaifu AI they like and watch her improve each day they will feel good about it and get more involved. 
At scale though, some care will need to be taken that people aren't agreeing to run dangerous code on their computers, whether through a library that constructs the models from instructions or through something else. And where the gradients are calculated does not matter: they could come from all kinds of hardware, platforms and software like PyTorch, TensorFlow or mlpack.
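For anyone unfamiliar, here's roughly what a single node's gradient-accumulation loop looks like in PyTorch. Everything below is a stand-in (toy model, random data) just to show the mechanics:

import torch
from torch import nn

model = nn.Linear(32, 1)                      # stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = [(torch.randn(4, 32), torch.randn(4, 1)) for _ in range(512)]  # batch size 4

accum_steps = 256                             # 4 x 256 = effective batch size of 1024
optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale so summed grads average out
    loss.backward()                           # gradients accumulate in the .grad buffers
    if (step + 1) % accum_steps == 0:
        optimizer.step()                      # one "large batch" optimizer step
        optimizer.zero_grad()

In the distributed version, the .grad buffers are what would get compressed and sent to the server instead of being applied locally.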
This post is off topic. Feel free to delete it or move it if you don't want it here.
>>17551
>Yes this is what defeat looks like.
That's not a defeat scenario; that's what I expect is the inevitable scenario for all parties, even well-funded ones. Large models have gotten 10x bigger year-on-year since 2018, and I think training costs have risen faster than that. Companies are eventually going to be forced to specialize in their AI direction, at which point none of them will be "the best" at everything. At that point, if you want "the best" AI, you'll need to be able to plug into multiple models from multiple parties regardless of how well-funded you are. In the long run, no model performs better than a mixture of all the leading models.
I'm not concerned about new legislation hindering open source AI. The US is far too afraid of China taking the lead on tech to introduce legislation that hinders AI development in any meaningful way. AI deployment, maybe, but AI development, no. I would guess that any Five Eyes country will be the same. The EU is going to get screwed on legislation as usual. That sucks, but as far as I know, the majority of Western open source AI enthusiasm is in the US and UK, and EU legislation is largely irrelevant in this. (Sorry if you're in the EU. If you do get screwed on legislation, maybe some of us can help proxy your work.)
At least in the US, it's more likely that people will use current legislation against open source AI development, but I think even that is unlikely to succeed at scale. As far as taking advantage of open source code goes, it looks like Microsoft will be forced to take the lead on the defense thanks to GitHub and Copilot, and they are very familiar with large legal battles around software. As far as making sure AI has access to copyrighted data, Google has an enormous stake in this, and they have won at least one related battle (Authors Guild, Inc. v. Google) in the US Supreme Court with a ruling that's arguably broad enough to cover AI use cases. As far as open source development goes, open source code and published research papers fall under the First Amendment, and this has been tested at the federal level even for something as extreme as cryptography (Bernstein v. Department of State). These cases can be overturned by the Supreme Court, but public opinion does not sway the current Supreme Court, as has been demonstrated recently with Roe v. Wade. From the defense to the precedents to the judges, everything seems to work in favor of open source software in the US.
>>17557
>This post is off topic.
It certainly is, but one of rather high quality. Thanks, actually.
>Feel free to delete it or move it if you don't want it here.
I may move the conversation to /meta or the news thread. As to your post, you're simply replying in kind to the already-offtopic poster who seems to do this frequently, but as he is an outsider and newcomer by his own admission, no surprises tbh. You can get a hallpass this time Anon. :^)
Open file (3.31 MB 1580x1500 demo.gif)
There's now a PyTorch framework that makes parameter-efficient finetuning simple and painless for any model: https://github.com/thunlp/OpenDelta
They also published a paper earlier this year studying parameter-efficient finetuning in depth: https://arxiv.org/abs/2203.06904
I'm using it at the moment for finetuning CLIP and will report back with results when done. My goal is to make it possible to finetune Stable Diffusion models with only 4GB by finetuning an image encoder to the frozen text encoder, then finetuning the text encoder with the frozen image encoder.
LAION reported that using a very large batch size (up to 159k) can help reach even higher performance, which is fertile ground for using LAMB. I've been thinking that since optimizer updates will take a long time at that scale, or even at the 32k OpenAI used, the master node will have time to validate whether the gradients sent in actually improve the model. This will lower the trust required in a training group, since someone sending erroneous or malicious gradients will only be wasting bandwidth and can be b& if necessary.
If this works then it could seed a culture of distributed training groups by riding on Stable Diffusion's popularity. Then next year, when open-source RLHF takes off after ChatGPT goes behind a paywall and people want to implement their own assistants, they will have the necessary tools to collaborate on language models, voice models, and whatever else they see fit.
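A rough sketch of how that validation might work on the master node; everything here (toy model, validation batch, trial learning rate) is a placeholder just to illustrate the idea:

import copy
import torch
from torch import nn

model = nn.Linear(32, 1)                                  # stand-in for the real model
val_x, val_y = torch.randn(64, 32), torch.randn(64, 1)    # hidden validation batch

def gradient_improves(model, contributed_grads, lr=1e-4):
    # Apply the contributed gradients to a scratch copy and accept them only if
    # the loss on the hidden validation batch goes down.
    trial = copy.deepcopy(model)
    with torch.no_grad():
        for p, g in zip(trial.parameters(), contributed_grads):
            p -= lr * g                                    # plain SGD-style trial step
        before = nn.functional.mse_loss(model(val_x), val_y)
        after = nn.functional.mse_loss(trial(val_x), val_y)
    return after < before

In practice the trial step would use the real optimizer state rather than plain SGD, but the accept/reject logic would be the same.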
>>18163 This sounds very exciting Anon. If we can truly solve this in an effective and manageable way using only open source tools, I'd suggest this will be a big breakthrough for the world generally. Certainly it could make the Robowaifu@home dream a reality. Godspeed.
One general question about this whole idea here: Will it be possible for people to rent a GPU online somewhere and contribute that way? My thinking is that some people might have a little bit of money to spend but don't want to use their computer at home. Ideally we could get to a point where people with some cryptocurrency would pay other people to rent a GPU in a data center somewhere, and then wire this GPU into the robowaifu training cluster.
>>18211 I don't see why not, Anon. It's pretty common today to rent compute on the cloud. There are yuge evil vendors, and literally thousands of smaller vendors doing effectively this. GPU cycles may be a rather limited niche for the smaller guys in general. Crypto payment isn't something I can speak to one way or other though.
>>18211 One other thing I'd add: in the vidya & film industries it's not uncommon for animation studios to rent/lease 'wee clusters on wheels' for a production's post-production render period. They're expensive, but you can throw a lot of compute around pretty quickly if you're in the right locations. They deliver right to your door! :^)
>>18211
Yeah, something else I've thought of is creating a system where people can earn Monero by training, but it needs a lot of thought yet to make it viable. I doubt it would be worthwhile to train on GPU rentals then, since it would cost more than paying people directly for their GPUs. Vast.ai, for example, takes a 25% cut of all profit. If people want to contribute to a project but don't have compute, it would make more sense to donate straight to the project.
A training group creator would put up some crypto and automatically pay people by how much their contributions improved the model on a hidden test set. Gradient descent is really noisy though, and it might be too costly to evaluate a large enough batch size to get a good measurement, but I think it might work out just by taking an EMA, similar to the optimizer's beta1 parameter, to smooth out the noise.
In the future I imagine the end-user program automatically connecting to the highest paying project or whatever preferences are set by the user, so they can just install it, forget it and get paid periodically. The whole thing would be decentralized, with trackers people can announce their training projects on, kind of like torrents. Realizing this will be difficult though, because custom models need custom code to run, which a malicious actor could use to compromise people's systems, but I'm confident there's a way to parse most models with ast into blueprints that can be used to safely construct models on other people's computers from within the program without having to download any code.
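To illustrate the EMA idea: each contributor's payout score could be a smoothed version of their measured improvements, so a single noisy evaluation doesn't swing it much. The beta value and what exactly gets measured are still open questions; this is just the shape of it:

def update_contributor_score(prev_score, measured_improvement, beta=0.9):
    # Higher beta = heavier smoothing; one noisy measurement moves the score only a little.
    return beta * prev_score + (1 - beta) * measured_improvement

score = 0.0
score = update_contributor_score(score, 0.002)   # e.g. this round improved hidden test loss by 0.002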
>>18219 These are all very fascinating ideas, thanks.
>>18219
>EMA
Mind explaining this in a bit more detail for us, Anon? Is this anything similar to finance's use of the calculation?
>In the future I imagine the end-user program automatically connecting to the highest paying project or whatever preferences set by the user, so they can just install it, forget it and get paid periodically.
That would be pretty remarkable. Trust is key here, and for most men that will translate directly into:
>Does my open-source (robo)waifu perform better now?
As you're likely well aware, this is an uphill struggle given the Globohomo & others' instant-gratification dogma.
>but I'm confident there's a way to parse most models with ast into blueprints that can be used to safely construct models on other people's computers from within the program without having to download any code.
I'll shortly be working on some minor ASTs with C++ that at the very least should be smoking fast if nothing else.
Open file (120.17 KB 521x658 lo-fi.png)
Open file (193.70 KB 908x614 WiSE-FT.png)
Open file (109.94 KB 1077x472 lo-fi OPT.png)
It looks like bandwidth and parameter-efficient finetuning aren't needed after all.
>lo-fi: distributed fine-tuning without communication
https://arxiv.org/abs/2210.11948
>When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.
Nodes can finetune from a common checkpoint independently without any communication and merge their weights at the end of training. This method was used to finetune GPT-JT (>>18241)
And very, very related:
>Robust fine-tuning of zero-shot models
https://github.com/mlfoundations/wise-ft
>Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning approaches substantially improve accuracy in-distribution, they often reduce out-of-distribution robustness. We address this tension by introducing a simple and effective method for improving robustness: ensembling the weights of the zero-shot and fine-tuned models (WiSE-FT). Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements out-of-distribution, while preserving high in-distribution accuracy. On ImageNet (in-distribution) and five derived distribution shifts, WiSE-FT improves out-of-distribution accuracy by 4 to 6 percentage points (pp) over prior work while increasing in-distribution accuracy by 1.6 pp. WiSE-FT achieves similarly large robustness improvements (2 to 23 pp) on a diverse set of six further distribution shifts, and in-distribution accuracy gains of 0.8 to 3.3 pp compared to standard fine-tuning on seven commonly used transfer learning datasets. These improvements come at no additional computational cost during fine-tuning or inference.
Basically, interpolating a finetuned CLIP checkpoint with its starting checkpoint improves its performance both in-distribution and out-of-distribution. A similar phenomenon is seen in lo-fi, where the individually low-performing models outpace the baseline when merged together, although it performed slightly worse on language modelling. I believe this is due to how weights get updated in attention layers. I haven't investigated this thoroughly yet, but a single backward pass is usually enough for a transformer to remember something, and then those weights are rarely ever touched again. I've inspected the text encoders of various Stable Diffusion models and often 70-95% of the parameters are exactly the same. So what nodes learn from mutually exclusive data will effectively be washed out when merged together.
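To make the two ideas concrete, here's what lo-fi-style averaging and WiSE-FT-style interpolation boil down to at the state-dict level. This is my own paraphrase, not code from either paper:

import torch

def average_weights(state_dicts):
    # lo-fi-style merge: plain average of independently finetuned nodes
    return {name: torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
            for name in state_dicts[0]}

def interpolate_weights(zero_shot, finetuned, alpha=0.5):
    # WiSE-FT-style merge: blend the starting checkpoint with the finetuned one
    return {name: (1 - alpha) * zero_shot[name].float() + alpha * finetuned[name].float()
            for name in zero_shot}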
Something I've been experimenting with in merging Stable Diffusion models is only merging the weights that are significantly different, and smoothly blending them in with a sigmoid function on their standard deviation from the primary model. So secondary-model parameters close to the primary model have an alpha of 0 and ones 4 standard deviations away have an alpha of 1. Usually merging too many models causes detail loss, but my new merging method preserves the details and the significant features of the other models mixed in. I think a similar method is worth exploring in transformers and may push lo-fi past the baseline on language modeling.
>>18225
>Mind explaining this in a bit more detail for us, Anon? Is this anything similar to finance's use of the calculation?
Yup, an exponential moving average. In PyTorch it would be done something like:

def ema_update(target, value, beta):
    # Keep `beta` of the old value and blend in (1 - beta) of the new one, in place.
    target.data *= beta
    target.data += (1 - beta) * value.data
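And for completeness, a rough sketch of the deviation-gated merge I described at the top of this post, operating on two state dicts. The sigmoid center and steepness here are my own guesses at reasonable values, not the exact numbers I use:

import torch

def selective_merge(primary, secondary, center=2.0, steepness=2.0):
    # Blend in only the secondary parameters that deviate strongly from the primary,
    # measured in standard deviations of the per-tensor difference.
    merged = {}
    for name, p in primary.items():
        p = p.float()
        s = secondary[name].float()
        diff = s - p
        z = diff.abs() / (diff.std() + 1e-8)              # deviation in std units
        alpha = torch.sigmoid(steepness * (z - center))   # ~0 near the primary, ~1 for outliers
        merged[name] = (1 - alpha) * p + alpha * s
    return merged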
>>18243
>It looks like bandwidth and parameter-efficient finetuning aren't needed after all.
Big if true. This will be amazing if it turns out effective. Surely thousands of groups will quickly rebel against """OpenAI""" and their ilk and start a true AI@home ecosystem? If that happens then we're golden with Robowaifu@home, I'd guess? See any real roadblocks to my simple prognostication, Anon?
Thanks for the explanation of EMA, BTW. :^)
>===
-add 'EMA' cmnt
Edited last time by Chobitsu on 12/15/2022 (Thu) 06:04:05.
>>18245
Robowaifu training groups will probably be a small niche relative to everything else going on, due to the lack of immediate returns. Hell, I want robowaifus more than anything, but I'm working on Stable Diffusion stuff now because money. It's hard to foresee how things will play out because AI is accelerating everything so fast. One thing is for certain: robowaifus will have to become competitive with other projects to remain relevant.
Training groups will need a blue ocean strategy for their models, offering a feature or capability that no other model has. For example, combining ChatGPT with Stable Diffusion so you can iteratively generate images and make edits to them in natural language would draw tons of attention because there would be no competitors, and people would find new uses for it you didn't intend, like generating choose-your-own-adventure visual novels or getting commentary on the differences between two given pictures, such as feedback to an artist on their sketch with a reference image.
All kinds of projects are going to pop up. Some might start one for a model that lets people design and generate unique voices. Some might work on game AI. Some might want something that helps with video editing. Essentially people will be training new components for more and more complex systems of whatever tickles their fancy. And when training groups miss subsets of data, other groups will pop up to fill the gap.
I imagine many people will need to see results to stick around. If a week goes by and a model has only improved 0.1% and they see some new interesting project, they'll jump ship and contribute to whatever is trending. Active development and regular updates will be crucial to maintaining contributed computing power. There will definitely be a social aspect to it too, since people will be working together on common interests. Collaborating with YouTube creators will likely be a good way to boost projects. The most crucial thing though will be being the first to create an easy-to-use program for doing federated learning, kind of like how Automatic1111 became the de facto web UI for Stable Diffusion and attracted other devs to contribute to it. By pioneering the right tools, people will come.
>>18254
Thanks Anon, you've given us something to chew on.
>blue ocean strategy
I would argue that the entire specific paradigm we've adopted here on /robowaifu/ -- something that's never been done before in human history, something that literally millions of men would instantly want themselves the moment they see it, and finally something done very inexpensively such that literally any motivated individual can build one in a few months time as a hobby endeavor -- easily qualifies us as our own blue-ocean strategy. This is something truly revolutionary here. If we pull it off right it will change human history.
Open file (9.20 KB 200x127 1596826135558.png)
Well fuck, someone already made a LoRA finetuner for Stable Diffusion, so the entire network can be trained on only 6 GB with 6 MB of parameters: https://github.com/cloneofsimo/lora
People are reporting results similar to what the papers have, such as getting better results than a full finetune with Dreambooth on small datasets and being able to use much higher learning rates. I've got an idea though on how to create a completely decentralized way of training models, where people can share parameters with similar projects in a way that benefits each other while training completely separate models. Getting ahead of the curve is now or never.
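For anyone wondering what the trick is: LoRA just adds a small trainable low-rank update alongside each frozen pretrained weight, so only the low-rank factors (a few MB) ever need to be trained or shared. A conceptual sketch, not the linked repo's actual API:

import torch
from torch import nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base.requires_grad_(False)   # frozen pretrained layer
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen full-rank path plus the trainable low-rank update B @ A
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale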
>>18256 It's not just about doing something new though. An essential part of a blue ocean strategy is creating new demand by delivering value to people. At the moment all we're doing is research and creating things only people familiar with the tech can replicate or use. Being able to rapidly turn ideas into MVPs people want to use will be the most important skill to have in the coming years.
>>18287
This is a good thing.
>>18288
No debate. But the simple fact is we're at a watershed moment in human history. It behooves all of us to grab it with both hands. The consequences will nevar be the same! :^)
Does anyone here have experience with DC++? It may be a way to transfer work amongst contributors. https://sourceforge.net/projects/dcplusplus/
Open file (148.99 KB 743x700 adan.png)
New optimizer that outperforms LAMB and works well with similarly large batch sizes across a wide variety of models and tasks: https://github.com/lucidrains/Adan-pytorch
I haven't tried it out yet, but it looks promising if the claims hold: similar results with half the training.
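If I'm reading the repo right, usage should be a drop-in optimizer swap, roughly like the following. The import and constructor arguments should be double-checked against the README; the model and hyperparameter values here are only illustrative:

import torch
from adan_pytorch import Adan

model = torch.nn.Linear(512, 512)        # placeholder model

# Adan uses a three-beta scheme; these values follow the paper's defaults as I recall them
optim = Adan(model.parameters(), lr=1e-3, betas=(0.02, 0.08, 0.01), weight_decay=0.02)

x = torch.randn(8, 512)
loss = model(x).pow(2).mean()
loss.backward()
optim.step()
optim.zero_grad()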
>>18596 Presuming you are suggesting this will assist with the effort at distributed training via the proposed Robowaifu@home, et al, would you clarify your ideas about the way that would work, Anon? TIA.
Open file (748.64 KB 1137x672 zero-eacc4.png)
e/acc may be our allies in this. I'll give more details as necessary, but their goals are in direct alignment with ours, though they aren't about the waifu angle.
>>18597
Yeah, by increasing the batch size, training nodes can process more data before having to send any updates, which are costly, especially when finetuning a full model. More importantly, they report getting the same results in half the time and ultimately a better test loss. From my own playing around with merging model weights together, it doesn't seem like exchanging updates is even required before the end of training. Merged weights almost always perform better at each respective task, or better at one and almost equal on the other. Some experimenting will have to be done to see how well Adan-optimized models merge together.
>>18608
Once you have an AI assistant it's impossible to go back. I'm sure they will get hooked. Also, what I was saying about model merging: other groups can finetune on whatever they fancy and /robowaifu/ can finetune on waifu stuff, and we can merge that work together, so long as we both start from the same base model.
>>18608
Likely true. Seemingly ironically, I also care about the women being abused by the lies they've swallowed from the Globohomo. Second to my concerns about the abused men, ofc. Once the Robowaifu Age begins, there will be a rapid decline in the power of feminism (after a highly-tumultuous period), and a return to healthier, more trad lifestyles for them will ensue.
<Win-win-lose.
>Men-Women-Gl*bohomo
>>18609
>Merged weights almost always perform better at each respective task or better at one and almost equal on the other
That sounds like a natural benefit directly in line with our goals ITT, then?
An older paper that is relevant to training on a budget: https://arxiv.org/abs/2001.04063
Rather than just predicting the next token, they predict the next n tokens. They reported achieving state of the art using ~1/4 of the training epochs over the data, with each extra token prediction costing about +15% in training time. I think this paper is even more relevant today because multilingual tokenizers like OPT's don't tokenize whole words but rather pieces of words. For example, 'robowaifu' is [1001, 14271, 102, 1594, 257]. Text generation often gets mixed up and uses 'robot waifu' and 'robot wife' randomly with no consistency. Greedy search can improve this, but I think there's something fundamentally wrong with a probability distribution that spits out nonsense like a Markov chain because it's myopically focused on the next token. I'm going to try predicting 2 tokens on my next finetune, then 3 and 5, and see if it improves consistency. Something I also want to investigate is predicting the embedding of the next sentence. I think this could solve a lot of the issues with using sentence embeddings for external memory too.
>>18610
Yeah, it's getting a lot easier to leverage other people's work. For example, you can combine powerful new models like GPT-JT-6B with GPT-4chan and get the best of both. Which reminds me, there is a new project for running (and even finetuning) 176B-parameter models from home by pooling resources with people online: https://github.com/bigscience-workshop/petals
>Inference runs at ≈ 1 sec per step (token) — 10x faster than possible with offloading, enough for chatbots and other interactive apps.
Not useful for real-time applications, but people could use this to generate high-quality training data for smaller models to learn from.
>>18617
Neat! It certainly seems to be following an architectural model that lines up pretty closely with the ones discussed ITT?
>Not useful for real-time applications but people could use this to generate high-quality training data for smaller models to learn from.
That would be a serious benefit if we can use this data to train smaller systems to run on mobile-suited hardware!
BTW, question: their (BigScience) loicense wouldn't be suited to anything that any sane individual might say or do. [1] AFAICT, the only """approved""" uses would be (to put it in Current Year vernacular) 'Triple the pozz, and quadruple-down with the pronouns'. Any chance their work (or at least their approach) could be reasonably mimicked in such a way as to avoid such evils?
Thanks Anon, encouraging stuff. Godspeed with your efforts! :^)
1. https://huggingface.co/spaces/bigscience/license (atch. A)
>===
-minor grmr, prose edit
Edited last time by Chobitsu on 01/09/2023 (Mon) 12:14:32.
>Apache Beam is a unified programming model that provides an easy way to implement batch and streaming data processing jobs and run them on any execution engine using a set of different IOs.
Anyone know anything about this? Could this be used for this project ITT?
https://dzone.com/articles/how-to-develop-a-data-processing-job-using-apache
Would robowaifu@home work out the same way other @homes do? I thought you couldn't train NNs on distributed systems.
>>19000
>digits
Yes, that's the plan, and yes, it's a difficult proposition. I think if you scour this thread (and partially in other AI-oriented ones) you'll get some idea of the most promising tacks on the table ATM.
>>9029
>>8995
Regarding the need for broad hardware/device support: I have recently been made aware of a JS backend for TensorFlow. [ https://www.tensorflow.org/js/guide/platform_environment ]
It uses GPU acceleration by way of WebGL shaders, so in theory any device running a WebGL-enabled browser should be able to take advantage of that to squeeze out some extra compute. The browser is a pretty good lowest common denominator as far as availability goes, though I can't say how feasible it would be to integrate into distributed learning. Even so, I'm checking it out for possible local use since my machine has neither CUDA nor ROCm available as acceleration options.
>===
-fix hotlink
Edited last time by Chobitsu on 02/08/2023 (Wed) 23:51:04.
>>19669 Interesting, Anon. Please let us know what you discover.
>Distributed inference via MPI
https://github.com/ggerganov/llama.cpp/pull/2099
>via (>>23819)
>===
-add crosslink
Edited last time by Chobitsu on 07/04/2023 (Tue) 19:20:44.
>>23826
There's an update from ggerganov himself: https://github.com/ggerganov/llama.cpp/pull/2099#issuecomment-1627804506
>It would be a fun thing to try and potentially achieve world-first inference of 65B model on a cluster of Raspberries
Open file (242.69 KB 482x500 1682300966710.gif)
>>23928 >yfw it's real
>>23928
aand he merged it
https://github.com/ggerganov/llama.cpp/pull/2099
https://twitter.com/ggerganov/status/1678438186853203974
>ggerganov approved these changes 38 minutes ago
Anyone who is good at C/C++ can test it right now, ofc if u have 10+ RPis with at least 8 GB RAM on them :/
>>23933 I mean to have a cluster of them inside my robowaifu anyway, so I'll just make a mid-range plan to begin stocking up on some. Any idea if other h/w platforms are supported by it, 01?
>>23935
Anything that runs code and has some sort of decent CPU and plenty of RAM, with Linux on top of it. Also, llama.cpp is perfectly optimized for ARM NEON, but that's mostly a macOS thing, or not, idk, so it should also be good on any capable ARM hardware :/
Open file (38.71 KB 599x357 Screenshot_6.png)
>>23937 it's pretty much confirmed for now.
>>23928 >>23933 >>23938 This is really nice news 01 ! That's a talented man to be sure. It's quite gratifying that his own goals for his project seem to align rather well with several of ours. Thanks Anon.
>>23975
You seem disappointed 01, but I consider this a serious breakthrough. Here we are, with performant, opensource solutions providing distributed LLM inferencing, currently working across a smol collection of tiny-capacity, commodity processors (with not a GPU in sight). This is the 65B model, isn't it? Compare this situation to just one year ago! :^)
I'd suggest we all wait till RobowaifuDev and other waifu-targeting AI researchers begin their research using these same llama.cpp mechanisms. I predict eventual impressive price/performance leaps past this one (which is already quite remarkable IMO). Patience Anon, we're all gonna get there! :^)
>===
-prose edit
Edited last time by Chobitsu on 07/16/2023 (Sun) 13:18:41.
>>23975
hmm weird
https://github.com/ggerganov/llama.cpp/issues/2209
>Not sure about 65B, but I tried a 33B model that mmaps 26GB on a Mac mini with 24GB RAM. It swapped and worked at 46 seconds per token. Then I added a second Mac mini over MPI and together they worked at 450ms per token, which is 100x faster.
>>23987 AFAICT, neither Ethernet nor MPI will be particularly fast in this context. I imagine the speedup (if it is in fact accurate, and not some type of systemic measurement error) is largely due to doubling the RAM available to the combined system. >=== -sp edit
Edited last time by Chobitsu on 07/16/2023 (Sun) 22:53:00.
>Torrented Models
Not sure if this is what the last few comments are about or if it was already covered, but I ran into this:
Petals - https://petals.ml/
Research - https://research.yandex.com/blog/petals-decentralized-inference-and-finetuning-of-large-language-models
Petals Google Colab - https://colab.research.google.com/drive/1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8
>This notebook will guide you through the basics of Petals, a system for inference and fine-tuning 100B+ language models without the need to have high-end GPUs. With Petals, you can join compute resources with other people over the Internet and run large language models such as LLaMA, Guanaco, or BLOOM right from your desktop computer or Google Colab.
Found through: https://www.youtube.com/watch?v=8jEGVaRKmFc
>>24086 Nice, thanks Anon! No, I don't recall seeing this here IIRC.
Framework for distributing user-defined Spiking Neural Networks and other algorithms.
>"distributedArchitecture is a Tiny distributed computation framework for spiking ANNs and more! distributAr's primary purpose is to distribute Spiking Artificial Neural Networks among multiple hosts or CPUs. It provides mechanisms for sharing spike times among running threads. Threads running in a single process communicate through shared memory. Threads running in different processes or on different machines communicate via network multicast."
https://github.com/rand3289/distributAr
> conceivably-related question (>>23971)
>NuNet is building a globally decentralized computing framework that combines latent computing power of independently owned compute devices across the globe into a dynamic ecosystem of compute resources, individually rewarded via tokenomic ecosystem based on NuNet Utility Token (NTX). https://www.nunet.io
Open file (33.58 KB 855x540 NuNet Tokenomics.png)
>>26510 While the basic idea behind the claims is sound (ours is much better however), the entire thing strikes me as yet another scam. If I'm correct, then it's an effort to sweep up any unencumbered compute resources not already controlled by the GH, into their already-obscenely-large hardware stable.
Related: >>30759 >I'm working on infrastructure that's friendly to distributed development of complex AI applications
