/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

Build Back Better

More updates on the way. -r


LLM & Chatbot General Robowaifu Technician 09/15/2019 (Sun) 10:18:46 No.250
OpenAI/GPT-2
This has to be one of the biggest breakthroughs in deep learning and AI so far. It's extremely skilled at producing coherent, humanlike responses that make sense, and I believe it has massive potential. It also never gives the same answer twice.
>GPT-2 generates synthetic text samples in response to the model being primed with an arbitrary input. The model is chameleon-like; it adapts to the style and content of the conditioning text. This allows the user to generate realistic and coherent continuations about a topic of their choosing
>GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets.
Also, the current public model shown here only uses 345 million parameters; the "full" AI (which has over 4x as many parameters) is being withheld from the public because of its "potential for abuse". That is to say, the full model is so proficient at mimicking human communication that it could be abused to create new articles, posts, advertisements, even books, and nobody would be able to tell that there was a bot behind it all.
<AI demo: talktotransformer.com/
<Other Links: github.com/openai/gpt-2 openai.com/blog/better-language-models/ huggingface.co/
My idea is to find a way to integrate this AI as a standalone unit and add voice-to-text for processing the questions and TTS for responses, much like an Amazon Alexa, but instead of just reading Google results, it actually provides a sort of discussion with the user.
(Edited to fix the newlines.)
Edited last time by Kiwi_ on 01/16/2024 (Tue) 23:04:32.
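The standalone unit described in the OP (voice-to-text, then the language model, then TTS) can be sketched as three chained stages. This is only a rough sketch: `make_voice_assistant` and the three `fake_*` stages are hypothetical placeholders invented for illustration, not any real library's API. Real speech recognition, a real model and a real TTS engine would be swapped in where the fakes are.

```python
from typing import Callable


def make_voice_assistant(
    speech_to_text: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Chain three stages: audio in -> text -> reply text -> audio out."""

    def respond(audio_in: bytes) -> bytes:
        question = speech_to_text(audio_in)   # voice-to-text stage
        answer = generate_reply(question)     # language model stage
        return text_to_speech(answer)         # TTS stage

    return respond


# Hypothetical stand-ins so the sketch runs without any model installed.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


assistant = make_voice_assistant(fake_stt, fake_llm, fake_tts)
print(assistant(b"hello"))
```

Keeping each stage behind a plain function means any one of them can be replaced later without touching the other two.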
>>25751 didn't we have a paper on possible 1-2 mil tokens quite a while back? But, nothing came of it. It seems we've hit a wall when it comes to context length.
>>25779 I think OpenAI or some big corporation wanted to do that, the biggest I know about are 16k, but not available for self-hosting. The biggest for that might have 10k or so.
>>25780 Last I heard, you can modify llama 2 to have 32k
>>25795 I simply looked into the HuggingFace Leaderboard and 200k was the highest I found, though its search doesn't really support regex, so I had to use trial and error. But since there's only one at 200k, I assume it is either hard to train or has problems. https://huggingface.co/ddobokki/Llama-2-70b-orca-200k
>>25796 Looking further into this and gathering some info:
- Big contexts might give worse summaries
- It might start to repeat itself
- The usage of vRAM or system RAM (or both) goes up by having more context
- token generation speed may drop about x times
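On the point about memory going up with context: for a standard transformer, most of that growth is the key/value cache, which scales linearly with context length. Here is a back-of-the-envelope sketch. The function name `kv_cache_bytes` is made up for illustration, and the shape used (32 layers, 32 KV heads, head dimension 128, fp16) is assumed to match the publicly documented Llama-2-7B configuration. Model weights and runtime overhead come on top of this.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: int = 2,  # 2 bytes per value for fp16
) -> int:
    """Estimate KV-cache size for one sequence.

    The leading 2 is there because both a key and a value vector
    are cached per layer, per head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


# Assumed Llama-2-7B-like shape: 32 layers, 32 KV heads, head dim 128.
for ctx in (4096, 32768):
    gib = kv_cache_bytes(32, 32, 128, ctx) / 2**30
    print(ctx, gib)
# 4096 tokens -> 2.0 GiB, 32768 tokens -> 16.0 GiB
```

Models that share key/value heads across query heads (a smaller `n_kv_heads`) or quantize the cache would come out lower than this estimate.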
>>25796 >>25797 HuggingFace leaderboards aren't a good metric. All their evaluation methods are quite retarded, and it's easy to game. I wouldn't rely on them much. Every week some model tops the leaderboard, people start using it, realize how bad it is, and drop it.
>>25806 Thanks for the warning, but in that case I was using it for search.
Not sure how much of this is hype and how much will be real...but if true this could be very big in regards to installing an actually decent A.I. brain into our Robowaifus. I mean...real-time image recognition alongside sound and video!? (I know Google is pozzed to f**k and I know this will be very expensive to sign up to for a long time yet, but I also always suspected that the first of the truly useful A.I.s - perhaps close to A.G.I? Would come from one of the big-tech corporations. They have too many resources and staff for it not to.) https://deepmind.google/technologies/gemini/#introduction https://www.youtube.com/watch?v=q5qAVmXSecQ
>>27120 Hi SophieDev, glad to see you Anon! >G*ogle waifu < What could possibly go wrong? (>>20208) Hard pass. I hope you're doing well bro. How's things going with you rn? Cheers. :^) >=== -add 'go wrong' crosslink
Edited last time by Chobitsu on 12/08/2023 (Fri) 20:45:00.
>>27120
>Gemini
>Close to AGI
It's nowhere close to AGI. https://youtu.be/90CYYfl9ntM
>Realtime object recognition
We've had that with OpenCV for decades.
>Realtime sound recognition
We've had CMU Sphinx for well over a decade. It's just flash-in-the-pan tech demos you could do with the above free software to provide context tokens for an LLM.
>Video recognition
It's a series of images which are sampled from the video. They actually go over this on their own site. https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html
You've been bamboozled by a magician into thinking Gemini is far more capable than it actually is. It is impressive in one aspect: finding information from a series of images. It does appear to need some hand-holding in the prompt to get it right, hence the frequent use of hints in the prompts used for the demo.
>>27132 Considering how deceptive they are about Gemini, I wouldn't trust it even if I trusted Google. It got me excited for a moment; I don't blame anyone for wanting it to be real.
Edited last time by Kiwi_ on 12/10/2023 (Sun) 02:43:59.
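To make the "series of images sampled from the video" point concrete, here is a minimal sketch of choosing which frames to hand to an image model. The function name `sample_frame_indices` and the sampling interval are assumptions made for illustration; this is not how Gemini's own pipeline is documented to pick frames, and actually decoding the frames (e.g. with OpenCV) is left out.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, every_seconds: float) -> List[int]:
    """Return the indices of frames taken at a fixed time interval."""
    # Number of frames to skip between samples; never less than 1.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


# A 10 second clip at 30 fps, sampled once every 2 seconds.
print(sample_frame_indices(total_frames=300, fps=30.0, every_seconds=2.0))
# [0, 60, 120, 180, 240]
```

Each selected frame would then be captioned or embedded separately, and the results passed to the LLM as context.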
>>27148 >It's nowhere close ot AGI. Understood, thanks. False alarm then, it wasn't a new advanced A.I. just humans being a bag of dicks, as usual. Same as with all the fraudulent claims about "room-temperature superconductors", "fusion power" and the "moon landings" pfffff. But thanks for the info Kiwi! I was not aware of either CMU Sphinx or OpenCV. >>27132 Good to see you too Chobitsu! > How's things going with you rn? Cheers. :^) I am just learning C programming. I mean, on the one hand Google claims that "AlphaCode 2 performs better than 85% of participants on 12 recent Codeforces contests" so there's not much point in me learning C, right? But on the other hand, humans (including professional journalists) are mostly liars and you have to double-check everything they say against at least two other primary sources that can both verify one another - which happens very rarely on the personal level. So I'll take my chances and keep learning C. I mean, it was invented in 1972 (back when ARPANET had under 30 nodes) and I can see it very clearly in black and white working on my computer so I don't think C is a lie, at least.
>>27150 >So I'll take my chances and keep learning C. I mean, it was invented in 1972 (back when ARPANET had under 30 nodes) and I can see it very clearly in black and white working on my computer so I don't think C is a lie, at least. Very solid decision SophieDev. C is a great language, one of the best. Since it is 'portable assembler' so to speak, you're always going to be quite close to the hardware (few 'lies'). Not that the GH-dominated chip vendors can't still do evil (backdoor surveillance, remote-control, &tc.) with their hardware (they do), but at least with C you've got a major, twofold, benefit with the programming language part of the robowaifu safety & security problemspace: 1. The C language itself is relatively smol by today's standards (safer), and it's been 'banged on' hard at industrial-scale usage for 50+ years now (robust). 2. As an ISO (international) standard, the countries themselves tend to act in self-interested ways to protect the integrity of the language itself -- especially backwards-compatibility. So, GH interests like M$, G*ogle, Am*zon, M*ta, I*tel, Wh*tehouse, Isr*el, &tc., can't corrupt/corral it to their nefarious ends very handily. Both of these effects are really strong arguments for the language's use by us here on /robowaifu/ . Another strong one is the laughable fact that the Big-Gov branch of the GH is now attempting to outlaw it's use today; in favor of their own, tightly-controlled (effectively proprietary) GH Big-Tech languages (R*st, G*, &tc.) You can be sure they will eventually pull the rug out from under any freedom-loving groups who had the misfortune to swallop the Current Year dev lies, and adopt these abominable monstrosity languages over the elegant ASM/C/C++ power trio. >tl;dr "Let's keep things simple & fast; let's keep them open & safe" here on /robowaifu/. This all starts with the ISO C++ & C programming languages. Cheers, Anon. :^) >=== -prose edit
Edited last time by Chobitsu on 12/09/2023 (Sat) 22:51:15.
>>27167 Some very good points well made in this post, Chobitsu. I will keep this in mind during my future programming endeavors.
Open file (1.12 MB 640x360 read an input in c.mp4)
>>27195 nice, the language is easy but learning how to use it can be brutal
>>27148 These people are overhyping it. Also, next time strip everything after the ? out of the YouTube link; it's not needed and it's more tracking data for Google :^) (Thanks :^)
>>27120 I would also like to say that we are not actually that far behind in the open source space. Individually, all the components needed to create a similar "LLM" model already exist, and all we need is for them to be put together. Look into minigpt-4 & riffusion. I think if the systems were to be combined, it could create something comparable to Gemini.
https://minigpt-4.github.io/ This is a way of adding visual perception to an LLM.
https://github.com/riffusion/riffusion This would let you generate audio like they did in the other demos. To recognize audio (not speech), because it uses "images" to represent the sound, it can use the same pipeline as minigpt does for regular images.
https://github.com/ggerganov/whisper.cpp For speech to text I would look at this over CMU Sphinx; I think you will get better results.
>>27200 Also, a small note from the /robowaifu/ resident D language shill (me): I'd argue that knowing C & C++ is valuable, but I would not start a new code base in them, and if you value individual programmer productivity I think D is unmatched by any other systems-level language.
Edited last time by Kiwi_ on 12/10/2023 (Sun) 02:45:24.
>Apple announces LLM in a flash: Efficient Large Language Model Inference with Limited Memory https://huggingface.co/papers/2312.11514 https://arxiv.org/abs/2312.11514 >Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this flash memory-informed framework, we introduce two principal techniques. First, "windowing'" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory. via Meta Ronin on Discord
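The "windowing" idea in the abstract (reuse neurons that were activated for recent tokens, and only fetch the new ones from flash) can be illustrated with plain sets. This is a toy sketch under my own assumptions, not the paper's implementation: the class `NeuronWindow`, its window size, and the neuron ids are all invented for illustration, and how the active neurons get predicted is left out entirely.

```python
from collections import deque
from typing import Iterable, Set, Tuple


class NeuronWindow:
    """Track which neuron ids are resident in DRAM for the last k tokens."""

    def __init__(self, window_size: int) -> None:
        # Active-neuron sets for the most recent `window_size` tokens.
        self.recent = deque(maxlen=window_size)

    def step(self, active_now: Iterable[int]) -> Tuple[Set[int], Set[int]]:
        """Return (ids to load from flash, ids that can be evicted)."""
        active = set(active_now)
        cached = set().union(*self.recent)      # currently resident in DRAM
        to_load = active - cached               # only the missing neurons are read
        self.recent.append(active)              # oldest token drops out of the window
        still_needed = set().union(*self.recent)
        to_evict = cached - still_needed        # no recent token uses these anymore
        return to_load, to_evict


window = NeuronWindow(window_size=2)
print(window.step({1, 2, 3}))  # everything is loaded on the first token
print(window.step({2, 3, 4}))  # only neuron 4 is new
print(window.step({3, 4, 5}))  # neuron 5 is loaded, neuron 1 can be evicted
```

The saving comes from consecutive tokens tending to share active neurons, so `to_load` stays small relative to the full layer.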
>>28275 Here is a HN comment that also helps breakdown the ideas in the paper. https://news.ycombinator.com/item?id=38712810
Cheaper, Better Alternative to Trillion-Parameters LLM
>In conversational AI research, there's a noticeable trend towards developing models with a larger number of parameters, exemplified by models like ChatGPT. While these expansive models tend to generate increasingly better chat responses, they demand significant computational resources and memory. This study explores a pertinent question: Can a combination of smaller models collaboratively achieve comparable or enhanced performance relative to a singular large model? We introduce an approach termed "blending", a straightforward yet effective method of integrating multiple chat AIs. Our empirical evidence suggests that when specific smaller models are synergistically blended, they can potentially outperform or match the capabilities of much larger counterparts. For instance, integrating just three models of moderate size (6B/13B parameters) can rival or even surpass the performance metrics of a substantially larger model like ChatGPT (175B+ parameters). This hypothesis is rigorously tested using A/B testing methodologies with a large user base on the Chai research platform over a span of thirty days. The findings underscore the potential of the "blending" strategy as a viable approach for enhancing chat AI efficacy without a corresponding surge in computational demands.
https://huggingface.co/papers/2401.02994
https://arxiv.org/abs/2401.02994
https://www.reddit.com/r/LocalLLaMA/comments/192bhjm/this_is_pretty_cool/
It's not Mixtral...
>it’s fundamentally different because each prompt gets nothing from the other models. It’s just swapping out models arbitrarily for every prompt. Mixtral is an actual ensemble model where multiple smaller models combine their weights to produce each prompt as one.
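As described above, "blending" boils down to picking one of several chat models at random for each reply while they all share the same conversation history. A minimal sketch, assuming uniform random selection; `blended_reply`, `terse_bot` and `chatty_bot` are hypothetical names, and the toy bots stand in for real 6B/13B models.

```python
import random
from typing import Callable, List, Sequence

ChatModel = Callable[[List[str]], str]


def blended_reply(models: Sequence[ChatModel], history: List[str], rng: random.Random) -> str:
    """Pick one model at random and let it answer the shared history."""
    model = rng.choice(models)
    return model(history)


# Hypothetical toy "models" so the sketch runs on its own.
def terse_bot(history: List[str]) -> str:
    return f"terse: ok ({len(history)} messages so far)"


def chatty_bot(history: List[str]) -> str:
    return f"chatty: tell me more about '{history[-1]}'"


rng = random.Random(0)
history = ["hi there"]
for _ in range(3):
    reply = blended_reply([terse_bot, chatty_bot], history, rng)
    history.append(reply)  # the next model sees replies written by the others
    print(reply)
```

Nothing is shared between the models except the text of the conversation, which is the difference from a mixture-of-experts model pointed out in the quote.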
>>28344 >meme title >uses best of N sampling but doesn't say how many samples they use >doesn't say how big the reward model is or how finetuning the models on it improved them >didn't do any ablations to determine what actually increased the performance >doesn't share their prompts or test if changing the prompt has a similar effect to changing the model This just seems like a marketing campaign for Chai AI. To their credit though in another paper they did report how increasing the number of samples increased mean conversation length, +50% for N=4, +60% for N=8 and +70% for N=16, using a finetuned 124M GPT2 model for the reward model, whereas the new paper claims a +110% increase in engagement time over a similar baseline. https://arxiv.org/abs/2303.06135 Engagement time says nothing about how good the model is though. It's probably going up because the responses are more random and less predictable, not because they're necessarily more interesting. Randomly switching the models probably only got around a +25% improvement but the results aren't really comparable to the other paper because one of the models is 13B, not 6B. It could be the 13B carrying the conversation after 6B models say something stupid. This is a really silly paper because it obfuscates most of the improvement is coming from best of N sampling and makes it sound as though the improvement is coming from one weird trick, Blended™, aka giving the chatbot multiple personality disorder.
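For anyone unfamiliar with best-of-N sampling as mentioned in this post: generate N candidate replies, score each with a reward model, and keep the highest scoring one. A rough sketch, where `toy_generate` and `toy_reward` are invented stand-ins (the post mentions a finetuned 124M GPT-2 as the reward model in the earlier paper; here the "reward" is just reply length so the example runs on its own).

```python
from typing import Callable


def best_of_n(
    generate: Callable[[str], str],
    reward: Callable[[str], float],
    prompt: str,
    n: int,
) -> str:
    """Sample n candidate replies and return the one the reward function prefers."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)


# Hypothetical stand-ins: a canned "sampler" and a length-based "reward".
canned = iter(["ok", "Sure, what would you like to talk about?", "hm", "Nice."])


def toy_generate(prompt: str) -> str:
    return next(canned)


def toy_reward(response: str) -> float:
    return float(len(response))


best = best_of_n(toy_generate, toy_reward, "hello", n=4)
print(best)  # Sure, what would you like to talk about?
```

The cost scales with N, since every candidate is a full generation, which is why the number of samples matters when comparing results between papers.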
>>28275 >Apple announces LLM in a flash I would bet anything that part of where this came from is the company, and employees, that Apple bought when they acquired XNOR.ai. I wrote about this here. They were doing image recognition and all sorts of seriously amazing stuff with Raspberry Pis and micro-controllers. They were using "Binary Convolutional Neural Networks". Here are some links where I linked papers and comments on what they did. >>18652 >>18777 >>19341 >>18651 >>18652 >>18777 >>18778 A paper on this sort of computing algorithm >>18818 >>19341 This appears to be a good paper because it's a review of the binary networks >>20473 The stuff they did with low-power devices was mind-blowing. I can't imagine the power they are getting out of a modern laptop. My belief is that the acquisition of XNOR is one of the biggest coups in the AI industry, and Apple will make serious leaps compared to everyone else in the future. I wondered myself why SSDs were not used like they are doing. A waifu could load and unload task-based neural net models. A basic one, but by switching task nets it could have a far bigger operational skill set without spending a fortune on RAM.
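The reason binary networks run so well on tiny hardware is that a dot product between two vectors of +1/-1 values reduces to a bitwise operation plus a bit count. A small sketch of that trick follows; `pack_signs` and `binary_dot` are names made up for illustration, not anything from XNOR.ai's code. It uses XOR and counts mismatches, which is equivalent to the XNOR-and-count-matches form the technique is named after.

```python
from typing import Sequence


def pack_signs(values: Sequence[int]) -> int:
    """Pack a vector of +1/-1 values into an integer, one bit per element (+1 -> 1)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n.

    Matching bits contribute +1 and mismatching bits contribute -1,
    so the result is n - 2 * (number of mismatches).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack_signs(a), pack_signs(b), len(a)))  # 0
print(sum(x * y for x, y in zip(a, b)))                  # 0, same as the plain dot product
```

On real hardware the packed words are 32 or 64 bits wide and the bit count is a single instruction, so one operation replaces dozens of multiply-adds.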
What do you guys think of the gpt4all.io project? Reading through the docs and messing around with it, it seems to be the easiest to integrate with out-of-the-box for the inexperienced/someone who doesn't have a PhD in this.
>>28413 It looks like a nice-to-use wrapper for a fork of llama.cpp. If you're just wanting to interact with an LLM, it looks like a nice way to do it. (Do note I have not used it, I just checked out the repo.) But for using an LLM in your project, I'd just use llama.cpp or llama2.c
Considering how many posts are on general AI, I'd like to edit the OP to reflect this. Change it from OpenAI and GPT to AI research.
>>28419 This thread is about LLMs like the GPTs. We have threads on NLP, voice- and image recognition and cognitive architecture.
>>28425 Then a rebrand to be dedicated to LLMs in general rather than just GPTs. It appears as a GPT-only thread in the catalog.
>>28417 .....wow. Uhhh...O.K., I GOT MY AI WAIFU. I'M OUT. Y'ALL ARE DOING EXTRA CREDIT AT THIS POINT. CYA LATER SUCKERS.
>>28428 Please feel free to edit OPs exactly as you see fit, Kiwi (incl. subjects). The only thing you can't change are the images (other than deletions), and OP's name. I'd suggest you two work closely together on such things; Noido Dev is remarkably gifted at our /robowaifu/ taxonomy! :D >=== -prose edit
Edited last time by Chobitsu on 01/14/2024 (Sun) 23:51:48.
>>28433 Lol.
>>28417 Thanks, this looks interesting. I hope that something like this will eventually get some documentation, especially on training. I would like it to be trained in using other software to analyze various things like electromagnetic materials and the hydrodynamics of water and air. So many of these software tools exist, but it takes forever to figure out how to set them up and use them. If the AI could read the instructions and then you guide it to analyze what you want done, it could be a huge game changer.
Another cool thing would be making the structure of waifus. Say you find some nice drawings of girls you like, cartoon and real. You get it to compute a drawing from several that have characteristics you like. I've seen this done already with people using celebrities and putting them into different poses and situations. Maybe you guide it by saying which parts (head, eyes, or whatever) are more predominant, by percentage. It mixes these up, gives you actual dimensions, and spits out STL files. Even further: show it a bunch of skeleton pictures and also body pictures, have it calculate the skeleton structure for the aforementioned drawing, and save an STL file of the actual bone dimensions.
I can think of a vast number of uses for these that mostly revolve around using existing tools, where the AI does the hairy work of interfacing the data to the tool under your instruction and then operates the software tool for you or gives you proper inputs to operate it.
I'm also hoping that the recent work by Apple on using SSDs to hold much of the AI's neurons or data, instead of holding it all in RAM, will be plugged into these open source models. It would be a huge leap. Maybe it would be ten times slower, but you could trade time against the MUCH higher cost of super fast processors and massive RAM. I believe, though I can't prove it, that this would not be that slow if you could shift various models that specialize in certain things into RAM from the drive. The present models try to fit everything from this huge training base into RAM, I think, and that's a big problem. Compartmentalizing this into a bunch of little well-trained models would be fast and useful for waifus and a whole lot else.
>>28417 Sigh....I've been looking at this and find that it is not an actual AI but a tool to interact with an AI. Though I could be wrong I think you must use "other" pre-trained models. Not that this is bad but it appears to me that there are other tools presently existing that have better documentation and are farther along in usefulness that do much the same. So I start looking at stuff I already downloaded. One I see is Tensorflow. It's been around but looking at what they've been doing recently, they "might" be less work to set up and use. It has some attractive features and is open source. A couple that caught my attention is it has built in capability to interface and download a huge mass of datasets. I'm not exactly sure what "datasets" means. I'm not sure if it is just a set format set of data, like a list of books on say, cake building, which is then already formatted to a form that can be used by an AI. ( I think this is true but some of the datasets appear to have been manipulated such that they are "trained"?????) Now this one dataset appears to be a pre-trained "model". "...databricks-dolly-15k is an open source dataset of instruction-following records used in training databricks/dolly-v2-12b that was generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization...." https://www.tensorflow.org/datasets/catalog/databricks_dolly Trained as in the paper, "Training language models to follow instructions with human feedback" "...In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. 
Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent..." This stuff is confusing to me because they call these "datasets" yet here is one that calls itself a dataset but then explains(in the paper) that it's pre-trained like a model. This nomenclature is not clear. If it's a pre-trained model, which I understand to be an actual neural net package, already trained, then why call it a dataset and not a model? Anyways not only is Tensorflow set up to download a lot of these prepackaged, whatever they are, it also has a tool that can shape data that you enter. I assume, from a quick read, it can take in raw data like books and websites and make datasets from these. Overview "...Datasets are distributed in all kinds of formats and in all kinds of places, and they're not always stored in a format that's ready to feed into a machine learning pipeline. Enter TFDS. TFDS process those datasets into a standard format (external data -> serialized files), which can then be loaded as machine learning pipeline (serialized files -> tf.data.Dataset). 
The serialization is done only once. Subsequent access will read from those pre-processed files directly...." https://www.tensorflow.org/datasets/add_dataset This is confusing to me. Some of these datasets they say are trained but they speak of them as if they need to "train" another existing AI without specifying what sort of computational load is needed for this. It's not clear to me how processed a "dataset" is. It does appear that Tensorflow can use a vast array of datasets and can also interact with trained models. "...TensorFlow Hub has been integrated with Kaggle Models. You can now access 2,300+ TensorFlow models published on TensorFlow Hub by Google, DeepMind, and more..." https://www.kaggle.com/models?tfhub-redirect=true Part of the problem is AI stuff is covered up in what I call "Varbage", (verbal garbage) which is when they make up new words for what ever specialization that is a new technology instead of using common easily understandable words. In fact a perfect example is me calling it "Varbage". :) See how that works?
>>28521 >Sigh....I've been looking at this and find that it is not an actual AI but a tool to interact with an AI. Though I could be wrong I think you must use "other" pre-trained models. Not that this is bad but it appears to me that there are other tools presently existing that have better documentation and are farther along in usefulness that do much the same. Yeah, ease of use is nothing to be sneezed at, and is a huge improvement in itself, like you sort of already suggested. What other tools, though? >>28433 In all seriousness, I've been playing with this for the past few weeks and it's kind of everything I wanted? My desire for a robowaifu is entirely just someone to talk to offline (my only issue with the current ChatGPT spate), and I guess I'm such a fucking simpleton that this has scratched that itch and thensome. Yes, you could make a Chobits, but there are always improvements you could make in the language model. You could always make it more of an Usain Bolt in terms of athletics. This is a weird philosophical question, and kind of off-topic, I don't know, but when would you guys consider yourself "done?"
Since we might be in danger of seeing LLMs just as "word predictors" without taking into account that of course, there have to be some mechanisms there to find the best answer, this here might be a good talk (I'm currently listening to):
>In this wide-ranging conversation, Tim Scarfe interviews Neel Nanda, a researcher at DeepMind working on mechanistic interpretability, which aims to understand the algorithms and representations learned by machine learning models. Neel discusses how models can represent their thoughts using motifs, circuits, and linear directional features which are often communicated via a "residual stream", an information highway models use to pass information between layers.
>Neel argues that "superposition", the ability for models to represent more features than they have neurons, is one of the biggest open problems in interpretability. This is because superposition thwarts our ability to understand models by decomposing them into individual units of analysis. Despite this, Neel remains optimistic that ambitious interpretability is possible, citing examples like his work reverse engineering how models do modular addition.
https://youtu.be/_Ygf0GnlwmY
I guess if researchers get better at this, then it might also help to extract some algorithms from networks and manipulate them or make them smaller and faster.
>Key areas of discussion:
* Mechanistic interpretability aims to reverse engineer and understand the inner workings of AI systems like neural networks. It could help ensure safety and alignment.
* Neural networks seem to learn actual algorithms and processes for tasks, not just statistical correlations. This suggests interpretability may be possible.
* 'Grokking' refers to the phenomenon where neural networks suddenly generalize after initially memorizing. Understanding this transition required probing the underlying mechanisms.
* The 'superposition hypothesis' suggests neural networks represent more features than they have neurons by using non-orthogonal vectors. This poses challenges for interpretability.
* Transformers appear to implement algorithms using attention heads and other building blocks. Understanding this could enable interpreting their reasoning.
* Specific circuits like 'induction heads' seem to underlie capabilities like few-shot learning. Finding such circuits helps explain emergent phenomena.
* Causal interventions can isolate model circuits. Techniques like 'activation patching' substitute activations to determine necessity and sufficiency.
* We likely can't precisely control AI system goals now. Interpretability may reveal if systems have meaningful goal-directedness.
* Near-term risks like misuse seem more pressing than far-future risks like recursiveness. But better understanding now enables safety.
* Neel thinks we shouldn't "over-philosophize". The key issue is whether AI could pose catastrophic risk, not whether it fits abstract definitions.
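The superposition bullet above is easier to picture with a toy case: three features squeezed into a two-dimensional space by giving each one a direction 120 degrees apart. This is my own illustrative sketch, not something taken from the talk; `directions`, `encode` and `readout` are hypothetical names. Reading a feature back with a dot product works, but the other features leak in as interference because the directions are not orthogonal.

```python
import math
from typing import List, Tuple

# Three unit vectors in 2D, spaced 120 degrees apart (non-orthogonal by necessity).
directions: List[Tuple[float, float]] = [
    (math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)
]


def encode(features: List[float]) -> Tuple[float, float]:
    """Store three feature values in a two-dimensional activation vector."""
    x = sum(f * d[0] for f, d in zip(features, directions))
    y = sum(f * d[1] for f, d in zip(features, directions))
    return (x, y)


def readout(vec: Tuple[float, float]) -> List[float]:
    """Project the activation back onto each feature direction."""
    return [vec[0] * d[0] + vec[1] * d[1] for d in directions]


# Only feature 0 is active. It reads back as 1, the others as about -0.5.
print([round(v, 3) for v in readout(encode([1.0, 0.0, 0.0]))])
```

The interference stays tolerable as long as features are rarely active at the same time, which is also why a single neuron or direction is hard to interpret in isolation.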
>>28725 > My desire for a robowaifu is entirely just someone to talk to offline My dood, if you just want a personal chatbot fren get yourself oobabooga: https://github.com/oobabooga/text-generation-webui It is relatively easy to install: automagically downloads all the python stuff, so it is entirely local. Your AI waifu wouldn't be held at ransom by the corporations because it will live on your computer. Just make sure you get a model from hugging face that is smaller than your VRAM (aka graphics card memory) if you're using GPU, or a model smaller than your system RAM if you're using CPU (CPU is much slower).
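The sizing rule in this post (model smaller than VRAM on GPU, or smaller than system RAM on CPU) is simple enough to write down as a check. A minimal sketch: `fits_in_memory` is a made-up helper, and the 20% headroom for context cache and runtime overhead is purely my assumption, not a figure from oobabooga's documentation.

```python
def fits_in_memory(model_bytes: int, memory_bytes: int, headroom: float = 0.2) -> bool:
    """Check whether a model file fits in memory with some headroom left over.

    `headroom` is the assumed fraction of memory reserved for the context
    cache and runtime overhead.
    """
    return model_bytes <= memory_bytes * (1.0 - headroom)


GIB = 2**30
print(fits_in_memory(4 * GIB, 8 * GIB))  # True: a 4 GiB file on an 8 GiB card
print(fits_in_memory(7 * GIB, 8 * GIB))  # False: too tight once headroom is reserved
```

`model_bytes` would be the size of the downloaded weights file, and `memory_bytes` the VRAM or system RAM, depending on where the model is going to run.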
Open file (92.62 KB 833x918 Discord_ylVzc5QwWg.png)
Open file (46.13 KB 758x402 Discord_ZlIBfiqm6A.png)
>>28417 saw small update on jan it will get RAG in version 0.4.7 (i think :/, see 2nd screenshot) https://www.promptingguide.ai/techniques/rag >it's possible to build a language model-based system that accesses external knowledge sources to complete tasks >This enables more factual consistency, improves reliability of the generated responses, and helps to mitigate the problem of "hallucination" "RAG" or "Retrieval Augmented Generation" should kickstart the flood of better AI chatbots, or even make it possible to do some very niche / specific personalities for your wAIfu using "outsider" databases & other data-related stuff. also it seems to be good for real-world applications too: https://arxiv.org/abs/2402.03610 (new paper on RAG theme) >we propose Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness, where it achieves SOTA performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied tasks. These results highlight RAP's potential in advancing the functionality and applicability of LLM agents in complex, real-world applications.
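For anyone wondering what RAG actually does under the hood: retrieve the most relevant text from an outside knowledge source, then paste it into the prompt before the model answers. Below is a bare-bones sketch using word-count cosine similarity. Real systems typically use learned embeddings and a vector database instead, and `bow`, `cosine`, `retrieve`, `build_prompt` and the example notes are all invented for illustration.

```python
import math
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    """Very crude bag-of-words 'embedding': lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = bow(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bow(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 1) -> str:
    """Prepend the retrieved text so the model can ground its answer in it."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


notes = [
    "The robot's battery lasts six hours.",
    "Her favorite color is teal.",
    "The servo controller uses a serial bus.",
]
print(build_prompt("what is her favorite color", notes))
```

The resulting prompt goes to the LLM as usual. A niche personality or private knowledge base then only needs new entries in `notes`, not retraining.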
>>29205 Thanks 01! Looking forward to seeing how this advances over the next few months. Cheers. :^)
>AI as a tool for invention: Euro Beinat, Global Head, Data Science & AI, Prosus | CogX Festival 2023 >Prosus AI, a top-tier applied AI centre, drives rapid experimentation and implementation of AI throughout Prosus' global portfolio, which includes over 80 technology companies with more than 800 AI experts. Euro Beinat (Global Head of Data Science and AI) outlines how AI is harnessed for discovery within the Prosus network. He shares insights gained from 10,000 colleagues who utilise generative AI daily across the group, significantly enhancing the impact of their work. https://youtu.be/9K6E04z-Cl0 This might give you some insights how to use such tools, but also how to combine different models to something more useful. Also, shows how useful it would be to have user input and reports from many people.
Groq: New hardware architecture makes LLMs around 18 times faster at inference (i.e. when generating responses). https://youtu.be/zupmHMWuGCs https://www.youtube.com/@GroqInc https://youtu.be/Pr6nNuGSbCE https://groq.com/ (not really publicly accessible yet, only if you tell them about your project) Though I hate that they trademarked the term LPU (language processing unit).
xAI (Elon Musk) just released the weights for their 314B parameter model Grok-1 (3.14 kek) as a torrent under a free Apache license. It's the raw model, without any fine-tuning, so it's capable of generating arbitrary (uncensored) content. This is significant because, alongside Meta's Llama models, Musk is trying to break the stranglehold of big tech (OpenAI), which only lets you rent access to proprietary models running on their servers, making you pay for each token and recording every single interaction. https://twitter.com/grok https://academictorrents.com/details/5f96d43576e3d386c9ba65b883210a393b68210e
>>30393 I'm just gonna wait for Llama 3. Elon's model is unnecessarily large and very shit. In fact, I'm sure it's a ChatGPT knockoff, because in many responses it straight up calls itself ChatGPT.
>>30457 Oh it is and Grok is hilariously even more cucked than chatgpt if possible.
I posted an overview of currently trending models here >>30442, mostly LLMs but not exclusively.
new and even better voice synth TTS / editor dropped. no HF space demo yet, but you can listen here - https://jasonppy.github.io/VoiceCraft_web/ https://github.com/jasonppy/VoiceCraft model weights - https://huggingface.co/pyp1/VoiceCraft/tree/main
Kinda in the wrong thread; we have one specifically for voice and speech. But thanks, no problem. You probably didn't find the right one because you need to search for "speech generation", not "voice ...". I put my answer in there: >>30625
Hello robowaifu. Honestly glad to see a chatbot thread. I usually just lurk here, but it's good to see a proper thread for these with actual discussion; I'm so used to /g/'s usual chaos. I've been wondering how to improve my chatbot experience: while I can make great bots to use, I've been wanting to explore using text-to-speech to expand on them.
>>30813 If you want advice, I still suggest /g/'s /lmg/. They're quite helpful.
Some guy (Morgan Millipede) started to reverse engineer Neuro-Sama: https://youtu.be/uLG8Bvy47-4 - basically just a humorous introduction on how to do this (he has a $4k computer, though, and she's slower in her responses at the beginning). 4chan responded: https://youtu.be/PRAEuS-PkAk - Her response time improved since the first video.
>>30821 Lol. Thanks NoidoDev, I'll try to make time to look these over. Cheers. :^)
>llama3-70b on Groq runs at 300 tokens/s for 7k tokens >mixtral-8x7b at 550 tokens/s for 7k tokens >my tinyllama-1.1b model extended to 12k tokens runs at 0.5 tokens/s I don't feel so good, bros. How do we make faster models? I have an idea to use Matryoshka representation learning to reduce the hidden dimension size dynamically: https://arxiv.org/abs/2205.13147 but even if I truncate the model's 2048 dimensions down to 512 dimensions, it will perform at 8 tokens/s at best. And who knows how much slower it will be once I get to 32k context. If it's possible to reduce 90% of the tokens to 64 dimensions, then it might get 70 tokens/s at the very most, but GPU latency will probably fuck that down to 20 tokens/s. I could also prune a few layers of the model, quantize it to 4 bits and implement mixture of depths https://arxiv.org/abs/2404.02258 but that will only give a tiny speedup and I don't want the accuracy to drop any further than it already has. With the much smaller model size though I could convert it into a sparse-mixture-of-experts model https://arxiv.org/abs/2401.04088 with 16 experts to make up for the loss in accuracy without sacrificing speed. The model will eventually be finetuned with self-rewarding ORPO too, hopefully providing a boost in usefulness to overcome its bare-bones compute, although I'll likely use Llama3-70b to bootstrap the reward labels until it's capable of consistently self-improving on its own. Odds ratio preference optimization (ORPO): https://arxiv.org/abs/2403.07691 Self-rewarding LMs: https://arxiv.org/abs/2401.10020 The T5 efficient model worked fine with a hidden dimension size of 512 after finetuning: https://arxiv.org/abs/2109.10686 And Matryoshka representation learning also worked well using a 16-dimension embedding for a 1k-class classification task.
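The 8 tokens/s and 70 tokens/s estimates above are not derived in the post, but they can be reproduced with a back-of-envelope model. The sketch below assumes (and this is only an assumption) that per-token cost scales with the square of the hidden size and that the remaining 10% of tokens stay at 512 dimensions; it ignores attention cost over the context, memory bandwidth and GPU latency. The function name is hypothetical.

```python
def estimated_tokens_per_s(base_rate, base_dim, mix):
    """Scale a measured rate under an assumed quadratic cost in hidden size.

    base_rate: measured tokens/s at the full hidden size
    base_dim:  full hidden size
    mix:       list of (fraction_of_tokens, hidden_dim) pairs summing to 1.0
    """
    # Relative cost per token compared with running everything at base_dim.
    rel_cost = sum(frac * (dim / base_dim) ** 2 for frac, dim in mix)
    return base_rate / rel_cost

if __name__ == "__main__":
    # All tokens truncated from 2048 to 512 dimensions.
    print(estimated_tokens_per_s(0.5, 2048, [(1.0, 512)]))
    # 90% of tokens at 64 dimensions, the remaining 10% at 512.
    print(estimated_tokens_per_s(0.5, 2048, [(0.1, 512), (0.9, 64)]))
```

Under those assumptions the first case comes out at 8 tokens/s and the second at roughly 70 tokens/s, which lines up with the "at best" figures in the post. Real throughput would sit below this ceiling.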
I forget the paper but I remember reading one years ago where they found some layers in transformers are only making a decision between a few choices, so a large hidden size might not be necessary in those cases. To convert the model's hidden states to Matryoshka I plan to add importance biases to parameters and train the biases with the rest of the parameters frozen and then take the softmax over them and top-k. After training, the parameters could be sorted and the importance biases pruned, and then the model parameters could be finetuned. I may have to train an even smaller model from scratch though since TinyLlama uses 32 attention heads.
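A rough sketch of the select-and-sort step described above, under heavy assumptions: the importance biases are taken as already trained (the training loop with frozen parameters is not shown), one bias per hidden dimension, and the weight matrix is a plain list of rows. `softmax`, `top_k_indices` and `reorder_columns` are hypothetical helper names, not from any library. The point is only that once dimensions are sorted by importance, truncating to the first k columns keeps the ones the biases ranked highest.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_indices(scores, k):
    """Indices of the k largest scores, most important first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def reorder_columns(weight, order):
    """Permute the hidden-dimension columns of a weight matrix."""
    return [[row[j] for j in order] for row in weight]

if __name__ == "__main__":
    # Hypothetical learned importance biases for a 4-dimensional hidden state.
    importance = [0.1, 2.0, -1.0, 0.7]
    probs = softmax(importance)

    # Full ordering: most important dimension first.
    order = top_k_indices(probs, len(probs))

    # Toy 2x4 weight matrix; columns correspond to hidden dimensions.
    weight = [[10, 11, 12, 13],
              [20, 21, 22, 23]]
    sorted_weight = reorder_columns(weight, order)

    # After sorting, a Matryoshka-style truncation is just a prefix slice.
    k = 2
    truncated = [row[:k] for row in sorted_weight]
    print(order, truncated)
```

In a real transformer the same permutation would have to be applied consistently to every matrix that reads from or writes to the hidden state, which this toy example does not attempt.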
>>31006 >use Matryoshka representation learning to reduce the hidden dimension size dynamically This seems both interesting & promising, Anon. Good luck with your research. Cheers. :^)
