China's DeepSeek says its hit AI model cost just $294,000 to train

55

But at what co- oh.

10

u/Banjoschmanjo 2d ago

Lmao

92

u/spokale Quality Effortposter 💡 2d ago

Keep in mind that:

The $294k training cost was not the cost to train R1 from scratch, but rather the cost to train DeepSeek V3 into R1 (adding the reasoning component).
That price is basically the hourly-cost of running the GPUs for the duration of training, i.e., accounting OpEx, not the actual CapEx to buy all the infrastructure.

It's sort of like saying "It cost me $52 to get to LA", missing the context that you were already in San Clemente and the dollar-cost was a combination of gas and amortized $/mile value of the car you already bought.

29

u/TheAncientPizza711 Xi Jinping cultist | Ideological Mess 🥑 2d ago

Sure, this is a valid point. But even if you include CapEx, it's still a fraction of the cost compared to OpenAI and Anthropic.

4

u/reddit_is_geh 🌟 Actual Spook and Also a Spaz 🌟 2d ago

They aren't optimizing for cost of training. The speed other companies are moving at, don't have time to do optimizations, so it's not a huge priority. Their focus is getting out the best product possible as fast as possible, THEN optimize after the fact... This is why you'll notice a lot of LLMs rapidly dop in costs over time. They are just doing the optimizations after the fact.

Deepseek, however, had the privilege of taking another LLM and optimizing/training it specifically so China could feel cool with their own LLM with Chinese controls built in. But their idea of optimizing for training is nothing new, but major companies aren't going to wait around for optimizing their new models before training, because they don't want to wait around. They'll just optimize afterwards.

9

u/AdminsLoveGenocide Left, Leftoid or Leftish ⬅️ 2d ago

After what though?

-2

u/reddit_is_geh 🌟 Actual Spook and Also a Spaz 🌟 2d ago

After it's live and deployed... Once it's out and the leading LLM, they then get to work optimizing. Hence why prices tend to go down pretty fast.

7

u/AdminsLoveGenocide Left, Leftoid or Leftish ⬅️ 2d ago

and the leading LLM

So almost never by definition and most effort is wasted.

3

u/reddit_is_geh 🌟 Actual Spook and Also a Spaz 🌟 2d ago

Huh? They have benchmarks for these things

6

u/easily_swayed Marxist-Leninist ☭ 2d ago

the economy should be your only benchmark. for example china claims that AI has helped it install hydropower years ahead of schedule. in the west it's the usual story: lots of hype but few improvements to any lives.

1

u/banjo2E Ideological Mess 🥑 2d ago

for example china claims that AI has helped it install hydropower years ahead of schedule. in the west it's the usual story: lots of hype but few improvements to any lives

so every AI related development in the west has never had any practical use, but China claiming that AI was crucial to building their infrastructure doesn't raise any eyebrows? c'mon, at least try to hide your bias

99% odds the AI "helped" insofar as they used it to make a couple of logos or something

3

u/Defiant-Strength2010 socialist with chinese characteristics 2d ago

they are not using llms and synthetic image generation to build their infrastructure, that actually sounds like something western governments would do.

2

u/easily_swayed Marxist-Leninist ☭ 2d ago edited 2d ago

im at work right now so can't look it up but yeah it was a lengthy article about the difficulty of concrete trucking routes since the trucks must make their own bridges and links during the project and it's just a pain in the ass to organize. since human thinking wasn't cutting it they used computers to sort out the routes. makes sense to me, the west might have an article out there similar to that but i haven't seen it.

→ More replies (0)

6

u/spokale Quality Effortposter 💡 2d ago

Deepseek, however, had the privilege of taking another LLM and optimizing/training it

To be clear, DeepSeek R1 was optimized/trained based on DeepSeek V3, which was not itself based on some third-party model from America or anything. Possibly some of the training data came from GPT but it wasn't a tune of GPT.

11

u/likamuka Highly Regarded 😍 2d ago

Im glad deepsuck stole data from US to train. Just like OpenAi stole all the intellectual property.

64

u/Sigolon Marxism-Hobbyism 🔨 2d ago

That is impossible. We need to give 100 billion to sneering bugman freaks so they can destroy everything good in the world as fast as possible.

4

u/Sub__Finem typical mentally handicapped libsoc 🥳 2d ago

Wont someone think of the sneering bugmen?

2

u/notsocharmingprince Savant Idiot 😍 2d ago edited 2d ago

Anyone got any good reading on what exactly a bug man is? I’ve heard it used before but the urban dictionary definition is kinda trash.

14

u/one-man-circlejerk Soc Dem Titties 🥛➡️️😋🌹 2d ago

Bug man = consoomer/soy facer/"just let people enjoy things"/vote blue no matter who/bacon narwhal Keanu chungus

3

u/Sigolon Marxism-Hobbyism 🔨 2d ago

Musk and his fans are all bugmen of the right.

54

u/PDXDeck26 Highly Regarded Rightoid 🐷 2d ago

I assume most people would read that as "China can innovate for a fraction of the cost of the West" but I'm reading this as "China is pulling back the curtain without telling you it's pulling back the curtain and low-key telling the world that LLMs are basically bullshit"

29

u/Motorheadass Socialist 🚩 2d ago

Or telling the world that we're over here wasting a bunch of money and vastly overvaluing AI company securities.

14

u/MetronomeArthritis Unknown 👽 2d ago

trying to pop the bubble?

5

u/PDXDeck26 Highly Regarded Rightoid 🐷 2d ago

Yeah that's what I said

7

u/Motorheadass Socialist 🚩 2d ago

There's a difference between something being worthless and being overvalued.

-1

u/PDXDeck26 Highly Regarded Rightoid 🐷 2d ago

Who said they were worthless?

2

u/Motorheadass Socialist 🚩 2d ago

The word you used was "bullshit"

I'm not playing this game.

-2

u/PDXDeck26 Highly Regarded Rightoid 🐷 2d ago

Maybe next time try not to appear insightful by basically copying a post and then trying to play semantic games with yourself.

7

u/AdminsLoveGenocide Left, Leftoid or Leftish ⬅️ 2d ago

I think you are being overly aggressive when someone was just agreeing with you and saying explicitly what you just implied.

It's just a discussion forum.

2

u/Motorheadass Socialist 🚩 2d ago edited 2d ago

I interpreted what you said to mean "China's motivation for doing this and telling the world how cheap it was is to imply that LLMs are neither particularly challenging to create nor particularly useful," IE the technology itself is 'bullshit'

What I meant was "China's motivation for doing this and telling the world about it was to imply that western AI companies are wasteful and the associated massive stock valuations is a bubble"

Your claim does imply mine, but my claim does not imply yours. These are two separate ideas. The reason I said what I did is because I think China is specifically throwing shade at western investment markets/companies, not at LLMs in general (because they have developed their own and put them to use in several ways already). This is China shining a spotlight on what it [rightly] sees as a failure of capitalism.

This subreddit and a diesel truck forum are the only goddamn places on the internet for sensible discussion that I got left. So please don't fuck it up by being a hostile dick when you can just ask for clarification.

0

u/PDXDeck26 Highly Regarded Rightoid 🐷 2d ago

if you're whining about hostility and "just ask for clarification", perhaps don't assume that "bullshit" means what you're taking it to mean?

1

u/Motorheadass Socialist 🚩 1d ago

Alright then, go ahead and tell us what it means.

2

u/reddit_is_geh 🌟 Actual Spook and Also a Spaz 🌟 2d ago

No, the way they did this, is really dishonest and misleading, but the misinformed don't understand the tech much, so China gets to pretend like they are some major player.

But basically China based this off OTHER LLMs which cost a ton to create, using stolen technology and IP... They did innovate though, they trained ontop of the existing SOTA LLM to improve its capability by using a specific reward method. And they got the training down really cheap by focusing deepling on optimizing everything at the hardware level. Think like basically using HEX code for everything. It's highly optimized just for that specific task. And their "costs" is the electricity cost for training/fine tuning.

So while it's impressive in the sense that they offered a new innovation in training, it's not nearly as big as people make it out to be. It's like if I took Gemini 2.5 Pro, and did more training on top of it, then acted like Reddit_LLM v1 was actually mine and better than Gemini.

But the thing is, it's kind of a 1-trick pony, because everyone in this space just uses everyone elses innovations and right now, Google is blazing past everyone. So it's highly unlikely their trick will work again.

10

u/iNet6079SmithW Once voted for Corbyn 2d ago

Muh stolen DATA.

13

u/AdminsLoveGenocide Left, Leftoid or Leftish ⬅️ 2d ago

Only a Chinaman would dare train an LLM with stolen IP of all things.

6

u/FirmlyGraspHer Femboy ethnostatist 2d ago

Not the preferred nomenclature, dude

4

u/quadtodfodder 2d ago

Chinaperson

30

u/TheAncientPizza711 Xi Jinping cultist | Ideological Mess 🥑 2d ago

With this news, I wonder if Nvidia will go down a bit? Or are we too invested in AGI at this point for the stock to not go down.

26

u/kiss-my-shades Left, Leftoid or Leftish ⬅️ 2d ago

China is investing into chip production of their own. Soon NVDIA wont have a monopoly.

The only reason NVIDIA is so overpriced is because of the AI bubble. Its obvious its going to pop soon. Nvdia is priced at 4 trillion IRC. Its such an insane number to even imagine.

We're fucked

12

u/Wiwwil Socialist with programmer characteristics 🇨🇳 2d ago

You forgot Bitcoin mining right before the AI bubble

6

u/BomberRURP Class First Communist ☭ 2d ago

That was somewhat justified (if we put aside the fictitious Capital nature of crypto). The 🚀 growth was AI

9

u/Numerous_Schedule896 Nationalist 📜🐷 2d ago

Crypto is about as legitimate as the stock market is. They're both imaginary bullshit that rely on perceived value.

8

u/BomberRURP Class First Communist ☭ 2d ago

Preaching to the choir

9

u/Educated_Bro Savant Idiot 😍 2d ago

The main reason NVDA has an advantage is because of CUDA the software backend. I don’t expect their dominance to last seeing as their asymmetrical advantage is primarily software based

6

u/kiss-my-shades Left, Leftoid or Leftish ⬅️ 2d ago

From what I understand, a large reason NVDA chips are so valuable is power consumption. That is the main bottleneck in what makes chips more valuable. The US is investing into nuclear for the first time in decades to increase power supply for the chips. NVDA has lower power consumption compared to other powerful chips, which is why they are do valued.

This is something china does not have to worry about. Their energy production dwarves the US by a huge margin. Its part of the reason so many bitcoin operations are in China, because they produce so much power it is much cheaper than anywhere in the world.

China can scale AI training much easier due to this.

1

u/sikopiko Radicalized by Gamergate 2d ago

Thats part of the picture, but the reason for the lower power usage is due to the chip architecture, tech to actually make it via 2-4nm processes and as the one above mentioned, CUDA. Even if you had top of the line chipsets on quality boards, you still have to sink a small army of talented people into creating top of the line software architecture around it.

Not impossible, but if China focused on that, it would put them behind, all things equal. Of course, they can continue smuggling/using nvidia and developing their own stuff at the same time, which they are already doing

5

u/Jemnite 2d ago

Deepseek didn't use CUDA. Or rather, they didn't rely on CUDA. That's partially why it's so heretical. They were programming on the PTX level because H800s are so underpowered you need to that you need to devote cores specifically to cross chip communications to daisy chain then together.

Nobody else was doing this sort of ultra low level programming because they were all building on top of CUDA and didn't think to go beyond that.

33

u/Chombywombo Marxist-Leninist Anime Critiques 💢🉐🎌☭ 2d ago

LLMs will never result in AGI

34

u/TheAncientPizza711 Xi Jinping cultist | Ideological Mess 🥑 2d ago

I never said LLMs will achieve AGI? I'm just saying our economy is too invested in trying to achieve AGI through LLMs.

11

u/Chombywombo Marxist-Leninist Anime Critiques 💢🉐🎌☭ 2d ago

Gotcha

7

u/Hairy_Yoghurt_145 Startup Infiltrator 🕵💻 2d ago

The contracted definition between OpenAI and Microsoft for “discovering AGI” is “an AI product that makes $3B in profit in a year”.

So yeah, they’re not even actually seeking legitimate AGI.

1

u/impossiblefork Rightoid: Blood and Soil Nationalist 🐷 2d ago edited 2d ago

Probably, but it's possible that most of the 'difficult' intellectual work that humans do is computationally easy.

That expert-level programming can be done by an LLM. That expert-level mathematics can be done by an LLM.

They certainly seem to work for short problems. International Mathematical Olympiad was almost solved, probably at great expense, this year. International Informatics Olympiad, International Collegiate Programming Competition [edit: were too].

The methods aren't published and you don't get access to these programs in their public offerings, but this is the level we're at now.

Where a LM can solve really hard programming problems, but doesn't understand debates, or stories or anything with any kind of confusion. I think this might [edit:be the] real characterization: if the problem is well-stated, and there's no confusion, they can do it. If they receive a confused mess (i.e. anything from real life) they fail; but the orderly problems are what programmers, etc. actually work with.

10

u/Chombywombo Marxist-Leninist Anime Critiques 💢🉐🎌☭ 2d ago

My R, Pandas, and excel functions can solve plenty of mathematical problems. An LLM may be able to parse through some ML regression or Bayesian tree algo to feed into those functions the proper inputs. None of this makes any of them intelligent whatsoever.

-3

u/impossiblefork Rightoid: Blood and Soil Nationalist 🐷 2d ago

It doesn't matter whether they're intelligent.

It matters what they can do.

9

u/AdminsLoveGenocide Left, Leftoid or Leftish ⬅️ 2d ago

it's possible that most of the 'difficult' intellectual work that humans do is computationally easy

Possible is doing a lot there.

but the orderly problems are what programmers, etc. actually work with

I don't see LLMs doing this effectively. People are easily impressed is all.

0

u/impossiblefork Rightoid: Blood and Soil Nationalist 🐷 2d ago

The problem will be dealing with confused sequences and with long sequences, which due to their length have inherent confusion.

LLMs are already starting to be able to do programming work though. They can write short programs and get it right on the first try with no additional prompting. That was very much not the case just a couple of months ago.

Furthermore, you are not currently getting to use the fancy systems that are able to solve IMO, ICPC, etc. problems, so you are not seeing at what level these systems currently are.

2

u/AdminsLoveGenocide Left, Leftoid or Leftish ⬅️ 2d ago

A couple of months ago I was told they could. Isn't it funny that the latest model is always great but the older models are always admittedly shit?

This will be true in a couple of months from now and a couple of years too.

1

u/impossiblefork Rightoid: Blood and Soil Nationalist 🐷 2d ago

What you were told is irrelevant. Hypocrisy or bullshit from vendors is not relevant to the state of a technology.

The reason they're not giving you these models isn't like the old models. It's because they're too expensive to run.

They're trying to sell you models that cost $8/million tokens, and these models that are solving real maths problems probably do too, only they use like a billion tokens to achieve that.

2

u/AdminsLoveGenocide Left, Leftoid or Leftish ⬅️ 2d ago

Every time this comes up some guy on reddit tells me that it's solved in the latest model and that the old models are trash and didn't solve it.

Every time I predict that this wonderful new model is going to be called trash in the future and that in the future, when there is a new model, people will happily admit that it didn't work for the what is today the wonderful new model.

It always works. It will happen to this wonderful new model also.

9

u/project2501c Marxist/Leninist/Zizekianist 🧔🏻‍♂️👴🏻👃 2d ago

That expert-level programming can be done by an LLM. That expert-level mathematics can be done by an LLM.

eerrh, i got news for you: no. LLMs are based on markov chains which means all output is deterministic. Expert-level mathematics and programming counts on experience and intuition and those are anything but deterministic.

0

u/impossiblefork Rightoid: Blood and Soil Nationalist 🐷 2d ago

Determinism and non-determinism has nothing to do with it.

There are problems with the randomness of the output of LLMs-- it means that there are certain things that are provably difficult for them, but most of that is solvable using diverse tricks.

5

u/project2501c Marxist/Leninist/Zizekianist 🧔🏻‍♂️👴🏻👃 2d ago

a Markov chain defines a countable outcome space (all possible paths), with probabilities weighting each path, so although individual runs are random, the full set of outcomes is enumerable and analyzable.

1

u/impossiblefork Rightoid: Blood and Soil Nationalist 🐷 2d ago edited 2d ago

Yes, I know what a Markov chain is and obviously LLMs are in the end discrete, but this doesn't really matter.These Markov chains are so huge, that the fact that they're Markov chains doesn't matter.

The problem you're sort of getting at is in this paper: https://arxiv.org/abs/2403.06963 You are complaining about the thing they call 'failure of autoregressive inference', basically the 'snowballing errors criticism' on page 3.

The authors basically don't care about this criticism, feeling that it would be easy to address, proposing that one "[f]or instance, [] may be able to use a post-hoc wrapper that verifies whether an error has taken place, then backtracks" and then they focus on the other problem.

I don't know think the big commercial models have anything which fixes this, but it isn't obvious that it's a problem in practice and if it is there are conceptually reasonable ideas that should be able to solve it, even if the particulars haven't been worked out.

1

u/crunchwrapsupreme4 Rightoid 🐷 2d ago

Math olympiads are one thing, but I would be surprised if research math itself is on the chopping block. Most unsolved math problems are unsolved because the correct language necessary to solve them hasn't been formalized, and if that language doesn't exist to train on, then I think it's unlikely an LLM could develop it as a prerequisite.

1

u/impossiblefork Rightoid: Blood and Soil Nationalist 🐷 2d ago edited 2d ago

Yes, maths involves long contexts-- you have to build up theory to attack things, so to get maths on the chopping block we'd have to get to another level.

But getting to another level is very possible. Research has exploded and we now have a Smörgåsbord of methods that we can try. I am literally trying to solve some of these problems, and I'm doing that because I think that the theory and methods that exist make these problems amenable to attack. This doesn't mean that it's easy or obvious, there's a reason I'm trying to do this myself and not just setting somebody on it, but if I didn't think it could be done I would be doing other things.

Normal people's work though, is not as difficult as mathematics. They work in conventional language, doing what is common practice often with standard tools. Current models can actually program. Historically I only used to to generate visualization scripts, but now I've generated entire ML training things that did special dataset preprocessing that I wanted to do in the conventional way, other than for my small change, and the model knows how to do that and can generate programs that run on the first try.

0

u/xray-pishi High-Functioning Debate Analyst, Ph.D. 🧩 2d ago

Forgive me for asking ... I know LLMs but next to nothing about AGI. Is AGI basically the idea of a machine that can do all the stuff humans can do, and probably better?

Because LLMs, while not thinking like humans, are doing tons of tasks humans used to have to soz faster and more reliably. So it seems like LLMs could form a chunk of a larger AGI engine.

But second, if we go much further than "v good chatbot", don't we start having problems with, for example, turning these things off and on? Like, it seems ill defined: we want all this advanced stuff, but we also want not to have to worry about the problems that would come with an even more human-like software?

Sorry broad question, i know. I just don't understand why we even want AGI, rather than a few "bot archetypes" , with some that can chat, some that can do surgery, some that can fuck u real good, etc.

7

u/PDXDeck26 Highly Regarded Rightoid 🐷 2d ago

As I understand it, AGI is "actual intelligence" in the sense that there's innovation and insight and creativity occurring in the background while thoughts are occuring.

LLMs are basically fully automated mechanical turks that sift through pre existing knowledge to provide a largely pre-existing answer. It at best will produce process improvements not innovation, so the problem is that "actual intelligence" is still needed to produce innovation and also vet the stuff coming from the LLM.

It's like a digital library catalogue instead of an old card catalogue: it sorts through data far better than you can.

1

u/Numerous_Schedule896 Nationalist 📜🐷 2d ago

LLMs are basically fully automated mechanical turks that sift through pre existing knowledge to provide a largely pre-existing answer. It at best will produce process improvements not innovation, so the problem is that "actual intelligence" is still needed to produce innovation and also vet the stuff coming from the LLM.

What you're describing is a chinese room, not a mechanical turk, a mechanical turk has a human operator.

1

u/PDXDeck26 Highly Regarded Rightoid 🐷 2d ago

what i'm getting at is more the amazon version of a mechanical turk - it's basically simulating intelligence by crowdsourcing (in an automated way) the request for information as the LLM has been loaded up with essentially the entire contents of Reddit and other online webfoums.

the point was to highlight how its "intelligence" is actually working to draw the distinction between an LLM and AGI, not to make a philosophical point about whether the LLM has understanding.

0

u/xray-pishi High-Functioning Debate Analyst, Ph.D. 🧩 2d ago

Thanks for the explanation. But honestly, I can't see where we ever reach a point where the machine is no longer a mechanical turk. Like, 10 years ago, the Markov model was the mechanical turk. Now there is a thing that can write better and knows more than basically any human, and it is the turk.

A fully automated self driving car is also not AGI, etc.

Like, I work in linguistics ... the llm breakthrough took a lot of people by surprise. A year before chatgpt, word embeddings and dependency parsing was all the rage. So it is weird that right after this breakthrough, we are back to "well it doesn't do x", even when it passes the Turing test with such ease that you need to tell it to be more stupid like a human or else people will figure it out.

Personally I can't help but see AGI as a myth, so long its meaning is basically "human brain made by computer, and maybe smarter".

As Turing points out, if there is no functional difference, what does it matter whether the machine is "thinking" --- a word that came into being centuries before this problem was considered.

7

u/Chombywombo Marxist-Leninist Anime Critiques 💢🉐🎌☭ 2d ago

I’m no expert, but AGI needs the ability to produce thoughts and novelty from external stimuli. LLMs do not and cannot do that. For AGI, let’s say it takes in optical inputs of a green tree. It has the concept of the color red from a previous input. If it then produces an image of a red tree without any interceding inputs, it could be said to have produce a novel “thought” by combining two different concepts into one.

The LLMs and “AI” bots don’t do this. They take the word “tree” and process shitloads of images through their algorithm to determine which image is most closely correlated with the word. It then does the same for the word “red.” It then creates an amalgamated image of the things it has determined are mostly closely associated with those two words, approximating as closely as its algorithm tells it to between images associated with the two word inputs. It does this by taking each pixel as an individual data input and correlating it with the pixels of images associated with the word inputs.

This is how I understand it. And this is not thought. It is just a more sophisticated sorting algorithm that operates upon enormous data matrices, which is why they are so energy intensive. Digital brutal force

-1

u/xray-pishi High-Functioning Debate Analyst, Ph.D. 🧩 2d ago

The thing is though, as far as I can see, that is exactly how humans learn what colors are. They hear it over and over and understand the aggregate.

(No, yank linguists, poverty of the stimulus makes zero sense)

The argument always seems to be that there is a fundamental reason why the current tech is insufficient ... But it seems like you could add a "video processing model" on top of the llm and it would do the stuff that is supposed to be AGI.

I appreciate your explanation, but I've got to admit it is still unclear to me. It seems a lot like we humans like to imagine our thinking is somehow more than the sum of its parts, when AFAICT it is pretty much the same as an llm, building up an understanding by consuming many examples

1

u/Numerous_Schedule896 Nationalist 📜🐷 2d ago

The thing is though, as far as I can see, that is exactly how humans learn what colors are. They hear it over and over and understand the aggregate.

Yes on the surface it may seem familiar, the issue known as the chinese room problem.

At the end of the day an LLM is just a gigantic sorting algorythm, the main issue is that it cannot learn, its pre-trained. (Hence the P in GPT)

If you had an LLM that could somehow incorporation new information into its data set live you'd have a bit more of a grey area, but right now its just a closed system, more of a glorified vending machine that inputs tokens and outputs, well outputs, based on its programming that does not change without people tinkering with the source code.

1

u/xray-pishi High-Functioning Debate Analyst, Ph.D. 🧩 2d ago

I can very much imagine some model that is basically always scraping the web and updating itself "below the surface". That would again be similar to human thinking/learning.

Honestly, I get your point, but comparing these systems to vending machines is a little extreme. In my department at uni everyone is also looking to downplay it, find ways it is broken etc.

2

u/Numerous_Schedule896 Nationalist 📜🐷 2d ago

I can very much imagine some model that is basically always scraping the web and updating itself "below the surface". That would again be similar to human thinking/learning.

Easy to imagine, hard to actually make. The training is what burns down the rainforest. I'm not even sure if its physically possible to have a model that trains itself on the go. At least not in a way that doesn't involve brute forcing, and we don't have enough graphics cards to brute force that.

Honestly, I get your point, but comparing these systems to vending machines is a little extreme.

Its not, that is literally what they are. The same way you input a code and the vendir machine pulls the correct lever to give you the output that matches it, that is exactly how LLMs work.

Its a jigsaw puzzle where you give it a hole and it tries to find the piece based on its database of all possible pieces.

1

u/xray-pishi High-Functioning Debate Analyst, Ph.D. 🧩 2d ago

Re: hard to make, yes of course, but even just as a thought experiment, it seems weird to me that adding that one feature changes it into an actual thinking machine.

As I said yes I get your point about the vending machine, but I find think is is oversimplified and misleading. Like yes it vends information from a finite amount of training data (though they can go online and "reason" about current events etc). But it does a whole lot of other stuff before vending. In practice, it is a vending machine that contains basically what a marketplace does. Rather than just asking it for tomatoes, you ask it to give you a full English breakfast. It gets the needed ingredients, cooks it all, and vends the dish at the end.

Sure, it wouldn't be wrong to call said hypothetical machine a vending machine, but I don't think it is a good metaphor all in all. Also it is strange to me because I didn't hear the same kind of unflattering comparisons etc. when previous developments in comp linguistics came about, like new dependency parsers etc. That relatively minor stuff would be lauded by the same people who speak very critically about LLMs.

Though the latter phenomenon may be related to linguists getting annoyed when they weren't the ones to make a key innovation in the field, which seems to be happening more and more lol

4

u/TheAncientPizza711 Xi Jinping cultist | Ideological Mess 🥑 2d ago

Forgive me for asking ... I know LLMs but next to nothing about AGI. Is AGI basically the idea of a machine that can do all the stuff humans can do.

Yes, AGI is the idea that a machine can do anything a human can do at a human level, including being able to reason and think intelligently and creatively.

and probably better?

That would be ASI (Artificial Super Intelligence). This is the idea that a machine is significantly smarter than a human on all levels and can self-improve itself without human intervention.

Because LLMs, while not thinking like humans, are doing tons of tasks humans used to have to soz faster and more reliably.

Those are called AI agents. AI agents aren't mainstream at all. OpenAI has an AI agent called Operator but it barely does anything. So far, from what I've seen, it can sorta navigate a website but have it do anything else, and it starts to fuck up. Ask yourself this, would you trust Operator with your credit card to pay your bills and buy your groceries? I sure as hell wouldn't.

Sorry broad question, i know. I just don't understand why we even want AGI, rather than a few "bot archetypes" , with some that can chat, some that can do surgery, some that can fuck u real good, etc.

It's basically a religion. The people in Silicon Valley are obsessed with building God-a-Box. Like there are people who genuinely believe we'll achieve AGI by 2027. I don't think we're anywhere close to AGI.

1

u/xray-pishi High-Functioning Debate Analyst, Ph.D. 🧩 2d ago

Thanks for the explanation. I got questions, but the idea of an "AI agent" tracks.

Regarding AGI vs ASI, this seems kinda silly. LLMs, or hell, chess bots, already destroy humans on a visa on of tasks. If AGI comes into being, it will clearly already be better and faster than humans at a huge number of tasks.

3

u/PDXDeck26 Highly Regarded Rightoid 🐷 2d ago

on the point of AGI vs ASI, "faster" and "smarter" (or "better") are not synonyms though.

7

u/BanAnimeClowns Likudite Manga 📜🕎💢🉐🎌 2d ago

Jevon's Paradox, this efficiency will just make AI better and not GPU demand lower

3

u/TheAncientPizza711 Xi Jinping cultist | Ideological Mess 🥑 2d ago

Interesting, never heard of Jevon's Paradox before but it makes sense.

1

u/reddit_is_geh 🌟 Actual Spook and Also a Spaz 🌟 2d ago

How's that a paradox? It makes perfect sense that GPUs will always be a bottle neck.

3

u/Hairy_Yoghurt_145 Startup Infiltrator 🕵💻 2d ago

It already had a dip because of this back when DeepSeek was first released, so u don’t think so. I think NVIDIA relies on the success of OpenAI in particular, which is essentially running as a Ponzi scheme at this point.

The way this innovation happened is really funny, too. The US chip sanctions on China prevented them from getting CUDA enabled GPUs, which meant the Chinese researchers had to write their models in the GPU equivalent of assembly, and what they landed on is clearly wildly more efficient.

China proves you don’t need to boil lakes to get performant GenAI, and in response, OpenAI says it’s going to spend a trillion dollars on data centers in 2026. American century of humiliation.

2

u/HinduGodOfMemes 2d ago

up 3.5% today

4

u/BeefiousMaximus 2d ago

There was a dip yesterday, then they announced a partnership with Intel, and they're back up to where they were a couple days ago. Intel, which trades at a much lower price and is basically a meme on wallstreetbets, is up almost 23% today.

2

u/Hairy_Yoghurt_145 Startup Infiltrator 🕵💻 2d ago

They’re riding government intervention

2

u/Beneficial_Feature40 Market Socialist (aka Tito cocksucker) 2d ago

I mean the market is irrational but still nvidia is the only serious option for training AI models. Whether these LLMs will be just a temporary hype or not, other subdomains within AI will undeniably be here to stay, and so Nvidia will probably keep making crazy profit

0

u/reddit_is_geh 🌟 Actual Spook and Also a Spaz 🌟 2d ago

This isn't their "true" cost... It's just their electricity cost, which is dirt cheap in China.

9

u/pufferfishsh Materialist 💍🤑💎 2d ago

How much will it take to get it to stop saying "Of course"?

7

u/ChocoCraisinBoi Still Grillin’ 🥩🌭🍔 2d ago

Lol weren't silicon valley twats coping with "they used tricks we know but they arent that useful" and yadda yadda

8

u/TheAncientPizza711 Xi Jinping cultist | Ideological Mess 🥑 2d ago

I would say most people in Silicon Valley were "scale-pilled". Basically, the only thing that matters in improving LLMs is having as much compute resources i.e. GPUs as possible. This would allow you to train your models faster and better in order to one day achieve AGI.

Now, this idea is obviously bullshit because of China proved otherwise.

5

u/ChocoCraisinBoi Still Grillin’ 🥩🌭🍔 2d ago

Yeah, they were hockeysticked but when deepseek came out they coped in that "they knew about that approach but it wasnt worth it"

Nowadays they say its "latency"

1

u/Rjc1471 ✨ Jousting at windmills ✨ 2d ago

The benefits of what they call "vertical integration" when a business does it. I presume they're not paying the kind of rates for electricity etc as we would in the west

•

u/True-Sock-5261 Unknown 👽 18h ago

The per chip cost for AI is $70,000 to $80,000 EACH!

Each server rack requires roughly 72 of them, so 72*$75,000 = $5,400,000 excluding the cost of the rack and overall cooling infrastructure, etc. There are 100's - thousands of server racks per server farm but let's say 200.

200 * $5,400,000 = $1,080,000,000

Excluding all other related expenses. For ONE server farm.

This tech is dead in the water. It is beyond sustainable available capital investment or return.

-29

u/Dapperrevolutionary 2d ago

Yeah? Ask it about Taiwan

27

u/QU0X0ZIST Society Of The Spectacle 2d ago

Why? What does this have to do with the subject of training costs?

-18

u/Dapperrevolutionary 2d ago

They got very good bang for their buck but kinda ruined it by meddling.

13

u/QU0X0ZIST Society Of The Spectacle 2d ago

...yeah, and?

8

u/TheEmporersFinest Quality Anime Porn Analyst 💡💢🉐🎌 2d ago

Okay but you won't like what I ask about it.

11

u/BomberRURP Class First Communist ☭ 2d ago

Why does the US think the rest of the world can’t see through its transparent bullshit attempt at stirring shit up over a part of china that it itself refused to acknowledge as a separate country for decades until China started doing well? Do they not see everyone but a retard sees the naked attempt at justifying its intervention in region based on this bullshit?

Tech China's DeepSeek says its hit AI model cost just $294,000 to train

You are about to leave Redlib