Time Travel: 2014

Chapter 347 Huge Hidden Wealth

As for why Lin Hui hadn't brought out generative adversarial networks?

Lin Hui didn't want to give outside academics a sense of discontinuity.

Just as he hadn't wanted to give gamers a sense of discontinuity back when he was developing (porting) games.

Granted, Lin Hui now had a certain logical basis for introducing generative adversarial networks.

(Lin Hui had previously worked on the generative models involved in generative text summarization, the patent acquired from Eve Carly also touched on discriminative models, and a generative adversarial network is composed of exactly those two parts: a generative network and a discriminative network...)
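(A minimal sketch, not from the original text, of the generator-discriminator pairing just described; PyTorch is an assumed framework choice, and every name below is illustrative.)

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2

# Generative network: maps random noise to fake samples.
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                  nn.Linear(32, data_dim))
# Discriminative network: scores samples as real (1) or fake (0).
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(),
                  nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(64, data_dim) * 0.5 + 2.0  # stand-in "real" data

for step in range(100):
    # Train the discriminator: push real toward 1, generated toward 0.
    fake = G(torch.randn(64, latent_dim)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Train the generator: try to make the discriminator call fakes real.
    loss_g = bce(D(G(torch.randn(64, latent_dim))), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```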

But it would still be a bad idea for Lin Hui to rashly roll out a generative adversarial network now.

After all, at the application level, generative adversarial networks had little to do with natural language processing, the field Lin Hui had been working in.

Under those circumstances, why would Lin Hui inexplicably launch a model that had little to do with natural language processing?

Granted, academic research is full of happy accidents, and the eventual use of many results diverges from the purpose they were first created for.

But the principles Lin Hui held to wouldn't let him break his established routine.

Whether in game development or academic work, Lin Hui didn't want to give anyone a sense of discontinuity.

Besides, it was better to light up the tech tree in order.

True, as someone with a cheat, he could light it up out of order.

But in a game with many players, not following the rules often meant risk.

If he lit up the tech tree in a jumbled order, his own chain of technical logic would never take shape,

while potential opponents pieced together the corresponding development trajectory,

and his technological achievements might well end up stolen by them.

That was the last thing Lin Hui wanted to see.

As Lin Hui saw it, what he needed to do academically now was to go deeper into natural language processing.

To dig deeper into generative text summarization.

And through that sustained, deep cultivation, to find a breakout point within the field of natural language processing.

In other words, it would be best to light up branches of the tech tree adjacent to the achievements he had already lit up.

(Lin Hui was in no hurry. Even if he couldn't find a suitable breakout point for a while, it didn't really matter.

For at least the next month or so, he had nothing to worry about.

After all, his "breakthrough progress (successful porting)" in generative text summarization was at least enough to "earn" him a master's degree.

And that would take Lin Hui some time to digest anyway.

In fact, Lin Hui's original estimate was more optimistic.

Lin Hui had originally thought that by writing up his work on generative text summarization as a dissertation, he could go straight for a doctorate.

However, his recent exchanges with Eve Carly had made him feel he'd been too optimistic.

Just as Nobel-level results don't necessarily win a Nobel Prize.

Even if what Lin Hui had pieced together in generative text summarization counted, in this time and space, as a doctoral-level or even higher achievement,

turning it into a doctorate in one step would still be very difficult.

After all, his academic output so far had mainly taken the form of an algorithm patent, the generative text summarization one.

In this time and space, the West tended to regard achievements presented as patents as practical work, that is, engineering achievements.

And relying solely on engineering achievements to reach a doctorate in one step was very troublesome.

Still, although the academic returns on generative text summarization were somewhat below his expectations, it wasn't a big problem.

Lin Hui felt that taking too big an academic stride wasn't entirely a good thing.)

Since he wouldn't be porting generative adversarial networks in the short term,

wasn't all that thinking about them just now a waste of brain cells?

Of course not.

Often, inspiration comes precisely from such seemingly idle trains of thought.

Thinking about generative adversarial networks, Lin Hui had suddenly realized that he was sitting on another piece of huge hidden wealth.

The manually labeled data from his previous life.

Although he hadn't yet combed carefully through the data he'd brought from his previous life,

it was impossible for it to contain no manually labeled data.

Those enterprise-grade hard drives from his previous life, in particular, couldn't possibly be empty of it.

Even if there were no manually annotated images, manually annotated text was something that absolutely couldn't be missing.

After all, that sort of thing was extremely practical, and text annotations didn't take up much space anyway.

One had to understand that training neural networks, and deep learning in particular, consumed large amounts of manually labeled data when building a model.

Supervised learning and semi-supervised learning, especially, depended on it.

A model typically needed plenty of labeled data during construction,

and plenty more during tuning.

A few examples:

In image recognition, millions of manually labeled images were often needed.

In speech recognition, thousands of hours of manually annotated audio might be needed.

And in machine translation, tens of millions of annotated sentence pairs were needed.
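(A minimal sketch, not from the original text, of why labeled data is the bottleneck here: a supervised fit() call needs a label y for every example x. scikit-learn is an assumed library choice and the data is synthetic.)

```python
from sklearn.linear_model import LogisticRegression

X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]  # features
y = [0, 1, 0, 1]  # manually assigned labels -- the expensive part

# Supervised learning cannot start without y; unsupervised methods
# drop y but, per the text, had few usable models in this era.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.15, 0.15]]))  # -> [0]
```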

To be honest, coming from years further along in his previous life as a technician,

Lin Hui had never really taken the value of manually annotated data seriously.

But looking at it now, he had clearly been overlooking that value.

Lin Hui remembered a set of figures he had seen in his previous life, from 2017, concerning human translation.

Human translation cost roughly 5 to 10 cents per word, and an average sentence ran about 30 words.

So labeling 10 million bilingual sentence pairs, that is, hiring experts to translate 10 million sentences, came to a labeling cost of roughly 22 million US dollars.

Data labeling, in other words, was very, very expensive.
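(A quick back-of-the-envelope check of that figure, using the per-word rates quoted above; Python here is just a convenient calculator.)

```python
words_per_sentence = 30
cost_per_word = (0.05 + 0.10) / 2  # 5-10 cents per word; take the midpoint
sentences = 10_000_000             # 10 million bilingual sentence pairs

total = sentences * words_per_sentence * cost_per_word
print(f"${total / 1e6:.1f} million")  # -> $22.5 million
```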

And that was the labeling cost as of 2017.

Didn't that imply data labeling cost even more now?

One had to remember that, at present, there was almost no emphasis on unsupervised learning.

On the unsupervised side, there were almost no usable models.

Mainstream machine learning still relied on supervised and semi-supervised learning.

And supervised and semi-supervised learning were basically inseparable from manually labeled data.

Measured from that angle, wasn't the mass of ready-made manually labeled data in Lin Hui's hands a huge hidden wealth?

If labeling 10 million bilingual sentence pairs had cost over 20 million US dollars in the 2017 of his previous life,

then in this 2014, with machine learning lagging across the board,

how much would labeling the same 10 million bilingual pairs cost?

Lin Hui reckoned that 10 million bilingual annotated sentence pairs would run to two or three hundred million US dollars.

A figure like "two or three hundred million US dollars" sounds a bit scary.

But it was no exaggeration.
