Thoughts about tenacity and deep learning
The following is quoted from a fast.ai notebook (04_mnist_basics) discussing tenacity and deep learning. I want to share some reflections on this “touching” anecdote.
The story of deep learning is one of tenacity and grit by a handful of dedicated researchers. After early hopes (and hype!) neural networks went out of favor in the 1990’s and 2000’s, and just a handful of researchers kept trying to make them work well. Three of them, Yann Lecun, Yoshua Bengio, and Geoffrey Hinton, were awarded the highest honor in computer science, the Turing Award (generally considered the “Nobel Prize of computer science”), in 2018 after triumphing despite the deep skepticism and disinterest of the wider machine learning and statistics community. Geoff Hinton has told of how even academic papers showing dramatically better results than anything previously published would be rejected by top journals and conferences, just because they used a neural network. Yann Lecun’s work on convolutional neural networks, which we will study in the next section, showed that these models could read handwritten text—something that had never been achieved before. However, his breakthrough was ignored by most researchers, even as it was used commercially to read 10% of the checks in the US!

In addition to these three Turing Award winners, there are many other researchers who have battled to get us to where we are today. For instance, Jurgen Schmidhuber (who many believe should have shared in the Turing Award) pioneered many important ideas, including working with his student Sepp Hochreiter on the long short-term memory (LSTM) architecture (widely used for speech recognition and other text modeling tasks, and used in the IMDb example in «chapter_intro»). Perhaps most important of all, Paul Werbos in 1974 invented back-propagation for neural networks, the technique shown in this chapter and used universally for training neural networks (Werbos 1994). His development was almost entirely ignored for decades, but today it is considered the most important foundation of modern AI.

There is a lesson here for all of us! On your deep learning journey you will face many obstacles, both technical, and (even more difficult) posed by people around you who don’t believe you’ll be successful. There’s one guaranteed way to fail, and that’s to stop trying. We’ve seen that the only consistent trait amongst every fast.ai student that’s gone on to be a world-class practitioner is that they are all very tenacious.
It’s indeed a great story about tenacity and grit. I ask myself: if most of my peers were standing on the other side, could I still firmly hold on to my position and prove its validity? Unfortunately, I can’t give a positive answer. People are mostly social animals; we need to be supported, loved, and believed by our peers. Otherwise, our lives are in danger if we are abandoned by our “tribe”. Taking the alternative path and being in the minority requires tremendous courage and the belief that the truth rests on your side. I also don’t think it’s always worth proving the correctness of our beliefs and ideas and winning the battle of argument every time. For example, when you and your partner have different opinions, there is often no need to convince them that you are the right one. If your partner is a reasonable person, or the issue may lead to consequences you can’t bear, then the strategies for reaching an agreement are another story.
Another point: as learners of deep learning, we should pace our learning carefully, neither too fast nor too slow. If we go through all the material too fast, we pay less attention to the key components, which may backfire when we try to go deeper into the area. Building a solid foundation, especially when learning material we have had no prior exposure to, is a smart strategy for going fast later. If we move too slowly, we may lose interest because we are not making enough progress to get feedback. For example, if I spent a whole month figuring out every nitty-gritty detail of the math and of the fast.ai library, I wouldn’t feel any excitement and might eventually drop the course. The course structure is well designed by the fast.ai team, adopting a top-down approach: grasp the whole picture of deep learning, train state-of-the-art models that solve real-life problems in lessons 1 & 2, and intentionally leave the math and other coding details for later.
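To make that “top-down” point concrete, here is roughly the kind of lesson-1 code the course opens with: fine-tuning a pretrained image classifier in a few lines, with all of the underlying math deferred to later chapters. This is a minimal sketch assuming the fastai library and the Oxford-IIIT Pets dataset the book uses; exact function names may differ slightly between fastai versions.

```python
# Minimal sketch of a fast.ai "lesson 1"-style training run:
# download a dataset, build dataloaders, fine-tune a pretrained ResNet.
from fastai.vision.all import *

path = untar_data(URLs.PETS) / 'images'

def is_cat(fname):
    # In this dataset, cat breeds have filenames that start with an uppercase letter.
    return fname[0].isupper()

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

# vision_learner is the current name for what older fastai versions call cnn_learner.
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```

Running this gives a working cat-vs-dog classifier in minutes, which is exactly the kind of early feedback that keeps a learner motivated before the details of backpropagation are introduced.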