Best case scenario: everyone will ignore it. The best way to seek the help of an online community is to half-ass something, post it, and then watch all the self-righteous angry trolls come out from under their bridges to show you the one true way. People care more about boosting their virtual ego at your expense than about actually doing something for the greater good.
If you collect such feedback, you can even find good bits in it!
You may ask: why would anyone care if something wrong is said on the Internet? Because they are idealistic knights in shining armor? No, because of virtue signalling. Build some complicated thing and nobody notices. But deliberately make a couple of mistakes, and everyone will seem to care at once.
Ok, but what does this have to do with technology and Dark Forests? Of course, if engineers work on a new generation of CPUs for 2 years, something will happen regardless of how efficient they are. But my bet is that you do not have engineers and 2 years. You need your result yesterday. And here is where it gets interesting. Imagine a new technology being born or reaching some form of maturity.
Each new technology first attracts people who want to see the results of their actions, because at that stage it needs people who get things done without rambling for two hours about self-righteousness. Then the public and marketing catch up, and marketing BS starts flooding the information channels.
No one has an incentive to take risks, no one wants to build something new, status quo is the new motto, everyone is happy and content. A medieval cat achieving a state-of-the-art speed record, disregarding some practical issues like blowing up afterwards. It is not news that the academic world (or, to put it in plainer terms, people writing papers: scholars, corporate employees, ML competition participants, whatever) is not immune to Goodhart's law. I found this gem and this gem recently on this topic. Seriously, just read them. Ticking the boxes instead of doing something useful.
You are probably already fed up with my ranting, so let me be more concrete. Dear people writing papers!
I have a bone to pick with you. To be more precise, several bones. Sometimes it is worth running all of your experiments to the very end, but usually you can see within the first epochs that something does not fly. Why do you copy-paste obscure mathematical formulas from paper to paper even when your contribution has nothing to do with these formulas?
Do not give me the BS that "each paper should be self-sufficient and any newcomer should understand everything from just one paper". It does not work like that. Why not just write: "we use this loss, invented by this guy here, implemented by this guy here, and yes, we just use this implementation for everything, and here is our Dockerfile"? That would be productive. Does not sound very sexy, right? Another bone: being vague about which elements are crucial to your system and which can be trimmed.
If you started with some black-box framework and just slapped something on top of it, just say so.
Or do proper ablation tests. Or just show your path.
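A proper ablation does not have to be fancy. Here is a minimal sketch (all component names and scores below are hypothetical, purely to illustrate the idea): run the same pipeline once in full, then once per component with exactly that component switched off, and report the deltas.

```python
def run_experiment(use_augmentation, use_lm_loss, use_big_encoder):
    # Hypothetical stand-in for a real training run.
    # In a real ablation this would train a model and return, e.g., WER.
    score = 0.30
    score -= 0.05 if use_augmentation else 0.0
    score -= 0.03 if use_lm_loss else 0.0
    score -= 0.02 if use_big_encoder else 0.0
    return score

components = ["use_augmentation", "use_lm_loss", "use_big_encoder"]
full = {name: True for name in components}
baseline = run_experiment(**full)
print(f"full system: {baseline:.2f}")

# Switch off one component at a time and report how much it mattered.
for name in components:
    config = dict(full, **{name: False})
    score = run_experiment(**config)
    print(f"without {name}: {score:.2f} (delta {score - baseline:+.2f})")
```

A table produced by a loop like this answers "which parts can be trimmed" directly, which is exactly what most papers leave out.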
Just be frank. Do not hide behind big words. I understand that an illustration from a NAS paper is probably a bit misleading here, but at least it clearly shows which exploration steps were taken.
Doing something on your laptop on MNIST is kind of cringe nowadays (yeah, you can fit any sufficiently large shitty network to anything given enough time and money), but at least it feels honest. The DeepSpeech paper is probably the best paper to illustrate this.
They just showed how much performance depends on adding more data, even if they cannot share this data. If what you do works only on an ideal small dataset and requires 10x more compute and time, it is useless. No, you are not pushing the state of the art unless there is a fundamentally cool new characteristic to what you did. Ah, maybe you care about reproducibility? But ahem, making your contribution reproducible and approachable by the public may be 10x more work. So it is better just to omit anything that would point your fellow practitioners in the right direction.
Because otherwise you are vulnerable and the Dark Forest will strike. Also, do not get me started on the "stack and blend similar things" approach. Genuinely mixing several different things is one thing, but just stacking similar networks is not. Hyper-parameters usually get a small section in the middle of the paper. But sometimes it actually takes a lot of time to tune them just right, and that may even be more important than your brand-new SOTA improvement.
So why not be more open about how you found them? How much compute does it take? Does it scale to other datasets? Are they stable? Please do something useful! In particular, the problems I had with STT papers were the following. All the cool new unsupervised learning approaches (wav2vec, cyclic training, etc.) look very cool, but why show "progress" on some random small datasets? Why not just scale with the number of speakers in LibriSpeech?
Why not extrapolate how compute-efficient your method is, even by means of a back-of-the-envelope computation? These are not complicated matters, just common sense. But you see, saying "we achieved almost-unsupervised something, but in real life it is even less practical than supervised" does not sound sexy at all. The wav2letter paper is cool. So deceptively simple and nice that it was one of the reasons I decided to try to build an STT system.
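A back-of-the-envelope extrapolation really does fit in a few lines. All the numbers below are made up for illustration; the point is that if training cost grows roughly linearly with hours of audio, two small runs are enough to estimate the cost of a realistic dataset before committing to a method:

```python
# Made-up measurements from two small runs: (hours of audio, GPU-hours spent).
runs = [(100, 8.0), (500, 41.0)]

# Assume roughly linear scaling: gpu_hours ~= k * audio_hours.
k = sum(gpu / audio for audio, gpu in runs) / len(runs)

target_audio_hours = 10_000  # a realistically sized STT dataset
estimate = k * target_audio_hours
print(f"estimated cost: ~{estimate:.0f} GPU-hours")
```

Even a crude linear fit like this tells you whether a method is off by an order of magnitude for your budget, which is the question papers rarely answer.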
But wait, on real data it just does not fit as promised! Maybe you need to fit it for 30 days on 8 GPUs? The same goes for all papers featuring large networks, GLU activations and aggressive dropout. You can fit any network on anything. They are that flexible. But you know, the time and compute taken may differ 10x. Maybe I am just an idiot, but we tried such things, and they worked, yet even on toy data they converged much slower. Given that even one "comma" in something as fragile as pre-processing may be a deal-breaker, the lack of published code from paper authors is strange (come on, if you spent a lot of time tuning your pre-processing, and it is probably just 50 lines of code, then why not just share it?).
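And the kind of pre-processing in question really is about 50 lines or fewer. A minimal sketch with plain NumPy (the normalization and framing parameters here are my own assumptions, not any particular paper's recipe):

```python
import numpy as np

def preprocess(wave: np.ndarray, sample_rate: int = 16_000,
               frame_ms: int = 25, hop_ms: int = 10) -> np.ndarray:
    """Peak-normalize a waveform and slice it into overlapping windowed frames."""
    wave = wave.astype(np.float32)
    peak = np.abs(wave).max()
    if peak > 0:
        wave = wave / peak  # the exact normalization is one of those "commas"
    frame = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)       # 160 samples at 16 kHz
    n_frames = 1 + max(0, (len(wave) - frame) // hop)
    frames = np.stack([wave[i * hop:i * hop + frame] for i in range(n_frames)])
    return frames * np.hanning(frame)  # window before any FFT / feature step

# One second of synthetic audio -> 98 frames of 400 samples each.
frames = preprocess(np.random.randn(16_000))
print(frames.shape)
```

Publishing even a snippet like this alongside a paper would remove most of the guesswork about what the model actually consumed.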
The community will benefit and you will get a PR boost. There is a cool paper, SincNet, where the author had a cool idea and even published down-to-earth code that you can test with your own data (we are yet to try it). Also, as some papers have shown (1, 2, 3, 4), sometimes you do not even need a complicated structure or training per se for the model to work.
It is so cool when you can just put everything into one network in one framework (or even be framework-agnostic) and just hit run. But if it is not useful, it is just a novelty item. Maybe adding an additional LM loss will help instead? Probably the Dark Forest is a bit of an exaggeration, but some of the behaviors we observed were kind of similar in spirit.
Usually I am not one to brag about "how many job offers per second I have", or "how many jobs I have at the moment", or "how much money per second I make" (all of this is vain and undignified), or any obnoxious shit like that, but stick with me for a couple of anecdotes now. All of this happened after we published one of the first more or less mature versions of our dataset on habr.
Of course, on numerous occasions people told me that what we do should not be done, because Google, or because Yandex. Because Dark Forest.
But even as my personal wellness grows, I see a risk in this change. The Dark Forest Theory of the Internet. Liu invites us to think about this in a different way. There is a very nice analogy that permeates the whole trilogy in one way or another (I just could not help sharing it): like hunters in a "dark forest", a civilization can never be certain of an alien civilization's true intentions. Therefore, it is in every civilization's best interest to preemptively strike and destroy any developing civilization before it can become a threat, but without revealing its own location to the wider universe, thus explaining the Fermi paradox.
Yeah right, we will see when comparing benchmarks xD. We were approached on three different occasions with some vague offers to join some teams, and in one case we could even have collaborated on making our dataset better, which is very cool!