My Machine Learning Career in 5 minutes
I saw these questions on Reddit: “What kind of side projects you did? Any Internships or other kind of opportunities, etc.? Any postgraduate qualifications like masters that helped? What I mean to say is this: All I want to know in short is that what helped landing your first job or internship and how difficult was it?”
About 50 Machine Learnistas (pronounced “em listas”) had already posted their responses to the above questions. Most of them are presumably much younger than me; some, easily a decade and a half younger. My 15-minute response (or, according to Medium, a mere 5-minute read) is posted here, suitably edited for adults:
This was circa 2001. I already had a Ph D: Extremal Combinatorics, Hungarian school and all that jazz. After a 3-year postdoc stint, I got into an interdisciplinary startup as faculty. I was still doing Combinatorial Number Theory (Combinatorial Nullstellensatz type of work), Coding Theory (space-time codes among others) and Cryptography (stream ciphers) when I chanced upon a popular article explaining Turk and Pentland’s result. It was more of a WTF moment than a flash of inspiration. Coding and Crypto seemed not applied enough for me at that point. I also love algebraic proofs and knew about SVD. But being more of a pure mathematician, I was not aware of Turk, Pentland, Kanade and the whole bevy of engineering applications. Or Mumford, Geman(s) and the Brown School of Pattern Recognition.
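For readers who have not met it, Turk and Pentland’s result is the eigenfaces method: treat each face image as a long vector, subtract the mean face, and keep the top principal directions, which an SVD hands over directly. Here is a minimal sketch in Python with NumPy. The random stand-in data, the 8 x 8 image size, the choice of k = 5 and the function names `eigenfaces` and `project` are all assumptions made for illustration; this is not the code from the original paper or from my projects.

```python
import numpy as np

def eigenfaces(images, k):
    """Return the mean face and the top-k principal directions (eigenfaces).

    images: array of shape (n_images, n_pixels), one flattened image per row.
    """
    mean_face = images.mean(axis=0)
    centered = images - mean_face
    # Thin SVD: rows of vt are orthonormal directions in pixel space,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:k]

def project(image, mean_face, components):
    """Coordinates of one flattened image in the eigenface basis."""
    return components @ (image - mean_face)

# Stand-in data: 20 random "images" of 8 x 8 pixels (not real faces).
rng = np.random.default_rng(0)
faces = rng.random((20, 64))
mean_face, components = eigenfaces(faces, k=5)
weights = project(faces[0], mean_face, components)
```

Recognition then reduces to comparing these short weight vectors, for instance by nearest neighbour, instead of comparing raw pixels.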
I got into a field that went under the name of Statistical Pattern Recognition, by supervising a couple of MS theses on the subject. During the third year of my foray, I wrote a proposal for a 5 Crore grant (Fifty million Rs? It was roughly 44 or 45 Rs to a dollar then. Go compute) which was promptly plagiarized by another agency which took home that loot. They have still not delivered anything. I already had good results on FRGC and other datasets during this time with my students. I felt let down, quit a couple of years or so down the line, and started my own company in 2006. The aim was to create a Face Verification product for attendance with one of my MS students. He promptly ditched me (supposedly, family pressure) about 6 months later.
I also ran into 2008 with no cash and turned into a Statistical Pattern Recognition mercenary, doing all sorts of projects for a price ($25 to $50 per hour depending on the complexity, and sometimes at a fixed cost): segmenting wedding clothes, finding defects in engine nozzles, finding tables and entities in EDGAR filings, defects in packages, machine translation for European language pairs using Moses, finding and matching resumes, small vocabulary speech recognition, algorithmic trading algorithms and event triggers, OCR of HCFA forms, near duplicate detection algorithms, sentiment analysis, detecting intent, detecting deception in witness statements, topic modeling, document classification, ontological matching, measuring objects, measuring foot and shoe size, lead scoring and classification. Trust me, this is not an exhaustive list.
Most of the time, I would have to formulate the problem and the project for the client and also make it as inexpensive as possible. Think 11 years of learning ML by doing these projects, a few per year (easily 2 to 4 on average). This does not include the amount of due diligence I had to do for projects that my clients finally dropped or could not afford to proceed with. How I did what I did will take much more than 5 minutes to explain.
Now qualifications: I have a Masters in Mathematics (Trinity College, Cambridge, 92–93) and a Ph D (IISc, 1997). But almost all the maths I needed, I learnt during my B Sc, which was done through a correspondence course at a small university (Madurai Kamaraj University) in South India. That *is still* my main qualification. The other experiences gave me more mathematical maturity. That is all you need. Mathematical maturity. Easier written than described.
How difficult was it? This Statistical Pattern Recognition thingee has become more popularly known as ML and has now evolved into DL. I wish I could convey a Mathematician’s excitement while reading Mumford’s book with Agnes on Pattern Recognition. A draft of a few chapters was all I could get hold of in the beginning. Or Charniak’s tiny little masterpiece, which I borrowed from my Physicist colleague. Or the excitement of making LDA, Bayesian models and logit all work on data *and* getting paid for it. Initially piffling amounts, which became more substantial; in the long run, I have paid about 128 months of salaries *on time, on the last day of the month*. Something I am immensely proud of. I see startups failing left and right, leaving employees in the lurch. I have never failed an employee or a client. Take that, snaps, zillas, karts…
During the early years of development, when we used Weka, we could hardly handle more than 10K instances, mostly because RAM prices were too high and we could not afford pricier machines. I had to write projects and chase grants to get PCs that could handle data. OpenCV, I discovered more during 2006. For one Vision project, I had to work with Halcon software and also budget for it, for god’s sake. If I had this project now, I would do it in a couple of days flat or even less, using Python + OpenCV. Even good DICOM libraries were rare given the state of hardware. We did not have pandas or dask or Spark or H2O. The R project was good, but it took longer to productionize. It never ceases to amaze me every time a one-line pandas or dask call crunches the data. For me it was not the map-reduce framework, but the out-of-core dataframes that broke the data barrier. These libraries democratized data science long before TensorFlow, which has started pilfering their achievements. As you can see, ML work was and is still exciting. But people? People were difficult. Sort of. Sometimes. People who worked under me, for me, with me. People being people and attrition being what it is in the Software industry.
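To make the out-of-core point concrete, here is a minimal sketch in Python of the underlying idea: stream a file in chunks and keep only running totals in memory, so the data never has to fit in RAM at once. It uses `pandas.read_csv` with its `chunksize` parameter. The file contents, the column name `value`, the chunk size and the helper name `chunked_mean` are assumptions for illustration, and the tiny generated file merely stands in for a dataset too large to load.

```python
import csv
import os
import tempfile

import pandas as pd

def chunked_mean(path, column, chunksize=1000):
    """Mean of one CSV column, reading only `chunksize` rows at a time."""
    total, count = 0.0, 0
    for chunk in pd.read_csv(path, usecols=[column], chunksize=chunksize):
        total += chunk[column].sum()
        count += len(chunk)
    return total / count

# Stand-in file: values 0..9999 in a column named "value" (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["value"])
    writer.writerows([i] for i in range(10_000))
    path = f.name

mean_value = chunked_mean(path, "value")
os.remove(path)
```

dask wraps this same pattern behind a dataframe interface; a rough equivalent there would be along the lines of `dd.read_csv(path)["value"].mean().compute()` with `import dask.dataframe as dd`, which is the kind of one-liner I mean.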
No, I did not have to jump into NIPS headquarters and find neatly balanced data and throw gradient bombs at it GAN-GAN style. NIPS is about trends and fashion. We are all basking in the golden era of ML. The upside is the sheer infectious enthusiasm of the new kids on the block and the typical superficiality of most of the research directions. It is more like HAVE GPU, WILL COMPUTE. If you compute with ‘42’ backwards for sufficiently many epochs in TensorFlow (or PyTorch) and watch the progress in TensorBoard, you may even be able to reverse engineer that ‘question-that-cannot-be-formulated’. There are only so many ways to do this. Like 2 x 3 x 7 or 6 x 7. I don’t know if Quantum computers have progressed to factorizing more than 15. We are still waiting with bated breath. When you hear that Microsoft has started storing data in DNA, you start worrying. Block chains, next DL, then Quantum, now DNA computing? Will this hype cycle ever end? What next? The rise of brick and mortar? Possible, but I digress.
For Aspiring MListas: Yes. The future is bright, for the next two or three years or so at least. Or you can watch how the Big Data / Hadoop era is unfolding right before your eyes to guess DL’s future. Or read the Gartner reports with those nice quadrants. Rest assured that there will be a dying quadrant. As for aspiring DLers (pronounced “dealers”), it won’t be difficult to find an internship or a job. The gradient is still with you people. As an employer, I find it more difficult to get good people in my geography. With some push and git tutorials in TF, DLers will go places.
My advice to an aspiring DLer is this. You should do a startup if you are ambitious. Definitely. The time is ripe. Especially if you are sub 30. Post 40, you will be declared brain dead by VCs and other assorted angels. But if you do, please do not make a vision app which does >90% mushroom classification without providing a dashboard (Flink and D3 preferred) showing the number of people currently using the app, with the shortest distance to the nearest ER mapped in bright red in Google Maps. If space permits (in the app of course), ER phone numbers in Sans Serif 24+ font would be very much appreciated. These numbers are for envious 40+ aged DL entrepreneurs who will not get second round funding for their insect classification products.
This was written circa 2017 and can be found at https://towardsdatascience.com/my-ml-career-in-15-minutes-a19096c852f9