Working in a major trend – Machine Learning

I recently came across this in the latest Amazon shareholder letter.

“These big trends are not that hard to spot…We’re in the middle of an obvious one right now: machine learning & artificial intelligence” — Jeff Bezos

One of the hard parts about working professionally on these technologies is that I take them for granted. So consider this post a chance to reflect on the improvements in image processing, computer vision, translation, natural language processing, text understanding, forecasting and risk analysis.

I’ve worked on some of these technologies, and these challenges, and continue to work on extracting information from CVs and matching candidates to the best jobs for them at Elevate Direct. When you’re in the weeds you sometimes forget what you’re working on, and that you’re part of a major trend.

As Matt Turck says:

Big Data provides the pipes, and AI provides the smarts.

AI in the Enterprise (the problem)

I was recently chatting to a friend who works as a Data Science consultant in the London area, and a topic dear to my heart came up: how to successfully do ‘AI’ (or Data Science) in the enterprise. I work for an Enterprise SaaS company in the recruitment space, so I’ve got a certain amount of professional interest in doing this successfully.

My aim in this essay is to outline what the problem is, and provide some solutions.

Firstly it’s worth reflecting on the changes we’ve seen in Consumer apps – Spotify, Google, Amazon, etc – all of these apps have personalised experiences which are enhanced by machine learning techniques depending on the labelled data that consumers provide.

I’ll quote what Daniel Tunkelang (formerly of LinkedIn) said about the challenges of doing this in the enterprise.

First, most enterprise data still lives in silos, whereas the intelligence comes from joining across data sets. Second, the enterprise suffers from weak signals — there’s little in the way of the labels or behavioral data that consumer application developers take for granted. Third, there’s an incentive problem: everyone promotes data reuse and knowledge sharing, but most organizations don’t reward it

I’ve personally seen this when working with enterprises as a consultant. The data is often very noisy, and while there are techniques to overcome that, such as ‘distant supervision’, it does make things harder than, say, building ad-tech models in the consumer space or customer churn models, where the problem is more explicitly solvable by supervised techniques.
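For readers who haven’t met distant supervision before, here is a minimal sketch of the idea in Python: instead of hand-labelling examples, you derive noisy labels from something you already trust, such as a list of known terms. The knowledge base, the example CV lines and the function name `distant_labels` are all made up for illustration; this is not how any particular product does it.

```python
# Hypothetical knowledge base: skill names we already trust (an assumption for illustration).
KNOWN_SKILLS = {"python", "sql", "scala"}


def distant_labels(sentences, knowledge_base):
    """Give each sentence a noisy label: 1 if it mentions a known skill, else 0."""
    labelled = []
    for sentence in sentences:
        # Crude tokenisation: split on whitespace, strip punctuation, lowercase.
        tokens = {token.strip(".,;:()").lower() for token in sentence.split()}
        label = 1 if tokens & knowledge_base else 0
        labelled.append((sentence, label))
    return labelled


# Made-up CV lines standing in for unlabelled enterprise data.
cv_lines = [
    "Built ETL pipelines in Python and SQL.",
    "Enjoys hiking and photography.",
    "Interested in learning Scala.",
]

noisy_training_set = distant_labels(cv_lines, KNOWN_SKILLS)
```

The labels will be wrong some of the time (the third line mentions Scala without demonstrating any experience with it), which is exactly the noise you then have to live with when you train a supervised model on them.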

In my experience, and the experience of others, enterprises are much more likely to try to buy in off-the-shelf solutions, but (to be sweepingly general) they still don’t have the expertise to understand, validate or train the models. There are often individuals in small teams here and there who’ve taught themselves or done some formal education, but they’re not supported. (My friend Martin Goodson highlights this here.) There needs to be a cultural shift. At a startup you might have a CTO who’s willing to trust a bunch of relatively young data science chaps to try to figure out an ML-based solution that does something useful for the company without breaking anything. It’s also worth highlighting the difference in risk aversion between enterprises (with established practices etc.) and the more exploratory, R&D mindset of a startup.

The somewhat more experienced of us these days tend to have a reasonable idea of what can be done, what’s feasible, and furthermore how to convince the CEO that it’s doing something useful for his valuation.

Startups are far more willing to give things a go, because they face an existential threat. And let’s not forget that venture capitalists and the assorted machinery often expect artificial intelligence, and encourage it.

Increasingly I speculate that established companies now outsource their R and D to startups, hence the recent acquisitions like the one by GE Digital.

So I see, roughly speaking, two solutions to this problem: two ways to de-risk data science projects in the enterprise.

1) Build it as an internal consultancy with two goals: identifying problems which can be solved with data solutions, and exposing other departments to new cultural thinking & approaches. I know of one large retailer who implemented this by doing 13-week agile projects: they’d do a consultation, then choose one team to build a solution for.

2) Start putting staff through training schemes similar to those offered by General Assembly (there are others), but do it whole teams at a time, because the culture of code review and programmatic analysis has to come back and be implemented at work. Similarly, give the team managers additional training in agile project management etc.

The first can have varied success – you need the right problems, and the right internal customers – and the second I’ve never seen implemented.

I’d love to hear some of the solutions you have seen. I’d be glad to chat about this.

Acknowledgements: I’d like to thank the following people for their conversations: John Sandall, Martin Goodson, Eddie Bell, Ian Ozsvald and Mick Delaney. My apologies to anyone I’ve forgotten.

The Setup

The Setup has always been one of my favorite sites on the internet. I love seeing how other people – in vastly different careers – get their work done. Though I don’t craft Chinese soldiers out of cardboard or anything nearly that fascinating, I thought it would be a fun exercise to put together my own version.

Who are you, and what do you do?

I’m Peadar Coyle, and I’m a data scientist based in Luxembourg. Until recently I was at Vodafone as a Quantitative Analyst in their Energy team. As you might expect, there are many people out there with that title and many do quite different work. My career has been varied so far, but I’m predominantly a type A (for insights) data scientist, which means I spend half of my time coding and prototyping models to provide insights for business stakeholders. I’m working hard on improving my development skills so that I can deliver robust, working code in production. My intellectual background is in Physics and Mathematics.

I enjoy talking (as all Irish people do 🙂 ) so I regularly share my knowledge at conferences such as PyData.

What hardware do you use?

I use (and adore) my Leuchtturm notebook (8″, with dots) for taking notes during phone calls, meetings, and any other times when typing on a laptop feels out of place or unnecessary. It’s a fantastic thought-collector for all manner of doodles, brainstorms, projects, and data visualisations. In that notebook (and everywhere else, really), I’ll write with whatever is around, but my preference is for ultra fine gel ballpoints.

Until recently I was using Moleskines, but I found them a tad expensive for their quality.

I carry a Samsung Galaxy J everywhere for all the uses in the world (+ multi-factor auth all the things). The battery is absolutely terrible, so I always keep a portable battery in my bag. That might actually be one of the most worthwhile 25 euros I’ve ever spent.

My home machine is a MacBook Pro (Retina, 15-inch, Mid 2014) with 16GB of RAM. This is a pretty hefty machine and quite difficult to carry around, but the retina screen is awesome.

For cloud computing (that counts as hardware, right?) I use EC2 and S3 on AWS. For certain problems, like Kaggle competitions or anything complicated, I’ll use the most powerful machine I can get my hands on 🙂

And what software?

This is where I spend most of my time. I try out lots of tools to make my work (and life) easier. For me, “easier” is always a balance between “more tools that each do one thing well” and “fewer tools that each do all sorts of things.” It’s a constant work in progress.

I’m still using OS X 10.10 (Yosemite). When it comes to my work system, I’m rarely an early adopter because new OS updates always break environments.

I probably spend 50% of my time in OS X’s Terminal. Most of that time is spent in vim. I write most things there: code (mostly Python and bash), documents (Markdown, text, and TeX), etc. The solarized (dark) theme gives nice syntax highlighting contrast, and also keeps my eyes from getting tired (this will be a recurring theme). I keep meaning to try out iTerm but haven’t gotten around to it. I spend a lot of time working on remote Linux servers, so I tend to keep it simple (and similar) on my own machine. I’ll occasionally try to learn Emacs – and then give up and go back to Vim.

I’d guess the next 45% of my time is spent in Chrome. Among all the articles I’ve opened to read (but will inevitably drop into the Pocket black hole), you’ll pretty much always find some combination of tabs open that includes: all Google Apps (mail, cal, drive, and a handful of docs), StackOverflow, the Python docs, GitHub, Slack, Trello, Twitter and often a Wikipedia page or two about whatever concept or technique I’m trying to grok at the moment. I’ve recently started using Safari Books, which is an expensive investment, but it strikes me personally as a worthwhile one.

I recommend any data geek wanting to improve their productivity learn sed, awk and also use csvkit which I couldn’t live without.

I also use a bunch of extensions because efficiency makes me incredibly happy: JSONView & XML Tree (prettify API responses), Markdown Reader (live rendering of local .md files – usually how I write and review these posts), Pocket (save-for-later), and Tab-Snap (store giant tab list as restorable text file).

The last 5% of my time is spent switching between a host of other apps: Wunderlust (daily note-taking and long-term reference storage), Slack (team/org communication), Gimp (for my amateur image creation needs), Slides (for important presentations, GDocs for less important ones), and Toggl (time tracking; incredibly enlightening if you’ve never tried it). I also use Jupyter a lot, but recently I’ve been moving to PyCharm since I’m trying to write less ad-hoc stuff and more Python modules. Since I’m trying to learn Scala at the moment, I’ve been using IntelliJ, which is an awesome IDE. I honestly don’t know how anyone codes in a JVM language without a good IDE.

There are a handful of other apps that are hugely valuable and always running in the background, too: Dropbox (for both personal syncing – Camera Uploads! – and quick file sharing), Skype, and f.lux (adjusts your display’s color temperature – helps reduce eye strain when working at night).

What would be your dream setup?

I am close to it already, but some small changes would be a nice start: a not-yet-possible 13″ MacBook Air with the specs of the burly 15″ Retina MBP, a pair of those magical Bose headphones I mentioned earlier, a couple of 27″ displays, and a beautiful, automatic sit-to-stand desk.

Sexism in Tech conferences

Writing about sexism in tech conferences is hard, especially as a young white male. I can only speak anecdotally, but most women in the tech industry I speak to talk a bit about moments of subtle sexism or sometimes out-and-out harassment. As a member of the tech community I’m completely behind any promotion of minorities in the industry, and feel that more can be done. It is interesting that most men I speak to in the industry don’t notice any problem.

Two articles spring to mind:

http://womeninastronomy.blogspot.com/2014/11/its-not-about-that-damn-shirt.html

This was written about STEM but I feel the same rules apply to the Tech community (especially since I personally straddle both communities).

It’s “not a big deal” when someone tells you he came to your talk because you’re attractive.
It’s “not a big deal” when a coworker comments on your appearance.
It’s “not a big deal” when someone makes a “joke” at work demeaning women.
It’s “not a big deal” when you are asked in a job interview if you have or are planning to have kids.
It’s “not a big deal” that you were warned about what professor to avoid basically as soon as you got to school.
It’s “not a big deal” that that same professor was untouchable by the administration because he was too famous.
It’s “not a big deal” when someone assumes you are your own secretary on the phone.
It’s “not a big deal” when someone calls you “Miss” and your male colleague “Doctor.”
It’s “not a big deal” when going to parties at a conference comes with warnings of which of your fellow scientists are dangerous.
It’s “not a big deal” when your boss, adviser, or senior colleague asks you out.

All of this stuff IS a big deal. One of the things I hear about the tech industry – partly because of the passive aggression that hackers sometimes adopt – is that as a community we need to grow up and become more professional AND inclusive. I agree wholeheartedly with this and applaud the conferences that encourage more female participation and more female speakers. Diversity is a good thing and I think it makes us smarter :).

The other link I saw was http://adainitiative.org/2012/08/defcon-why-conference-harassment-matters/ about DEFCON, a famous security conference. I found the following paragraph to be very powerful.

When you say, “Women shouldn’t go to DEFCON if they don’t like it,” you are saying that women shouldn’t have all of the opportunities that come with attending DEFCON: jobs, education, networking, book contracts, speaking opportunities – or else should be willing to undergo sexual harassment and assault to get access to them. Is that really what you believe?

I am glad things are getting better, but there are still a number of actions that we can all take. I think this is a subproblem of the larger problem that Pete Warden commented about. I consider his article to be self-recommending: http://petewarden.com/2014/10/05/why-nerd-culture-must-die/

Comments are welcome. The articles I linked to contain some excellent resources on how to come up with and enforce policies regarding harassment, which is a legal issue. Lots of us like to avoid legal issues like this, but an advantage of policies and ‘processes’ is that they are transparent and fair. Some of us consider these things to be too formal, but as I get older I see that some of these ‘formalities’ that we have in corporations and other organizations are useful and save a lot of hassle.

David MacKay interview on Climate Change

http://www.davidstrahan.com/blog/?p=1104 is a link to an interview with David MacKay, one of the top civil servants on climate change. He is also the author of an excellent book.
Mathematical analysis offers a lot in understanding the challenges we face, and it’s great to see an eminent physicist involved in such things.
What is very interesting is how important nuclear energy is in the calculations: it’s very difficult to reach the necessary power output without it.

Observations on the connectedness of our world.

Scientifically focused geeks like myself have a tendency to speak highly of the web. We see Skype, MSN, and Facebook as great technical marvels. Yet, as people like Tim Ferriss or Cal Newport observe, there is a price to this connectedness.
Today for instance there was a family wedding in Ireland. I wasn’t able to attend due to examinations next week in my own studies. Yet through text messages and Skype conversations I feel like I’m half there.
This means that concentration is difficult. Yet concentration is something I need in order to develop the rare and valuable skills of a mathematician, or of whatever discipline I end up working in.
We should remember that we are fundamentally limited by the hardware of our brains. And limited by our humanity. We shouldn’t forget the effects of technology or modern day life on our cognitive load.

Mick Bremner wrote a post on this a few years ago.

Now, as time goes on and I realize that moving home every couple of years is actually taking a toll on my relationships with people that I care very much about I realize that, possibly, my writing can help the situation. I’m reluctantly realizing that I’m rarely ever going to be able to spend long afternoons chatting with my dearest friends over (good) coffee. But maybe if I keep this blog up to date then at least they might have some chance of keeping track of what’s going on with me.

My own Facebook and Twitter accounts have friends and family all around the world. I’ve friends who live in Hong Kong, Shanghai, New York, London, Adelaide and everywhere in between. And as Ben Casnocha pointed out, there is a ‘feel bad effect’ to Facebook of not-so-close-facebook-friends.
We constantly see upbeat images and happy occasions, and rarely the daily struggles of our existence. When we read the CVs or resumes of people in our respective fields, we don’t hear about the struggles of their lives.
This is written to point out that everything I do in life seems to be an absolute struggle.

On Technology without Borders

As a naively minded scientific type, I often make the mistake of imagining that merely developing technology is enough to better the world. The developing world especially faces huge humanitarian challenges, for instance malaria, climate change, HIV and TB, and the market conditions of the First World do not always help to address them.
Peter Singer once wrote an essay titled ‘Hair Loss pills or Anti-Malaria medication’, pointing out that market conditions may lead to research and development that doesn’t lead to Utilitarian answers. In simple terms, Utilitarianism speaks of the ‘greatest happiness for the greatest number of people’, and one can assume implicitly that malaria and starvation aren’t conducive to human happiness. Except we in the West perhaps don’t quite know how to sympathise with that.
Peter Singer was hinting at the problem of ‘orphan diseases’, i.e. diseases which haven’t been adopted by the pharmaceutical industry due to a lack of financial incentives. A wonderful article in Seed Magazine which I came across today spoke of how to facilitate Biotechnology such as Synthetic Biology to help improve the human condition.
A paragraph that particularly struck me was:

Practically, the phenomenon of orphan diseases points to a broader challenge underlying all innovation and development. Many powerful new technologies migrate slowly, if at all, to developing world populations. More critically, the upstream choice of which technological advances to pursue often depends on market conditions or the wealth of different national governments, which means that the unique needs of developing world populations tend to go unaddressed or are not voiced during the early stages of a technology’s development. Thus, translating the promise of any new field of research, such as synthetic biology, into concrete benefits requires more than technology alone, especially when it comes to helping underserved populations in the developing world. It requires supportive legal, institutional, and commercial environments, and coordination among researchers to pool efforts toward solving shared problems.

We can’t just assume that innovation is the solution. In addressing large-scale problems and in attempts to improve the human condition (and if one isn’t trying to do that directly or indirectly, then why not?), one needs strong legal frameworks, and more than just referring to the gods of the ‘market forces’. Biobricks is certainly not a perfect solution, but it is interesting nonetheless, and I’m very interested in how human resourcefulness in the developing world, in conjunction with Synthetic Biology, can produce innovative solutions to some of humanity’s most pressing problems.