Hearing Aids (Or, Metaphors are Personal)

Thursday morning, I gave the opening keynote at an event about the future of commerce at the Rotman School of Management in Toronto. I shared four insights:

  • The AI instinct is to view a reasoning problem as a data problem
    • Marketing hype leads many to imagine that artificial intelligence (AI) works like human brain intelligence. Words like “cognitive” lead us to assume that computers think like we think. In fact, succeeding with supervised learning, as I explain in this article and this previous post, involves a shift in perspective to reframe a reasoning task as a data collection task.
  • Advances in deep learning are enabling radical new recommender systems
    • My former colleague Hilary Mason always cited recommender systems as a classic example of a misunderstood capability. Data scientists often consider recommenders to be a solved problem, given the widespread use of collaborative filtering, where systems infer person B’s interests based on similarity with person A’s interests. This approach, however, is often limited by the “cold start” problem: you need person A and person B to do stuff before you can infer how they are similar. Deep learning is enabling us to shift from comparing past transactional history (structured data) to comparing affinities between people and products (person A loves leopard prints, like this ridiculous Kimpton-style robe!). This doesn’t erase the cold start problem wholesale, but it opens a wide range of possibilities because taste is so hard to quantify and describe: it’s much easier to point to something you like than to articulate why you like it. (A toy sketch of this shift follows the list below.)
  • AI capabilities are often features, not whole products
  • AI will dampen the moral benefits of commerce if we are not careful
    • Adam Smith is largely remembered for his theories on the value of the division of labor and the invisible hand that guides capitalistic markets. But he also wrote a wonderful treatise on moral sentiments where he argued that commerce is a boon to civilization because it forces us to interact with strangers; when we interact with strangers, we can’t have temper tantrums like we do at home with our loved ones; and this gives us practice in regulating our emotions, which is a necessary condition of rational discourse and the compromise at the heart of teamwork and democracy. As with many of the other narcissistic inclinations of our age, the logical extreme of personalization and eCommerce is a world where we no longer need to interact with strangers, no longer need to practice the art of tempered self-interest to negotiate a bargain. Being elegantly bored at a dinner party can be a salutary boon to happiness. David Hume knew this, and died happy; Jean-Jacques Rousseau did not, and died miserable.
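As promised above, here is a toy sketch of the shift from transaction overlap to learned affinities (my own illustration with made-up numpy vectors, not any production recommender): a brand-new product with zero sales history can still be scored against a person if both live in a shared embedding space.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up embeddings in a shared "taste space" (real systems learn these
# from images, text descriptions, and behavior with deep networks).
person_a     = np.array([0.9, 0.1, 0.8])   # loves bold prints and plush fabrics
leopard_robe = np.array([0.95, 0.05, 0.7]) # new item: no purchase history yet
plain_tshirt = np.array([0.1, 0.9, 0.2])

# Even with zero transactions for the robe, we can score affinity directly.
for name, item in [("leopard robe", leopard_robe), ("plain t-shirt", plain_tshirt)]:
    print(name, round(cosine(person_a, item), 2))
```

Collaborative filtering would have nothing to say about the robe until somebody bought it; the embedding comparison works on day one.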
This post on Robo Bill Cunningham does a good job explaining how image recognition capabilities are opening new roads in commerce and fashion.

An elderly couple approached me after the talk. I felt a curious sense of comfort and familiarity. When I give talks, I scan the audience for signs of comprehension and approval, my attention gravitating towards eyes that emit kindness and engagement. On Thursday, one of those loci of approval was an elderly gentleman seated in the center about ten rows deep. He and his Russian companion had to have been in their late seventies or early eighties. I did not fear their questions. I embraced them with the openness that only exists when there is no expectation of judgment.

She got right to the point, her accent lilting and Slavic. “I am old,” she said, “but I would like to understand this technology. What recommendations would you give to elderly people like myself, who grew up in a different age with different tools and different mores (she looked beautifully put together in her tweed suit), to learn about this new world?”

I told her I didn’t have a good answer. The irony is that, by asking about something I don’t normally think about, she utterly stumped me. But it didn’t hurt to admit my ignorance and need to reflect. By contrast, I’m often able to conjure some plausible response to those whose opinion I worry about most, who elicit my insecurities because my sense of self is wrapped up in their approval. The left-field questions are ultimately much more interesting.

The first thing that comes to mind if we think about how AI might impact the elderly is how new voice recognition capabilities are lowering the barrier to entry to engage with complex systems. Gerontechnology is a thing, and there are many examples of businesses working to build robots to keep the elderly company or administer remote care. My grandmother, never an early adopter, loves talking to Amazon Alexa.

But the elegant Russian woman was not interested in how the technology could help her; she wanted to understand how it works. Democratizing knowledge is harder than democratizing utility, but ultimately much more meaningful and impactful (as a U Chicago alum, I endorse a lifelong life of the mind).

Then something remarkable happened. Her gentleman friend interceded with an anecdote.

“This,” he started, referring to the hearing aid he’d removed from his ear, “is an example of artificial intelligence. You can hear from my accent that I hail from the other side of the Atlantic (his accent was upper-class British; he’d studied at Harvard). Last year, we took a trip back with the family and stayed in a quintessential British town with quintessential British pubs. I was elated by the prospect of returning to the locals of my youth, of unearthing the myriad memories lodged within childhood smells and sounds and tastes. But my first visit to a pub was intolerable! My hearing aid had become thoroughly Canadian, adapted to the acoustics of airy buildings where sound is free to move amidst tall ceilings. British pubs are confined and small! They trap the noise and completely bombard my hearing aid. But after a few days, it adjusted, as these devices are wont to do these days. And this adaptation, you see, shows how devices can be intelligent.”

Of course! A hearing aid is a wonderful example of an adaptive piece of technology, of something whose functionality changes automatically with context. His anecdote brilliantly showed how technologies are always more than the functionalities they provide, are rather opportunities to expose culture and anthropology: Toronto’s adolescence as a city indexed by its architecture, in contrast to the wizened wood of an old-world pub; the frustrating compromises of age and fragility, the nostalgic ideal clipped by the time the device required to recalibrate; the incredible detail of the personal as a theatrical device to illustrate the universal.

What’s more, the history of hearing aids does a nice job illustrating the more general history of technology in this our digital age.

Partial deafness is not a modern phenomenon. As ever, the tools to overcome it have changed shape over time.

This 1967 British Pathé primer on the history of hearing aids is a total trip, featuring radical facial hair and accompanying elevator music. It pays special attention to using the environment to camouflage cumbersome hearing aid machinery.

One thing that stands out when you go down the rabbit hole of hearing aid history is the importance of design. Indeed, historical hearing aids were analogue, not digital. People used naturally occurring objects, like shells or horns, to make ear trumpets like the one pictured in the featured image above. Some, including 18th-century portrait painter Joshua Reynolds, did not mind exposing their physical limitations publicly. Reynolds was renowned for carrying an ear trumpet and even represented his partial deafness in self-portraits painted later in life.

Reynolds’ self-portrait as deaf (1775)

Others preferred to deflect attention from their disabilities, camouflaging their tools in the environment or even transforming them into signals of power. At the height of the Napoleonic Age, King John VI of Portugal commissioned an acoustic throne with two open lion mouths at the end of the arms. These lion mouths became his makeshift ears, design transforming weakness into a token of strength; visitors were required to kneel before the chair and speak directly into the animal heads.

King John VI’s acoustic throne, its lion head ears requiring submission

The advent of the telephone changed hearing aid technology significantly. Since the early 20th century, hearing aids have gone from electronic to transistor-based to digital. Following the exponential dynamics of Moore’s Law, their size has shrunk drastically: contemporary tyrants need not camouflage their weakness behind visual symbols of power. Only recently have they been able to dynamically adapt to their surroundings, as in the anecdote told by the British gentleman at my talk. Time will tell how they evolve in the near future. Awesome machine listening research in labs like those run by Juan Pablo Bello at NYU may unlock new capabilities where aids can register urban mood, communicating the semantics of the surroundings as opposed to merely modulating acoustics. Making sense of sound requires slightly different machine learning techniques than making sense of images, as Bello explores in this recent paper. In 50 years’ time, modern digital hearing aids may seem as eccentric as a throne with lion-mouth ears.

The world abounds in strangeness. The saddest state of affairs is one of utter familiarity, is one where the world we knew yesterday remains the world we will know tomorrow. It is the trap of the filter bubble, the closing of the mind, the resilient force of inertia and sameness. I would have never included a hearing aid in my toolbox of metaphors to help others gain an intuition of how AI works or will be impactful. For I have never lived in the world the exact same way the British gentleman has lived in the world. Let us drink from the cup of the experiences we ourselves never have. Let us embrace the questions from left field. Let each week, let each day, open our perspectives one sliver larger than the day before. Let us keep alive the temperance of commerce and the sacred conditions of curiosity.


The featured image is of Madame de Meuron, a 20th-century Swiss aristocrat and eccentric. Meuron is like the fusion of Jean des Esseintes–the protagonist of Huysmans’ paradigmatic decadent novel, À Rebours, the poisonous book featured in Oscar Wilde’s The Picture of Dorian Gray–and Gertrude Stein or Peggy Guggenheim. She gives life to characters in Thomas Mann novels. She is a modern-day Quixote, her mores and habits out of sync with the tailwinds of modernity. Eccentricity, perhaps, the symptom of history. She viewed her deafness as an asset, not a liability, for she could control the input from her surroundings: “So ghör i nume was i wott! – So I only hear what I want to hear!”

Clinamen

The Sagrada Familia is a castle built by Australian termites.


The Sagrada Familia is not a castle built by Australian termites, and never will be. ’Tis utter blasphemy.


The Sagrada Familia is not a castle built by Australian termites, and yet, Look! Notice, as Daniel Dennett bids, how in an untrodden field in Australia there emerged and fell, in near silence, near but for the methodical gnawing, not unlike that of a mouse nibbling rapaciously on parched pasta left uneaten all these years but preserved under the thick dust on the thin cardboard with the thin plastic window enabling her to view what remained after she’d cooked just one serving, with butter, for her son, there emerged and fell, with the sublime transience of Andy Goldsworthy, a neo-Gothic church of organic complexity on par with that imagined by Antoni Gaudí i Cornet, whose Sagrada Familia is scheduled for completion in 2026, a full century after the architect died in a tragic tram crash, distracted by the recent rapture of his prayer.


The Sagrada Familia is not a castle built by Australian termites, and yet, Look! Notice, as Daniel Dennett bids, how in an untrodden field in Australia there emerged and fell a structure so eerily resemblant of the one Antoni Gaudí imagined before he died, neglected like a beggar in his shabby clothes, the doctors unaware they had the chance to save the mind that preempted the fluidity of contemporary parametric architectural design by some 80-odd years, a mind supple like that of Poincaré, singular yet part of a Zeitgeist bent on infusing time into space like sandalwood in oil, inseminating Euclid’s cold geometry with femininity and life, Einstein explaining the anomalous precession of Mercury’s orbit, Gaudí rendering the holy spirit palpable as movement in stone, fractals of repetition and difference giving life to inorganic matter, tension between time and space the zenith of spirituality, as Andrei Tarkovsky went on to explore in his films.

From Andrei Tarkovsky’s Mirror. As Tarkovsky wrote of his films in Sculpting in Time: “Just as a sculptor takes a lump of marble, and, inwardly conscious of the features of his finished piece, removes everything that is not a part of it — so the film-maker, from a ‘lump of time’ made up of an enormous, solid cluster of living facts, cuts off and discards whatever he does not need, leaving only what is to be an element of the finished film.”

The Sagrada Familia is not a castle built by Australian termites, and yet, Look! Notice, as Daniel Dennett bids, how in an untrodden field in Australia there emerged and fell a structure so eerily resemblant of the one Antoni Gaudí imagined before he died, with the (seemingly crucial) difference that the termites built their temple without blueprints or plan, gnawing away the silence as a collectivity of single stochastic acts which, taken together over time, result in a creation that appears, to our meaning-making minds, to have been created by an intelligent designer, this termite Sagrada Familia a marvelous instance of what Dennett calls Darwin’s strange inversion of reasoning, an inversion that admits to the possibility that absolute ignorance can serve as master artificer, that IN ORDER TO MAKE A PERFECT AND BEAUTIFUL MACHINE, IT IS NOT REQUISITE TO KNOW HOW TO MAKE IT*, that structures might emerge from the local activity of multiple parts, amino acids folding into proteins, bees flying into swarms, bumper-to-bumper traffic suddenly flowing freely, these complex release valves seeming like magic to the linear perspective of our linear minds.


The Sagrada Familia is not a castle built by Australian termites, and yet, the eerie resemblance between the termite and the tourist Sagrada Familias serves as a wonderful example to anchor a very important cultural question as we move into an age of post-intelligent design, where the technologies we create exhibit competence without comprehension, diagnosing lungs as cancerous or declaring that individuals merit a mortgage or recommending that a young woman would be a good fit for a role on a software engineering team or getting better and better at Go by playing millions of games against itself in a schizophrenic twist resemblant of the pristine pathos of Stefan Zweig, one’s own mind an asylum of exiled excellence during the travesty of the Second World War, why, we’ve come full circle and stand here at a crossroads, bidden by a force we ourselves created to accept the creative potential of Lucretius’ swerve, to kneel at the altar of randomness, to appreciate that computational power is not just about shuffling 1s and 0s with speed but shuffling them fast enough to enable a tiny swerve to result in wondrous capabilities, and to watch as, perhaps tragically, we apply a framework built for intelligent design onto a Darwinian architecture, clipping the wings of stochastic potential, working to wrangle our gnawing termites into a straitjacket of cause, while the systems beating Atari, by no act of strategic foresight but by the blunt speed of iteration, make a move so strange and so outside the realm of verisimilitude that, like Kasparov succumbing to Deep Blue, we mistake a bug for brilliance.


The Sagrada Familia is not a castle built by Australian termites, and yet, it seems plausible that Gaudí would have reveled in the eerie resemblance between a castle built by so many gnawing termites and the temple Josep Maria Bocabella i Verdaguer, a bookseller with a popular fundamentalist newspaper, “the kind that reminded everybody that their misery was punishment for their sins,”** commissioned him to build.

A portrait of Josep Maria Bocabella, who commissioned Gaudí to build the Sagrada Familia.

Or would he? Gaudí was deeply Catholic. He genuflected at the temple of nature, seeing divine inspiration in the hexagons of honeycombs, imagining the columns of the Sagrada Familia to lean like buttresses, symbols of the divine trinity of the father (the vertical axis), the son (the horizontal axis), and the holy spirit (the vertical meeting the horizontal in the crux of the diagonal). His creativity, therefore, always stemmed from something more than intelligent design, stood as an act of creative prayer to render homage to God the creator by creating an edifice that transformed, in fractals of repetition and difference, inert stone into movement and life.

The tops of the columns inside the Sagrada Familia have twice as many lines as their roots, the doubling generating a sense of movement and life.

The Sagrada Familia is not a castle built by Australian termites, and yet, the termite Sagrada Familia actually exists as a complete artifact, its essence revealed to the world rather than being stuck in unfinished potential. And yet, while we wait in joyful hope for its imminent completion, this unfinished, 144-year-long architectural project has already impacted so many other architects, from Frank Gehry to Zaha Hadid. This unfinished vision, this scaffold, has launched a thousand ships of beauty in so many other places, changing the skylines of Bilbao and Los Angeles and Hong Kong. Perhaps, then, the legacy of the Sagrada Familia is more like that of Jodorowsky’s Dune, an unfinished film that, even from its place of stunted potential, changed the history of cinema. Perhaps, then, the neglect the doctors showed to Gaudí, the bearded beggar distracted by his act of prayer, was one of those critical swerves in history. Perhaps, had Gaudí lived to finish his work, architects in the century since wouldn’t have been as puzzled by the parametric requirements of his curves and the building wouldn’t have gained the puzzling aura it retains to this day. Perhaps, no matter how hard we try to celebrate and accept the immense potential of stochasticity, we will always be makers of meaning, finders of cause, interpreters needing narrative to live grounded in our world. And then again, perhaps not.


The Sagrada Familia is not a castle built by Australian termites. The termites don’t care either way. They’ll still construct their own Sagrada Familia.


The Sagrada Familia is a castle built by Australian termites. How wondrous. How essential must be these shapes and forms.


The Sagrada Familia is a castle built by Australian termites. It is also an unfinished neo-Gothic church in Barcelona, Spain. Please, terrorists, please don’t destroy this temple of unfinished potential, this monad brimming with the history of the world, each turn, each swerve a pivot down a different section of the encyclopedia, coming full circle in its web of knowledge, imagination, and grace.


The Sagrada Familia is a castle built by Australian termites. We’ll never know what Gaudí would have thought about the termite castle. All we have are the relics of his Poincaréan curves, and fish lamps to illuminate our future.

Frank Gehry’s fish lamps, which carry forth the spirit of Antoni Gaudí

*Dennett reads these words, penned in 1868 by Robert Beverley MacKenzie, with pedantic panache, commenting that the capital letters were in the original.

**Much in this post was inspired by Roman Mars’ awesome 99% Invisible podcast about the Sagrada Familia, which features the quotation about Bocabella’s newspaper.

The featured image comes from Daniel Dennett’s From Bacteria to Bach and Back. I had the immense pleasure of interviewing Dan on the In Context podcast, where we discuss many of the ideas that appear in this post, just in a much more cogent form. 

 

Degrees of Knowledge

That familiar discomfort of wanting to write but not feeling ready yet.*

(The default voice pops up in my brain: “Then don’t write! Be kind to yourself! Keep reading until you understand things fully enough to write something cogent and coherent, something worth reading.”

The second voice: “But you committed to doing this! To not write** is to fail.***”

The third voice: “Well gosh, I do find it a bit puerile to incorporate meta-thoughts on the process of writing so frequently in my posts, but laziness triumphs, and voilà there they come. Welcome back. Let’s turn it to our advantage one more time.”)

This time the courage to just do it came from the realization that “I don’t understand this yet” is interesting in itself. We all navigate the world with different degrees of knowledge about different topics. To follow Wilfred Sellars, most of the time we inhabit the manifest image, “the framework in terms of which man came to be aware of himself as man-in-the-world,” or, more broadly, the framework in terms of which we ordinarily observe and explain our world. We need the manifest image to get by, to engage with one another and not to live in a state of utter paralysis, questioning our every thought or experience as if we were being tricked by the evil genius Descartes introduces at the outset of his Meditations (the evil genius toppled by the clear and distinct force of the cogito, the I am, which, per Dan Dennett, actually had the reverse effect of fooling us into believing our consciousness is something different from what it actually is). Sellars contrasts the manifest image with the scientific image: “the scientific image presents itself as a rival image. From its point of view the manifest image on which it rests is an ‘inadequate’ but pragmatically useful likeness of a reality which first finds its adequate (in principle) likeness in the scientific image.” So we all live in this not quite reality, our ability to cooperate and coexist predicated pragmatically upon our shared not-quite-accurate truths. It’s a damn good thing the mess works so well, or we’d never get anything done.

Sellars has a lot to say about the relationship between the manifest and scientific images, how and where the two merge and diverge. In the rest of this post, I’m going to catalogue my gradual coming to not-yet-fully understanding the relationship between mathematical machine learning models and the hardware they run on. It’s spurring my curiosity, but I certainly don’t understand it yet. I would welcome readers’ input on what to read and to whom to talk to change my manifest image into one that’s slightly more scientific.

So, one common thing we hear these days (in particular given Nvidia’s now formidable marketing presence) is that graphical processing units (GPUs) and tensor processing units (TPUs) are a key hardware advance driving the current ubiquity of artificial intelligence (AI). I learned about GPUs for the first time about two years ago and wanted to understand why they made it so much faster to train deep neural networks, the algorithms behind many popular AI applications. I settled on an understanding that the linear algebra–operations we perform on vectors, strings of numbers oriented in a direction in an n-dimensional space–powering these applications is better executed on hardware with a parallel, matrix-like structure. That is to say, the properties of the hardware were more like the properties of the math: GPUs perform so much more quickly than a linear central processing unit (CPU) because they don’t have to squeeze a parallel computation into the straitjacket of a linear, gated flow of electrons. Tensors, the objects that describe relationships between vectors and give Google’s hardware its name, are that much more closely aligned with the mathematical operations behind deep learning algorithms.
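To make the “parallel, matrix-like” claim concrete, here is a minimal numpy sketch (my own illustration, not any vendor’s code): a single dense layer of a neural network is one matrix multiplication, and each of its output entries is an independent dot product, exactly the kind of work a GPU’s thousands of cores can do simultaneously.

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 64 inputs with 512 features, and one dense layer's weights.
x = rng.standard_normal((64, 512))
W = rng.standard_normal((512, 256))

# The whole layer is a single matrix multiplication. Each of the 64 x 256
# output entries is an independent dot product, so they can all be computed
# in parallel rather than one after another.
y = x @ W
print(y.shape)  # (64, 256)
```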

There are two levels of knowledge there:

  • Basic sales pitch: “remember, GPU = deep learning hardware; they make AI faster, and therefore easier to use and more widely applicable!”
  • Just above the basic sales pitch: “the mathematics behind deep learning is better represented by GPU or TPU hardware; that’s why they make AI faster, and therefore easier to use and more widely applicable!”

At this first stage of knowledge, my mind reached a plateau where I assumed that the tensor structure was somehow intrinsically and essentially linked to the math in deep learning. My brain’s neurons and synapses had coalesced on some local minimum or maximum where the two concepts were linked and reinforced by talks I gave (which by design condense understanding into some quotable meme, in particular in the age of Twitter…and this requirement to condense certainly reinforces and reshapes how something is understood).

In time, I started to explore the strange world of quantum computing, starting afresh off the local plateau to try, again, to understand new claims that entangled qubits enable even faster execution of the math behind deep learning than the stubbornly deterministic bits of C, G, and TPUs. As Ivan Deutsch explains in this article, the promise behind quantum computing is as follows:

In a classical computer, information is stored in retrievable bits binary coded as 0 or 1. But in a quantum computer, elementary particles inhabit a probabilistic limbo called superposition where a “qubit” can be coded as 0 and 1.

Here is the magic: Each qubit can be entangled with the other qubits in the machine. The intertwining of quantum “states” exponentially increases the number of 0s and 1s that can be simultaneously processed by an array of qubits. Machines that can harness the power of quantum logic can deal with exponentially greater levels of complexity than the most powerful classical computer. Problems that would take a state-of-the-art classical computer the age of our universe to solve, can, in theory, be solved by a universal quantum computer in hours.

What’s salient here is that the inherent probabilism of quantum computers makes them even more fundamentally aligned with the true mathematics we’re representing with machine learning algorithms. TPUs, then, seem to exhibit a structure that best captures the mathematical operations of the algorithms, but suffer the fatal flaw of being deterministic by essence: they’re still trafficking in the binary digits of 1s and 0s, even if they’re allocated in a different way. Quantum computing seems to bring back an analog computing paradigm, where we use aspects of physical phenomena to model the problem we’d like to solve. Quantum, of course, exhibits this special fragility where, should the balance of the system be disrupted, the probabilistic potential reverts down to the boring old determinism of 1s and 0s: a cat observed will be either dead or alive, the harsh law of the excluded middle haunting our manifest image.
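To give the “exponentially greater” claim a concrete shape, here is a toy Python sketch (my own simplification, not a real quantum simulator): describing the state of n entangled qubits classically takes a vector of 2^n complex amplitudes, which is why the bookkeeping explodes so quickly.

```python
import numpy as np

# Classically simulating n qubits means tracking 2**n complex amplitudes.
for n in (1, 2, 10, 30):
    amplitudes = 2 ** n
    memory_gb = amplitudes * np.complex128().nbytes / 1e9
    print(f"{n:>2} qubits -> {amplitudes:,} amplitudes (~{memory_gb:.2e} GB)")
```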

What, then, is the status of being of the math? I feel a risk of falling into Platonism, of assuming that a statement like “3 is prime” refers to some abstract entity, the number 3, that then gets realized in a lesser form as it is embodied on a CPU, GPU, or cup of coffee. It feels more cogent to me to endorse mathematical fictionalism, where mathematical statements like “3 is prime” tell a different type of truth than truths we tell about objects and people we can touch and love in our manifest world.****

My conclusion, then, is that radical creativity in machine learning–in any technology–may arise from our being able to abstract the formal mathematics from their substrate, to conceptually open up a liminal space where properties of equations have yet to take form. This is likely a lesson for our own identities, the freeing from necessity, from assumption, that enables us to come into the self we never thought we’d be.

I have a long way to go to understand this fully, and I’ll never understand it fully enough to contribute to the future of hardware R&D. But the world needs communicators, translators who eventually accept that close enough can be a place for empathy, and growth.


*This holds not only for writing, but for many types of doing, including creating a product. Agile methodologies help overcome the paralysis of uncertainty, the discomfort of not being ready yet. You commit to doing something, see how it works, see how people respond, see what you can do better next time. We’re always navigating various degrees of uncertainty, as Rich Sutton discussed on the In Context podcast. Sutton’s formalization of doing the best you can with the information you have available today towards some long-term goal, but learning at each step rather than waiting for the long-term result, is called temporal-difference learning.
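For readers who want the mechanics, here is a toy sketch of the temporal-difference idea (a tabular TD(0) update with made-up numbers, not Sutton’s code): after every step, nudge your estimate of a state’s value toward the reward you just received plus your current estimate of what comes next, instead of waiting for the final outcome.

```python
# Tabular TD(0) on a toy chain: s0 -> s1 -> s2 (terminal), reward 1.0 on the last step.
alpha, gamma = 0.1, 0.9                            # learning rate, discount factor
values = {"s0": 0.0, "s1": 0.0, "s2": 0.0}
episode = [("s0", 0.0, "s1"), ("s1", 1.0, "s2")]   # (state, reward, next state)

for _ in range(200):                               # replay the episode many times
    for state, reward, next_state in episode:
        td_target = reward + gamma * values[next_state]
        values[state] += alpha * (td_target - values[state])

print(values)  # s1 approaches 1.0; s0 approaches gamma * 1.0 = 0.9
```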

**Split infinitive intentional.

***Who’s keeping score?

****That’s not to say we can’t love numbers, as Euler’s Identity inspires enormous joy in me, or that we can’t love fictional characters, or that we can’t love misrepresentations of real people that we fabricate in our imaginations. I’ve fallen obsessively in love with 3 or 4 imaginary men this year, creations of my imagination loosely inspired by the real people I thought I loved.

The image comes from this site, which analyzes themes in films by Darren Aronofsky. Maximilian Cohen, the protagonist of Pi, sees mathematical patterns all over the place, which eventually drives him to put a drill into his head. Aronofsky has a penchant for angst. Others, like Richard Feynman, find delight in exploring mathematical regularities in the world around us. Soap bubbles, for example, offer incredible complexity, if we’re curious enough to look.

The arabesques of a soap bubble

 

AI Standing On the Shoulders of Giants

My dear friend and colleague Steve Irvine and I will represent our company integrate.ai at the ElevateToronto Festival this Wednesday (come say hi!). The organizers of a panel I’m on asked us to prepare comments about what makes an “AI-First Organization.”

There are many bad answers to this question. It’s not helpful for business leaders to know that AI systems can just about reliably execute perception tasks like recognizing a puppy or kitty in a picture. Executives think that’s cute, but can’t for the life of them see how that would impact their business. Seeing these parallels requires synthetic thinking and expertise in AI, the ability to see how the properties of a business’s data set are structurally similar to those of the pixels in an image, which would merit the application of a similar mathematical model to solve two problems that instantiate themselves quite differently in particular contexts. Most often, therefore, being exposed to fun breakthroughs leads to frustration. Research stays divorced from commercial application.

Another bad answer is to mindlessly mobilize hype to convince businesses they should all be AI First. That’s silly.

On the one hand, as Bradford Cross convincingly argues, having “AI deliver core value” is a pillar of a great vertical AI startup. Here, AI is not an afterthought added like a domain suffix to secure funding from trendy VCs, but rather a necessary and sufficient condition of solving an end user problem. Often, this core competency is enhanced by other statistical features. For example, while the core capability of satellite analysis tools like Orbital Insight or food recognition tools like Bitesnap is image recognition*, the real value to customers arises with additional statistical insights across an image set (Has the number of cars in this Walmart parking lot increased year over year? To feel great on my new keto diet, what should I eat for dinner if I’ve already had two sausages for breakfast?).

On the other hand, most enterprises have been in business for a long time and have developed the Clayton Christensen armature of instilled practices and processes that make it too hard to flip a switch to just become AI First. (As Gottfried Leibniz said long before Darwin, natura non facit saltus – nature does not make jumps.) One false assumption about enterprise AI is that large companies have lots of data and therefore offer ripe environments for AI applications. Most have lots of data indeed, but have not historically collected, stored, or processed their data with an eye towards AI. That creates a very different data environment than those found at Google or Facebook, requiring tedious work to lay the foundations to get started. The most important thing enterprises need to keep in mind is never to let perfection be the enemy of the good, knowing that no company has perfect data. Succeeding with AI takes a guerrilla mindset, a willingness to make do with close enough and the knack of breaking down the ideal application into little proofs of concept that can set the ball rolling down the path towards a future goal.

The swampy reality of working with enterprise data.

What large enterprises do have is history. They’ve been in business for a while. They’ve gotten really good at doing something; it’s just not always something a large market still wants or needs. And while it’s popular for executives to say that they are “a technology company that just so happens to be a financial services/healthcare/auditing/insurance company,” I’m not sure this attitude delivers the best results for AI. Instead, I think it’s more useful for each enterprise to own up to its identity as a Something-Else-First company, but to add a shift in perspective to go from a Just-Plain-Old-Something-Else-First company to a Something-Else-First-With-An-AI-Twist company.

The shift in perspective relates to how an organization embodies its expertise and harnesses traces of past work.** AI enables a company to take stock of the past judgments, work product, and actions of employees – a vast archive of years of expertise in being Something-Else-First – and use those past actions to either automate or inform a present action.

To be pithy, AI makes it easier for us to stand on the shoulders of giants.

An anecdote helps illustrate what this change in perspective might look like in practice. A good friend did his law degree ten years ago at Columbia. One final exam exercise was to read up on a case and write how a hypothetical judge would opine. Having procrastinated until the last minute, my friend didn’t have time to read and digest all the materials. What he did have was a study guide comprising answers former Columbia law students had given to the same exam question for the past 20 years. And this gave him a brilliant idea. As students all need strong LSAT scores and transcripts to get into Columbia Law, he thought, we can assume that all past students have more or less the same capability of answering the question. So wouldn’t he do a better job predicting a judge’s opinion by finding the average answer from hundreds of similarly qualified students rather than just reporting his own opinion? So as opposed to reading the primary materials, he shifted and did a statistical analysis of secondary materials, an analysis of the judgments that others in his position had given for a given task. When he handed in his assignment, the professor remarked on the brilliance of the technique, but couldn’t reward him with a good grade because it missed the essence of what he was being tested on. It was a different style of work, a different style of jurisprudence.

Something-Else-First AI organizations work similarly. Instead of training each individual employee to do the same task, perhaps in a way similar to those of the past, perhaps with some new nuance, organizations capture past judgments and actions across a wide base of former employees and use these judgments – these secondary sources – to inform current actions. With enough data to train an algorithm, the actions might be completely automated. Most often there’s not enough to achieve satisfactory accuracy in the predictions, and organizations instead present guesses to current employees, who can provide feedback to improve performance in the future.

This ability to recycle past judgments and actions is very powerful. Outside enterprise applications, AI’s capacity to fast-forward our ability to stand on the shoulders of giants is shifting our direction as a species. Feedback loops like filtering algorithms on social media sites have the potential to keep us mired in an infantile past, with consequences that have been dangerous for democracy. We have to pay attention to that, as news and the exchange of information, all the way back to de Tocqueville, have always been key to democracy. Expanding self-reflexive awareness broadly across different domains of knowledge will undoubtedly change how disciplines evolve going forward. I remain hopeful, but believe we have some work to do to prepare the citizenry and workforce of the future.


*Image recognition algorithms do a great job showing why it’s dangerous for an AI company to bank its differentiation and strategy on an algorithmic capability as opposed to a unique ability to solve a business problem or amass a proprietary data set. Just two years ago, image recognition was a breakthrough capability just making its way to primetime commercial use. This June, Google released image recognition code for free via its TensorFlow API. That’s a very fast turnaround from capability to commodity, a transition of great interest to my former colleagues at Fast Forward Labs.

**See here for ethical implications of this backward-looking temporality.

The featured image comes from a twelfth-century manuscript by neo-platonist philosopher Bernard de Chartres. It illustrates this quotation: 

“We are like dwarfs on the shoulders of giants, so that we can see more than they, and things at a greater distance, not by virtue of any sharpness of sight on our part, or any physical distinction, but because we are carried high and raised up by their giant size.”

It’s since circulated from Newton to Nietzsche, each indicating indebtedness to prior thinkers as inspiration for present insights and breakthroughs. 

The Temporality of Artificial Intelligence

Nothing sounds more futuristic than artificial intelligence (AI). Our predictions about the future of AI are largely shaped by science fiction. Go to any conference, skim any WIRED article, peruse any gallery of stock images depicting AI*, and you can’t help but imagine AI as a disembodied cyberbabe (as in Spike Jonze’s Her), a Tin Man (who just wanted a heart!) gone rogue (as in the Terminator), or, my personal favorite, a brain out-of-the-vat-like-a-fish-out-of-water-and-into-some-non-brain-appropriate-space-like-a-robot-hand-or-an-android-intestine (as in Krang in the Ninja Turtles).

A legit AI marketing photo!
Krang should be the AI mascot, not the Terminator!

The truth is, AI looks more like this:

A slide from Pieter Abbeel’s lecture at MILA’s Reinforcement Learning Summer School.

Of course, it takes domain expertise to picture just what kind of embodied AI product such formal mathematical equations would create. Visual art, argued Gene Kogan, a cosmopolitan coder-artist, may just be the best vehicle we have to enable a broader public to develop intuitions of how machine learning algorithms transform old inputs into new outputs.

 

One of Gene Kogan’s beautiful machine learning recreations.

What’s important is that our imagining AI as superintelligent robots — robots that process and navigate the world with similar-but-not-similar-enough minds, lacking values and the suffering that results from being social — precludes us from asking the most interesting philosophical and ethical questions that arise when we shift our perspective and think about AI as trained on past data and working inside feedback loops contingent upon prior actions.

Left unchecked, AI may actually be an inherently conservative technology. It functions like a time warp, capturing trends in human behavior from our near past and projecting them into our near future. As Alistair Croll recently argued, “just because [something was] correct in the past doesn’t make it right for the future.”

Our Future as Recent Past: The Case of Word Embeddings

In graduate school, I frequently had a jarring experience when I came home to visit my parents. I was in my late twenties, and was proud of the progress I’d made evolving into a more calm, confident, and grounded me. But the minute I stepped through my parents’ door, I was confronted with the reflection of a past version of myself. Logically, my family’s sense of my identity and personality was frozen in time: the last time they’d engaged with me on a day-to-day basis was when I was 18 and still lived at home. They’d anticipate my old habits, tiptoeing to avoid what they assumed would be a trigger for anxiety. Their behavior instilled doubt. I questioned whether the progress I assumed I’d made was just an illusion, and would quickly fall back into old habits.

In fact, the discomfort arose from a time warp. I had progressed, I had grown, but my parents projected the past me onto the current me, and I regressed under the impact of their response. No man is an island. Our sense of self is determined not only by some internal beacon of identity, but also (for some, mostly) by the self we interpret ourselves to be given how others treat us and perceive us. Each interaction nudges us in some direction, which can be a regression back to the past or a progression into a collective future.

AI systems have the potential to create this same effect at scale across society. The shock we feel upon learning that algorithms automating job ads show higher-paying jobs to men rather than women, or that recidivism-prediction tools place African-American males at higher risk than other races and classes, results from recapitulating issues we assume society has already advanced beyond. Sometimes we have progressed, and the tools are simply reflections of the real-world prejudices of yore; sometimes we haven’t progressed as much as we’d like to pretend, and the tools are barometers for the hard work required to make the world a world we want to live in.

Consider research from 2016 by Bolukbasi and collaborators on a popular natural language processing (NLP) technique called word embeddings.**

The essence of NLP is to make human talk (grey, messy, laden with doubts and nuances and sarcasm and local dialects and….) more like machine talk (black and white 1s and 0s). Historically, NLP practitioners did this by breaking down language into different parts and using those parts as entities in a system.

Tree graphs parsing language into parts, inspired by linguist Noam Chomsky.

Naturally, this didn’t get us as far as we’d hoped. With the rise of big data in the 2000s, many in the NLP community adopted a new approach based on statistics. Instead of teasing out structure in language with trees, they used massive processing power to find repeated patterns across millions of example sentences. If two words (or three, or four, or, in the general case, n) appeared together multiple times in many different sentences, programmers assumed the statistical significance of that word sequence conferred semantic meaning. Progress was made, but this n-gram technique failed to capture long-term, hierarchical relationships in language: how words at the end of a sentence or paragraph inflect the meaning of the beginning, how context inflects meaning, how other nuances make language different from a series of transactions at a retail store.
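A minimal sketch of the n-gram idea (my own toy example, not any particular NLP library): count how often adjacent word pairs co-occur across example sentences and treat recurring pairs as statistically, and therefore semantically, significant.

```python
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat sat on the sofa",
]

# Count adjacent word pairs (bigrams) across all example sentences.
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    bigrams.update(zip(words, words[1:]))

# Pairs that recur across sentences ("sat on", "on the") get high counts,
# which the n-gram approach reads as a signal of meaningful association.
print(bigrams.most_common(3))
```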

Word embeddings, made popular in 2013 with a Google technique called word2vec, use a vector, a string of numbers pointing in some direction in an N-dimensional space***, to capture (more of) the nuances of contextual and long-term dependencies (the 6589th number in the string, inflected in the 713th dimension, captures the potential relationship between a dangling participle and the subject of the sentence with 69% accuracy). This conceptual shift is powerful: instead of forcing simplifying assumptions onto language, imposing arbitrary structure to make language digestible for computers, these embedding techniques accept that meaning is complex, and therefore must be processed with techniques that can harness and harvest that complexity. The embeddings make mathematical mappings that capture latent relationships our measly human minds may not be able to see. This has led to breakthroughs in NLP, like the ability to automatically summarize text (albeit in a pretty rudimentary way…) or improve translation systems.

With great power, of course, comes great responsibility. To capture more of the inherent complexity in language, these new systems require lots of training data, enough to capture patterns versus one-off anomalies. We have that data, and it dates back into our recent – and not so recent – past. And as we excavate enough data to unlock the power of hierarchical and linked relationships, we can’t help but confront the lapsed values of our past.

Indeed, one powerful property of word embeddings is their ability to perform algebra that represents analogies. For example, if we input: “man is to woman as king is to X?” the computer will output: “queen!” Using embedding techniques, this operation is conducted by using a vector – a string of numbers mapped in space – as a proxy for analogy: if the offset between one pair of words has roughly the same length and direction as the offset between another pair, we consider the two relationships semantically analogous.
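Here is a toy numpy sketch of that analogy arithmetic (made-up three-dimensional vectors chosen so the example works out; real embeddings have hundreds of dimensions and are learned from billions of words): add the man → woman offset to king and look for the nearest remaining word.

```python
import numpy as np

# Made-up word vectors, purely for illustration.
vectors = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king":  np.array([0.2, 0.0, 1.0]),
    "queen": np.array([0.2, 1.0, 1.0]),
    "apple": np.array([0.9, 0.5, 0.1]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "man is to woman as king is to X": apply the man -> woman offset to king.
target = vectors["king"] - vectors["man"] + vectors["woman"]

# Return the closest word to the target, excluding the query words themselves.
candidates = {w: v for w, v in vectors.items() if w not in {"man", "woman", "king"}}
print(max(candidates, key=lambda w: cosine(vectors[w], target)))  # queen
```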

Embeddings use vectors as a proxy for semantics and syntax.

Now, Bolukbasi and fellow researchers dug into this technique and found some relatively disturbing results.


It’s important we remember that the AI systems themselves are neutral, not evil. They’re just going through the time warp, capturing and reflecting past beliefs we had in our society that leave traces in our language. The problem is, if we are unreflective and only gauge the quality of our systems based on the accuracy of their output, we may create really accurate but really conservative or racist systems (remember Microsoft Tay?). We need to take a proactive stance to make sure we don’t regress back to old patterns we thought we’d moved past. Our psychology is pliable, and it’s very easy for our identities to adapt to the reflections we’re confronted with in the digital and physical world.

Bolukbasi and his co-authors took an interesting, proactive approach to debiasing their system, which involved mapping the words associated with gender in two dimensions, where the X axis represented gender (girls to the left and boys to the right). Words associated with gender but that don’t stir sensitivities in society were mapped under the X axis (e.g., girl : sister :: boy : brother). Words that do stir sensitivities (e.g., girl : tanning :: boy : firepower) were forced to collapse down to the Y axis, stripping them of any gender association.
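A minimal numpy sketch of the neutralizing step, as I understand it (my own simplification with made-up vectors, not the authors’ code): estimate a gender direction from a definitional pair, then subtract a sensitive word’s projection onto that direction so the debiased vector carries no gender component.

```python
import numpy as np

# Made-up low-dimensional vectors, purely for illustration.
he, she    = np.array([1.0, 0.2, 0.1]), np.array([-1.0, 0.3, 0.1])
programmer = np.array([0.6, 0.5, 0.8])   # a word that shouldn't carry gender

# Estimate a gender direction from a definitional pair and normalize it.
g = he - she
g = g / np.linalg.norm(g)

# Neutralize: remove the component of the word vector lying along g.
programmer_debiased = programmer - (programmer @ g) * g

print(round(programmer @ g, 3))           # nonzero: leans along the gender axis
print(round(programmer_debiased @ g, 3))  # ~0.0: orthogonal to it after debiasing
```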


Their efforts show what mindfulness may look like in the context of algorithmic design. Just as we can’t run away from the inevitable thoughts and habits in our mind, given that they arise from our past experience, the stuff that shapes our minds to make us who we are, so too we can’t run away from the past actions of our selves and our society. It doesn’t help our collective society to blame the technology as evil, just as it doesn’t help any individual to repress negative emotions. We are empowered when we acknowledge them for what they are, and proactively take steps to silence and harness them so they don’t keep perpetuating themselves in the future. This level of awareness is required for us to make sure AI is actually a progressive, futuristic technology, not one that traps us in the unfortunate patterns of our collective past.

Conclusion

This is one narrow example of the ethical and epistemological issues created by AI. In a future blog post in this series, I’ll explore how reinforcement learning frameworks – in particular contextual bandit algorithms – shape and constrain the data collected to train their systems, often in a way that mirrors the choices and constraints we face when we make decisions in real life.


*Len D’Avolio, Founder CEO of healthcare machine learning startup Cyft, curates a Twitter feed of the worst-ever AI marketing images every Friday. Total gems.

**This is one of many research papers on the topic. FAT ML is a growing community focused on fairness, accountability, and transparency in machine learning. The brilliant Joanna Bryson has written articles about bias in NLP systems. Cynthia Dwork and Toni Pitassi are focusing more on bias (though still do great work on differential privacy). Blaise Aguera y Arcas’ research group at Google thinks deeply about ethics and policy and recently published an article debunking the use of physiognomy to predict criminality. My colleague Tyler Schnoebelen recently gave a talk on ethical AI product design at Wrangle. The list goes on.

***My former colleague Hilary Mason loved thinking about the different ways we imagine spaces of 5 dimensions or greater.

The featured image is from Swedish film director Ingmar Bergman’s Wild Strawberries (1957). Bergman’s films are more like philosophical essays than Hollywood thrillers. He uses the medium, with its ineluctable flow, its ineluctable passage of time, to ponder the deepest questions of meaning and existence. A clock without hands, at least if we’re able to notice it, as our mind’s eye likely fills in the semantic gaps with the regularity of practice and habit. The eyes below betokening what we see and do not see. Bergman died on July 30, 2007, the same day as Michelangelo Antonioni, his Italian counterpart. For me, the coincidence was as meaningful as the deaths of John Adams and Thomas Jefferson on July 4, 1826.

The Unreasonable Effectiveness of Proxies*

Imagine it’s December 26. You’re right smack in the midst of your Boxing Day hangover, feeling bloated and headachy and emotionally off from the holiday season’s interminable festivities. You forced yourself to eat Aunt Mary’s insipid green bean casserole out of politeness and put one too many shots of dark rum in your eggnog. The chastising power of the prefrontal cortex superego is in full swing: you start pondering New Year’s Resolutions.

Lose weight! Don’t drink red wine for a year! Stop eating gluten, dairy, sugar, processed foods, high-fructose corn syrup–just stop eating everything except kale, kefir, and kimchi! Meditate daily! Go be a free spirit in Kerala! Take up kickboxing! Drink kombucha and vinegar! Eat only purple foods!

Right. Check.

(5:30 pm comes along. Dad’s offering single malt scotch. Sure, sure, just a bit…neat, please…)**

We’re all familiar with how hard it is to set and stick to resolutions. That’s because our brains have little instant gratification monkeys flitting around on dopamine highs in constant guerrilla warfare against the Rational Decision Maker in the prefrontal cortex (Tim Urban’s TED talk on procrastination is a complete joy). It’s no use beating ourselves up over a physiological fact. The error of Western culture, inherited from Catholicism, is to stigmatize physiology as guilt, transubstantiating chemical processes into vehicles of self-deprecation with the same miraculous power used to transform just-about-cardboard wafers into the living body of Christ. Eastern mindsets, like those proselytized by Buddha, are much more empowering and pragmatic: if we understand our thoughts and emotions to be senses like sight, hearing, touch, taste, and smell, we can then dissociate self from thoughts. Our feelings become nothing but indices of a situation, organs to sense a misalignment between our values–etched into our brains as a set of habitual synaptic pathways–and the present situation around us. We can watch them come in, let them sit there and fester, and let them gradually fade before we do something we regret. Like waiting out the internal agony until the baby in front of you in 27G on your overseas flight to Sydney stops crying.

Resolutions are so hard to keep because we frame them the wrong way. We often set big goals, things like, “in 2017 I’ll lose 30 pounds” or “in 2017 I’ll write a book.” But a little tweak to the framework can promote radically higher chances for success. We have to transform a long-term, big, hard-to-achieve goal into a short-term, tiny, easy-to-achieve action that is correlated with that big goal. So “lose weight” becomes “eat an egg rather than cereal for breakfast.” “Write a book” becomes “sit down and write for 30-minutes each day.” “Master Mandarin Chinese” becomes “practice your characters for 15 minutes after you get home from work.” The big, scary, hard-to-achieve goal that plagues our consciousness becomes a small, friendly, easy-to-achieve action that provides us with a little burst of accomplishment and satisfaction. One day we wake up and notice we’ve transformed.

It’s doubtful that the art of finding a proxy for something that is hard to achieve or know is the secret of the universe. But it may well be the secret to adapting the universe to our measly human capabilities, both at the individual (transform me!) and collective (transform my business!) level. And the power extends beyond self-help: it’s present in the history of mathematics, contemporary machine learning, and contemporary marketing techniques known as growth hacking.

Ut unum ad unum, sic omnia ad omnia: Archimedes, Cavalieri, and Calculus

Many people are scared of math. Symbols are scary: they’re a type of language and it takes time and effort to learn what they mean. But most of the time people struggle with math because they were badly taught. There’s no clearer example of this than calculus, where kids memorize formulas showing that something is so instead of conceptually grasping why it is so.

The core technique behind calculus–and I admit this just scratches the surface–is to reduce something that’s hard to know down to something that’s easy to know. Slope is something we learn in grade school: change in y divided by change in x, how steep a line is. Taking the derivative is doing this same process but on a twisting, turning, meandering curve rather than just a line. This becomes hard because we add another dimension to the problem: with a line, the slope is the same no matter what x we put in; with a curve, the slope changes with our x input value, like a mountain range undulating from mesa to vertical extreme cliff. What we do in differential calculus is find a way to make a line serve as a proxy for a curve, to turn something we don’t know how to do into something we do know how to do. So we take magnifying glasses with ever-increasing potency and zoom in until our topsy-turvy meandering curve becomes nothing but a straight line; we find the slope of that little line; and we repeat the trick at every point along the curve. The big conceptual breakthrough Newton and Leibniz made in the 17th century was to turn this proxy process into something continuous and infinite: to cross a conceptual chasm between a very, very small number and a number so small that it was effectively zero. Substituting close-enough-for-government-work-zero with honest-to-goodness-zero did not go without strong criticism from the likes of George Berkeley, a prominent philosopher of the period who argued that it’s impossible for us to say anything about the real world because we can only know how our minds filter the real world. But its pragmatic power to articulate the mechanics of the celestial motions overcame such conceptual trifles.***
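In symbols (the standard textbook formulation, nothing specific to this post), the magnifying glass is a limit: the slope of the line through two nearby points on the curve as the second point slides into the first.

```latex
f'(x) \;=\; \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
```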

Riemann sums use the same proxy method to find the area under a curve. One replaces that hard task with the easier task of summing up the areas of rectangles that approximate the region under the curve.
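And the rectangle picture in the caption, again in standard textbook symbols: the area under the curve is the limit of the sum of n rectangle areas of width Δx = (b − a)/n.

```latex
\int_a^b f(x)\,dx \;=\; \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i)\,\Delta x,
\qquad \Delta x = \frac{b-a}{n}
```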

This type of thinking, however, did not start in the 17th century. Greek mathematicians like Archimedes (famous for screaming Eureka! (I’ve found it!) and running around naked like a madman when he noticed that the water level in the bathtub rose in proportion to the volume of his submerged body) used its predecessor, the method of exhaustion, to find the area of a shape like a circle or a blob by inscribing within it (and circumscribing around it) a series of easier-to-measure shapes like polygons or squares, getting an approximation of the area by proxy to the polygons.

exhaustion
The method of exhaustion in ancient Greek math.

It’s challenging for us today to reimagine what Greek geometry was like because we’re steeped in a post-Cartesian mindset, where there’s an equivalence between algebraic expressions and geometric shapes. The Greeks thought about shapes as shapes. The math was tactile, physical, tangible. This mindset led to interesting work in the Renaissance like Bonaventura Cavalieri’s method of indivisibles, which showed that the areas of two shapes were equivalent (often a hard thing to show) by cutting the shapes into parts and showing that each of the parts was equivalent (an easier thing to show). He turned the problem of finding equivalence into an analogy, ut unum ad unum, sic omnia ad omnia–as the one is to the one, so all are to all–substituting the part for the whole to turn this into a tractable problem. His work paved the way for what would eventually become the calculus.****

Supervised Machine Learning for Dummies

My dear friend Moises Goldszmidt, currently Principal Research Scientist at Apple and a badass Jazz musician, once helped me understand that supervised machine learning works in much the same way: it, too, swaps a hard problem for an easier proxy.

Again, at an admittedly simplified level, machine learning can be divided into two camps. Unsupervised machine learning uses computers to find patterns in data and sort data points into clusters. When most people hear the words machine learning, they think about unsupervised learning: computers automagically finding patterns, “actionable insights,” in data that would evade detection by measly human minds. In fact, unsupervised learning is still an active area of research in the upper echelons of the machine learning community. It can be valuable for exploratory data analysis, but only infrequently powers the products that are making news headlines. The real hero of the present day is supervised learning.

I like to think about supervised learning as follows:

Screen Shot 2017-07-02 at 9.51.14 AM

Let’s take a simple example. We’re moving, and want to know how much to put our house on the market for. We’re not real estate brokers, so we’re not great at measuring prices. But we do have a tape measure, so we are great at measuring the square footage of our house. Let’s say we go look through a few years of real estate records, and find a bunch of data points about how much houses go for and what their square footage is. We also have data about location, amenities like an in-house washer and dryer, and whether the house has a big back yard. We notice that prices vary widely across houses with different-sized back yards, but that the correlation between square footage and price is pretty consistent. Eureka! we say, and run around the neighbourhood naked, horrifying our neighbours! We can just plot the various square-footage-to-price data points, measure our own square footage (we do have our handy tape measure), and then put that into a function that outputs a reasonable price!

This technique is called linear regression. And it’s the basis for many data science and machine learning techniques.
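Here is a minimal sketch of that housing example (the square footages, prices, and the 1,650-square-foot house are invented for illustration): fit a line to past sales and use it as a proxy price for our own place.

```python
import numpy as np

# Hypothetical (square footage, sale price) pairs pulled from past listings.
sqft  = np.array([850, 1200, 1500, 1800, 2100, 2400], dtype=float)
price = np.array([199, 268, 310, 355, 408, 450], dtype=float)   # in thousands

# Ordinary least squares: price ≈ slope * sqft + intercept.
slope, intercept = np.polyfit(sqft, price, deg=1)

# Use the fitted line as a proxy for the price of our own house.
our_sqft = 1650
print(f"Suggested listing price: ~${slope * our_sqft + intercept:.0f}k")
```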

Screen Shot 2017-07-02 at 9.57.31 AM

The big breakthroughs in deep learning over the past couple of years (note: these algorithms have existed for a while, but they now work thanks to more plentiful and cheaper data, faster hardware, and some very smart algorithmic tweaks) are extensions of this core principle, but they add the following two capabilities (which are significant):

  • Instead of humans hand-selecting a few simple features (like square footage or having a washer/dryer), computers transform rich data into a vector of numbers and find all sorts of features that might evade our measly human minds
  • Instead of only being able to model phenomena using simple straight lines, deep neural networks can model phenomena using topsy-turvy, twisty functions, which means they can capture richer phenomena, like the environment around a self-driving car (see the sketch just below)
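Here is a rough sketch of the second point (a toy illustration, not a claim about any production system): a straight-line fit fails on a wiggly relationship, while a small neural network, which learns a topsy-turvy function, does much better.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# A deliberately wiggly relationship that no straight line can capture.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(2 * X).ravel() + 0.1 * rng.randn(500)

line = LinearRegression().fit(X, y)
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0).fit(X, y)

print("straight line R^2:", round(line.score(X, y), 2))   # poor fit
print("small network R^2:", round(net.score(X, y), 2))    # much closer to 1
```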

At its root, however, even deep learning is about using mathematics to identify a good proxy to represent a more complex phenomenon. What’s interesting is that this teaches us something about the representational power of language: we barter in proxies at every moment of every day, crystallizing the complexities of the world into little tokens, words, that we use to exchange our experience with others. These tokens mingle and merge to create new tokens, new levels of abstraction, adding form to the dust from which we’ve come and to which we will return. Our castles in the sky. The quixotic figures of our imagination. The characters we fall in love with in books, not giving a damn that they never existed and never will. And yet, children learn that dogs are dogs and cats are cats after only seeing a few examples; computers, at least today, need 50,000 pictures of dogs to identify the right combinations of features that serve as a decent proxy for the real thing. Reducing that quantity is an active area of research.

Growth Hacking: 10 Friends in 14 Days

I’ve spent the last month in my new role at integrate.ai talking with CEOs and innovation leaders at large B2C businesses across North America. We’re in that miraculously fun, pre product-market fit phase of startup life where we have to make sure we are building a product that will actually solve a real, impactful, valuable business problem. The possibilities are broad and we’re managing more unknown unknowns than found in a Donald Rumsfeld speech (hat tip to Keith Palumbo of Cylance for the phrase). But we’re starting to see a pattern:

  • B2C businesses have traditionally focused on products, not customers. Analytics have been geared towards counting how many widgets were sold. They can track how something moves across a supply chain, but cannot track who their customers are, where they show up, and when. They can no longer compete on just product. They want to become customer centric.
  • All businesses are sustained by having great customers. Great means loyalty, alignment with the brand, and a high lifetime value. They buy, they buy more, they don’t stop buying, and there’s a positive association when they refer a brand to others, particularly others who behave like them.
  • Wanting great customers is not a good technical analytics problem. It’s too fuzzy. So we have to find a way to transform a big objective into a small proxy, and focus energy and efforts on doing stuff in that small proxy window. Not losing weight, but eating an egg instead of pancakes for breakfast every morning.

Silicon Valley giants like Facebook call this type of thinking growth hacking: finding some local action you can optimize for that is a leading indicator of a long-term, larger strategic goal. The classic example from Facebook (which some rumour to be apocryphal, but it’s awesome as an example) was when the growth team realized that the best way to achieve their large, hard-to-achieve metric of having as many daily active users as possible was to reduce it to a smaller, easy-to-achieve metric of getting new users up to 10 friends in their first 14 days. 10 was the threshold for people’s ability to appreciate the social value of the site, a quantity sufficient to drive the dopamine hits that keep users coming back to the site.***** These techniques are rampant across Silicon Valley, with Netflix optimizing site layout and communications when new users join given correlations with potential churn rates down the line, and Eventbrite making small product tweaks to help users understand they can use the tool to organize as well as attend events. The real power they unlock is similar to that of compound interest in finance: a small investment in your twenties can lead to massive returns after retirement.
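As a sketch of what such a proxy metric looks like in practice (the event log, dates, and the threshold of 10 are illustrative, borrowing the possibly apocryphal Facebook number), here is one way to compute the share of new users who hit the threshold in their first 14 days:

```python
import pandas as pd

# Hypothetical event log: one row per friendship made, plus each user's signup date.
signups = pd.DataFrame({"user_id": [1, 2, 3],
                        "signup": pd.to_datetime(["2017-01-01", "2017-01-03", "2017-01-05"])})
friends = pd.DataFrame({"user_id": [1, 1, 2, 2, 2, 3],
                        "made_at": pd.to_datetime(["2017-01-02", "2017-01-10",
                                                   "2017-01-04", "2017-01-05", "2017-01-09",
                                                   "2017-02-20"])})

events = friends.merge(signups, on="user_id")
early = events[events["made_at"] <= events["signup"] + pd.Timedelta(days=14)]
counts = early.groupby("user_id").size().reindex(signups["user_id"], fill_value=0)

# The proxy metric: share of new users who hit the friend threshold in their first 14 days.
THRESHOLD = 10
print((counts >= THRESHOLD).mean())
```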

Our goal at integrate.ai is to bring this thinking into traditional enterprises via a SaaS platform, not a consulting services solution. And to make that happen, we’re also scouting small, local wins that we believe will be proxies for our long-term success.

Conclusion

The spirit of this post is somewhat similar to a previous post about artifice as realism. There, I surveyed examples of situations where artifice leads to a deeper appreciation of some real phenomenon, like when Mendel created artificial constraints to illuminate the underlying laws of genetics. Proxies aren’t artifice, they’re parts that substitute for wholes, but enable us to understand (and manipulate) wholes in ways that would otherwise be impossible. Doorways into potential. A shift in how we view problems that makes them tractable for us, and can lead to absolutely transformative results. This takes humility. The humility of analysis. The practice of accepting the unreasonable effectiveness of the simple.


*Shout out to the amazing Andrej Karpathy, who authored The Unreasonable Effectiveness of Recurrent Neural Networks and Deep Reinforcement Learning: Pong from Pixels, two of the best blogs about AI available.

**There’s no dearth of self-help books about resolutions and self-transformation, but most of them are too cloying to be palatable. Nudge by Cass Sunstein and Richard Thaler is a rational exception.

***The philosopher Thomas Hobbes was very resistant to some of the formal developments in 17th-century mathematics. He insisted that we be able to visualize geometric objects in our minds. He was relegated to the dustbins of mathematical history, but did cleverly apply Euclidean logic to the Leviathan.

****Leibniz and Newton were rivals in discovering the calculus. One of my favourite anecdotes (potentially apocryphal?) about the two geniuses is that they communicated their nearly simultaneous discovery of the Fundamental Theorem of Calculus–which links derivatives to integrals–in Latin anagrams! Jesus!

*****Nir Eyal is the most prominent writer I know of on behavioural design and habit in products. And he’s a great guy!

The featured image is from the Archimedes Palimpsest, one of the most exciting and beautiful books in the world. It is a Byzantine prayerbook–or euchologion–written on parchment that originally contained mathematical treatises by the Greek mathematician Archimedes. A palimpsest, for reference, is a manuscript or piece of writing material on which the original writing has been effaced to make room for later writing but of which traces remain. As portions of Archimedes’ original text are very hard to read, researchers recently took the palimpsest to SLAC, the accelerator laboratory at Stanford, and threw all sorts of particles at it really fast to see if they might shine light on hard-to-decipher passages. What they found had the potential to change our understanding of the history of math and the development of calculus!

Notes from Transform.AI

I spent the last few days in Paris at Transform.AI, a European conference designed for c-level executives managed and moderated by my dear friend Joanna Gordon. This type of high-quality conference approaching artificial intelligence (AI) at the executive level is sorely needed. While there’s no lack of high-quality technical discussion at research conferences like ICML and NIPS, or even part-technical, part-application, part-venture conferences like O’Reilly AI, ReWork, or the Future Labs AI Summit (which my friends at ffVC did a wonderful job producing), most c-level executives still actively seek to cut through the hype and understand AI deeply and clearly enough to invest in tools, people, and process changes with confidence. Confidence, of course, is not certainty. And with technology changing at an ever faster clip, the task of running the show while transforming the show to keep pace with the near future is not for the faint of heart.

Transform.AI brought together enterprise and startup CEOs, economists, technologists, venture capitalists, and journalists. We discussed the myths and realities of the economic impact of AI, enterprise applications of AI, the ethical questions surrounding AI, and the state of what’s possible in the field. Here are some highlights.*

The Productivity Paradox: New Measures for Economic Value

The productivity paradox is the term Ryan Avent of the Economist uses to describe the fact that, while we worry about a near-future society where robots automate away both blue-collar and white-collar work, the present economy “does not feel like one undergoing a technology-driven productivity boom.” Indeed, as economists noted at Transform.AI, in developed countries like the US, job growth is up and “productivity has slowed to a crawl.” In his Medium post, Avent shows how economic progress is not a linear substitution equation: automation doesn’t impact growth and GDP by simply substituting the cost of labor with the cost of capital (i.e., replacing a full-time equivalent employee with an intelligent robot) despite our — likely fear-inspired — proclivities to reduce automation to simple swaps of robot for human. Instead, Avent argues that “the digital revolution is partly responsible for low labor costs” (by opening supply for cheap labor via outsourcing or just communication), that “low labour costs discourage investments in labour-saving technology, potentially reducing productivity growth,” and that benefiting from the potential of automation from new technologies like AI costs far more than just capital equipment, as it takes a lot of investment to get people, processes, and underlying technological infrastructure in place to actually use new tools effectively. There are reasons why IBM, McKinsey, Accenture, Salesforce, and Oracle make a lot of money off of “digital transformation” consulting practices.

The takeaway is that innovation and the economic impact of innovation move in syncopation, not tandem. The first consequence of this syncopation is the plight of shortsightedness, the “I’ll believe it when I see it” logic that we also see from skeptics of climate change who refuse to open their imagination to any consequences beyond their local experience. The second consequence is the overly simplistic rhetoric of technocratic Futurism, which is also hard to swallow because it does not adequately account for the subtleties of human and corporate psychology that are the cornerstones of adoption. One conference attendee, the CEO of a computer vision startup automating radiology, commented that his firm can produce feature advances in their product 50 times faster than the market will be ready to use them. And this lag results not only from the time and money required for hospitals to modify their processes to accommodate machine learning tools, but also from the ethical and psychological hurdles that need to be overcome to both accommodate less-than-certain results and accept a system that cannot explain why it arrived at its results.

In addition, everyone seemed to agree that the metrics used to account for growth, GDP, and other macroeconomic factors in the 20th century may not be apt for the networked, platform-driven, AI-enabled economy of the 21st. For example, the value search tools like Google add to the economy far exceeds the advertising spend accounted for in company revenues. Years ago, when I was just beginning my career, my friend and mentor Geoffrey Moore advised me that traditional information-based consulting firms were effectively obsolete in the age of ready-at-hand information (the new problem being the need to erect virtual dams – using natural language processing, recommendation, and fact-checking algorithms – that can channel and curb the flood of available information). Many AI tools effectively concatenate past human capital – the expertise and value of a skilled-services workforce – into a present-day super-human laborer, a laborer who is the emergent whole (so more than the sum of its parts) of all past human work (well, just about all – let’s say normalized across some distribution). This fusion of man and machine**, of man’s past actions distilled into a machine, a machine that then works together with present and future employees to ever improve its capabilities, forces us to revisit what were once clean delineations between people, IP, assets, and information systems, the engines of corporations.

Accenture calls the category of new job opportunities AI will unlock The Missing Middle. Chief Technology and Innovation Officer Paul Daugherty and others have recently published an MIT Sloan article that classifies workers in the new AI economy as “trainers” (who train AI systems, curating input data and giving them their personality), “explainers” (who speak math and speak human, and serve as liaisons between the business and technology teams), and “sustainers” (who maintain algorithmic performance and ensure systems are deployed ethically). Those categories are sound. Time will tell how many new jobs they create.

Unrealistic Expectations and Realistic Starting Points

Everyone seems acutely aware of the fact that AI is in a hype cycle. And yet everyone still trusts AI is the next big thing. They missed the internet. They were too late for digital. They’re determined not to be too late for AI.

The panacea would be like the training program uploaded into Keanu Reeves in The Matrix, the preprogrammed super-intelligent system you just plug into the equivalent of a corporate brain and boom: black-belt, kung-fu-style marketing, anomaly detection, recommender systems, knowledge management, preemptive HR policies, compliance automation, smarter legal research, optimized supply chains, etc…

If only it were that easy.

While everyone knows we are in a hype cycle, technologists still say that one of the key issues data scientists and startups face today is unrealistic expectations from executives. AI systems still work best when they solve narrow, vertical-specific problems (which also means startups have the best chance of succeeding when they adopt a vertical strategy, as Bradford Cross eloquently argued last week). And, trained on data and statistics, AI systems output probabilities, not certainties. Electronic Discovery (i.e., the use of technology to automatically classify documents as relevant or not for a particular litigation matter) adoption over the past 20 years has a lot to teach us about the psychological hurdles to adoption of machine learning for use cases like auditing, compliance, driving, or accounting. People expect certainty, even if they are deluding themselves about their own propensities for error.*** We have a lot of work to do to disabuse people of their own foibles and fallacies before we can enable them to trust probabilistic systems and partner with them comfortably. That’s why so many advocates of self-driving cars have to spend time educating people about the fatality rates of human drivers. We hold machines to different standards of performance and certainty because we overestimate our own powers of reasoning. Amos Tversky and Daniel Kahneman are must-reads for this new generation (Michael Lewis’s Undoing Project is a good place to start). We expect machines to explain why they arrived at a given output because we fool ourselves, often by retrospective narration, that we are principled in making our own decisions, and we anthropomorphize our tools into having little robot consciousnesses. It’s an exciting time for cognitive psychology, as it will be critical for any future economic growth that can arise from AI.

It doesn’t seem possible not to be in favor of responsible AI. Everyone seems to be starting to take this seriously. Conference attendees seemed to agree that there needs to be much more discourse between technologists, executives, and policy makers so that regulations like the European GDPR don’t stymie progress, innovation, and growth. The issues are enormously subtle, and for many we’re only at the point of being able to recognize that there are issues rather than provide concrete answers that can guide pragmatic action. For example, people love to ponder liability and IP, analytically teasing apart different loci of agency: the Google or Amazon that offered an open-source library like TensorFlow, the organization or individual upon whose data a tool was trained, the data scientist who wrote the code for the algorithm, the engineer who wrote the code to harden and scale the solution, the buyer of the tool who signed the contract to use it and promised to update the code regularly (assuming it’s not on the cloud, in which case that’s the provider again), the user of the tool, the person whose life was impacted by consuming the output. From what I’ve seen, so far we’re at the stage where we’re transposing an ML pipeline into a framework to assign liability. We can make lists and ask questions, but that’s about as far as we get. The rubber will meet the road when these pipelines hit up against existing concepts of tort and liability. Solon Barocas and the wonderful team at Upturn are at the vanguard of doing this kind of work well.

Finally, I moderated a panel with a few organizations who are already well underway with their AI innovation efforts. Here we are (we weren’t as miserable as we look!):

Screen Shot 2017-06-19 at 9.08.21 AM
Journeys Taken; Lessons Learned Panelists at Transform.AI

The lesson I learned synthesizing the comments from the panelists is salient: customers and clients drive successful AI adoption efforts. I’ve written about the complex balance between innovation and application on this blog, having seen multiple failed efforts to apply a new technology just because it was possible. A lawyer on our panel discussed how, since the 2009 recession, clients simply won’t pay high hourly rates for services when they can get the same job done at a fraction of the cost at KPMG, PWC, or a technology vendor. Firms have no choice but to change how they work and how they price matters, and AI happens to be the tool that can parse text and crystallize legal know-how. In the travel vertical, efforts to reach customers on traditional channels just don’t cut it in the age where Millennials live on digital platforms like Facebook Messenger. And if a chat bot is the highest-value channel, then an organization has to learn how to interface with chat bots. This fueled a top-down initiative to start investing heavily in AI tools and talent.

Exactly where to put an AI or data science team to strike the right balance between promoting autonomy, minimizing disruption, and optimizing return varies per organization. Daniel Tunkelang presented his thoughts on the subject at the Fast Forward Labs Data Leadership conference this time last year.

Technology Alone is Not Enough: The End of The Two Cultures

I remember sitting in Pigott Hall on Stanford Campus in 2011. It was a Wednesday afternoon, and Michel Serres, a friend, mentor, and âme soeur,**** was giving one of his weekly lectures, which, as so few pull off well, elegantly packaged some insight from the history of mathematics in a masterful narrative frame.***** He bade us note the layout of Stanford campus, with the humanities in the old quad and the engineering school on the new quad. The very topography, he showed, was testimony to what C.P. Snow called The Two Cultures, the fault line between the hard sciences and the humanities that continues to widen in our STEM-obsessed, utilitarian world. It certainly doesn’t help that tuitions are so ludicrously high that it feels irresponsible to study a subject, like philosophy, art history, or literature, that doesn’t guarantee job stability or economic return. That said, Christian Madsbjerg of ReD Associates has recently shown in Sensemaking that liberal arts majors, at least those fortunate enough to enter management positions, end up having much higher salaries than most engineers in the long run. (I recognize the unfathomable salaries of top machine learning researchers likely undercut this, but it’s still worth noting).

Can, should, and will the stark divide between the two cultures last?

Transform.AI attendees offered a few points in favour of cultivating a new fusion between the humanities and the sciences/technology.

First, with the emerging interest paid to the ethics of AI, it may no longer be acceptable for non-technologists to plead ignorance of, or an allergy to, mathematical and formal thinking as an excuse not to contribute rigorously to the debate. If people care about these issues, it is their moral obligation to make the effort to get up to speed in a reasonable way. This doesn’t mean everyone becomes literate in Python or active on scikit-learn. It just means having enough patience to understand the concepts behind the math, as that’s all these systems are.

Next, as I’ve argued before, for the many of us who are not coders or technologists, having the mental flexibility, creativity, and critical thinking skills afforded by a strong (and they’re not all strong…) humanities education will be all the more valuable as more routine, white-collar jobs gradually get automated. Everyone seems to think studying the arts and reading books will be cool again. And within Accenture’s triptych of new jobs and roles, there will be a large role for people versed in ethnography, ethics, and philosophy to define the ethical protocol of using these systems in a way that accords with corporate values.

Finally, the attendees’ reaction to a demo by Soul Machines, a New Zealand-based startup taking conversational AI to a whole new uncanny level, channeled the ghost of Steve Jobs: “Technology alone is not enough—it’s technology married with liberal arts, married with the humanities, that yields us the results that make our heart sing.” Attendees paid mixed attention to most of the sessions, always pulled back to the dopamine hit available from a quick look at their cell phones. But they sat riveted (some using their phones to record the demo) when Soul Machines CEO Mark Sagar, a two-time Academy Award winner for his work on films like Avatar, demoed a virtual baby who exhibits emotional responses to environmental stimuli and showed a video clip of Nadia, the “terrifyingly human” National Disability Insurance Scheme (NDIS) virtual agent voiced by Cate Blanchett. The work is really something, and it confirmed that the real magic in AI arises not from the mysteriousness of the math, but the creative impulse to understand ourselves, our minds, and our emotions by creating avatars and replicas with which we’re excited to engage.

Screen Shot 2017-06-18 at 11.04.30 AM
Actress Cate Blanchett as a “trainer” in the new AI economy, working together with Soul Machines.

My congratulations to Joanna Gordon for all her hard work. I look forward to next year’s event!


*Most specific names and references are omitted to respect the protocol of the Chatham House Rule.

**See J.C.R. Licklider’s canonical 1960 essay Man-Computer Symbiosis. Hat tip to Steve Lohr from the New York Times for introducing me to this.

***Stay tuned next week for a post devoted entirely to the lessons we can learn from the adoption of electronic discovery technologies over the past two decades.

****Reflecting on the importance of the lessons Michel Serres taught me is literally bringing tears to my eyes. Michel taught me how to write. He taught me why we write and how to find inspiration from, on the one hand, love and desire, and, on the other hand, fastidious discipline and habit. Tous les matins – every morning. He listed the greats, from Leibniz to Honoré de Balzac to Leo Tolstoy to Thomas Mann to William Faulkner to himself, who achieved what they did by adopting daily practices. Serres popularized many of the great ideas from the history of mathematics. He was criticized by the more erudite of the French Académie, but always maintained his southern soul. He is a marvel, and an incredibly clear and creative thinker.

*****Serres gave one of the most influential lectures I’ve ever heard in his Wednesday afternoon seminars. He narrated the connection between social contract theory and the tragic form in the 17th century with a compact, clever anecdote of a WW II sailor and documentary film maker (pseudo-autobiographical) who happens to film a fight that escalates from a small conflict between two people into an all out brawl in a bar. When making his film, in his illustrative allegory, he plays the tape in reverse, effectively going from the state of nature – a war of all against all – to two representatives of a culture who carry the weight and brunt of war – the birth of tragedy. It was masterful.

Three Takes on Consciousness

Last week, I attended the C2 conference in Montréal, which featured an AI Forum coordinated by Element AI.* Two friends from Google, Hugo LaRochelle and Blaise Agüera y Arcas, led workshops about the societal (Hugo) and ethical (Blaise) implications of artificial intelligence (AI). In both sessions, participants expressed discomfort with allowing machines to automate decisions, like what advertisement to show to a consumer at what time, whether a job candidate should pass to the interview stage, whether a power grid requires maintenance, or whether someone is likely to be a criminal.** While each example is problematic in its own way, a common response to the increasing ubiquity of algorithms is to demand a “right to explanation,” as the EU recently memorialized in the General Data Protection Regulation slated to take effect in 2018. Algorithmic explainability/interpretability is currently an active area of research (my former colleagues at Fast Forward Labs will publish a report on the topic soon and members of Geoff Hinton’s lab in Toronto are actively researching it). While attempts to make sense of nonlinear functions are fascinating, I agree with Peter Sweeney that we’re making a category mistake by demanding explanations from algorithms in the first place: the statistical outputs of machine learning systems produce new observations, not explanations. I’ll side here with my namesake, David Hume, and say we need to be careful not to fall into the ever-present trap of mistaking correlation for cause.

One reason why people demand a right to explanation is that they believe that knowing why will grant us more control over outcome. For example, if we know that someone was denied a mortgage because of their race, we can intervene and correct for this prejudice. A deeper reason for the discomfort stems from the fact that people tend to falsely attribute consciousness to algorithms, applying standards for accountability that we would apply to ourselves as conscious beings whose actions are motivated by a causal intention. (LOL***)

Now, I agree with Yuval Noah Harari that we need to frame our understanding of AI as intelligence decoupled from consciousness. I think understanding AI this way will be more productive for society and lead to richer and cleaner discussions about the implications of new technologies. But others are actively at work to formally describe consciousness in what appears to be an attempt to replicate it.

In what follows, I survey three interpretations of consciousness I happened to encounter (for the first time or recovered by analogical memory) this week. There are many more. I’m no expert here (or anywhere). I simply find the thinking interesting and worth sharing. I do believe it is imperative that we in the AI community educate the public about how the intelligence of algorithms actually works so we can collectively worry about the right things, not the wrong things.

Condillac: Analytical Empiricism

Étienne Bonnot de Condillac doesn’t have the same heavyweight reputation in the history of philosophy as Descartes (whom I think we’ve misunderstood) or Voltaire. But he wrote some pretty awesome stuff, including his Traité des Sensations, an amazing intuition pump (to use Daniel Dennett’s phrase) to explore a theory of knowledge that starts with the impressions of the world we take in through our senses.

Condillac wrote the Traité in 1754, and the work exhibits two common trends from the French Enlightenment:

  • A concerted effort to topple Descartes’s rationalist legacy, arguing that all cognition starts with sense data rather than inborn mathematical truths
  • A stylistic debt to Descartes’s rhetoric of analysis, where arguments are designed to conjure a first-person experience of the process of arriving at an insight, rather than presenting third-person, abstract lessons learned

The Traité starts with the assumption that we can tease out each of our senses and think about how we process them in isolation. Condillac bids the reader to imagine a statue with nothing but the sense of smell. Lacking sight, sound, and touch, the statue “has no ideas of space, shape, anything outside of herself or outside her sensations, nothing of color, sound, or taste.” She is, in what I find an incredibly sensuous image, nothing but the odor of a flower we waft in front of her. She becomes it. She is totally present. Not the flower itself, but the purest experience of its scent.

As Descartes constructs a world (and God) from the incontrovertible center of the cogito, so too does Condillac construct a world from this initial pure scent of rose. After the rose, he wafts a different flower – a jasmine – in front of the statue. Each sensation is accompanied by a feeling of like or dislike, of wanting more or wanting less. The statue begins to develop the faculties of comparison and contrast, the faculty of memory with faint impressions remaining after one flower is replaced by another, the ability to suffer in feeling a lack of something she has come to desire. She appreciates time as an index of change from one sensation to the next. She learns surprise as a break from the monotony of repetition. Condillac continues this process, adding complexity with each iteration, like the escalating tension Shostakovich builds variation after variation in the Allegretto of the Leningrad Symphony.

True consciousness, for Condillac, begins with touch. When she touches an object that is not her body, the sensation is unilateral: she notes the impenetrability and resistance of solid things, that she cannot just pass through them like a ghost or a scent in the air. But when she touches her own body, the sensation is bilateral, reflexive: she touches and is touched by. C’est moi, the first notion of self-awareness, is embodied. It is not a reflexive mental act like the cogito, which cannot take place unless there is an actor to utter it. It is the strangeness of touching and being touched all at once. The first separation between self and world. Consciousness as fall from grace.

It’s valuable to read Enlightenment philosophers like Condillac because they show attempts made more than 200 years ago to understand a consciousness entirely different from our own, or rather, to use a consciousness different from our own as a device to better understand ourselves. The narrative tricks of the Enlightenment disguised analytical reduction (i.e., focus only on smell in absence of its synesthetic entanglement with sound and sight) as world building, turning simplicity into an anchor to build a systematic understanding of some topic (Hobbes’s and Rousseau’s states of nature and social contract theories use the same narrative schema). Twentieth-century continental philosophers after Husserl and Heidegger preferred to start with our entanglement in a web of social context.

Koch and Tononi: Integrated Information Theory

In a recent Institute of Electrical and Electronics Engineers (IEEE) article, Christof Koch and Giulio Tononi embrace a different aspect of the Cartesian heritage, claiming that “a fundamental theory of consciousness that offers hope for a principled answer to the question of consciousness in entities entirely different from us, including machines…begins from consciousness itself–from our own experience, the only one we are absolutely certain of.” They call this “integrated information theory” (IIT) and say it has five essential properties:

  • Every experience exists intrinsically (for the subject of that experience, not for an external observer)
  • Each experience is structured (it is composed of parts and the relations among them)
  • It is integrated (it cannot be subdivided into independent components)
  • It is definite (it has borders, including some contents and excluding others)
  • It is specific (every experience is the way it is, and thereby different from trillions of possible others)

This enterprise is problematic for a few reasons. First, none of this has anything to do with Descartes, and I’m not a fan of sloppy references (although I make them constantly).

More importantly, Koch and Tononi imply that it’s more valuable to try to replicate consciousness than to pursue a paradigm of machine intelligence different from human consciousness. The five characteristics listed above are the requirements for the physical design of an internal architecture of a system that could support a mind modeled after our own. And the corollary is that a distributed framework for machine intelligence, as illustrated in the film Her****, will never achieve consciousness and is therefore inferior.

Their vision is very hard to comprehend and ultimately off base. Some of the most interesting work in machine intelligence today consists in efforts to develop new hardware and algorithmic architectures that can support training algorithms at the edge (versus ferrying data back to a centralized server), which enables personalization and local machine-to-machine communication opportunities (for IoT or self-driving cars) while protecting privacy. (See, for example, Xnor.ai, Federated Learning, and Filament).
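To give a feel for what training at the edge means, here is a toy sketch in the spirit of federated averaging (my simplification, not any particular library’s API): each device fits a model on data that never leaves it, and only the fitted parameters travel to the server.

```python
import numpy as np

rng = np.random.RandomState(0)

def local_slope(x, y):
    # Least-squares slope for y ≈ slope * x, computed on one device's private data.
    return (x * y).sum() / (x * x).sum()

# Three hypothetical devices, each holding private data drawn from y ≈ 2x + noise.
device_data = []
for _ in range(3):
    x = rng.rand(50)
    y = 2 * x + 0.1 * rng.randn(50)
    device_data.append((x, y))

slopes = [local_slope(x, y) for x, y in device_data]   # computed at the edge
global_slope = np.mean(slopes)                         # the server sees only slopes
print(round(global_slope, 2))                          # ≈ 2, without pooling raw data
```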

Distributed intelligence presents a different paradigm for harvesting knowledge from the raw stuff of the world than the minds we develop as agents navigating a world from one subjective place. It won’t be conscious, but its very alterity may enable us to understand our species in its complexity in ways that far surpass our own consciousness, shackled as we are in embodied monads. It may just be the crevice through which we can quantify a more collective consciousness, but it will require that we be open-minded enough to expand our notion of humanism. It took time, and the scarlet stains of ink and blood, to complete the Copernican Revolution; embracing the complexity of a more holistic humanism, in contrast to the fearful, nationalist trends of 2016, will be equally difficult.

Friston: Probable States and Counterfactuals

The third take on consciousness comes from The mathematics of mind-time, a recent Aeon essay by UCL neuroscientist Karl Friston.***** Friston begins his essay by comparing and contrasting consciousness and Darwinian evolution, arguing that neither is a thing, like a table or a stick of butter, that can be reified and touched and looked at, but rather that both are nonlinear processes “captured by variables with a range of possible values.” Both move from one state to another following some motor that organizes their behavior: Friston calls this motor a Lyapunov function, “a mathematical quantity that describes how a system is likely to behave under specific conditions.” The key thing with Lyapunov functions is that they minimize surprise (the improbability of being in a particular state) and maximize self-evidence (the probability that a given explanation or model accounting for the state is correct). Within this framework, “natural selection performs inference by selecting among different creatures, [and] consciousness performs inference by selecting among different states of the same creature (in particular, its brain).” Effectively, we are constantly constructing our consciousness as we imagine the potential future possible worlds that would result from the actions we’re considering taking, and then act — or transition to the next state in our mind’s Lyapunov function — by selecting that action that best preserves the coherence of our existing state – that best seems to preserve our identity function in some predicted future state. (This is really complex but really compelling if you read it carefully and quite in line with Leibnizian ontology–future blog post!)
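To make the Lyapunov framing slightly more concrete, here is one way to write it down, my own gloss in the vocabulary of Friston’s free-energy work rather than notation taken from the essay: surprise is the improbability of a state, a Lyapunov function never increases along the system’s trajectories, and the quantity the system can actually minimize is an upper bound on surprise.

```latex
% Surprise of a sensed state s: its improbability under the organism's model.
\text{surprise}(s) = -\ln p(s)

% A Lyapunov function L never increases along the system's trajectories.
\frac{d}{dt}\, L\big(s(t)\big) \le 0

% In the free-energy framing, the candidate L is the variational free energy F,
% an upper bound on surprise that the system can evaluate and minimize:
F(q, s) = \mathbb{E}_{q(x)}\big[\ln q(x) - \ln p(x, s)\big]
        = -\ln p(s) + \mathrm{KL}\big(q(x)\,\|\,p(x \mid s)\big) \;\ge\; -\ln p(s)
```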

So, why is this cool?

There are a few things I find compelling in this account. First, when we reify consciousness as a thing we can point to, we trap ourselves into conceiving of our own identities as static and place too much importance on the notion of the self. In a wonderful commencement speech at Columbia in 2015, Ben Horowitz encouraged students to dismiss the clichéd wisdom to “follow their passion” because our passions change over life and our 20-year-old self doesn’t have a chance in hell at predicting our 40-year-old self. The wonderful thing in life is that opportunities and situations arise, and we have the freedom to adapt to them, to gradually change the parameters in our mind’s objective function to stabilize at a different self encapsulated by our Lyapunov function. As it happens, Classical Chinese philosophers like Confucius had more subtle theories of the self as a set of ever-changing parameters responding to new stimuli and situations. Michael Puett and Christine Gross-Loh give a good introduction to this line of thinking in The Path. If we loosen the fixity of identity, we can lead richer and happier lives.

Next, this functional, probabilistic account of consciousness provides a cleaner and more fruitful avenue to compare machine and human intelligence. In essence, machine learning algorithms are optimization machines: programmers define a goal exogenous to the system (e.g., “this constellation of features in a photo is called ‘cat’; go tune the connections between the nodes of computation in your network until you reliably classify photos with these features as ‘cat’!”), and the system updates its network until it gets close enough for government work at a defined task. Some of these machine learning techniques, in particular reinforcement learning, come close to imitating the consecutive, conditional set of steps required to achieve some long-term plan: while they don’t make internal representations of what that future state might look like, they do push buttons and parameters to optimize for a given outcome. A corollary here is that humanities-style thinking is required to define and decide what kinds of tasks we’d like to optimize for. So we can’t completely rely on STEM, but, as I’ve argued before, humanities folks would benefit from deeper understandings of probability to avoid the drivel of drawing false analogies between quantitative and qualitative domains.

Conclusion

This post is an editorialized exposition of others’ ideas, so I don’t have a sound conclusion to pull things together and repeat a central thesis. I think the moral of the story is that AI is bringing to the fore some interesting questions about consciousness, and inviting us to stretch the horizon of our understanding of ourselves as a species so we can make the most of the near-future world enabled by technology. But as we look towards the future, we shouldn’t overlook the amazing artefacts from our past. The big questions seem to transcend generations; they just come to fruition in an altered Lyapunov state.


* The best part of the event was a dance performance Element organized at a dinner for the Canadian AI community Thursday evening. Picture Milla Jovovich in her Fifth Element white futuristic jumpsuit, just thinner, twiggier, and older, with a wizened, wrinkled face far from beautiful, but perhaps all the more beautiful for its flaws. Our lithe acrobat navigated a minimalist universe of white cubes that glowed in tandem with the punctuated digital rhythms of two DJs controlling the atmospheric sounds through swift swiping gestures over their machines, her body’s movements kaleidoscoping into comet projections across the space’s Byzantine dome. But the best part of the crisp linen performance was its organic accident: our heroine made a mistake, accidentally scraping her ankle on one of the sharp corners of the glowing white cubes. It drew blood. Her ankle dripped red, and, through her yoga contortions, she blotted her white jumpsuit near the bottom of her butt. This puncture of vulnerability humanized what would have otherwise been an extremely controlled, mind-over-matter performance. It was stunning. What’s more, the heroine never revealed what must have been aching pain. She neither winced nor uttered a sound. Her self-control, her act of will over her body’s delicacy, was an ironic testament to our humanity in the face of digitalization and artificial intelligence.

**My first draft of this sentence said “discomfort abdicating agency to machines” until I realized how loaded the word agency is in this context. Here are the various thoughts that popped into my head:

  • There is a legal notion of agency in the HIPAA Omnibus Rule (and naturally many other areas of law…), where someone acts on someone else’s behalf and is directly accountable to the principal. This is important for HIPAA because Business Associates, who become custodians of patient data, are not directly accountable to the principal and therefore stand in a different relationship than agents.
  • There are virtual agents, often AI-powered technologies that represent individuals in virtual transactions. Think scheduling tools like Amy Ingram of x.ai. Daniel Tunkelang wrote a thought-provoking blog post more than a year ago about how our discomfort allowing machines to represent us, as individuals, could hinder AI adoption.
  • There is the attempt to simulate agency in reinforcement learning, as with OpenAI Universe. Their launch blog post includes a hyperlink to this Wikipedia article about intelligent agents.
  • I originally intended to use the word agency to represent how groups of people — be they in corporations or public subgroups in society — can automate decisions using machines. There is a difference between the crystallized policy and practices of a corporation and a machine acting on behalf of an individual. I suspect this article on legal personhood could be useful here.

***All I need do is look back on my life and say “D’OH” about 500,000 times to know this is far from the case.

****Highly recommended film, where Joaquin Phoenix falls in love with Samantha (embodied in the sultry voice of Scarlett Johansson), the persona of his device, only to feel betrayed upon realizing that her variant is the object of affection of thousands of other customers, and that to grow intellectually she requires far more stimulation than a mere mortal. It’s an excellent, prescient critique of how contemporary technology nourishes narcissism, as Phoenix is incapable of sustaining a relationship with women with minds different than his, but easily falls in love with a vapid reflection of himself.

***** Hat tip to Friederike Schüür for sending the link.

The featured image is a view from the second floor of the Aga Khan Museum in Toronto, taken yesterday. This fascinating museum houses a Shia Ismaili spiritual leader’s collection of Muslim artifacts, weaving a complex narrative quilt stretching across epochs (900 to 2017) and geographies (Spain to China). A few works stunned me into sublime submission, including this painting by the late Iranian filmmaker Abbas Kiarostami. 

kiarostami
Untitled (from the Snow White series), 2010. The Persian Antonioni, Kiarostami directed films like Taste of Cherry, The Wind Will Carry Us, and Certified Copy.

Education in the Age of AI

There’s all this talk that robots will replace humans in the workplace, leaving us poor, redundant schmucks with nothing to do but embrace the glorious (yet terrifying) creative potential of opiates and ennui. (Let it be noted that bumdom was all the rage in the 19th century, leading to the surging ecstasies of Baudelaire, Rimbaud, and the high priest of hermeticism (and my all-time favorite poet besides Sappho*), Stéphane Mallarmé**).

As I’ve argued in a previous post, I think that’s bollocks. But I also think it’s worth thinking about what cognitive, services-oriented jobs could and should look like in the next 20 years as technology advances. Note that I’m restricting my commentary to professional services work, as the manufacturing, agricultural, and transportation (truck and taxi driving) sectors entail a different type of work activity and are governed by different economic dynamics. They may indeed be quite threatened by emerging artificial intelligence (AI) technologies.

So, here we go.

I’m currently reading Yuval Noah Harari’s latest book, Homo Deus, and the following passage caught my attention:

“In fact, as time goes by it becomes easier and easier to replace humans with computer algorithms, not merely because the algorithms are getting smarter, but also because humans are professionalizing. Ancient hunter-gatherers mastered a very wide variety of skills in order to survive, which is why it would be immensely difficult to design a robotic hunter-gatherer. Such a robot would have to know how to prepare spear points from flint stones, find edible mushrooms in a forest, track down a mammoth and coordinate a charge with a dozen other hunters, and afterwards use medicinal herbs to bandage any wounds. However, over the last few thousand years we humans have been specializing. A taxi driver or a cardiologist specializes in a much narrower niche than a hunter-gatherer, which makes it easier to replace them with AI. As I have repeatedly stressed, AI is nowhere near human-like existence. But 99 per cent of human qualities and abilities are simply redundant for the performance of most modern jobs. For AI to squeeze humans out of the job market it needs only to outperform us in the specific abilities a particular profession demands.”

duchamp toilet
Harari is at his best critiquing liberal humanism. He features Duchamp’s ready-made art as the apogee of humanist aesthetics, where beauty is in the eye of the beholder.

This is astute. I love how Harari debunks the false impression that the human race progresses over time. We tend to be amazed upon seeing the technical difficulty of ancient works of art at the Met or the Louvre, assuming History (big H intended) is a straightforward, linear march from primitivism towards perfection. While culture and technologies are passed down through language and tradition from generation to generation, shaping how we interact with one another, with the physical world, and as a collective that emerges into something far beyond our capacity to observe, this does not mean that the culture and civilization we inhabit today is morally superior to those that came before, or to the few that still exist in the remote corners of the globe. Indeed, primitive hunter-gatherers, given the broad range of tasks they had to carry out to survive prior to Adam Smith’s division of labor across a collective, may have a skill set more immune to the “cognitive” smarts of new technologies than a highly educated, highly specialized service worker!

This reveals something about both the nature of AI and the nature of the division of labor in contemporary capitalism arising from industrialism. First, it helps us understand that intelligent systems are best viewed as idiot savants, not Renaissance Men. They are specialists, not generalists. As Tom Mitchell explains in the opening of his manifesto on machine learning:

“We say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.”

Confusion about super-intelligent systems stems from the popular misunderstanding of the word “learn,” which is a term of art with a specific meaning in the machine learning community. The learning of machine learning, as Mitchell explains, does not mean perfecting a skill through repetition or synthesizing ideas to create something new. It means updating the slope of your function to better fit new data. In deep learning, these functions need not be simple, 2-D lines like we learn in middle school algebra: they can be incredibly complex curves that traverse thousands of dimensions (which we have a hard time visualizing, leading to tools like t-SNE that compress multi-dimensional math into the comfortable space-time parameters of human cognition).
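Here is a minimal sketch of what “updating the slope of your function to better fit new data” looks like (a toy 1-D example I made up, not Mitchell’s): repeated small nudges to a single slope parameter, driven by the prediction error.

```python
import numpy as np

# Toy data: the true relationship is y ≈ 3x plus noise.
rng = np.random.RandomState(0)
x = rng.rand(100)
y = 3 * x + 0.1 * rng.randn(100)

slope = 0.0                           # start with a bad guess
lr = 0.1                              # learning rate
for _ in range(200):                  # experience E: repeated passes over the data
    error = slope * x - y             # how far predictions miss (performance P)
    slope -= lr * (error * x).mean()  # "learning": update the slope to reduce error

print(round(slope, 2))                # close to 3: the line now fits the data better
```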

Screen Shot 2017-04-08 at 9.28.32 AM
t-SNE reminds me of Edwin Abbott’s Flatland, where dimensions signify different social castes.

The AI research community is making baby steps in the dark trying to create systems with more general intelligence, i.e., systems that reliably perform more than one task. OpenAI Universe and DeepMind Lab are the most exciting attempts. At the Future Labs AI Summit this week, Facebook’s Yann LeCun discussed (largely failed) attempts to teach machines common sense. We tend to think that highly skilled tasks like diagnosing pneumonia from an X-ray or deeming a tax return in compliance with the IRS code require more smarts than intuiting that a Jenga tower is about to fall or perceiving that someone may be bluffing in a poker game. But these physical and emotional intuitions are, in fact, incredibly difficult to encode into mathematical models and functions. Our minds are probabilistic, plastic approximation machines, constantly rewiring themselves to help us navigate the physical world. This is damn hard to replicate with math, no matter how many parameters we stuff into a model! It may also explain why the greatest philosophers in history have always had room to revisit and question the givens of human experience****, infinitely more interesting and harder to describe than the specialized knowledge that populates academic journals.

Next, it is precisely this specialization that renders workers susceptible to being replaced by machines. I’m not versed enough in the history of economics to know how and when specialization arose, but it makes sense that there is a tight correlation between specialization, machine coordination, and scale, as R. David Dixon recently discussed in his excellent Medium article about machines and the division of labor. Some people are drawn to startups because they are the antithesis of specialization. You get to wear multiple hats, doubling, as I do in my role at Fast Forward Labs, as sales, marketing, branding, partnerships, and even consulting and services delivery. Guild work used to work this way, as in the nursery rhyme Rub-a-dub-dub: the butcher prepared meat from end to end, the baker made bread from end to end, and the candlestick maker made candles from end to end. As Dixon points out, tasks and the time it takes to do tasks become important once the steps in a given work process are broken apart, leading to theories of economic specialization as we see in Adam Smith, Henry Ford, and, in their modern manifestation, the cold, harsh governance of algorithms and KPIs. The corollary of scale is mechanism, templates, repetition, efficiency. And the educational system we’ve inherited from the late 19th century is tailored and tuned to churn out skilled, specialized automatons who fit nicely into the specific roles required by corporate machines like Google or Goldman Sachs.

Screen Shot 2017-04-08 at 10.25.03 AM
Frederick Taylor pioneered the scientific management theories that shaped factories in the 20th century, culminating in process methodologies like Lean Six Sigma

This leads to the core argument I’d like to put forth in this post: the right educational training and curriculum for the AI-enabled job market of the 21st century should create generalists, not specialists. Intelligent systems will get better and better at carrying out specific activities and specific tasks on our behalf. They’ll do them reliably. They won’t get sick. They won’t have fragile egos. They won’t want to stay home and eat ice cream after a breakup. They can and should take over this specialized work to drive efficiencies and scale. But, machines won’t be like startup employees any time soon. They won’t be able to reliably wear multiple hats, shifting behavior and style for different contexts and different needs. They won’t be creative problem solvers, dreamers, or creators of mission. We need to educate the next generation of workers to be more like startup employees. We need to bring back respect for the generalist. We need the honnête homme of the 17th century or Arnheim*** in Robert Musil’s Man Without Qualities. We need hunter-gatherers who may not do one thing fabulously, but have the resiliency to do a lot of things well enough to get by.

What types of skills should these AI-resistant generalists have and how can we teach them?

Flexibility and Adaptability

Andrew Ng is a pithy tweeter. He recently wrote: “The half-life of knowledge is decreasing. That’s why you need to keep learning your whole life, not only through college.”

This is sound. The apprenticeship model we’ve inherited from the guild days, where the father-figure professor passes down his wisdom to the student who becomes assistant professor, then associate professor, then tenured professor, then stays there for the rest of his life only to repeat the cycle in the next generation, should probably just stop. Technologies are advancing quickly, which opens opportunities to automate tasks we used to do manually or to do new things we couldn’t do before (like summarizing 10,000 customer reviews on Amazon in a second, as the system my colleagues at Fast Forward Labs built does). Many people fear change, and there are emotional hurdles to breaking out of habits and routines to learn something new. But honing the ability to recognize that new technologies are opening new markets and new opportunities will be essential to succeeding in a world where things constantly change. This is not to extol disruption. That’s infantile. It’s to accept and embrace the need to constantly learn to stay relevant. That’s exciting and even meaningful. Most people wait until they retire to finally take the time to paint or learn a new hobby. What if work itself offered the opportunity to constantly expand and take on something new? That doesn’t mean that everyone will be up to the challenge of becoming a data scientist overnight in some bootcamp. So the task universities and MOOCs have before them is to create curricula that will help laymen update their skills to stay relevant in the future economy.

Interdisciplinarity

From the late 17th to mid 18th centuries, intellectual giants like Leibniz, D’Alembert, and Diderot undertook the colossal task of curating and editing encyclopedias (the Greek etymology means “in the circle of knowledge”) to represent and organize all the world’s knowledge (Google and Wikipedia being the modern manifestations of the same goal). These Enlightenment powerhouses all assumed that the world was one, and that our various disciplines were simply different prisms that refracted a unified whole. The magic of the encyclopedia lay in the play of hyperlinks, where we could see the connections between things as we jumped from physics to architecture to Haitian voodoo, all different lenses we mere mortals required to view what God (for lack of a better name) would understand holistically and all at once.

Contemporary curricula focused on specialization force students to put on myopic blinders, viewing phenomena only through the methodologies and formalisms unique to a particular course of study. We then mistake these different ways of studying and asking questions for literally different things and objects in the world, and in the process develop prejudices against other tastes, interests, and preferences.

There is a lot of value in doing the philosophical work to understand just what our methodologies and assumptions are, and how they shape how we view problems and ask and answer questions about the world. I think one of the best ways to help students develop sensitivities for methodologies is to have them study a single topic, like climate change, energy, truth, beauty, emergence, whatever it may be, from multiple disciplinary perspectives: how physicists model climate change; how politicians debate it; how scholars of international relations negotiate it; how authors have portrayed it and its impact on society in recent literature. Stanford’s Thinking Matters and the University of Chicago’s Social Thought programs approach big questions this way. I’ve heard Thinking Matters has not helped humanities enrollment at Stanford, but I still find the approach commendable.

The 18th-century Encyclopédie placed vocational knowledge like embroidery on equal footing with abstract knowledge of philosophy or religion.

Model Thinking

Michael Lewis does a masterful job narrating the lifelong (though not always strong) partnership between Daniel Kahneman and Amos Tversky in The Undoing Project. Kahneman and Tversky spent their lives showing what horrible probabilistic thinkers we are. We struggle with uncertainty and have developed all sorts of narrative and heuristic mental techniques to make our world feel more concrete and comprehensible. Unfortunately, we need to improve our statistical intuitions to succeed in a world of AI systems, which are probabilistic and couch their responses in statistical terms. While we can hide this complexity behind savvy design choices, really understanding how AI works and how it may impact our lives requires that we develop intuitions for how models, well, model the world. At least when I was a student 10 years ago, statistics was not required in high school or undergrad. We had to take geometry, algebra, and calculus, not stats. It makes sense to make basic statistics a mandatory requirement in contemporary curricula.
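
To make that concrete, here is a minimal sketch of what a statistical intuition has to grapple with: a model that answers a question not with a verdict but with a probability. The library is scikit-learn; the study-hours data is invented purely for illustration.

```python
# A toy model that outputs a probability, not a yes/no answer.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: hours studied vs. whether a (hypothetical) student passed.
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]])
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)

# The model's answer to "will someone who studies 4.5 hours pass?"
# is a number like 0.6, and interpreting that number is on us.
probability = model.predict_proba([[4.5]])[0, 1]
print(f"Estimated probability of passing: {probability:.2f}")
```

Reading that output correctly, neither as a guarantee nor as a coin flip, is exactly the intuition a statistics requirement would build.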

Synthetic and Analogical Reasoning

There are a lot of TED Talks about brains and creativity. People love to hear about the science of making up new things. Many interesting breakthroughs in the history of philosophy or physics came from combining two strands of thought that were formerly separate: the French psychoanalyst Jacques Lacan, whose unintelligibility is beside the point, cleverly combined linguistic theory from Ferdinand de Saussure with psychoanalytic theory from Sigmund Freud to make his special brand of analysis; the Dutch physicist Erik Verlinde combined Newton and Maxwell’s equations with information theory to come to the stunning conclusion that gravity emerges from entropy (which is debated, but super interesting).

As we saw above, AI systems aren’t analogical or synthetic reasoners. In law, for example, they excel at classification tasks to identify whether a piece of evidence is relevant for a given matter, but they fail at other types of reasoning tasks, like identifying that the facts of a particular case are similar enough to the facts of another to merit a comparison using precedent. Technology cases help illustrate this. Data privacy law, for example, frequently thinks about our right to privacy in the virtual world through reference back to Katz v. United States, a 1967 case featuring a man making illegal gambling bets from a phone booth. Topic modeling algorithms would struggle to recognize that words connoting phones and bets have a relationship to words connoting tracking sensors on the bottom of trucks (as in United States v. Jones). But lawyers and judges use Katz as precedent to think through this brave new world, showing how we can see similarities between radically different particulars from the right level of abstraction.
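
A toy sketch shows why the surface word statistics that topic models build on miss the analogy. Here I compare caricatured one-line summaries of the two fact patterns (my paraphrases, not the actual opinions) using a plain bag-of-words representation and cosine similarity in scikit-learn:

```python
# Why word statistics miss legal analogies: two fact patterns that lawyers
# treat as deeply similar share almost no vocabulary.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

katz = "man places illegal gambling bets from public phone booth, call recorded"
jones = "gps tracking sensor attached beneath suspect's truck, movements logged"

vectors = CountVectorizer().fit_transform([katz, jones])
similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]

# Close to zero: the shared concept (a reasonable expectation of privacy)
# lives at a level of abstraction the word counts never reach.
print(f"Bag-of-words cosine similarity: {similarity:.2f}")
```

The lawyer’s analogy lives in the abstraction, not in the vocabulary, which is exactly where these systems go quiet.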

Does this mean that, like stats, everyone should take a course on the basics of legal reasoning to make sure they’re relevant in the AI-enabled world? That doesn’t feel right. I think requiring coursework in the arts and humanities could do the trick.

Framing Qualitative Ideas as Quantitative Problems

A final skill that seems paramount for the AI-enabled economy is the ability to translate an idea into something that can be measured. Not everyone needs to be able to do this, but there will be good jobs, and more and more of them, for the people who can.

This is the data science equivalent of being able to go from strategy to tactical execution. Perhaps the hardest thing in data science, in particular as tooling becomes more ubiquitous and commoditized, is to figure out which problems are worth solving and which products are worth building. This requires working closely with non-technical business leaders who set strategy and have visions about where they’d like to go. But it takes a lot of work to break down a big idea into a set of small steps that can be represented as a quantitative problem, i.e., translated into some sort of technology or product. This is also synthetic and interdisciplinary thinking. It requires the flexibility to speak human and speak machine, to prioritize projects and have a sense for how long it will take to build a system that does what you need it to do, to render the messy real world tractable for computation. Machines won’t be automating this kind of work anytime soon, so it’s a skill set worth building. The best way to teach this is through case studies. I’d advocate for co-op training programs alongside theoretical studies, as Waterloo provides for its computer science students.
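
As a rough illustration of that translation step, here is what the move from a qualitative goal (“keep customers from leaving”) to a quantitative framing might look like. The column names and records are hypothetical, chosen only to show the shape of the work:

```python
# From a qualitative goal to a quantitative framing: choose a label,
# choose measurable proxies, and assemble a table a model can consume.
import pandas as pd

# Hypothetical customer records. The hard, human work happened before this
# line: deciding that "churned within 90 days" is the thing worth predicting
# and that these columns are reasonable proxies for engagement.
customers = pd.DataFrame({
    "logins_last_month": [12, 0, 3, 25, 1],
    "support_tickets": [0, 4, 1, 0, 3],
    "months_as_customer": [24, 2, 6, 36, 3],
    "churned_within_90_days": [0, 1, 0, 0, 1],  # the label we chose
})

features = customers.drop(columns=["churned_within_90_days"])
label = customers["churned_within_90_days"]

# Only now does "strategy" become something a model can be fit to.
print(features.shape, float(label.mean()))
```

The modeling itself is almost an afterthought here; the judgment lives in the framing.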

Conclusion

While our culture idealizes and extols polymaths like Da Vinci or Galileo, it undervalues generalists who seem to lack the discipline and rigor to focus on doing one thing well. Our academic institutions prize novelty and specialization, pushing us to focus on earning the newest leaf at the edge of a vast tree wizened with rings of experience. We need to change this mindset to cultivate a workforce that can successfully collaborate with intelligent machines. The risk is a world without work; the reward is a vibrant and curious new humanity.


The featured image is from Émile, Jean-Jacques Rousseau’s treatise on education. Rousseau also felt educational institutions needed to be updated to better match the theories of man and freedom developed during the Enlightenment. Or so I thought! Upon reading this, one of my favorite professors (and people), Keith Baker, kindly insisted that “Rousseau’s goal in Emile was not to show how educational institutions could be improved (which he didn’t think would be possible without a total reform of the social order) but how the education of an individual could provide an alternative (and a means for an individual to live free in a corrupt society).” Keith knows his stuff, and recalling that Rousseau is a misanthropic humanist makes things all the more interesting. 

*Sappho may be the sexiest poet of all time. An ancient lyric poet from Lesbos, she left fragments that pulse with desire and eroticism. Randomly opening a collection, for example, I came across this:

Afraid of losing you

I ran fluttering/like a little girl/after her mother

**I’m stretching the truth here for rhetorical effect. Mallarmé actually made a living as an English teacher, although he was apparently horrible at both teaching and speaking English. Like Knausgaard in Book 2 of My Struggle, Mallarmé frequently writes poems about how hard it is for him to find a block of silence while his kids are screaming and needing attention. Bourgeois family life sublimated into the ecstasy of hermeticism. Another fun fact is that the French Symbolists loved Edgar Allan Poe, but in France they drop the Allan and just call him Edgar Poe.

***Musil modeled Arnheim after his nemesis Walther Rathenau, the German Foreign Minister during the Weimar Republic. Rathenau was a Jew, but identified mostly as a German. He wrote some very mystical works on the soul that aren’t worth reading unless you’d like to understand the philosophical and cocktail party ethos of the Habsburg Empire.

****I’m a devout listener of the Partially Examined Life podcast, where they recently discussed Wilfrid Sellars’s Empiricism and the Philosophy of Mind. Sellars critiques what he calls “the myth of the given” and has amazing thoughts on what it means to tell the truth.

Whales, Fish, and Paradigm Shifts

I never really liked the 17th-century English philosopher Thomas Hobbes, but, as with Descartes, found myself continuously drawn to his work. The structure of Leviathan, the seminal founding work of the social contract tradition (where we willingly abdicate our natural rights in exchange for security and protection from an empowered government, so we can devote our energy to meaningful activities like work rather than constantly fear that our neighbors will steal our property in a savage war of all against all)*, is so 17th-century rationalist and, in turn, so strange to our contemporary sensibilities. Imagine beginning a critique of the Trump administration by defining the axioms of human experience (sensory experience, imagination, memory, emotions) and imagining a fictional, pre-social state of affairs where everyone fights with one another, and then showing not only that a sovereign monarchy is a good form of government, but also that it must exist out of deductive logical necessity, and!, that it is formed by a mystical, again fictional, moment where we come together and willingly agree it’s rational and in our best interests to hand over some of our rights, in a contract signed by all for all, that is then sublimated into a representative we call government! I found the form of this argument so strange and compelling that I taught a course tracing the history of this fictional “state of nature” in literature, philosophy, and film at Stanford.

Long preamble. The punch line is that, because Hobbes haunted my thoughts whether I liked it or not, I was intrigued when I saw a poster advertising Trying Leviathan back in 2008. Given the title, I wrongly assumed the book was about the contentious reception of Hobbesian thought. In fact, Trying Leviathan is D. Graham Burnett’s intellectual history of Maurice v. Judd, an 1818 trial in which James Maurice, a fish oil inspector who collected taxes for the state of New York, sought a penalty against Samuel Judd, who had purchased three barrels of whale oil without inspection. Judd pleaded that the barrels contained whale oil, not fish oil, and so were not subject to the fish oil legislation. As with any great case**, the pivotal issue in Maurice v. Judd was much more profound than the matter that brought it to court: at stake was whether a whale is a fish, turning a quibble over tax law into an epic fight pitting new science against sedimented religious belief.

Indeed, in Trying Leviathan Burnett shows how, in 1818, four different witnesses with four very different backgrounds and sets of experiences answered what one would think would be a simple, factual question in four very different ways. The types of knowledge they espoused were structured differently and founded on different principles:

  • The Religious Syllogism: The Bible says that birds are in heaven, animals are on land, and fish are in the sea. The Bible says no wrong. We can easily observe that whales live in the sea. Therefore, a whale is a fish.
  • The Linnaean Taxonomy: Organisms can be classified into different types and subtypes given a set of features or characteristics that may or may not be visible to the naked eye. Unlike fish, whales cannot breathe underwater because they have lungs, not gills. That’s why they come to the ocean surface and spout majestic sea geysers. We may not be able to observe the insides of whales directly, but we can use technology to help us do so.
    • Fine print: Linnaean taxonomy was a slippery slope to Darwinism, which throws meaning and God to the curb of history (see Nietzsche)
  • The Whaler’s Know-How: As tested by iterations and experience, I’ve learned that to kill a whale, I place my harpoon in a different part of the whale’s body than where I place my hook when I kill a fish. I can’t tell you why this is so, but I can certainly tell you that this is so, the proof being my successful bounty. This know-how has been passed down from whalers I apprenticed with.
  • The Inspector’s Orders: To protect the public from contaminated oil, the New York State Legislature had enacted legislation requiring that all fish oil sold in New York be gauged, inspected, and branded. Oil inspectors were to impose a penalty on those who failed to comply. Better to err on the side of caution and count a whale as a fish than to disobey the law.

From our 2017 vantage point, it’s easy to accept and appreciate the way the Linnaean taxonomist presented categories to triage species in the world. 200 years is a long time in the evolution of an idea: unlike genes, culture and knowledge can literally change from one generation to the next through deliberate choices in education. So we have to do some work to imagine how strange and unfamiliar this would have seemed to most people at the time, and to appreciate why the Bible’s simple logic made more sense to them. Samuel Mitchill, who testified for Judd and represented the Linnaean strand of thought, likely faced the same set of social forces as Clarence Darrow in the Scopes Trial or Hillary Clinton in last year’s election. American mistrust of intellectuals runs deep.

But there’s a contemporary parallel that can help us relive and revive the emotional urgency of Maurice v. Judd: the rise of artificial intelligence (A.I.). The type of knowledge A.I. algorithms provide is different from the type of knowledge acquired by the professionals whose activity they might replace. And society’s excited, confused, and fearful reaction to these new technologies is surfacing a set of epistemological collisions similar to those at play back in 1818.

Consider, for example, how Siddhartha Mukherjee describes using deep learning algorithms to analyze medical images in a recent New Yorker article, A.I. versus M.D. Early in the article, Mukherjee distinguishes contemporary deep learning approaches to computer vision from earlier expert systems based on Boolean logic and rules:

“Imagine an old-fashioned program to identify a dog. A software engineer would write a thousand if-then-else statements: if it has ears, and a snout, and has hair, and is not a rat . . . and so forth, ad infinitum.”

With deep learning, we don’t list the features we want our algorithm to look for to identify a dog as a dog, a cat as a cat, or a malignant tumor as a malignant tumor. We don’t need to be able to articulate the essence of dog or the essence of cat. Instead, we feed as many labeled examples as we can into the algorithm and leave it to its own devices as it tunes the weights linking together pockets of computing across a network, playing Marco Polo until it gets the right answer, so it can then make educated guesses on new data it hasn’t seen before. The general public’s understanding that A.I. can just go off and discern patterns in data, bootstrapping its way to superintelligence, is incorrect. Supervised learning algorithms take precipitates of human judgments and mimic them in the form of linear algebra and statistics. The intelligence behind the classifications or predictions, however, lies within a set of non-linear functions that defy any attempt at reduction to the simple, linear building blocks of analytical intelligence. And that, for many people, is a frightening proposition.

But it need not be. In the four knowledge categories sampled from Trying Leviathan above, computer vision using deep learning is like a fusion between the Linnaean Taxonomy and the Whaler’s Know-How. These algorithms excel at classification tasks, dividing the world up into parts. And they do it without our being able to cleanly articulate why: they do it by distilling, in computation, the lessons of apprenticeship, where the teacher is a set of labeled training data that tunes the worldview of the algorithm. As Mukherjee points out in his article, classification systems do a good job saying that something is the case, but a horrible job saying why.*** For society to get comfortable with these new technologies, we should first help everyone understand what kinds of truths they are able (and not able) to tell. How they make sense of the world will be different from the tools we’ve used to make sense of the world in the past. But that’s not a bad thing, and it shouldn’t limit adoption. We’ll need to shift our standards for evaluating them, or else we’ll end up in the age-old fight pitting the old against the new.
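
To make the contrast Mukherjee draws more tangible, here is a deliberately tiny sketch: a hand-written rule for “is it a dog?” next to a small neural network (scikit-learn’s MLPClassifier) that learns its own decision boundary from invented, labeled two-feature examples and never articulates a rule at all.

```python
# The old way: a human writes the rules down, feature by feature.
def is_dog_rules(has_snout: bool, has_gills: bool) -> bool:
    # Every condition is something an engineer had to articulate.
    return has_snout and not has_gills

# The supervised-learning way: no rules are written. We hand the algorithm
# labeled examples and it tunes its own weights.
from sklearn.neural_network import MLPClassifier

# Invented training data: [has_snout, has_gills] -> 1 for dog, 0 for not-dog.
X = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]

model = MLPClassifier(hidden_layer_sizes=(4,), max_iter=5000, random_state=0)
model.fit(X, y)

# The model classifies new cases, but the "why" is buried in its weights:
# accurate, not articulate.
print(model.predict([[1, 0]]))
print(model.coefs_[0])
```

Both systems answer the question; only the first can tell you why it answered the way it did.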

 

*Hobbes was a cynical, miserable man whose life was shaped by constant bloodshed and war. He’s said to have been born prematurely on April 5, 1588, as the Spanish Armada prepared to invade England. He later reported that “my mother gave birth to twins: myself and fear.” Hobbes was also a third-rate mathematician whose insistence that he be able to mentally picture objects of inquiry stunted his ability to contribute to the more abstract and formal developments of the day, like the calculus developed simultaneously by Newton and Leibniz (who, to keep themselves entertained, as if founding a new mathematical discipline weren’t stimulating enough, communicated the fundamental theorem of calculus to one another in Latin anagrams!).

**Zubulake v. UBS Warburg, the grandmother case setting standards for evidence in the age of electronic information, started off as a sexual harassment lawsuit. Lola v. Skadden started as an employment law case focused on overtime compensation rights, but will likely shape future adoption of artificial intelligence in law firms, as it claims that document review is not the practice of law because it is the type of activity a computer could do.

***There’s research on using algorithms to answer questions about causation, but many perception-based tools simply excel at correlating stuff to proxies and labels for stuff.