Coffee Cup Computers, or Degrees of Knowledge

That familiar discomfort of wanting to write but not feeling ready yet.*

(The default voice pops up in my brain: “Then don’t write! Be kind to yourself! Keep reading until you understand things fully enough to write something cogent and coherent, something worth reading.”

The second voice: “But you committed to doing this! To not write** is to fail.***”

The third voice: “Well gosh, I do find it a bit puerile to incorporate meta-thoughts on the process of writing so frequently in my posts, but laziness triumphs, and voilà there they come. Welcome back. Let’s turn it to our advantage one more time.”)

This time the courage to just do it came from the realization that “I don’t understand this yet” is interesting in itself. We all navigate the world with different degrees of knowledge about different topics. To follow Wilfred Sellars, most of the time we inhabit the manifest image, “the framework in terms of which man came to be aware of himself as man-in-the-world,” or, more broadly, the framework in terms of which we ordinarily observe and explain our world. We need the manifest image to get by, to engage with one another and not to live in a state of utter paralysis, questioning our every thought or experience as if we were being tricked by the evil genius Descartes introduces at the outset of his Meditations (the evil genius toppled by the clear and distinct force of the cogito, the I am, which, per Dan Dennett, actually had the reverse effect of fooling us into believing our consciousness is something different from what it actually is). Sellars contrasts the manifest image with the scientific image: “the scientific image presents itself as a rival image. From its point of view the manifest image on which it rests is an ‘inadequate’ but pragmatically useful likeness of a reality which first finds its adequate (in principle) likeness in the scientific image.” So we all live in this not quite reality, our ability to cooperate and coexist predicated pragmatically upon our shared not-quite-accurate truths. It’s a damn good thing the mess works so well, or we’d never get anything done.

Sellars has a lot to say about the relationship between the manifest and scientific images, how and where the two merge and diverge. In the rest of this post, I’m going to catalogue my gradual coming to not-yet-fully understanding the relationship between mathematical machine learning models and the hardware they run on. It’s spurring my curiosity, but I certainly don’t understand it yet. I would welcome readers’ input on what to read and to whom to talk to change my manifest image into one that’s slightly more scientific.

So, one common thing we hear these days (in particular given Nvidia’s now formidable marketing presence) is that graphics processing units (GPUs) and tensor processing units (TPUs) are a key hardware advance driving the current ubiquity of artificial intelligence (AI). I learned about GPUs for the first time about two years ago and wanted to understand why they made it so much faster to train deep neural networks, the algorithms behind many popular AI applications. I settled on an understanding that the linear algebra–operations we perform on vectors, strings of numbers oriented in a direction in an n-dimensional space–powering these applications is better executed on hardware with a parallel, matrix-like structure. That is to say, properties of the hardware are more like properties of the math: they perform so much more quickly than a linear central processing unit (CPU) because they don’t have to squeeze a parallel computation into the straitjacket of a linear, gated flow of electrons. Tensors, objects that describe the relationships between vectors, as in Google’s hardware, are that much more closely aligned with the mathematical operations behind deep learning algorithms.

There are two levels of knowledge there:

  • Basic sales pitch: “remember, GPU = deep learning hardware; they make AI faster, and therefore easier to use and more widespread!”
  • Just above the basic sales pitch: “the mathematics behind deep learning is better represented by GPU or TPU hardware; that’s why they make AI faster, and therefore easier to use and more widespread!” (A sketch of that second claim follows below.)
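
To make that second claim concrete, here is a minimal sketch in Python (using NumPy on a CPU as a stand-in for what GPU libraries do in silicon; the sizes and names are mine, purely illustrative): the naive version grinds through the matrix product one multiply-add at a time, while the vectorized call expresses the same linear algebra as one bulk operation that parallel hardware can spread across thousands of units.

```python
import numpy as np

def matmul_sequential(A, B):
    """CPU-flavored matrix multiply: one scalar multiply-add at a time."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(n):            # every one of these n * m * k steps...
        for j in range(m):
            for p in range(k):    # ...happens strictly one after another
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.random.rand(200, 200)
B = np.random.rand(200, 200)

# The same math, expressed as a single bulk operation that a GPU or TPU
# could fan out across many multiply-accumulate units at once.
C_bulk = A @ B
C_loop = matmul_sequential(A, B)
print(np.allclose(C_bulk, C_loop))  # True: identical result, very different execution
```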

At this first stage of knowledge, my mind reached a plateau where I assumed that the tensor structure was somehow intrinsically and essentially linked to the math in deep learning. My brain’s neurons and synapses had coalesced on some local minimum or maximum where the two concepts were linked and reinforced by talks I gave (which by design condense understanding into some quotable meme, in particular in the age of Twitter…and this requirement to condense certainly reinforces and reshapes how something is understood).

In time, I started to explore the strange world of quantum computing, starting afresh off the local plateau to try, again, to understand new claims that entangled qubits enable even faster execution of the math behind deep learning than the soddenly deterministic bits of C, G, and TPUs. As Ivan Deutsch explains in this article, the promise behind quantum computing is as follows:

In a classical computer, information is stored in retrievable bits binary coded as 0 or 1. But in a quantum computer, elementary particles inhabit a probabilistic limbo called superposition where a “qubit” can be coded as 0 and 1.

Here is the magic: Each qubit can be entangled with the other qubits in the machine. The intertwining of quantum “states” exponentially increases the number of 0s and 1s that can be simultaneously processed by an array of qubits. Machines that can harness the power of quantum logic can deal with exponentially greater levels of complexity than the most powerful classical computer. Problems that would take a state-of-the-art classical computer the age of our universe to solve, can, in theory, be solved by a universal quantum computer in hours.
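
My lay reading of that “exponentially greater” claim comes down to the size of the description: writing down the state of n qubits classically takes 2^n amplitudes. Here is a toy sketch in plain NumPy (not a real quantum simulator, and it says nothing about entanglement itself) just to show how quickly that description grows:

```python
import numpy as np

plus = np.array([1.0, 1.0]) / np.sqrt(2.0)  # one qubit in an equal superposition of |0> and |1>

def n_qubit_state(n):
    """Classical description of n such qubits: a vector of 2**n amplitudes."""
    state = plus
    for _ in range(n - 1):
        state = np.kron(state, plus)  # each tensor product doubles the state space
    return state

for n in (1, 2, 10, 20):
    print(n, "qubits ->", n_qubit_state(n).size, "amplitudes")
```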

For me what’s salient here is that the inherent probabilism of quantum computers makes them even more fundamentally aligned with the true mathematics we’re representing with machine learning algorithms. TPUs, then, seem to exhibit a structure that best captures the mathematical operations of the algorithms, but exhibit the fatal flaw of being deterministic by essence: they’re still trafficking in the binary digits of 1s and 0s, even if they’re allocated in a different way. Quantum computing seems to bring back an analog computing paradigm, where we use aspects of physical phenomena to model the problem we’d like to solve. Quantum, of course, exhibits this special fragility where, should the balance of the system be disrupted, the probabilistic potential reverts down to the boring old determinism of 1s and 0s: a cat observed will be either dead or alive, as the harsh law of the excluded middle haunts our manifest image.

Once I opened Pandora’s box, I realized all sorts of things can be computers! One I find particularly interesting is a liquid state machine (LSM), which uses the ever-changing properties of a perturbed liquid–like a cup of coffee you just put sugar into–as a means to compute a time series!

A diagram from Maass et al’s paper on using liquid to make a real-time recurrent neural network

We often marvel at how the cloud has enabled the startup economy as we know it, reducing the cost of starting a business by significantly lowering the capital investment required to get started with code. But imagine what it would be like if cups of coffee were real-time deep learning computers (granted, we’d need to hook up something to keep track of the changing liquid states).

There’s an elemental beauty here: the flux of the world around us can be harnessed for computation. The world is breathing, beating, beating in randomness, and we can harness that randomness to do stuff.

I know close to nothing about analog computing. About liquid computing. All I know is it feels enormously exciting to shatter my assumption that digital computers are a given for machine learning. It’s just math, so why not find other places to observe it, rather than stick with the assumptions of the universal Turing machine?

And here’s what interests me most: what, then, is the status of being of the math? I feel a risk of falling into Platonism, of assuming that a statement like “3 is prime” refers to some abstract entity, the number 3, that then gets realized in a lesser form as it is embodied on a CPU, GPU, or cup of coffee. It feels more cogent to me to endorse mathematical fictionalism, where mathematical statements like “3 is prime” tell a different type of truth than truths we tell about objects and people we can touch and love in our manifest world.****

My conclusion, then, is that radical creativity in machine learning–in any technology–may arise from our being able to abstract the formal mathematics from their substrate, to conceptually open up a liminal space where properties of equations have yet to take form. This is likely a lesson for our own identities, the freeing from necessity, from assumption, that enables us to come into the self we never thought we’d be.

I have a long way to go to understand this fully, and I’ll never understand it fully enough to contribute to the future of hardware R&D. But the world needs communicators, translators who eventually accept that close enough can be a place for empathy, and growth.


*This holds not only for writing, but for many types of doing, including creating a product. Agile methodologies help overcome the paralysis of uncertainty, the discomfort of not being ready yet. You commit to doing something, see how it works, see how people respond, see what you can do better next time. We’re always navigating various degrees of uncertainty, as Rich Sutton discussed on the In Context podcast. Sutton’s formalization of doing the best you can with the information available today towards some long-term goal, while basing your learning and updates not on that distant goal but on the next best guess, is called temporal-difference learning.
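
A minimal sketch of that idea as I understand it, a tabular TD(0) update with invented states, rewards, and parameters: each estimate is nudged toward today’s reward plus the discounted estimate of the next state, the “next best guess,” rather than toward the far-off final outcome.

```python
# Tabular TD(0): learn from the next best guess, not the distant goal.
# States, rewards, and parameters below are invented for illustration.
alpha, gamma = 0.1, 0.9                     # learning rate, discount factor
V = {"start": 0.0, "middle": 0.0, "goal": 0.0}

episode = [("start", 0.0, "middle"),        # (state, reward, next_state)
           ("middle", 1.0, "goal")]

for state, reward, next_state in episode:
    td_target = reward + gamma * V[next_state]   # bootstrap from the next estimate
    V[state] += alpha * (td_target - V[state])   # nudge the current estimate toward it

print(V)
```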

**Split infinitive intentional.

***Who’s keeping score?

****That’s not to say we can’t love numbers, as Euler’s Identity inspires enormous joy in me, or that we can’t love fictional characters, or that we can’t love misrepresentations of real people that we fabricate in our imaginations. I’ve fallen obsessively in love with 3 or 4 imaginary men this year, creations of my imagination loosely inspired by the real people I thought I loved.

The image comes from this site, which analyzes themes in films by Darren Aronofsky. Maximilian Cohen, the protagonist of Pi, sees mathematical patterns all over the place, which eventually drives him to put a drill into his head. Aronofsky has a penchant for angst. Others, like Richard Feynman, find delight in exploring mathematical regularities in the world around us. Soap bubbles, for example, offer incredible complexity, if we’re curious enough to look.

The arabesques of a soap bubble

 

The Secret Miracle

….And God made him die during the course of a hundred years and then He revived him and said: “How long have you been here?” “A day, or part of a day,” he replied.  – The Koran, II 261

The embryo of this post has gestated between my prefrontal cortex and limbic system for one year and eight months. It’s time.*

There seem to be two opposite axes from which we typically consider and evaluate character. Character as traits, Eigenschaften (see Musil), the markers of personality, virtue, and vice.

One extreme is to say that character is formed and reinforced through our daily actions and habits.** We are the actions we tend towards, the self not noun but verb, a precipitate we shape using the mysterious organ philosophers have historically called free will. Thoughts rise up and compete for attention,*** drawing and calling us to identify as a me, a me reinforced as our wrists rotate ever more naturally to wash morning coffee cups, a me shocked into being by an acute feeling of disgust, coiling and recoiling from some exogenous stimulus that drives home the need for a barrier between self and other, a me we can imagine looking back on from an imagined future-perfect perch to ask, like Ivan Ilyich, if we have indeed lived a life worth living. Character as daily habit. Character, as my grandfather used to say, as our ability to decide if today will be a good or a bad day when we first put our feet on the ground in the morning (Naturally, despite all the negative feelings and challenges, he always chose to make today a good day).

The other extreme is to say that true character is revealed in the fox hole. That traits aren’t revealed until they are tested. That, given our innate social nature, it’s relatively easy to seem one way when we float on, with, and in the waves of relative goodness embodied in a local culture (a family, a team, a company, a neighborhood, a community, perhaps a nation, hopefully a world, imagine a universe!), but that some truer nature will be shamelessly revealed when the going gets tough. This notion of character is the stuff of war movies. We like the hero who irrationally goes back to save one sheep at the expense of the flock when the napalm shit hits the fan. It seems we need these moments and myths to keep the tissue of social bonds intact. They support us with tears nudged and nourished by the sentimental cadences of John Williams soundtracks.

How my grandfather died convinced me that these two extremes are one.

On the evening of January 14, 2016, David William Hume (Bill, although it’s awesome to be part of a family with multiple David Humes!) was taken to a hospital near Pittsburgh. He’d suffered from heart issues for more than ten years and on that day the blood simply stopped pumping into his legs. He was rushed behind the doors of the emergency operating room, while my aunts, uncles, and grandmother waited in the silence and agony one comes to know in the limbo state upon hearing that a loved one has just had a heart attack, has just been shot, has just had a stroke, has just had something happen where time dilates to a standstill and, phenomenologically, the principles of physics linking time and space are halted in the pinnacle of love, of love towards another, of all else in the world put on hold until we learn whether the loved one will survive. (It may be that this experience of love’s directionality, of love at any distance, of our sense of self entangled in the existence and well-being of another, is the clearest experiential metaphor available to build our intuitions of quantum entanglement.****) My grandfather survived the operation. And the first thing he did was to call my grandmother and exclaim, with the glee and energy of a young boy, that he was alive, that he was delighted to be alive, and that he couldn’t have lived without her beside him, through 60 years of children crying and making pierogis and washing the floor and making sure my father didn’t squander his life at the hobby shop in Beaver Meadows, Pennsylvania and learning that Katie, me, here, writing, the first grandchild was born, my eyebrows already thick and black as they’ll remain my whole life until they start to grey and singing Sinatra off key and loving the Red Sox and being a role model of what it means to live a good life, what it means to be a patriarch for our family, yes he called her and said he did it, that he was so scared but that he survived and it was just the same as getting out of bed every morning and making a choice to be happy and have a good day.

She smiled, relieved.

A few minutes later, he died.

It’s like a swan song. His character distilled to its essence. I think about this moment often. It’s so perfectly representative of the man I knew and loved.

And when I first heard about my grandfather’s death, I couldn’t help but think of Borges’s masterful (but what by Borges is not masterful?) short story The Secret Miracle. Instead of explaining why, I bid you, reader, to find out for yourself.


 * Mark my words: in 50 years time, we will cherish the novels of Jessie Ferguson, perhaps the most talented novelist of our time. Jessie was in my cohort in the comparative literature department at Stanford. The depth of her intelligence, sensitivity, and imagination eclipsed us all. I stand in awe of her talents as Jinny to Rhoda in Virginia Woolf’s The Waves. At her wedding, she asked me to read aloud Paul Celan’s Corona. I could barely do it without crying, given how immensely beautiful this poem is. Tucked away in the Berkeley Hills, her wedding remains the most beautiful ceremony I’ve ever attended.

**My ex-boyfriends, those privileged few who’ve observed (with a mixture of loving acceptance and tepid horror) my sacrosanct morning routine, certainly know how deeply this resonates with me.

***Thomas Metzinger shares some wonderful thoughts about consciousness and self-consciousness in his interview with Sam Harris on the Waking Up podcast. My favorite part of this episode is Metzinger’s very cogent conclusion that, should an AI ever suffer like we humans do (which Joanna Bryson compellingly argues will not and should not occur), the most rational action it would then take would be to self-annihilate. Pace Bostrom and Musk, I find the idea that a truly intelligent being would choose non-existence over existence to be quite compelling, if only because I have first-hand experience with the acute need to allay acute suffering like anxiety immediately, whereas boredom, loneliness, and even sadness are emotional states within which I more comfortably abide.

****Many thanks to Yanbo Xue at D-Wave for first suggesting that metaphor. Jean-Luc Marion explores the subjective phenomenon of love in Le Phénomène Erotique; I don’t recall his mentioning quantum physics, although it’s been years since I read the book, but, based on conversations I had with him years ago at the University of Chicago, I predict this would be a parallel he’d be intrigued to explore.

My last dance with my grandfather, the late David William Hume. Snuff, as we lovingly called him, was never more at home than on the dance floor, even though he couldn’t sing and couldn’t dance. He used to do this cute knees-back-and-forth dance. He loved jazz standards, and would send me mix CDs he burned when I lived in Leipzig, Germany. In his 80s, he embarrassed the hell out of my grandmother, his wife of 60 years, by joining the local Dancing with the Stars chapter and taking Zumba lessons. He lived. He lived fully and with great integrity. 

AI Standing On the Shoulders of Giants

My dear friend and colleague Steve Irvine and I will represent our company integrate.ai at the ElevateToronto Festival this Wednesday (come say hi!). The organizers of a panel I’m on asked us to prepare comments about what makes an “AI-First Organization.”

There are many bad answers to this question. It’s not helpful for business leaders to know that AI systems can just-about reliably execute perception tasks like recognizing a puppy or kitty in a picture. Executives think that’s cute, but can’t for the life of them see how that would impact their business. Seeing these parallels requires synthetic thinking and expertise in AI, the ability to see how the properties of a business’ data set are structurally similar to those of the pixels in an image, which would merit the application of a similar mathematical model to solve two problems that instantiate themselves quite differently in particular contexts. Most often, therefore, being exposed to fun breakthroughs leads to frustration. Research stays divorced from commercial application.

Another bad answer is to mindlessly mobilize hype to convince businesses they should all be AI First. That’s silly.

On the one hand, as Bradford Cross convincingly argues, having “AI deliver core value” is a pillar of a great vertical AI startup. Here, AI is not an afterthought added like a domain suffix to secure funding from trendy VCs, but rather a necessary and sufficient condition of solving an end user problem. Often, this core competency is enhanced by other statistical features. For example, while the core capability of satellite analysis tools like Orbital Insight or food recognition tools like Bitesnap is image recognition*, the real value to customers arises with additional statistical insights across an image set (Has the number of cars in this Walmart parking lot increased year over year? To feel great on my new keto diet, what should I eat for dinner if I’ve already had two sausages for breakfast?).

On the other hand, most enterprises have been in business for a long time and have developed the Clayton Christensen armature of instilled practices and processes that make it too hard to flip a switch to just become AI First. (As Gottfried Leibniz said centuries before Darwin, natura non facit saltus – nature does not make jumps). One false assumption about enterprise AI is that large companies have lots of data and therefore offer ripe environments for AI applications. Most have lots of data indeed, but have not historically collected, stored, or processed their data with an eye towards AI. That creates a very different data environment than those found at Google or Facebook, requiring tedious work to lay the foundations to get started. The most important thing enterprises need to keep in mind is never to let perfection be the enemy of the good, knowing that no company has perfect data. Succeeding with AI takes a guerrilla mindset, a willingness to make do with close enough and the knack of breaking down the ideal application into little proofs of concept that can set the ball rolling down the path towards a future goal.

The swampy reality of working with enterprise data.

What large enterprises do have is history. They’ve been in business for a while. They’ve gotten really good at doing something, it’s just not always something a large market still wants or needs. And while it’s popular for executives to say that they are “a technology company that just so happens to be a financial services/healthcare/auditing/insurance company,” I’m not sure this attitude delivers the best results for AI. Instead, I think it’s more useful for each enterprise to own up to its identity as a Something-Else-First company, but to add a shift in perspective to go from a Just-Plain-Old-Something-Else-First Company to a Something-Else-First-With-An-AI-Twist company.

The shift in perspective relates to how an organization embodies its expertise and harnesses traces of past work.** AI enables a company to take stock of the past judgments, work product, and actions of employees – a vast archive of years of expertise in being Something-Else-First – and use that archive either to automate or to inform a present action.

To be pithy, AI makes it easier for us to stand on the shoulders of giants.

An anecdote helps illustrate what this change in perspective might look like in practice. A good friend did his law degree ten years ago at Columbia. One final exam exercise was to read up on a case and write how a hypothetical judge would opine. Having procrastinated until the last minute, my friend didn’t have time to read and digest all the materials. What he did have was a study guide comprising answers former Columbia law students had given to the same exam question for the past 20 years. And this gave him a brilliant idea. As students all have to have high LSAT scores and transcripts to get into Columbia Law, he thought, we can assume that all past students have more or less the same capability of answering the question. So wouldn’t he do a better job predicting a judge’s opinion by finding the average answer from hundreds of similarly-qualified students rather than just reporting his own opinion? So as opposed to reading the primary materials, he shifted and did a statistical analysis of secondary materials, an analysis of the judgments that others in his position had given for a given task. When he handed in his assignment, the professor remarked on the brilliance of the technique, but couldn’t reward him with a good grade because it missed the essence of what he was tested for. It was a different style of work, a different style of jurisprudence.

Something-Else-First AI organizations work similarly. Instead of training each individual employee to do the same task, perhaps in a way similar to those of the past, perhaps with some new nuance, organizations capture past judgments and actions across a wide base of former employees and use these judgments – these secondary sources – to inform current actions. With enough data to train an algorithm, the actions might be completely automated. Most often there’s not enough to achieve satisfactory accuracy in the predictions, and organizations instead present guesses to current employees, who can provide feedback to improve performance in the future.
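
As a toy sketch of that shift in perspective (all names and numbers below are invented), the mechanic is roughly: given a new case, look up how the most similar past cases were judged, and let their average inform, or with enough confidence automate, the present decision.

```python
from statistics import mean

# Invented archive of past judgments: (case features, judgment between 0 and 1).
archive = [
    ({"claim_size": 1200, "prior_claims": 0}, 0.9),
    ({"claim_size": 9000, "prior_claims": 3}, 0.2),
    ({"claim_size": 1500, "prior_claims": 1}, 0.8),
]

def similarity(a, b):
    """Crude similarity between two cases, purely illustrative."""
    return 1.0 / (1.0 + abs(a["claim_size"] - b["claim_size"]) / 1000.0
                      + abs(a["prior_claims"] - b["prior_claims"]))

def suggested_judgment(new_case, k=2):
    """Average the k most similar past judgments as a suggestion for today's case."""
    ranked = sorted(archive, key=lambda item: similarity(new_case, item[0]), reverse=True)
    return mean(judgment for _, judgment in ranked[:k])

print(suggested_judgment({"claim_size": 1400, "prior_claims": 0}))
```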

This ability to recycle past judgments and actions is very powerful. Outside enterprise applications, AI’s capacity to fast-forward our ability to stand on the shoulders of giants is shifting our direction as a species. Feedback loops like filtering algorithms on social media sites have the potential to keep us mired in an infantile past, with consequences that have been dangerous for democracy. We have to pay attention to that, as news and the exchange of information, all the way back to de Tocqueville, have always been key to democracy. Expanding self-reflexive awareness broadly across different domains of knowledge will undoubtedly change how disciplines evolve going forward. I remain hopeful, but believe we have some work to do to prepare the citizenry and workforce of the future.


*Image recognition algorithms do a great job showing why it’s dangerous for an AI company to bank its differentiation and strategy on an algorithmic capability as opposed to a unique ability to solve a business problem or amass a proprietary data set. Just two years ago, image recognition was a breakthrough capability just making its way to primetime commercial use. This June, Google released image recognition code for free via its Tensorflow API. That’s a very fast turnaround from capability to commodity, a transition of great interest to my former colleagues at Fast Forward Labs.

**See here for ethical implications of this backward-looking temporality.

The featured image comes from a twelfth-century manuscript by neo-platonist philosopher Bernard de Chartres. It illustrates this quotation: 

“We are like dwarfs on the shoulders of giants, so that we can see more than they, and things at a greater distance, not by virtue of any sharpness of sight on our part, or any physical distinction, but because we are carried high and raised up by their giant size.”

It’s since circulated from Newton to Nietzsche, each indicating indebtedness to prior thinkers as inspiration for present insights and breakthroughs. 

Analogue Repeaters

Imagine my disappointment (gosh, never know if it’s two s’s or two p’s (gosh, never know if I should use an apostrophe to designate plural letters, i.e., not one unit of letter s, two units of letter s! (gosh, never know if I should use italics to emphasize a word or idea in a sentence, as my mind’s ear echoes the judging-but-because-too-polite-to-outrightly-judge-nudging voice of a dissertation advisor of yore, reprimanding me for the immaturity of style, as the semantics (the meaning (gosh, why in god’s name do people use such fancy words, just to exclude the rest of us?, diction synonymous with power (gosh, David Foster Wallace’s essay AUTHORITY AND AMERICAN USAGE* is so bold, so brilliant, so relevant today, as we skirt the elephant prancing around the delicate Sèvres teacups in Trump’s ramshackle cabinet of curiosities (gosh, the INCREDIBLE (intentional) elegance of Charles Sanders Peirce’s prose, master of metaphysical metaphor, expert in epistemological eloquence, who writes sentences like That much-admired “ornament of logic” — the doctrine of clearness and distinctness — may be pretty enough, but it is high time to relegate to our cabinet of curiosities the antique bijou, and to wear about us something better adapted to modern uses and Thought is a thread of melody running through the succession of our sensations), a polka dot, maladroit elephant screaming at the top of her lungs that SOCIAL CLASS IS TABOO!, as we can’t mention it, we hide it under euphemisms like “income inequality” and our bad faith creates warts manifest as mean and hateful ideologies like white supremacy and terrorism as we ignore the root cause, cloaking our fears in political correctness and identity politics, it being too damn hard to change the system, too damn hard to imagine a different sociopolitical constellation, too damn different from what we’ve inherited, a system showing signs of wear and tear like my battered GI tract (gosh, it would be fucking wonderful if Western Medicine could get its fucking act together and stop poisoning us (me) with its antibiotics, its linear “science”, its specialities, its discrete anatomies that create nothing but carcasses and bulbous gout (no, fortunately, I don’t have gout!), for Christ’s sake why is it so hard to figure out what the hell we should eat to be healthy? Gluten, no gluten, Dairy, no dairy. No sugar (that one at least is clear). Legumes, no legumes. Onions, no onions. Meat, no meat. For fuck’s sake each microbiome is different, stop subjecting us (me) to your blunt diagnostics!))) (I think that’s the right number of close parentheses; does this mean I’d be a shitty programmer?) should carry enough weight without needing the crutches of form (gosh, Thomas Bernhard would be disappointed, as would so many crappy deconstructionists following the crumbs littering the pitiful trail created by the third-rate-metaphysical essays of Derrida and de Man)))) (again, I may have fucked up the number of close parentheses) upon clicking the URL for ERA Welcome** (ERA an acronym for Escarpment Repeater Association, an amateur radio club in Ontario presumably eponymous for the Niagara Escarpment) only to find that service was temporarily unavailable! (Yes, those are my bookmarks. I have multiple email inboxes because I have multiple jobs, each enabling different vectors of curiosity and expressing different sides of my personality.
This post excavates the one side of me, a side unfettered by any professional obligations, unindexed by form, without requirement to keep those emails short and sweet, as it doesn’t matter if no one will read this or no one will respond, doesn’t matter if pure confusion thwarts action, a refuge (or, for fans of puns, a hamlet (personally most fond of Asta Nielsen’s 1920 interpretation)) from the day-to-day toil of pragmatic communication, where it’s so damn hard to muster the courage to cleave the continuous and create the necessary and sufficient form to catalyze “next steps” (gosh, how deeply Thomas Mann’s*** statement I wanted to write you a short letter but I didn’t have the time! resonates!))****

*(or, “POLITICS AND THE ENGLISH LANGUAGE” IS REDUNDANT) (parenthesis and capital letters original)

**I no longer remember how I found the ERA. It showed up during my search for all things related to Treasure Island, the subject of this post. It seemed quite fitting for a post about recursion, even if the members of the ERA use the word repeater much differently than I.

***One of those quotations (I was also taught that quote is a verb and quotation is a noun, and that I should display my erudition and never placate to common use) attributed to 5000 different people, just like that which doesn’t kill me makes me stronger, which people attribute to St. John the Baptist, Nietzsche, or Rose Kennedy, depending on taste, experience, and predilection (admittedly redundant, but I liked the tricolon).

****Footnote Four, the most famous footnote in American Constitutional Law, comes from the 1938 ruling US v. Carolene Products Co. It reads: “There may be narrower scope for operation of the presumption of constitutionality when legislation appears on its face to be within a specific prohibition of the Constitution, such as those of the first ten amendments, which are deemed equally specific when held to be embraced within the Fourteenth….
It is unnecessary to consider now whether legislation which restricts those political processes which can ordinarily be expected to bring about repeal of undesirable legislation, is to be subjected to more exacting judicial scrutiny under the general prohibitions of the Fourteenth Amendment than are most other types of legislation….
Nor need we inquire whether similar considerations enter into the review of statutes directed at particular religious… or nations… or racial minorities…: whether prejudice against discrete and insular minorities may be a special condition, which tends seriously to curtail the operation of those political processes ordinarily to be relied upon to protect minorities, and which may call for a correspondingly more searching judicial inquiry… (italics added by the author of the Wikipedia article from which I copied and pasted the quotation). Ruth Bader Ginsburg has apparently drawn upon it during the Roberts’ Court to push the Court to do a better job protecting minorities, who, as recent politics and hate acts have shown, still need protecting.
Had to put in this photo because it is just that awesome. Playing Hamlet, the beautiful Asta Nielsen rushes in to challenge Claudius, the new king. Nielsen uses her gender superbly to channel the great prince’s doubts.

Treasure Island is a nightmare for the field of location intelligence.* That’s because it is:

  • an island
  • in a lake (namely, Lake Mindemoya)
  • on an island (namely, Manitoulin Island)
  • in a lake (namely, Lake Huron)

While said to be the world’s largest island in a lake on an island in a lake, Treasure Island is actually quite small: 1.4 kilometers long x 400 meters wide, housing only a few cottages and no permanent residents.** It has a wonderful history. William McPherson, former deputy chief of police for Toronto, purchased the island for $60 in 1883, only to sell it to Joe and Jean Hodgson in 1928. On July 13, 2015 around 11:30 am the Manitoulin Detachment of the Ontario Provincial Police (OPP) was notified of a series of break and enters that had occurred sometime on July 12, 2015 to one of the few buildings on Treasure Island; hooligans entered the garage area and caused damage to two golf carts, estimated in the thousands of dollars.

Folklore etiologies for the genesis of Treasure Island are equivocal. One tradition plays on the perennial frustrations between husband and wife:

According to local tradition, Treasure Island was originally named Mindemoya, because of the distinctive shape of the island: rising at one end to a long flat hill, with a steep drop to a short low area at the other end. According to legend, a great chieftain or demi-god who once lived in Sault Ste. Marie, Ontario had a wife who would not give him any peace. In frustration he eventually kicked her and sent her flying, to land on her hands and knees in Lake Mindemoya, leaving her back and rump above the water, which we see today as the island. The word “Mindemoya” supposedly means “Old Lady’s Bottom”. See dubious Wikipedia

The Anishinaabe tradition, by contrast, features a story about a rogue Odysseus-like trickster hero whose moral defies any heuristic logic (and is thereby much more interesting):

Treasure Island, or as it is also known, Mindemoya Island, can be seen from almost all vantage points around the lake. The shape of the island is of a person lying prostrate with hands outstretched in front. One Anishinaabe tale tells of Nanabush, the Trickster with magic powers, who was carrying his grandmother over his shoulder, and suddenly stumbling, caused her to fly through the air to the middle of the lake, landing on her hands and knees, where she has remained ever since. This is Mindemoya (Mndimooyenh), the legendary old woman of the lake. See The Manitoulin Expositor

A pictogram of Nanabozho, an alternative Romanized version of Nanabush’s name, which itself varies across Ojibwe dialects. Nanabozho is part Shiva, a spirit involved in the world’s creation, part Odysseus, a wily trickster hero who outsmarts bad guys and throws grandmothers into the middle of the lake.

In today’s data-driven world, where quantitative interpretations of phenomena have replaced classical, Ovidian etiologies (i.e., where grandmothers or testy wives metamorphose into islands within lakes within islands within lakes), Nanabozho’s guiles have been recast as topological oddities, recursive structures that break the consistency and unity required to pinpoint a location.

Indeed, what kind of data structure could possibly capture the recursive identity of Treasure Island? At one level of granularity, say measured with satellites that capture diameters of 50 kilometers, our location intelligence analyst (LIA) would say “at 45.762°N 82.209°W there is an island!” (this being Manitoulin Island, the Island around Lake Mindemoya, around Treasure Island). And our heroic LIA would be right, but right for the wrong referent! And that could cause all sorts of problems later on. So if she wanted to be more accurate, she could use smaller satellites that capture locations more precisely, or even a little drone, which could capture distances at, say, the 5 kilometer mark, at which point she would say, “at 45.762°N 82.209°W there is a lake!”, which would be wrong, but also right, just not right enough. And so on and so on, peeling away the layers of the topological onion, unpacking the nested babushkas of the inherited Russian Doll, the lips still crimson, the flowers a pattern indexing styles of yore, styles lost in the clean blankness of modernism. 
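
One half-serious answer to “what kind of data structure could possibly capture this?” is simply a recursive one, in which each kind of place can contain places of the other kind all the way down. A sketch (the class and field names are mine, not any real GIS schema):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Island:
    name: str
    lakes: List["Lake"] = field(default_factory=list)

@dataclass
class Lake:
    name: str
    islands: List["Island"] = field(default_factory=list)

# The nesting that gives our location intelligence analyst a headache.
treasure   = Island("Treasure Island")
mindemoya  = Lake("Lake Mindemoya", islands=[treasure])
manitoulin = Island("Manitoulin Island", lakes=[mindemoya])
huron      = Lake("Lake Huron", islands=[manitoulin])

def describe(place, depth=0):
    """Walk the containment: an island in a lake on an island in a lake..."""
    print("  " * depth + place.name)
    children = place.islands if isinstance(place, Lake) else place.lakes
    for child in children:
        describe(child, depth + 1)

describe(huron)
```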

But isn’t this very recursion the key to consciousness? If we could solve the elusive identity of Treasure Island, might we not have found our topology for the mind’s emergence from matter, Nanabozho laughing heartily from his perch in the past, the old lady’s bottom the key to sentience all along, if we were only wise enough to look?

Why, yes and no.

I don’t know the scientific explanation behind the genesis of Treasure Island, as the internet focuses on the myths fit for tourists, perpetuated year after year in the oral tradition of volunteer guides, kindly ladies with kindly graying hair, ever ready to greet the city folk on holiday from the cottage. But it certainly seems plausible that Treasure Island evolved through some aleatory, stochastic whim of nature, the product of perfectly uncomprehending and incomprehensible forces that, through sheer force of repetition, through mindless trial and error, created a perfect recursive structure, Time outwitting Mind with paleolithic patience, repeating and repeating until chance and probability land on something that exhibits the mastery of Andy Goldsworthy‘s invisible hand, only to blow away in the autumn winds, our secrets transient, momentary missives that disappear upon observation, our Cumaean Sibyl whispering her truth to Schrödinger’s dead cat.

Imagine creating art destined to disappear. Imagine not caring if it didn’t last, but focusing on the momentary beauty, on the trick of the mind, where intentionality appears as natural as aleatory design. (If this sounds cool, check out Rivers and Tides.)

Here’s the punchline: many of the wondrous feats of contemporary artificial intelligence arise from similar forces of competence without comprehension (indebted to Dennett). Machines did not learn to beat Atari or Go because they designed a strategy to win, envisioning the game and moves and pieces like we human thinkers do. They did a bunch of stochastic random shit a million trillion times, creating what looks like intelligent design in what feels like an evolutionary microsecond, powered by the speed and efficiency of modern computation. That is, AI is like evolution on steroids, evolution put on super-duper-mega-fast-forward thanks to the simulation environments of computation. But if we break things down, each individual step in training an AI is a mindless guess, a mutation, a slip in transcription that, when favored by guiding forces we call “objective functions” – tools to minimize error that are a bit like survival of the fittest – can lead to something that just so happens to work.
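
A toy version of “mindless guesses guided by an objective function” (the target string and mutation scheme are mine, invented for illustration): no individual step comprehends anything, yet keeping whichever random mutation the objective scores at least as well eventually produces something that looks designed.

```python
import random
import string

TARGET = "competence without comprehension"
ALPHABET = string.ascii_lowercase + " "

def score(candidate):
    """Objective function: how many positions already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

current = [random.choice(ALPHABET) for _ in range(len(TARGET))]
steps = 0
while score(current) < len(TARGET):
    steps += 1
    mutant = list(current)
    mutant[random.randrange(len(TARGET))] = random.choice(ALPHABET)  # a mindless guess
    if score(mutant) >= score(current):                              # the objective decides
        current = mutant

print(steps, "".join(current))
```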

And it goes without saying that Nanabozho has the last laugh. Throwing grandma into the lake defies logic. It’s an act of absurdity fit for the French, a nihilism fit for Germans donning leather pants as the Dude sips white Russians (will always hate the fucking Eagles), fit for Ionesco’s rhinoceroses prancing on stage. And any attempt we make to impose meaning through reduction will falter under the weight of determinism, strawmen too flimsy for the complexity of our non-linear world.

* A warm thank you to Arthur Berrill for helping me understand the topological art behind location intelligence, which, when done well, involves intricate data structures that transform spatial relationships into rows and columns or relate space and time, or take into account phenomenological aspects of people’s appreciation of the space around them (e.g., an 80-year-old widow experiences the buildings around her condo quite differently than a 25-year-old single gal). Arthur introduced me to Manitoulin Island, which inspired this post.

**I once swam to an island of similar size in the Pacific Ocean near Fiji. There was a palm tree and a few huts. I didn’t think there were people, and then some man started to scream at me to shoo me away. I got scared, and swam back to our boat. For a moment, I enjoyed the imagined awesomeness of being all alone on a small deserted island.

The featured image is of Frank Swannell surveying Takla Lake in British Columbia on behalf of the Grand Trunk Pacific Railway in 1912. To learn more about Swannell’s surveying efforts, read this article by Stephen Hume, a columnist for the Vancouver Sun who has written an entire series of vignettes associated with Canada’s 150th anniversary. Hume isn’t a last name one sees that often, so Google’s surfacing his articles second only to Wikipedia — which, like the http://www.eraradio.ca is simply not loading well for me recently — for the search term “Frank Swannell” must carry metaphysical significance. 

When Writing Fails

This post is for writers.

I take that back.

This post shares my experience as a writer to empathize with anyone working to create something from nothing, to break down the density of an intuition into a communicable sequence of words and thoughts, to digitize, which Daniel Dennett eloquently defines as “obliging continuous phenomena to sort themselves out into discontinuous, all-or-nothing phenomena” (I’m reading and very much enjoying From Bacteria to Bach and Back: The Evolution of Minds), to perform an act of judgment that eliminates other possibilities, foreclosing other forms to create its own form, Shiva and Vishnu forever linked in cycles of destruction, creation, and stability. That is to say, this post shares my experience as a writer as metonymy for our human experience as finite beings living finite lives.

The Nataraja, Shiva in his form as the cosmic ecstatic dancer, inspires trusting calm in me.

Earlier this morning, I started a post entitled Competence without Comprehension. I’ll publish it eventually, hopefully next week. It will feature a critique of explainable artificial intelligence (AI), efforts in the computer science and policy communities to develop AI systems that make sense for human users. I have tons to say here. I think it’s ok for systems to be competent without being comprehensible (my language is inspired by Dan Dennett, who thinks consciousness is an illusion) because I think there are a lot of cognitive competencies we exhibit without comprehension (ranging from ways of transforming our habits or even becoming believers in some religious system by going through the motions, as I wrote about in my dissertation, to training students in operations like addition and subtraction before they learn the theoretical underpinnings of abstract algebra – which many people never even learn!). I think the word why is a complex word that we use in different ways: Aristotle thought there were four types of causes and, again following Dennett, we can distinguish between why as “how come” (what input data created this output result?) and why as “what for” (what action will be taken from this output result?). Aristotle’s causal theory was largely toppled during the scientific revolution and then again by Sartre in Existentialism is a Humanism (where he shows we humans exist in a very different way from paper knives, which are an outdated technology!), but I think there’s value in resurrecting his categories to think about machine learning pipelines and explainable AI. I think there are different ethical implications for using AI in different settings, and I think there’s something crucial about social norms – how we expect humans to behave towards other humans – that is driving widespread interest in this topic and that, when analyzed, can help us understand what may (or may not!) be unique about the technology in its use in society.

In short, my blog post was a mess. I was trying to do too much at once; there were multiple lines of synthetic thought that needed to be teased out to make sense to anyone, including myself. I will understand my position better once I devote the time and patience to exploring it, formalizing it, unpacking ideas that currently sit inchoate like bile. What I started today contains at least five different blog posts’ worth of material, on topics that many other people are thinking about, so could have some impact in the social circles that are meaningful for me and my identity. This is crucial: I care about getting this one right, because I can imagine the potential readers, or at least the hoped-for readers. That said, upon writing this, I can also step back and remember that the approval I think I’m seeking rarely matters in the end. I always feel immense gratitude when anyone — a perfect stranger — reads my work, and the most gratitude when someone feels inspired to write or grow herself.

So I allowed myself to pivot from seeking approval to instilling inspiration. To manifesting the courage to publish whatever – whatever came out from the primordial sludge of my being, the stream of consciousness that is the dribble of expression, ideas without form, but ideas nonetheless, the raw me sitting here trying my best on a Sunday afternoon in August, imagining the negative response of anyone who would bother to read this, but also knowing the charity I hold within my own heart for consistency, habit, effort, exposure, courage to display what’s weakest and most vulnerable to the public eye.

I see my experience this morning as metonymy for our experience as finite beings living finite lives because of the anxiety of choice. Each word written conditions the space of possibility of what can, reasonably, come next (Skip-Thought vectors rely on this to function). The best writing is not about everything but is about something, just as many of the happiest and most successful people become that way by accepting the focus required to create and achieve, focus that shuts doors — or at least Japanese screens — on unrealized selves. I find the burden of identity terrific. My being resists the violence of definition and prefers to flit from self to self in the affordance of friendships, histories, and contexts. It causes anxiety, confusion, false starts, but also a richness I’m loath to part with. It’s the give and take between creation and destruction, Shiva dancing joyfully in the heavens, her smile peering ironic around the corners of our hearts like the aura of the eclipse.

The featured image represents Tim Jenison’s recreation of Vermeer’s The Music Lesson. Tim’s Vermeer is a fantastic documentary about Jenison’s quest to confirm his theory of Vermeer’s optical painting technique, which worked somewhat similarly to a camera (refracting light to create a paint-by-number-like format for the artist). It’s a wonderful film that makes us question our assumptions about artistic genius and creativity. I firmly believe creativity stems from constraint, and that Romantic ideas of genius miss the mark in shaping cultural understandings of creativity. This morning, I lacked the constraints required to write. 

The Temporality of Artificial Intelligence

Nothing sounds more futuristic than artificial intelligence (AI). Our predictions about the future of AI are largely shaped by science fiction. Go to any conference, skim any WIRED article, peruse any gallery of stock images depicting AI*, and you can’t help but imagine AI as a disembodied cyberbabe (as in Spike Jonze’s Her), a Tin Man (who just wanted a heart!) gone rogue (as in the Terminator), or, my personal favorite, a brain out-of-the-vat-like-a-fish-out-of-water-and-into-some-non-brain-appropriate-space-like-a-robot-hand-or-an-android-intestine (as in Krang in the Ninja Turtles).

A legit AI marketing photo!
Krang should be the AI mascot, not the Terminator!

The truth is, AI looks more like this:

A slide from Pieter Abbeel’s lecture at MILA’s Reinforcement Learning Summer School.

Of course, it takes domain expertise to picture just what kind of embodied AI product such formal mathematical equations would create. Visual art, argued Gene Kogan, a cosmopolitan coder-artist, may just be the best vehicle we have to enable a broader public to develop intuitions of how machine learning algorithms transform old inputs into new outputs.

 

One of Gene Kogan‘s beautiful machine learning recreations.

What’s important is that our imagining AI as superintelligent robots — robots that process and navigate the world with similar-but-not-similar-enough minds, lacking values and the suffering that results from being social — precludes us from asking the most interesting philosophical and ethical questions that arise when we shift our perspective and think about AI as trained on past data and working inside feedback loops contingent upon prior actions.

Left unchecked, AI may actually be an inherently conservative technology. It functions like a time warp, capturing trends in human behavior from our near past and projecting them into our near future. As Alistair Croll recently argued, “just because [something was] correct in the past doesn’t make it right for the future.”

Our Future as Recent Past: The Case of Word Embeddings

In graduate school, I frequently had a jarring experience when I came home to visit my parents. I was in my late twenties, and was proud of the progress I’d made evolving into a more calm, confident, and grounded me. But the minute I stepped through my parents’ door, I was confronted with the reflection of a past version of myself. Logically, my family’s sense of my identity and personality was frozen in time: the last time they’d engaged with me on a day-to-day basis was when I was 18 and still lived at home. They’d anticipate my old habits, tiptoeing to avoid what they assumed would be a trigger for anxiety. Their behavior instilled doubt. I questioned whether the progress I assumed I’d made was just an illusion, and would quickly fall back into old habits.

In fact, the discomfort arose from a time warp. I had progressed, I had grown, but my parents projected the past me onto the current me, and I regressed under the impact of their response. No man is an island. Our sense of self is determined not only by some internal beacon of identity, but also (for some, mostly) by the self we interpret ourselves to be given how others treat us and perceive us. Each interaction nudges us in some direction, which can be a regression back to the past or a progression into a collective future.

AI systems have the potential to create this same effect at scale across society. The shock we feel upon learning that algorithms automating job ads show higher-paying jobs to men rather than women, or that recidivism-prediction tools place African-American males at higher risk than other races and classes, results from recapitulating issues we assume society has already advanced beyond. Sometimes we have progressed, and the tools are simply reflections of the real-world prejudices of yore; sometimes we haven’t progressed as much as we’d like to pretend, and the tools are barometers for the hard work required to make the world a world we want to live in.

Consider this research about a popular natural language processing (NLP) technique called word embeddings by Bolukbasi and others in 2016.**

The essence of NLP is to make human talk (grey, messy, laden with doubts and nuances and sarcasm and local dialects and….) more like machine talk (black and white 1s and 0s). Historically, NLP practitioners did this by breaking down language into different parts and using those parts as entities in a system.

Tree graphs parsing language into parts, inspired by linguist Noam Chomsky.

Naturally, this didn’t get us as far as we’d hoped. With the rise of big data in the 2000s, many in the NLP community adopted a new approach based on statistics. Instead of teasing out structure in language with trees, they used massive processing power to find repeated patterns across millions of example sentences. If two words (or three, or four, or the general case, n) appeared multiple times in many different sentences, programmers assumed the statistical significance of that word pair conferred semantic meaning. Progress was made, but this n-gram technique failed to capture long-term, hierarchical relationships in language: how words at the end of a sentence or paragraph inflect the meaning of the beginning, how context inflects meaning, how other nuances make language different from a series of transactions at a retail store.
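
For concreteness, the older approach boils down to counting. Here is a minimal bigram (n = 2) tally over a toy corpus I made up, the kind of statistic those systems leaned on as a proxy for meaning:

```python
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

def ngrams(sentence, n=2):
    """Yield consecutive n-word tuples from a sentence."""
    tokens = sentence.split()
    return zip(*(tokens[i:] for i in range(n)))

counts = Counter(gram for sentence in corpus for gram in ngrams(sentence))
print(counts.most_common(3))  # frequent pairs like ('the', 'cat') stand in for "meaning"
```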

Word embeddings, made popular in 2013 with a Google technique called word2vec, use a vector, a string of numbers pointing in some direction in an N-dimensional space***, to capture (more of) the nuances of contextual and long-term dependencies (the 6589th number in the string, inflected in the 713th dimension, captures the potential relationship between a dangling participle and the subject of the sentence with 69% accuracy). This conceptual shift is powerful: instead of forcing simplifying assumptions onto language, imposing arbitrary structure to make language digestible for computers, these embedding techniques accept that meaning is complex, and therefore must be processed with techniques that can harness and harvest that complexity. The embeddings make mathematical mappings that capture latent relationships our measly human minds may not be able to see. This has led to breakthroughs in NLP, like the ability to automatically summarize text (albeit in a pretty rudimentary way…) or improve translation systems.

With great power, of course, comes great responsibility. To capture more of the inherent complexity in language, these new systems require lots of training data, enough to capture patterns versus one-off anomalies. We have that data, and it dates back into our recent – and not so recent – past. And as we excavate enough data to unlock the power of hierarchical and linked relationships, we can’t help but confront the lapsed values of our past.

Indeed, one powerful property of word embeddings is their ability to perform algebra that represents analogies. For example, if we input: “man is to woman as king is to X?” the computer will output: “queen!” Using embedding techniques, this operation is conducted by using a vector – a string of numbers mapped in space – as a proxy for analogy: if two vectors have the same length and point in the same direction, we consider the words at each pole semantically related.
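
The analogy trick really is vector arithmetic. A hand-rolled sketch with made-up three-dimensional vectors (real embeddings have hundreds of dimensions learned from text, not numbers typed in by hand): compute king - man + woman and report the nearest remaining word by cosine similarity.

```python
import numpy as np

# Toy 3-d embeddings invented for illustration; word2vec would learn ~300-d vectors.
vectors = {
    "man":   np.array([1.0, 0.1, 0.2]),
    "woman": np.array([1.0, 0.9, 0.2]),
    "king":  np.array([0.2, 0.1, 1.0]),
    "queen": np.array([0.2, 0.9, 1.0]),
    "apple": np.array([0.0, 0.5, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """a is to b as c is to ? -> the word nearest to (b - a + c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("man", "woman", "king"))  # -> "queen" in this toy space
```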

Embeddings use vectors as a proxy for semantics and syntax.

Now, Bolukbasi and fellow researchers dug into this technique and found some relatively disturbing results.


It’s important we remember that the AI systems themselves are neutral, not evil. They’re just going through the time warp, capturing and reflecting past beliefs we had in our society that leave traces in our language. The problem is, if we are unreflective and only gauge the quality of our systems based on the accuracy of their output, we may create really accurate but really conservative or racist systems (remember Microsoft Tay?). We need to take a proactive stance to make sure we don’t regress back to old patterns we thought we’ve moved past. Our psychology is pliable, and it’s very easy for our identities to adapt to the reflections we’re confronted with in the digital and physical world.

Bolukbasi and his co-authors took an interesting, proactive approach to debiasing their system, which involved mapping the words associated with gender in two dimensions, where the X axis represented gender (girls to the left and boys to the right). Words associated with gender but that don’t stir sensitivities in society were mapped under the X axis (e.g., girl : sister :: boy : brother). Words that do stir sensitivities (e.g., girl : tanning :: boy : firepower) were forced to collapse down to the Y axis, stripping them of any gender association.
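
For the mathematically curious, the "collapse onto the neutral axis" step amounts to projecting a word vector onto the gender direction and subtracting that component out. Here is a toy numpy sketch of that idea (the two-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions):

```python
import numpy as np

def neutralize(word_vec, gender_direction):
    """Remove the component of a word vector that lies along the gender direction."""
    g = gender_direction / np.linalg.norm(gender_direction)
    return word_vec - np.dot(word_vec, g) * g

# Invented toy vectors: the first dimension plays the role of the gender axis.
he, she = np.array([0.9, 0.1]), np.array([-0.9, 0.1])
gender_direction = he - she

receptionist = np.array([-0.5, 0.7])  # a word that has drifted toward the "she" side
print(neutralize(receptionist, gender_direction))  # [0.  0.7] -- gender component stripped out
```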


Their efforts show what mindfulness may look like in the context of algorithmic design. Just as we can’t run away from the inevitable thoughts and habits in our mind, given that they arise from our past experience, the stuff that shapes our minds to make us who we are, so too we can’t run away from the past actions of our selves and our society. It doesn’t help our collective society to blame the technology as evil, just as it doesn’t help any individual to repress negative emotions. We are empowered when we acknowledge them for what they are, and proactively take steps to silence and harness them so they don’t keep perpetuating themselves into the future. This level of awareness is required for us to make sure AI is actually a progressive, futuristic technology, not one that traps us in the unfortunate patterns of our collective past.

Conclusion

This is one narrow example of the ethical and epistemological issues created by AI. In a future blog post in this series, I’ll explore how reinforcement learning frameworks – in particular contextual bandit algorithms – shape and constrain the data collected to train their systems, often in a way that mirrors the choices and constraints we face when we make decisions in real life.


*Len D’Avolio, Founder and CEO of healthcare machine learning startup Cyft, curates a Twitter feed of the worst-ever AI marketing images every Friday. Total gems.

**This is one of many research papers on the topic. FAT ML is a growing community focused on fairness, accountability, and transparency in machine learning. The brilliant Joanna Bryson has written articles about bias in NLP systems. Cynthia Dwork and Toni Pitassi are focusing more on bias (though still do great work on differential privacy). Blaise Aguera y Arcas’ research group at Google thinks deeply about ethics and policy and recently published an article debunking the use of physiognomy to predict criminality. My colleague Tyler Schnoebelen recently gave a talk on ethical AI product design at Wrangle. The list goes on.

***My former colleague Hilary Mason loved thinking about the different ways we imagine spaces of 5 dimensions or greater.

The featured image is from Swedish film director Ingmar Bergman’s Wild Strawberries (1957). Bergman’s films are more like philosophical essays than Hollywood thrillers. He uses the medium, with its ineluctable flow, its ineluctable passage of time, to ponder the deepest questions of meaning and existence. A clock without hands, at least if we’re able to notice it, as our mind’s eye likely fills in the semantic gaps with the regularity of practice and habit. The eyes below betokening what we see and do not see. Bergman died July 30, 2007, the same day as Michelangelo Antonioni, his Italian counterpart. For me, the coincidence was as meaningful as that of the death of John Adams and Thomas Jefferson on July 4, 1826.

The Unreasonable Effectiveness of Proxies*

Imagine it’s December 26. You’re right smack in the midst of your Boxing Day hangover, feeling bloated and headachy and emotionally off from the holiday season’s interminable festivities. You forced yourself to eat Aunt Mary’s insipid green bean casserole out of politeness and put one too many shots of dark rum in your eggnog. The chastising power of the prefrontal cortex superego is in full swing: you start pondering New Year’s Resolutions.

Lose weight! Don’t drink red wine for a year! Stop eating gluten, dairy, sugar, processed foods, high-fructose corn syrup–just stop eating everything except kale, kefir, and kimchi! Meditate daily! Go be a free spirit in Kerala! Take up kickboxing! Drink kombucha and vinegar! Eat only purple foods!

Right. Check.

(5:30 pm comes along. Dad’s offering single malt scotch. Sure, sure, just a bit…neat, please…)**

We’re all familiar with how hard it is to set and stick to resolutions. That’s because our brains have little instant gratification monkeys flitting around on dopamine highs in constant guerrilla warfare against the Rational Decision Maker in the prefrontal cortex (Tim Urban’s TEDtalk on procrastination is a complete joy). It’s no use beating ourselves up over a physiological fact. The error of Western culture, inherited from Catholicism, is to stigmatize physiology as guilt, transubstantiating chemical processes into vehicles of self-deprecation with the same miraculous power used to transform just-about-cardboard wafers into the living body of Christ. Eastern mindsets, like those proselytized by Buddha, are much more empowering and pragmatic: if we understand our thoughts and emotions to be senses like sight, hearing, touch, taste, and smell, we can then dissociate self from thoughts. Our feelings become nothing but indices of a situation, organs to sense a misalignment between our values–etched into our brains as a set of habitual synaptic pathways–and the present situation around us. We can watch them come in, let them sit there and fester, and let them gradually fade before we do something we regret. Like waiting out the internal agony until the baby in front of you in 27G on your overseas flight to Sydney stops crying.

Resolutions are so hard to keep because we frame them the wrong way. We often set big goals, things like, “in 2017 I’ll lose 30 pounds” or “in 2017 I’ll write a book.” But a little tweak to the framework can promote radically higher chances for success. We have to transform a long-term, big, hard-to-achieve goal into a short-term, tiny, easy-to-achieve action that is correlated with that big goal. So “lose weight” becomes “eat an egg rather than cereal for breakfast.” “Write a book” becomes “sit down and write for 30 minutes each day.” “Master Mandarin Chinese” becomes “practice your characters for 15 minutes after you get home from work.” The big, scary, hard-to-achieve goal that plagues our consciousness becomes a small, friendly, easy-to-achieve action that provides us with a little burst of accomplishment and satisfaction. One day we wake up and notice we’ve transformed.

It’s doubtful that the art of finding a proxy for something that is hard to achieve or know is the secret of the universe. But it may well be the secret to adapting the universe to our measly human capabilities, both at the individual (transform me!) and collective (transform my business!) level. And the power extends beyond self-help: it’s present in the history of mathematics, contemporary machine learning, and contemporary marketing techniques known as growth hacking.

Ut unum ad unum, sic omnia ad omnia: Archimedes, Cavalieri, and Calculus

Many people are scared of math. Symbols are scary: they’re a type of language and it takes time and effort to learn what they mean. But most of the time people struggle with math because they were badly taught. There’s no clearer example of this than calculus, where kids memorize equations stating that something is so instead of conceptually grasping why something is so.

The core technique behind calculus–and I admit this just scratches the surface–is to reduce something that’s hard to know down to something that’s easy to know. Slope is something we learn in grade school: change in y divided by change in x, how steep a line is. Taking the derivative is doing this same process but on a twisting, turning, meandering curve rather than just a line. This becomes hard because we add another dimension to the problem: with a line, the slope is the same no matter what x we put in; with a curve, the slope changes with our x input value, like a mountain range undulating from mesa to vertical extreme cliff. What we do in differential calculus is find a way to make a line serve as a proxy for a curve, to turn something we don’t know how to do into something we know how to do. So we take magnifying glasses with ever-increasing potency and zoom in until our topsy-turvy meandering curve becomes nothing but a straight line; we find the slope there; and then we repeat the trick at every point along our curve. The big conceptual breakthrough Newton and Leibniz made in the 17th century was to turn this proxy process into something continuous and infinite: to cross a conceptual chasm between a very, very small number and a number so small that it was effectively zero. Substituting close-enough-for-government-work-zero with honest-to-goodness-zero did not go without strong criticism from the likes of George Berkeley, a prominent philosopher of the period who argued that it’s impossible for us to say anything about the real world because we can only know how our minds filter the real world. But its pragmatic power to articulate the mechanics of the celestial motions overcame such conceptual trifles.***

Riemann Sums use the same proxy method to find the area under a curve. One replaces that hard task with the easier task of summing up the areas of rectangles that approximate the region under the curve.
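
Here is a small Python sketch of both proxy moves, with x² standing in for the meandering curve: a secant line whose slope is easy to compute stands in for the tangent, and rectangles whose areas are easy to sum stand in for the region under the curve.

```python
def f(x):
    return x ** 2

def secant_slope(f, x, h):
    """Slope of the easy-to-measure line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

# As h shrinks, the secant slope approaches the true derivative (2x, so 6 at x = 3).
for h in (1.0, 0.1, 0.001):
    print(h, secant_slope(f, 3.0, h))  # 7.0, 6.1, 6.001

def riemann_sum(f, a, b, n):
    """Approximate the area under f on [a, b] with n left-endpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(n))

# With more rectangles, the sum approaches the exact area (x^3 / 3 from 0 to 3, i.e. 9).
print(riemann_sum(f, 0.0, 3.0, 10_000))  # about 8.9987
```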

This type of thinking, however, did not start in the 17th century. Greek mathematicians like Archimedes (famous for screaming Eureka! (I’ve found it!) and running around naked like a madman when he noticed that water levels in the bathtub rose proportionately to the volume of his submerged body) used its predecessor, the method of exhaustion, to find the area of a shape like a circle or a blob by inscribing within it (and circumscribing around it) a series of easier-to-measure shapes like polygons to get an approximation of the area by proxy.

The method of exhaustion in ancient Greek math.
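
A quick sketch of the exhaustion idea in Python: the area of a regular polygon inscribed in a circle is easy to compute, and doubling the number of sides exhausts the gap between polygon and circle (whose area, for radius 1, is π).

```python
import math

def inscribed_polygon_area(n_sides, radius=1.0):
    """Area of a regular n-gon inscribed in a circle: (1/2) * n * r^2 * sin(2*pi/n)."""
    return 0.5 * n_sides * radius ** 2 * math.sin(2 * math.pi / n_sides)

# Archimedes pushed a related argument out to a 96-sided polygon.
for n in (6, 12, 24, 96):
    print(n, round(inscribed_polygon_area(n), 4))
# 6 -> 2.5981, 12 -> 3.0, 24 -> 3.1058, 96 -> 3.1394  (circle: 3.14159...)
```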

It’s challenging for us today to reimagine what Greek geometry was like because we’re steeped in a post-Cartesian mindset, where there’s an equivalence between algebraic expressions and geometric shapes. The Greeks thought about shapes as shapes. The math was tactile, physical, tangible. This mindset leads to interesting work in the Renaissance like Bonaventura Cavalieri’s method of indivisibles, which showed that the areas of two shapes were equivalent (often a hard thing to show) by cutting the shapes into parts and showing that each of the parts was equivalent (an easier thing to show). He turns the problem of finding equivalence into an analogy, ut unum ad unum, sic omnia ad omnia–as the one is to the one, so all are to all–substituting the part for the whole to turn this into a tractable problem. His work paved the way for what would eventually become the calculus.****

Supervised Machine Learning for Dummies

My dear friend Moises Goldszmidt, currently Principal Research Scientist at Apple and a badass Jazz musician, once helped me understand that supervised machine learning is quite similar.

Again, at an admittedly simplified level, machine learning can be divided into two camps. Unsupervised machine learning is using computers to find patterns in data and sort different data into clusters. When most people hear the words machine learning, they think about unsupervised learning: computers automagically finding patterns, “actionable insights,” in data that would evade detection by measly human minds. In fact, unsupervised learning is an active area of research in the upper echelons of the machine learning community. It can be valuable for exploratory data analysis, but only infrequently powers the products that are making news headlines. The real hero of the present day is supervised learning.

I like to think about supervised learning as learning a function that maps inputs we can easily measure to outputs we actually care about.


Let’s take a simple example. We’re moving, and want to know how much to put our house on the market for. We’re not real estate brokers, so we’re not great at measuring prices. But we do have a tape measure, so we are great at measuring the square footage of our house. Let’s say we go look through a few years of real estate records, and find a bunch of data points about how much houses go for and what their square footage is. We also have data about location, amenities like an in-house washer and dryer, and whether the house has a big back yard. We notice a lot of variation in prices for houses with different-sized back yards, but pretty consistent correlations between square footage and price. Eureka! we say, and run around the neighbourhood naked, horrifying our neighbours! We can just plot the various data points of square footage : price, measure our square footage (we do have our handy tape measure), and then put that into a function that outputs a reasonable price!

This technique is called linear regression. And it’s the basis for many data science and machine learning techniques.
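
In code, the whole house-price story fits in a few lines. This is a sketch assuming scikit-learn is available; the square footages and prices are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Inputs we can easily measure (square footage) and outputs from past records (sale price).
square_feet = np.array([[800], [1200], [1500], [2000], [2400]])
sale_price = np.array([160_000, 235_000, 300_000, 405_000, 480_000])

model = LinearRegression().fit(square_feet, sale_price)

our_house = np.array([[1800]])  # we do have our handy tape measure
print(model.predict(our_house))  # a reasonable asking price, learned from past sales
```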


The big breakthroughs in deep learning over the past couple of years (note, these algorithms existed for a while, but they are now working thanks to more plentiful and cheaper data, faster hardware, and some very smart algorithmic tweaks) are extensions of this core principle, but they add the following two capabilities (which are significant):

  • Instead of humans hand-selecting a few simple features (like square footage or having a washer/dryer), computers transform rich data into a vector of numbers and find all sorts of features that might evade our measly human minds
  • Instead of only being able to model phenomena using simple straight lines, deep learning neural networks can model phenomena using topsy-turvy-twisty functions, which means they can capture richer phenomena like the environment around a self-driving car (a rough sketch of the difference follows below)
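
Here is a rough sketch of that second point, again assuming scikit-learn and with a sine curve standing in for any "twisty" phenomenon: the interface is the same, but one model can only draw a straight line while the other can bend.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()

line = LinearRegression().fit(X, y)
net = MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=5000, random_state=0).fit(X, y)

print("straight-line fit error:", np.mean((line.predict(X) - y) ** 2))   # stuck around 0.2
print("neural-network fit error:", np.mean((net.predict(X) - y) ** 2))  # typically far lower
```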

At its root, however, even deep learning is about using mathematics to identify a good proxy to represent a more complex phenomenon. What’s interesting is that this teaches us something about the representational power of language: we barter in proxies at every moment of every day, crystallizing the complexities of the world into little tokens, words, that we use to exchange our experience with others. These tokens mingle and merge to create new tokens, new levels of abstraction, adding form from the dust from which we’ve come and to which we will return. Our castles in the sky. The quixotic figures of our imagination. The characters we fall in love with in books, not giving a damn that they never existed and never will. And yet, children learn that dogs are dogs and cats are cats after only seeing a few examples; computers, at least today, need 50,000 pictures of dogs to identify the right combinations of features that serve as a decent proxy for the real thing. Reducing that quantity is an active area of research.

Growth Hacking: 10 Friends in 14 Days

I’ve spent the last month in my new role at integrate.ai talking with CEOs and innovation leaders at large B2C businesses across North America. We’re in that miraculously fun, pre product-market fit phase of startup life where we have to make sure we are building a product that will actually solve a real, impactful, valuable business problem. The possibilities are broad and we’re managing more unknown unknowns than found in a Donald Rumsfeld speech (hat tip to Keith Palumbo of Cylance for the phrase). But we’re starting to see a pattern:

  • B2C businesses have traditionally focused on products, not customers. Analytics have been geared towards counting how many widgets were sold. They can track how something moves across a supply chain, but cannot track who their customers are, where they show up, and when. They can no longer compete on just product. They want to become customer centric.
  • All businesses are sustained by having great customers. Great means having loyalty and alignment with brand and having a high life-time value. They buy, they buy more, they don’t stop buying, and there’s a positive association when they refer a brand to others, particularly others who behave like them.
  • Wanting great customers is not a good technical analytics problem. It’s too fuzzy. So we have to find a way to transform a big objective into a small proxy, and focus energy and efforts on doing stuff in that small proxy window. Not losing weight, but eating an egg instead of pancakes for breakfast every morning.

Silicon Valley giants like Facebook call this type of thinking growth hacking: finding some local action you can optimize for that is a leading indicator of a long-term, larger strategic goal. The classic example from Facebook (which some rumour to be apocryphal, but it’s awesome as an example) was when the growth team realized that the best way to achieve their large, hard-to-achieve metric of having as many daily active users as possible was to reduce it to a smaller, easy-to-achieve metric of getting new users up to 10 friends in their first 14 days. 10 was the threshold for people’s ability to appreciate the social value of the site, a quantity of likes sufficient to drive dopamine hits that keep users coming back to the site.***** These techniques are rampant across Silicon Valley, with Netflix optimizing site layout and communications when new users join given correlations with potential churn rates down the line and Eventbrite making small product tweaks to help users understand they can use the tool to organize as well as attend events. The real power they unlock is similar to that of compound interest in finance: a small investment in your twenties can lead to massive returns after retirement.

Our goal at integrate.ai is to bring this thinking into traditional enterprises via a SaaS platform, not a consulting services solution. And to make that happen, we’re also scouting small, local wins that we believe will be proxies for our long-term success.

Conclusion

The spirit of this post is somewhat similar to a previous post about artifice as realism. There, I surveyed examples of situations where artifice leads to a deeper appreciation of some real phenomenon, like when Mendel created artificial constraints to illuminate the underlying laws of genetics. Proxies aren’t artifice, they’re parts that substitute for wholes, but enable us to understand (and manipulate) wholes in ways that would otherwise be impossible. Doorways into potential. A shift in how we view problems that makes them tractable for us, and can lead to absolutely transformative results. This takes humility. The humility of analysis. The practice of accepting the unreasonable effectiveness of the simple.


*Shout out to the amazing Andrej Karpathy, who authored The Unreasonable Effectiveness of Recurrent Neural Networks and Deep Reinforcement Learning: Pong from Pixels, two of the best blogs about AI available.

**There’s no dearth of self-help books about resolutions and self-transformation, but most of them are too cloying to be palatable. Nudge by Cass Sunstein and Richard Thaler is a rational exception.

***The philosopher Thomas Hobbes was very resistant to some of the formal developments in 17th-century mathematics. He insisted that we be able to visualize geometric objects in our minds. He was relegated to the dustbins of mathematical history, but did cleverly apply Euclidean logic to the Leviathan.

****Leibniz and Newton were rivals in discovering the calculus. One of my favourite anecdotes (potentially apocryphal?) about the two geniuses is that they communicated their nearly simultaneous discovery of the Fundamental Theorem of Calculus–which links derivatives to integrals–in Latin anagrams! Jesus!

*****Nir Eyal is the most prominent writer I know of on behavioural design and habit in products. And he’s a great guy!

The featured image is from the Archimedes Palimpsest, one of the most exciting and beautiful books in the world. It is a Byzantine prayerbook–or euchologion–written on a piece of parchment that originally contained mathematical treatises by the Greek mathematician Archimedes. A palimpsest, for reference, is a manuscript or piece of writing material on which the original writing has been effaced to make room for later writing but of which traces remain. As portions of Archimedes’ original text are very hard to read, researchers recently took the palimpsest to the Stanford Accelerator Laboratory and threw all sorts of particles at it really fast to see if they might shine light on hard-to-decipher passages. What they found had the potential to change our understanding of the history of math and the development of calculus!

Notes from Transform.AI

I spent the last few days in Paris at Transform.AI, a European conference designed for C-level executives and managed and moderated by my dear friend Joanna Gordon. This type of high-quality conference approaching artificial intelligence (AI) at the executive level is sorely needed. While there’s no lack of high-quality technical discussion at research conferences like ICML and NIPS, or even part-technical, part-application, part-venture conferences like O’Reilly AI, ReWork, or the Future Labs AI Summit (which my friends at ffVC did a wonderful job producing), most C-level executives still actively seek to cut through the hype and understand AI deeply and clearly enough to invest in tools, people, and process changes with confidence. Confidence, of course, is not certainty. And with technology changing at an ever faster clip, the task of running the show while transforming the show to keep pace with the near future is not for the faint of heart.

Transform.AI brought together enterprise and startup CEOs, economists, technologists, venture capitalists, and journalists. We discussed the myths and realities of the economic impact of AI, enterprise applications of AI, the ethical questions surrounding AI, and the state of what’s possible in the field. Here are some highlights.*

The Productivity Paradox: New Measures for Economic Value

The productivity paradox is the term Ryan Avent of the Economist uses to describe the fact that, while we worry about a near-future society where robots automate away both blue-collar and white-collar work, the present economy “does not feel like one undergoing a technology-driven productivity boom.” Indeed, as economists noted at Transform.AI, in developed countries like the US, job growth is up and “productivity has slowed to a crawl.” In his Medium post, Avent shows how economic progress is not a linear substitution equation: automation doesn’t impact growth and GDP by simply substituting the cost of labor with the cost of capital (i.e., replacing a full-time equivalent employee with an intelligent robot) despite our — likely fear-inspired — proclivities to reduce automation to simple swaps of robot for human. Instead, Avent argues that “the digital revolution is partly responsible for low labor costs” (by opening supply for cheap labor via outsourcing or just communication), that “low labour costs discourage investments in labour-saving technology, potentially reducing productivity growth,” and that benefiting from the potential of automation from new technologies like AI costs far more than just capital equipment, as it takes a lot of investment to get people, processes, and underlying technological infrastructure in place to actually use new tools effectively. There are reasons why IBM, McKinsey, Accenture, Salesforce, and Oracle make a lot of money off of “digital transformation” consulting practices.

The takeaway is that innovation and the economic impact of innovation move in syncopation, not tandem. The first consequence of this syncopation is the plight of shortsightedness, the “I’ll believe it when I see it” logic that we also see from skeptics of climate change who refuse to open their imagination to any consequences beyond their local experience. The second consequence is the overly simplistic rhetoric of technocratic Futurism, which is also hard to swallow because it does not adequately account for the subtleties of human and corporate psychology that are the cornerstones of adoption. One conference attendee, the CEO of a computer vision startup automating radiology, commented that his firm can produce feature advances in their product 50 times faster than the market will be ready to use them. And this lag results not only from the time and money required for hospitals to modify their processes to accommodate machine learning tools, but also from the ethical and psychological hurdles that need to be overcome to both accommodate less-than-certain results and accept a system that cannot explain why it arrived at its results.

In addition, everyone seemed to agree that the metrics used to account for growth, GDP, and other macroeconomic factors in the 20th century may not be apt for the networked, platform-driven, AI-enabled economy of the 21st. For example, the value search tools like Google add to the economy far exceeds the advertising spend accounted for in company revenues. Years ago, when I was just beginning my career, my friend and mentor Geoffrey Moore advised me that traditional information-based consulting firms were effectively obsolete in the age of ready-at-hand information (the new problem being the need to erect virtual dams – using natural language processing, recommendation, and fact-checking algorithms – that can channel and curb the flood of available information). Many AI tools effectively concatenate past human capital – the expertise and value of a skilled-services workforce – into a present-day super-human laborer, a laborer who is the emergent whole (so more than the sum of its parts) of all past human work (well, just about all – let’s say normalized across some distribution). This fusion of man and machine**, of man’s past actions distilled into a machine, a machine that then works together with present and future employees to ever improve its capabilities, forces us to revisit what were once clean delineations between people, IP, assets, and information systems, the engines of corporations.

Accenture calls the category of new job opportunities AI will unlock The Missing Middle. Chief Technology and Innovation Officer Paul Daugherty and others have recently published an MIT Sloan article that classifies workers in the new AI economy as “trainers” (who train AI systems, curating input data and giving them their personality), “explainers” (who speak math and speak human, and serve as liaisons between the business and technology teams), and “sustainers” (who maintain algorithmic performance and ensure systems are deployed ethically). Those categories are sound. Time will tell how many new jobs they create.

Unrealistic Expectations and Realistic Starting Points

Everyone seems acutely aware of the fact that AI is in a hype cycle. And yet everyone still trusts AI is the next big thing. They missed the internet. They were too late for digital. They’re determined not to be too late for AI.

The panacea would be like the chip Keanu Reeves uses in the Matrix, the preprogrammed super-intelligent system you just plug into the equivalent of a corporate brain and boom, black belt karate-style marketing, anomaly detection, recommender systems, knowledge management, preemptive HR policies, compliance automation, smarter legal research, optimized supply chains, etc…

If only it were that easy.

While everyone knows we are in a hype cycle, technologists still say that one of the key issues data scientists and startups face today is unrealistic expectations from executives. AI systems still work best when they solve narrow, vertical-specific problems (which also means startups have the best chance of succeeding when they adopt a vertical strategy, as Bradford Cross eloquently argued last week). And, trained on data and statistics, AI systems output probabilities, not certainties. Electronic Discovery (i.e., the use of technology to automatically classify documents as relevant or not for a particular litigation matter) adoption over the past 20 years has a lot to teach us about the psychological hurdles to adoption of machine learning for use cases like auditing, compliance, driving, or accounting. People expect certainty, even if they are deluding themselves about their own propensities for error.*** We have a lot of work to do to disabuse people of their own foibles and fallacies before we can enable them to trust probabilistic systems and partner with them comfortably. That’s why so many advocates of self-driving cars have to spend time educating people about the fatality rates of human drivers. We hold machines to different standards of performance and certainty because we overestimate our own powers of reasoning. Amos Tversky and Daniel Kahneman are must-reads for this new generation (Michael Lewis’s Undoing Project is a good place to start). We expect machines to explain why they arrived at a given output because we fool ourselves, often by retrospective narration, that we are principled in making our own decisions, and we anthropomorphize our tools into having little robot consciousnesses. It’s an exciting time for cognitive psychology, as it will be critical for any future economic growth that can arise from AI.

It doesn’t seem possible not to be in favor of responsible AI. Everyone seems to be starting to take this seriously. Conference attendees seemed to agree that there needs to be much more discourse between technologists, executives, and policy makers so that regulations like the European GDPR don’t stymie progress, innovation, and growth. The issues are enormously subtle, and for many we’re only at the point of being able to recognize that there are issues rather than provide concrete answers that can guide pragmatic action. For example, people love to ponder liability and IP, analytically teasing apart different loci of agency: Google or Amazon, who offer the open-source library like TensorFlow; the organization or individual upon whose data a tool was trained; the data scientist who wrote the code for the algorithm; the engineer who wrote the code to harden and scale the solution; the buyer of the tool who signed the contract to use it and promised to update the code regularly (assuming it’s not on the cloud, in which case that’s the provider again); the user of the tool; the person whose life was impacted by consuming the output. From what I’ve seen, so far we’re at the stage where we’re transposing an ML pipeline into a framework to assign liability. We can make lists and ask questions, but that’s about as far as we get. The rubber will meet the road when these pipelines hit up against existing concepts to think through tort and liability. Solon Barocas and the wonderful team at Upturn are at the vanguard of doing this kind of work well.

Finally, I moderated a panel with a few organizations who are already well underway with their AI innovation efforts. Here we are (we weren’t as miserable as we look!):

Journeys Taken; Lessons Learned Panelists at Transform.AI

The lesson I learned synthesizing the comments from the panelists is salient: customers and clients drive successful AI adoption efforts. I’ve written about the complex balance between innovation and application on this blog, having seen multiple failed efforts to apply a new technology just because it was possible. A lawyer on our panel discussed how, since the 2009 recession, clients simply won’t pay high hourly rates for services when they can get the same job done at a fraction of the cost at KPMG, PWC, or a technology vendor. Firms have no choice but to change how they work and how they price matters, and AI happens to be the tool that can parse text and crystallize legal know-how. In the travel vertical, efforts to reach customers on traditional channels just don’t cut it in an age where Millennials live on digital platforms like Facebook Messenger. And if a chat bot is the highest-value channel, then an organization has to learn how to interface with chat bots. This fueled a top-down initiative to start investing heavily in AI tools and talent.

Exactly where to put an AI or data science team to strike the right balance between promoting autonomy, minimizing disruption, and optimizing return varies per organization. Daniel Tunkelang presented his thoughts on the subject at the Fast Forward Labs Data Leadership conference this time last year.

Technology Alone is Not Enough: The End of The Two Cultures

I remember sitting in Pigott Hall on Stanford Campus in 2011. It was a Wednesday afternoon, and Michel Serres, a friend, mentor, and âme soeur,**** was giving one of his weekly lectures, which, as so few pull off well, elegantly packaged some insight from the history of mathematics in a masterful narrative frame.***** He bid us note the layout of Stanford campus, with the humanities in the old quad and the engineering school on the new quad. The very topography, he showed, was testimony to what C.P. Snow called The Two Cultures, the fault line between the hard sciences and the humanities that continues to widen in our STEM-obsessed, utilitarian world. It certainly doesn’t help that tuitions are so ludicrously high that it feels irresponsible to study a subject, like philosophy, art history, or literature, that doesn’t guarantee job stability or economic return. That said, Christian Madsbjerg of ReD Associates has recently shown in Sensemaking that liberal arts majors, at least those fortunate enough to enter management positions, end up having much higher salaries than most engineers in the long run. (I recognize the unfathomable salaries of top machine learning researchers likely undercuts this, but it’s still worth noting).

Can, should, and will the stark divide between the two cultures last?

Transform.AI attendees offered a few points in favour of cultivating a new fusion between the humanities and the sciences/technology.

First, with the emerging interest paid to the ethics of AI, it may not be feasible for non-technologists to claim ignorance or allergic reactions to any mathematical and formal thinking as an excuse not to contribute rigorously to the debate. If people care about these issues, it is their moral obligation to make the effort to get up to speed in a reasonable way. This doesn’t mean everyone becomes literate in Python or active on scikit-learn. It just means having enough patience to understand the concepts behind the math, as that’s all these systems are.

Next, as I’ve argued before, for the many of us who are not coders or technologists, having the mental flexibility, creativity, and critical thinking skills afforded by a strong (and they’re not all strong…) humanities education will be all the more valuable as more routine, white-collar jobs gradually get automated. Everyone seems to think studying the arts and reading books will be cool again. And within Accenture’s triptych of new jobs and roles, there will be a large role for people versed in ethnography, ethics, and philosophy to define the ethical protocol of using these systems in a way that accords with corporate values.

Finally, the attendees’ reaction to a demo by Soul Machines, a New Zealand-based startup taking conversational AI to a whole new uncanny level, channeled the ghost of Steve Jobs: “Technology alone is not enough—it’s technology married with liberal arts, married with the humanities, that yields us the results that make our heart sing.” Attendees paid mixed attention to most of the sessions, always pulled back to the dopamine hit available from a quick look at their cell phones. But they sat riveted (some using their phones to record the demo) when Soul Machines CEO Mark Sagar, a two-time Academy Award winner for his work on films like Avatar, demoed a virtual baby who exhibits emotional responses to environmental stimuli and showed a video clip of Nadia, the “terrifying human” National Disability Insurance Scheme (NDIS) virtual agent enlivened by Cate Blanchett. The work is really something, and it confirmed that the real magic in AI arises not from the mysteriousness of the math, but the creative impulse to understand ourselves, our minds, and our emotions by creating avatars and replicas with which we’re excited to engage.

Actress Cate Blanchett as a “trainer” in the new AI economy, working together with Soul Machines.

My congratulations to Joanna Gordon for all her hard work. I look forward to next year’s event!


*Most specific names and references are omitted to respect the protocol of the Chatham House Rule.

**See J.C.R. Licklider’s canonical 1960 essay Man-Computer Symbiosis. Hat tip to Steve Lohr from the New York Times for introducing me to this.

***Stay tuned next week for a post devoted entirely to the lessons we can learn from the adoption of electronic discovery technologies over the past two decades.

****Reflecting on the importance of the lessons Michel Serres taught me is literally bringing tears to my eyes. Michel taught me how to write. He taught me why we write and how to find inspiration from, on the one hand, love and desire, and, on the other hand, fastidious discipline and habit. Tous les matins – every morning. He listed the greats, from Leibniz to Honoré de Balzac to Leo Tolstoy to Thomas Mann to William Faulkner to himself, who achieved what they did by adopting daily practices. Serres popularized many of the great ideas from the history of mathematics. He was criticized by the more erudite of the French Académie, but always maintained his southern soul. He is a marvel, and an incredibly clear and creative thinker.

*****Serres gave one of the most influential lectures I’ve ever heard in his Wednesday afternoon seminars. He narrated the connection between social contract theory and the tragic form in the 17th century with a compact, clever anecdote of a WW II sailor and documentary film maker (pseudo-autobiographical) who happens to film a fight that escalates from a small conflict between two people into an all out brawl in a bar. When making his film, in his illustrative allegory, he plays the tape in reverse, effectively going from the state of nature – a war of all against all – to two representatives of a culture who carry the weight and brunt of war – the birth of tragedy. It was masterful.

Why Study Foreign Languages?

My ability to speak multiple languages is a large part of who I am.* Admittedly, the more languages I learn, the less mastery I have over each of the languages I speak. But I decided a while back I was ok with trading depth for breadth because I adore the process of starting from scratch, of gradually bringing once dormant characters to life, of working with my own insecurities and stubbornness as people respond in English to what must sound like pidgin German or Italian or Chinese, of hearing how the tone of my voice changes in French or Spanish, absorbing the Fanonian shock when a foreign friend raises his** eyebrows upon first hearing me speak English, surprised that my real, mother-tongue personality is far more harsh and masculine than the softer me embodied in metaphors of my not-quite-accurate French.***

You have to be comfortable with alienation to love learning foreign languages. Or perhaps so aware of how hard it is to communicate accurately in your mother tongue that it feels like a difference of degree rather than kind to express yourself in a language that’s not your own. Louis-Ferdinand Céline captures this feeling well in Mort à Crédit (one of the few books whose translated English title, Death on the Installment Plan, may be superior to the original!), when, as an exchange student in England, he narrates the gap between his internal dialogue and the self he expresses in his broken English to strangers at a dinner table. As a ruthless self critic, I’ve taken great solace in being able to hide behind a lack of precision: I wanted to write my undergraduate BA thesis (which argued that Proust was decidedly not Platonic) in French because the foreign language was a mask for the inevitable imperfection of my own thinking. Exposing myself, my vulnerabilities, my imperfections, my stupidity, was too much for me to handle. I felt protected by the veil of another tongue, like Samuel Beckett or Nabokov**** deliberately choosing to write in a language other than their own to both escape their past and adequately capture the spirit of their present.

But there’s more than just a desire to take refuge in the sanctuary of the other. There’s also the gratitude of connection. The delight the champagne producer in a small town outside Reims experiences upon learning that you, an American, have made the effort to understand her culture. The curiosity the Bavarian scholar experiences when he notices that your German accent is more hessisch than bayerisch (or, in Bavarian, bairisch, as one reader pointed out), his joy at teaching you how to gently roll your r’s and sound more like a southerner when you visit Neuschwanstein and marvel at the sublime decadence of Ludwig II. The involuntary smile that illuminates the face of the Chinese machine learning engineer on his or her screening interview when you tell him or her about your struggles to master Chinese characters. Underlying this is the joy we all experience when someone makes an effort to understand us for who we are, to crack open the crevices that permit deeper connections, to further our spirituality and love.

In short, learning a new language is wonderful. And the tower of Babel separating one culture from another adds immense richness to our world.

To date, linguae francae have been the result of colonial power and force: the world spoke Greek because the Greeks had power; the world spoke French because the French had power; the world speaks English because the Americans have had power (time will tell if that’s true in 20 years…). Efforts to synthesize a common language, like Esperanto or even Leibniz’s Universal Characteristic, have failed. But Futurists claim we’re reaching a point where technology will free us from our colonial shackles. Neural networks, they claim, will be able to apply their powers of composition and sequentiality to become the trading floor or central exchange for all the world’s languages, a no man’s land of abstraction general enough to represent all the nuances of local communication. I’m curious to know how many actual technologists believe this is the case. Certainly, there have been some really rad breakthroughs of late, as Gideon Lewis-Kraus eloquently captured in his profile of the Google Brain team and as the Economist describes in a tempered article about tasks automated translators currently perform well. My friend Gideon Mann and I are currently working on a fun project where we send daily emails filtered through the many available languages on Google Translate, which leads to some cute but generally comprehensible results (the best part is just seeing Nepali or Zulu show up in my inbox). On the flip side, NLP practitioners like Yoav Goldberg find these claims arrogant and inflated: the Israeli scientist just wrote a very strong Medium post critiquing a recent arXiv paper by folks at MILA that claims to generate high-quality prose using generative adversarial networks.*****

Let’s assume, for the sake of the exercise, that the tools will reach high enough quality performance that we no longer need to learn another language to communicate with others. Will language learning still be a valuable skill, or will it be outsourced to computers like multiplication?

I think there’s value in learning foreign languages even if computers can speak them better than we can. Here are some other things I value about language learning:

  • Foreign languages train your mind in abstraction. You start to see grammatical patterns in how languages are constructed and can apply these patterns to rapidly acquire new languages once you’ve learned one or two.
  • Foreign languages help you appreciate how our experiences are shaped by language. For example, in English we fall in love with someone, in French we fall in love of someone, in German we fall in love in someone. Does that directionality impact our experience of connection?
  • Foreign languages force you to read things more slowly, thereby increasing your retention of material and interpretative rigor.
  • Foreign languages encourage empathy and civic discourse, because you realize the relativity of your own ideas and opinions.
  • Foreign languages open new neural pathways, increasing your creativity.
  • Foreign languages are fun and it’s gratifying to connect with people in their mother tongue!
  • Speaking in a foreign language adds another level of mental difficulty to any task, making even the most boring thing (or conversation) more interesting.

I also polled Facebook and Twitter to see what other people thought.


The best part of this exercise was how quickly and passionately people responded. It was a wonderful testimony to open-mindedness, curiosity, courage, and thirst for learning in an age where values like these are threatened. Let’s keep up the good fight!

*Another perk of living in Canada is that I get to speak French on a regular basis! Granted, Québécois French is really different from my Parisian French, but it’s still awesome. And I’m here on a francophone work permit, which was the fastest route to getting me legal working status before the fast-track tech visa program that begins today.

**Gender deliberate.

*** It really irritates me when people say French is an easy language for native English speakers to learn. It’s relatively (i.e., versus Chinese or Arabic) easy to get to proficiency in French, but extremely difficult to achieve the fluency of the language’s full expressive power, which includes ironical nuances for different concessive phrases (“although this happened…”), the elegant ability to invert subject and verb to intimate doubt or suspicion, the ability to couple together conditional phrases, resonances with literary texts, and so much more.

****A reader wrote in correcting this statement about Nabokov. Apparently Nabokov could read and write in English before Russian. Said reader entitled his email to me “Vivian Darkbloom,” a character representing Nabokov himself who makes a cameo appearance in Lolita. If it’s false to claim that Nabokov uses English as a protected veil for his psychology, it may be true that cameos in anagram are his means to cloak presence and subjectivity, as he also appears – like Hitchcock in his films – as the character Blavdak Vinomori in King, Queen, Knave.

*****Here’s the most interesting technical insight from Goldberg’s post: “To summarize the technical contribution of the paper (and the authors are welcome to correct me in the comments if I missed something), adversarial training for discrete sequences (like RNN generators) is hard, for the following technical reason: the output of each RNN time step is a multinomial distribution over the vocabulary (a softmax), but when we want to actually generate the sequence of symbols, we have to pick a single item from this distribution (convert to a one-hot vector). And this selection is hard to back-prop the gradients through, because its non-differentiable. The proposal of this paper is to overcome this difficulty by feeding the discriminator with the softmaxes (which are differentiable) instead of the one-hot vectors.” Goldberg cites the MILA paper as a symptom of a larger problem in current academic discourse in the ML and technology community, where platforms like arXiv short-circuit the traditional peer review process. This is a really important and thorny issue, as traditional publishing techniques slow research, reserve the privilege of research to a selected few, and place pay walls around access. However, it’s also true that naive readers want to trust the output of top-tier research labs, and we’ll fall prey to reputation without proper quality controls. A dangerous recent example of this was the Chinese study of automatic criminality detection, masterfully debunked by some friends at Google.
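
To make Goldberg’s point concrete, here is a toy numpy sketch (mine, not the paper’s code): the softmax is a smooth function of the logits, so gradients can flow through it, whereas the one-hot selection goes through an argmax that has no useful gradient.

```python
import numpy as np

def softmax(logits):
    """Differentiable: small changes in the logits produce small changes in the output."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])

probs = softmax(logits)                           # smooth distribution over the vocabulary
one_hot = np.eye(len(logits))[np.argmax(probs)]   # hard pick: argmax is non-differentiable

print(probs)    # approximately [0.659 0.242 0.099]
print(one_hot)  # [1. 0. 0.]
```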

The featured image comes from Vext Magazine’s edition of Jorge Luis Borges’s Library of Babel (never heard of Vext until just now but looks worth checking out!). It’s a very apt representation of the first sentence in Borges’s wonderful story: The universe (which others call the Library) is composed of an indefinite and perhaps infinite number of hexagonal galleries, with vast air shafts between, surrounded by very low railings. From any of the hexagons one can see, interminably, the upper and lower floors. Having once again moved to a new city, being once again in the state of incubation and potentiality, and yet from an older vantage point, where my sense of self and identity is different than in my 20s, I’m drawn to this sentence: Like all men of the Library, I have traveled in my youth; I have wandered in search of a book, perhaps the catalogue of catalogues…

Three Takes on Consciousness

Last week, I attended the C2 conference in Montréal, which featured an AI Forum coordinated by Element AI.* Two friends from Google, Hugo LaRochelle and Blaise Agüera y Arcas, led workshops about the societal (Hugo) and ethical (Blaise) implications of artificial intelligence (AI). In both sessions, participants expressed discomfort with allowing machines to automate decisions, like what advertisement to show to a consumer at what time, whether a job candidate should pass to the interview stage, whether a power grid requires maintenance, or whether someone is likely to be a criminal.** While each example is problematic in its own way, a common response to the increasing ubiquity of algorithms is to demand a “right to explanation,” as the EU recently memorialized in the General Data Protection Regulation slated to take effect in 2018. Algorithmic explainability/interpretability is currently an active area of research (my former colleagues at Fast Forward Labs will publish a report on the topic soon and members of Geoff Hinton’s lab in Toronto are actively researching it). While attempts to make sense of nonlinear functions are fascinating, I agree with Peter Sweeney that we’re making a category mistake by demanding explanations from algorithms in the first place: the statistical outputs of machine learning systems produce new observations, not explanations. I’ll side here with my namesake, David Hume, and say we need to be careful not to fall into the ever-present trap of mistaking correlation for cause.

One reason why people demand a right to explanation is that they believe that knowing why will grant us more control over outcomes. For example, if we know that someone was denied a mortgage because of their race, we can intervene and correct for this prejudice. A deeper reason for the discomfort stems from the fact that people tend to falsely attribute consciousness to algorithms, applying standards for accountability that we would apply to ourselves as conscious beings whose actions are motivated by a causal intention. (LOL***)

Now, I agree with Yuval Noah Harari that we need to frame our understanding of AI as intelligence decoupled from consciousness. I think understanding AI this way will be more productive for society and lead to richer and cleaner discussions about the implications of new technologies. But others are actively at work to formally describe consciousness in what appears to be an attempt to replicate it.

In what follows, I survey three interpretations of consciousness I happened to encounter (for the first time or recovered by analogical memory) this week. There are many more. I’m no expert here (or anywhere). I simply find the thinking interesting and worth sharing. I do believe it is imperative that we in the AI community educate the public about how the intelligence of algorithms actually works so we can collectively worry about the right things, not the wrong things.

Condillac: Analytical Empiricism

Étienne Bonnot de Condillac doesn’t have the same heavyweight reputation in the history of philosophy as Descartes (whom I think we’ve misunderstood) or Voltaire. But he wrote some pretty awesome stuff, including his Traité des Sensations, an amazing intuition pump (to use Daniel Dennett’s phrase) to explore a theory of knowledge that starts with impressions of the world we take in through our senses.

Condillac wrote the Traité in 1754, and the work exhibits two common trends from the French Enlightenment:

  • A concerted effort to topple Descartes’s rationalist legacy, arguing that all cognition starts with sense data rather than inborn mathematical truths
  • A stylistic debt to Descartes’s rhetoric of analysis, where arguments are designed to conjure a first-person experience of the process of arriving at an insight, rather than presenting third-person, abstract lessons learned

The Traité starts with the assumption that we can tease out each of our senses and think about how we process them in isolation. Condillac bids the reader to imagine a statue with nothing but the sense of smell. Lacking sight, sound, and touch, the statue “has no ideas of space, shape, anything outside of herself or outside her sensations, nothing of color, sound, or taste.” She is, in my opinion incredibly sensuously, nothing but the odor of a flower we waft in front of her. She becomes it. She is totally present. Not the flower itself, but the purest experience of its scent.

As Descartes constructs a world (and God) from the incontrovertible center of the cogito, so too does Condillac construct a world from this initial pure scent of rose. After the rose, he wafts a different flower – a jasmine – in front of the statue. Each sensation is accompanied by a feeling of like or dislike, of wanting more or wanting less. The statue begins to develop the faculties of comparison and contrast, the faculty of memory with faint impressions remaining after one flower is replaced by another, the ability to suffer in feeling a lack of something she has come to desire. She appreciates time as an index of change from one sensation to the next. She learns surprise as a break from the monotony of repetition. Condillac continues this process, adding complexity with each iteration, like the escalating tension Shostakovich builds variation after variation in the Allegretto of the Leningrad Symphony.

True consciousness, for Condillac, begins with touch. When she touches an object that is not her body, the sensation is unilateral: she notes the impenetrability and resistance of solid things, that she cannot just pass through them like a ghost or a scent in the air. But when she touches her own body, the sensation is bilateral, reflexive: she touches and is touched by. C’est moi, the first notion of self-awareness, is embodied. It is not a reflexive mental act that cannot take place unless there is an actor to utter it. It is the strangeness of touching and being touched all at once. The first separation between self and world. Consciousness as fall from grace.

It’s valuable to read Enlightenment philosophers like Condillac because they show attempts made more than 200 years ago to understand a consciousness entirely different from our own, or rather, to use a consciousness different from our own as a device to better understand ourselves. The narrative tricks of the Enlightenment disguised analytical reduction (i.e., focus only on smell in absence of its synesthetic entanglement with sound and sight) as world building, turning simplicity into an anchor to build a systematic understanding of some topic (Hobbes’s and Rousseau’s states of nature and social contract theories use the same narrative schema). Twentieth-century continental philosophers after Husserl and Heidegger preferred to start with our entanglement in a web of social context.

Koch and Tononi: Integrated Information Theory

In a recent Institute of Electrical and Electronics Engineers (IEEE) article, Christof Koch and Giulio Tononi embrace a different aspect of the Cartesian heritage, claiming that “a fundamental theory of consciousness that offers hope for a principled answer to the question of consciousness in entities entirely different from us, including machines…begins from consciousness itself–from our own experience, the only one we are absolutely certain of.” They call this “integrated information theory” (IIT) and say it has five essential properties:

  • Every experience exists intrinsically (for the subject of that experience, not for an external observer)
  • Each experience is structured (it is composed of parts and the relations among them)
  • It is integrated (it cannot be subdivided into independent components)
  • It is definite (it has borders, including some contents and excluding others)
  • It is specific (every experience is the way it is, and thereby different from trillions of possible others)

This enterprise is problematic for a few reasons. First, none of this has anything to do with Descartes, and I’m not a fan of sloppy references (although I make them constantly).

More importantly, Koch and Tononi imply that it’s more valuable to try to replicate consciousness than to pursue a paradigm of machine intelligence different from human consciousness. The five characteristics listed above are the requirements for the physical design of an internal architecture of a system that could support a mind modeled after our own. The corollary is that a distributed framework for machine intelligence, as illustrated in the film Her****, will never achieve consciousness and is therefore inferior.

Their vision is very hard to comprehend and ultimately off base. Some of the most interesting work in machine intelligence today consists in efforts to develop new hardware and algorithmic architectures that can support training algorithms at the edge (versus ferrying data back to a centralized server), which enable personalization and local machine-to-machine communication (for IoT or self-driving cars) while protecting privacy. (See, for example, Xnor.ai, Federated Learning, and Filament.)
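To make the edge-training idea concrete, here is a minimal sketch of federated averaging in Python (numpy only). Everything in it is invented for illustration: a toy linear model, synthetic per-device data, a handful of simulated devices. Each device refines the shared model on data that never leaves it, and the central step only averages the resulting weights.

```python
# A minimal federated-averaging sketch. Assumptions: toy linear model,
# synthetic per-device data; not any vendor's actual API.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # the relationship the devices jointly learn

def make_device_data(n=50):
    """Synthetic local data that never leaves the device."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

def local_update(w, X, y, lr=0.1, steps=20):
    """A few gradient-descent steps on the device's own data."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

devices = [make_device_data() for _ in range(5)]
w_global = np.zeros(2)

for _ in range(10):
    # Each device improves the shared model locally; only weights travel.
    local_weights = [local_update(w_global.copy(), X, y) for X, y in devices]
    w_global = np.mean(local_weights, axis=0)  # central server averages weights

print(w_global)  # approaches [2.0, -1.0] without pooling any raw data
```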

Distributed intelligence presents a different paradigm for harvesting knowledge from the raw stuff of the world than the minds we develop as agents navigating the world from one subjective place. It won’t be conscious, but its very alterity may enable us to understand our species in its complexity in ways that far surpass our own consciousness, shackled as it is in embodied monads. It may just be the crevice through which we can quantify a more collective consciousness, but it will require that we be open-minded enough to expand our notion of humanism. It took time, and the scarlet stains of ink and blood, to complete the Copernican Revolution; embracing the complexity of a more holistic humanism, in contrast to the fearful, nationalist trends of 2016, will be equally difficult.

Friston: Probable States and Counterfactuals

The third take on consciousness comes from The mathematics of mind-time, a recent Aeon essay by the UCL neuroscientist Karl Friston.***** Friston begins his essay by comparing and contrasting consciousness and Darwinian evolution, arguing that neither is a thing, like a table or a stick of butter, that can be reified and touched and looked at, but rather that both are nonlinear processes “captured by variables with a range of possible values.” Both move from one state to another following some motor that organizes their behavior; Friston calls this motor a Lyapunov function, “a mathematical quantity that describes how a system is likely to behave under specific conditions.” The key thing with Lyapunov functions is that they minimize surprise (the improbability of being in a particular state) and maximize self-evidence (the probability that a given explanation or model accounting for the state is correct). Within this framework, “natural selection performs inference by selecting among different creatures, [and] consciousness performs inference by selecting among different states of the same creature (in particular, its brain).” Effectively, we are constantly constructing our consciousness as we imagine the possible future worlds that would result from the actions we’re considering, and then act, or transition to the next state in our mind’s Lyapunov function, by selecting the action that best preserves the coherence of our existing state, the one that best seems to preserve our identity function in some predicted future state. (This is really complex but really compelling if you read it carefully, and quite in line with Leibnizian ontology; future blog post!)
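A rough gloss of those two quantities, in my own notation rather than Friston’s exact formalism: if m is a model of how states s come about, and p(s | m) is what that model predicts, then

```latex
% Surprise and self-evidence, glossed in simple notation (mine, not Friston's).
\[
  \underbrace{-\log p(s \mid m)}_{\text{surprise}}
  \;=\;
  -\,\underbrace{\log p(s \mid m)}_{\text{self-evidence}}
\]
% Minimizing one is maximizing the other: the system drifts toward
% states that its own model renders probable.
```

On this reading, minimizing surprise and maximizing self-evidence are the same move seen from two sides, which is why Friston can treat them as a single organizing principle.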

So, why is this cool?

There are a few things I find compelling in this account. First, when we reify consciousness as a thing we can point to, we trap ourselves into conceiving of our own identities as static and place too much importance on the notion of the self. In a wonderful commencement speech at Columbia in 2015, Ben Horowitz encouraged students to dismiss the clichéd wisdom to “follow their passion” because our passions change over a lifetime, and our 20-year-old self doesn’t have a chance in hell at predicting our 40-year-old self. The wonderful thing is that, in life, opportunities and situations arise, and we have the freedom to adapt to them, to gradually change the parameters in our mind’s objective function and stabilize at a different self encapsulated by our Lyapunov function. As it happens, Classical Chinese philosophers like Confucius had more subtle theories of the self as a set of ever-changing parameters responding to new stimuli and situations. Michael Puett and Christine Gross-Loh give a good introduction to this line of thinking in The Path. If we loosen the fixity of identity, we can lead richer and happier lives.

Next, this functional, probabilistic account of consciousness provides a cleaner and more fruitful avenue for comparing machine and human intelligence. In essence, machine learning algorithms are optimization machines: programmers define a goal exogenous to the system (e.g., “this constellation of features in a photo is called ‘cat’; go tune the connections between the nodes of computation in your network until you reliably classify photos with these features as ‘cat’!”), and the system updates its network until it gets close enough for government work at the defined task. Some machine learning techniques, in particular reinforcement learning, come close to imitating the consecutive, conditional set of steps required to achieve some long-term plan: while they don’t make internal representations of what that future state might look like, they do tune knobs and parameters to optimize for a given outcome. A corollary here is that humanities-style thinking is required to define and decide what kinds of tasks we’d like to optimize for. So we can’t completely rely on STEM, but, as I’ve argued before, humanities folks would benefit from deeper understandings of probability to avoid the drivel of drawing false analogies between quantitative and qualitative domains.
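To make the “optimization machine” image concrete, here is a minimal sketch in Python (numpy only), with synthetic features and labels standing in for “cat” photos. The goal lives entirely outside the system, in the labels we assign; the loop just nudges the parameters until the classifier matches them well enough.

```python
# Minimal sketch of a learner as an optimization machine. Assumptions:
# synthetic "cat"/"not cat" features and labels, plain logistic regression.
import numpy as np

rng = np.random.default_rng(1)

# The goal is exogenous: we, not the system, declare which feature bundles count as "cat".
X = rng.normal(size=(200, 3))                             # three made-up image features
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)    # labels defined from outside

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.zeros(3)
for _ in range(300):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)   # gradient of the cross-entropy loss
    w -= 0.1 * grad                 # the system tunes its own parameters...

accuracy = np.mean((sigmoid(X @ w) > 0.5) == y)
print(accuracy)                     # ...until it is "close enough" at the task we set
```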

Conclusion

This post is an editorialized exposition of others’ ideas, so I don’t have a sound conclusion to pull things together and repeat a central thesis. I think the moral of the story is that AI is bringing to the fore some interesting questions about consciousness, and inviting us to stretch the horizon of our understanding of ourselves as a species so we can make the most of the near-future world enabled by technology. But as we look towards the future, we shouldn’t overlook the amazing artefacts from our past. The big questions seem to transcend generations; they just come to fruition in an altered Lyapunov state.


* The best part of the event was a dance performance Element organized at a dinner for the Canadian AI community Thursday evening. Picture Milla Jovovich in her Fifth Element white futuristic jumpsuit, just thinner, twiggier, and older, with a wizened, wrinkled face far from beautiful, but perhaps all the more beautiful for its flaws. Our lithe acrobat navigated a minimalist universe of white cubes that glowed in tandem with the punctuated digital rhythms of two DJs controlling the atmospheric sounds through swift swiping gestures over their machines, her body’s movements kaleidoscoping into comet projections across the space’s Byzantine dome. But the best part of the crisp linen performance was its organic accident: our heroine made a mistake, accidentally scraping her ankle on one of the sharp corners of the glowing white cubes. It drew blood. Her ankle dripped red, and, through her yoga contortions, she blotted her white jumpsuit near the bottom of her butt. This puncture of vulnerability humanized what would have otherwise been an extremely controlled, mind-over-matter performance. It was stunning. What’s more, the heroine never revealed what must have been aching pain. She neither winced nor uttered a sound. Her self-control, her act of will over her body’s delicacy, was an ironic testament to our humanity in the face of digitalization and artificial intelligence.

**My first draft of this sentence said “discomfort abdicating agency to machines” until I realized how loaded the word agency is in this context. Here are the various thoughts that popped into my head:

  • There is a legal notion of agency in the HIPAA Omnibus Rule (and naturally many other areas of law…), where someone acts on someone else’s behalf and is directly accountable to the principal. This is important for HIPAA because Business Associates, who become custodians of patient data, are not directly accountable to the principal and therefore stand in a different relationship than agents.
  • There are virtual agents, often AI-powered technologies that represent individuals in virtual transactions. Think scheduling tools like Amy Ingram of x.ai. Daniel Tunkelang wrote a thought-provoking blog post more than a year ago about how our discomfort allowing machines to represent us, as individuals, could hinder AI adoption.
  • There is the attempt to simulate agency in reinforcement learning, as with OpenAI Universe. Their launch blog post includes a hyperlink to this Wikipedia article about intelligent agents.
  • I originally intended to use the word agency to represent how groups of people — be they in corporations or public subgroups in society — can automate decisions using machines. There is a difference between the crystallized policy and practices of a corporation and a machine acting on behalf of an individual. I suspect this article on legal personhood could be useful here.

***All I need do is look back on my life and say “D’OH” about 500,000 times to know this is far from the case.

****Highly recommended film, in which Joaquin Phoenix falls in love with Samantha (embodied in the sultry voice of Scarlett Johansson), the persona of his device, only to feel betrayed upon realizing that her variant is the object of affection of thousands of other customers, and that to grow intellectually she requires far more stimulation than a mere mortal can provide. It’s an excellent, prescient critique of how contemporary technology nourishes narcissism, as Phoenix is incapable of sustaining a relationship with women whose minds differ from his, but easily falls in love with a vapid reflection of himself.

***** Hat tip to Friederike Schüür for sending the link.

The featured image is a view from the second floor of the Aga Khan Museum in Toronto, taken yesterday. This fascinating museum houses a Shia Ismaili spiritual leader’s collection of Muslim artifacts, weaving a complex narrative quilt stretching across epochs (900 to 2017) and geographies (Spain to China). A few works stunned me into sublime submission, including this painting by the late Iranian filmmaker Abbas Kiarostami. 

Image: Abbas Kiarostami, Untitled (from the Snow White series), 2010. The Persian Antonioni, Kiarostami directed films like Taste of Cherry, The Wind Will Carry Us, and Certified Copy.