Artificial Intelligence and the Fall of Eve

We seem to need foundational narratives.

Big picture stories that make sense of history’s bacchanalian march into the apocalypse.

Broad-stroke predictions about how artificial intelligence (AI) will shape the future of humanity made by those with power arising from knowledge, money, and/or social capital.[1] Knowledge, as there still aren’t actually that many real-deal machine learning researchers in the world (despite the startling growth in paper submissions to conferences like NIPS), people who get excited by linear algebra in high-dimensional spaces (the backbone of deep learning) or the patient cataloguing of assumptions required to justify a jump from observation to inference.[2] Money, as income inequality is a very real thing (and a thing too complex to say anything meaningful about in this post). For our purposes, money is a rhetoric amplifier, be that from a naive fetishism of meritocracy, where we mistakenly align wealth with the ability to figure things out better than the rest of us,[3] or cynical acceptance of the fact that rich people work in private organizations or public institutions with a scope that impacts a lot of people. Social capital, as our contemporary Delphic oracles spread wisdom through social networks, likes and retweets governing what we see and influencing how we see (if many people, in particular those we want to think like and be like, like something, we’ll want to like it too), our critical faculties on amphetamines as thoughtful consideration and deliberation mean missing the boat, gut invective the only response fast enough to keep pace before the opportunity to get a few more followers passes us by, Delphi sprouting boredom like a 5 o’clock shadow, already on to the next big thing. Ironic that big picture narratives must be made so hastily in the rat race to win mindshare before another member of the Trump administration gets fired.

Most foundational narratives about the future of AI rest upon an implicit hierarchy of being that has been around for a long time. While proffered by futurists and atheists, the hierarchy dates back to the Great Chain of Being that medieval Christian theologians like Thomas Aquinas built to cut the physical and spiritual world into analytical pieces, applying Aristotelian scientific rigor to spiritual topics.

Aquinas’ hierarchy of being, from a blog by a fellow named David Haines, whom I know nothing about but whose blog seems to be about philosophy and religion.

The hierarchy provides a scale from inanimate matter to immaterial, pure intelligence. Rocks don’t get much love on the great chain of being, even if they carry the wisdom and resilience of millions of years of existence, contain, in their sifting, shifting grains of sand, the secrets of fragility and the whispered traces of tectonic plates and sunken shores. Plants get a little more love than rocks, and apparently Venus fly traps (plants that resemble animals?) get more love than, say, yeast (if you’re a fellow member of the microbiome-issue club, you like me are in total awe of how yeast are opportunistic sons of bitches who sense the slightest shift in pH and invade vulnerable tissue with the collective force of stealth guerrilla warriors). Humans are hybrids, half animal, half rational spirit, our sordid materiality, our silly mortality, our mechanical bodies ever weighing us down and holding us back from our real potential as brains in vats or consciousnesses encoded to live forever in the flitting electrons of the digital universe. There are a shit ton of angels. Way more angel castes than people castes. It feels repugnant to demarcate people into classes, so why not project differences we live day in and day out in social interactions onto angels instead? And, in doing so, basically situate civilized aristocrats as closer to God than the lower and more animalistic members of the human race? And then God is the abstract patriarch on top of it all, the omnipotent, omniscient, benevolent patriarch who is also the seat of all our logical paradoxes, made of the same stuff as Gödel’s incompleteness theorem, the guy who can be at once father and son, be the circle with the center everywhere and the circumference nowhere, the master narrator who says, don’t worry, I got this, sure that hurricane killed tons of people, sure it seems strange that you can just walk into a store around the corner and buy a gun and there are mass shootings all the time, but trust me, if you could see the big picture like I see the big picture, you’d get how this confusing pain will actually result in the greatest good to the most people.

Sandstone in southern Utah, the momentary, coincidental dance of wind and grain petrified into this shape at this moment in time. I’m sure it’s already somewhat different.

I’m going to be sloppy here and not provide hyperlinks to specific podcasts or articles that endorse variations of this hierarchy of being: hopefully you’ve read a lot of these and will have sparks of recognition with my broad stroke picture painting.[4] But what I see time and again are narratives that depict AI within a long history of evolution moving from unicellular prokaryotes to eukaryotes to slime to plants to animals to chimps to homo erectus to homo sapiens to transhuman superintelligence as our technology changes ever more quickly and we have a parallel data world where we leave traces of every activity in sensors and clicks and words and recordings and images and all the things. These big picture narratives focus on the pre-frontal cortex as the crowning achievement of evolution, man distinguished from everything else by his ability to reason, to plan, to overcome the rugged tug of instinct and delay gratification until the future, to make guesses about the probability that something might come to pass in the future and to act in alignment with those guesses to optimize rewards, often rewards focused on self gain and sometimes on good across a community (with variations). And the big thing in this moment of evolution with AI is that things are folding in on themselves, we no longer need to explicitly program tools to do things, we just store all of human history and knowledge on the internet and allow optimization machines to optimize, reconfiguring data into information and insight and action and getting feedback on these actions from the world according to the parameters and structure of some defined task. And some people (e.g., Gary Marcus or Judea Pearl) say no, no, these bottom-up stats are not enough, we are forgetting what is actually the real hallmark of our pre-frontal cortex, our ability to infer causal relationships between phenomenon A and phenomenon B, and it is through this appreciation of explanation and cause that we can intervene and shape the world to our ends or even fix injustices, free ourselves from the messy social structures of the past and open up the ability to exercise normative agency together in the future (I’m actually in favor of this kind of thinking). So we evolve, evolve, make our evolution faster with our technology, cut our genes crisply and engineer ourselves to be smarter. And we transcend the limitations of bodies trapped in time, transcend death, become angel as our consciousness is stored in the quick complexity of hardware finally able to capture plastic parallel processes like brains. And inch one step further towards godliness, ascending the hierarchy of being. Freeing ourselves. Expanding. Conquering the march of history, conquering death with blood transfusions from beautiful boys, like vampires. Optimizing every single action to control our future fate, living our lives with the elegance of machines.

It’s an old story.

Many science fiction novels feel as epic as Disney movies because they adapt the narrative scaffold of traditional epics dating back to Homer’s Iliad and Odyssey and Virgil’s Aeneid. And one epic quite relevant for this type of big picture narrative about AI is John Milton’s Paradise Lost, the epic to end all epics, the swan song that signaled the shift to the novel, the fusion of Genesis and Rome, an encyclopedia of seventeenth-century scientific thought and political critique as the British monarchy collapsed under  the rushing sword of Oliver Cromwell.

Most relevant is how Milton depicts the fall of Eve.

Milton lays the groundwork for Eve’s fall in Book Five, when the archangel Raphael visits his friend Adam to tell him about the structure of the universe. Raphael has read his Aquinas: like proponents of superintelligence, he endorses the great chain of being. Here’s his response to Adam when the “Patriarch of mankind” offers the angel mere human food:

Adam, one Almightie is, from whom
All things proceed, and up to him return,
If not deprav’d from good, created all
Such to perfection, one first matter all,
Indu’d with various forms various degrees
Of substance, and in things that live, of life;
But more refin’d, more spiritous, and pure,
As neerer to him plac’t or neerer tending
Each in thir several active Sphears assignd,
Till body up to spirit work, in bounds
Proportiond to each kind.  So from the root
Springs lighter the green stalk, from thence the leaves
More aerie, last the bright consummate floure
Spirits odorous breathes: flours and thir fruit
Mans nourishment, by gradual scale sublim’d
To vital Spirits aspire, to animal,
To intellectual, give both life and sense,
Fansie and understanding, whence the Soule
Reason receives, and reason is her being,
Discursive, or Intuitive; discourse
Is oftest yours, the latter most is ours,
Differing but in degree, of kind the same.

Raphael basically charts the great chain of being in the passage. Angels think faster than people, they reason in intuitions while we have to break things down analytically to have any hope of communicating with one another and collaborating. Daniel Kahneman’s partition between discursive and intuitive thought in Thinking, Fast and Slow had an analogue in the seventeenth century, where philosophers distinguished the slow, composite, discursive knowledge available in geometry and math proofs from the fast, intuitive, social insights that enabled some to size up a room and be the wittiest guest at a cocktail party.

Raphael explains to Adam that, through patient, diligent reasoning and exploration, he and Eve will come to be more like angels, gradually scaling the hierarchy of being to ennoble themselves. But on the condition that they follow the one commandment never to eat the fruit from the forbidden tree, a rule that escapes reason, that is a dictum intended to remain unexplained, a test of obedience.

But Eve is more curious than that, and Satan uses her curiosity to his advantage. In Book Nine, Milton fashions Satan, in his trappings as a snake, as a master orator who preys upon Eve’s curiosity to persuade her to eat of the forbidden fruit. After failing to exploit her vanity, he changes strategies and exploits her desire for knowledge, basing his argument on an analogy up the great chain of being:

O Sacred, Wise, and Wisdom-giving Plant,
Mother of Science, Now I feel thy Power
Within me cleere, not onely to discerne
Things in thir Causes, but to trace the wayes
Of highest Agents, deemd however wise.
Queen of this Universe, doe not believe
Those rigid threats of Death; ye shall not Die:
How should ye? by the Fruit? it gives you Life
To Knowledge? By the Threatner, look on mee,
Mee who have touch’d and tasted, yet both live,
And life more perfet have attaind then Fate
Meant mee, by ventring higher then my Lot.
That ye should be as Gods, since I as Man,
Internal Man, is but proportion meet,
I of brute human, yee of human Gods.
So ye shall die perhaps, by putting off
Human, to put on Gods, death to be wisht,
Though threat’nd, which no worse then this can bring.

 

Satan exploits Eve’s mental model of the great chain of being to tempt her to eat the forbidden apple. Mere animals, snakes can’t talk. A talking snake, therefore, must have done something to cheat the great chain of being, to elevate itself to the status of man. So too, argues Satan, can Eve shortcut her growth from man to angel by eating the forbidden fruit. The fall of mankind rests upon our propensity to rely on analogy. May the defenders of causal inference rejoice.[5]

The point is that we’ve had a complex relationship with our own rationality for a long time. That Judeo-Christian thought has a particular way of personifying the artifacts and precipitates of abstract thoughts into moral systems. That, since the scientific revolution, science and religion have split from one another but continue to cross paths, if only because they both rest, as Carlo Rovelli so beautifully expounds in his lyrical prose, on our wonder, on our drive to go beyond the immediately visible, on our desire to understand the world, on our need for connection, community, and love.

But do we want to limit our imaginations to such a stale hierarchy of being? Why not be bolder and more futuristic? Why not forget gods and angels and, instead, recognize these abstract precipitates as the byproducts of cognition? Why not open our imaginations to appreciate the radically different intelligence of plants and rocks, the mysterious capabilities of photosynthesis that can make living matter from sunlight, air, and water (WTF?!?), the communication that occurs in the deep roots of trees, the eyesight that octopuses have all down their arms, the silent and chameleon wisdom of the slit canyons in the southwest? Why not challenge ourselves to greater empathy, to the unique beauty available to beings who die, capsized by senescence and always inclining forward in time?

My mom got me herbs for my birthday. They were little tiny things, and now they look like this! Some of my favorite people working on artistic applications of AI consider tuning hyperparameters to be an act akin to pruning plants in a garden. An act of care and love.

Why not free ourselves of the need for big picture narratives and celebrate the fact that the future is far more complex than we’ll ever be able to predict?

How can we do this morally? How can we abandon ourselves to what will come and retain responsibility? What might we build if we mimic animal superintelligence instead of getting stuck in history’s linear march of progress?

I believe there would be beauty. And wild inspiration.


[1] This note should have been after the first sentence, but I wanted to preserve the rhetorical force of the bare sentences. My friend Stephanie Schmidt, a professor at SUNY Buffalo, uses the concept of foundational narratives extensively in her work about colonialism. She focuses on how cultures subjugated to colonial power assimilate and subvert the narratives imposed upon them.

[2] Yesterday I had the pleasure of hearing a talk by the always-inspiring Martin Snelgrove about how to design hardware to reduce energy when using trained algorithms to execute predictions in production machine learning. The basic operations undergirding machine learning are addition and multiplication: we’d assume multiplying takes more energy than adding, because multiplying is adding in sequence. But Martin showed how it all boils down to how far electrons need to travel. The broad-stroke narrative behind why GPUs are better for deep learning is that they shuffle electrons around criss-cross structures that look like matrices as opposed to putting them into the linear straitjacket of the CPU. But the geometry can get more fine-grained and complex, as the 256×256 array in Google’s TPU shows. I’m keen to dig into the most elegant geometry for Bayesian inference and sampling from posterior distributions.

[3] Technology culture loves to fetishize failure. Jeremy Epstein helped me realize that failure is only fun if it’s the midpoint of a narrative that leads to a turn of events ending with triumphant success. This is complex. I believe in growth mindsets like Ray Dalio proposes in his Principles: there is real, transformative power in shifting how our minds interpret the discomfort that accompanies learning or stretching oneself to do something not yet mastered. I jump with joy at the opportunity to transform the paralyzing energy of anxiety into the empowering energy of growth, and believe it’s critical that more women adopt this mindset so they don’t hold themselves back from positions they don’t believe they are qualified for. Also, it makes total sense that we learn much, much more from failures than we do from successes, in science, where it’s important to falsify, as in any endeavor where we have motivation to change something and grow. I guess what’s important here is that we don’t reduce our empathy for the very real pain of being in the midst of failure, of feeling like one doesn’t have what others have, of being outside the comfort of the bell curve, of the time it takes to outgrow the inheritance and pressure from the last generation and the celebrations of success. Worth exploring.

[4] One is from Tim Urban, as in this Google Talk about superintelligence. I really, really like Urban’s blog. His recent post about choosing a career is remarkably good and his TED talk on procrastination is one of my favorite things on the internet. But his big picture narrative about AI irks me.

[5] Milton actually wrote a book about logic and was even a logic tutor. It’s at once incredibly boring and incredibly interesting stuff.

The featured image is the 1808 Butts Set version of William Blake’s “Satan Watching the Endearments of Adam and Eve.” Blake illustrated many of Milton’s works and illustrated Paradise Lost three times, commissioned by three different patrons. The color scheme is slightly different between the Thomas, Butts, and Linnell illustration sets. I prefer the Butts. I love this image. In it, I see Adam relegated to a supporting actor, a prop like a lamp sitting stage left to illuminate the real action between Satan and Eve. I feel empathy for Satan, want to ease his loneliness and forgive him for his unbridled ambition, as he hurls himself tragically into the figure of the serpent to seduce Eve. I identify with Eve, identify with her desire for more, see through her eyes as they look beyond the immediacy of the sexual act and search for transcendence, the temptation that ultimately leads to her fall. The pain we all go through as we wise into acceptance, and learn how to love.

Blake’s image reminds me of this masterful kissing scene in Antonioni’s L’Avventura (1960). The scene focuses on Monica Vitti, rendering Gabriele Ferzetti an instrument for her pleasure and her interior movement between resistance and indulgence. Antonioni takes the ossified paradigm of the male gaze and pushes it, exposing how culture can suffocate instinct as we observe Vitti abandon herself momentarily to her desire.

Of Basilicas and Sandcastles

Sandstone blushed pink, washed gold drips spires like kids plodding sand clods, layer upon layer tapering into antique vases and Victorian crowns, cobweb queens crooning nocturnal arias of desert winds in desert pines, ghosts within Native American ghosts, burnt sage bushes caramelizing wellbeing and peace as they caress their canyons, their friends, leaving them be, pruning caves where they might dwell and carve and paint and eat, ghosts long silenced by Manifest Destiny blasting His metallic, electric, self-driving cries to Mars, for yes, the future is already here, just not evenly distributed, moguls gluttonously rich as our anorexic middle class, addicted to their heroine gaze in selfie sticks and Facebook Likes, vanishes from photos like Trotsky, a Mexican ice pick nailing his moustache to the cross, forsaken by his father as parched tears transubstantiate into blood droplets fixed in sacred time together with the ocean hoodoos, voodoo rocks moving, flowing, crooning their ocean hymns with the wind queens till ice cracks the foundations and the avalanche falls.

The precious colour schemes of Bryce Canyon, up close.

But, somehow, these same rosy blushes and gold lashes appear in Barcelona, on the recently restored façade of Gaudí’s unfinished Sagrada Familia.

I behold the Barcelonan stone blushed pink, washed gold and time somersaults in my lonely chest. Bryce Canyon and the Sagrada Familia stand, silent, 5,586 miles apart. They have co-existed for 136 years. Neither is complete; neither ever will be. The truth is that their pink gold stones likely aren’t very similar to the scientific eye. But to me they are. Similar enough to tear apart the fabric of time as a lover tears silk to expose milk skin, harem beauty, breasts blanched only by moon rays. So similar that tears pierce their possibility. I don’t hold them back even if others are watching. The others are busy being with loved ones anyway. No one watches. No one except, every so often, the perfect meeting eye to eye, not the groping kind, the seeking kind, the kind astonished to have encountered a self, a soul, curious to see deep inside for an instant, to cradle the shock of what must be beauty, observant enough to recognize unique meeting unique before the footsteps go too far and a we vanishes, stillborn.

I stand alone in Barcelona, walking streets watching others walk streets with others. My very loneliness grants me passage back to the silent sandcastles of Bryce Canyon. Inside this crevice of similarity, I recognize two separate constructions that have come to be one through the patient ravishing of wear and time. In Barcelona, by the grime of the city, exhaust from bankers’ lungs twitching stock exchange profit, orange precipitating scallops doused in chorizo oil; in Bryce, by the violence of the desert, ice tearing limestone hymen with glacier patience, tourist footsteps gently tweezing out the old ocean soul in camera flashes and plastic baggies. The difference is that in Barcelona this pristine blush of pink and gold only juts out when juxtaposed to the tarnished, uncleaned façade, whereas in Bryce it cannot but swallow everyone in its magnificence. It’s likely I wouldn’t have perceived the shocking similarity had I visited Barcelona a few months later, presuming the restoration work will have advanced to no longer leave the striking difference between clean and dirty façades. And I wouldn’t have been primed to see the similarity had I not visited Bryce Canyon just a few weeks before. I, then—and by I, I mean the set of experiences collected into this unifier we call memory and consciousness, where analogies forge similarity in blacksmith strokes—am the condition of the similarity. It took me moving between continents to notice this unique and beautiful elision. And, it’s likely that it took me being alone to feel it deeply enough to make it matter. Had I wandered the world with a companion, I probably would have noticed the similarities, but they probably wouldn’t have penetrated deeply into the place where the beauty breathes so pure it hurts. Hurts because it carries with it the basic fact of my existence, inviting me to have a seat. To feast upon my life.

It’s the pink and orange hues of the left façade that caught my eye.

Why yes, the hues of pink and gold in the muted limestone of Bryce Canyon and Barcelona are so beautiful because the perception of their similarity is the trace of my existence. The heightening of what is to what is meaningful. It’s a nostalgic and slightly mourning meaning, as walking the streets of Barcelona I think about García Lorca’s Yerma*, a play about a woman who never bears a child. I often face moments of sadness at not being married, not having children, not being cushioned by normativity’s blessings. But my jealousy and covetousness for others’ lives have eased over time. This is evident in how my relationship with my mother has changed. I’ve done the emotional prep work of still being without child at 40 or 45, empathizing with a future self in a future state and thereby also growing more compassionate to others, today. I’ve experienced many places and opened my heart to many people. It’s an existence worth a second act.

This took place yesterday morning. Saturday. Friday afternoon I recovered a different past. It’s likely Friday’s experience primed my mind and my emotions to notice Saturday’s sandstone similarities.

For on Friday I walked into the Picasso museum in Barcelona’s Gothic quarter. The air was damp but the rain held off, at least then (later on I waited out the raindrops with strangers under a group of trees near the waterfront, watching a mother spoon yogurt to some little mouth covered by stroller canvas; the little one seemed to eat well, the yogurt went fast). I’d wandered to the Cathedral, saw the foreboding chiaroscuro of the heraldic escutcheons, black and shadowed and tall into cracked gothic arches. I wandered through narrow streets woven with balconies, some square, others round like Gaudí hobbit holes.

The Picasso museum is housed in a medieval cloister. The entrance is asymmetric, with matte greens and greys and a staircase up the right-hand side. Standing in line for tickets, I encountered my sixteen-year-old self. She was waiting for me; she had never left; she lived in the matte green of the entrance hall. I relived the mild disgust noticing our Spanish teacher’s fanny pack hang like a limp holster under the taut piqué cotton of his mint green polo shirt, I saw the moles on his hairy legs and the forced kindness in his smile. He stood there waiting for all 17 or 18 or 20 of us to gather in the museum. Watching him, sixteen, I relived my projection of myself in the eyes of the boys on the trip, they were juniors, I was a sophomore, I had a crush on Lyle, my experience of Gaudí’s balconies and Picasso’s cubism and Velázquez’s portraits and Franco’s phallic monument and the Roman aqueducts in Segovia was filtered through this prism of insecurity and adolescent desire, my personality still so much in flux, my introversion still so marked. I brought my violin to Spain and played every day. I brought a suitcase that was much too large, as I had yet to pride myself on my practicality, how easily I could move about the world. At the time, I was absorbed by the pulse of my feelings, by the inklings of the self I wanted to project. I was so governed by how I thought others perceived me; still am, but more so then. Painfully so then, my superego cruel and chastising. I jumped forward a few years, into my mid-twenties, where I regretted my stupid crushes and insecurity and self-absorption, as I didn’t have strong memories of the objects and monuments and art I was supposed to have learned about. I was so focused on Lyle, so focused on how I projected Lyle saw me, that I missed out on Gaudí, Barcelona, Spain.

The entrance to the Picasso museum, where I remembered things past.

But now, years later, I love the distortions. I love how walking into the Picasso museum on a damp Friday afternoon, I recover not just the memory of the place, but the feelings and vulnerability and sensations of the past observer. I love how I’m still there in my sensitivity, still there shaping observations based on who I am and what I’ve lived and where I project I might find future happiness. And that that snapshot of a self in development is still available to inhabit, to re-inhabit once more. That we can become again. That just as the pink and gold hues collapse space into the pulse of a single mind, so too do the matte greens collapse time, the identity of place revealing a self growing in time. Eighteen years of experience elapsed under a staircase.

I walked upstairs and, migrating at my own pace from room to room, understood how, like David Bowie, Picasso didn’t have one style, but iterated upon a given style in a given creative period until it was exhausted, then moved on. Impressionism cedes to the blue period cedes to the Russian ballets cedes to cubism cedes to the bombastic primitivism cedes to recreating Las Meninas cedes to ad infinitum adoration of his wife cedes to the black and white still lifes of old age. It was his recreating Las Meninas that caught my attention. Picasso takes this work, this exemplar of Spanish Golden Age style, where Velázquez enacts the Christ-like elision of creator and spectator, the Baroque practice of inverting the artwork—as representation of reality—to fuse the moment of creation with the moment of observation, perfected through the gaze of the painter himself as of the man escaping from the backlit door, well, Picasso takes this work and exploits the conceit of the artist and, with algorithmic insistence, repaints and repaints and recreates, distilling the essence of the form in the variations of style and look and feel. And the variations themselves eclipse his own journey as a painter, never tied to one style, always free to pivot and redefine himself anew. Picasso’s Meninas telescoping my own experience recovering my younger self, the privilege of my loneliness opening me raw and whole to meet him there, to imagine I might be with him and the pigeons while he painted. Another meeting eye to eye, the seeking kind, inside the artist, back to Velázquez. Complete in a way that can only be described as human.

My favourite rendition of Picasso’s Las Meninas, a series from 1957.

*I picked up a book in the airport in Barcelona, nearly finished with Galloway’s diatribe against the Four. Niebla en Tánger, by Cristina López Barrio. Funny I’d just mentioned Yerma as I wrote this post on the plane, for I came across this sentence from a similarly childless protagonist: “Está vacío, como el de Yerma, piensa, hueco por esperar vida del hombre equivocado.” (“It is empty, like Yerma’s, she thinks, hollow from waiting for life from the wrong man.”)

The featured image dates from my recent trip to Bryce Canyon. It’s like a big field of deep dream art, dripping in its delicate phantasy. It was my favourite of the Utah canyons. 

 

Privacy Beyond the Individual

This week’s coverage of the Facebook and Cambridge Analytica debacle[1] (latest Guardian article) has brought privacy top of mind and raised multiple complex questions[2]:

Is informed consent nothing but a legal fiction in today’s age of data and machine learning, where “ongoing and extensive data collection can be neither fully informed nor truly consensual — especially since it is practically irrevocable?”

Is tacit consent–our silently agreeing to the fine print of privacy policies as we continue to use services–something we prefer to grant given the nuisance, time, and effort required to understand the nuances of data use? Is consent as a mechanism too reliant upon the supposition that privacy is an individual right and, therefore, available for an individual to exchange–in varying degrees–for the benefits and value from some service provider (e.g., Facebook likes satisfying our need to be both loved and lovely)? If consent is defunct, what legal structure should replace it?

Scottish Enlightenment philosopher Adam Smith wrote about our need to love and be lovely in the Theory of Moral Sentiments. I’m dying to dig back into Smith because I suspect his work can help show that even personalization and online consumerism independent of any political context dull our capacities to be active, rational participants in democracy. Indeed, listening to Russ Roberts’s EconTalk podcast, I’ve learned that Smith argued that commerce, i.e., in-person transactions with strangers to exchange value, provides an opportunity to practice regulating our emotions, as we can’t devolve into temper tantrums and emotional outbursts like we do with our families and spouses (the inimical paradoxes of intimacy…) if we want to get business done. Roberts wrote a poem extolling the wonders of libertarian emergence called It’s a Wonderful Loaf.

How should we update outdated notions of what qualifies as personally identifiable information (PII), which already vary across different countries and cultures, to account for the fact that contemporary data processing techniques can infer aspects of our personal identity from our online (and, increasingly, offline) behavior that feel more invasive and private than our name and address? Can more harm be done to an individual using her social security/insurance number than using her psychographic traits? In which contexts?

Would regulatory efforts to force large companies like Facebook to “lock down” data they have about users actually make things worse, solidifying their incumbent position in the market (as Ben Thompson and Mike Masnick argue)?

Is the best solution, as Cory Doctorow at the Electronic Frontier Foundation argues, to shift from having users (tacitly) consent to data use, based on trust and governed by the indirect forces of market demand (people will stop using your product if they stop trusting you) and moral norms, to building privacy settings into the fabric of the product, enabling users to engage more thoughtfully with tools?[3]

Many more qualified than I are working to inform clear opinions on what matters to help entrepreneurs, technologists, policymakers, and plain-old people[4] respond. As I grapple with this, I thought I’d share a brief and incomplete history of the thinking and concepts undergirding privacy. I’ll largely focus on the United States because it would be a book’s worth of material to write even brief histories of privacy in other cultures and contexts. I pick the United States not because I find it the most important or interesting, but because it happens to be what I know best. My inspiration to wax historical stems from a keynote I gave Friday about the history of artificial intelligence (AI)[5] for AI + Public Policy: Understanding the shift, hosted by the Brookfield Institute in Toronto.

As is the wont of this blog, the following ideas are far from exhaustive and polished. I offer them for your consideration and feedback.


The Fourth Amendment: Knock-and-Announce

As my friend Lisa Sotto eloquently described in a 2015 lecture at the University of Pennsylvania, the United States (U.S.) considers privacy as a consumer right, parsed across different business sectors, and the European Union (EU) considers privacy as a human right, with a broader and more holistic concept of what kinds of information qualify as sensitive. Indeed, one look at the different definitions of sensitive personal data in the U.S. and France in the DLA Piper Data Protection Laws of the World Handbook shows that the categories and taxonomies are operating at different levels. In the U.S., sensitive data is parsed by data type; in France, sensitive data is parsed by data feature:

Screenshot from the DLA Piper Data Protection Handbook. They conveniently organize information so a reader can compare how two countries differ on aspects of privacy law.

It seems potentially, possibly plausible (italics indicating I’m really unsure about this) that the U.S. concept of privacy as being fundamentally a consumer right dates back to the original elision of privacy and property in the Fourth Amendment to the U.S. Constitution:

The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

We forget how tightly entwined protection of property was to early U.S. political theory. In his Leviathan, for example, seventeenth-century English philosopher Thomas Hobbes derives his theory of legitimate sovereign power (and the notion of the social contract that influenced founding fathers like Jefferson and Madison) from the need to provide individuals with some recourse against intrusions on their property; otherwise we risk devolving to the perpetually anxious and miserable state of the war of all against all, where anyone can come in and ransack our stuff at any time.

The Wikipedia page on the Fourth Amendment explains it as a countermeasure against general warrants and writs of assistance British colonial tax collectors were granted to “search the homes of colonists and seize ‘prohibited and uncustomed’ goods.” What matters for this brief history is the foundation that early privacy law protected people’s property–their physical homes–from searches, inspections, and other forms of intrusion or surveillance by the government.

Katz v. United States: Reasonable Expectations of Privacy

Over the past 50 years, new technologies have fracked the bedrock assumptions of the Fourth Amendment.[6] The case law is expansive and vastly exceeds the cursory overview I’ll provide in this post. Cindy Cohn from the Electronic Frontier Foundation has written extensively and lucidly on the subject.[7] As has Daniel Solove.

Perhaps the seminal case shaping contemporary U.S. privacy law is Katz v. United States, 389 U.S. 347 (1967). A 2008 presentation from Artemus Ward at Northern Illinois University presents the facts and a summary of the Supreme Court Justices’ opinions in these three slides (there are also dissenting opinions):

[Slides: “The Facts”; “Justice Potter Stewart Delivered the Opinion of the Court”; Justice Harlan’s concurrence]

There are two key questions here:

  • Does the right to privacy extend to telephone booths and other public places?
  • Is a physical intrusion necessary to constitute a search?

Justice Harlan’s comments regarding the “actual (subjective) expectation of privacy” that society is prepared to recognize as “reasonable” marked a conceptual shift to pave the way for the Fourth Amendment to make sense in the digital age. Katz shifted the locus of constitutional protections against unwarranted government surveillance from one’s private home–property that one owns–to public places that social norms recognize as private in practice if not in name (a few cases preceding Katz paved the way for this judgment).

This is a watershed: when any public space can be interpreted as private in the eyes of the beholder, the locus of privacy shifts from an easy-to-agree-upon-objective-space like someone’s home, doors locked and shades shut, to a hard-to-agree-upon-subjective-mindset like someone’s expectation of what should be private, even if it’s out in a completely public space, just as long as those expectations aren’t crazy (i.e., that annoying lady somehow expecting that no one is listening to her uber-personal conversation about the bad sex she had with the new guy from Tinder as she stands in a crowded checkout line waiting to purchase her chia seed concoction and her gluten-free crackers[8]) but accord with the social norms and practices of a given moment and generation.

Imagine how thorny it becomes to decide what qualifies as a reasonable expectation for privacy when we shift from a public phone booth occupied by one person who can shut the door (as in Katz) to:

  • internet service providers shuffling billions of text messages, phone calls, and emails between individuals, where (perhaps?) the standard expectation is that when we go through the trouble of protecting information with a password (or two-factor authentication), we’re branding these communications as private, not to be read by the government or the private company providing us with the service (and metadata?);
  • GPS devices placed on the bottom of vehicles, as in United States v. Jones, 132 S.Ct. 945 (2012), which in themselves may not seem like something everyone has to worry about often but which, given the category of data they generate, are similar to any and all information about how we transact and move in the world, revealing not just what our name is but which coffee shops and doctors (or lovers or political co-conspirators) we visit on a regular basis, prompting Justice Sonia Sotomayor to be very careful in her judgments;
  • social media platforms like Facebook, pseudo-public in nature, that now collect and analyze not only structured data on our likes and dislikes, but, thanks to advancing AI capabilities, image, video, text, and speech data;
  • etc.
Slide from Kosta Derpanis’s extremely witty and engaging talk on computer vision at Friday’s AI + Public Policy Conference. This shows how Facebook is now applying computer vision techniques to analyze images users post, not just structured data about what they like and dislike.

Just as Zeynep Tufekci argues that informed consent loses its power in an age where most users of internet services and products don’t rigorously understand what use of their data they’re consenting to, so too does Cohn believe that the “‘reasonable expectation of privacy’ test currently employed in Fourth Amendment jurisprudence is a poor test for the digital age.”[9] As with any shift from criticism to pragmatic solutions, however, the devil is in the details. If we eliminate a reasonableness test because it’s too flimsy for the digital age, what do we replace it with to achieve the desired outcomes of protecting individual rights to free speech and preventing governmental overreach? Do we find a way to measure actual harm suffered by an individual? Or should we, as Lex Gill suggested Friday, somehow think about privacy as a public good rather than an individual choice requiring individual consent? What are the different harms we need to guard against in different contexts, given that use of data for targeted marketing has different ramifications than government wiretapping?

These questions are tricky to parse because, in an age where so many aspects of our lives are digital, privacy bleeds into and across different contexts of social, political, commercial, and individual activity. As Helen Nissenbaum has masterfully shown, our subjective experience of what’s appropriate in different social contexts influences our reasonable expectations of privacy in digital contexts. We don’t share all the intimate details of our personal life with colleagues in the same way we do with close friends or doctors bound by duties of confidentiality. Add to that that certain social contexts demand frivolity (that ironic self you fashion on Facebook) and others, like politics, invite a more aspirational self.[10] Nissenbaum’s theory of contextual integrity, where privacy is preserved when information flows respect the implicit, socially-constructed boundaries that graft the many sub-identities we perform and inhabit as individuals, applies well to the Cambridge Analytica debacle. People are less concerned by private companies using social media data for commercial psychographic targeting than they are by political targeting; the algorithms driving stickiness on site and hyper-personalized advertising aren’t fit to promote the omnivorous, balanced information diet required to understand different sides of arguments in a functioning democracy. Being at once watering hole to chat with friends, media company to support advertising, and platform for political persuasion, Facebook collapses distinct social spheres into one digital platform (which also complicates anti-trust arguments, as evident in this excellent Intelligence Squared debate).

A New New Deal on Data: Privacy in the Age of Machine Learning

In 2009, Alex “Sandy” Pentland of the MIT Media Lab began the final section of his article calling for a “new deal” on data as follows:

Perhaps the greatest challenge posed by this new ability to sense the pulse of humanity is creating a “new deal” around questions of privacy and data ownership. Many of the network data that are available today are freely offered because the entities that control the data have difficulty extracting value from them. As we develop new analytical methods, however, this will change.[11]

This ability to “sense the pulse of humanity,” writes Pentland earlier in the article, arises from the new data generation, collection, and processing tools that have effectively given the human race “the beginnings of a working nervous system.” Pentland contrasts what we are able to know about people’s behavior today–where we move in the world, how many times our hearts beat per minute, whom we love, whom we are attracted to, what movies we watch and when, what books we read and stop reading in between, etc–with the “single-shot, self-report” data, e.g., yearly censuses, public polls, and focus groups, that characterized demographic statistics in the recent past. Note that back in 2009, the heyday of the big data (i.e., collecting and storing data) era, Pentland commented that while a ton of data was collected, companies had difficulty extracting value. It was just a lot of noise backed by the promise of analytic potential.

This has changed.

Machine learning has unlocked the potential and risks of the massive amounts of data collected about people.

The standard risk assessment tools (like privacy impact assessments) used by the privacy community today focus on protecting the use of particular types of data, PII like proper names and e-mail addresses. There is a whole industry and tool kit devoted to de-identification and anonymization, automatically removing PII while preserving other behavioral information for statistical insights. The problem is that this PII-centric approach to privacy misses the boat in the machine learning age. Indeed, what Cambridge Analytica brought to the fore was the ability to use machine learning to probabilistically infer not proper names but features and types from behavior: you don’t need to check a gender box for the system to make a reasonably confident guess that you are a woman based on the pictures you post and the words you use; private data from conversations with your psychiatrist need not be leaked for the system to peg you as neurotic. Deep learning is so powerful because it is able to tease out and represent hierarchical, complex aspects of data that aren’t readily and effectively simplified down to variables we can keep track of and proportionately weight in our heads: these algorithms can, therefore, tease meaning out of a series of actions in time. This may not peg you as you, but it can peg you as one of a few whose behavior can be impacted using a given technique to achieve a desired outcome.
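To make that concrete, here is a minimal sketch of the kind of inference at stake (entirely synthetic data and invented feature names, not any real platform’s pipeline): a plain logistic regression that never sees a name, an email address, or a checked box, yet produces confident probabilistic guesses about a sensitive trait from behavioral signals alone.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Purely synthetic behavioral signals (hypothetical feature names, no real data):
# weekly counts of two kinds of page likes and a score for late-night posting.
likes_a = rng.poisson(3.0, n)
likes_b = rng.poisson(2.0, n)
night_posting = rng.normal(0.0, 1.0, n)

# Invented ground truth: a sensitive trait correlated with behavior, standing in
# for anything a user never explicitly declared.
logits = 0.9 * likes_b - 0.7 * likes_a + 0.4 * night_posting
trait = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

X = np.column_stack([likes_a, likes_b, night_posting])
model = LogisticRegression().fit(X[:4_000], trait[:4_000])

# Probabilities for "users" the model has never seen: no PII required.
print(model.predict_proba(X[4_000:4_005]).round(2))
print("held-out accuracy:", round(model.score(X[4_000:], trait[4_000:]), 3))
```

The point isn’t the model; it’s that the “sensitive” column never needs to be collected at all. It gets reconstructed, probabilistically, from behavior.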

Three things have shifted:

  • using machine learning, we can probabilistically construct meaningful units that tell us something about people without standard PII identifiers;
  • because we can use machine learning, the value of data shifts from the individual to statistical insights across a distribution; and
  • breaches of privacy that occur at the statistical layer instead of the individual data layer require new kinds of privacy protections and guarantees.

The technical solution to this last bullet point is a technique called differential privacy. Still in the early stages of commercial adoption,[12] differential privacy thinks about privacy as the extent to which individual data impacts the shape of some statistical distribution. If what we care about is the insight, not the person, then let’s make it so we can’t reverse engineer how one individual contributed to that insight. In other words, the task is to answer statistical queries over a database in such a way that:

if you have two otherwise identical databases, one with your information and one without it, the probability that a statistical query will produce a given result is (nearly) the same whether it’s conducted on the first or second database.
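For readers who like symbols, the standard formalization behind that informal description (the usual ε-differential privacy definition, stated here for reference rather than quoted from Green) is:

```latex
% A randomized mechanism M satisfies epsilon-differential privacy if, for all
% pairs of databases D and D' differing in one person's record, and for every
% set S of possible outputs:
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,]
```

Smaller ε forces the two probabilities to be closer, which is just the math-speak version of “your presence or absence barely changes the answer.”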

Here’s an example Matthew Green from Johns Hopkins gives to help develop an intuition for how this works:

Imagine that you choose to enable a reporting feature on your iPhone that tells Apple if you like to use the 💩  emoji routinely in your iMessage conversations. This report consists of a single bit of information: 1 indicates you like 💩 , and 0 doesn’t. Apple might receive these reports and fill them into a huge database. At the end of the day, it wants to be able to derive a count of the users who like this particular emoji.

It goes without saying that the simple process of “tallying up the results” and releasing them does not satisfy the DP definition, since computing a sum on the database that contains your information will potentially produce a different result from computing the sum on a database without it. Thus, even though these sums may not seem to leak much information, they reveal at least a little bit about you. A key observation of the differential privacy research is that in many cases, DP can be achieved if the tallying party is willing to add random noise to the result. For example, rather than simply reporting the sum, the tallying party can inject noise from a Laplace or gaussian distribution, producing a result that’s not quite exact — but that masks the contents of any given row.
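To make the mechanism concrete, here is a minimal sketch in Python (my own toy version of a differentially private count with Laplace noise; it ignores the local, on-device flavor of differential privacy that Apple actually deploys):

```python
import numpy as np

def dp_count(bits, epsilon):
    """Differentially private count of 1s in `bits`.

    Adding or removing one person's report changes the true count by at most 1
    (sensitivity = 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    true_count = float(np.sum(bits))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Toy reports: 1 means "this user likes the emoji", 0 means they don't.
reports = np.random.binomial(1, 0.3, size=10_000)
print("true count: ", int(reports.sum()))
print("noisy count:", round(dp_count(reports, epsilon=0.5)))  # smaller epsilon, more noise
```

The tally that gets released is the noisy one; any single row could flip without noticeably changing the distribution of possible answers.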

This is pretty technical. It takes time to understand it, in particular if you’re not steeped in statistics day in and day out, viewing the world as a set of dynamic probability distributions. But it poses a big philosophical question in the context of this post.

In the final chapters of Homo Deus, Yuval Noah Harari proposes that we are moving from the age of Humanism (where meaning emanates from the perspective of the individual human subject) to the age of Dataism (where we question our subjective viewpoints given our proven predilections for mistakes and bias to instead relegate judgment, authority, and agency to algorithms that know us better than we know ourselves). Reasonable expectations for privacy, as Justice Harlan indicated, are subjective, even if they must be supported by some measurement of norms to qualify as reasonable. Consent is individual and subjective, and results in principles like that of minimum use for an acknowledged purpose because we have limited ability to see beyond ourselves, we create traffic jams because we’re so damned focused on the next step, the proxy, as opposed to viewing the system as a whole from a wider vantage point, and only rarely (I presume?) self-identify and view ourselves under the round curves of a distribution. So, if techniques like differential privacy are better suited to protect us in an age where distributions matter more than data points, how should we construct consent, and how should we shape expectations, to craft the right balance between the liberal values we’ve inherited and this mathematical world we’re building? Or, do we somehow need to reformulate our values to align with Dataism?

And, perhaps most importantly, what should peaceful resistance look like and what goals should it achieve?


[1] What one decides to call the event reveals a lot about how one interprets it. Is it a breach? A scandal? If so, which actor exhibits scandalous behavior: Nix for his willingness to profit from the manipulation of people’s psychology to support the election of an administration that is toppling democracy? Zuckerberg for waiting so long to acknowledge that his social media empire is more than just an advertising platform and has critical impacts on politics and society? The Facebook product managers and security team for lacking any real enforcement mechanisms to audit and verify compliance with data policies? We the people, who have lost our ability and even desire to read critically, as we prefer the sensationalism of click bait, goading technocrats to optimize for whatever headline keeps us hooked to our feed, ever curious for more? Our higher education system, which, falling to economic pressures that date back to (before but were aggravated by) the 2008-2009 financial crisis, is cutting any and all curricula for which it’s hard to find a direct, causal line to steady and lucrative employment, as our education system evolves from educating a few thoughtful priests to educating many industrial workers to educating engineers who can build stuff and optimize everything and define proxies and identify efficiencies so we can go faster, faster until we step back and take the time to realize the things we are building may not actually align with our values, that, perhaps, we may need to retain and reclaim our capacities to reflect and judge and reason if we want to sustain the political order we’ve inherited? Or perhaps all of this is just the symptom of a much larger, complex trend in World History that we’re unable to perceive, that the Greeks were right in thinking that forms of government pass through inevitable cycles with the regularity of the earth rotating around the sun (an historical perspective itself, as the Greeks thought the inverse) and we should throw our hands up like happy nihilists, bowing down to the unstoppable systemic forces of class warfare, the give and take between the haves and the have nots, little amino acids ever unable to perceive how we impact the function of proteins and how they impact us in return?

And yet, it feels like there may be nothing more important than to understand this and to do what little–what big–we can to make the world a better place. This is our dignity, quixotic though it may be.

The Greek term for the cycle of political regimes is anacyclosis. Interestingly enough, there is an institute devoted to this idea, seemingly located in North Carolina. Their vision is to “halt the cycle of revolution while democracy prevails.”

[2] One aspect of the fiasco* I won’t write about but that merits at least passing mention is Elon Musk’s becoming the mascot for the #DeleteFacebook movement (too strong a word?). The New York Times coverage of Musk’s move references Musk and Zuckerberg’s contrasting opinions on the risks AI might pose to humanity. From what I understand, as executives, they both operate on extremely long time scales (i.e., 100 years in the future), projecting way out into speculative futures and working backwards to decide what small steps man should take today to enable Man to take giant future leaps (gender certainly intended, especially in Musk’s case, as I find his aesthetic, and that of many of the very muscular men I’ve met from Tesla at conferences, not dissimilar from the nationalistic masculinity performed by Vladimir Putin). Musk rebuffed Zuckerberg’s criticism that Musk’s rhetoric about the existential threat AI poses to humanity is “irresponsible” by saying that Zuckerberg’s “understanding of the subject is limited.” I had some cognitive dissonance reading this, as I presumed the risk Musk was referring to was that of super-intelligence run amok (à la Nick Bostrom, whom I admittedly reference as a straw man) rather than that of our having created an infrastructure that exacerbates short-term, emotional responses to stimuli and thereby threatens the information exchange required for democracy to function (see Alexis de Tocqueville on the importance of newspapers in Democracy in America). My takeaway from all of this is that there are so many different sub-issues all wrapped up together, and that we in the technology community really do need to work as hard as we can to reference specifics rather than allow for the semantic slippage that leaves interpretation in the mind of the beholder. It’s SO HARD to do this, especially for public figures like Musk, given that people’s attention spans are limited and we like punchy quotables at a very high level. The devil is always in the details.

[3] Doctorow references Lawrence Lessig’s Code and Other Laws of Cyberspace, which I have yet to read but is hailed as a classic text on the relationship between law and code, where norms get baked into our technologies in the choices of how we write code.

[4] I always got a kick out of the song Human by the Killers, whose lyrics seem to imply a mutually exclusive distinction between human and dancer. Does the animal kingdom offer better paradigms for dancers than us poor humans? Must depend on whether you’re a white dude.

[5] My talk drew largely from Chris Dixon‘s extraordinary Atlantic article How Aristotle Created the Computer. Extraordinary because he deftly encapsulates 2000 years of the history of logic into a compelling, easy-to-read article that truly helps the reader develop intuitions about deterministic computer programs and the shift to a more inductive machine learning paradigm, while also not leaving the reader with the bitter taste of having read an overly general dilettante. Here’s one of my slides, which represents how important it was for the history of computation to visualize and interpret Aristotelian syllogisms as sets (sets lead to algebra lead to encoding in logical gates lead to algorithms).

As Dixon writes, “You can replace ‘Socrates’ with any other object, and ‘mortal’ with any other predicate, and the argument remains valid. The validity of the argument is determined solely by the logical structure. The logical words — ‘all,’ ‘is,’ ‘are,’ and ‘therefore’ — are doing all the work.”
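To make the set interpretation a bit more tangible, here’s a minimal sketch of my own (in Python, with made-up sets) of the Socrates syllogism rendered as set containment rather than verbal logic:

```python
# A syllogism as set relations: validity depends only on the structure.
men = {"Socrates", "Plato", "Aristotle"}          # "all men..."
mortals = men | {"Xanthippe", "a sparrow"}        # "...are mortal": men is contained in mortals

assert men <= mortals                             # major premise: all men are mortal
assert "Socrates" in men                          # minor premise: Socrates is a man
assert "Socrates" in mortals                      # the conclusion follows from containment alone

# Swap in any other objects and predicates; the set structure does all the work.
```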

Fortunately (well, we put effort in to coordinate), my talk was a nice primer for Graham Taylor‘s superbly clear introduction to various forms of machine learning. I most liked his section on representation learning, where he showed how the choice of representation of data has an enormous impact on the performance of algorithms:

Screen Shot 2018-03-24 at 11.25.25 AM
The image is from deeplearningbook.org. Note that in Cartesian coordinates, it’s hard to draw a straight line that could separate the blue and green items, whereas in polar coordinates, the division is readily visible. Coordinate choice has a big impact on what qualifies as simple in math. Newton and Descartes, for example, disagreed over the simplicity of the equation for a circle: it’s a pretty complex equation when represented in Cartesian coordinates, but quite simple in polar coordinates. Our frame of reference is a thinking tool we can use to solve problems — I first learned this in high-school physics, when Clyfe Beckwith taught us that we could tilt our Cartesian coordinates to align with the slope of a hill in a physics problem. It’s a foundational memory I have of ridding myself of ossified assumptions to open the creative thinking space to solve problems, not unlike adding 0 to an algebraic equation to leverage the power of b + -b.
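Here’s a minimal sketch of that point (my own toy data, not the deeplearningbook.org figure): the same ring-shaped classes that defeat a linear classifier in Cartesian coordinates become trivially separable once re-represented in polar coordinates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two classes: an inner disk and an outer ring, expressed in Cartesian coordinates.
theta = rng.uniform(0, 2 * np.pi, 400)
radius = np.concatenate([rng.uniform(0.0, 1.0, 200),   # class 0: inner
                         rng.uniform(2.0, 3.0, 200)])  # class 1: outer
X_cartesian = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
y = np.concatenate([np.zeros(200), np.ones(200)])

# The same data, re-represented in polar coordinates (radius, angle).
X_polar = np.column_stack([np.hypot(X_cartesian[:, 0], X_cartesian[:, 1]),
                           np.arctan2(X_cartesian[:, 1], X_cartesian[:, 0])])

for name, X in [("cartesian", X_cartesian), ("polar", X_polar)]:
    acc = LogisticRegression(max_iter=1000).fit(X, y).score(X, y)
    print(f"{name:9s} accuracy: {acc:.2f}")  # mediocre for cartesian, ~1.00 for polar
```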

[6] If you’re interested in contemporary Constitutional Law, I highly recommend Roman Mars’s What Trump Can Teach us about Con Law podcast. Mars and Elizabeth Joh, a law school professor at UC Davis, use Trump’s entirely anomalous behavior as a catalyst to explore various aspects of the Constitution. I particularly enjoyed the episode about the emoluments clause, which prohibits acceptance of diplomatic gifts to the President, Vice President, Secretary of State, and their spouses. The Protocol Gift Unit keeps a public record of all gifts presidents did accept, including the justification for why they made the exception. For example, in 2016, former President Obama accepted Courage, an olive green soapstone sculpture with black flecks, depicting the profile of an eagle with half of an indigenous man’s face in the center, valued at $650.00, from His Excellency Justin Trudeau, P.C., M.P., Prime Minister of Canada, because “non-acceptance would cause embarrassment to donor and U.S. Government.”

courage statute
Courage on the right. Leo Arcand, the Alberta Cree artist who sculpted it, on the left.

[7] Cindy will be in Toronto for RightsCon May 16-18. I cannot recommend her highly enough. Every time I hear her speak, every time I read her writing, I am floored by her eloquence, precision, and passionate commitment to justice.

[8] Another thing I cannot recommend highly enough is David Foster Wallace’s commencement speech This is Water. It’s ruthlessly important. It’s tragic to think about the fact that this human, this wonderfully enlightened heart, felt the only appropriate act left was to commit suicide.

[9] A related issue I won’t go into in this post is the third-party doctrine, “under which an individual who voluntarily provides information to a third party loses any reasonable expectation of privacy in that information.” (Cohn)

[10] Eli Pariser does a great job showing the difference between our frivolous and aspirational selves, and the impact this has on filter bubbles, in his quite prescient 2011 monograph.

[11] See also this 2014 Harvard Business Review interview with Pentland. My friend Dazza Greenwood first introduced me to Pentland’s work by presenting the blockchain as an effective means to execute the new deal on data, empowering individuals to keep better track of where data flow and sit, and how they are being used.

[12] Cynthia Dwork’s pioneering work on differential privacy at Microsoft Research dates back to 2006. It’s currently in use at Apple, Facebook, and Google (the most exciting application being its fusion with federated learning across the network of Android users, to support localized, distributed personalization without requiring that everyone share their digital self with Google’s central servers). Even Uber has released an open-source differential privacy toolset. There are still many limitations to applying these techniques in practice given their impact on model performance and the lack of robust guarantees on certain machine learning models. I don’t know of many instances of startups using the technology yet outside a proud few in the Georgian Partners portfolio, including integrate.ai (where I work) and Bluecore in New York City.
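For readers who want a feel for the mechanics, here is a minimal sketch of the classic Laplace mechanism at the heart of Dwork’s definition. The function name and data are my own toy invention, not how Apple, Google, or Uber deploy the technique in production: noise calibrated to a query’s sensitivity and a privacy budget epsilon is added before the answer leaves the data holder.

```python
import numpy as np

def private_count(values, predicate, epsilon, rng=np.random.default_rng()):
    """Answer 'how many records satisfy predicate?' with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person changes
    the true answer by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [23, 35, 41, 29, 62, 57, 33, 48]
print(private_count(ages, lambda a: a >= 40, epsilon=0.5))   # noisier, more private
print(private_count(ages, lambda a: a >= 40, epsilon=5.0))   # closer to the true answer of 4
```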

The featured image is from an article in The Daily Dot (which I’ve never heard of) about the Mojave Phone Booth, which, as Roman Mars delightfully narrates in 99% Invisible, became a sensation when Godfrey “Doc” Daniels (trust me that link is worth clicking on!!) used the internet to catalogue his quest to find the phone booth, working merely from its number: 760-733-9969. The tattered decrepitude of the phone booth, pitched against the indigo of the sunset, is a compelling illustration of the inevitable retrograde character of common law precedent. The opinions in Katz v. United States regarding reasonable expectations of privacy were given at a time when digital communications occurred largely over the phone: is it even possible for us to draw analogies between what privacy meant then and what it could mean now in the age of centralized platform technologies whose foundations are built upon creating user bases and markets to then exchange this data for commercial and political advertising purposes? But, what can we use to anchor ethics and lawful behavior if not the precedent of the past, aligned against a set of larger, overarching principles in an urtext like the constitution, or, in the Islamic tradition, the Qur’an?

Exploration-Exploitation and Life

There was another life that I might have had, but I am having this one. – Kazuo Ishiguro

On April 18, 2016*, I attended an NYAI Meetup** featuring a talk by Columbia Computer Science Professor Dan Hsu on interactive learning. The talk was incredibly clear and informative, and the slides are worth reviewing in their entirety. But one slide in particular caught my attention (fortunately it summarizes many of the subsequent examples):

Screen Shot 2017-12-02 at 9.44.34 AM
From Dan Hsu’s excellent talk on interactive machine learning

It’s worth stepping back to understand why this is interesting.

Much of the recent headline-grabbing progress in artificial intelligence (AI) comes from the field of supervised learning. As I explained in a recent HBR article, I find it helpful to think of supervised learning as the inverse of high school algebra:

Think back to high school math — I promise this will be brief — when you first learned the equation for a straight line: y = mx + b. Algebraic equations like this represent the relationship between two variables, x and y. In high school algebra, you’d be told what m and b are, be given an input value for x, and then be asked to plug them into the equation to solve for y. In this case, you start with the equation and then calculate particular values.

Supervised learning reverses this process, solving for m and b, given a set of x’s and y’s. In supervised learning, you start with many particulars — the data — and infer the general equation. And the learning part means you can update the equation as you see more x’s and y’s, changing the slope of the line to better fit the data. The equation almost never identifies the relationship between each x and y with 100% accuracy, but the generalization is powerful because later on you can use it to do algebra on new data. Once you’ve found a slope that captures a relationship between x and y reliably, if you are given a new x value, you can make an educated guess about the corresponding value of y.
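As a minimal sketch of that inversion (my own toy example, not the HBR article’s), here is ordinary least squares recovering m and b from noisy (x, y) pairs, and then doing “algebra” on a new x:

```python
import numpy as np

rng = np.random.default_rng(42)

# Nature's hidden equation: y = 2.0 * x + 1.0, observed with noise.
x = rng.uniform(-5, 5, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 100)

# Supervised learning inverts high-school algebra: given many (x, y) pairs,
# solve for the slope m and intercept b that best fit the data.
m, b = np.polyfit(x, y, deg=1)
print(f"learned m={m:.2f}, b={b:.2f}")  # close to 2.0 and 1.0

# Now we can do algebra on unseen data: plug in a new x, predict y.
x_new = 3.7
print(f"predicted y for x={x_new}: {m * x_new + b:.2f}")
```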

Supervised learning works well for classification problems (spam or not spam? relevant or not for my lawsuit? cat or dog?) because of how the functions generalize. Effectively, the “training labels” humans provide in supervised learning assign categories, tokens we affiliate to abstractions from the glorious particularities of the world that enable us to perceive two things to be similar. Because our language is relatively stable (stable does not mean normative, as Canadian Inuit perceive snow differently from New Yorkers because they have more categories to work with), generalities and abstractions are useful, enabling the learned system to act correctly in situations not present in the training set (e.g., it takes a hell of a long time for golden retrievers to evolve to be distinguishable from their great-great-great-great-great-grandfathers, so knowing what one looks like on April 18, 2016 will be a good predictor of what one looks like on December 2, 2017). But, as Rich Sutton*** and Andrew Barto eloquently point out in their textbook on reinforcement learning,

This is an important kind of learning, but alone it is not adequate for learning from interaction. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. In uncharted territory—where one would expect learning to be most beneficial—an agent must be able to learn from its own experience.

In his NYAI talk, Dan Hsu also mentioned a common practical limitation of supervised learning, namely that many companies often lack good labeled training data and it can be expensive, even in the age of Mechanical Turk, to take the time to provide labels.**** The core thing to recognize is that learning from generalization requires that future situations look like past situations; learning from interaction with the environment helps develop a policy for action that can be applied even when future situations do not look exactly like past situations. The maxim “if you don’t have anything nice to say, don’t say anything at all” holds both in a situation where you want to gossip about a colleague and in a situation where you want to criticize a crappy waiter at a restaurant.

In a supervised learning paradigm, there are certainly traps that lead to faulty generalizations from the available training data. One classic problem is called “overfitting”, where a model seems to do a great job on a training data set but fails to generalize well to new data. But the critical difference Hsu points out in his talk is that, while in supervised learning the data available to the learner is exogenous to the system, in interactive machine learning the learner’s performance is based on the learner’s decisions, and the data available to the learner depends on those same decisions.

Think about that. Think about what that means for gauging the consequences of decisions. Effectively, these learners cannot evaluate counterfactuals: they cannot use data or evidence to judge what would have happened if they took a different action. An ideal optimization scenario, by contrast, would be one where we could observe the possible outcomes of any and all potential decisions, and select the action with the best outcome across all these potential scenarios (this is closer, but not identical, to the spirit of variational inference, but that is a complex topic for another post).

To share one of Hsu’s***** concrete examples, let’s say a website operator has a goal to personalize website content to entice a consumer to buy a pair of shoes. Before the user shows up at the site, our operator has some information about her profile and browsing history, so can use past actions to guess what might be interesting bait to get a click (and eventually a purchase). So, at the moment of truth, the operator says “Let’s show the beige Cole Haan high heels!”, displays the content, and observes the reaction. We’ll give the operator the benefit of the doubt and assume the user clicks, or even goes on to purchase. Score! Positive signal! Do that again in the future! But was it really the best choice? What would have happened if the operator had shown the manipulatable consumer the red Jimmy Choo high heels, which cost $750 per pair rather than a more modest $200 per pair? Would the manipulatable consumer have clicked? Was this really the best action?

The learner will never know. It can only observe the outcome of the action it took, not the action it didn’t take.
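To make that structure concrete, here is a minimal epsilon-greedy sketch of the shoe scenario (my own toy code, not Hsu’s, with invented click rates): the learner only ever logs the reward of the shoe it actually showed, and the counterfactual click on the other shoe is simply never generated.

```python
import random

random.seed(7)

# Hidden truth the learner never sees: the probability each shoe earns a click.
TRUE_CLICK_RATE = {"cole_haan": 0.10, "jimmy_choo": 0.25}

estimates = {shoe: 0.0 for shoe in TRUE_CLICK_RATE}
counts = {shoe: 0 for shoe in TRUE_CLICK_RATE}
epsilon = 0.1  # fraction of visits spent exploring instead of exploiting

for visit in range(10_000):
    # Explore occasionally; otherwise exploit the shoe that currently looks best.
    if random.random() < epsilon:
        shown = random.choice(list(TRUE_CLICK_RATE))
    else:
        shown = max(estimates, key=estimates.get)

    # Only the shown shoe generates data; the counterfactual click never exists.
    reward = 1.0 if random.random() < TRUE_CLICK_RATE[shown] else 0.0
    counts[shown] += 1
    estimates[shown] += (reward - estimates[shown]) / counts[shown]  # running mean

print(counts)     # most traffic eventually flows to the better shoe
print(estimates)  # click-rate estimates, learned only from actions actually taken
```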

The literature refers to this dilemma as the trade-off between exploration and exploitation. To again cite Sutton and Barto:

One of the challenges that arise in reinforcement learning, and not in other kinds of learning, is the trade-off between exploration and exploitation. To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward.

There’s a lot to say about the exploration-exploitation tradeoff in machine learning (I recommend starting with the Sutton/Barto textbook). Now that I’ve introduced the concept, I’d like to pivot to consider where and why this is relevant in honest-to-goodness-real-life.

The nice thing about being an interactive machine learning algorithm as opposed to a human is that algorithms are executors, not designers or managers. They’re given a task (“optimize revenues for our shoe store!”) and get to try stuff and make mistakes and learn from feedback, but never have to go through the soul-searching agony of deciding what goal is worth achieving. Human designer overlords take care of that for them. And even the domain and range of possible data to learn from is constrained by technical conditions: designers make sure that it’s not all the data out there in the world that’s used to optimize performance on some task, but a tiny little baby subset (even if that tiny little baby entails 500 million examples) confined within a sphere of relevance.

Being a human is unfathomably more complicated.

Many choices we make benefit from the luxury of triviality and frequency. “Where should we go for dinner and what should we eat when we get there?” Exploitation can be a safe choice, in particular for creatures of habit. “Well, sweetgreen is around the corner, it’s fast and reliable. We could take the time to review other restaurants (which could lead to the most amazing culinary experience of our entire lives!) or we could not bother to make the effort, stick with what we know, and guarantee a good meal with our standard kale caesar salad, that parmesan crisp thing they put on the salad is really quite tasty…” It’s not a big deal if we make the wrong choice because, lo and behold, tomorrow is another day with another dinner! And if we explore something new, it’s possible the food will be just terrible and sometimes we’re really not up for the risk, or worse, the discomfort or shame of having to send something we don’t like back. And sometimes it’s fine to take the risk and we come to learn we really do love sweetbreads, not sweetgreens, and perhaps our whole diet shifts to some decadent 19th-century French paleo practice in the style of des Esseintes.

Des_Esseintes_at_study_Zaidenberg_illustration
Arthur Zaidenberg’s depiction of des Esseintes, decadent hero extraordinaire, who embeds gems into a tortoise shell and has a perfume organ.

Other choices have higher stakes (or at least feel like they do) and easily lead to paralysis in the face of uncertainty. Working at a startup strengthens this muscle every day. Early on, founders are plagued by an unknown number of unknown unknowns. We’d love to have a magic crystal ball that enables us to consider the future outcomes of a range of possible decisions, and always act in the way that guarantees future success. But such crystal balls don’t exist, and even if they did, we sometimes have so few prior assumptions to prime the pump that the crystal ball could only output an #ERROR message to indicate there’s just not enough there to forecast. As such, the only option available is to act and to learn from the data provided as a result of that action: to jumpstart empiricism by staking some claim and getting as comfortable as possible with the knowledge that the counterfactual will never be explored, and that each action taken shifts the playing field of possibility and probability and certainty slightly, calming minds and hearts. The core challenge startup leaders face is to enable the team to execute as if these conditions of uncertainty weren’t present, to provide a safe space for execution under the umbrella of risk and experiment. What’s fortunate, however, is that the goals of the enterprise are, if not entirely well-defined, at least circumscribed. Businesses exist to turn profits and that serves as a useful, if not always moral, constraint.

Big personal life decisions exhibit further variability because we but rarely know what to optimize for, and it can be incredibly counter-productive and harmful to either constrain ourselves too early or suffer from the psychological malaise of assuming there’s something wrong with us if we don’t have some master five-year plan.

This human condition is strange because we do need to set goals–it’s beneficial for us to consider second- and third-tier consequences, e.g., if our goal is to be healthy and fit, we should overcome the first-tier consequence of receiving pleasure when we drown our sorrows in a gallon of salted caramel ice cream–and yet it’s simply impossible for us to imagine the future accurately because, well, we overfit to our present and our past.

I’ll give a concrete example from my own experience. As I touched upon in a recent post about transitioning from academia to business, one reason why it’s so difficult to make a career change is that, while we never actually predict the future accurately, it’s easier to fear loss from a known predicament than to imagine gain from a foreign predicament.****** Concretely, when I was deciding whether to pursue a career in academia or the private sector in my fifth year of graduate school, I erroneously assumed that I was making a strict binary choice, that going into business meant forsaking a career teaching or publishing. As I was evaluating my decision, I never in my wildest dreams imagined that, a mere two years later, I would be invited to be an adjunct professor at the University of Calgary Faculty of Law, teaching about how new technologies were impacting traditional professional ethics. And I also never imagined that, as I gave more and more talks, I would subsequently be invited to deliver guest lectures at numerous business schools in North America. This path is not necessarily the right path for everyone, but it was and is the right path for me. In retrospect, I wish I’d constructed my decision differently, shifting my energy from fearing an unknown and unknowable future to paying attention to what energized me and made me happy, and working to maximize the likelihood of such energizing moments occurring in my life. I still struggle to live this way, still fetishize what I think I should want to do, still live with an undercurrent of anxiety that a choice, a foreclosure of possibility, may send me down an irreconcilably wrong path. It’s a shitty way to be, and something I’m actively working to overcome.

So what should our policy be? How can we reconcile this terrific trade-off between exploration and exploitation, between exposing ourselves to something radically new and honing a given skill, between learning from a stranger and spending more time with a loved one, between opening our mind to some new field and developing niche knowledge in a given domain, between jumping to a new company with new people and problems, and exercising our resilience and loyalty to a given team?

There is no right answer. We’re all wired differently. We all respond to challenges differently. We’re all motivated by different things.

Perhaps death is the best constraint we have to provide some guidance, some policy to choose between choice A and choice B. For we can project ourselves forward to our imagined death bed, where we lie, alone, staring into the silent mirror of our hearts, and ask ourselves “Was my life meaningful?” But this imagined scene is not actually a future state: it is a present policy. It is a principle we can use to evaluate decisions, a principle that is useful because it abstracts us from the mire of emotions overly indexed towards near-term goals and provides us with perspective.

And what’s perhaps most miraculous is that, at every present, we can sit there and stare into the silent mirror of our hearts and look back on the choices we’ve made and say, “That is me.” It’s so hard going forward, and so easy going backward. The proportion of what may come wanes ever smaller than the portion of what has been, never quite converging until it’s too late, and we are complete.


*Thank you, internet, for enabling me to recall the date with such exacting precision! Using my memory, I would have deduced the approximate date by 1) remembering that Robert Colpitts, my boyfriend at the time (Godspeed to him today, as he participates in a sit-a-thon fundraiser for the Interdependence Project in New York City, a worthy cause), attended with me, recalling how fresh our relationship was (it had to have been really fresh because the frequency with which we attended professional events together subsequently declined), and working backwards from the start to find the date; 2) remembering what I wore! (crazy!!), namely a sheer pink sleeveless shirt, a pair of wide-legged white pants that landed just slightly above the ankle and therefore looked great with the pair of beige, heeled sandals with leather so stiff it gave me horrific blisters that made running less than pleasant for the rest of the week. So I’d recently purchased those when my brother and his girlfriend visited, which was in late February (or early March?) 2016; 3) remembering that afterwards we went to some fast food Indian joint nearby in the Flatiron district, where the food was decent but not good enough to inspire me to return. So that would put us in the March-April 2016 range, which is close but not exactly April 18. That’s one week after my birthday (April 11); I remember Robert and I had a wonderful celebration on my birthday. I felt more deeply cared for than I had on any past birthday. But I don’t remember this talk relative to the birthday celebration (I do remember sending the marketing email to announce the Fast Forward Labs report on text summarization on my birthday, when I worked for a half day and then met Robert at the nearby sweetgreen, where he ordered, as always (Robert is a creature of exploitation), the kale caesar salad, after which we walked together across the Brooklyn Bridge to my house, we loved walking together, we took many, many walks together, often at night after work at the Promenade, often in the morning, before work, at the Promenade, when there were so few people around, so few people awake). I must say, I find the process of reconstructing when an event took place using temporal landmarks much more rewarding than searching for “Dan Hsu Interactive Learning NYAI” on Google to find the exact date. But the search terms themselves reveal something equally interesting about our heuristic mnemonics, as when we reconstruct some theme or topic to retrieve a former conversation on Slack.

**Crazy that WeWork recently bought Meetup, although interesting to think about how the two business models enable what I am slowly coming to see as the most important creative force in the universe, the combinatory potential of minds meeting productively, where productively means that each mind is not coming as a blank slate but as engaged in a project, an endeavor, where these endeavors can productively overlap and, guided by a Smithian invisible hand, create something new. The most interesting model we hope to work on soon at integrate.ai is one that optimizes groups in a multiplayer game experience (which we lovingly call the polyamorous online dating algorithm), mapping personality and playing style affinities to dynamically allocate the best next player to an alliance. Social compatibility is a fascinating thing to optimize for, in particular when it goes beyond just assembling a pleasant cocktail party to pairing minds, skills, and temperaments to optimize the likelihood of creating something beautiful and new.

***Sutton has one of the most beautiful minds in the field and he is kind. He is a person to celebrate. I am grateful our paths have crossed and thoroughly enjoyed our conversation on the In Context podcast.

****Maura Grossman and Gordon Cormack have written countless articles about the benefits of using active learning for technology assisted review (TAR), or classifying documents for their relevance to a lawsuit. The tradeoffs they weigh relate to system performance (gauged by precision and recall on a document set) versus the time, cost, and effort to achieve that performance.

*****Hsu did not mention Haan or Choo. I added some more color.

******Note this same dynamic occurs in our current fears about the future economy. We worry a hell of a lot more about the losses we will incur if artificial intelligence systems automate existing jobs than we celebrate the possibilities of new jobs and work that might become possible once these systems are in place. This is also due to the fact that the future we imagine tends to be an adaptation of what we know today, as delightfully illustrated in Jean-Marc Côté’s anachronistic cartoons of the year 2000. The cartoons show what happens when our imagination only changes one variable as opposed to a set of holistically interconnected variables.

barber
19th-century cartoons show how we imagine technological innovations in isolation. That said, a hipster barber shop in Portland or Brooklyn could feature such a palimpsestic combination.

 

The featured image is a photograph I took of the sidewalk on State Street between Court and Clinton Streets in Brooklyn Heights. I presume a bird walked on wet concrete. Is that how those kinds of footprints are created? I may see those footprints again in the future, but not nearly as soon as I’d be able to were I not to have decided to move to Toronto in May. Now that I’ve thought about them, I may intentionally make the trip to Brooklyn next time I’m in New York (certainly before January 11, unless I die between now and then). I’ll have to seek out similar footprints in Toronto, or perhaps the snows of Alberta. 


Hearing Aids (Or, Metaphors are Personal)

Thursday morning, I gave the opening keynote at an event about the future of commerce at the Rotman School of Management in Toronto. I shared four insights:

  • The AI instinct is to view a reasoning problem as a data problem
    • Marketing hype leads many to imagine that artificial intelligence (AI) works like human brain intelligence. Words like “cognitive” lead us to assume that computers think like we think. In fact, succeeding with supervised learning, as I explain in this article and this previous post, involves a shift in perspective to reframe a reasoning task as a data collection task.
  • Advances in deep learning are enabling radical new recommender systems
    • My former colleague Hilary Mason always cited recommender systems as a classic example of a misunderstood capability. Data scientists often consider recommenders to be a solved problem, given the widespread use of collaborative filtering, where systems infer person B’s interests based on similarity with person A’s interests. This approach, however, is often limited by the “cold start” problem: you need person A and person B to do stuff before you can infer how they are similar. Deep learning is enabling us to shift from comparing past transactional history (structured data) to comparing affinities between people and products (person A loves leopard prints, like this ridiculous Kimpton-style robe!). This doesn’t erase the cold start problem wholesale, but it opens a wide range of possibilities because taste is so hard to quantify and describe: it’s much easier to point to something you like than to articulate why you like it. (There’s a toy sketch contrasting the two approaches just after this list.)
  • AI capabilities are often features, not whole products
  • AI will dampen the moral benefits of commerce if we are not careful
    • Adam Smith is largely remembered for his theories on the division of labor and the invisible hand that guides capitalistic markets. But he also wrote a wonderful treatise on moral sentiments where he argued that commerce is a boon to civilization because it forces us to interact with strangers; when we interact with strangers, we can’t have temper tantrums like we do at home with our loved ones; and this gives us practice in regulating our emotions, which is a necessary condition of rational discourse and the compromise at the heart of teamwork and democracy. As with many of the other narcissistic inclinations of our age, the logical extreme of personalization and eCommerce is a world where we no longer need to interact with strangers, no longer need to practice the art of tempered self-interest to negotiate a bargain. Being elegantly bored at a dinner party can be a salutary boon to happiness. David Hume knew this, and died happy; Jean-Jacques Rousseau did not, and died miserable.
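Here is the toy sketch promised above (item names, purchase matrix, and embedding vectors all invented for illustration): a brand-new item is an all-zero column that collaborative filtering can say nothing about, whereas an embedding view can score it immediately by its affinity to a user’s taste.

```python
import numpy as np

# Collaborative-filtering view: a users-by-items purchase matrix.
# Columns: leopard_robe, plain_towel, leopard_scarf (brand new, never purchased).
purchases = np.array([[1, 1, 0],    # user A
                      [0, 1, 0],    # user B
                      [1, 0, 0]])   # user C
# The scarf's all-zero column gives co-occurrence methods nothing to work with: cold start.

# Embedding view: items live in a learned "taste" space (made-up 3-d vectors here).
item_embeddings = {
    "leopard_robe":  np.array([0.9, 0.1, 0.2]),
    "plain_towel":   np.array([0.1, 0.8, 0.3]),
    "leopard_scarf": np.array([0.8, 0.2, 0.1]),   # embedded from its image or description
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# User A's taste vector: the average embedding of what they already bought.
user_a_taste = (item_embeddings["leopard_robe"] + item_embeddings["plain_towel"]) / 2

# The never-purchased scarf can be scored immediately from affinity alone.
print(cosine(user_a_taste, item_embeddings["leopard_scarf"]))  # higher: matches the leopard taste
print(cosine(user_a_taste, item_embeddings["plain_towel"]))
```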
bill cunningham
This post on Robo Bill Cunningham does a good job explaining how image recognition capabilities are opening new roads in commerce and fashion.

An elderly couple approached me after the talk. I felt a curious sense of comfort and familiarity. When I give talks, I scan the audience for signs of comprehension and approval, my attention gravitating towards eyes that emit kindness and engagement. On Thursday, one of those loci of approval was an elderly gentleman seated in the center about ten rows deep. He and his Russian companion had to have been in their late seventies or early eighties. I did not fear their questions. I embraced them with the openness that only exists when there is no expectation of judgment.

She got right to the point, her accent lilting and Slavic. “I am old,” she said, “but I would like to understand this technology. What recommendations would you give to elderly people like myself, who grew up in a different age with different tools and different mores (she looked beautifully put together in her tweed suit), to learn about this new world?”

I told her I didn’t have a good answer. The irony is that, by asking about something I don’t normally think about, she utterly stumped me. But it didn’t hurt to admit my ignorance and need to reflect. By contrast, I’m often able to conjure some plausible response to those whose opinion I worry about most, who elicit my insecurities because my sense of self is wrapped up in their approval. The left-field questions are ultimately much more interesting.

The first thing that comes to mind if we think about how AI might impact the elderly is how new voice recognition capabilities are lowering the barrier to entry to engage with complex systems. Gerontechnology is a thing, and there are many examples of businesses working to build robots to keep the elderly company or administer remote care. My grandmother, never an early adopter, loves talking to Amazon Alexa.

But the elegant Russian woman was not interested in how the technology could help her; she wanted to understand how it works. Democratizing knowledge is harder than democratizing utility, but ultimately much more meaningful and impactful (as a U Chicago alum, I endorse a lifelong life of the mind).

Then something remarkable happened. Her gentleman friend interceded with an anecdote.

“This,” he started, referring to the hearing aid he’d removed from his ear, “is an example of artificial intelligence. You can hear from my accent that I hail from the other side of the Atlantic (his accent was upper-class British; he’d studied at Harvard). Last year, we took a trip back with the family and stayed in a quintessentially British town with quintessential British pubs. I was elated by the prospect of returning to the locals of my youth, of unearthing the myriad memories lodged within childhood smells and sounds and tastes. But my first visit to a pub was intolerable! My hearing aid had become thoroughly Canadian, adapted to the acoustics of airy buildings where sound is free to move amidst tall ceilings. British pubs are confined and small! They trapped the noise and completely bombarded my hearing aid. But after a few days, it adjusted, as these devices are wont to do these days. And this adaptation, you see, shows how devices can be intelligent.”

Of course! A hearing aid is a wonderful example of an adaptive piece of technology, of something whose functionality changes automatically with context. His anecdote brilliantly showed how technologies are always more than the functionalities they provide, are rather opportunities to expose culture and anthropology: Toronto’s adolescence as a city indexed by its architecture, in contrast to the wizened wood of an old-world pub; the frustrating compromises of age and fragility, the nostalgic ideal clipped by the time the device required to recalibrate; the incredible detail of the personal as a theatrical device to illustrate the universal.

What’s more, the history of hearing aids does a nice job illustrating the more general history of technology in this our digital age.

Partial deafness is not a modern phenomenon. As everywhere, the tools to overcome it have changed shape over time.

Screen Shot 2017-11-19 at 11.39.29 AM
This 1967 British Pathé primer on the history of hearing aids is a total trip, featuring radical facial hair and accompanying elevator music. They pay special attention to using the environment to camouflage cumbersome hearing aid machinery.

One thing that stands out when you go down the rabbit hole of hearing aid history is the importance of design. Indeed, historical hearing aids were analogue, not digital. People used to use naturally occurring objects, like shells or horns, to make ear trumpets like the one pictured in the featured image above. Some, including the 18th-century portrait painter Joshua Reynolds, did not mind exposing their physical limitations publicly. Reynolds was renowned for carrying an ear trumpet and even represented his partial deafness in self-portraits painted later in life.

reynolds_self_portrait_1775_0
Reynolds’ self-portrait as deaf (1775)

Others preferred to deflect attention from their disabilities, camouflaging their tools in the environment or even transforming them into signals of power. At the height of the Napoleonic Age, King John VI of Portugal commissioned an acoustic throne with two open lion mouths at the end of the arms. These lion mouths became his makeshift ears, design transforming weakness into a token of strength; visitors were required to kneel before the chair and speak directly into the animal heads.

acoustic throne
King John VI’s acoustic throne, its lion head ears requiring submission

The advent of the telephone changed hearing aid technology significantly. Since the early 20th century, they’ve gone from electronic to transistor-based to digital. Following the exponential dynamics of Moore’s Law, their size has shrunk drastically: contemporary tyrants need not camouflage their weakness behind visual symbols of power. Only recently have hearing aids been able to dynamically adapt to their surroundings, as in the anecdote told by the British gentleman at my talk. Time will tell how they evolve in the near future. Awesome machine listening research in labs like those run by Juan Pablo Bello at NYU may unlock new capabilities where aids can register urban mood, communicating the semantics of a surrounding space as opposed to merely modulating acoustics. Making sense of sound requires slightly different machine learning techniques than making sense of images, as Bello explores in this recent paper. In 50 years’ time, modern digital hearing aids may seem as eccentric as a throne with lion-mouth ears.

The world abounds in strangeness. The saddest state of affairs is one of utter familiarity, is one where the world we knew yesterday remains the world we will know tomorrow. Is the trap of the filter bubble, the closing of the mind, the resilient force of inertia and sameness. I would have never included a hearing aid in my toolbox of metaphors to help others gain an intuition of how AI works or will be impactful. For I have never lived in the world the exact same way the British gentleman has lived in the world. Let us drink from the cup of the experiences we ourselves never have. Let us embrace the questions from left field. Let each week, let each day, open our perspectives one sliver larger than the day before. Let us keep alive the temperance of commerce and the sacred conditions of curiosity.


The featured image is of Madame de Meuron, a 20th-century Swiss aristocrat and eccentric. Meuron is like the fusion of Jean des Esseintes–the protagonist of Huysmans’s paradigmatic decadent novel, À Rebours, the poisonous book featured in Oscar Wilde’s The Picture of Dorian Gray–and Gertrude Stein or Peggy Guggenheim. She gives life to characters in Thomas Mann novels. She is a modern-day Quijote, her mores and habits out of sync with the tailwinds of modernity. Eccentricity, perhaps, the symptom of history. She viewed her deafness as an asset, not a liability, for she could control the input from her surroundings: “So ghör i nume was i wott! – So I only hear what I want to hear!”

Clinamen

The Sagrada Familia is a castle built by Australian termites.


The Sagrada Familia is not a castle built by Australian termites, and never will be. Tis utter blasphemy.


The Sagrada Familia is not a castle built by Australian termites, and yet, Look! Notice, as Daniel Dennett bids, how in an untrodden field in Australia there emerged and fell, in near silence, near but for the methodical gnawing, not unlike that of a mouse nibbling rapaciously on parched pasta left uneaten all these years but preserved under the thick dust on the thin cardboard with the thin plastic window enabling her to view what remained after she’d cooked just one serving, with butter, for her son, there emerged and fell, with the sublime transience of Andy Goldsworthy, a neo-Gothic church of organic complexity on par with that imagined by Antoni Gaudí i Cornet, whose Sagrada Familia is scheduled for completion in 2026, a full century after the architect died in a tragic tram crash, distracted by the recent rapture of his prayer.


The Sagrada Familia is not a castle built by Australian termites, and yet, Look! Notice, as Daniel Dennett bids, how in an untrodden field in Australia there emerged and fell a structure so eerily resemblant of the one Antoni Gaudí imagined before he died, neglected like a beggar in his shabby clothes, the doctors unaware they had the chance to save the mind that preempted the fluidity of contemporary parametric architectural design by some 80 odd years, a mind supple like that of Poincaré, singular yet part of a Zeitgeist bent on infusing time into space like sandalwood in oil, inseminating Euclid’s cold geometry with femininity and life, Einstein explaining why Mercury moves retrograde, Gaudí rendering the holy spirit palpable as movement in stone, fractals of repetition and difference giving life to inorganic matter, tension between time and space the nadir of spirituality, as Andrei Tarkovsky went on to explore in his films.

tarkovsky mirror
From Andrei Tarkovsky’s Mirror. As Tarkovsky wrote of his films in Sculpting in Time: “Just as a sculptor takes a lump of marble, and, inwardly conscious of the features of his finished piece, removes everything that is not a part of it — so the film-maker, from a ‘lump of time’ made up of an enormous, solid cluster of living facts, cuts off and discards whatever he does not need, leaving only what is to be an element of the finished film.”

The Sagrada Familia is not a castle built by Australian termites, and yet, Look! Notice, as Daniel Dennett bids, how in an untrodden field in Australia there emerged and fell a structure so eerily resemblant of the one Antoni Gaudí imagined before he died, with the (seemingly crucial) difference that the termites built their temple without blueprints or plan, gnawing away the silence as a collectivity of single stochastic acts which, taken together over time, result in a creation that appears, to our meaning-making minds, to have been created by an intelligent designer, this termite Sagrada Familia a marvelous instance of what Dennett calls Darwin’s strange inversion of reasoning, an inversion that admits to the possibility that absolute ignorance can serve as master artificer, that IN ORDER TO MAKE A PERFECT AND BEAUTIFUL MACHINE, IT IS NOT REQUISITE TO KNOW HOW TO MAKE IT*, that structures might emerge from the local activity of multiple parts, amino acids folding into proteins, bees flying into swarms, bumper-to-bumper traffic suddenly flowing freely, these complex release valves seeming like magic to the linear perspective of our linear minds.


The Sagrada Familia is not a castle built by Australian termites, and yet, the eerie resemblance between the termite and the tourist Sagrada Familias serves as a wonderful example to anchor a very important cultural question as we move into an age of post-intelligent design, where the technologies we create exhibit competence without comprehension, diagnosing lungs as cancerous or declaring that individuals merit a mortgage or recommending that a young woman would be a good fit for a role on a software engineering team or getting better and better at Go by playing millions of games against itself in a schizophrenic twist resemblant of the pristine pathos of Stefan Zweig, one’s own mind an asylum of exiled excellence during the travesty of the second world war, why, we’ve come full circle and stand here at a crossroads, bidden by a force we ourselves created to accept the creative potential of Lucretius’ swerve, to kneel at the altar of randomness, to appreciate that computational power is not just about shuffling 1s and 0s with speed but shuffling them fast enough to enable a tiny swerve to result in wondrous capabilities, and to watch as, perhaps tragically, we apply a framework built for intelligent design onto a Darwinian architecture, clipping the wings of stochastic potential, working to wrangle our gnawing termites into a straitjacket of cause, while the systems beating Atari, by no act of strategic foresight but by the blunt speed of iteration, make a move so strange and so outside the realm of verisimilitude that, as Kasparov succumbing to Deep Blue, we misinterpret a bug for brilliance.


The Sagrada Familia is not a castle built by Australian termites, and yet, it seems plausible that Gaudí would have reveled in the eerie resemblance between a castle built by so many gnawing termites and the temple Josep Maria Bocabella i Verdaguer, a bookseller with a popular fundamentalist newspaper, “the kind that reminded everybody that their misery was punishment for their sins,”** commissioned him to build.

Bocabella
A portrait of Josep Maria Bocabella, who commissioned Gaudí to build the Sagrada Familia.

Or would he? Gaudí was deeply Catholic. He genuflected at the temple of nature, seeing divine inspiration in the hexagons of honeycombs, imagining the columns of the Sagrada Familia to lean, like buttresses, as symbols of the divine trinity of the father (the vertical axis), son (the horizontal axis), and holy spirit (the vertical meeting the horizontal in the crux of the diagonal). His creativity, therefore, always stemmed from something more than intelligent design, stood as an act of creative prayer to render homage to God the creator by creating an edifice that transformed, in fractals of repetition and difference, inert stone into movement and life.

columns
The tops of the columns inside the Sagrada Familia have twice as many lines as the roots, the doubling generating a sense of movement and life.

The Sagrada Familia is not a castle built by Australian termites, and yet, the termite Sagrada Familia actually exists as a complete artifact, its essence revealed to the world rather than being stuck in unfinished potential. And yet, while we wait in joyful hope for its imminent completion, this unfinished, 144-year-long architectural project has already impacted so many other architects, from Frank Gehry to Zaha Hadid. This unfinished vision, this scaffold, has launched a thousand ships of beauty in so many other places, changing the skylines of Bilbao and Los Angeles and Hong Kong. Perhaps, then, the legacy of the Sagrada Familia is more like that of Jodorowsky’s Dune, an unfinished film that, even from its place of stunted potential, changed the history of cinema. Perhaps, then, the neglect the doctors showed to Gaudí, the bearded beggar distracted by his act of prayer, was one of those critical swerves in history. Perhaps, had Gaudí lived to finish his work, architects in the century since wouldn’t have been as puzzled by the parametric requirements of his curves and the building wouldn’t have gained the puzzling aura it retains to this day. Perhaps, no matter how hard we try to celebrate and accept the immense potential of stochasticity, we will always be makers of meaning, finders of cause, interpreters needing narrative to live grounded in our world. And then again, perhaps not.


The Sagrada Familia is not a castle built by Australian termites. The termites don’t care either way. They’ll still construct their own Sagrada Familia.


The Sagrada Familia is a castle built by Australian termites. How wondrous. How essential must be these shapes and forms.


The Sagrada Familia is a castle built by Australian termites. It is also an unfinished neo-Gothic church in Barcelona, Spain. Please, terrorists, please don’t destroy this temple of unfinished potential, this monad brimming with the history of the world, each turn, each swerve a pivot down a different section of the encyclopedia, coming full circle in its web of knowledge, imagination, and grace.


The Sagrada Familia is a castle built by Australian termites. We’ll never know what Gaudí would have thought about the termite castle. All we have are the relics of his Poincaréan curves, and fish lamps to illuminate our future.

fish-4
Frank Gehry’s fish lamps, which carry forth the spirit of Antoni Gaudí

*Dennett reads these words, penned in 1868 by Robert Beverley MacKenzie, with pedantic panache, commenting that the capital letters were in the original.

**Much in this post was inspired by Roman Mars’ awesome 99% Invisible podcast about the Sagrada Familia, which features the quotation about Bocabella’s newspaper.

The featured image comes from Daniel Dennett’s From Bacteria to Bach and Back. I had the immense pleasure of interviewing Dan on the In Context podcast, where we discuss many of the ideas that appear in this post, just in a much more cogent form. 

 

Degrees of Knowledge

That familiar discomfort of wanting to write but not feeling ready yet.*

(The default voice pops up in my brain: “Then don’t write! Be kind to yourself! Keep reading until you understand things fully enough to write something cogent and coherent, something worth reading.”

The second voice: “But you committed to doing this! To not write** is to fail.***”

The third voice: “Well gosh, I do find it a bit puerile to incorporate meta-thoughts on the process of writing so frequently in my posts, but laziness triumphs, and voilà there they come. Welcome back. Let’s turn it to our advantage one more time.”)

This time the courage to just do it came from the realization that “I don’t understand this yet” is interesting in itself. We all navigate the world with different degrees of knowledge about different topics. To follow Wilfred Sellars, most of the time we inhabit the manifest image, “the framework in terms of which man came to be aware of himself as man-in-the-world,” or, more broadly, the framework in terms of which we ordinarily observe and explain our world. We need the manifest image to get by, to engage with one another and not to live in a state of utter paralysis, questioning our every thought or experience as if we were being tricked by the evil genius Descartes introduces at the outset of his Meditations (the evil genius toppled by the clear and distinct force of the cogito, the I am, which, per Dan Dennett, actually had the reverse effect of fooling us into believing our consciousness is something different from what it actually is). Sellars contrasts the manifest image with the scientific image: “the scientific image presents itself as a rival image. From its point of view the manifest image on which it rests is an ‘inadequate’ but pragmatically useful likeness of a reality which first finds its adequate (in principle) likeness in the scientific image.” So we all live in this not quite reality, our ability to cooperate and coexist predicated pragmatically upon our shared not-quite-accurate truths. It’s a damn good thing the mess works so well, or we’d never get anything done.

Sellars has a lot to say about the relationship between the manifest and scientific images, how and where the two merge and diverge. In the rest of this post, I’m going to catalogue my gradual coming to not-yet-fully understanding the relationship between mathematical machine learning models and the hardware they run on. It’s spurring my curiosity, but I certainly don’t understand it yet. I would welcome readers’ input on what to read and to whom to talk to change my manifest image into one that’s slightly more scientific.

So, one common thing we hear these days (in particular given Nvidia’s now formidable marketing presence) is that graphics processing units (GPUs) and tensor processing units (TPUs) are a key hardware advance driving the current ubiquity in artificial intelligence (AI). I learned about GPUs for the first time about two years ago and wanted to understand why they made it so much faster to train deep neural networks, the algorithms behind many popular AI applications. I settled on an understanding that the linear algebra–operations we perform on vectors, strings of numbers oriented in a direction in an n-dimensional space–powering these applications is better executed on hardware of a parallel, matrix-like structure. That is to say, properties of the hardware are more like properties of the math: these chips perform so much more quickly than a linear central processing unit (CPU) because they don’t have to squeeze a parallel computation into the straitjacket of a linear, gated flow of electrons. Tensors, objects that describe the relationships between vectors, as in Google’s TPU hardware, are that much more closely aligned with the mathematical operations behind deep learning algorithms.
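Here is a minimal sketch of why the shape of the computation matters (my own toy benchmark, tied to no particular chip): the same matrix multiplication written as a Python triple loop, one multiply-add at a time, versus a single vectorized call whose parallel structure a BLAS library, a GPU, or a TPU can exploit directly.

```python
import time
import numpy as np

n = 120
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# Element by element, one multiply-add at a time: the "linear, gated" way.
def matmul_loops(A, B):
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            for k in range(A.shape[1]):
                C[i, j] += A[i, k] * B[k, j]
    return C

t0 = time.perf_counter()
C_slow = matmul_loops(A, B)
t1 = time.perf_counter()

t2 = time.perf_counter()
C_fast = A @ B   # one call expressing the whole computation as a parallel-friendly operation
t3 = time.perf_counter()

assert np.allclose(C_slow, C_fast)
print(f"triple loop: {t1 - t0:.3f}s, vectorized: {t3 - t2:.5f}s")
```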

There are two levels of knowledge there:

  • Basic sales pitch: “remember, GPU = deep learning hardware; they make AI faster, and therefore make AI easier to use so more possible!”
  • Just above the basic sales pitch: “the mathematics behind deep learning is better represented by GPU or TPU hardware; that’s why they make AI faster, and therefore easier to use so more possible!”

At this first stage of knowledge, my mind reached a plateau where I assumed that the tensor structure was somehow intrinsically and essentially linked to the math in deep learning. My brain’s neurons and synapses had coalesced on some local minimum or maximum where the two concepts were linked and reinforced by talks I gave (which by design condense understanding into some quotable meme, in particular in the age of Twitter…and this requirement to condense certainly reinforces and reshapes how something is understood).

In time, I started to explore the strange world of quantum computing, starting afresh off the local plateau to try, again, to understand new claims that entangled qubits enable even faster execution of the math behind deep learning than the soddenly deterministic bits of C, G, and TPUs. As Ivan Deutsch explains in this article, the promise behind quantum computing is as follows:

In a classical computer, information is stored in retrievable bits binary coded as 0 or 1. But in a quantum computer, elementary particles inhabit a probabilistic limbo called superposition where a “qubit” can be coded as 0 and 1.

Here is the magic: Each qubit can be entangled with the other qubits in the machine. The intertwining of quantum “states” exponentially increases the number of 0s and 1s that can be simultaneously processed by an array of qubits. Machines that can harness the power of quantum logic can deal with exponentially greater levels of complexity than the most powerful classical computer. Problems that would take a state-of-the-art classical computer the age of our universe to solve, can, in theory, be solved by a universal quantum computer in hours.

What’s salient here is that the inherent probabilism of quantum computers makes them even more fundamentally aligned with the true mathematics we’re representing with machine learning algorithms. TPUs, then, seem to exhibit a structure that best captures the mathematical operations of the algorithms, but exhibit the fatal flaw of being deterministic by essence: they’re still trafficking in the binary digits of 1s and 0s, even if they’re allocated in a different way. Quantum computing seems to bring back an analog computing paradigm, where we use aspects of physical phenomena to model the problem we’d like to solve. Quantum, of course, exhibits this special fragility where, should the balance of the system be disrupted, the probabilistic potential reverts down to the boring old determinism of 1s and 0s: a cat observed will be either dead or alive, the harsh law of the excluded middle haunting our manifest image.
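To make the “exponentially many amplitudes” claim slightly more tangible, here is a minimal numpy simulation of my own (emphatically not a real quantum computer) of two entangled qubits: n qubits require 2**n complex amplitudes, and measuring the resulting Bell state yields 00 or 11 with equal probability, never 01 or 10.

```python
import numpy as np

# Two qubits = a state vector of 2**2 = 4 complex amplitudes over |00>, |01>, |10>, |11>.
state = np.array([1, 0, 0, 0], dtype=complex)   # start in |00>

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # Hadamard gate: creates superposition
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],                  # flips the second qubit
                 [0, 1, 0, 0],                  # when the first qubit is 1
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# Hadamard on the first qubit, then CNOT: the textbook recipe for a Bell (entangled) state.
state = CNOT @ (np.kron(H, I) @ state)

probabilities = np.abs(state) ** 2
for basis, p in zip(["00", "01", "10", "11"], probabilities):
    print(f"P({basis}) = {p:.2f}")   # 0.50, 0.00, 0.00, 0.50: the two qubits are entangled
```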

What, then, is the status of being of the math? I feel a risk of falling into Platonism, of assuming that a statement like “3 is prime” refers to some abstract entity, the number 3, that then gets realized in a lesser form as it is embodied on a CPU, GPU, or cup of coffee. It feels more cogent to me to endorse mathematical fictionalism, where mathematical statements like “3 is prime” tell a different type of truth than truths we tell about objects and people we can touch and love in our manifest world.****

My conclusion, then, is that radical creativity in machine learning–in any technology–may arise from our being able to abstract the formal mathematics from their substrate, to conceptually open up a liminal space where properties of equations have yet to take form. This is likely a lesson for our own identities, the freeing from necessity, from assumption, that enables us to come into the self we never thought we’d be.

I have a long way to go to understand this fully, and I’ll never understand it fully enough to contribute to the future of hardware R&D. But the world needs communicators, translators who eventually accept that close enough can be a place for empathy, and growth.


*This holds not only for writing, but for many types of doing, including creating a product. Agile methodologies help overcome the paralysis of uncertainty, the discomfort of not being ready yet. You commit to doing something, see how it works, see how people respond, see what you can do better next time. We’re always navigating various degrees of uncertainty, as Rich Sutton discussed on the In Context podcast. Sutton’s formalization of doing the best you can with the information you have available today towards some long-term goal, but learning at each step rather than waiting for the long-term result, is called temporal-difference learning.
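A minimal sketch of the idea (my own toy example, not Sutton’s code): TD(0) nudges the value of the current state toward the reward it just received plus the discounted value of the state it landed in, learning at every step instead of waiting for the episode’s final result.

```python
import random

random.seed(0)

# A tiny corridor: states 0..4, start in the middle, +1 reward for reaching state 4.
N_STATES, GOAL = 5, 4
V = [0.0] * N_STATES          # value estimates, updated online
alpha, gamma = 0.1, 0.9       # learning rate and discount factor

for episode in range(2000):
    s = 2
    while s != GOAL and s != 0:
        s_next = s + random.choice([-1, 1])          # wander left or right
        reward = 1.0 if s_next == GOAL else 0.0
        # TD(0): adjust V(s) toward the one-step target r + gamma * V(s'),
        # without waiting to see how the episode ultimately ends.
        V[s] += alpha * (reward + gamma * V[s_next] - V[s])
        s = s_next

print([round(v, 2) for v in V])   # values rise toward the goal end of the corridor
```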

**Split infinitive intentional.

***Who’s keeping score?

****That’s not to say we can’t love numbers, as Euler’s Identity inspires enormous joy in me, or that we can’t love fictional characters, or that we can’t love misrepresentations of real people that we fabricate in our imaginations. I’ve fallen obsessively in love with 3 or 4 imaginary men this year, creations of my imagination loosely inspired by the real people I thought I loved.

The image comes from this site, which analyzes themes in films by Darren Aronofsky. Maximilian Cohen, the protagonist of Pi, sees mathematical patterns all over the place, which eventually drives him to put a drill into his head. Aronofsky has a penchant for angst. Others, like Richard Feynman, find delight in exploring mathematical regularities in the world around us. Soap bubbles, for example, offer incredible complexity, if we’re curious enough to look.

Macro_Photography_of_a_soap_bubble
The arabesques of a soap bubble

 

AI Standing On the Shoulders of Giants

My dear friend and colleague Steve Irvine and I will represent our company integrate.ai at the ElevateToronto Festival this Wednesday (come say hi!). The organizers of a panel I’m on asked us to prepare comments about what makes an “AI-First Organization.”

There are many bad answers to this question. It’s not helpful for business leaders to know that AI systems can just about reliably execute perception tasks like recognizing a puppy or kitty in a picture. Executives think that’s cute, but can’t for the life of them see how that would impact their business. Seeing the parallels requires synthetic thinking and expertise in AI: the ability to see how the properties of a business’s data set are structurally similar to those of the pixels in an image, which would merit applying a similar mathematical model to two problems that instantiate themselves quite differently in particular contexts. Most often, therefore, being exposed to fun breakthroughs leads to frustration. Research stays divorced from commercial application.

Another bad answer is to mindlessly mobilize hype to convince businesses they should all be AI First. That’s silly.

On the one hand, as Bradford Cross convincingly argues, having “AI deliver core value” is a pillar of a great vertical AI startup. Here, AI is not an afterthought added like a domain suffix to secure funding from trendy VCs, but rather a necessary and sufficient condition of solving an end user problem. Often, this core competency is enhanced by other statistical features. For example, while the core capability of satellite analysis tools like Orbital Insight or food recognition tools like Bitesnap is image recognition*, the real value to customers arises with additional statistical insights across an image set (Has the number of cars in this Walmart parking lot increased year over year? To feel great on my new keto diet, what should I eat for dinner if I’ve already had two sausages for breakfast?).

On the other hand, most enterprises have been in business for a long time and have developed the Clayton Christensen armature of instilled practices and processes that make it too hard to flip a switch and just become AI First. (As Gottfried Leibniz said centuries before Darwin, natura non facit saltum – nature does not make jumps.) One false assumption about enterprise AI is that large companies have lots of data and therefore offer ripe environments for AI applications. Most have lots of data indeed, but have not historically collected, stored, or processed their data with an eye towards AI. That creates a very different data environment than those found at Google or Facebook, requiring tedious work to lay the foundations to get started. The most important thing enterprises need to keep in mind is never to let perfection be the enemy of the good, knowing that no company has perfect data. Succeeding with AI takes a guerrilla mindset, a willingness to make do with close enough and the knack of breaking down the ideal application into little proofs of concept that can set the ball rolling down the path towards a future goal.

Screen Shot 2017-09-10 at 12.14.38 PM
The swampy reality of working with enterprise data.

What large enterprises do have is history. They’ve been in business for a while. They’ve gotten really good at doing something; it’s just not always something a large market still wants or needs. And while it’s popular for executives to say that they are “a technology company that just so happens to be a financial services/healthcare/auditing/insurance company,” I’m not sure this attitude delivers the best results for AI. Instead, I think it’s more useful for each enterprise to own up to its identity as a Something-Else-First company, but to add a shift in perspective to go from a Just-Plain-Old-Something-Else-First Company to a Something-Else-First-With-An-AI-Twist company.

The shift in perspective relates to how an organization embodies its expertise and harnesses traces of past work.** AI enables a company to take stock of the past judgments, work product, and actions of its employees – a vast archive of years of expertise in being Something-Else-First – and concatenate these past actions to either automate or inform a present action.

To be pithy, AI makes it easier for us to stand on the shoulders of giants.

An anecdote helps illustrate what this change in perspective might look like in practice. A good friend did his law degree ten years ago at Columbia. One final exam exercise was to read up on a case and write how a hypothetical judge would opine. Having procrastinated until the last minute, my friend didn’t have time to read and digest all the materials. What he did have was a study guide comprising answers former Columbia law students had given to the same exam question for the past 20 years. And this gave him a brilliant idea. As students all need high LSAT scores and strong transcripts to get into Columbia Law, he reasoned, one can assume that all past students have more or less the same capability of answering the question. So wouldn’t he do a better job predicting a judge’s opinion by finding the average answer from hundreds of similarly qualified students rather than just reporting his own opinion? So as opposed to reading the primary materials, he did a statistical analysis of secondary materials, an analysis of the judgments that others in his position had given for the same task. When he handed in his assignment, the professor remarked on the brilliance of the technique, but couldn’t reward him with a good grade because it missed the essence of what he was being tested for. It was a different style of work, a different style of jurisprudence.

Something-Else-First AI organizations work similarly. Instead of training each individual employee to do the same task, perhaps in a way similar to those of the past, perhaps with some new nuance, organizations capture past judgments and actions across a wide base of former employees and use these judgments – these secondary sources – to inform current actions. With enough data to train an algorithm, the actions might be completely automated. Most often there’s not enough to achieve satisfactory accuracy in the predictions, and organizations instead present guesses to current employees, who can provide feedback to improve performance in the future.

This ability to recycle past judgments and actions is very powerful. Outside enterprise applications, AI’s power to fast-forward our standing on the shoulders of giants is shifting our direction as a species. Feedback loops like filtering algorithms on social media sites have the potential to keep us mired in an infantile past, with consequences that have been dangerous for democracy. We have to pay attention to that, as news and the exchange of information, all the way back to de Tocqueville, have always been key to democracy. Expanding self-reflexive awareness broadly across different domains of knowledge will undoubtedly change how disciplines evolve going forward. I remain hopeful, but believe we have some work to do to prepare the citizenry and workforce of the future.


*Image recognition algorithms do a great job showing why it’s dangerous for an AI company to bank its differentiation and strategy on an algorithmic capability as opposed to a unique ability to solve a business problem or amass a proprietary data set. Just two years ago, image recognition was a breakthrough capability just making its way to primetime commercial use. This June, Google released image recognition code for free via its Tensorflow API. That’s a very fast turnaround from capability to commodity, a transition of great interest to my former colleagues at Fast Forward Labs.

**See here for ethical implications of this backward-looking temporality.

The featured image comes from a twelfth-century manuscript by neo-platonist philosopher Bernard de Chartres. It illustrates this quotation: 

“We are like dwarfs on the shoulders of giants, so that we can see more than they, and things at a greater distance, not by virtue of any sharpness of sight on our part, or any physical distinction, but because we are carried high and raised up by their giant size.”

It’s since circulated from Newton to Nietzsche, each indicating indebtedness to prior thinkers as inspiration for present insights and breakthroughs. 

The Temporality of Artificial Intelligence

Nothing sounds more futuristic than artificial intelligence (AI). Our predictions about the future of AI are largely shaped by science fiction. Go to any conference, skim any WIRED article, peruse any gallery of stock images depicting AI*, and you can’t help but imagine AI as a disembodied cyberbabe (as in Spike Jonze’s Her), a Tin Man (who just wanted a heart!) gone rogue (as in the Terminator), or, my personal favorite, a brain out-of-the-vat-like-a-fish-out-of-water-and-into-some-non-brain-appropriate-space-like-a-robot-hand-or-an-android-intestine (as in Krang in the Ninja Turtles).

Screen Shot 2017-07-16 at 9.11.35 AM
A legit AI marketing photo!
Screen Shot 2017-07-16 at 9.12.33 AM
Krang should be the AI mascot, not the Terminator!

The truth is, AI looks more like this:

Screen Shot 2017-07-16 at 9.16.46 AM
A slide from Pieter Abbeel’s lecture at MILA’s Reinforcement Learning Summer School.

Of course, it takes domain expertise to picture just what kind of embodied AI product such formal mathematical equations would create. Visual art, argued Gene Kogan, a cosmopolitan coder-artist, may just be the best vehicle we have to enable a broader public to develop intuitions of how machine learning algorithms transform old inputs into new outputs.

 

One of Gene Kogan‘s beautiful machine learning recreations.

What’s important is that our imagining AI as superintelligent robots — robots that process and navigate the world with similar-but-not-similar-enough minds, lacking values and the suffering that results from being social — precludes us from asking the most interesting philosophical and ethical questions that arise when we shift our perspective and think about AI as trained on past data and working inside feedback loops contingent upon prior actions.

Left unchecked, AI may actually be an inherently conservative technology. It functions like a time warp, capturing trends in human behavior from our near past and projecting them into our near future. As Alistair Croll recently argued, “just because [something was] correct in the past doesn’t make it right for the future.”

Our Future as Recent Past: The Case of Word Embeddings

In graduate school, I frequently had a jarring experience when I came home to visit my parents. I was in my late twenties, and was proud of the progress I’d made evolving into a more calm, confident, and grounded me. But the minute I stepped through my parents’ door, I was confronted with the reflection of a past version of myself. Logically, my family’s sense of my identity and personality was frozen in time: the last time they’d engaged with me on a day-to-day basis was when I was 18 and still lived at home. They’d anticipate my old habits, tiptoeing to avoid what they assumed would be a trigger for anxiety. Their behavior instilled doubt. I questioned whether the progress I assumed I’d made was just an illusion, and quickly fell back into old habits.

In fact, the discomfort arose from a time warp. I had progressed, I had grown, but my parents projected the past me onto the current me, and I regressed under the impact of their response. No man is an island. Our sense of self is determined not only by some internal beacon of identity, but also (for some, mostly) by the self we interpret ourselves to be given how others treat us and perceive us. Each interaction nudges us in some direction, which can be a regression back to the past or a progression into a collective future.

AI systems have the potential to create this same effect at scale across society. The shock we feel upon learning that algorithms automating job ads show higher-paying jobs to men rather than women, or that recidivism-prediction tools place African-American males at higher risk than other races and classes, results from recapitulating issues we assume society has already advanced beyond. Sometimes we have progressed, and the tools are simply reflections of the real-world prejudices of yore; sometimes we haven’t progressed as much as we’d like to pretend, and the tools are barometers for the hard work required to make the world a world we want to live in.

Consider research from 2016 by Bolukbasi and others on a popular natural language processing (NLP) technique called word embeddings.**

The essence of NLP is to make human talk (grey, messy, laden with doubts and nuances and sarcasm and local dialects and….) more like machine talk (black and white 1s and 0s). Historically, NLP practitioners did this by breaking down language into different parts and using those parts as entities in a system.

tree why_graphs002
Tree graphs parsing language into parts, inspired by linguist Noam Chomsky.

Naturally, this didn’t get us as far as we’d hoped. With the rise of big data in the 2000s, many in the NLP community adopted a new approach based on statistics. Instead of teasing out structure in language with trees, they used massive processing power to find repeated patterns across millions of example sentences. If two words (or three, or four, or the general case, n) appeared multiple times in many different sentences, programmers assumed the statistical significance of that word pair conferred semantic meaning. Progress was made, but this n-gram technique failed to capture long-term, hierarchical relationships in language: how words at the end of a sentence or paragraph inflect the meaning of the beginning, how context inflects meaning, how other nuances make language different from a series of transactions at a retail store.
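
As a toy illustration of the n-gram idea (here n = 2), on a made-up three-sentence corpus: count adjacent word pairs and treat the frequent ones as carrying meaning. Anything about long-range structure is invisible to the counts.

```python
from collections import Counter

# Toy corpus (made up). An n-gram model just counts adjacent word tuples.
corpus = [
    "the doctor examined the patient",
    "the nurse examined the patient",
    "the patient thanked the doctor",
]

bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    bigrams.update(zip(words, words[1:]))

# Frequent pairs like ("the", "patient") are taken as statistically
# meaningful; long-range dependencies never enter the picture.
print(bigrams.most_common(3))
```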

Word embeddings, made popular in 2013 with a Google technique called word2vec, use a vector, a string of numbers pointing in some direction in an N-dimensional space***, to capture (more of) the nuances of contextual and long-term dependencies (the 6589th number in the string, inflected in the 713th dimension, captures the potential relationship between a dangling participle and the subject of the sentence with 69% accuracy). This conceptual shift is powerful: instead of forcing simplifying assumptions onto language, imposing arbitrary structure to make language digestible for computers, these embedding techniques accept that meaning is complex, and therefore must be processed with techniques that can harness and harvest that complexity. The embeddings make mathematical mappings that capture latent relationships our measly human minds may not be able to see. This has led to breakthroughs in NLP, like the ability to automatically summarize text (albeit in a pretty rudimentary way…) or improve translation systems.

With great power, of course, comes great responsibility. To capture more of the inherent complexity in language, these new systems require lots of training data, enough to capture patterns versus one-off anomalies. We have that data, and it dates back into our recent – and not so recent – past. And as we excavate enough data to unlock the power of hierarchical and linked relationships, we can’t help but confront the lapsed values of our past.

Indeed, one powerful property of word embeddings is their ability to perform algebra that represents analogies. For example, if we input: “man is to woman as king is to X?” the computer will output: “queen!” Using embedding techniques, this operation is conducted with vectors – strings of numbers mapped in space – as proxies for analogy: if the vector pointing from “man” to “woman” has roughly the same length and direction as the vector pointing from “king” to some candidate word, we consider the two word pairs semantically related, and that candidate (“queen”) completes the analogy.

embeddings
Embeddings use vectors as a proxy for semantics and syntax.
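
Here is a sketch of that analogy arithmetic using gensim’s KeyedVectors interface; it assumes the pretrained GloVe vectors that ship through gensim’s downloader (a one-time download), so treat the exact model name as an assumption rather than a recommendation.

```python
# Analogy by vector arithmetic with word2vec-style embeddings.
import gensim.downloader as api

# Pretrained 50-dimensional GloVe vectors (assumed available via the downloader).
vectors = api.load("glove-wiki-gigaword-50")

# "man is to woman as king is to ?"  ->  king - man + woman
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # typically [('queen', ...)] for these pretrained vectors
```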

Now, Bolukbasi and fellow researchers dug into this technique and found some relatively disturbing results.

Screen Shot 2017-07-30 at 10.27.32 AM

It’s important we remember that the AI systems themselves are neutral, not evil. They’re just going through the time warp, capturing and reflecting past beliefs of our society that leave traces in our language. The problem is, if we are unreflective and only gauge the quality of our systems based on the accuracy of their output, we may create really accurate but really conservative or racist systems (remember Microsoft Tay?). We need to take a proactive stance to make sure we don’t regress back to old patterns we thought we’d moved past. Our psychology is pliable, and it’s very easy for our identities to adapt to the reflections we’re confronted with in the digital and physical world.

Bolukbasi and his co-authors took an interesting, proactive approach to debiasing their system, which involved mapping the words associated with gender in two dimensions, where the X axis represented gender (girls to the left and boys to the right). Words associated with gender but that don’t stir sensitivities in society were mapped under the X axis (e.g., girl : sister :: boy : brother). Words that do stir sensitivities (e.g., girl : tanning :: boy : firepower) were forced to collapse down to the Y axis, stripping them of any gender association.

Screen Shot 2017-07-30 at 10.32.47 AM
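
A numpy sketch in the spirit of that neutralizing step: estimate a gender direction from a paired difference and strip that component out of words that shouldn’t carry it. The vectors below are random placeholders standing in for learned embeddings, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned embeddings (in practice these come from word2vec).
vecs = {w: rng.normal(size=50) for w in ["he", "she", "doctor", "brother"]}

# Approximate the gender direction as the difference between paired words.
g = vecs["she"] - vecs["he"]
g = g / np.linalg.norm(g)

def neutralize(v, direction):
    """Remove the component of v that lies along the gender direction."""
    return v - np.dot(v, direction) * direction

# "doctor" should carry no gender component; "brother" is legitimately gendered.
vecs["doctor"] = neutralize(vecs["doctor"], g)
print(np.dot(vecs["doctor"], g))   # ~0: no projection left on the gender axis
```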

Their efforts show what mindfulness may look like in the context of algorithmic design. Just as we can’t run away from the inevitable thoughts and habits in our mind, given that they arise from our past experience, the stuff that shapes our minds to make us who we are, so too we can’t run away from the past actions of ourselves and our society. It doesn’t help our collective society to blame the technology as evil, just as it doesn’t help any individual to repress negative emotions. We are empowered when we acknowledge them for what they are, and proactively take steps to silence and harness them so they don’t perpetuate themselves in the future. This level of awareness is required for us to make sure AI is actually a progressive, futuristic technology, not one that traps us in the unfortunate patterns of our collective past.

Conclusion

This is one narrow example of the ethical and epistemological issues created by AI. In a future blog post in this series, I’ll explore how reinforcement learning frameworks – in particular contextual bandit algorithms – shape and constrain the data collected to train their systems, often in a way that mirrors the choices and constraints we face when we make decisions in real life.


*Len D’Avolio, Founder CEO of healthcare machine learning startup Cyft, curates a Twitter feed of the worst-ever AI marketing images every Friday. Total gems.

**This is one of many research papers on the topic. FAT ML is a growing community focused on fairness, accountability, and transparency in machine learning. The brilliant Joanna Bryson has written articles about bias in NLP systems. Cynthia Dwork and Toni Pitassi are focusing more on bias (though they still do great work on differential privacy). Blaise Aguera y Arcas’ research group at Google thinks deeply about ethics and policy and recently published an article debunking the use of physiognomy to predict criminality. My colleague Tyler Schnoebelen recently gave a talk on ethical AI product design at Wrangle. The list goes on.

***My former colleague Hilary Mason loved thinking about the different ways we imagine spaces of 5 dimensions or greater.

The featured image is from Swedish film director Ingmar Bergman’s Wild Strawberries (1957). Bergman’s films are more like philosophical essays than Hollywood thrillers. He uses the medium, with its ineluctable flow, its ineluctable passage of time, to ponder the deepest questions of meaning and existence. A clock without hands, at least if we’re able to notice it, as our mind’s eye likely fills in the semantic gaps with the regularity of practice and habit. The eyes below betokening what we see and do not see. Bergman died on July 30, 2007, the same day as Michelangelo Antonioni, his Italian counterpart. For me, the coincidence was as meaningful as the deaths of John Adams and Thomas Jefferson on July 4, 1826.

The Unreasonable Effectiveness of Proxies*

Imagine it’s December 26. You’re right smack in the midst of your Boxing Day hangover, feeling bloated and headachy and emotionally off from the holiday season’s interminable festivities. You forced yourself to eat Aunt Mary’s insipid green bean casserole out of politeness and put one too many shots of dark rum in your eggnog. The chastising power of the prefrontal cortex superego is in full swing: you start pondering New Year’s Resolutions.

Lose weight! Don’t drink red wine for a year! Stop eating gluten, dairy, sugar, processed foods, high-fructose corn syrup–just stop eating everything except kale, kefir, and kimchi! Meditate daily! Go be a free spirit in Kerala! Take up kickboxing! Drink kombucha and vinegar! Eat only purple foods!

Right. Check.

(5:30 pm comes along. Dad’s offering single malt scotch. Sure, sure, just a bit…neat, please…)**

We’re all familiar with how hard it is to set and stick to resolutions. That’s because our brains have little instant gratification monkeys flitting around on dopamine highs in constant guerrilla warfare against the Rational Decision Maker in the prefrontal cortex (Tim Urban’s TEDtalk on procrastination is a complete joy). It’s no use beating ourselves up over a physiological fact. The error of Western culture, inherited from Catholicism, is to stigmatize physiology as guilt, transubstantiating chemical processes into vehicles of self deprecation with the same miraculous power used to transform just-about-cardboard wafers into the living body of Christ. Eastern mindsets, like those proselytized by Buddha, are much more empowering and pragmatic: if we understand our thoughts and emotions to be senses like sight, hearing, touch, taste, smell, we can then dissociate self from thoughts. Our feelings become nothing but indices of a situation, organs to sense a misalignment between our values–etched into our brains as a set of habitual synaptic pathways–and the present situation around us. We can watch them come in, let them sit there and fester, and let them gradually fade before we do something we regret. Like waiting out the internal agony until the baby in front of you in 27G on your overseas flight to Sydney stops crying.

Resolutions are so hard to keep because we frame them the wrong way. We often set big goals, things like, “in 2017 I’ll lose 30 pounds” or “in 2017 I’ll write a book.” But a little tweak to the framework can promote radically higher chances for success. We have to transform a long-term, big, hard-to-achieve goal into a short-term, tiny, easy-to-achieve action that is correlated with that big goal. So “lose weight” becomes “eat an egg rather than cereal for breakfast.” “Write a book” becomes “sit down and write for 30-minutes each day.” “Master Mandarin Chinese” becomes “practice your characters for 15 minutes after you get home from work.” The big, scary, hard-to-achieve goal that plagues our consciousness becomes a small, friendly, easy-to-achieve action that provides us with a little burst of accomplishment and satisfaction. One day we wake up and notice we’ve transformed.

It’s doubtful that the art of finding a proxy for something that is hard to achieve or know is the secret of the universe. But it may well be the secret to adapting the universe to our measly human capabilities, both at the individual (transform me!) and collective (transform my business!) level. And the power extends beyond self-help: it’s present in the history of mathematics, contemporary machine learning, and contemporary marketing techniques known as growth hacking.

Ut unum ad unum, sic omnia ad omnia: Archimedes, Cavalieri, and Calculus

Many people are scared of math. Symbols are scary: they’re a type of language and it takes time and effort to learn what they mean. But most of the time people struggle with math because they were badly taught. There’s no clearer example of this than calculus, where kids memorize formulas stating that something is so instead of conceptually grasping why it is so.

The core technique behind calculus–and I admit this just scratches the surface–is to reduce something that’s hard to know down to something that’s easy to know. Slope is something we learn in grade school: change in y divided by change in x, how steep a line is. Taking the derivative is doing this same process but on a twisting, turning, meandering curve rather than just a line. This becomes hard because we add another dimension to the problem: with a line, the slope is the same no matter what x we put in; with a curve, the slope changes with our x input value, like a mountain range undulating from mesa to vertical extreme cliff. What we do in differential calculus is find a way to make a line serve as a proxy for a curve, to turn something we don’t know how to do into something we do know how to do. So we take magnifying glasses with ever-increasing potency and zoom in until our topsy-turvy meandering curve becomes nothing but a straight line; we find the slope; and then we stitch those little slopes together all the way across our curve. The big conceptual breakthrough Newton and Leibniz made in the 17th century was to turn this proxy process into something continuous and infinite: to cross a conceptual chasm between a very, very small number and a number so small that it was effectively zero. Substituting close-enough-for-government-work-zero with honest-to-goodness-zero did not go without strong criticism from the likes of George Berkeley, a prominent philosopher of the period who argued that it’s impossible for us to say anything about the real world because we can only know how our minds filter the real world. But its pragmatic power to articulate the mechanics of celestial motions overcame such conceptual trifles.***
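
A quick numerical way to see the zooming-in move: shrink h and watch the slope of the little secant line settle toward the true derivative (here for f(x) = x² at x = 3, chosen arbitrarily).

```python
# Zooming in on a curve: as h shrinks, the slope of the secant line through
# (x, f(x)) and (x+h, f(x+h)) settles toward the derivative at x.
def f(x):
    return x ** 2

x = 3.0
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    slope = (f(x + h) - f(x)) / h
    print(h, slope)    # approaches 6, the true derivative of x**2 at x = 3
```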

riemann sum
Riemann Sums use the same proxy method to find the area under a curve. One replaces the hard task of measuring the area directly with the easier task of summing up the areas of rectangles that approximate the curve.
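
The rectangle trick in a few lines, for f(x) = x² on [0, 1], whose true area is 1/3:

```python
# Riemann sum: replace "area under a curve" with "sum of thin rectangles".
def riemann_area(f, a, b, n):
    width = (b - a) / n
    # Left-endpoint rectangles; more rectangles -> better approximation.
    return sum(f(a + i * width) * width for i in range(n))

for n in [10, 100, 1000, 10000]:
    print(n, riemann_area(lambda x: x ** 2, 0.0, 1.0, n))   # -> 1/3
```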

This type of thinking, however, did not start in the 17th century. Greek mathematicians like Archimedes (famous for screaming Eureka! (I’ve found it!) and running around naked like a madman when he noticed that water levels in the bathtub rose in proportion to the volume of his submerged body) used its predecessor, the method of exhaustion, to find the area of a shape like a circle or a blob by inscribing within it a series of easier-to-measure shapes like polygons or squares, approximating the area by proxy of the polygons.

exhaustion
The method of exhaustion in ancient Greek math.
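
Numerically, the exhaustion idea looks like this: the easy-to-compute area of a regular n-gon inscribed in a unit circle, (n/2)·sin(2π/n), creeps up toward π as n grows.

```python
import math

# Method of exhaustion, numerically: inscribe regular polygons in a unit
# circle and watch their easy-to-compute areas close in on the circle's.
def inscribed_polygon_area(n):
    return 0.5 * n * math.sin(2 * math.pi / n)

for n in [6, 12, 24, 96, 1000]:
    print(n, inscribed_polygon_area(n))   # approaches pi ≈ 3.14159
```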

It’s challenging for us today to reimagine what Greek geometry was like because we’re steeped in a post-Cartesian mindset, where there’s an equivalence between algebraic expressions and geometric shapes. The Greeks thought about shapes as shapes. The math was tactical, physical, tangible. This mindset leads to interesting work in the Renaissance like Bonaventura Cavalieri’s method of indivisibles, which showed that the areas of two shapes were equivalent (often a hard thing to show) by cutting the shapes into parts and showing that each of the parts were equivalent (an easier thing to show). He turns the problem of finding equivalence into an analogy, ut unum ad unum, sic omnia ad omnia–as the one is to the one, so all are to all–substituting the part for the whole to turn this into a tractable problem. His work paved the way for what would eventually become the calculus.****

Supervised Machine Learning for Dummies

My dear friend Moises Goldszmidt, currently Principal Research Scientist at Apple and a badass Jazz musician, once helped me understand that supervised machine learning is quite similar.

Again, at an admittedly simplified level, machine learning can be divided into two camps. Unsupervised machine learning uses computers to find patterns in data and sort data points into clusters. When most people hear the words machine learning, they think about unsupervised learning: computers automagically finding patterns, “actionable insights,” in data that would evade detection by measly human minds. In fact, unsupervised learning remains an open area of research in the upper echelons of the machine learning community. It can be valuable for exploratory data analysis, but only infrequently powers the products that are making news headlines. The real hero of the present day is supervised learning.

I like to think about supervised learning as follows:

Screen Shot 2017-07-02 at 9.51.14 AM

Let’s take a simple example. We’re moving, and want to know how much to put our house on the market for. We’re not real estate brokers, so we’re not great at measuring prices. But we do have a tape measure, so we are great at measuring the square footage of our house. Let’s say we go look through a few years of real estate records and find a bunch of data points about how much houses go for and what their square footage is. We also have data about location, amenities like an in-house washer and dryer, and whether the house has a big back yard. We notice a lot of variation in prices for houses with different-sized back yards, but a pretty consistent correlation between square footage and price. Eureka! we say, and run around the neighbourhood naked, horrifying our neighbours! We can just plot the various data points of square footage : price, measure our square footage (we do have our handy tape measure), and then put that into a function that outputs a reasonable price!

This technique is called linear regression. And it’s the basis for many data science and machine learning techniques.
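
A sketch of that house-price example with scikit-learn; the square footages and prices below are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up records: square footage -> sale price (illustrative numbers only).
sqft = np.array([[850], [1100], [1400], [1600], [2000], [2300]])
price = np.array([199_000, 245_000, 310_000, 355_000, 430_000, 495_000])

model = LinearRegression().fit(sqft, price)

# Our house, measured with the trusty tape measure.
our_sqft = np.array([[1750]])
print(model.predict(our_sqft))          # a reasonable asking price, given the fit
print(model.coef_, model.intercept_)    # dollars per extra square foot, and baseline
```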

Screen Shot 2017-07-02 at 9.57.31 AM

The big breakthroughs in deep learning over the past couple of years (note, these algorithms existed for a while, but they are now working thanks to more plentiful and cheaper data, faster hardware, and some very smart algorithmic tweaks) are extensions of this core principle, but they add the following two capabilities (which are significant):

  • Instead of humans hand-selecting a few simple features (like square footage or having a washer/dryer), computers transform rich data into a vector of numbers and find all sorts of features that might evade our measly human minds
  • Instead of only being able to model phenomena with straight lines, deep learning neural networks can model phenomena with topsy-turvy, twisty functions, which means they can capture richer phenomena like the environment around a self-driving car (a minimal sketch of this difference follows below)
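
The sketch promised above: a straight line cannot follow a wiggly target, but a small neural network with stacked nonlinearities can. It uses scikit-learn’s MLPRegressor on a made-up sine curve, purely as an illustration of the second bullet.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# A wiggly target no straight line can follow.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(2 * X).ravel()

line = LinearRegression().fit(X, y)
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                   random_state=0).fit(X, y)

# The network's stacked nonlinearities track the curve; the line cannot.
print("linear fit error:", np.mean((line.predict(X) - y) ** 2))
print("neural net error:", np.mean((net.predict(X) - y) ** 2))
```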

At its root, however, even deep learning is about using mathematics to identify a good proxy to represent a more complex phenomenon. What’s interesting is that this teaches us something about the representational power of language: we barter in proxies at every moment of every day, crystallizing the complexities of the world into little tokens, words, that we use to exchange our experience with others. These tokens mingle and merge to create new tokens, new levels of abstraction, adding form from the dust from which we’ve come and to which we will return. Our castles in the sky. The quixotic figures of our imagination. The characters we fall in love with in books, not giving a damn that they never existed and never will. And yet, children learn that dogs are dogs and cats are cats after only seeing a few examples; computers, at least today, need 50,000 pictures of dogs to identify the right combinations of features that serve as a decent proxy for the real thing. Reducing that quantity is an active area of research.

Growth Hacking: 10 Friends in 14 Days

I’ve spent the last month in my new role at integrate.ai talking with CEOs and innovation leaders at large B2C businesses across North America. We’re in that miraculously fun, pre product-market fit phase of startup life where we have to make sure we are building a product that will actually solve a real, impactful, valuable business problem. The possibilities are broad and we’re managing more unknown unknowns than found in a Donald Rumsfeld speech (hat tip to Keith Palumbo of Cylance for the phrase). But we’re starting to see a pattern:

  • B2C businesses have traditionally focused on products, not customers. Analytics have been geared towards counting how many widgets were sold. They can track how something moves across a supply chain, but cannot track who their customers are, where they show up, and when. They can no longer compete on just product. They want to become customer centric.
  • All businesses are sustained by having great customers. Great means loyalty to and alignment with the brand, and a high lifetime value. Great customers buy, they buy more, they don’t stop buying, and there’s a positive association when they refer the brand to others, particularly others who behave like them.
  • Wanting great customers is not a good technical analytics problem. It’s too fuzzy. So we have to find a way to transform a big objective into a small proxy, and focus energy and efforts on doing stuff in that small proxy window. Not losing weight, but eating an egg instead of pancakes for breakfast every morning.

Silicon Valley giants like Facebook call this type of thinking growth hacking: finding some local action you can optimize for that is a leading indicator of a long-term, larger strategic goal. The classic example from Facebook (which some rumour to be apocryphal, but it’s awesome as an example) was when the growth team realized that the best way to achieve their large, hard-to-achieve metric of having as many daily active users as possible was to reduce it to a smaller, easy-to-achieve metric of getting new users up to 10 friends in their first 14 days. 10 was the threshold at which people appreciate the social value of the site, a quantity of friends sufficient to drive the dopamine hits that keep users coming back.***** These techniques are rampant across Silicon Valley, with Netflix optimizing site layout and communications when new users join given correlations with potential churn rates down the line, and Eventbrite making small product tweaks to help users understand they can use the tool to organize as well as attend events. The real power they unlock is similar to that of compound interest in finance: a small investment in your twenties can lead to massive returns after retirement.

Our goal at integrate.ai is to bring this thinking into traditional enterprises via a SaaS platform, not a consulting services solution. And to make that happen, we’re also scouting small, local wins that we believe will be proxies for our long-term success.

Conclusion

The spirit of this post is somewhat similar to a previous post about artifice as realism. There, I surveyed examples of situations where artifice leads to a deeper appreciation of some real phenomenon, like when Mendel created artificial constraints to illuminate the underlying laws of genetics. Proxies aren’t artifice, they’re parts that substitute for wholes, but enable us to understand (and manipulate) wholes in ways that would otherwise be impossible. Doorways into potential. A shift in how we view problems that makes them tractable for us, and can lead to absolutely transformative results. This takes humility. The humility of analysis. The practice of accepting the unreasonable effectiveness of the simple.


*Shout out to the amazing Andrej Karpathy, who authored The Unreasonable Effectiveness of Recurrent Neural Networks and Deep Reinforcement Learning: Pong from Pixels, two of the best blogs about AI available.

**There’s no dearth of self-help books about resolutions and self-transformation, but most of them are too cloying to be palatable. Nudge by Cass Sunstein and Richard Thaler is a rational exception.

***The philosopher Thomas Hobbes was very resistant to some of the formal developments in 17th-century mathematics. He insisted that we be able to visualize geometric objects in our minds. He was relegated to the dustbins of mathematical history, but did cleverly apply Euclidean logic to the Leviathan.

****Leibniz and Newton were rivals in discovering the calculus. One of my favourite anecdotes (potentially apocryphal?) about the two geniuses is that they communicated their nearly simultaneous discovery of the Fundamental Theorem of Calculus–which links derivatives to integrals–in Latin anagrams! Jesus!

*****Nir Eyal is the most prominent writer I know of on behavioural design and habit in products. And he’s a great guy!

The featured image is from the Archimedes Palimpsest, one of the most exciting and beautiful books in the world. It is a Byzantine prayerbook–or euchologion–written on parchment that originally contained mathematical treatises by the Greek mathematician Archimedes. A palimpsest, for reference, is a manuscript or piece of writing material on which the original writing has been effaced to make room for later writing but of which traces remain. As portions of Archimedes’ original text are very hard to read, researchers recently took the palimpsest to the Stanford Accelerator Laboratory and threw all sorts of particles at it really fast to see if they might shine light on hard-to-decipher passages. What they found had the potential to change our understanding of the history of math and the development of calculus!