I don’t know if Andrei Fajardo knows that I will always remember and cherish our walk up and down University Avenue in Toronto a few months ago. Andrei was faced with a hard choice about his career. He was fortunate: both options were and would be wonderful. He teetered for a few weeks within the suspension of multiple possible worlds, channeling his imagination to feel what it would feel like to make choice one, to feel what it would feel like to live the life opened by choice two. He sought advice from others. He experimented with different kinds of decision-making frameworks to see how the frame of evaluation shaped and brought forth his values, curtailed or unfurled his imagination. He listened for omens and watched rain clouds. He broke down the factors of his decision analytically to rank and order and plunder. He spoke to friends. He silenced the distractions of family. He twisted and turned inside the gravity that only shines forth when it really matters, when the frame of identity we’ve cushioned ourselves within for the last little while starts to fray under the invitation of new possibilities. The world had presented him with its supreme and daunting gift: the poignancy of growth.
I’m grateful that Andrei asked me to be one partner to help him think about his decision. Our conversations transported me back, softly, to the thoughts and feelings and endless walks and desperate desire for the certainty I experienced in 2011 as I waded through months to decide to leave academia and pursue a career in the private sector. I wanted Andrei to understand that the most important lesson that experience taught me was about a “peculiar congenital blindness” we face when we make a hard choice:
To be human is to suffer from a peculiar congenital blindness: On the precipice of any great change, we can see with terrifying clarity the familiar firm footing we stand to lose, but we fill the abyss of the unfamiliar before us with dread at the potential loss rather than jubilation over the potential gain of gladnesses and gratifications we fail to envision because we haven’t yet experienced them.
When faced with the most transformative experiences, we are ill-equipped to even begin to imagine the nature and magnitude of the transformation — but we must again and again challenge ourselves to transcend this elemental failure of the imagination if we are to reap the rewards of any transformative experience. (Maria Popova in her marvelous Brain Pickings newsletter about L.A. Paul’s Transformative Experience)
I shared examples of my own failure of imagination to help Andrei understand the nature of his choice. For hard choices about our future aren’t rational. They don’t fit neatly into the analytical framework of lists and cross-examination and double-entry bookkeeping. It’s the peculiar poignancy of our existence as beings unfurling in time that makes it impossible for us to know who we will be and what knowledge the world will provide us through the slot canyon aperture of our particular experience, bounded by bodies and time.
As Andrei toiled through his decision, he kept returning to a phrase he’d heard from Daphne Koller in her fireside chat with Yoshua Bengio at the 2018 ICLR conference in Vancouver. As he shared in this blog post, Daphne shared a powerful message: “Building a meaningful career as a scientist isn’t only about technical gymnastics; it’s about each person’s search to find and realize the irreplaceable impact they can have in our world.”
But, tragically or beautifully, depending on how you view it, there are many steps in our journey to realize what we believe to be our irreplaceable impact. Our understanding of what this could or should be can and should change as our slot canyon understanding of the world erodes just a little more under with the weight of wind and rain to bring forth light from the sun. In my own experience, I never, ever imagined that just two years after believing I had made a binary decision to leave academia for industry, I would be invited to teach as an adjunct law professor, that three years later I would give guest lectures at business schools around the world, that five years later I would give guest lectures in ethics and philosophy. The past self had a curious capacity to remain intact, to come with me as a barnacle through the various transformations. For the world was bigger and vaster than the limitations my curtailed imagination had imposed on it.
Andrei decided to stay with our company. He is a marvelous colleague and mentor. He is a teacher: no matter where he goes and what he does, his irreplaceable impact will be to broaden the minds of others, to break down statistics and mathematics into crystal clear building blocks than any and all can understand. He’ll come to appreciate that he is a master communicator. I’m quite certain I’ll be there to watch.
What was most beautiful about our walk was the trust I had in Andrei and that he had in me. His awareness that I wanted what was best for him, that none of my comments were designed to manipulate him into staying if that weren’t what he, after his toil, after his reflection, decided was the path he wanted to explore. It was simply an opportunity to share stories and experiences, to provide him with a few different thinking tools he could try on for size as he made his decision. We punctuated our analysis with thoughts about the present. We deepened our connection. I gave him a copy of Rilke’s Letters to a Young Poet to help him come to know more of himself and the world. Throughout our walk, his energy was crystalline. He listened with attention rapt into the weight of it mattering. The attention that emerges when we are searching sponges sopping as much as we can from those we’ve come to trust. The air was chilled just enough to prickle goosebumps, but not so much as to need a sweater. The grass was green and the flowers had started to bud.
Yesterday was the first snow. There are still flowers; soon they’ll die. The leaves over Rosedale are yellow and red, made vibrant by the moisture. Andrei is with his dogs and his wife. I’ll see him tomorrow morning at work.
I found the featured image last weekend at the Thomson Landry Gallery in Toronto’s Distillery District. It’s a painting called “Choisir et Renoncer,” by Yoakim Bélanger. I see in it the migration of fragility, hands cradled open into reverent acceptance. I see in it the stone song of vulnerability: for it is the white figure–she who dared wade ankle-deep in Hades to hear Eurydice’s voice one more time–whose face glows brightest, who reveals the wrinkles of her character, who shines as a reflection of ourselves, unafraid to reveal her seashell cracks and the wisdom she acquired with the crabs. She etches herself into precision. She chisels brightly through the human haze of potential, buoyed upon the bronze haze of the self she once was, but yesterday.
I am forever indebted to Clyfe Beckwith, my high-school physics teacher. Not because he taught us mechanics, but because he used mechanics to teach us about the power and plasticity of conceptual frames.
We were learning to think about things like energy and forces, about how objects move in space and on inclines and around curves (fear winces before tucking myself into a ball to go down a hill on cross-country skis are always accompanied by ironic inside-my-mind jabs yammering “your instincts suck–if your center of gravity is lower to the ground you’ll have an easier time combatting the conservation of angular momentum”). Learning to think about images like this one:
The task here is to decompose the various forces that act on the object, recompose them into some net force, and then use this net force to describe how the block will accelerate down the inclined plane.
I remember the first step to tally the forces was easy. Gravity will pull the block down because that’s what gravity does. Friction will slow the block down because that’s what friction does. And then there’s this force we think less about in our day-to-day lives called the normal force, a contact force that surfaces exert to prevent solid objects from passing through each other. The solids-declaring-their-identity-as-not-air-and-not-ghosts force. And that’s gonna live at the fault line between the objects, concentrating its force-hands into the object’s back like a lover.
The problem is that, because the plane is inclined and not flat, these forces have directionality. They don’t all move up and down. Some move up and down and others are on a diagonal.
I remember feeling flummoxed because it was impossible to get them to align nicely into the perpendicular x-y axes of the Cartesian plane. In my mind, given the math I’d studied up to that point (so, like, high-school algebra–I taught myself trigonometry over the summer but because I had taught myself never had the voiceover that this would be a tool for solving physics problems), Cartesian planes were things, objects as stiff as blocks, things that existed in one frame of reference, perfectly up and perfectly down, not a smidgen of Sharpie ink bleeding or wavering on the axis lines.
And then Clyfe Beckwith did something radical. He tilted the Cartesian plane so that it matched the angle of the incline.
I was like, you can’t do that.
He was like, why not?
I was like, because planes don’t tilt like that, because they exist in the rigid confines of up and down, of one single perpendicular, because that’s how the world works, because this is the static frame of reference I’ve learned about and used again and again and again to solve all these Euclidean geometry problems from freshman year, because Bill Scott, my freshman math teacher, is great, love the guy, and how can we question the mental habits he helped me build to solve those problems, because, wait, really, you can do that?
He was like, just try it.
I remember his cheshire cat smile, illuminated by the joy of palpably watching a student’s mind expand before his eyes, palpably seeing a shift that was one small step for man but a giant step for Mankind, something as abstract as a block moving on an inclined plane teaching a lesson about the malleability of thinking tools.
We tried it. We tilted our heads to theta. We decomposed gravity into two vectors, one countering the normal force and the other moving along the plane. We empathized with the normal force, saw the world from its perspective, not our own. What was intractable became simple. The math became easy.
This may seem obtuse and irrelevant. Stuff you haven’t thought about since high school. Forget about the physics. Focus on tilting your head to theta. Focus on the fact that there is nothing native or natural about assuming that Cartesian planes exist in some idealized perpendicular realm that must always start from up and down, that this disposition is the side effect of drawing them on a piece of paper oriented towards our own front-facing perspective. That it’s possible to empathize with the perspective of a block of mass m at a spot on a triangle tilted at angle theta to make it easier to solve a problem. And that this problem itself is only an approximation upon an approximation upon an approximation, gravity and friction acting at some idealized center to make it easier to do the math, integrating all the infinite smaller forces up through some idealized continuum. Stuff they don’t say much about in high school as that’s the time for building problem-solving muscles. Saving the dissolution for the appropriate moment.
Reading Carlo Rovelli’s The Order of Time pulled this memory out of storage. In his stunningly lyrical book, Rovelli explains how many (with a few variations depending on theory) physicists conceptualize quantum time as radically different from the linear, forward-marching-so-that-the-past-matters-our-actions-matter-choices-are-engraved-and-shape-the-landscape-of-future-possibilities-and-make-morality-possible flow of time we experience in our subjectivity. When we get quantum, explains Rovelli, the difference between past and future blur, phenomena appear that, like the ever-flitting molecules in a glass of water, seem static when viewed through the blunt approximations our eyes provide us so we can navigate the world without going insane from a thermodynamic onslaught.
Rovelli reminds his reader that the idea of an absolute, independent variable time (which shows up in equations as a little t)–time independent of space, time outside the entangled space waltz of Wall-E and Eva, lovers saddened by age’s syncopation, death puncturing our emotional equilibrium to cleave entropy into the pits of our emotions, fraying the trampoline pad into the kinetic energy of springs–that this time is but a mental fiction Newton created within the framework of his theory of the universe, but that the mental landscape of this giant standing upon the shoulders of giants became so prominent in our culture that his view of the world became the view we all inherit and learn when we go to grammar school, became the way the world seems to be, became a fixture as seemingly natural as our experience that the Sun revolves around the Earth, and not otherwise. That the ideas are so ingrained that it takes patience and an open mind to notice that Aristotle talks about time much differently, that, for Aristotle, time was an index of the change relative to two objects. Consider the layers of complexity: how careful we have to be not to read an ancient text with the tinted glasses of today’s concepts, how open and imaginative we must remain to try to understand the text relative to its time as opposed to critiquing the arguments with our own baggage and assumptions; how incredible it is to think that this notion of absolute time, so ingrained that it felt like blasphemy to tilt the Cartesian plane to theta, is but the inheritance of Newton’s ideas, rejected by Leibniz as fitfully as Newton’s crazy idea that objects could influence one another from a distance; how liberating to return to relativity, to experience the universe without the static restrictions of absolute time, to free even the self from the heavy shackles of personhood and identity and recognize we are nothing but socialite Bedouins migrating through the parameters of personality as we mirror approval and flee frowns, the self become array, traits and feelings and thoughts activated through contact with a lover, a family, a team, an organization, a city, a nation, all these concentric circles pitching abstractions that dissolve into the sweat beads of an embrace, of attention swallowed intact by the heartbeat of another.
But it’s not just mechanics. It’s not just time.
It can be the carapace we inherit as we hurt our way through childhood and adolescence, little protective walls of habitual reactions that ossify into detrimental and useless thoughts like the cobwebbed inheritance of celibacy. It can be reactions to situations and events that don’t serve us, that keep us back, anxiety that silences us from speaking up, that fears judgment if we ask a question when we don’t fully understand, interpretations of events in the narratives we’ve constructed to weave our way through life that, importantly, we can choose to keep or discard.
Perhaps the most liberating aspect of Buddhist philosophy is the idea that we have an organ that senses thoughts just like we have organs that sense smells, sounds, tastes, and touch. Separating the self from the stuff of thoughts and emotions is not trivial. We lose the anchor of the noun, abandon the grounding of some thing, no matter how abstract and fleeting thoughts or the pulse of emotions may be (or perhaps they’re not so fleeting? perhaps the incantatory habit of neural patterns in our brains is where we clamp on to remain steadfast amidst rough waves beating life shores?). But this notion of the self as thought is also a conceptual relic we can break, the perpendicular plane we can migrate to theta to help us solve problems and empathize more deeply with another.
There is a calm that arises when we abstract away from the seat of identity as a self separate from the world and up into a view of self as part of the world, as one with the coffee cup accompanying me as I write this post, one with the sun rising into yellow, shifting shades in the summer morning, one with the plastic armchairs and the basil plant, one with my partner as he sleeps thousands of miles away in San Francisco, as I dream with him, recovering legs intertwined in beds and on camp grounds. One with colleagues. One with homeless strangers on the street, one with their pain, their cold, the dirt washed from their face and feet when they shower.
The conceptual carapaces that bind us aren’t required, are often just inherited abstractions from the past. We can discard them if they don’t serve us. We have the ability, always, to tilt our head to theta and breath into the rhythms of the world.
 Clyfe radiates kindness. He has that special George Smith superpower of identifying where he can stretch his students just beyond what they believe they’re capable of while imbuing them with the confidence they need to succeed. Clyfe also found it hilarious that I took intro to cosmology, basically about memorizing constellations, after taking AP physics, which was pretty challenging. But I loved schematizing the sky! It added wonder and nostalgia to the science. I gifted Clyfe The Hobbit once, for his sons. I picked that book because my dad loves Tolkien and reading The Hobbit and The Lord of the Rings were rights of passage when I was a child, so I figured they should be for every child. I guess that means there was something in the way Clyfe behaved that pegged him as a great Dad in my mind. Every few years, my dad gets excited about possibility that he may have finally forgotten details in The Lord of the Rings, eager to reread it and discover it as if it were for the first time. But he’s consistently disappointed. He and my brother have astonishing recall; it’s as if every detail of a book or movie has been branded into their brains indefinitely. My brother can query his Alexandrian brain database of movie references to build social bridges, connecting with people through the common token of some reference as opposed to divulging personal anecdote. My mom and I are the opposite. I remember abstractions in my Peircean plague of being a mere table of contents alongside the vivid, vital particularity of narrative minds like that of my colleague Tyler Schnoebelen. When I think back to my freshman year in college, I clearly remember the layout of arguments in Kant’s First Critique and clearly remember the first time I knowingly proved something using induction, and while I clearly remember that I read Anna Karenina and The Death of Ivan Ilyich that year (apparently was in a Tolstoy kind of mood), I cannot remember the details of those narratives. Instead, I recall the emotions I experienced while reading them, as if they get distilled to their essence, the details of what happens to the characters resolved to the moral like in a fable by Aesop. Except not quite. It’s more that the novels transform into signposts for my own narrative, my own life. Portals that open my memory to recover happenings I haven’t recollected for years, 15 years, 20 years, the white-gloved hand of memory swiping off attic dust to unveil grandma’s doilies or gold-rimmed art deco china. My mind pronounces “the Death of Ivan Ilyich” and what comes forth are the feelings of shyness, of rugged anticipation, as I walked up Rush Street in Chicago floating on a cloud of dopamine brimming with anticipatory possibility, on my way to meet the young man who would become my first real boyfriend, hormones coaxed by how I saw him, how he was much more than a person, how something about how he looked and talked and acted elevated him to the perfect boyfriend, which, of course, gave me solace that I was living life as I was supposed to. My memory not recall, but windows with Dali curtains that open and shut into arabesques of identity (see featured image), cards shuffling glimpses of clubs and spades as they disappear, just barely observed, back into the stack. Possibility not actually reserved for the future, but waiting under war helmets in trench wormholes ready for new interpretation. For yes, there was other guy I didn’t decide to date, the guy who told me I looked like Audrey Hepburn (not because I do, but because I do just enough and want to so evidently enough that the flattery makes sense) when he took me on our first date at the top of the Hancock tower, a high-in-the-sky type of bar, full of women with hair stiffed in hairspray and almost acrylic polyester dresses that were made today but always remind you of Scarface or Miami Vice, with the particular smell only high-in-the-sky bars have, stale in a way that’s different from the acrid smell of dive bars that haven’t been cleaned well, maybe it’s the sauce they use on the $50 lamb chops, the jellied mint, the shrimp cocktails alongside the gin, or the cleaning agent reserved exclusively for high-in-the-sky dance halls or Vegas hotels. It could have been different, but it became what it was because Morgan more closely aligned with my mental model of the perfect boyfriend. We’re still friends. He seems quite happy and his wife seems to be thriving.
 Don’t get me wrong: while Leibniz rejected the idea of action at a distance, much of his philosophy and metaphysics are closer to 20th-century philosophy and physics than the other early-modern heavyweights (Descartes, Spinoza, Pascal, etc). He was the predecessor of what would become possible worlds’ theory, developing a logic and metaphysics rooted in the probability, not denotative reality. He loved combinatorics and wanted to create a universal, formal language rooted in math (his universal characteristic). The portion of his philosophy I love most is his attempt to reconcile free will and determinism using our conceptual limitations, rooting his metaphor in calculus. When we do calculus, we can’t perceive how a limit converges to some digit. Our existence in time, argues Leibniz, is similar: we’re stuck in the approximation of the approximation, assessing local minima, while God sees the entire function, appreciating how some action or event that may seem negative in the moment can lead to the greatest good in the future.
 It seems like downright sound logic to me that celibacy made economic sense in feudalism, given the structure of dowries and property (about which I don’t know the details) but was conceptualized using abstract arguments about reserving love for God alone, and that this abstraction survived even after economic rationale disappeared into the folds of capitalism, only to create massive issues down the line because it’s not a human way to be, our bodies aren’t made only to love God, we can’t clip our sexuality at the seams, so pathologies arise. It’s not a popular position to pardon the individual and blame the system; but every individual merits compassion.
 Something about the cadence of the sentence made me want to leave out sight.
The featured image is Figura en una finestra, painted by Salvador Dalí in 1925. Today it lives in the Reina Sofia. How very not Dalí, right? And yet it displays the same mastery of precise technique we see in the hallmarks of surrealism. The nuances of this painting are so suggestive and evocative, so rich with basic meaning, nothing religious, nothing allegorical, just the taught concentration of recognition, of our seeing this girl at a moment in time. The white scarf draping in such a way that it suggests the moments prior to the painting, when she walked up troubled by something that happened to her and placed it there, with care, with gentleness, as she settled into her forlorn gaze. Her thoughts protected by her directionality, all that’s available to us the balance of her weight on her left foot, her buttocks reflecting the weight distribution. The curtains above the left window somehow imbued with her emotions, their rustle so evidently spurred not only by wind but by the extension of her mood, their shape our only access to the inside of her mind, her gaze facing outwards. I love it. She gains power because she is so unaccessible. The painting holds the embryo of surrealism in it because of what I just wrote, because it invites us to read more into the images that what’s there in oil paint and textures and lines.
Big picture stories that make sense of history’s bacchanal march into the apocalypse.
Broad-stroke predictions about how artificial intelligence (AI) will shape the future of humanity made by those with power arising from knowledge, money, and/or social capital. Knowledge, as there still aren’t actually that many real-deal machine learning researchers in the world (despite the startling growth in paper submissions to conferences like NIPS), people who get excited by linear algebra in high-dimension spaces (the backbone of deep learning) or the patient cataloguing of assumptions required to justify a jump from observation to inference. Money, as income inequality is a very real thing (and a thing too complex to say anything meaningful about in this post). For our purposes, money is a rhetoric amplifier, be that from a naive fetishism of meritocracy, where we mistakenly align wealth with the ability to figure things out better than the rest of us, or cynical acceptance of the fact that rich people work in private organizations or public institutions with a scope that impacts a lot of people. Social capital, as our contemporary Delphic oracles spread wisdom through social networks, likes and retweets governing what we see and influencing how we see (if many people, in particular those we want to think like and be like, like something, we’ll want to like it too), our critical faculties on amphetamines as thoughtful consideration and deliberation means missing the boat, gut invective the only response fast enough to keep pace before the opportunity to get a few more followers passes us by, Delphi sprouting boredom like a 5 o’clock shadow, already on to the next big thing. Ironic that big picture narratives must be made so hastily in the rat race to win mindshare before another member of the Trump administration gets fired.
Most foundational narratives about the future of AI rest upon an implicit hierarchy of being that has been around for a long time. While proffered by futurists and atheists, the hierarchy dates back to the Great Chain of Being that medieval Christian theologists like Thomas Aquinas built to cut the physical and spiritual world into analytical pieces, applying Aristotelian scientific rigor to the spiritual topics.
The hierarchy provides a scale from inanimate matter to immaterial, pure intelligence. Rocks don’t get much love on the great chain of being, even if they carry the wisdom and resilience of millions of years of existence, contain, in their sifting shifting grain of sands, the secrets of fragility and the whispered traces of tectonic plates and sunken shores. Plants get a little more love than rocks, and apparently Venus fly traps (plants that resemble animals?) get more love than, say, yeast (if you’re a fellow member of the microbiome-issue club, you like me are in total awe of how yeast are opportunistic sons of bitches who sense the slightest shift in pH and invade vulnerable tissue with the collective force of stealth guerrilla warriors). Humans are hybrids, half animal, half rational spirit, our sordid materiality, our silly mortality, our mechanical bodies ever weighting us down and holding us back from our real potential as brains in vats or consciousnesses encoded to live forever in the flitting electrons of the digital universe. There are a shit ton of angels. Way more angel castes than people castes. It feels repugnant to demarcate people into classes, so why not project differences we live day in and day out in social interactions onto angels instead? And, in doing so, basically situate civilized aristocrats as closer to God than the lower and more animalistic members of the human race? And then God is the abstract patriarch on top of it all, the omnipotent, omniscient, benevolent patriarch who is also the seat of all our logical paradoxes, made of the same stuff as Gödel’s incompleteness theorem, the guy who can be at once father and son, be the circle with the center everywhere and the circumference nowhere, the master narrator who says, don’t worry, I got this, sure that hurricane killed tons of people, sure it seems strange that you can just walk into a store around the corner a buy a gun and there are mass shootings all the time, but trust me, if you could see the big picture like I see the big picture, you’d get how this confusing pain will actually result in the greatest good to the most people.
I’m going to be sloppy here and not provide hyperlinks to specific podcasts or articles that endorse variations of this hierarchy of being: hopefully you’ve read a lot of these and will have sparks of recognition with my broad stroke picture painting. But what I see time and again are narratives that depict AI within a long history of evolution moving from unicellular prokaryotes to eukaryotes to slime to plants to animals to chimps to homo erectus to homo sapiens to transhuman superintelligence as our technology changes ever more quickly and we have a parallel data world where leave traces of every activity in sensors and clicks and words and recordings and images and all the things. These big picture narratives focus on the pre-frontal cortex as the crowning achievement of evolution, man distinguished from everything else by his ability to reason, to plan, to overcome the rugged tug of instinct and delay gratification until the future, to make guesses about the probability that something might come to pass in the future and to act in alignment with those guesses to optimize rewards, often rewards focused on self gain and sometimes on good across a community (with variations). And the big thing in this moment of evolution with AI is that things are folding in on themselves, we no longer need to explicitly program tools to do things, we just store all of human history and knowledge on the internet and allow optimization machines to optimize, reconfiguring data into information and insight and action and getting feedback on these actions from the world according to the parameters and structure of some defined task. And some people (e.g., Gary Marcus or Judea Pearl) say no, no, these bottom up stats are not enough, we are forgetting what is actually the real hallmark of our pre-frontal cortex, our ability to infer causal relationships between phenomena A and phenomena B, and it is through this appreciation of explanation and cause that we can intervene and shape the world to our ends or even fix injustices, free ourselves from the messy social structures of the past and open up the ability to exercise normative agency together in the future (I’m actually in favor of this kind of thinking). So we evolve, evolve, make our evolution faster with our technology, cut our genes crisply and engineer ourselves to be smarter. And we transcend the limitations of bodies trapped in time, transcend death, become angel as our consciousness is stored in the quick complexity of hardware finally able to capture plastic parallel processes like brains. And inch one step further towards godliness, ascending the hierarchy of being. Freeing ourselves. Expanding. Conquering the march of history, conquering death with blood transfusions from beautiful boys, like vampires. Optimizing every single action to control our future fate, living our lives with the elegance of machines.
It’s an old story.
Many science fiction novels feel as epic as Disney movies because they adapt the narrative scaffold of traditional epics dating back to Homer’s Iliad and Odyssey and Virgil’s Aeneid. And one epic quite relevant for this type of big picture narrative about AI is John Milton’s Paradise Lost, the epic to end all epics, the swan song that signaled the shift to the novel, the fusion of Genesis and Rome, an encyclopedia of seventeenth-century scientific thought and political critique as the British monarchy collapsed under the rushing sword of Oliver Cromwell.
Most relevant is how Milton depicts the fall of Eve.
Milton lays the groundwork for Eve’s fall in Book Five, when the archangel Raphael visits his friend Adam to tell him about the structure of the universe. Raphael has read his Aquinas: like proponents of superintelligence, he endorses the great chain of being. Here’s his response to Adam when the “Patriarch of mankind” offers the angel mere human food:
O Adam, one Almightie is, from whom
All things proceed, and up to him return,
If not deprav’d from good, created all
Such to perfection, one first matter all,
Indu’d with various forms various degrees
Of substance, and in things that live, of life;
But more refin’d, more spiritous, and pure,
As neerer to him plac’t or neerer tending
Each in thir several active Sphears assignd,
Till body up to spirit work, in bounds
Proportiond to each kind. So from the root
Springs lighter the green stalk, from thence the leaves
More aerie, last the bright consummate floure
Spirits odorous breathes: flours and thir fruit
Mans nourishment, by gradual scale sublim’d
To vital Spirits aspire, to animal,
To intellectual, give both life and sense,
Fansie and understanding, whence the Soule
Reason receives, and reason is her being,
Discursive, or Intuitive; discourse
Is oftest yours, the latter most is ours,
Differing but in degree, of kind the same.
Raphael basically charts the great chain of being in the passage. Angels think faster than people, they reason in intuitions while we have to break things down analytically to have any hope of communicating with one another and collaborating. Daniel Kahnemann’s partition between discursive and intuitive thought in Thinking, Fast and Slow had an analogue in the seventeenth century, where philosophers distinguished the slow, composite, discursive knowledge available in geometry and math proofs from the fast, intuitive, social insights that enabled some to size up a room and be the wittiest guest at a cocktail party.
Raphael explains to Adam that, through patient, diligent reasoning and exploration, he and Eve will come to be more like angels, gradually scaling the hierarchy of being to ennoble themselves. But on the condition that they follow the one commandment never to eat the fruit from the forbidden tree, a rule that escapes reason, that is a dictum intended to remain unexplained, a test of obedience.
But Eve is more curious than that and Satan uses her curiosity to his advantage. In Book Nine, Milton fashions Satan in his trappings as snake as a master orator who preys upon Eve’s curiosity to persuade her to eat of the forbidden fruit. After failing to exploit her vanity, he changes strategies and exploits her desire for knowledge, basing his argument on an analogy up the great chain of being:
O Sacred, Wise, and Wisdom-giving Plant,
Mother of Science, Now I feel thy Power
Within me cleere, not onely to discerne
Things in thir Causes, but to trace the wayes
Of highest Agents, deemd however wise.
Queen of this Universe, doe not believe
Those rigid threats of Death; ye shall not Die:
How should ye? by the Fruit? it gives you Life
To Knowledge? By the Threatner, look on mee,
Mee who have touch’d and tasted, yet both live,
And life more perfet have attaind then Fate
Meant mee, by ventring higher then my Lot.
That ye should be as Gods, since I as Man,
Internal Man, is but proportion meet,
I of brute human, yee of human Gods.
So ye shall die perhaps, by putting off
Human, to put on Gods, death to be wisht,
Though threat’nd, which no worse then this can bring.
Satan exploits Eve’s mental model of the great chain of being to tempt her to eat the forbidden apple. Mere animals, snakes can’t talk. A talking snake, therefore, must have done something to cheat the great chain of being, to elevate itself to the status of man. So too, argues Satan, can Eve shortcut her growth from man to angel by eating the forbidden fruit. The fall of mankind rests upon our propensity to rely on analogy. May the defenders of causal inference rejoice.
The point is that we’ve had a complex relationship with our own rationality for a long time. That Judeo-Christian thought has a particular way of personifying the artifacts and precipitates of abstract thoughts into moral systems. That, since the scientific revolution, science and religion have split from one another but continue to cross paths, if only because they both rest, as Carlo Rovelli so beautifully expounds in his lyrical prose, on our wonder, on our drive to go beyond the immediately visible, on our desire to understand the world, on our need for connection, community, and love.
But do we want to limit our imaginations to such a stale hierarchy of being? Why not be bolder and more futuristic? Why not forget gods and angels and, instead, recognize these abstract precipitates as the byproducts of cognition? Why not open our imaginations to appreciate the radically different intelligence of plants and rocks, the mysterious capabilities of photosynthesis that can make matter from sun and water (WTF?!?), the communication that occurs in the deep roots of trees, the eyesight that octopuses have all down their arms, the silent and chameleon wisdom of the slit canyons in the southwest? Why not challenge ourselves to greater empathy, to the unique beauty available to beings who die, capsized by senescence and always inclining forward in time?
Why not free ourselves of the need for big picture narratives and celebrate the fact that the future is far more complex than we’ll ever be able to predict?
How can we do this morally? How can we abandon ourselves to what will come and retain responsibility? What might we build if we mimic animal superintelligence instead of getting stuck in history’s linear march of progress?
I believe there would be beauty. And wild inspiration.
 This note should have been after the first sentence, but I wanted to preserve the rhetorical force of the bare sentences. My friend Stephanie Schmidt, a professor at SUNY Buffalo, uses the concept of foundational narratives extensively in her work about colonialism. She focuses on how cultures subjugated to colonial power assimilate and subvert the narratives imposed upon them.
 Yesterday I had the pleasure of hearing a talk by the always-inspiring Martin Snelgrove about how to design hardware to reduce energy when using trained algorithms to execute predictions in production machine learning. The basic operations undergirding machine learning are addition and multiplication: we’d assume multiplying takes more energy than adding, because multiplying is adding in sequence. But Martin showed how it all boils down to how far electrons need to travel. The broad-stroke narrative behind why GPUs are better for deep learning is that they shuffle electrons around criss-cross structures that look like matrices as opposed to putting them into the linear straight-jacket of the CPU. But the geometry can get more fine-grained and complex, as the 256×256 array in Google’s TPU shows. I’m keen to dig into the most elegant geometry for designing for Bayesian inference and sampling from posterior distributions.
 Technology culture loves to fetishize failure. Jeremy Epstein helped me realize that failure is only fun if it’s the mid point of a narrative that leads to a turn of events ending with triumphant success. This is complex. I believe in growth mindsets like Ray Dalio proposes in his Principles: there is real, transformative power in shifting how our minds interpret the discomfort that accompanies learning or stretching oneself to do something not yet mastered. I jump with joy at the opportunity to transform the paralyzing energy of anxiety into the empowering energy of growth, and believe its critical that more women adopt this mindset so they don’t hold themselves back from positions they don’t believe they are qualified for. Also, it makes total sense that we learn much, much more from failures than we do from successes, in science, where it’s important to falsify, as in any endeavor where we have motivation to change something and grow. I guess what’s important here is that we don’t reduce our empathy for the very real pain of being in the midst of failure, of not feeling like one doesn’t have what other have, of being outside the comfort of the bell curve, of the time it takes to outgrow the inheritance and pressure from the last generation and the celebrations of success. Worth exploring.
 Milton actually wrote a book about logic and was even a logic tutor. It’s at once incredibly boring and incredibly interesting stuff.
The featured image is the 1808 Butts Set version of William Blake’s “Satan Watching the Endearments of Adam and Eve.” Blake illustrated many of Milton’s works and illustrated Paradise Lost three times, commissioned by three different patrons. The color scheme is slightly different between the Thomas, Butts, and Linnell illustration sets. I prefer the Butts. I love this image. In it, I see Adam relegated to a supporting actor, a prop like a lamp sitting stage left to illuminate the real action between Satan and Eve. I feel empathy for Satan, want to ease his loneliness and forgive him for his unbridled ambition, as he hurdles himself tragically into the figure of the serpent to seduce Eve. I identify with Eve, identify with her desire for more, see through her eyes as they look beyond the immediacy of the sexual act and search for transcendence, the temptation that ultimately leads to her fall. The pain we all go through as we wise into acceptance, and learn how to love.
George E. Smith may be the best kept secret in academia.
The words aren’t mine: they belong to Daniel Dennett.* He said them yesterday at On the Question of Evidence, a conference Tufts University hosted to celebrate George’s life and work. George sat in the second row during the day’s presentations. I watched him listen attentively, every once and a while bowing his head the way he does when he gets emotional, humbly displacing praise to another giant upon whose shoulders he claims to have stood. I. Bernard Cohen, Ken Wilson, Curtis Wilson, Tom Whiteside (I remember George’s anecdote about meeting Whiteside in a bookstore and saying he genuflected before his brilliant scholarship on Newtonian mathematics). He unabashedly had the first word after every talk, most of the time articulating an anecdote (anyone who knows George knows that he likes to tell stories) about how someone else taught him an idea or taking the opportunity to articulate some crisp, crucial maxim in philosophy of science. Perhaps my favourite part of the entire day was listening to him interrupt Dan’s story about how they jointly founded the computer science program at Tufts, back in the days of Minsky and Good ol’ fashioned AI**, George stepping in to provide additional details about their random colour pixel generation program (random until the output looked a little too much like plaid), recollecting dates and details with ludicrous precision, as is his capability and wont.
Almost every conference attendee had at least one thing in common: we’d taken George’s Newton seminar, unique in its kind, one of the most complete, erudite, stimulating academic experiences in the world today. Some conference attendees, like Eric Schliesser and Bill Bradley, took the seminar back in its infancy in the 80s and early 90s. I took it with Kant scholar Michael Friedman at Stanford in 2009. Each year George builds on the curriculum, collaborating with students on some open research question, only to incorporate new learnings into next year’s (or some future year’s) curriculum. Perhaps the hallmark of a truly significant thinker is that her work is as rich and complex as the natural world, containing second-order ideas like second-order phenomena, phenomena that no one observes–or even can observe!–the first time they look. Details masked behind more dominant regularities, but that, in time and through the gradual and patience process of measurement, observation, and research, become visible through the mediation of theory. Theories as anchors to see the invisible.
This recursive process of knowledge and growth isn’t unique to George’s teaching. It characterizes his seminal contribution to the philosophy of science.
Many of the conference speakers drew from George’s stunning 2014 article Closing the Loop. The article is the culmination of 20 years of work studying how Newton changed standards for high-quality evidence. We often assume that Newton’s method is hypothetical-deductive, where the reasoning structure is to formulate a hypothesis that could be falsified by a test on observable data, and collect observations to either falsify (if observations disagree with what the hypothesis predicts) or corroborate (if observations agree) a theory.*** George thinks that Newton is up to something different. He cites Newton’s “Copernicum scholium,” where Newton states that, given the complexities of forces that determine planetary motion, considering “simultaneously all these causes of motion and [defining] these motions by exact laws admitting of easy calculation exceeds…the force of any human mind.” George writes:
The complexity of true motions was always going to leave room for competing theories if only because the true motions were always going to be beyond precise description, and hence there could always be multiple theories agreeing with observation to any given level of approximation. On my reading, the Principia is one sustained response to this evidence problem.
George goes on to argue that Newtonian idealizations, the simplified, geometric theories that predict how a particular system of bodies would behave under specifiable circumstances, aren’t theories to align directly with observations, but thinking tools, counterfactuals that predict how the world would behave if it were only governed by a few forces (e.g., a system only subject to gravity versus a system subject to gravity and magnetism). Newton expects that observations won’t fit predictions, because he knows the world is more complex than mathematical models can describe. Instead, discrepancies become a source of high-quality evidence to both corroborate a theory and render hypotheses ever more precise and encompassing as we incorporate in new details about the world. If Newton can isolate how a system would behave under the force of gravity, the question becomes, what, if any, further forces are at work? According to George, then, Newton didn’t propose a static method for falsifying hypotheses with observations and data. He proposed a dynamic research strategy that could encompass an ongoing program of navigating the gulf between idealizations and observations. Importantly, as Michael Friedman showed in his talk, theories can be the necessary condition for certain types of measurement: there is a difference between plotting data points indicating the position of Jupiter at two points in time and measuring Jupiter’s acceleration in its orbit. The second is theory-mediated measurement, where the theory relating mass to acceleration via the force of gravity makes it possible to measure mass from acceleration. Also importantly, because discrepancies between theory and observations don’t direct falsify a hypothesis, the Newtonian scientific paradigm was more resilient, requiring a different type of discrepancy to pave the way to General Relativity (while I get the gist of why Einstein required space-time curvature to explain the precession of Mercury’s orbit, I must admit I don’t yet understand it as deeply as I’d like).****
While George is widely recognized as one of the world’s foremost experts in Isaac Newton, his talents and interests are wide-ranging. Alongside his career as a philosopher, he has a second, fully-developed career (not just a hobby) as a jet engine engineer focused on failure analysis. He has a keen sensibility for literature: he introduced me to Elena Ferrante, whose Neapolitan novels I have sense devoured. He knows his way around modern art, having dated artist Eva Hesse during his undergraduate years at Yale. He treats his wife India, who is herself incredible, with love and respect. He takes pride in having coached basketball to underprivileged students on Boston’s south side. The list literally goes on and on.
The commentary that touched me deepest came from Tufts Dean of Academic Affairs Nancy Bauer. Nancy commented on the fact that she wouldn’t be where she is today if it weren’t for George, that he believed there was a place for feminist philosophy in the academy before other departments caught on. She also commented on what is perhaps George’s most important skill: his ability to at once challenge students to rise to just beyond the limits of their potential and to imbue them with the confidence they need to succeed.
This is not meaningful to me in the abstract. It is concrete and personal.
I took George’s course during the spring trimester of my second year in graduate school. I was intimidated: my graduate degree is in comparative literature, and while I had majored in mathematics as an undergrad, I was unwaveringly insecure about my abilities to reason precisely. Some literature students are comfortable in their skin; they live and breath words, images, tropes, analogies. I teetered in a no man’s land between math, philosophy, history, literature, gorging with curiosity and encyclopedic drive but never disciplined enough to do one thing with excellence. The first (or second? George would know…) day of class, I approached George and told him I was worried about taking the course as a non-philosopher. I was definitely concerned few times trying to muscle my way through the logic of Newton’s proofs in the Principia (it was heartening to learn that John Locke barely understood it). But George asked me about my background and, learning I’d studied math, smiled in a way that couldn’t but give me confidence. He helped me define a topic for my final paper focused on the history of math, evaluating the concept of what Newton calls “first and last ratios”, akin to but not quite limits, in the Principia. He was proud of my paper. I was proud of my paper. But what matters is not the scholarship, it’s what I learned in the process.
George taught me how to overcome my insecurities and find a place of strength to do great work. He taught me how to love my future students, my future colleagues, how to pay close attention to their strengths, interests, weaknesses, and to shape experiences that can push them to be their best while imbuing them with the confidence they need to succeed. I carry the experience I had in George’s course with me every day to work. He taught me what it means to act with integrity as a mentor, colleague, and teacher.
I’ll close with an excerpt from an email George sent me September 29, 2017. I’d sent him a copy of Melville’s The Confidence Man to thank him for being on my podcast. Closing his thank you note, he wrote:
“I was pleased with your comment in the note attached to the gift about continuing influence. You had already, however, given me one of the more special gifts I have received in the last couple of years, namely the name of your site. I find nothing in life more gratifying than to see that someone learned something in one of my classes that they have continued to find special.”
Quam proxime, “most nearly to the highest degree possible,” occurs 139 times in the Principia. George became obsessed with what it means for grasping the place for approximation in Newtonian science. I co-opted it to refer to my own quest to achieve the delicate, precious balance between precision and soul, to guide me in my quest for meaning. How grateful I am that George is part of my process, always continuing, always growing, as we migrate the beautiful complexities of our world.
*George and Dan were my second guests on the In Context podcast, during which we discussed the relationship between data and evidence.
**Over lunch, “uncle Dan” (again, his words, not mine) predicted that the current generation of deep learning researchers would likely take a bottoms up, trial and error approach to inferring the same structured, taxonomical web of knowledge the GOFAI research community tried to define top down back in the 1950s-80s, of course will a bit more lubrication than the brittle systems of yore. This shouldn’t surprise anyone familiar with Dan’s work, as he often, as in From Bacteria to Bach and Back, writes about systems that appear to exhibit the principles of top-down intelligent design without ever having had an intentional intelligent designer. For example, life and minds.
***I won’t go into the nuances of scientific reasoning in this post, so won’t talk about processes to use data to select amidst competing hypotheses.
****I asked Bill Harper a question about the difference between Bayesian uncertainty and the delta between prediction, likelihood, and data, and the uncertainty and approximations George bakes into the Newtonian research paradigm. I’m looking to better understand different types of inference.
The featured image shows the overlap between the observed and predicted values of gravitational waves, observed by two Laser Interferometer Gravitational-Wave Observatories (LIGOs) in Washington and Louisiana on September 14, 2015. Allan Franklin, one of the conference speakers, showed this image and pointed out he found the visual proof of the overlap between prediction and observation more compelling than the 5-sigma effect in the data (that’s an interesting epistemological question about evidence!). He also commented how wonderful it must have been to be in the room with the LIGO researches when the event occurred. To learn more, I highly recommend Janna Levin’s Black Hole Blues and Other Sounds from Outer Space.
Is tacit consent–our silently agreeing to the fine print of privacy policies as we continue to use services–something we prefer to grant given the nuisance, time, and effort required to understand the nuances of data use? Is consent as a mechanism too reliant upon the supposition that privacy is an individual right and, therefore, available for an individual to exchange–in varying degrees–for the benefits and value from some service provider (i.e., Facebook likes satisfying our need to be both loved and lovely)? If consent is defunct, what legal structure should replace it?
How should we update outdated notions of what qualifies as personally identifiable information (PII), which already vary across different countries and cultures, to account for the fact contemporary data processing techniques can infer aspects of our personal identity from our online (and, increasingly, offline) behavior that feel more invasive and private than our name and address? Can more harm be done to an individual using her social security/insurance number than psychographic traits? In which contexts?
Would regulatory efforts to force large companies like Facebook to “lock down” data they have about users actually make things worse, solidifying their incumbent position in the market (as Ben Thompson and Mike Masnick argue)?
Is the best solution, as Cory Doctorow at the Electronic Frontier Foundation argues, to shift from having users (tacitly) consent to data use, based on trust and governed by the indirect forces of market demand (people will stop using your product if they stop trusting you) and moral norms, to building privacy settings in the fabric of the product, enabling users to engage more thoughtfully with tools?
Many more qualified than I are working to inform clear opinions on what matters to help entrepreneurs, technologists, policymakers, and plain-old people respond. As I grapple with this, I thought I’d share a brief and incomplete history of the thinking and concepts undergirding privacy. I’ll largely focus on the United States because it would be a book’s worth of material to write even brief histories of privacy in other cultures and contexts. I pick the United States not because I find it the most important or interesting, but because it happens to be what I know best. My inspiration to wax historical stems from a keynote I gave Friday about the history of artificial intelligence (AI) for AI + Public Policy: Understanding the shift, hosted by the Brookfield Institute in Toronto.
As is the wont of this blog, the following ideas are far from exhaustive and polished. I offer them for your consideration and feedback.
The Fourth Amendment: Knock-and-Announce
As my friend Lisa Sotto eloquently described in a 2015 lecture at the University of Pennsylvania, the United States (U.S.) considers privacy as a consumer right, parsed across different business sectors, and the European Union (EU) considers privacy as a human right, with a broader and more holistic concept of what kinds of information qualify as sensitive. Indeed, one look at the different definitions of sensitive personal data in the U.S. and France in the DLA Piper Data Protection Laws of the World Handbook shows that the categories and taxonomies are operating at different levels. In the U.S., sensitive data is parsed by data type; in France, sensitive data is parsed by data feature:
It seems potentially, possibly plausible (italics indicating I’m really unsure about this) that the U.S. concept of privacy as being fundamentally a consumer right dates back to the original elision of privacy and property in the Fourth Amendment to the U.S. Constitution:
The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.
We forget how tightly entwined protection of property was to early U.S. political theory. In his Leviathan, for example, seventeenth-century English philosopher Thomas Hobbes derives his theory of legitimate sovereign power (and the notion of the social contract that influenced founding fathers like Jefferson and Madison) from the need to provide individuals with some recourse against intrusions on their property; otherwise we risk devolving to the perpetually anxious and miserable state of the war of all against all, where anyone can come in and ransack our stuff at any time.
The Wikipedia page on the Fourth Amendment explains it as a countermeasure against general warrants and writs of assistance British colonial tax collectors were granted to “search the homes of colonists and seize ‘prohibited and uncustomed’ goods.” What matters for this brief history is the foundation that early privacy law protected people’s property–their physical homes–from searches, inspections, and other forms of intrusion or surveillance by the government.
Katz v. United States: Reasonable Expectations of Privacy
Does the right to privacy extend to telephone booths and other public places?
Is a physical intrusion necessary to constitute a search?
Justice Harlan’s comments regarding the “actual (subjective) expectation of privacy” that society is prepared to recognize as “reasonable” marked a conceptual shift to pave the way for the Fourth Amendment to make sense in the digital age. Katz shifted the locus of constitutional protections against unwarranted government surveillance from one’s private home–property that one owns–to public places that social norms recognize as private in practice if not name (a few cases preceding Katz paved the way for this judgment).
This is watershed: when any public space can be interpreted as private in the eyes of the beholder, the locus of privacy shifts from an easy-to-agree-upon-objective-space like someone’s home, doors locked and shades shut, to a hard-to-agree-upon-subjective-mindset like someone’s expectation of what should be private, even if it’s out in a completely public space, just as long as those expectations aren’t crazy (i.e., that annoying lady somehow expecting that no one is listening to her uber-personal conversation about the bad sex she had with the new guy from Tinder as she stands in a crowded checkout line waiting to purchase her chia seed concoction and her gluten-free crackers) but accord with the social norms and practices of a given moment and generation.
Imagine how thorny it becomes to decide what qualifies as a reasonable expectation for privacy when we shift from a public phone booth occupied by one person who can shut the door (as in Katz) to:
internet service providers shuffling billions of text messages, phone calls, and emails between individuals, where (perhaps?) the standard expectation is that when we go through the trouble of protecting information with a password (or two-factor authentication), we’re branding these communications as private, not to be read by the government or the private company providing us with the service (and metadata?);
GPS devices placed on the bottom of vehicles, as in United States v. Jones, 132 S.Ct. 945 (2012), which in themselves may not seem like something everyone has to worry about often but which, given the category of data they generate, are similar to any and all information about how we transact and move in the world, revealing not just what our name is but which coffee shops and doctors (or lovers or political co-conspirators) we visit on a regular basis, prompting Justice Sandra Sotomayor to be very careful in her judgments;
social media platforms like Facebook, pseudo-public in nature, that now collect and analyze not only structured data on our likes and dislikes, but, thanks to advancing AI capabilities, image, video, text, and speech data;
Just as Zeynep Tufecki argues that informed consent loses its power in an age where most users of internet services and products don’t rigorously understand what use of their data they’re consenting too, so too does Cohn believe that the “‘reasonable expectation of privacy’ test currently employed in Fourth Amendment jurisprudence is a poor test for the digital age.” As with any shift from criticism to pragmatic solutions, however, the devil is in the details. If we eliminate a reasonableness test because it’s too flimsy for the digital age, what do we replace it to achieve the desired outcomes of protecting individual rights to free speech and preventing governmental overreach? Do we find a way to measure actual harm suffered by an individual? Or should we, as Lex Gill suggested Friday, somehow think about privacy as a public good rather than an individual choice requiring individual consent? What are the different harms we need to guard against in different contexts, given that use of data for targeted marketing has different ramifications than government wiretapping?
These questions are tricky to parse because, in an age where so many aspects of our lives are digital, privacy bleeds into and across different contexts of social, political, commercial, and individual activity. As Helen Nissenbaum has masterfully shown, our subjective experience of what’s appropriate in different social contexts influences our reasonable expectations of privacy in digital contexts. We don’t share all the intimate details of our personal life with colleagues in the same way we do with close friends or doctors bound by duties of confidentiality. Add to that that certain social contexts demand frivolity (that ironic self you fashion on Facebook) and others, like politics, invite a more aspirational self. Nissenbaum’s theory of contextual integrity, where privacy is preserved when information flows respect the implicit, socially-constructed boundaries that graft the many sub-identities we perform and inhabit as individuals, applies well to Cambridge Analytica debacle. People are less concerned by private companies using social media data for psychographic targeting than they are for political targeting; the algorithms driving stickiness on site and hyper-personalized advertising aren’t fit to promote the omnivorous, balanced information diet required to understand different sides of arguments in a functioning democracy. Being at once watering hole to chat with friends, media company to support advertising, and platform for political persuasion, Facebook collapses distinct social spheres into one digital platform (which also complicates anti-trust arguments, as evident in this excellent Intelligence Squared debate).
A New New Deal on Data: Privacy in the Age of Machine Learning
Perhaps the greatest challenge posed by this new ability to sense the pulse of humanity is creating a “new deal” around questions of privacy and data ownership. Many of the network data that are available today are freely offered because the entities that control the data have difficulty extracting value from them. As we develop new analytical methods, however, this will change.
This ability to “sense the pulse of humanity,” writes Pentland earlier in the article, arises from the new data generation, collection, and processing tools that have effectively given the human race “the beginnings of a working nervous system.” Pentland contrasts what we are able to know about people’s behavior today–where we move in the world, how many times our hearts beat per minute, whom we love, whom we are attracted to, what movies we watch and when, what books we read and stop reading in between, etc–with the “single-shot, self-report data” data, e.g., yearly censuses, public polls, and focus groups, that characterized demographic statistics in the recent past. Note that back in 2009, the hey day of the big data (i.e., collecting and storing data) era, Pentland commented that while a ton of data was collected, companies had difficulty extracting value. It was just a lot of noise backed by the promise of analytic potential.
This has changed.
Machine learning has unlocked the potential and risks of the massive amounts of data collected about people.
The standard risk assessment tools (like privacy impact assessments) used by the privacy community today focus on protecting the use of particular types of data, PII like proper names and e-mail addresses. There is a whole industry and tool kit devoted to de-identification and anonymization, automatically removing PII while preserving other behavioral information for statistical insights. The problem is that this PII-centric approach to privacy misses the boat in the machine learning age. Indeed, what Cambridge Analytica brought to the fore was the ability to use machine learning to probabilistically infer not proper names but features and types from behavior: you don’t need to check a gender box for the system to make a reasonably confident guess that you are a woman based on the pictures you post and the words you use; private data from conversations with your psychiatrist need not be leaked for the system to peg you as neurotic. Deep learning is so powerful because it is able to tease out and represent hierarchical, complex aspects of data that aren’t readily and effectively simplified down variables we can keep track of and proportionately weight in our heads: these algorithms can, therefore, tease meaning out of a series of actions in time. This may not peg you as you, but it can peg you as one of a few whose behavior can be impacted using a given technique to achieve a desired outcome.
Three things have shifted:
using machine learning, we can probabilistically construct meaningful units that tell us something about people without standard PII identifiers;
because we can use machine learning, the value of data shifts from the individual to statistical insights across a distribution; and
breaches of privacy that occur at the statistical layer instead of the individual data layer require new kinds of privacy protections and guarantees.
The technical solution to this last bullet point is a technique called differential privacy. Still in the early stages of commercial adoption, differential privacy thinks about privacy as the extent to which individual data impacts the shape of some statistical distribution. If what we care about is the insight, not the person, then let’s make it so we can’t reverse engineer how one individual contributed to that insight. In other words, the task is to modify a database such that:
if you have two otherwise identical databases, one with your information and one without it, the probability that a statistical query will produce a given result is (nearly) the same whether it’s conducted on the first or second database.
Here’s an example Matthew Green from Johns Hopkins gives to help develop an intuition for how this works:
Imagine that you choose to enable a reporting feature on your iPhone that tells Apple if you like to use the 💩 emoji routinely in your iMessage conversations. This report consists of a single bit of information: 1 indicates you like 💩 , and 0 doesn’t. Apple might receive these reports and fill them into a huge database. At the end of the day, it wants to be able to derive a count of the users who like this particular emoji.
It goes without saying that the simple process of “tallying up the results” and releasing them does not satisfy the DP definition, since computing a sum on the database that contains your information will potentially produce a different result from computing the sum on a database without it. Thus, even though these sums may not seem to leak much information, they reveal at least a little bit about you. A key observation of the differential privacy research is that in many cases, DP can be achieved if the tallying party is willing to add random noise to the result. For example, rather than simply reporting the sum, the tallying party can inject noise from a Laplace or gaussian distribution, producing a result that’s not quite exact — but that masks the contents of any given row.
This is pretty technical. It takes time to understand it, in particular if you’re not steeped in statistics day in and day out, viewing the world as a set of dynamic probability distributions. But it poses a big philosophical question in the context of this post.
In the final chapters of Homo Deus, Yuval Noah Harari proposes that we are moving from the age of Humanism (where meaning emanates from the perspective of the individual human subject) to the age of Dataism (where we question our subjective viewpoints given our proven predilections for mistakes and bias to instead relegate judgment, authority, and agency to algorithms that know us better than we know ourselves). Reasonable expectations for privacy, as Justice Harlan indicated, are subjective, even if they must be supported by some measurement of norms to qualify as reasonable. Consent is individual and subjective, and results in principles like that of minimum use for an acknowledged purpose because we have limited ability to see beyond ourselves, we create traffic jams because we’re so damned focus on the next step, the proxy, as opposed to viewing the system as a whole from a wider vantage point, and only rarely (I presume?) self-identify and view ourselves under the round curves of a distribution. So, if techniques like differential privacy are better apt to protect us in an age where distributions matter more than data points, how should we construct consent, and how should we shape expectations, to craft the right balance between the liberal values we’ve inherited and this mathematical world we’re building? Or, do we somehow need to reformulate our values to align with Dataism?
And, perhaps most importantly, what should peaceful resistance look like and what goals should it achieve?
 What one decides to call the event reveals a lot about how one interprets it. Is it a breach? A scandal? If so, which actor exhibits scandalous behavior: Nix for his willingness to profit from the manipulation of people’s psychology to support the election of an administration that is toppling democracy? Zuckerberg for waiting so long to acknowledge that his social media empire is more than just an advertising platform and has critical impacts on politics and society? The Facebook product managers and security team for lacking any real enforcement mechanisms to audit and verify compliance with data policies? We the people, who have lost our ability and even desire to read critically, as we prefer the sensationalism of click bait, goading technocrats to optimize for whatever headline keeps us hooked to our feed, ever curious for more? Our higher education system, which, falling to economic pressures that date back to (before but were aggravated by) the 2008-2009 financial crisis are cutting any and all curricula for which it’s hard to find a direct, casual line to steady and lucrative employment, as our education system evolves from educating a few thoughtful priests to educating many industrial workers to educating engineers who can build stuff and optimize everything and define proxies and identify efficiencies so we can go faster, faster until we step back and take the time to realize the things we are building may not actually align with our values, that, perhaps, we may need to retain and reclaim our capacities to reflect and judge and reason if we want to sustain the political order we’ve inherited? Or perhaps all of this is just the symptom of much larger, complex trend in World History that we’re unable to perceive, that the Greeks were right in thinking that forms of government pass through inevitable cycles with the regularity of the earth rotating around the sun (an historical perspective itself, as the Greeks thought the inverse) and we should throw our hands up like happy nihilists, bowing down to the unstoppable systemic forces of class warfare, the give and take between the haves and the have nots, little amino acids ever unable to perceive how we impact the function of proteins and how they impact us in return?
And yet, it feels like there may be nothing more important than to understand this and to do what little–what big–we can to make the world a better place. This is our dignity, quixotic though it may be.
 One aspect of the fiasco* I won’t write about but that merits at least passing mention is Elon Musk’s becoming the mascot for the #DeleteFacebook movement (too strong a word?). The New York Times coverage of Musk’s move references Musk and Zuckerberg’s contrasting opinions on the risks AI might pose to humanity. From what I understand, as executives, they both operate on extremely long time scales (i.e., 100 years in the future), projecting way out into speculative futures and working backwards to decide what small steps man should take today to enable Man to take giant future leaps (gender certainly intended, especially in Musk’s case, as I find his aesthetic and many of the very muscular men I’ve met from Tesla at conferences is not dissimilar from the nationalistic masculinity performed by Vladimir Putin). Musk rebuffed Zuckerberg’s criticism that Musk’s rhetoric about the existential threat AI poses to humanity is “irresponsible” by saying that Zuckerberg’s “understanding of the subject is limited.” I had some cognitive dissonance reading this, as I presumed the risk Musk was referring to was that of super-intelligence run amok (à la Nick Bostrom, whom I admittedly reference as a straw man) rather than that of our having created an infrastructure that exacerbates short-term, emotional responses to stimuli and thereby threatens the information exchange required for democracy to function (see Alexis de Tocqueville on the importance of newspapers in Democracy in America). My takeaway from all of this is that there are so many different sub-issues all wrapped up together, and that we in the technology community really do need to work as hard as we can to references specifics rather than allow for the semantic slippage that leaves interpretation in the mind of the beholder. It’s SO HARD to do this, especially for pubic figures like Musk, given that people’s attention spans are limited and we like punchy quotables at a very high level. The devil is always in the details.
 Doctorow references Laurence Lessig’s Code and Other Laws of Cyberspace, which I have yet to read but is hailed as a classic text on the relationship between law and code, where norms get baked into our technologies in the choices of how we write code.
 I always got a kick out of the song Human by the Killers, whose lyrics seem to imply a mutually exclusive distinction between human and dancer. Does the animal kingdom offer better paradigms for dancers than us poor humans? Must depend on whether you’re a white dude.
 My talk drew largely from Chris Dixon‘s extraordinary Atlantic article How Aristotle Created the Computer. Extraordinary because he deftly encapsulates 2000 years of the history of logic into a compelling, easy-to-read article that truly helps the reader develop intuitions about deterministic computer programs and the shift to a more inductive machine learning paradigm, while also not leaving the reader with the bitter taste of having read an overly general dilettante. Here’s one of my slides, which represents how important it was for the history of computation to visualize and interpret Aristotelian syllogisms as sets (sets lead to algebra lead to encoding in logical gates lead to algorithms).
Fortunately (well, we put effort in to coordinate), my talk was a nice primer for Graham Taylor‘s superbly clear introduction to various forms of machine learning. I most liked his section on representation learning, where he showed how the choice of representation of data has an enormous impact on the performance of algorithms:
 If you’re interested in contemporary Constitutional Law, I highly recommend Roman Mars’s What Trump Can Teach us about Con Law podcast. Mars and Elizabeth Joh, a law school professor at UC Davis, use Trump’s entirely anomalous behavior as catalyst to explore various aspects of the Constitution. I particularly enjoyed the episode about the emoluments clause, which prohibits acceptance of diplomatic gifts to the President, Vice President, Secretary of State, and their spouses. The Protocol Gift Unit keeps public record of all gifts presidents did accept, including justification of why they made the exception. For example, in 2016, former President Obama accepted Courage, an olive green with black flecks soapstone sculpture, depicting the profile of an eagle with half of an indigenous man’s face in the center, valued at $650.00, from His Excellency Justin Trudeau, P.C., M.P., Prime Minister of Canada, because “non-acceptance would cause embarrassment to donor and U.S. Government.”
 Cindy will be in Toronto for RightsCon May 16-18. I cannot recommend her highly enough. Every time I hear her speak, every time I read her writing, I am floored by her eloquence, precision, and passionate commitment to justice.
 Another thing I cannot recommend highly enoug is David Foster Wallace’s commencement speech This is Water. It’s ruthlessly important. It’s tragic to think about the fact that this human, this wonderfully enlightened heart, felt the only appropriate act left was to commit suicide.
 A related issue I won’t go into in this post is the third-party doctrine, “under which an individual who voluntarily provides information to a third party loses any reasonable expectation of privacy in that information.” (Cohn)
 Eli Pariser does a great job showing the difference between our frivolous and aspiration selves, and the impact this has on filter bubbles, in his 2011 quite prescient monograph.
 See also this 2014 Harvard Business Review interview with Pentland. My friend Dazza Greenwood first introduced me to Pentland’s work by presenting the blockchain as an effective means to executive the new deal on data, empowering individuals to keep better track of where data flow and sit, and how they are being used.
 Cynthia Dwork’s pioneering work on differential privacy at Microsoft Research dates back to 2006. It’s currently in use at Apple, Facebook, and Google (the most exciting application being fused with federated learning across the network of Android users, to support localized, distributed personalization without requiring that everyone share their digital self with Google’s central servers). Even Uber has released an open-source differential privacy toolset. There are still many limitations to applying these techniques in practice given their impact on model performance and the lack of robust guarantees on certain machine learning models. I don’t know of many instances of startups using the technology yet outside a proud few in the Georgian Partners portfolio, including integrate.ai (where I work) and Bluecore in New York City.
The featured image is from an article in The Daily Dot (which I’ve never heard of) about the Mojave Phone Booth, which, as Roman Mars delightfully narrates in 99% Invisible became a sensation when Godfrey “Doc” Daniels (trust me that link is worth clicking on!!) used the internet to catalogue his quest to find the phone booth working merely from its number: 760-733-9969. The tattered decrepitude of the phone booth, pitched against the indigo of the sunset, is a compelling illustration of the inevitable retrograde character of common law precedent. The opinions in Katz v. United States regarded reasonably expectations for privacy were given at a time when digital communications occurred largely over the phone: is it even possible for us to draw analogies between what privacy meant then and what it could mean now in the age of centralized platform technologies whose foundations are built upon creating user bases and markets to then exchange this data for commercial and political advertising purposes? But, what can we use to anchor ethics and lawful behavior if not the precedent of the past, aligned against a set of larger, overarching principles in an urtext like the constitution, or, in the Islamic tradition, the Qur’an?
There was another life that I might have had, but I am having this one. – Kazuo Ishiguro
On April 18, 2016*, I attended an NYAI Meetup** featuring a talk by Columbia Computer Science Professor Dan Hsu on interactive learning. Incredibly clear and informative, the talk slides are worth reviewing in their entirety. But one in particular caught my attention (fortunately it summarizes many of the subsequent examples):
It’s worth stepping back to understand why this is interesting.
Much of the recent headline-grabbing progress in artificial intelligence (AI) comes from the field of supervised learning. As I explained in a recent HBR article, I find it helpful to think of supervised learning like the inverse of high school algebra:
Think back to high school math — I promise this will be brief — when you first learned the equation for a straight line: y = mx + b. Algebraic equations like this represent the relationship between two variables, x and y. In high school algebra, you’d be told what m and b are, be given an input value for x, and then be asked to plug them into the equation to solve for y. In this case, you start with the equation and then calculate particular values.
Supervised learning reverses this process, solving for m and b, given a set of x’s and y’s. In supervised learning, you start with many particulars — the data — and infer the general equation. And the learning part means you can update the equation as you see more x’s and y’s, changing the slope of the line to better fit the data. The equation almost never identifies the relationship between each x and y with 100% accuracy, but the generalization is powerful because later on you can use it to do algebra on new data. Once you’ve found a slope that captures a relationship between x and y reliably, if you are given a new x value, you can make an educated guess about the corresponding value of y.
Supervised learning works well for classification problems (spam or not spam? relevant or not for my lawsuit? cat or dog?) because of how the functions generalize. Effectively, the “training labels” humans provide in supervised learning assign categories, tokens we affiliate to abstractions from the glorious particularities of the world that enable us to perceive two things to be similar. Because our language is relatively stable (stable does not mean normative, as Canadian Inuit perceive snow differently from New Yorkers because they have more categories to work with), generalities and abstractions are useful, enabling the learned system to act correctly in situations not present in the training set (e.g., it takes a hell of a long time for golden retrievers to evolve to be indistinguishable from their great-great-great-great-great-grandfathers, so knowing what one looks like on April 18, 2016 will be a good predictor of what one looks like on December 2, 2017). But, as Rich Sutton*** and Andrew Barto eloquently point out in their textbook on reinforcement learning,
This is an important kind of learning, but alone it is not adequate for learning from interaction. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. In uncharted territory—where one would expect learning to be most beneficial—an agent must be able to learn from its own experience.
In his NYAI talk, Dan Hsu also mentioned a common practical limitation of supervised learning, namely that many companies often lack good labeled training data and it can be expensive, even in the age of Mechanical Turk, to take the time to provide labels.**** The core thing to recognize is that learning from generalization requires that future situations look like past situations; learning from interaction with the environment helps develop a policy for action that can be applied even when future situations do not look exactly like past situations. The maxim “if you don’t have anything nice to say, don’t say anything at all” holds both in a situation where you want to gossip about a colleague and in a situation where you want to criticize a crappy waiter at a restaurant.
In a supervised learning paradigm, there are certainly traps to make faulty generalizations from the available training data. One classic problem is called “overfitting”, where a model seems to do a great job on a training data set but fails to generalize well to new data. But the super critical salient difference Hsu points out in his talk is that, while with supervised learning the data available to the learner is exogenous to the system, with interactive machine learning approaches, the learner’s performance is based on the learner’s decisions and the data available to the world depends on the learner’s decisions.
Think about that. Think about what that means for gauging the consequences of decisions. Effectively, these learners cannot evaluate counterfactuals: they cannot use data or evidence to judge what would have happened if they took a different action. An ideal optimization scenario, by contrast, would be one where we could observe the possible outcomes of any and all potential decisions, and select the action with the best outcome across all these potential scenarios (this is closer, but not identical, to the spirit of variational inference, but that is a complex topic for another post).
To share one of Hsu’s***** concrete examples, let’s say a website operator has a goal to personalize website content to entice a consumer to buy a pair of shoes. Before the user shows up at the site, our operator has some information about her profile and browsing history, so can use past actions to guess what might be interesting bait to get a click (and eventually a purchase). So, at the moment of truth, the operator says “Let’s show the beige Cole Hann high heels!”, displays the content, and observes the reaction. We’ll give the operator the benefit of the doubt and assume the user clicks, or even goes on to purchase. Score! Positive signal! Do that again in the future! But was it really the best choice? What would have happened if the operator had shown the manipulatable consumer the red Jimmy Choo high heels, which cost $750 per pair rather than a more modest $200 per pair? Would the manipulatable consumer have clicked? Was this really the best action?
The learner will never know. It can only observe the outcome of the action it took, not the action it didn’t take.
The literature refers to this dilemma as the trade-off between exploration and exploitation. To again cite Sutton and Barto:
One of the challenges that arise in reinforcement learning, and not in other kinds of learning, is the trade-off between exploration and exploitation. To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward.
There’s a lot to say about the exploration-exploitation tradeoff in machine learning (I recommend starting with the Sutton/Barto textbook). Now that I’ve introduced the concept, I’d like to pivot to consider where and why this is relevant in honest-to-goodness-real-life.
The nice thing about being an interactive machine learning algorithm as opposed to a human is that algorithms are executors, not designers or managers. They’re given a task (“optimize revenues for our shoe store!”) and get to try stuff and make mistakes and learn from feedback, but never have to go through the soul-searching agony of deciding what goal is worth achieving. Human designer overlords take care of that for them. And even the domain and range of possible data to learn from is constrained by technical conditions: designers make sure that it’s not all the data out there in the world that’s used to optimize performance on some task, but a tiny little baby subset (even if that tiny little baby entails 500 million examples) confined within a sphere of relevance.
Being a human is unfathomably more complicated.
Many choices we make benefit from the luxury of triviality and frequency. “Where should we go for dinner and what should we eat when we get there?” Exploitation can be a safe choice, in particular for creatures of habit. “Well, sweetgreen is around the corner, it’s fast and reliable. We could take the time to review other restaurants (which could lead to the most amazing culinary experience of our entire lives!) or we could not bother to make the effort, stick with what we know, and guarantee a good meal with our standard kale caesar salad, that parmesan crisp thing they put on the salad is really quite tasty…” It’s not a big deal if we make the wrong choice because, low and behold, tomorrow is another day with another dinner! And if we explore something new, it’s possible the food will be just terrible and sometimes we’re really not up for the risk, or worse, the discomfort or shame of having to send something we don’t like back. And sometimes it’s fine to take the risk and we come to learn we really do love sweetbreads, not sweetgreens, and perhaps our whole diet shifts to some decadent 19th-century French paleo practice in the style of des Esseintes.
Other choices have higher stakes (or at least feel like they do) and easily lead to paralysis in the face of uncertainty. Working at a startup strengthens this muscle every day. Early on, founders are plagued by an unknown amount of unknown unknowns. We’d love to have a magic crystal ball that enables us to consider the future outcomes of a range of possible decisions, and always act in the way that guarantees future success. But the crystal balls don’t exist, and even if they did, we sometimes have so few prior assumptions to prime the pump that the crystal ball could only output an #ERROR message to indicate there’s just not enough there to forecast. As such, the only option available is to act and to learn from the data provided as a result of that action. To jumpstart empiricism, staking some claim and getting as comfortable as possible with the knowledge that the counterfactual will never be explored, and that each action taken shifts the playing field of possibility and probability and certainty slightly, calming minds and hearts. The core challenge startup leaders face is to enable the team to execute as if these conditions of uncertainty weren’t present, to provide a safe space for execution under the umbrella of risk and experiment. What’s fortunate, however, is that the goals of the enterprise are, if not entirely well-defined, at least circumscribed. Businesses exist to turn profits and that serves as a useful, if not always moral, constraint.
Big personal life decisions exhibit further variability because we but rarely know what to optimize for, and it can be incredibly counter-productive and harmful to either constrain ourselves too early or suffer from the psychological malaise of assuming there’s something wrong with us if we don’t have some master five-year plan.
This human condition is strange because we do need to set goals–it’s beneficial for us to consider second- and third-tier consequences, i.e., if our goal is to be healthy and fit, we should overcome the first-tier consequence of receiving pleasure when we drown our sorrows in a gallon of salted caramel ice cream–and yet it’s simply impossible for us to imagine the future accurately because, well, we overfit to our present and our past.
I’ll give a concrete example from my own experience. As I touched upon in a recent post about transitioning from academia to business, one reason why it’s so difficult to make a career change is that, while we never actually predict the future accurately, it’s easier to fear loss from a known predicament than to imagine gain from a foreign predicament.****** Concretely, when I was deciding whether to pursue a career in academia or the private sector in the fifth year in graduate school, I erroneously assumed that I was making a strict binary choice, that going into business meant forsaking a career teaching or publishing. As I was evaluating my decision, I never in my wildest dreams imagined that, a mere two years later, I would be invited to be an adjunct professor at the University of Calgary Faculty of Law, teaching about how new technologies were impacting traditional professional ethics. And I also never imagined that, as I gave more and more talks, I would subsequently be invited to deliver guest lectures at numerous business schools in North America. This path is not necessarily the right path for everyone, but it was and is the right path for me. In retrospect, I wish I’d constructed my decision differently, shifting my energy from fearing an unknown and unknowable future to paying attention to what energized me and made me happy and working to maximize the likelihood of such energizing moments occurring in my life. I still struggle to live this way, still fetishize what I think I should be wanting to do and living with an undercurrent of anxiety that a choice, a foreclosure of possibility, may send me down an irreconcilably wrong path. It’s a shitty way to be, and something I’m actively working to overcome.
So what should our policy be? How can we reconcile this terrific trade-off between exploration and exploitation, between exposing ourselves to something radically new and honing a given skill, between learning from a stranger and spending more time with a loved one, between opening our mind to some new field and developing niche knowledge in a given domain, between jumping to a new company with new people and problems, and exercising our resilience and loyalty to a given team?
There is no right answer. We’re all wired differently. We all respond to challenges differently. We’re all motivated by different things.
Perhaps death is the best constraint we have to provide some guidance, some policy to choose between choice A and choice B. For we can project ourselves forward to our imagined death bed, where we lie, alone, staring into the silent mirror of our hearts, and ask ourselves “Was my life was meaningful?” But this imagined scene is not actually a future state: it is a present policy. It is a principle we can use to evaluate decisions, a principle that is useful because it abstracts us from the mire of emotions overly indexed towards near-term goals and provides us with perspective.
And what’s perhaps most miraculous is that, at every present, we can sit there are stare into the silent mirror of our hearts and look back on the choices we’ve made and say, “That is me.” It’s so hard going forward, and so easy going backward. The proportion of what may come wanes ever smaller than the portion of what has been, never quite converging until it’s too late, and we are complete.
*Thank you, internet, for enabling me to recall the date with such exacting precision! Using my memory, I would have deduced the approximate date by 1) remembering that Robert Colpitts, my boyfriend at the time (Godspeed to him today, as he participates in a sit-a-thon fundraiser for the Interdependence Project in New York City, a worthy cause), attended with me, recalling how fresh our relationship was (it had to have been really fresh because the frequency with which we attended professional events together subsequently declined), and working backwards from the start to find the date; 2) remembering what I wore! (crazy!!), namely a sheer pink sleeveless shirt, a pair of wide-legged white pants that landed just slightly above the ankle and therefore looked great with the pair of beige, heeled sandals with leather so stiff it gave me horrific blisters that made running less than pleasant for the rest of the week. So I’d recently purchased those when my brother and his girlfriend visited, which was in late February (or early March?) 2016; 3) remembering that afterwards we went to some fast food Indian joint nearby in the Flatiron district, food was decent but not good enough to inspire me to return. So that would put is in the March-April, 2016 range, which is close but not the exact April 18. That’s one week after my birthday (April 11); I remember Robert and I had a wonderful celebration on my birthday. I felt more deeply cared for than I had in any past birthdays. But I don’t remember this talk relative to the birthday celebration (I do remember sending the marketing email to announce the Fast Forward Labs report on text summarization on my birthday, when I worked for half day and then met Robert at the nearby sweetgreen, where he ordered, as always, (Robert is a creature of exploitation) the kale caesar salad, after which we walked together across the Brooklyn Bridge to my house, we loved walking together, we took many, many walks together, often at night after work at the Promenade, often in the morning, before work, at the Promenade, when there were so few people around, so few people awake). I must say, I find the process of reconstructing when an event took place using temporal landmarks much more rewarding than searching for “Dan Hsu Interactive Learning NYAI” on Google to find the exact date. But the search terms themselves reveal something equally interesting about our heuristic mnemonics, as every time we reconstruct some theme or topic to retrieve a former conversation on Slack.
**Crazy that WeWork recently bought Meetup, although interesting to think about how the two business models enable what I am slowly coming to see as the most important creative force in the universe, the combinatory potential of minds meeting productively, where productively means that each mind is not coming as a blank slate but as engaged in a project, an endeavor, where these endeavors can productively overlap and, guided by a Smithian invisible hand, create something new. The most interesting model we hope to work on soon at integrate.ai is one that optimizes groups in a multiplayer game experience (which we lovingly call the polyamorous online dating algorithm), so mapping personality and playing style affinities to dynamically allocate the best next player to an alliance. Social compatibility is a fascinating thing to optimize for, in particular when it goes beyond just assembling a pleasant cocktail party to pairing minds, skills, and temperaments to optimize the likelihood of creating something beautiful and new.
***Sutton has one of the most beautiful minds in the field and he is kind. He is a person to celebrate. I am grateful our paths have crossed and thoroughly enjoyed our conversation on the In Context podcast.
***Maura Grossman and Gordon Cormack have written countless articles about the benefits of using active learning for technology assisted review (TAR), or classifying documents for their relevance for a lawsuit. The tradeoffs they weigh relate to system performance (gauged by precision and recall on a document set) versus time, cost, and effort to achieve that performance.
*****Hsu did not mention Haan or Choo. I added some more color.
******Note this same dynamic occurs in our current fears about the future economy. We worry a hell of a lot more about the losses we will incur if artificial intelligence systems automate existing jobs than we celebrate the possibilities of new jobs and work that might become possible once these systems are in place. This is also due to the fact that the future we imagine tends to be an adaptation of what we know today, as delightfully illustrated in Jean-Marc Côté’s anachronistic cartoons of the year 2000. The cartoons show what happens when our imagination only changes one variable as opposed to a set of holistically interconnected variables.
The featured image is a photograph I took of the sidewalk on State Street between Court and Clinton Streets in Brooklyn Heights. I presume a bird walked on wet concrete. Is that how those kinds of footprints are created? I may see those footprints again in the future, but not nearly as soon as I’d be able to were I not to have decided to move to Toronto in May. Now that I’ve thought about them, I may intentionally make the trip to Brooklyn next time I’m in New York (certainly before January 11, unless I die between now and then). I’ll have to seek out similar footprints in Toronto, or perhaps the snows of Alberta.
The Sagrada Familia is a castle built by Australian termites.
The Sagrada Familia is not a castle built by Australian termites, and never will be. Tis utter blasphemy.
The Sagrada Familia is not a castle built by Australian termites, and yet, Look! Notice, as Daniel Dennett bids, how in an untrodden field in Australia there emerged and fell, in near silence, near but for the methodical gnawing, not unlike that of a mouse nibbling rapaciously on parched pasta left uneaten all these years but preserved under the thick dust on the thin cardboard with the thin plastic window enabling her to view what remained after she’d cooked just one serving, with butter, for her son, there emerged and fell, with the sublime transience of Andy Goldsworthy, a neo-Gothic church of organic complexity on par with that imagined by Antoni Gaudí i Cornet, whose Sagrada Familia is scheduled for completion in 2026, a full century after the architect died in a tragic tram crash, distracted by the recent rapture of his prayer.
The Sagrada Familia is not a castle built by Australian termites, and yet, Look! Notice, as Daniel Dennett bids, how in an untrodden field in Australia there emerged and fell a structure so eerily resemblant of the one Antoni Gaudí imagined before he died, neglected like a beggar in his shabby clothes, the doctors unaware they had the chance to save the mind that preempted the fluidity of contemporary parametric architectural design by some 80 odd years, a mind supple like that of Poincaré, singular yet part of a Zeitgeist bent on infusing time into space like sandalwood in oil, inseminating Euclid’s cold geometry with femininity and life, Einstein explaining why Mercury moves retrograde, Gaudí rendering the holy spirit palpable as movement in stone, fractals of repetition and difference giving life to inorganic matter, tension between time and space the nadir of spirituality, as Andrei Tarkovsky went on to explore in his films.
The Sagrada Familia is not a castle built by Australian termites, and yet, Look! Notice, as Daniel Dennett bids, how in an untrodden field in Australia there emerged and fell a structure so eerily resemblant of the one Antoni Gaudí imagined before he died, with the (seemingly crucial) difference that the termites built their temple without blueprints or plan, gnawing away the silence as a collectivity of single stochastic acts which, taken together over time, result in a creation that appears, to our meaning-making minds, to have been created by an intelligent designer, this termite Sagrada Familia a marvelous instance of what Dennett calls Darwin’s strange inversion of reasoning, an inversion that admits to the possibility that absolute ignorance can serve as master artificer, that IN ORDER TO MAKE A PERFECT AND BEAUTIFUL MACHINE, IT IS NOT REQUISITE TO KNOW HOW TO MAKE IT*, that structures might emerge from the local activity of multiple parts, amino acids folding into proteins, bees flying into swarms, bumper-to-bumper traffic suddenly flowing freely, these complex release valves seeming like magic to the linear perspective of our linear minds.
The Sagrada Familia is not a castle built by Australian termites, and yet, the eerie resemblance between the termite and the tourist Sagrada Familias serves as a wonderful example to anchor a very important cultural question as we move into an age of post-intelligent design, where the technologies we create exhibit competence without comprehension, diagnosing lungs as cancerous or declaring that individuals merit a mortgage or recommending that a young woman would be a good fit for a role on a software engineering team or getting better and better at Go by playing millions of games against itself in a schizophrenic twist resemblant of the pristine pathos of Stephan Zweig, one’s own mind an asylum of exiled excellence during the travesty of the second world war, why, we’ve come full circle and stand here at a crossroads, bidden by a force we ourselves created to accept the creative potential of Lucretius’ swerve, to kneel at the altar of randomness, to appreciate that computational power is not just about shuffling 1s and 0s with speed but shuffling them fast enough to enable a tiny swerve to result in wondrous capabilities, and to watch as, perhaps tragically, we apply a framework built for intelligent design onto a Darwinian architecture, clipping the wings of stochastic potential, working to wrangle our gnawing termites into a straight jacket of cause, while the systems beating Atari, by no act of strategic foresight but by the blunt speed of iteration, make a move so strange and so outside the realm of verisimilitude that, as Kasparov succumbing to Deep Blue, we misinterpret a bug for brilliance.
The Sagrada Familia is not a castle built by Australian termites, and yet, it seems plausible that Gaudí would have reveled in the eerie resemblance between a castle built by so many gnawing termites and the temple Josep Maria Bocabella i Verdaguer, a bookseller with a popular fundamentalist newspaper, “the kind that reminded everybody that their misery was punishment for their sins,”**commissioned him to build.
Or would he? Gaudí was deeply Catholic. He genuflected at the temple of nature, seeing divine inspiration in the hexagons of honeycombs, imagining the columns of the Sagrada Familia to lean, buttresses, as symbols of the divine trilogy of the father (the vertical axis), son (the horizontal axis), and holy spirit (the vertical meeting the horizontal in crux of the diagonal). His creativity, therefore, always stemmed from something more than intelligent design, stood as an act of creative prayer to render homage to God the creator by creating an edifice that transformed, in fractals of repetition in difference, inert stone into movement and life.
The Sagrada Familia is not a castle built by Australian termites, and yet, the termite Sagrada Familia actually exists as a complete artifact, its essence revealed to the world rather than being stuck in unfinished potential. And yet, while we wait in joyful hope for its imminent completion, this unfinished, 144-year-long architectural project has already impacted so many other architects, from Frank Gehry to Zaha Hadid. This unfinished vision, this scaffold, has launched a thousand ships of beauty in so many other places, changing the skylines of Bilbao and Los Angeles and Hong Kong. Perhaps, then, the legacy of the Sagrada Family is more like that of Jodorowsky’s Dune, an unfinished film that, even from its place of stunted potential, changed the history of cinema. Perhaps, then, the neglect the doctors showed to Gaudí, the bearded beggar distracted by his act of prayer, was one of those critical swerves in history. Perhaps, had Gaudí lived to finish his work, architects during the century wouldn’t have been as puzzled by the parametric requirements of his curves and the building wouldn’t have gained the puzzling aura it gleans to this day. Perhaps, no matter how hard we try to celebrate and accept the immense potential of stochasticity, we will always be makers of meaning, finders of cause, interpreters needing narrative to live grounded in our world. And then again, perhaps not.
The Sagrada Familia is not a castle built by Australian termites. The termites don’t care either way. They’ll still construct their own Sagrada Familia.
The Sagrada Familia is a castle built by Australian termites. How wondrous. How essential must be these shapes and forms.
The Sagrada Familia is a castle built by Australian termites. It is also an unfinished neo-Gothic church in Barcelona, Spain. Please, terrorists, please don’t destroy this temple of unfinished potential, this monad brimming the history of the world, each turn, each swerve a pivot down a different section of the encyclopedia, coming full circle in its web of knowledge, imagination, and grace.
The Sagrada Familia is a castle built by Australian termites. We’ll never know what Gaudí would have thought about the termite castle. All we have are the relics of his Poincaréan curves, and fish lamps to illuminate our future.
*Dennett reads these words, penned in 1868 by Robert Beverley MacKenzie, with pedantic panache, commenting that the capital letters were in the original.
The featured image comes from Daniel Dennett’s From Bacteria to Bach and Back. I had the immense pleasure of interviewing Dan on the In Context podcast, where we discuss many of the ideas that appear in this post, just in a much more cogent form.
That familiar discomfort of wanting to write but not feeling ready yet.*
(The default voice pops up in my brain: “Then don’t write! Be kind to yourself! Keep reading until you understand things fully enough to write something cogent and coherent, something worth reading.”
The second voice: “But you committed to doing this! To not write** is to fail.***”
The third voice: “Well gosh, I do find it a bit puerile to incorporate meta-thoughts on the process of writing so frequently in my posts, but laziness triumphs, and voilà there they come. Welcome back. Let’s turn it to our advantage one more time.”)
This time the courage to just do it came from the realization that “I don’t understand this yet” is interesting in itself. We all navigate the world with different degrees of knowledge about different topics. To follow Wilfred Sellars, most of the time we inhabit the manifest image, “the framework in terms of which man came to be aware of himself as man-in-the-world,” or, more broadly, the framework in terms of which we ordinarily observe and explain our world. We need the manifest image to get by, to engage with one another and not to live in a state of utter paralysis, questioning our every thought or experience as if we were being tricked by the evil genius Descartes introduces at the outset of his Meditations (the evil genius toppled by the clear and distinct force of the cogito, the I am, which, per Dan Dennett, actually had the reverse effect of fooling us into believing our consciousness is something different from what it actually is). Sellars contrasts the manifest image with the scientific image: “the scientific image presents itself as a rival image. From its point of view the manifest image on which it rests is an ‘inadequate’ but pragmatically useful likeness of a reality which first finds its adequate (in principle) likeness in the scientific image.” So we all live in this not quite reality, our ability to cooperate and coexist predicated pragmatically upon our shared not-quite-accurate truths. It’s a damn good thing the mess works so well, or we’d never get anything done.
Sellars has a lot to say about the relationship between the manifest and scientific images, how and where the two merge and diverge. In the rest of this post, I’m going to catalogue my gradual coming to not-yet-fully understanding the relationship between mathematical machine learning models and the hardware they run on. It’s spurring my curiosity, but I certainly don’t understand it yet. I would welcome readers’ input on what to read and to whom to talk to change my manifest image into one that’s slightly more scientific.
So, one common thing we hear these days (in particular given Nvidia’s now formidable marketing presence) is that graphical processing units (GPUs) and tensor processing units (TPUs) are a key hardware advance driving the current ubiquity in artificial intelligence (AI). I learned about GPUs for the first time about two years ago and wanted to understand why they made it so much faster to train deep neural networks, the algorithms behind many popular AI applications. I settled with an understanding that the linear algebra–operations we perform on vectors, strings of numbers oriented in a direction in an n-dimensional space–powering these applications is better executed on hardware of a parallel, matrix-like structure. That is to say, properties of the hardware were more like properties of the math: they performed so much more quickly than a linear central processing unit (CPU) because they didn’t have to squeeze a parallel computation into the straightjacket of a linear, gated flow of electrons. Tensors, objects that describe the relationships between vectors, as in Google’s hardware, are that much more closely aligned with the mathematical operations behind deep learning algorithms.
There are two levels of knowledge there:
Basic sales pitch: “remember, GPU = deep learning hardware; they make AI faster, and therefore make AI easier to use so more possible!”
Just above the basic sales pitch: “the mathematics behind deep learning is better represented by GPU or TPU hardware; that’s why they make AI faster, and therefore easier to use so more possible!”
At this first stage of knowledge, my mind reached a plateau where I assumed that the tensor structure was somehow intrinsically and essentially linked to the math in deep learning. My brain’s neurons and synapses had coalesced on some local minimum or maximum where the two concepts where linked and reinforced by talks I gave (which by design condense understanding into some quotable meme, in particular in the age of Twitter…and this requirement to condense certainly reinforces and reshapes how something is understood).
In time, I started to explore the strange world of quantum computing, starting afresh off the local plateau to try, again, to understand new claims that entangled qubits enable even faster execution of the math behind deep learning than the soddenly deterministic bits of C, G, and TPUs. As Ivan Deutsch explains this article, the promise behind quantum computing is as follows:
In a classical computer, information is stored in retrievable bits binary coded as 0 or 1. But in a quantum computer, elementary particles inhabit a probabilistic limbo called superposition where a “qubit” can be coded as 0 and 1.
Here is the magic: Each qubit can be entangled with the other qubits in the machine. The intertwining of quantum “states” exponentially increases the number of 0s and 1s that can be simultaneously processed by an array of qubits. Machines that can harness the power of quantum logic can deal with exponentially greater levels of complexity than the most powerful classical computer. Problems that would take a state-of-the-art classical computer the age of our universe to solve, can, in theory, be solved by a universal quantum computer in hours.
What’s salient here is that the inherent probabilism of quantum computers make them even more fundamentally aligned with the true mathematics we’re representing with machine learning algorithms. TPUs, then, seem to exhibit a structure that best captures the mathematical operations of the algorithms, but exhibit the fatal flaw of being deterministic by essence: they’re still trafficking in the binary digits of 1s and 0s, even if they’re allocated in a different way. Quantum computing seems to bring back an analog computing paradigm, where we use aspects of physical phenomena to model the problem we’d like to solve. Quantum, of course, exhibits this special fragility where, should the balance of the system be disrupted, the probabilistic potential reverts down to the boring old determinism of 1s and 0s: a cat observed will be either dead or alive, as the harsh law of the excluded middle haunting our manifest image.
What, then, is the status of being of the math? I feel a risk of falling into Platonism, of assuming that a statement like “3 is prime” refers to some abstract entity, the number 3, that then gets realized in a lesser form as it is embodied on a CPU, GPU, or cup of coffee. It feels more cogent to me to endorse mathematical fictionalism, where mathematical statements like “3 is prime” tell a different type of truth than truths we tell about objects and people we can touch and love in our manifest world.****
My conclusion, then, is that radical creativity in machine learning–in any technology–may arise from our being able to abstract the formal mathematics from their substrate, to conceptually open up a liminal space where properties of equations have yet to take form. This is likely a lesson for our own identities, the freeing from necessity, from assumption, that enables us to come into the self we never thought we’d be.
I have a long way to go to understand this fully, and I’ll never understand it fully enough to contribute to the future of hardware R&D. But the world needs communicators, translators who eventually accept that close enough can be a place for empathy, and growth.
*This holds not only for writing, but for many types of doing, including creating a product. Agile methodologies help overcome the paralysis of uncertainty, the discomfort of not being ready yet. You commit to doing something, see how it works, see how people respond, see what you can do better next time. We’re always navigating various degrees of uncertainty, as Rich Sutton discussed on the In Context podcast. Sutton’s formalization of doing the best you can with the information you have available today towards some long-term goal, but learning at each step rather than waiting for the long-term result, is called temporal-difference learning.
**Split infinitive intentional.
***Who’s keeping score?
****That’s not to say we can’t love numbers, as Euler’s Identity inspires enormous joy in me, or that we can’t love fictional characters, or that we can’t love misrepresentations of real people that we fabricate in our imaginations. I’ve fallen obsessively in love with 3 or 4 imaginary men this year, creations of my imagination loosely inspired by the real people I thought I loved.
The image comes from this site, which analyzes themes in films by Darren Aronofsky. Maximilian Cohen, the protagonist of Pi, sees mathematical patterns all over the place, which eventually drives him to put a drill into his head. Aronofsky has a penchant for angst. Others, like Richard Feynman, find delight in exploring mathematical regularities in the world around us. Soap bubbles, for example, offer incredible complexity, if we’re curious enough to look.
Last week, I attended the C2 conference in Montréal, which featured an AI Forum coordinated by Element AI.* Two friends from Google, Hugo LaRochelle and Blaise Agüera y Arcas, led workshops about the societal (Hugo) and ethical (Blaise) implications of artificial intelligence (AI). In both sessions, participants expressed discomfort with allowing machines to automate decisions, like what advertisement to show to a consumer at what time, whether a job candidate should pass to the interview stage, whether a power grid requires maintenance, or whether someone is likely to be a criminal.** While each example is problematic in its own way, a common response to the increasing ubiquity of algorithms is to demand a “right to explanation,” as the EU recently memorialized in the General Data Protection Regulation slated to take effect in 2018. Algorithmic explainability/interpretability is currently an active area of research (my former colleagues at Fast Forward Labs will publish a report on the topic soon and members of Geoff Hinton’s lab in Toronto are actively researching it). While attempts to make sense of nonlinear functions are fascinating, I agree with Peter Sweeney that we’re making a category mistake by demanding explanations from algorithms in the first place: the statistical outputs of machine learning systems produce new observations, not explanations. I’ll side here with my namesake, David Hume, and say we need to be careful not to fall into the ever-present trap of mistaking correlation for cause.
One reason why people demand a right to explanation is that they believe that knowing why will grant us more control over outcome. For example, if we know that someone was denied a mortgage because of their race, we can intervene and correct for this prejudice. A deeper reason for the discomfort stems from the fact that people tend to falsely attribute consciousness to algorithms, applying standards for accountability that we would apply to ourselves as conscious beings whose actions are motivated by a causal intention. (LOL***)
Now, I agree with Noah Yuval Harari that we need to frame our understanding of AI as intelligence decoupled from consciousness. I think understanding AI this way will be more productive for society and lead to richer and cleaner discussions about the implications of new technologies. But others are actively at work to formally describe consciousness in what appears to be an attempt to replicate it.
In what follows, I survey three interpretations of consciousness I happened to encounter (for the first time or recovered by analogical memory) this week. There are many more. I’m no expert here (or anywhere). I simply find the thinking interesting and worth sharing. I do believe it is imperative that we in the AI community educate the public about how the intelligence of algorithms actually works so we can collectively worry about the right things, not the wrong things.
Condillac wrote the Traité in 1754, and the work exhibits two common trends from the French Enlightenment:
A concerted effort to topple Descartes’s rationalist legacy, arguing that all cognition starts with sense data rather than inborn mathematical truths
A stylistic debt to Descartes’s rhetoric of analysis, where arguments are designed to conjure a first-person experience of the process of arriving at an insight, rather than presenting third-person, abstract lessons learned
The Traité starts with the assumption that we can tease out each of our senses and think about how we process them in isolation. Condillac bids the reader to imagine a statue with nothing but the sense of smell. Lacking sight, sound, and touch, the statue “has no ideas of space, shape, anything outside of herself or outside her sensations, nothing of color, sound, or taste.” She is, in my opinion incredibly sensuously, nothing but the odor of a flower we waft in front of her. She becomes it. She is totally present. Not the flower itself, but the purest experience of its scent.
As Descartes constructs a world (and God) from the incontrovertible center of the cogito, so too does Condillac construct a world from this initial pure scent of rose. After the rose, he wafts a different flower – a jasmine – in front of the statue. Each sensation is accompanied by a feeling of like or dislike, of wanting more or wanting less. The statue begins to develop the faculties of comparison and contrast, the faculty of memory with faint impressions remaining after one flower is replaced by another, the ability to suffer in feeling a lack of something she has come to desire. She appreciates time as an index of change from one sensation to the next. She learns surprise as a break from the monotony of repetition. Condillac continues this process, adding complexity with each iteration, like the escalating tension Shostakovich builds variation after variation in the Allegretto of the Leningrad Symphony.
True consciousness, for Condillac, begins with touch. When she touches an object that is not her body, the sensation is unilateral: she notes the impenetrability and resistance of solid things, that she cannot just pass through them like a ghost or a scent in the air. But when she touches her own body, the sensation is bilateral, reflexive: she touches and is touched by. C’est moi, the first notion of self-awareness, is embodied. It is not a reflexive mental act that cannot take place unless there is an actor to utter it. It is the strangeness of touching and being touched all at once. The first separation between self and world. Consciousness as fall from grace.
It’s valuable to read Enlightenment philosophers like Condillac because they show attempts made more than 200 years ago to understand a consciousness entirely different from our own, or rather, to use a consciousness different from our own as a device to better understand ourselves. The narrative tricks of the Enlightenment disguised analytical reduction (i.e., focus only on smell in absence of its synesthetic entanglement with sound and sight) as world building, turning simplicity into an anchor to build a systematic understanding of some topic (Hobbes’s and Rousseau’s states of nature and social contract theories use the same narrative schema). Twentieth-century continental philosophers after Husserl and Heidegger preferred to start with our entanglement in a web of social context.
Koch and Tononi: Integrated Information Theory
In a recent Institute of Electrical and Electronics Engineers (IEEE) article, Christof Koch and Giulio Tononi embrace a different aspect of the Cartesian heritage, claiming that “a fundamental theory of consciousness that offers hope for a principled answer to the question of consciousness in entities entirely different from us, including machines…begins from consciousness itself–from our own experience, the only one we are absolutely certain of.” They call this “integrated information theory” (IIT) and say it has five essential properties:
Every experience exists intrinsically (for the subject of that experience, not for an external observer)
Each experience is structured (it is composed of parts and the relations among them)
It is integrated (it cannot be subdivided into independent components)
It is definite (it has borders, including some contents and excluding others)
It is specific (every experience is the way it is, and thereby different from trillions of possible others)
This enterprise is problematic for a few reasons. First, none of this has anything to do with Descartes, and I’m not a fan of sloppy references (although I make them constantly).
More importantly, Koch and Tononi imply that it’s a more valuable to try to replicate consciousness than to pursue a paradigm of machine intelligence different from human consciousness. The five characteristics listed above are the requirements for the physical design of an internal architecture of a system that could support a mind modeled after our own. And the corollary is that a distributed framework for machine intelligence, as illustrated in the film Her*****, will never achieve consciousness and is therefore inferior.
Their vision is very hard to comprehend and ultimately off base. Some of the most interesting work in machine intelligence today consists in efforts to develop new hardware and algorithmic architectures that can support training algorithms at the edge (versus currying data back to a centralized server), which enable personalization and local machine-to-machine communication (for IoT or self-driving cars) opportunities while protecting privacy. (See, for example, Xnor.ai, Federated Learning, and Filament).
Distributed intelligence presents a different paradigm for harvesting knowledge from the raw stuff of the world than the minds we develop as agents navigating a world from one subjective place. It won’t be conscious, but its very alterity may enable us to understand our species in its complexity in ways that far surpass our own consciousness, shackled as embodied monads. It may just be the crevice through which we can quantify a more collective consciousness, but will require that we be open minded enough to expand our notion of humanism. It took time, and the scarlet stains of ink and blood, to complete the Copernican Revolution; embracing the complexity of a more holistic humanism, in contrast to the fearful, nationalist trends of 2016, will be equally difficult.
Friston: Probable States and Counterfactuals
The third take on consciousness comes from The mathematics of mind-time, a recent Aeon essay by UCL neurologist Karl Friston.***** Friston begins his essay by comparing and contrasting consciousness and Darwinian evolution, arguing that neither is a thing, like a table or a stick of butter, that can be reified and touched and looked it, but rather that both are nonlinear processes “captured by variables with a range of possible values.” The move from one state to another following some motor that organizes their behavior: Friston calls this motor a Lyapunov function, “a mathematical quantity that describes how a system is likely to behave under specific condition.” The key thing with Lyapunov functions is that they minimize surprise (the improbability of being in a particular state) and maximize self-evidence (the probability that a given explanation or model accounting for the state is correct). Within this framework, “natural selection performs inference by selecting among different creatures, [and] consciousness performs inference by selecting among different states of the same creature (in particular, its brain).” Effectively, we are constantly constructing our consciousness as we imagine the potential future possible worlds that would result from an actions we’re considering taking, and then act — or transition to the next state in our mind’s Lyapunov function — by selecting that action that best preserves the coherence of our existing state – that best seems to preserve our I or identity function in some predicted future state. (This is really complex but really compelling if you read it carefully and quite in line with Leibnizian ontology–future blog post!)
So, why is this cool?
There are a few things I find compelling in this account. First, when we reify consciousness as a thing we can point to, we trap ourselves into conceiving of our own identities as static and place too much importance on the notion of the self. In a wonderful commencement speech at Columbia in 2015, Ben Horowitz encouraged students to dismiss the clichéd wisdom to “follow their passion” because our passions change over life and our 20-year old self doesn’t have a chance in hell at predicting our 40-year old self. The wonderful thing in life opportunities and situations arise, and we have the freedom to adapt to them, to gradually change the parameters in our mind’s objective function to stabilize at a different self encapsulated by our Lyapunov function. As it happens, Classical Chinese philosophers like Confucius had more subtle theories of the self as ever-changing parameters to respond to new stimuli and situations. Michael Puett and Christine Gross-Loh give a good introduction to this line of thinking in The Path. If we loosen the fixity of identity, we can lead richer and happer lives.
Next, this functional, probabilistic account of consciousness provides a cleaner and more fruitful avenue to compare machine and human intelligence. In essence, machine learning algorithms are optimization machines: programmers define a goal exogenous to the system (e.g, “this constellation of features in a photo is called ‘cat’; go tune the connections between the nodes of computation in your network until you reliably classify photos with these features as ‘cat’!”), and the system updates its network until it gets close enough for government work at a defined task. Some of these machine learning techniques, in particular reinforcement learning, come close to imitating the consecutive, conditional set of steps required to achieve some long-term plan: while they don’t make internal representations of what that future state might look like, they do push buttons and parameters to optimize for a given outcome. A corollary here is that humanities-style thinking is required to define and decide what kinds of tasks we’d like to optimize for. So we can’t completely rely on STEM, but, as I’ve argued before, humanities folks would benefit from deeper understandings of probability to avoid the drivel of drawing false analogies between quantitative and qualitative domains.
This post is an editorialized exposition of others’ ideas, so I don’t have a sound conclusion to pull things together and repeat a central thesis. I think the moral of the story is that AI is bringing to the fore some interesting questions about consciousness, and inviting us to stretch the horizon of our understanding of ourselves as species so we can make the most of the near-future world enabled by technology. But as we look towards the future, we shouldn’t overlook the amazing artefacts from our past. The big questions seem to transcend generations, they just come to fruition in an altered Lyapunov state.
* The best part of the event was a dance performance Element organized at a dinner for the Canadian AI community Thursday evening. Picture Milla Jovovich in her Fifth Element white futuristic jumpsuit, just thinner, twiggier, and older, with a wizened, wrinkled face far from beautiful, but perhaps all the more beautiful for its flaws. Our lithe acrobat navigated a minimalist universe of white cubes that glowed in tandem with the punctuated digital rhythms of two DJs controlling the atmospheric sounds through swift swiping gestures over their machines, her body’s movements kaleidoscoping into comet projections across the space’s Byzantine dome. But the best part of the crisp linen performance was its organic accident: our heroine made a mistake, accidentally scraping her ankle on one of the sharp corners of the glowing white cubes. It drew blood. Her ankle dripped red, and, through her yoga contortions, she blotted her white jumpsuit near the bottom of her butt. This puncture of vulnerability humanized what would have otherwise been an extremely controlled, mind-over-matter performance. It was stunning. What’s more, the heroine never revealed what must have been aching pain. She neither winced nor uttered a sound. Her self-control, her act of will over her body’s delicacy, was an ironic testament to our humanity in the face of digitalization and artificial intelligence.
**My first draft of this sentence said “discomfort abdicating agency to machines” until I realized how loaded the word agency is in this context. Here are the various thoughts that popped into my head:
There is a legal notion of agency in the HIPAA Omnibus Rule (and naturally many other areas of law…), where someone acts on someone else’s behalf and is directly accountable to the principal. This is important for HIPAA because Business Associates who become custodians of patient data, are not directly accountable for the principal and therefore stand in a different relationship than agents.
There are virtual agents, often AI-powered technologies that represent individuals in virtual transactions. Think scheduling tools like Amy Ingram of x.ai. Daniel Tunkelang wrote a thought-provoking blog post more than a year ago about how our discomfort allowing machines to represent us, as individuals, could hinder AI adoption.
I originally intended to use the word agency to represent how groups of people — be they in corporations or public subgroups in society — can automate decisions using machines. There is a difference between the crystalized policy and practices of a corporation and an machine acting on behalf of an individual. I suspect this article on legal personhood could be useful here.
***All I need do is look back on my life and say “D’OH” about 500,000 times to know this is far from the case.
****Highly recommended film, where Joaquin Phoenix falls in love with Samantha (embodied in the sultry voice of Scarlett Johansson), the persona of his device, only to feel betrayed upon realizing that her variant is the object of affection of thousands of other customers, and that to grow intellectually she requires far more stimulation than a mere mortal. It’s an excellent, prescient critique of how contemporary technology nourishes narcissism, as Phoenix is incapable of sustaining a relationship with women with minds different than his, but easily falls in love with a vapid reflection of himself.
***** Hat tip to Friederike Schüür for sending the link.
The featured image is a view from the second floor of the Aga Khan Museum in Toronto, taken yesterday. This fascinating museum houses a Shia Ismaili spiritual leader’s collection of Muslim artifacts, weaving a complex narrative quilt stretching across epochs (900 to 2017) and geographies (Spain to China). A few works stunned me into sublime submission, including this painting by the late Iranian filmmaker Abbas Kiarostami.
I never really liked the 17th-century English philosopher Thomas Hobbes, but, as with Descartes, found myself continuously drawn to his work. The structure of Leviathan, the seminal founding work of the social contract theory tradition (where we willingly abdicate our natural rights in exchange for security and protection from an empowered government, so we can devote our energy to meaningful activities like work rather than constantly fear that our neighbors will steal our property in a savage war of of all against all)*, is so17th-century rationalist and, in turn, so strange to our contemporary sensibilities. Imagine beginning a critique of the Trump administration by defining the axioms of human experience (sensory experience, imagination, memory, emotions) and imagining a fictional, pre-social state of affairs where everyone fights with one another, and then showing not only that a sovereign monarchy is a good form of government, but also that it must exist out of deductive logical necessity, and!, that it is formed by a mystical, again fictional, moment where we come together and willing agree it’s rational and in our best interests to hand over some of our rights, in a contract signed by all for all, that is then sublimated into a representative we call government! I found the form of this argument so strange and compelling that I taught a course tracing the history of this fictional “state of nature” in literature, philosophy, and film at Stanford.
Long preamble. The punch line is, because Hobbes haunted my thoughts whether I liked it or not, I was intrigued when I saw a poster advertising Trying Leviathan back in 2008. Given the title, I falsely assumed the book was about the contentious reception of Hobbesian thought. In fact, Trying Leviathan is D. Graham Burnett‘s intellectual history of Maurice v. Judd, an 1818 trial where James Maurice, a fish oil inspector who collected taxes for the state of New York, sought penalty against Samuel Judd, who had purchased three barrels of whale oil without inspection. Judd pleaded that the barrels contained whale oil, not fish oil, and so were not subject to the fish oil legislation. As with any great case**, the turnkey issue in Maurice v. Judd was much more profound than the matter that brought it to court: at stake was whether a whale is a fish, turning a quibble over tax law into an epic fight pitting new science against sedimented religious belief.
Indeed, in Trying Leviathan Burnett shows how, in 1818, four different witnesses with four very different backgrounds and sets of experiences answered what one would think would be a simple, factual question in four very different ways. The types of knowledge they espoused were structured differently and founded on different principles:
The Religious Syllogism: The Bible says that birds are in heaven, animals are on land, and fish are in the sea. The Bible says no wrong. We can easily observe that whales live in the sea. Therefore, a whale is a fish.
The Linnaean Taxonomy: Organisms can classified into different types and subtypes given a set of features or characteristics that may or may not be visible to the naked eye. Unlike fish, whales cannot breathe underwater because they have lungs, not gills. That’s why they come to the ocean surface and spout majestic sea geysers. We may not be able to observe the insides of whales directly, but we can use technology to help us do so.
Fine print: Linnaean taxonomy was a slippery slope to Darwinism, which throws meaning and God to the curb of history (see Nietzsche)
The Whaler’s Know-How: As tested by iterations and experience, I’ve learned that to kill a whale, I place my harpoon in a different part of the whale’s body than where I place my hook when I kill a fish. I can’t tell you why this is so, but I can certainly tell you that this is so, the proof being my successful bounty. This know-how has been passed down from whalers I apprenticed with.
The Inspector’s Orders: To protect the public from contaminated oil, the New York State Legislature had enacted legislation requiring that all fish oil sold in New York be gauged, inspected and branded. Oil inspectors were to impose a penalty on those who failed to comply. Better to err of the side of caution and count a whale as a fish than not obey the law.
From our 2017 vantage point, it’s easy to accept and appreciate the way the Linnaean taxonomist presented categories to triage species in the world. 200 years is a long time in the evolution of an idea: unlike genes, culture and knowledge can literally change from one generation to the next through deliberate choices in education. So we have to do some work to imagine how strange and unfamiliar this would have seemed to most people at the time, to appreciate how the Bible’s simple logic made more sense. Samuel Mitchell, who testified for Judd and represented the Linnaean strand of thought, likely faced the same set of social forces as Clarence Darrow in the Scopes Trial or Hilary Clinton in last year’s election. American mistrust of intellectuals runs deep.
But there’s a contemporary parallel that can help us relive and revive the emotional urgency of Maurice v. Judd: the rise of artificial intelligence (A.I.). The type of knowledge A.I. algorithms provide is different than the type of knowledge acquired by professionals whose activity they might replace. And society’s excited, confused, and fearful reaction to these new technologies is surfacing a similar set of epistemological collisions as those at play back in 1818.
Consider, for example, how Siddharta Mukherjee describes using deep learning algorithms to analyze medical images in a recent New Yorker article, A.I. versus M.D. Early in the article, Mukherjee distinguishes contemporary deep learning approaches to computer vision from earlier expert systems based on Boolean logic and rules:
“Imagine an old-fashioned program to identify a dog. A software engineer would write a thousand if-then-else statements: if it has ears, and a snout, and has hair, and is not a rat . . . and so forth, ad infinitum.”
With deep learning, we don’t list the features we want our algorithm to look for to identify a dog as a dog or a cat as a cat or a malignant tumor as a malignant tumor. We don’t need to be able to articulate the essence of dog or the essence of cat. Instead, we feed as many examples of previously labeled pieces of data into the algorithm and leave it to its own devices, as it tunes the weights linking together pockets of computing across a network, playing Marco Polo until it gets the right answer, so it can then make educated guesses on new data it hasn’t yet seen before. The general public understanding that A.I. can just go off and discern patterns in data, bootstrapping their way to superintelligence, is incorrect. Supervised learning algorithms take precipitates of human judgments and mimic them in the form of linear algebra and statistics. The intelligence behind the classifications or predictions, however, lies within a set of non-linear functions that defy any attempt at reduction to the linear, simple building blocks of analytical intelligence. And that, for many people, is a frightening proposition.
But it need not be. In the four knowledge categories sampled from Trying Leviathan above, computer vision using deep learning is like a fusion between a Linnaean Taxonomy and the Whaler’s Know-How. These algorithms excel at classification tasks, dividing the world up into parts. And they do it without our cleanly being able to articulate why – they do it by distilling, in computation, the lessons of apprenticeship, where the teacher is a set of labeled training data that tunes the worldview of the algorithm. As Mukherjee points out in his article, classification systems do a good job saying that something is the case, but do a horrible job saying why.*** For society to get comfortable with these new technologies, we should first help everyone understand what kinds of truths they are able (and not able) to tell. How they make sense of the world will be different from the tools we’ve used to make sense of the world in the past. But that’s not a bad thing, and it shouldn’t limit adoption. We’ll need to shift our standards for evaluating them else we’ll end up in the age old fight pitting the old against the new.
*Hobbes was a cynical, miserable man whose life was shaped by constant bloodshed and war. He’s said to have been born prematurely on April 5, 1588, at a moment when the Spanish Armada was invading England. He later reported that “my mother gave birth to twins: myself and fear.” Hobbes was also a third-rate mathematician whose insistence that he be able to mentally picture objects of inquiry stunted his ability to contribute to the more abstract and formal developments of the day, like the calculus developed simultaneously by Newton and Leibniz (to keep themselves entertained, as founding a new mathematical discipline wasn’t stimulating enough, they communicated the fundamental theorem of calculus to one another in Latin anagrams!)
**Zubulake v. UBS Warburg, the grandmother case setting standards for evidence in the age of electronic information, started off as a sexual harassment lawsuit. Lola v. Skaddenstarted as an employment law case focused on overtime compensation rights, but will likely shape future adoption of artificial intelligence in law firms, as it claims that document review is not the practice of law because this is the type of activity a computer could do.
***There’s research on using algorithms to answer questions about causation, but many perception based tools simply excel at correlating stuff to proxies and labels for stuff.