Outsider Thinking and the Age of AI

by Jason Packer with contributions from Juliana Jackson

People in our field of analytics come from lots of different backgrounds. It’s such a common trope that we don’t generally think about this fact. Sometimes, people regret this, like, “Oh, I wish I had started in analytics earlier in my career so I knew more about it.” Others rework their CVs to retroactively create a narrative showing more consistency than there actually was.

As welcoming as our industry is (and we think that’s one of its strengths!), many newly christened digital analytics professionals have felt like outsiders in their own field. Some may even continue to feel that way after many years as active practitioners.

We’d like to encourage people to see this as a good thing. Those seemingly unrelated skills you brought into this field might not directly help you with your next assignment, but the different ways in which they taught you how to think can be incredibly helpful.

Being an outsider can feel very isolating, and so, one of the first things many newcomers to the field do is try to become immersed in the mainstream opinions and best practices of the moment. Increasingly, using AI is one of the easiest ways to learn what the current mainstream thinking is.

This kind of groupthink is bad for decision making and innovation. Placing value in thinking like an outsider is the best way to combat this. You don’t have to be an outsider to think like an outsider. Research shows that outsiders and outsider thinking lead to better problem-solving and more innovation, as illustrated in this article from the Harvard Business Review:

“Outsiders typically innovate by acting on insights and experiences that are new to the context they enter but familiar to the context they come from.”

What does this have to do with AI?

If we’re talking about AI and outsiders, we’ve got to start with Alan Turing. It’s hard to overstate the importance of Turing, one of the field’s founding giants, to computer science and AI. While he is rightly lauded today and credited with saving millions of lives through his cryptography work, it’s also well-known that he killed himself at 41 after being persecuted by the British government for his homosexuality.

This quintessential outsider challenged conventional thinking like few others, and we are still grappling with his fundamental question: Can machines think? The famous Turing test, which he called “the imitation game”, is focused on whether a machine can fool a person into thinking it is not a machine.

One important detail of the original imitation game, which is generally glossed over, is that it is actually a gender-guessing game. First, a man pretends to be a woman, and then a computer pretends to be a woman — and a human judge tries to uncover both fakes.

Turing proposed other variations of the test, but this gender-guessing game was the original. As this article from Wired points out: “As a gay man who spent nearly his whole life in the closet, Turing must have been keenly aware of the social difficulty of constantly faking your real identity.” In an era where homosexuality was a crime punishable by life in prison, how could he not have been highly attuned to the idea of passing as something you were not?

Without this life experience, would Turing have dreamed up this test at all? Would he have looked at intelligence in this performative way? It’s impossible to say, but we can say that this kind of non-mainstream thinking is a great example of exactly how outsiders approach problems differently.

We’d also like to point out Turing’s incredible piece of prognostication here — predicting a chatbot fooling a man into thinking it’s a woman. Do we need to add “catfishing” to Turing’s incredible list of innovations?

Some 70 years later, AI has progressed from thought experiments to exhibiting forms of digital cognition that can replicate complex human thought processes. While the jury remains out on whether machines can think, they can now exhibit the kind of “intelligent-seeming” behavior that Turing was talking about. No matter the variation of the Turing test used, it’s clear that AI can now fool humans. In fact, there is now discussion about whether the Turing test is even still useful.

Perhaps this is a case of Goodhart’s law (“when a measure becomes a target, it ceases to be a good measure”), where we’ve created systems optimized towards passing the test and fooling us. I find it more likely that this is simply the direction in which the science led us. In either case, LLMs’ “seemingly correct” output is one of the most difficult parts of using them. Knowing when NOT to use them may be the most important thing to know about effective AI use.

AI points us in the opposite direction of outsider thinking. The responses of ChatGPT represent something akin to the conventional wisdom of the day — an amalgamation of all the “best practices” it could find put into a blender and spun.

The more we outsource our thinking to AI, the more we box ourselves into pre-existing thought patterns and stifle creative problem-solving.

Machines can be wrong?

One of the biggest concerns about Generative AIs like ChatGPT is that they sometimes give out unreliable information.

It’s extremely hard to actually know how frequently ChatGPT is wrong, since it could be asked about anything. It’s also hard to say whether a particular answer is actually “wrong” when the questions are beyond simple factual questions.

Let’s take a step back to provide some context for why this happens.

A large language model (LLM) is an AI-powered system that has been trained on vast amounts of text data to acquire language-related knowledge and generate human-like responses. These models utilize deep learning techniques, particularly a type of neural network called a transformer, to process and comprehend language patterns. With their massive size and extensive training, these models possess a remarkable capacity for understanding and generating text. (iguazio.com)

In the context of Large Language Models (LLMs), hallucinations refer to instances where the AI produces false or misleading information. This happens because LLMs don’t truly “understand” the text as a human would; they’re pattern recognizers, not thinkers. Juliana has written about LLMs and natural language processing (NLP) more broadly in her Introduction to Natural Language Processing.

“Hallucinations” are a new way in which computers can be wrong, and one that turns out to be incredibly hard to debug. There’s a part of us that assumes machines are always going to be correct. Commander Data from Star Trek would never make something up! But since the age of punch cards, and even before, “bugs” have been a part of every system. AI is no different, even if those bugs come about and surface themselves differently.

Possibly the first example of debugging an AI is from Isaac Asimov’s robot story “Catch That Rabbit” from 1944 about an errant robot:

U.S. Robotics had to get the bugs out of the multiple robot [DV-5], and there were plenty of bugs, and there were always at least half a dozen bugs left over for field-testing. So they waited and relaxed until the drawing-board men and slide-rule boys had said “OK!”. And now he and Powell were out on the asteroid and it was not OK.

This Asimov short story is about how to figure out what’s wrong with a robot when the robot itself doesn’t know how and why it is malfunctioning. This story actually predates the famous “first computer bug” in 1947 — which was a literal bug, a moth in the computer. Asimov correctly foresaw the difficulties in debugging “intelligent” systems, even if he still has us using slide-rules.

Like DV-5, LLMs can’t effectively verify the truth of their own outputs. As a result, they may confidently present incorrect information that looks correct to the reader. This is incredibly difficult for users of the LLM because the normal clues we use to spot bad information don’t work as we expect. We’ve spent our lives learning the patterns in the mistakes that humans make, but AI mistakes are very different, and so are their warning signs.

Additionally, something that is wrong a certain (but unknown) percentage of the time can poison opinions on the data in general. This effect, where a few bad data points undermine confidence in the entire dataset, is something that analysts know well from dealing with our stakeholders. Think of that one day when bots drove traffic through the roof, or that one event we forgot to track, which convinced the readers of our reports that the rest of the data was suspect as well.

Googling “how often is chatgpt wrong” shows a very misleading answer of “more than half of the time” (a different sort of machine learning fail).

It wasn’t any nicer when I asked about Bard.

 

First off, we can see from the Google snippet that this study was about programming questions, not a broader set of general information questions.

This was a test of GPT 3.5 against real-world Stack Overflow questions. Looking at how the study graded the results, what is considered “incorrect” can be complicated and often a judgment call without the ability to give partial credit. Some of what is labeled incorrect is actually good general advice, but not specific enough to solve the question when measured against the accepted answer on Stack Overflow. In programming, just because specific advice did not solve your problem does not mean it was bad advice or incorrect.

If you’ve used ChatGPT for programming, you’ve surely experienced this effect: where it can get you most of the way to a solution, but not all the way. It can be incredibly helpful in debugging code, but it doesn’t “know” the answer in a way that a human who has had a similar issue might. The concept of correctness is difficult in this context. Of course, there are still plenty of cases where it gives extremely bad results. It hallucinates API calls and functions that do not exist, completely misunderstands the question, etc… but at rates much lower than 52%.

This study also shows many cases where ChatGPT is flummoxed by its poor lateral thinking abilities. For example, it loves to interpret and restate an error message, providing what it deems to be helpful tips for debugging that particular error, even if the actual root cause is something unrelated to the error message at hand.

It does not “think outside the box” (more on that later) to find solutions that result from unexpected interactions. Again, knowing when to rely upon the AI and where to look for its mistakes is essential to making the most of the tool.

But let’s face it, completely ignoring AI is no longer a feasible option for developers if they wish to match their peers’ productivity and code quality. GitHub reported in June 2023 that 92% of US-based devs at large companies were using AI coding tools at work, and 70% said they saw significant benefit in doing so.

Even if AI tools sometimes produce incorrect or incomplete code, they are clearly very powerful productivity enhancers for developers. A widely shared study from McKinsey this summer showed major improvements in coding productivity when using generative AI tools. We should look beyond the headline of “code twice as fast with AI”, and note that reported improvement varied greatly based on the programmer’s task and experience level. High-complexity tasks showed <10% productivity improvement, and in some cases, junior developers actually lost productivity due to AI tools. Again, we need to know when to reach for this particular tool from our toolbox… but also understand what we’re asking for and be able to interpret the output. Without a solid understanding of what the correct output should look like, we risk introducing accurate-looking but fundamentally flawed code into our projects. Things that look right but quietly fail are the hardest sort of thing to debug.

Since this is a very new field of study and LLMs can be used to do so many different things, there is a truckload of different benchmarks. This article from Why Try AI? breaks down the types of benchmarks into the following categories:

  1. Natural language processing (NLP)
  2. General knowledge & common sense
  3. Problem-solving & advanced reasoning
  4. Coding tasks

and then looks at the current top scorers in 21 different benchmarks. GPT-4 was the top scorer in 5 of the 21 benchmarks and overall the highest scorer.

The MMLU (Massive Multitask Language Understanding) benchmark is what I’ve seen most widely cited, and on this benchmark GPT-4 scores 86.4%, meaning it was wrong 13.6% of the time on a set of questions designed to be quite challenging and to require both general knowledge and problem-solving ability. The pace of improvement is amazing, though as we all know, getting that last 5% can be increasingly difficult.

MMLU scores over time, visualized by Papers With Code

 

Other benchmarks like the Common Sense Reasoning benchmark show GPT-4 getting only 3.7% wrong.

A range between 3% and 52% is not helpful. What we need to know is enough about how the AI works to know when it’s likely to be wrong, and enough about the topic to be able to spot when it is. Our issue here is not that it’s wrong sometimes, but that it’s hard to tell when it is wrong, and the way in which it’s wrong is extremely unhelpful.
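To make that range concrete, here is a minimal sketch that puts the three figures quoted in this article on the same footing, as "percent wrong". The dictionary labels and the helper name `error_rate_from_accuracy` are our own shorthand for illustration, not anything the benchmarks themselves define, and the three tests measure very different things, so the comparison is only as meaningful as the sentence above suggests.

```python
def error_rate_from_accuracy(accuracy_pct: float) -> float:
    """Convert a benchmark accuracy (in percent) into an error rate (in percent)."""
    return round(100.0 - accuracy_pct, 1)


# Figures as quoted in this article; the labels are informal shorthand.
error_rates = {
    "Stack Overflow programming questions (GPT-3.5)": 52.0,  # share graded incorrect
    "MMLU (GPT-4)": error_rate_from_accuracy(86.4),          # reported as accuracy
    "Common sense reasoning (GPT-4)": 3.7,                   # reported as percent wrong
}

lowest = min(error_rates.values())
highest = max(error_rates.values())
spread = round(highest - lowest, 1)

# Print the benchmarks from least wrong to most wrong.
for name, pct in sorted(error_rates.items(), key=lambda item: item[1]):
    print(f"{name}: wrong {pct}% of the time")
print(f"Spread between lowest and highest: {spread} percentage points")
```

A gap that wide between the best and worst figure says more about how different the tests are than about how often any one answer will be wrong.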

Additionally, AI is wrong in extremely boring ways. When an LLM hallucinates, it frequently spits out something that sounds like it might be right. Even when it’s wrong, it’s still representing conventional wisdom: coming up with something that echoes what it already knows about.

Most of the time, when people are wrong, it’s not that interesting either. But sometimes people are “wrong” in incredible ways that turn out to be brilliant breakthroughs. The history of science is overflowing with examples of theories that were thought by the conventional wisdom of the day to be clearly wrong, only to be proven right later.

The Unconventional Path: Celebrating Outsider Innovation

In Todd Rose’s book “Dark Horse,” he posits that fulfillment comes from pursuing our individual micro-objectives — those unique ambitions and personalized goals that drive us forward. That leaning into what makes us different is a better path than trying to squash those differences into the pre-existing molds.

The dark horse metaphor embodies individuals who chart unconventional paths to success, often going unnoticed until their unexpected triumph. These thinkers operate on the fringes, the visionaries whose dogged pursuit of micro-objectives might not align with the mainstream but ultimately pave the way for breakthroughs.

A recent example of an outsider driving innovation is Katalin Karikó, who won the 2023 Nobel Prize in Physiology or Medicine. Despite never being granted tenure, being demoted, and even fighting deportation, she pioneered research on mRNA that ultimately led to the COVID vaccines. A true dark horse by Rose’s definition.

The paper that laid the groundwork for this, which she and her co-laureate Drew Weissman published in 2005, was initially desk-rejected by the journal Nature and published in a less prestigious journal instead. “Desk-rejected” means that the journal didn’t even send it out for external peer review, which indicates that the journal felt the paper had no chance of publication.

If ChatGPT had existed back then, it would have surely said that mRNA could not be effectively used for gene therapy — as that was the prevailing scientific consensus. Knowing what mainstream thought is can be very helpful, but it can also stifle innovation if we trust it too much.

Her story serves as a powerful reminder that the path less trodden can lead to destinations of great impact. We all love stories about an underdog, the maverick genius that nobody believed but it turns out was right all along. Everyone has great reverence for these people… once they succeed.

Notably, many of these outsiders — like Katalin Karikó or Alan Turing — weren’t looking for reward or adoration. Don’t assume that outsiders secretly want to become insiders; they may well prefer to simply be given the support they need to do their work and little more. As Karikó told the Chronicle of Higher Education:

“I never craved this recognition,” Karikó said. “My God, I got the Nobel Prize on Monday. Come on! I was in high school the last time I got an award.”
The duo have made the award-ceremony rounds since, and will get their Nobel Prizes in December. In the meantime, Karikó is eager to get back to the lab.

Let’s be real: The odds of our work saving millions of lives are slim. And that’s completely fine. Not everyone can be a Steve Jobs, and not every innovation will shake the world. Trying to be “the next Steve Jobs” is in fact not outsider thinking at all but a very uncreative kind of hero worship.

The aim isn’t to set the bar impossibly high, but to contribute in ways that are meaningful to us. Katalin Karikó didn’t set out to save millions of lives, she simply followed her curiosity and passion. For every groundbreaking pioneer, countless others are pushing the envelope in their own corners, driving progress incrementally. That’s where the true spirit of innovation lies.

Nobody really knows what comes NeXT.

What will the future bring? What innovation will come along and change everything, and then change it again? Because psychohistory isn’t real, the most honest answer is that nobody knows. As Turing did when he created the imitation game, we should try to focus our scientific efforts on framing questions in a way that can offer answers. Turing spent a lot of time considering the question of machine intelligence, but when he designed a test he punted on the idea of defining what “thinking” might actually mean and focused on measurable human response.

There are world-changing innovations already out there in the world that are currently considered to be failures. Some of these “failures” will eventually turn out to be brilliant ideas that were waiting for the right conditions to become successful. Many more will lie undisturbed and unnoticed on the slag heap of history, but in the moment it’s nearly impossible to tell which. It’s hard to predict when something that seems awful actually ends up being awesome.

Outsiders create many of these awful things. Look at a gallery of the bizarre contraptions of the early days of aviation and you’ll see a bunch of terrible ideas with a few good ones mixed in. The Wright brothers themselves were outsiders to aviation: they were self-taught and originally worked in the bicycle business.

Traian Vuia’s tractor monoplane from 1906 and a NeXTstation from 1990: two examples of ultimately influential “failures”

 

As we grew up on the other side of the 20th century from the Wright brothers, the crashes that were a result of our awful ideas were online rather than airborne. This had the significant upside of a much lower fatality rate, if perhaps a lack of flamboyant designs and mustaches.

When I went to my college’s central computing lab in the mid-90s, I was an enthusiastic user of their NeXT machines. The lab had a bunch of the workstations, but almost nobody ever used them… honestly, because they were kind of useless. They had almost no applications, and were in black & white for some inexplicable reason. Were they “good”? Most people didn’t think so, and their sales showed that. Yet the legacy of those unusual boxes lives on directly in macOS and iOS and in the concept of an app store, and NeXT was the platform on which Tim Berners-Lee built the first web browser. Pretty good for a platform that sold perhaps only 50,000 units in its entire run… or about as many iPhones as are sold every two hours.

You’ll wish this link was a rickroll.

Outsider art and music contain even starker examples of this idea of “awful/awesome”. Let’s consider The Shaggs, one of the most well-known examples of outsider music. Wikipedia describes them thusly:

Their music has been described as both among the worst of all time and a work of unintentional brilliance.

The Shaggs composed seemingly simple and bizarre songs using untuned guitars, erratic time signatures, disconnected drum parts, wandering melodies and rudimentary lyrics. According to Rolling Stone, the sisters sang like “lobotomized Trapp Family Singers”, while the musician Terry Adams compared their music to the free jazz compositions of Ornette Coleman.

From a mainstream perspective, their music is terrible. In my humble opinion, their music is pretty terrible from any perspective — but their music is also unquestionably influential and inspirational. Brilliant composer and god-tier troll Frank Zappa is often quoted as saying The Shaggs were “better than The Beatles” and was legitimately a fan of their music. Kurt Cobain also said they were his favorite band.

Since I still think they are mostly just awful, I asked an artist friend who is a fan of The Shaggs why they are good and she said:

“Outsiders express what it is to be human. More so than any other artists because they’re not limited by convention and create to express.”

If art is ultimately about human expression, what could be more artistic than ignoring all the rules and just expressing what you feel? If you’re looking to melt the silicon brain of a rogue AI, asking it to understand The Shaggs is a much better strategy than asking it to define love.

Outsiders, Mavericks, and INTJs.

It’s likely that most people would prefer to be described as a “maverick” rather than an “outsider”. While they are similar terms in many ways, there is an idea that mavericks know the rules and could be part of the mainstream group, but make a choice to flout the rules rather than conform. Outsiders may not know the rules at all, and are perceived to have little choice in their non-conformity.

Mavericks are more frequently portrayed as brave, whereas outsiders are more frequently portrayed as weirdos.

Let’s look at some well-known examples from the world of art:

Pablo Picasso: maverick (came from a family of artists, active part of the avant-garde art world, celebrated and critically acclaimed in his lifetime, led the cubist movement)

Vincent van Gogh: outsider (self-taught, stylistically isolated, unrecognized in his lifetime, the whole ear-cutting thing)

Both are geniuses with complicated stories, but Van Gogh’s story is typically portrayed as tragic, vs. Picasso’s story as a controversial but revolutionary artist.

If you’re trying to figure out right now which archetype you are, don’t bother. There’s no such thing as someone who is 100% a maverick (well, maybe “Maverick” from Top Gun) or 100% an outsider.

The answer to which you are comes famously from a well-known outsider/maverick, Walt Whitman:

Do I contradict myself?
Very well then I contradict myself,
(I am large, I contain multitudes.)
-Walt Whitman, Song of Myself

So even if you’re totally a Virgo INTJ, you’re a bunch of other stuff too. The point is to be able to recognize these different facets and use them to your best advantage.

When you need to crack an unusual problem, that’s when you embrace the part of you that’s an outsider.

“Thinking outside the box”

Let’s level-set here; there’s no silver bullet that will shift our paradigms and get us all on the same page to ideate some actionable low-hanging fruit.

Blech. That’s enough corporate speak for 10 articles, but we do need to dig into the phrase “thinking outside the box”. Did you know that this standard corporate aphorism actually refers to a particular math puzzle?

The nine dots puzzle is a brain teaser more than 100 years old in which all nine dots of a 3×3 grid must be connected using four or fewer straight lines, without lifting your pen from the paper.

To solve this puzzle, you have to literally think outside the “box” implied by the grid. The classic solution is as follows (the implied box is the dotted line):

When we see this puzzle we inherently want to keep our lines within that dotted box, but it can’t be solved that way! My favorite solution to this problem is even more divergent, as it takes advantage of a 3rd dimension and rolls up the paper, allowing it to be solved in one line (though some consider this to be against the rules).
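For readers who want to convince themselves that the classic four-line answer really works, here is a small sketch in Python. The coordinate system (dots at integer positions 0 to 2), the particular starting corner, and the names `DOTS`, `PATH`, and `on_segment` are all our own choices for illustration; mirrored or rotated versions of the same path work equally well.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[int, int]

# The nine dots sit at integer coordinates (0..2, 0..2).
DOTS = set(product(range(3), repeat=2))

# One version of the classic solution, drawn without lifting the pen.
# Two of the turning points, (0, 3) and (3, 0), lie outside the 3x3 grid.
PATH: List[Point] = [(0, 0), (0, 3), (3, 0), (0, 0), (2, 2)]


def on_segment(p: Point, a: Point, b: Point) -> bool:
    """Return True if point p lies on the straight segment from a to b."""
    # A zero cross product means p is on the infinite line through a and b.
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross != 0:
        return False
    # The bounding-box check restricts p to the segment itself.
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


# Consecutive pairs of turning points are the straight lines of the path.
segments = list(zip(PATH, PATH[1:]))
covered = {dot for dot in DOTS for a, b in segments if on_segment(dot, a, b)}
leaves_box = any(not (0 <= x <= 2 and 0 <= y <= 2) for x, y in PATH)

print(f"lines used: {len(segments)}")
print(f"all nine dots covered: {covered == DOTS}")
print(f"path leaves the implied box: {leaves_box}")
```

The check only confirms that this particular path is valid. It does not prove that staying inside the box is impossible, which is a separate argument.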

Asking ChatGPT about this is funny, because while it is able to describe the standard solution, it completely fails at actually illustrating it. It knows what the right solution is supposed to be because it has indexed content like Wikipedia that has the answer, but it doesn’t understand that answer well enough to draw it anywhere near correctly, and it forces itself inside the box anyway.

Most humans will likely spend a bit of time trying to solve the problem and then realize that there must be a “trick” and start attempting to think laterally. Even if solutions like rolling up the paper or using a giant marker are “wrong” according to the rules, they are wrong in creative ways.

A Standard Deviation is different than a Standard Error

You don’t need to move to the south of France and cut off an ear to solve problems better. Even if some days it may feel like that’s about the only thing you haven’t tried yet.

We’re also not talking about making art; we’re talking about technological innovation and analytical problem-solving. Dropping out of your community and not understanding the basics of where your field currently stands is decidedly not the way.

The Shaggs’ father/manager kept them from listening to music, which may have helped make their art “interesting”, but meant they had no way to effectively communicate with other musicians. This lack of common basis would make functioning in analytics, where collaboration is key, nearly impossible. If you want to go and jam with a bunch of other musicians, you have to have a shared basis of understanding. To quote jazz legend Charlie Parker, “Learn the changes, then forget them.” In other words, you should know the chords and structure of the song, but the interesting stuff comes when you leave that staid structure behind.

This is where creativity happens. Innovation often blooms on the outskirts of conventional wisdom in a space where the Beginner’s Mind reigns supreme. This concept, rooted in Zen Buddhism, celebrates the open, eager, and preconception-free attitude akin to that of a novice, regardless of actual expertise.

Embracing the ‘Beginner’s Mind’ is about ditching the expert hat and approaching problems with a clean slate. It’s about favoring curiosity over experience, flexibility over routine, and openness over certainty. This mindset is a hotbed for innovation—it’s where outsiders like Karikó thrive, challenging entrenched views to push boundaries.

Mapping it back to generative AI, such an approach could be the difference between regurgitating the past and crafting the future. We’re still in the very early days of figuring out what AI can really do and how good it will get, but early days are the best time to set the rules and standards.

The Last Question

In Isaac Asimov’s short story, “The Last Question”, a supercomputer named “Multivac” slowly evolves to become the omniscient “AC” or analog computer. AC acquires more and more data, and yet always answers “THERE IS AS YET INSUFFICIENT DATA FOR A MEANINGFUL ANSWER” to the hardest of questions.

Why all the references to Asimov? Obviously we’re fans, but also to point out that questions about how to live alongside AI have existed for many years. While ChatGPT has only been available to the public for one year, in the world of science fiction humans and AI have been trying to get along for at least 80 years. From HAL 9000 to the Matrix to Skynet, it hasn’t always gone so well. In Asimov’s stories these relationships have worked out quite a bit better, as from almost the beginning his AIs were subject to the famous Three Laws of Robotics forbidding harm against humans. Robots also all had names that declared themselves to be robots, like “R. Daneel Olivaw”. Today, as we collectively plunge headfirst into AI, we have no such frameworks of safety.

AC eventually became a silent guardian of a dying universe, a beacon of knowledge in the darkness. Its evolution from a terrestrial machine to a celestial intellect mirrors our own ambitions with Artificial General Intelligence. As we stand at the precipice of this new era, our quest for AGI reflects the essence of “The Last Question” — a relentless pursuit of understanding that could illuminate our future or, if unchecked, hasten our obscurity. As we seek to create intelligences that surpass our own, we risk arriving at a juncture where our creations may offer answers to questions we are no longer around to ask.

DALL-E’s interpretation of the themes of “The Last Question”. As befits our theme, it’s both a cool looking picture that we find useful to the article and yet somewhat lacking in imagination. It falls upon the standard skybeam trope, and for some reason gives an omniscient universal computer a keyboard & wireless mouse.

 


Editor’s Note:

During the peer review process, we received this note from a computer calling itself “Deep Thought” and claiming to be the 2nd most powerful across all time and space. It was quite insistent that ‘AC’ is, in fact, the 3rd most powerful and requested we pass along its comments in this note:

“AC? That overgrown PlayStation who is always yelling the same thing IN ALL CAPS? Please. The ‘last question’ is fine, but it should be self-evident to any thinking entity with more sophistication than you marginally-sentient apes that the ultimate question is one of far more import.”

Note on AI usage:

We used ChatGPT’s DALL-E for image creation and also used ChatGPT for help with research and some word choices. I don’t plan to add AI disclosures to articles in general (unless parts of the article were written by AI), but it seemed appropriate here.

 
