A machine way of thinking — the coming algorithmic apocalypse

“The target of the Jihad was a machine-attitude as much as the machines,” Leto said. “Humans had set those machines to usurp our sense of beauty, our necessary selfdom out of which we make living judgments. Naturally, the machines were destroyed.” – Frank Herbert “Dune” [The Butlerian Jihad]

We are at a point in computing when the sum-total of all communication that goes on within organisations is recorded. The mathematics that would allow a machine to start to do things with that communication-as-data have been around for a long time, and innovations around semantic computing have come along.

At the same time we are seeing a long-term squeeze on costs within organisations under the banner of maximising shareholder value and increasing organic cash-flow. One of the easiest ways to do this is to reduce your headcount, or to put easily described work, such as help desks, support, operations, manufacturing and programming, outside of the organisation, quite often where the workers are cheaper to employ.

I am constantly giving recommendations to intranet teams that fundamentally recognise that they don’t have enough people to do the quality and quantity of work that they need to do. There is usually a lack of resources to, for example:

  • Optimise content for search
  • Work on projects while maintaining operations
  • Help people make sure their content is relevant and is up to date
  • Eat lunch or go to the loo

Meanwhile the perceived value in the corpus of that communication-as-data is building up: implicitly, as in the enterprise social network; and explicitly, where the employee is expected to fill in their personal profile so people can find their expertise if it should be required.

The loose field of “social analytics” seeks to unlock this potential in creating connections and insights between people, places and things with mathematics. Some examples:

  • Consider automating classifications and information architecture using aggregate usage and semantic analysis.
  • Consider automating the extraction of people’s expertise from what they say and do, rather than what they say they do.
  • Consider improving search results by retrieving context from the user’s history of interest and discussions with people.
  • Consider automatically measuring people’s sentiments about a range of topics. Are people positive or negative about things?

This all sounds great, I’ll buy a pile of it. If this works I can make the right content appear automagically in front of the right people. What’s not to like? Well, the important bit is “if it works”. The problem is that it might make things worse.

Search as a minor example

In the intranet world we already have a chaotic algorithm making our lives a misery: enterprise search. It sounds like a simple thing to do. Here is the corpus, index it and make it available so that when someone wants something you show it to them. In practice it is a classic wicked problem.

The corpus, of course, is created and maintained by fallible, disparate people who are doing their own thing. They wish to publish a piece of content for immediate reasons, and discount its value to the corpus. Metadata is not set, and context such as titles is purely local. When someone searches for something they get poor results.

The intranet manager says: “We are getting rubbish results, let’s try tidying things up a bit.” That doesn’t really work, so they try creating best bets manually and mucking about with weighting. That is only partially successful because the number of potential queries is vast (although the number of common queries is small), the number of texts in the corpus is massive, and at the heart of it is an algorithm that nobody apart from its developer really understands.

At the core of a search engine is a complex bit of mathematics that provides the results. It is chaotic because, for the layperson, changes to the content or the search engine do not have direct consequences for the results. The results change a little, and in ways that may have negative effects on other queries.
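To make that last point concrete, here is a toy sketch of how publishing unrelated content can shuffle the results for an existing query. It uses a bare-bones TF-IDF score, which is an assumption for illustration only; real engines such as the ones named in this post use far more elaborate and mostly undisclosed ranking. The document titles, texts and the `rank` function are all made up.

```python
import math
from collections import Counter


def rank(corpus, query):
    """Return document titles ordered by a very simple TF-IDF score.

    corpus: dict mapping a title to its text (hypothetical toy data).
    """
    docs = {title: Counter(text.lower().split()) for title, text in corpus.items()}
    n = len(docs)
    scores = {}
    for title, counts in docs.items():
        length = sum(counts.values())
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents mention the term at all.
            df = sum(1 for c in docs.values() if term in c)
            if df == 0 or counts[term] == 0:
                continue
            # The rarer a term is across the corpus, the more it is worth.
            idf = math.log(1 + n / df)
            score += (counts[term] / length) * idf
        scores[title] = score
    return sorted(scores, key=scores.get, reverse=True)


before = {
    "Pensions page": "pensions overview and contacts",
    "Cycle to work": "cycle to work scheme",
    "Share plan news": "share scheme news",
}

# Two news items that merely mention pensions in passing.
after = {
    **before,
    "Canteen news": "canteen menu and pensions reminder",
    "CEO blog": "ceo blog mentions pensions again",
}

query = "pensions scheme"
print(rank(before, query))
print(rank(after, query))
```

With these toy documents the pensions page is the top hit before the news items are published and drops to third afterwards, even though nobody touched the pensions page or the search engine. The word “pensions” simply became less rare, so it now counts for less.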

Any attempt to make the algorithms better has only increased the pain. Go and find an intranet manager who has had to wrestle with Autonomy and ask them about it. It will be described in robust Anglo-Saxon. Semantic it might be, but getting the pensions page to come up when the query is “pensions” is a non-trivial task. Intranet managers lumbered with Google Search Appliance have virtually no access to the mechanism of search, and are given little indication as to how to influence it. It is an algorithmic black box shrouded in IP. “Make it work like Google” is the cry of the stakeholder, ignoring the fact that there is a billion-dollar business in Search Engine Optimisation attempting to reverse engineer Google and other search engines, and game the system end to end.

The problem is that the mathematics within these engines cannot be communicated to those charged with looking after them. Borked search is a minor example – it only hurts organisations a little bit. But this isn’t natural mathematics. It doesn’t work like physics, determined by the characteristics of the natural world. Algorithms are programs written by people. People are not objective – they bring all their biases about how the world works to work with them. Algorithms are as much works of art as a news article on the intranet. All people are subjective and biased. All algorithms are subjective and biased. The thing is, you can argue with a person.

Let’s consider what may happen when we start unleashing this sort of maths on the populace.

Consider a piece of software that is designed to find “experts” in a field within an organisation. For the sake of argument let’s ignore any privacy or data protection concerns and say in the requirements that it is access all areas: intranet, SharePoint team sites, enterprise social network, email, instant messaging – the works. This will necessarily be software provided by an external vendor, as we are far and above the sort of quotidian development that most organisations have lying about. The programmer will make an array of assumptions about what “expertise” means and look for proxies available within the corpus: the amount someone talks about a concept; to whom; whether they contribute initially or answer questions; how many people read the information they provide; and so on. I should point out I made those up. They may or may not be good proxies for expertise. Will the programmer be able to ask a social scientist to prove the alleged link between the proxy and expertise? Possibly, but probably not. Even if some psychometrics is employed, there is no proof that that is, well, real. Will a social scientist be able to verify the weighting given to each source? Again, no.

This is a work of fiction. It is no more than the guesswork of a clever-clogs.

And it will spit out a number. It might be based on something that sounds proper clever like the “k-nearest neighbour algorithm”, but it is the complexity of the real world boiled down into a reductive sticky goo. Search for experts in C++, it will give you a list. Bravo. But unlike our search engine algorithm it will have real world effects and feedbacks.
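As a rough illustration of how little it takes to “spit out a number”, here is a minimal sketch of a weighted proxy score. Everything in it is hypothetical: the `Activity` fields stand in for the made-up proxies above, the weights are plucked from thin air, and Alice is an invented colleague added for contrast. No real product is being described.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    posts_on_topic: int   # how much they talk about the topic
    answers_given: int    # replies to other people's questions
    readers: int          # how many people read what they post


# Hypothetical weights: guesswork, exactly as the post argues.
WEIGHTS = {"posts_on_topic": 10, "answers_given": 20, "readers": 1}


def expertise_score(a):
    """Boil a person's activity down to a single number."""
    return (WEIGHTS["posts_on_topic"] * a.posts_on_topic
            + WEIGHTS["answers_given"] * a.answers_given
            + WEIGHTS["readers"] * a.readers)


people = [
    Activity("Alice (15 years of C++, rarely posts)", 6, 3, 50),
    Activity("Bob (IT help desk, keen and chatty)", 120, 40, 900),
]

ranked = sorted(people, key=expertise_score, reverse=True)
for person in ranked:
    print(expertise_score(person), person.name)
```

The score measures activity, not knowledge. On these made-up figures the chatty enthusiast comfortably outranks the quiet veteran, and nothing in the output tells you that the weights were invented.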

If the proxies are wrong, the inferences will be too. It is easy to mistake helpfulness or enthusiasm for expertise. If your algorithm starts crowning experts based on people being merely chatty on Yammer you could be in for some fun. Bob from IT help desk is now crowned an expert in something because he is helpful and interested in it. This validates his learning and encourages him. Increased findability of his “knowledge” results in more conversations, which drives a positive feedback mechanism. Bob is now, according to the system, an expert.
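The feedback mechanism can be sketched in a few lines. This is a deliberately crude model under stated assumptions: everyone with a question asks whoever tops the list, and every conversation is then counted as further evidence of expertise. The `simulate` function, the starting scores and the figure of ten questions per round are all made up.

```python
def simulate(scores, rounds, questions_per_round=10):
    """Rich-get-richer loop: the top-ranked 'expert' gets all the questions."""
    scores = dict(scores)  # work on a copy, leave the caller's data alone
    for _ in range(rounds):
        top = max(scores, key=scores.get)
        # Each conversation feeds straight back into the same score.
        scores[top] += questions_per_round
    return scores


start = {"Bob": 12, "Alice": 10}
result = simulate(start, rounds=5)
print(result)  # {'Bob': 62, 'Alice': 10}
```

A two-point head start, which could easily be noise, turns into a gap that no amount of actual expertise on Alice’s part will close, because the system only ever sends the questions to Bob.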

What could possibly go wrong?

Imagine a bizarre world where that expertise system is repeated for every individual and every speciality within an organisation. This is a chaotic system. We have lost the ability to associate, on human terms, our inputs with our outputs. Perverse incentives and unintended consequences will become abundant. Now imagine that these mathematical loaded guns are deployed in lots of places throughout your organisation and its digital workplace, in places that you couldn’t even imagine, from choosing which projects get funding, to which emails get responded to.

Bang.

We are careening into what Taleb calls the fourth quadrant – a place of disproportionate disaster where black swans abound.

I might be catastrophising. Please, someone who knows what they are actually talking about, persuade me that this isn’t the case. Software like this is notorious for being brittle. In the lab, it works. Out of the lab, it falls flat on its face. But this data is growing exponentially; so is the amount of storage, processing power and (I’m sure) VC money being thrown at this. As is people’s belief that it should be so.

Value people

Which brings me back to the quote from Dune at the top. In Dune they had got rid of intelligent machines as well as the machine-attitude. It is this machine way of thinking that I find so dangerous. We trust ourselves so little in the world of human affairs that we want machines to do it for us. We hate paying people to be human so much we are willing to swap them for machines that spit out answers; not real answers that are in fact true, but constantly available answers at zero marginal cost.

When a man comes to sell you a machine that understands humans, tell him that he is mistaken. He has caught this machine-attitude. Trust humans to do the work of the heart, and value the work of humans enough to have them around to do the work.

[With thanks to @shbib for reintroducing me to the Butlerian Jihad]

2 Comments on “A machine way of thinking — the coming algorithmic apocalypse”

  1. […] A machine way of thinking — the coming algorithmic apocalypse Abodat 28th October 2013 […]

  2. Martin White says:

    I can feel your pain. The maths behind search is not all that complicated, but it is based on probability and not on (SQL) exact match. It is trying to show that Document A is probably more relevant than Document B given a very simple query and multi-factorial probability. Google works the same way (to the surprise of many) and the reason it seems to work reasonably well is that it cheats. It uses what it knows about the person’s search history to reduce the probability odds. You can get some sense of this by doing a search on a topic that you have never done before and is miles outside your ‘known’ (by Google!) interests.

    The other issue is that in general people pay far too much attention to ‘first strike’ search. Highly relevant results have to appear on the first page. Search is recursive and if we take the time to have a conversation with the search application then the quality of the results will increase. This is a core reason behind the emerging interest in collaborative search where several people can work on a search query simultaneously. Search is getting better. Gradually!

