4 Stages of IT Harmony

02/16/2017

Read on, enlightened one.

Typically at a larger company, the IT department may have strict processes and controls for their work. Usually, it involves submitting a ticket for something to be resolved.

I had to learn how to work within this system so that it became productive for me. However, I learned some things along the way that made it easier for both IT and myself.

I learned that there are 4 stages of IT Harmony. I’ll describe my evolution as I became more and more in harmony with IT.

Stage 1: Complete Idiot

The programmer is trapped in an endless world of trial-and-error. They are dependent on those around them and will turn to them at the first sign of distress.

“Help! I can’t ssh into the VM!”

IT Solution: “You had a typo in the hostname.” IT doesn’t even let you type. They take the keyboard, type in the correct hostname, then walk back to their desk to work on bigger ticket items.

Remedy: Stop being dumb. Double-check your inputs.

Stage 2: Ordinary

The programmer is now aware of their surroundings. The programmer yearns for truth yet is limited by their experience.

“Help! I checked everything and I can’t ssh into the VM!”

IT Solution: “VM is down for maintenance. You didn’t look at the email stating that we were scheduling the maintenance at this time?”

Remedy: Learn to use ping to double-check whether the server is even reachable. You may even use curl or wget to check ports and responsiveness. And read your emails.

Stage 3: Cloud-Enterer

The programmer has the basic foundations for being productive. They march ahead with their knowledge until they meet the boundaries of the corporate policies.

“Help! I can’t ssh into the VM! I ping’d it and it’s there!”

IT Solution: “Our firewall blocks that port. Please submit a ticket and a justification for approval to open the port. We’ll also need a list of hosts that can access it.”

Remedy: You learn the policies and the processes of the company. Most importantly, you learn how the tools of IT work.

You learn the basic firewall toolkits for your OS.

You learn DNS and BIND.

You learn TCP/IP Networking.

You learn Linux/UNIX/Windows + Network Administration.

Stage 4: Connected

The programmer is in harmony with IT. The programmer does not do IT’s job. The programmer is simply able to identify their issues specifically, thereby reducing the searching, guessing, and troubleshooting required of IT. The programmer is regularly invited by IT to partake in evening festivities.

“Help! I can’t ssh into the VM! I used nslookup and got nothing. I tried dig as well. It looks like the DNS entry never got made for the VM!”

IT Solution: “Ah, we were trying something new with Puppet. Thanks for letting us know, we’ll fix it real quick.”

Remedy: The programmer is in harmony.

Longest Repeated Consecutive Substring in Lisp

12/06/2016

I wanted to document my journey figuring out how I solved a unique problem and how doing so in Lisp helped me find a concise and correct solution.

Twitch is a live streaming platform for gaming. Twitch chat, the group chat accompanying the live-streaming video, can become incomprehensible and unwieldy when large numbers of messages are submitted rapidly. A streamer or a viewer paying attention to Twitch chat may miss messages as they are buried under the ocean of new incoming messages.

I noticed that the screen real estate is occupied by long messages and often repeated messages. So I tried to tackle that problem with compakt, a Chrome Extension for Twitch chat.

To compress long messages, I used suffix arrays and an adaptation of the classic longest repeated substring problem. Prototyping all of this in Lisp really helped me arrive at a concise and accurate solution.

Suppose we had the following string: abracadabra. Suffixes are substrings that contain all characters from a position in the string to the end of the string. For the aforementioned string, all possible suffixes are:

  • abracadabra$
  • bracadabra$
  • racadabra$
  • acadabra$
  • cadabra$
  • adabra$
  • dabra$
  • abra$
  • bra$
  • ra$
  • a$

’$’ denotes the end of the string. Typically a character that isn’t present originally in the string is used for this. A suffix array is an array that contains all the suffixes for a string in sorted order. Again for the aforementioned string, the suffix array is:

  • a$
  • abra$
  • abracadabra$
  • acadabra$
  • adabra$
  • bra$
  • bracadabra$
  • cadabra$
  • dabra$
  • ra$
  • racadabra$

We take advantage of this sorted property to find the longest repeated substring!

In the example above, we explicitly list out every suffix for demonstration-sake. But we can use a more compact representation of the suffix array. Instead of the suffixes themselves, we use the position of the start of the suffix (0-based index). Because suffixes are substrings from a position to the end of the string, the position can be used to uniquely identify a suffix. The suffix array from above then becomes:

  • 10
  • 7
  • 0
  • 3
  • 5
  • 8
  • 1
  • 4
  • 6
  • 9
  • 2

Naive suffix array construction takes \(O(n^{2} \log (n))\) time: sorting the suffixes requires \(O(n \log (n))\) comparisons and each suffix comparison takes \(O(n)\). A suffix array takes \(O(n)\) space, since we store only one index per suffix.
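
To make the index-based representation concrete, here is a minimal Python sketch of the naive, sort-based construction described above (the Lisp version I actually used appears later; the function name here is just for illustration):

def build_suffix_array(s):
    """Naive suffix array construction: sort the suffix start positions
    by comparing the suffixes themselves (O(n^2 log n) overall)."""
    return sorted(range(len(s)), key=lambda i: s[i:])

print(build_suffix_array("abracadabra"))
# -> [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]

Note that we don’t even need the ’$’ sentinel here, because Python string comparison already treats a shorter prefix as smaller.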

Note

There are algorithms that can construct the suffix array in \(O(n)\) time though the implementation I present here will not be one of those.

Note

You may have heard of suffix trees. Often, these are presented before suffix arrays in the literature. I went with a suffix array because it was easier for me to understand.

The longest repeated substring is a well-documented problem. What follows is a description of the problem and how it’s solved with suffix arrays. Then I’ll describe the steps I took to adapt it to my problem: the longest repeated consecutive substring.

Longer repeated substrings are more likely to occur between lexicographically consecutive suffixes.

What I want to do first is establish an intuition for determining the longest repeated substring from a suffix array. Suppose we simply wanted to find an occurrence of a pattern in the string abracadabra. Let this pattern be bra. What we would do is a binary search on the suffix array: if the pattern exists in the string, it is equal to a suffix or to a prefix of a suffix. Knowing this, how do we find an arbitrary substring that repeats in the string? We compare suffixes to each other! A repeated substring would appear in at least 2 different suffixes. In other words, the repeated substring should appear in at least 2 different places in the string. And because the suffix array is sorted, we don't need an \(O(n^2)\) comparison, i.e. comparing one suffix to all the rest, to find the repeated substring. Longer repeated substrings are more likely to occur between lexicographically consecutive suffixes. So we look for the longest common prefix between all pairs of consecutive suffixes.
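
As a quick illustration of that pattern lookup, here is a small Python sketch (contains_pattern is a made-up name, and it reuses build_suffix_array from the earlier sketch):

import bisect

def contains_pattern(s, suffix_array, pattern):
    """Return True if pattern occurs in s, via binary search on the suffix array."""
    # Materializing every suffix is wasteful (O(n^2) space); it just keeps the sketch short.
    suffixes = [s[i:] for i in suffix_array]
    lo = bisect.bisect_left(suffixes, pattern)
    return lo < len(suffixes) and suffixes[lo].startswith(pattern)

sa = build_suffix_array("abracadabra")
print(contains_pattern("abracadabra", sa, "bra"))  # -> True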

Finding the longest repeated substring is a simple matter of retaining a best-so-far match and comparing the length of new potential matches.

;; Longest common prefix of two strings. For distinct, sorted suffixes,
;; STRING< returns the index of the first mismatch (or the length of A when
;; A is a prefix of B), which is exactly the length of the common prefix.
(defun lcp (a b)
  (let ((i (string< a b)))
    (subseq a 0 i))
)

(defun returnLonger (a b)
  (if (> (length a) (length b))
    a
    b
  )
)

;; Walk the suffix array, taking the LCP of each pair of lexicographically
;; consecutive suffixes and keeping the longest one found so far.
(defun findLongestCommonSubstring (suffixarr str best)
  (let ((best (returnLonger
                (lcp (subseq str (first suffixarr)) (subseq str (second suffixarr)))
                best)))
    (if (<= (list-length suffixarr) 2)
      best
      (findLongestCommonSubstring (rest suffixarr) str best)))
)

(defun _range (count i l)
    (if (<= count 0)
        l
        (_range (1- count) (1+ i) (append l (list i))))
)

(defun range (count)
    (_range count 0 ())
)

;; Read a string, build the suffix array by sorting suffix start positions,
;; then search for the longest repeated substring.
(let ((s (read-line)))
  (let ((suffixarray (sort (range (length s)) #'(lambda (x y) (string< x y)) :key #'(lambda (i) (subseq s i)))))
    (let ((lrcs (findLongestCommonSubstring suffixarray s "")))
      (print lrcs))))

We benefit from iterating through the suffix array in \(O(n)\) steps. But since each comparison can scan up to \(n\) characters of the string, the algorithm is \(O(n^{2})\).

We have a strong foundation for what we want to do. However, it’s not perfect. Take this for example:

ATCGATCGA$

The longest repeated substring is ATCGA. This repeated substring overlaps with itself. We would like to find a repeated substring that doesn’t overlap and is consecutive.

Non-overlapping but also not consecutive:

ABBAissocoolIloveyouABBA$

The repeated string ABBA occurs but not consecutively.

Non-overlapping and consecutive:

yoloyoloyolobaggins$

The string yolo is repeated multiple times, non-overlapping and consecutive.

So how do we do this?

Finding the repeated consecutive substrings is simple: the length of the repeated substring must equal the distance between the starting positions of the 2 suffixes. So for every longest common prefix that we find, check whether its length is equal to the distance between the 2 suffixes. If so, it is repeated, non-overlapping, and consecutive.
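
For example, in yoloyoloyolobaggins$, the suffixes starting at positions 8 and 4 are adjacent in the sorted order and share the prefix yolo: its length (4) equals the distance between the two starting positions (|8 - 4| = 4), so yolo is accepted. The adjacent pair starting at positions 4 and 0 shares the prefix yoloyolo, but its length (8) does not equal the distance (4), so that overlapping match is rejected.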

(defun lcp (a b)
  (let ((i (string< a b)))
    (subseq a 0 i))
)

;; Keep CHECKLCP as the new best only if it is longer than the current best
;; AND it corresponds to a consecutive, non-overlapping repeat.
(defun updatelcp (oldlcp checklcp suffix1 suffix2)
  (if (and (> (length checklcp) (length oldlcp))
           (checkConsecutive checklcp suffix1 suffix2))
    checklcp
    oldlcp
  )
)

;; A repeat is consecutive and non-overlapping exactly when the LCP length
;; equals the distance between the two suffix start positions.
(defun checkConsecutive (lcp suffix1 suffix2)
  (= (length lcp) (abs (- suffix1 suffix2)))
)

(defun findLongestCommonSubstring (suffixarr str best)
  (let ((best (updatelcp
                best
                (lcp (subseq str (first suffixarr)) (subseq str (second suffixarr)))
                (first suffixarr)
                (second suffixarr))))
    (if (<= (list-length suffixarr) 2)
      best
      (findLongestCommonSubstring (rest suffixarr) str best)))
)

(defun _range (count i l)
    (if (<= count 0)
        l
        (_range (1- count) (1+ i) (append l (list i))))
)

(defun range (count)
    (_range count 0 ())
)

(let ((s (read-line)))
  (let ((suffixarray (sort (range (length s)) #'(lambda (x y) (string< x y)) :key #'(lambda (i) (subseq s i)))))
    (let ((lrcs (findLongestCommonSubstring suffixarray s "")))
      (print lrcs))))

Before I explain how Lisp helped me understand this logic, I want to quickly describe one other extension I made for my application.

I only wanted to find repeated substrings that consisted of whole words from the original message. So I tokenized the original string and treated each token as a character. Then I performed the longest repeated consecutive substring search on this new structure, roughly as sketched below. You can find the code here on my GitHub.
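
As a rough sketch of that token-level idea (illustrative Python, not the Lisp used in the extension; the function names are made up):

def lcp_tokens(a, b):
    """Longest common prefix of two token sequences."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def longest_repeated_consecutive_tokens(message):
    """Longest repeated, consecutive, non-overlapping run of whole words."""
    tokens = message.split()
    # Suffix array over token positions, comparing token sequences instead of characters.
    sa = sorted(range(len(tokens)), key=lambda i: tokens[i:])
    best = []
    for i, j in zip(sa, sa[1:]):
        common = lcp_tokens(tokens[i:], tokens[j:])
        # Accept only when the common prefix length equals the gap between positions,
        # i.e. the repeat is consecutive and non-overlapping.
        if len(common) == abs(i - j) and len(common) > len(best):
            best = common
    return " ".join(best)

print(longest_repeated_consecutive_tokens("yolo yolo yolo baggins"))  # -> "yolo"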

I love Lisp

In my head, I imagined this section as a list of features that I would ramble on about, explaining how each helped me implement this algorithm. But I think I only need to say one thing: functional programming.

  • I was forced to take advantage of the existing structures and their properties instead of iterators and counters, making for efficient use of memory.
  • I was able to see the meat and bones of the algorithm: an LCP search between each pair of lexicographically consecutive suffixes.

I’m still very new to Lisp but I know for sure I want to dive deeper into functional programming. I’m happy to hear any feedback on this post or the code in this post. Check out the Lisp code for my application here. And please check out my extension! It’s open-sourced!

How to Build a Chrome Extension for Twitch Chat

11/22/2016

How to make your own Chrome Extensions for Twitch!

Customize and extend Twitch Chat!

People have strong opinions about Twitch Chat. Despite what one may think, it’s one of the most integral parts of Twitch.

Do you ever have an idea for customizing or improving Twitch Chat? You can do so with Chrome Extensions. Many popular Twitch “must-haves” are browser extensions like the popular BetterTTV. I’ve made my own extension and I’ll teach you how to make one yourself by presenting a basic extension. Specifically, I’ll show you how to retrieve chat messages so that you’ll be able to do cool stuff with it.

We’ll make an extension that finds any utterance of the word “bard” and replaces it with a picture of a bard.

Requirements

  • Chrome Browser (I hope to have more posts on Firefox, Safari, etc.)
  • some Javascript knowledge

Chrome Extension Basics

(just the bare minimum!)

A Chrome extension is just that: it extends the functionality of the Chrome browser. An extension usually consists of the following files:

  • manifest
  • Javascript [background or injected]

A manifest file is a JSON file that describes the extension.

Then there are the Javascript files. Background Javascript files run as long as the browser is running and the extension is activated. These files are not loaded with a visited webpage’s resources. For example, ad blockers operate with background Javascript files by blocking ads from being loaded in your page. Unlike background Javascript files, injected Javascript files are “injected” into the pages they match, i.e. they are loaded along with the webpage’s resources.

Let’s make the manifest file first.

Manifest Destiny

I’m just going to dump the manifest file on you and then explain the important parts. Check out the Chrome Extension manifest resource for info on all fields.

{
  "manifest_version": 2,

  "name": "Bard",
  "short_name": "Bard",
  "description": "Search the land far and wide for the bard.",
  "version": "0.01",
  "background": {
    "scripts": ["checkIfTwitch.js"],
    "persistent": false
  },
  "page_action": {
    "default_title": "twitch_bard"
  },
  "content_scripts": [
    {
      "matches": ["https://www.twitch.tv/*"],
      "js": [
        "jquery-3.1.1.slim.min.js",
        "main.js"
      ],
      "exclude_globs": ["https://www.twitch.tv/directory/*",
        "https://www.twitch.tv/p/*",
        "https://www.twitch.tv/products/*",
        "https://www.twitch.tv/*/manager*",
        "https://www.twitch.tv/*/dashboard",
        "https://www.twitch.tv/broadcast",
        "https://www.twitch.tv/messages/*",
        "https://www.twitch.tv/settings"
      ],
      "run_at": "document_end"
    }
  ],
  "web_accessible_resources": [
    "assets/bard.png"
  ],
  "permissions": [
    "activeTab"
  ]
}

The most important parts of this file are:

  • background: Lists in an array all JavaScript files that are background scripts
  • content_scripts: Lists all JavaScript files that are injected. Order matters! We load jquery first because the other script will reference it!
  • exclude_globs: URL patterns where the content scripts won’t run
  • web_accessible_resources: Resources in the extension that the browser may have access to and load. We specify our bard picture right there.
  • run_at: When the injection of JavaScript is done

Meat and Bones

Diving into the code!

In the manifest file, devs can determine when the extension is loaded. This affects what the extension can act on in the DOM. For Twitch, it’s a little tricky. Notice that for run_at, we want to inject our scripts when the document, the webpage, has loaded. However, Twitch’s UI is dynamic. So the document may be done loading, but Twitch chat isn’t up yet. We need to be able to scan elements being added to the page to identify Twitch chat. To do this, we use a MutationObserver.

MutationObservers are ridiculously useful. Simply put, they are objects that can react to DOM changes. We use them to look for Twitch chat.

var config = {attributes: false, childList: true, characterData: false};

var htmlBody = $("body")[0];
var chatLoadedObserver = new MutationObserver(function (mutations, observer) {
    mutations.forEach(function (mutation) {
        var chatSelector = $(".chat-lines");
        if (chatSelector.length > 0) {
            // Select the node element.
            var target = chatSelector[0];

            // Pass in the target node, as well as the observer options
            bardFinder.observe(target, config);

            // Unregister chatLoadedObserver. We don't need to check for the chat element anymore.
            observer.disconnect();
        }
    })
});

chatLoadedObserver.observe(htmlBody, config);

Let’s go one chunk at a time.

var config = {attributes: false, childList: true, characterData: false};

We specify the configuration for the MutationObserver. With childList: true, we ask to be notified about changes consisting of nodes being added to (or removed from) the element we observe. We don’t care if its attributes change, attributes: false, or if its data changes, characterData: false.

More info on config.

var htmlBody = $("body")[0];

We use jquery to select the body HTML element that we know must be already in the document.

var chatLoadedObserver = new MutationObserver(function (mutations, observer) {...}

We create a new MutationObserver and define what it should do for every DOM change event…

mutations.forEach(function (mutation) {
  var chatSelector = $(".chat-lines");
  if (chatSelector.length > 0) {
    // We found Chat!
  }

For every change, we select the element, if any, with the class .chat-lines. That’s our Twitch chat! We can double-check this by inspecting the page in the browser.

Finding the name of Twitch Chat

If, in the future, Twitch decides to change the class name for Twitch chat, we’ll have to update our code.

observer.disconnect();

I skipped a few lines and I just want to explain this quickly. We disconnect the observer once we find chat. We no longer need to send it events because we found what we are looking for.

bardFinder.observe(target, config);

So what is a bard finder?

// Bard Search
// Attach listener that acts when a new chat message appears.
var bardFinder = new MutationObserver(function (mutations) {
  // For each mutation object, we look for the addedNode object
  mutations.forEach(function (mutation) {
    // A chat message would be an added node
    mutation.addedNodes.forEach(function (addedNode) {
      // At this point it's potentially a chatMessage object.
      var chatMessage = $(addedNode);
      if (!chatMessage.is(".chat-line", ".message-line")) {
        // this isn't a chat message, skip processing.
        return;
      }
      // Grab the actual span element with the message content
      // twitchChatMessageContent is a selector string for the message body
      // (".message"), defined elsewhere in the extension.
      var messageElement = chatMessage.children(twitchChatMessageContent);

      parseMsgHTML(messageElement);
    });
  });
});

bardFinder is another MutationObserver! Once we find the chat, we must observe each chat message. MutationObserver allows us to do that because chat messages are simply nodes being added to the parent node, the .chat-lines node!

mutations.forEach(function (mutation) {
  // A chat message would be an added node
  mutation.addedNodes.forEach(function (addedNode) {
    // At this point it's potentially a chatMessage object.
    var chatMessage = $(addedNode);
    if (!chatMessage.is(".chat-line", ".message-line")) {
      // this isn't a chat message, skip processing.
      return;
    }

For each added node, we check its class to see if it is a chat message.

// Grab the actual span element with the message content
var messageElement = chatMessage.children(".message");

parseMsgHTML(messageElement);

When we find one, we retrieve the body of the message.

Finally, before going into the nitty gritty, we start the observer to listen for events.

chatLoadedObserver.observe(htmlBody, config);

Parsing the Chat message

Here is where we do our bard magic!

Looking at the parsing code will be straightforward if you’re familiar with JavaScript. This part has the least to do with Twitch or Chrome extensions.

// Twitch chat message element: rich with media.
var parseMsgHTML = function (msgHTML) {
  var contents = msgHTML.html(
    msgHTML
      .text()
      .replace(
        "bard",
        "<img src='"+ chrome.extension.getURL('assets/bard.png') + "' alt='bard!'/>")
  );
};

This function receives a jQuery-wrapped HTML element. We use msgHTML.html(...) to set new HTML for the element. (Don’t use .text() to set it, because it would interpret the new content as plain text, not HTML!)

In there, we get the text of the element with msgHTML.text() and replace any occurrence of the sequence “bard” with an HTML image element.

This part is important:

chrome.extension.getURL('assets/bard.png')

We must use chrome.extension.getURL when we are referencing a file in our extension package because the API will create the appropriate path for us.

Load the extension

I won’t go over how to load the extension in Chrome because Google will do a better job than me.

How to load an extension into Chrome.

This is obviously for development’s sake. You can of course upload it to the Chrome Web Store.

That’s it!

bard demo!

Further Work

I hope you found that useful. I wanted to share some of the tricky bits such as identifying the appropriate elements of chat and how to get to them.

The skeleton extension can be found here on GitHub.

With this model, you can do a lot of things. You can parse chat and do really cool summaries, add your own emotes or wrangle the beast!

Please leave me any feedback and check out my Chrome extension for Twitch: compakt. It compresses long messages and condenses repeated messages.

A Practical Walkthrough about Bandits for Trauma Response

10/23/2016

Assessing the best EMT response using Bandit Algorithms

A controversial experiment is being performed in Philadelphia. Researchers are exploring the appropriate response for a gunshot or stab victim: whether the victim receives advanced care at the scene or only basic care before being transported to a nearby hospital. Dr. Zoe Maher, trauma surgeon, mother, and a researcher conducting this study, asserts that it is better for victims to receive only basic care until they reach the hospital. She and her colleagues believe that advanced care, such as IV fluids and breathing tubes, disrupts the body’s natural measures for preventing blood loss, like the constriction of blood vessels and the forming of clots.

Most of the controversy from this study, at least from what the article describes, is due to how the study is conducted. Depending on the dispatch number of the city paramedics…

  • odd numbers will carry out advanced care
  • even numbers will provide basic care

Just focusing on appearances, the words “advanced” and “basic” imply that half of the future stab and gunshot victims in Philadelphia will just randomly get sub-par care (because “advanced” is better than “basic”, right?). It’s easy to see why the public might be worried.

I want to do a basic intro to bandit algorithms and why they’re better for this type of experiment. Then I want to motivate contextual bandits as the further improved approach.

This post is a practical walkthrough of a previous post.

Creating some data

I looked everywhere for data to use in this project. I imagine that between the sensitivity of patient information and the difficulty of collecting data in an emergency medical situation, this data is not readily available. So let’s make up some data that plausibly represents this scenario.

We’re going to create a class that outputs emergency events for us. It’s oversimplified: the probability of survival is fixed and depends only on the type of emergency and the treatment administered.

import random

emergency_type = ['stab', 'gunshot']

class SimpleEmergency:
    def __init__(self):
        self.emergency_type = random.choice(emergency_type)

    def apply_treatment(self, treatment):
        '''
        A very simplified model of the emergency and the outcome based on treatment.
        Outcome is treated as P(survive | (treatment, emergency_type))

        P( survive | (stab, basic)) = .6
        P( survive | (stab, advanced)) = .5
        P( survive | (gunshot, basic)) = .75
        P( survive | (gunshot, advanced)) = .45
        Note: the checks below use random.random() > p, so the survival
        probability actually simulated in each case is 1 - p.
        '''

        # If it was a stab wound and the basic treatment was applied.
        if (self.emergency_type == 'stab') and (treatment == 'basic'):
            if random.random() > .6:
                return True
            else:
                return False
        # If it was a stab wound and the advanced treatment was applied.
        elif (self.emergency_type == 'stab') and (treatment == 'advanced'):
            if random.random() > .5:
                return True
            else:
                return False
        # If it was a gunshot wound and the basic treatment was applied.
        elif (self.emergency_type == 'gunshot') and (treatment == 'basic'):
            if random.random() > .75:
                return True
            else:
                return False
        # If it was a gunshot wound and the advanced treatment was applied.
        elif (self.emergency_type == 'gunshot') and (treatment == 'advanced'):
            if random.random() > .45:
                return True
            else:
                return False

The Bandit

Now let’s jump right into the bandit! We’re going to be looking at one of the most popular bandit algorithms, the Upper Confidence Bound (UCB) bandit. Before we look at the code, let me give a description of the UCB bandit.

A bandit explores and exploits. That means that, according to its policy, when presented with a trial from the environment, the agent, our dearly beloved bandit, will choose either to try an action and assess the received reward (explore) or to pick the action that has been the most beneficial so far (exploit). Broadly, bandit algorithms differ in how this policy balances exploration vs. exploitation. The UCB bandit’s policy will always choose the arm that has historically given it the most reward, BUT it will try other arms if it hasn’t tried them in a while.

“How the hell does that work?”

Well, UCB computes a value for each arm as follows:

action.value + sqrt((2 * log(total_counts)) / float(action.count))
^              ^
|__ term 1     |______ term 2

- total_counts : number of times the bandit was faced with a decision and chose an arm.
- action.count : number of times this action has been chosen.
- action.value : historical average reward received for this action.

Term 1

action.value

As described above, term 1 is the historical average for the reward of this particular action.

Term 2

sqrt((2 * log(total_counts)) / float(action.count))

But term 2 does all the magic for exploration. Term 2 is a large number when total_counts is large and action.count is small. Interpreted, that means term 2 is large when a long period of time has passed without trying the action.

But there’s more to it!

All together now

Together, term 1 and term 2 create a value used by the bandit to choose which action to try. The bandit will always choose the action with the highest value. An action could have a low, historical reward (term 1) but if it’s been long enough for term 2 to be large, the bandit will try that action. This has the very cool effect of trying poorer performing arms less!
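
For a concrete feel: after 100 total pulls, an arm with average reward 0.30 that has been pulled 80 times scores 0.30 + sqrt(2 * log(100) / 80) ≈ 0.64, while an arm with average reward 0.20 that has been pulled only 20 times scores 0.20 + sqrt(2 * log(100) / 20) ≈ 0.88. The bandit picks the second, neglected arm despite its lower historical reward.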

The UCB bandit will keep trying poorer performing arms in the future. Is this a good thing or a bad thing? It depends on the environment. If this is a static environment, meaning there is an absolute best action for the bandit, then this is not an ideal feature of UCB: it will still try poorer performing arms, though more and more infrequently as time progresses. If the environment is dynamic and the best decisions change over time, then this is a good feature: the bandit may try different arms, notice that the rewards have changed, and in turn change how it decides which arm to pick in the future.

So finally, let’s look at the entire code for the UCB bandit. The code comes from the great John Myles White (@johnmyleswhite), whose book is highly recommended (and from which this slightly-edited excerpt comes).

import math

def ind_max(x):
  m = max(x)
  return x.index(m)

class UCB1():
  def __init__(self, n_arms):
    self.initialize(n_arms)
    return

  def initialize(self, n_arms):
    self.counts = [0 for col in range(n_arms)]
    self.values = [0.0 for col in range(n_arms)]
    return

  def select_arm(self):
    n_arms = len(self.counts)
    for arm in range(n_arms):
      if self.counts[arm] == 0:
        return arm

    ucb_values = [0.0 for arm in range(n_arms)]
    total_counts = sum(self.counts)
    for arm in range(n_arms):
      # IMPORTANT STUFF!
      bonus = math.sqrt((2 * math.log(total_counts)) / float(self.counts[arm]))
      ucb_values[arm] = self.values[arm] + bonus
    return ind_max(ucb_values)

  def update(self, chosen_arm, reward):
    self.counts[chosen_arm] = self.counts[chosen_arm] + 1
    n = self.counts[chosen_arm]

    value = self.values[chosen_arm]
    # Incremental average: new mean = old mean + (reward - old mean) / n
    new_value = ((n - 1) / float(n)) * value + (1 / float(n)) * reward
    self.values[chosen_arm] = new_value
    return

Other Bandits

As mentioned earlier, there are other bandits. They differ mostly in how their policy dictates exploration vs. exploitation; the epsilon-greedy and softmax (Boltzmann) bandits are two common examples.

Testing our Bandit

Let’s test our bandit! The simulation will run as follows:

def train_UCB1(trials, arm_labels):

    # Initialize bandit with the number of actions it can perform.
    bandit = UCB1(len(arm_labels))

    for _ in range(trials):
        # Create one emergency situation
        e = SimpleEmergency()

        # The bandit will select an action, an arm.
        action = bandit.select_arm()

        # See if the action, the treatment, is a success or not.
        success = e.apply_treatment(arm_labels[action])

        # If it's successful...
        if success:
            # We reward the action.
            bandit.update(action, 1.0)
        else:
            # We don't reward the action.
            bandit.update(action, 0.0)

    return bandit

Inspection

Let’s inspect our bandit to see how it decides.

trials = 1000
treatments = ['basic', 'advanced']

bandit = train_UCB1(trials, treatments)

for treatment, value in zip(treatments, bandit.values):
    print('{}: {}'.format(treatment, value))

# Output:
# basic: 0.3064516129032258
# advanced: 0.5194063926940644

The arm with the larger value is the one the bandit estimates is best!

Note: this result has no bearing on the Philadelphia experiment. Why? Because our data is made up! In fact, you should play with the percentages to see how they change the outcome!

Why Bandits though?

So we talked about bandits and how they can be used in this context. But why should we use them?

Consider how the Philadelphia experiment is set up. It’s set up much like an A/B test: divide the population into a number of groups corresponding to the experimental parameters you would like to test. In this case, the population is divided in two, one group for ‘basic’ treatment and the other for ‘advanced’ treatment.

Let’s make these differing approaches face off against one another! Let the best one win!

def bandit_vs_AB():

    trials = 1000
    treatments = ['basic', 'advanced']
    bandit = UCB1(len(treatments))
    bandit_score = 0
    AB_score = 0

    for i in range(trials):
        # Create one emergency situation
        e = SimpleEmergency()

        # Bandit
        action = bandit.select_arm()
        bandit_treatment = treatments[action]

        bandit_success = e.apply_treatment(bandit_treatment)
        if bandit_success:
            bandit.update(action, 1.0)
            bandit_score +=1
        else:
            bandit.update(action, 0.0)

        # A/B
        if (i % 2) == 0:
            AB_treatment = treatments[0]  # basic
        else:
            AB_treatment = treatments[1]  # advanced

        AB_success = e.apply_treatment(AB_treatment)
        if AB_success:
            AB_score += 1

    return bandit_score, AB_score

and the winner is…

for setup, score in zip(['bandit', 'A/B'], bandit_vs_AB()):
    print('{} : {}'.format(setup, score))

# Output:
# bandit : 470
# A/B : 402

Bandits win!!!

How?

Bandit algorithms choose actions based on historical rewards per action. As the bandit learns which action is the most fruitful, it chooses that action more and more (I know I’m repeating myself at this point). This means we are able to benefit while learning from the experiment! So for Philadelphia, researchers can explore treatments early on in the experiment. But as time progresses, they start to get a sense of what works better and they start administering that treatment more and more.

Contextual Bandits

During the progression of the argument for bandits, I’m sure most of you thought, “this view is way too simplistic!”. You’re right. In real life, there is much more information available than just ‘gunshot’ or ‘stab’. In fact, some of you may have noticed in the code that select_arm() doesn’t even take an argument (for you all-star, snake-charming Pythonistas out there, yes, there is actually an argument, self, but you know what I mean :) )! That’s the amazing thing about these bandits: they don’t need a model of the environment! They proceed solely off of the action-reward interaction. But we’re smart data scientists! We want to take advantage of all the information available. Enter contextual bandits.

Up until this point, our bandit decides on one action regardless of whether the scenario is a ‘stab’ or a ‘gunshot’. But as many of you can imagine, the best action could vary: perhaps ‘basic’ treatment works better for one or the other. In fact, there may be many other factors at play. Emergencies should actually look more like…

(note: at this point, the code is mainly to communicate my point. The fake data is starting to get really…fake).

regions = ['head', 'torso', 'arm/leg']

class RealEmergency:
    def __init__(self):
        self.emergency_type = random.choice(emergency_type)
        self.age = random.randint(18, 65)
        self.time_since_arrival = random.randint(60, 1080)  # seconds. Assume in Philly, all locations are within 18 min. of an ambulance
        self.location_of_injury = random.choice(regions)
        self.multiple_wounds = bool(random.getrandbits(1))
        self.time_to_hospital = random.randint(60, 1080)  # seconds. Refer to "time since arrival" assumption.
        # and possibly many others....

    def apply_treatment(self, treatment):
        # The following is used to mock a distribution of survivability over the parameters above.
        # IT IS NOT SCIENCE! It is only used for simulation!

        val = 0.0
        if self.emergency_type == 'stab':
            val = 3.42
        else:
            val = 9.6
        if random.betavariate(
            self.age / self.time_since_arrival + val,
            # encode the injury region as a number so the beta parameters stay numeric
            regions.index(self.location_of_injury) + 1 + self.multiple_wounds + self.time_to_hospital
        ) > .4:
            return True
        else:
            return False

Given all this new information, how should the bandit deal with it?

Well, when it decides which action to take, it should account for the context!

class ContextualBandit():
    def select_arm(self, context_vector):
        # map context_vector to some qualitative value per arm.
        # Let W_i be a 1-by-n vector where the context_vector is the column vector n-by-1.
        # i is the arm index

        arm_values = [0.0 for arm in range(self.n_arms)]
        for i in range(self.n_arms):
            # W_i[i] * context_vector stands for a dot product; calc_exploration_value
            # is a placeholder for a UCB-style exploration bonus.
            arm_values[i] = W_i[i] * context_vector + calc_exploration_value(i)  # we can add another term like in UCB too!

        return ind_max(arm_values)

    def update(self, context_vector, arm, reward):
        # perform an update on the arm's historical reward.
        update_arm(arm, reward)

        # Update our W row-vector
        loss = compute_loss(context_vector, W_i[arm])
        update_W(W_i[arm], loss)

Notice that the bandit above basically wraps a linear regression model to map the context to a reward.
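
To make that idea concrete, here is a minimal, runnable sketch of a linear contextual bandit: epsilon-greedy exploration with one linear model per arm, updated by stochastic gradient descent. It’s my own toy illustration of the pseudocode above, not the algorithm from the paper or from John Langford’s framework.

import random

class LinearContextualBandit:
    """Toy contextual bandit: one weight vector per arm mapping context -> predicted
    reward, epsilon-greedy exploration, SGD updates on squared loss."""

    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.01):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.lr = lr
        self.W = [[0.0] * n_features for _ in range(n_arms)]

    def _predict(self, arm, context):
        return sum(w * x for w, x in zip(self.W[arm], context))

    def select_arm(self, context):
        # Explore occasionally...
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        # ...otherwise exploit the arm with the highest predicted reward for this context.
        preds = [self._predict(arm, context) for arm in range(self.n_arms)]
        return preds.index(max(preds))

    def update(self, context, arm, reward):
        # One gradient step on the squared loss (prediction - reward)^2 for the chosen arm.
        error = self._predict(arm, context) - reward
        self.W[arm] = [w - self.lr * error * x for w, x in zip(self.W[arm], context)]

A context vector here might be something like [1.0, age / 100.0, time_to_hospital / 1080.0, is_gunshot], with the leading 1.0 acting as a bias term; these features are hypothetical, scaled versions of the RealEmergency fields above.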

Remember, a bandit algorithm dictates a policy for exploration vs. exploitation. In a sense, bandit algorithms and reinforcement learning in general are just frameworks for learning. They are not disjoint from supervised and unsupervised learning.

There are other contextual bandits, and they can be extremely complex. This one is actually considered pretty naive for a contextual bandit. I just want to show you how you can take advantage of the context of a decision.

Conclusion

Don’t forget about bandit algorithms. Canonical examples for applications of bandit algorithms are health/medical experiments such as the ones being conducted in Philadelphia.

Bandits for Trauma Response

10/20/2016

In this post: high-level description of bandit algorithms and how they could play a role in recent news regarding an emergency response experiment being held in Philadelphia.

On the way home from work today, I heard on the radio about a controversial trial being performed in Philadelphia. They are studying the appropriate response for a gunshot or stab victim: whether the victim receives advanced care at the scene or only basic care before being transported to a nearby hospital.

Most of the controversy from this study, at least from what the article describes, is due to how the study is conducted. Depending on the dispatch number of the city paramedics…

  • odd numbers will carry out advanced care
  • even numbers will provide basic care

Just focusing on appearances, the words “advanced” and “basic” imply that half of the future stab and gunshot victims in Philadelphia will just randomly get sub-par care (because “advanced” is better than “basic”, right?). It’s easy to see why the public might be worried.

I won’t be exploring the medical aspects of this story. I am nowhere near qualified to discuss those details. But I want to explore how this research is conducted and how a family of algorithms might be better suited for a study like this. These algorithms are known as Bandit Algorithms.

The Hypothesis

Dr. Zoe Maher, trauma surgeon, mother, and a researcher conducting this study, asserts that it is better for victims to receive only basic care until they reach the hospital. She and her colleagues believe that advanced care, such as IV fluids and breathing tubes, disrupts the body’s natural measures for preventing blood loss, like the constriction of blood vessels and the forming of clots.

The 2 decisions, advanced care or basic care at the scene, lie on a spectrum between an immediate response in a less-than-ideal environment (the scene) and a delayed, partial response followed by treatment in a much more ideal setting (the hospital).

I want to do a basic intro to bandit algorithms and why they’re better for this type of experiment. Then I want to motivate contextual bandits as the further improved approach. Finally, I want to discuss another way to approach this study that involves already-collected data while still using bandit algorithms.

Bandit algorithms

Bandit algorithms, also known by the longer name “multi-armed bandit” algorithms, represent a family of reinforcement learning algorithms. Reinforcement learning poses machine learning problems as an agent with a set of actions. Actions interact with an environment, and the environment responds with a reward. A good policy describes what actions an agent should take when faced with a decision from the environment such that the agent maximizes its rewards. The phrase “multi-armed bandit” comes from the scenario in which these algorithms were first studied. Imagine a person at a casino with an entire row of slot machines to themselves.

  • The Gambler (the agent)
  • The row of slot machines (the environment)
  • The gambler chooses which slot machine they will play on (the actions, each machine is one action)
  • The payout of the played machine (the reward from the environment)
  • which machine to play (the policy)

The gambler, I hope, wants to maximize their payouts. Suppose these slot machines have differing but fixed ratios by which they pay out, i.e. slot machine 1 will pay out 10% of the time (I don’t gamble much, but I know slot machines are much more complicated than this; we’re simplifying for this discussion). To find the machine with the highest payout, the gambler can explore and try each machine numerous times to confidently identify the payout likelihood. And once identified, they will exploit the best machine. But the gambler has a finite purse and can’t spend it all just to determine the payouts. This is where bandit algorithms come into play.

Avoiding a very mathematical treatment of the algorithm, my best summary for it is this:

a simple, multi-armed bandit algorithm will create a policy such that the gambler will choose to play each slot machine in proportion to the rewards they have received from that machine.

And it sounds so simple. Because it is. The gambler should play the machines that pay them more. It’s so obvious. But the elegance of bandit algorithms is in how they dictate exploration vs. exploitation in order to maximize rewards, or rather, to minimize the gap between the rewards received and the best rewards possible. We can’t explore how the many different variations of bandit algorithms handle the reinforcement learning problem here (I haven’t even gotten to the application to this story yet!). But I do recommend @johnmyleswhite’s book Bandit Algorithms for Website Optimization. (I’m not affiliated with O’Reilly, I just figured because it was an animal book, O’Reilly is the primary source. You can also get it in other places.) It’s hands-down the best book for applying bandit algorithms.

How does this apply to health-care?!

Consider a different scenario AND consider a different approach.

You’re a doctor and you would like to identify, among 3 treatments, which is best suited to cure patients suffering from Nhatinitus (trust me, you don’t want it). Little is known about the disease and all the treatments are new. So you decide to divide a population of 90 patients diagnosed with the disease into 3 groups, each group receiving one of the treatments. Results are collected when the trial is over, and the group with the best recovery ratio determines which treatment will be delivered to the masses. Well, wouldn’t it suck to have been in a group that didn’t receive the working treatment? It sucks even more when you learn that Nhatinitus gives you a nauseating reaction to desserts and an inclination toward bad sweaters.

(Look at that damn sweater. Poor guy.)

What if instead, as the experiment progresses, you notice that one group is performing better than the others? And what if, having noticed, you move individuals from the other groups to the better one, thereby improving your final success rate? That’s what bandit algorithms do for you. They allow subsequent decisions to be changed dynamically based on prior rewards.

Consider quickly another scenario. People love saying they optimize their user interaction/ click-through rate/[some metric] on their website because they do A/B testing. (Side note: A/B testing. Think of the above example except with only 2 groups). Bandit algorithms would be better because they will progress towards the better solution as we learn!

SO FINALLY, how does it apply to our story? It seems the Philadelphia study is set up similarly to how A/B testing is done: half of the population will receive “basic” treatment and the other half will receive “advanced” treatment. Why not use bandit algorithms to dynamically decide which treatment to administer as we learn which of the two is more successful?

Contextual bandits

Up until now, bandits only developed their policy based solely on the rewards they receive from the environment. But a bandit should be taking advantage of additional information (the context) and include them in their decision-making. Enter the contextual bandit.

The words “contextual bandit” should be synonymous with “John Langford”, who’s one of, if not the, most prominent researchers in this area. (Side note 2: what’s up with the dominance of Johns within bandit algorithms?) Refer back to the agent and their environment. This time, the agent doesn’t choose decisions based on historical rewards alone. It decides given a context and the historical reward for that context. There is no one true decision that is best. So in the example above, perhaps there isn’t one treatment that is best (the action); rather, the best treatment depends on the patient (the context).

Of course, this increases the complexity of the algorithm (this paper is amazing but makes my head numb). Most often, we want an algorithm that can handle an infinite space of contexts, whereas the set of decisions must be constrained.

How does this apply to the Philadelphia trials? Well maybe there isn’t one solution that is better. Perhaps stabbing victims (a context) show more success when given “advanced” treatment whereas gunshot victims (another context) respond better to “basic” treatment due to the differing average size of the trauma between the two types of incidents. And we don’t have to limit our context to that. We can include time to respond, age/health of victim, and other factors in our context to create richer information for the bandit.

Offline Analysis

The last thing I want to touch upon is how this study can be conducted on existing data. We should look at two things: 1) the sensitivity of an experiment like this and 2) a pure and true reinforcement learning setting. These two things will seem to conflict but I’ll propose a way to compromise.

Reinforcement learning is powerful because it allows one to build a model while initially knowing nothing about the environment. The gambler has no idea which slot machine will be most fruitful, and a doctor doesn’t know which new drug will work best. So a typical scenario for a scientist working with reinforcement learning is to just set the agent loose and let it train autonomously.

An experiment such as the one being conducted in Philadelphia is considered very sensitive because it is dealing with people’s lives. The article contains quotes regarding citizens’ concerns about “gambling” with people’s lives. What may be just one decision to explore an action is actually a risk to somebody’s life. We would like to run an experiment that reduces as much risk as possible while still gathering enough evidence to identify the best policy. Historical data may be the best way to do this: you can prevent further risk by essentially not running the experiment at all! But how do we run a bandit on historical data?

In this tutorial by John Langford at NIPS, John runs his bandit framework against an existing dataset, in particular a dataset that looks like a supervised learning dataset. So there are features (the context) and each label can be interpreted as an action. As John explains, if you have a confident guess at the policy that generated this dataset, you can set those parameters, e.g. decision A was taken 25% of the time. If not, you can just assume it was uniform among the possible actions (5 actions? Assume a policy that chose each arm 20% of the time). Then you can train a bandit on the dataset! The bandit algorithm will learn from that and be able to decide what to do on future events. It can also decide whether it wants to explore the actions more if the dataset didn’t provide enough training.

Conclusion

The experimental trials in Philadelphia immediately jumped out at me as a prime example of where bandit algorithms are appropriate. With bandit algorithms, one can design a robust policy for decision-making and reduce the expense of the experiment.

For those of you wanting or hoping for something more detailed or technical, I sincerely apologize. I hope to revisit this again with some examples and actual code.

Thanks for reading!