Bandits for Trauma Response

10/20/2016

In this post: a high-level description of bandit algorithms and the role they could play in a recently reported emergency response experiment in Philadelphia.

On the way home from work today, I heard on the radio about a controversial trial being performed in Philadelphia. They are studying the appropriate response for a gunshot or stab victim: whether the victim receives advanced care at the scene or only basic care before being transported to a nearby hospital.

Most of the controversy around this study, at least from what the article describes, comes from how the study is conducted. Depending on the dispatch number of the city paramedic units…

  • odd numbers will carry out advanced care
  • even numbers will provide basic care

Just focusing on appearances, the words “advanced” and “basic” imply that half of the future stab and gunshot victims in Philadelphia will just randomly get subpar care (because “advanced” is better than “basic”, right?). It’s easy to see why the public might be worried.

I won’t be exploring the medical aspects of this story. I am nowhere near qualified to discuss those details. But I do want to explore how this research is conducted and how a family of algorithms might be better suited for a study like this. These algorithms are known as bandit algorithms.

The Hypothesis

Dr. Zoe Maher, trauma surgeon, mother, and a researcher conducting this study, asserts that it is better for victims to receive basic care until they reach the hospital. She and her colleagues believe that advanced care, such as IV fluids and breathing tubes, disrupts the body’s natural measures for preventing blood loss, like the constriction of blood vessels and the forming of clots.

The two options, advanced or basic care at the scene, lie on a spectrum between an immediate response in a less-than-ideal environment (the scene) and a delayed, partial response followed by a much more suitable setting (the hospital).

I want to do a basic intro to bandit algorithms and explain why they’re better suited for this type of experiment. Then I want to motivate contextual bandits as a further improvement. Finally, I want to discuss another way to approach this study that uses already-collected data while still relying on bandit algorithms.

Bandit algorithms

Bandit algorithms, also known by the longer name “multi-armed bandit” algorithms, are a family of reinforcement learning algorithms. Reinforcement learning frames a machine learning problem as an agent with a set of actions: the agent’s actions affect an environment, and the environment responds with a reward. A good policy describes which actions the agent should take when faced with a decision so that it maximizes its rewards. The name “multi-armed bandit” comes from the scenario in which these algorithms were first studied. Imagine a person at a casino with an entire row of slot machines to themselves.

  • The Gambler (the agent)
  • The row of slot machines (the environment)
  • The gambler chooses which slot machine they will play on (the actions, each machine is one action)
  • The payout of the played machine (the reward from the environment)
  • Which machine to play (the policy)

The gambler, I hope, wants to maximize their payouts. Suppose these slot machines have different but fixed rates at which they pay out, e.g. slot machine 1 pays out 10% of the time (I don’t gamble much, but I know slot machines are more complicated than this; we’re simplifying for this discussion). To find the machine with the highest payout, the gambler can explore: try each machine numerous times to confidently estimate its payout likelihood. Once identified, they can exploit the best machine. But the gambler has a finite purse and can’t spend it all just figuring out the payouts. This is where bandit algorithms come into play.

Avoiding a very mathematical treatment of the algorithm, my best summary for it is this:

a simple multi-armed bandit algorithm creates a policy such that the gambler plays each slot machine in proportion to the rewards they have received from that machine.

And it sounds so simple. Because it is. The gambler should play the machines that pay them more. It’s so obvious. But the elegance of bandit algorithms is in how they balance exploration vs. exploitation in order to maximize rewards and, more precisely, minimize regret: the gap between the rewards you received and the rewards you could have received. We can’t explore how all the different variations of bandit algorithms handle the reinforcement learning problem (I haven’t even gotten to the application to this story yet!) because there are many. But I do recommend @johnmyleswhite’s book Bandit Algorithms for Website Optimization. (I’m not affiliated with O’Reilly; I just figured that because it’s an animal book, O’Reilly is the primary source. You can also get it in other places.) It’s hands-down the best book for applying bandit algorithms.
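To make that concrete, here is a minimal sketch (in Python, with made-up payout rates, and no claim to being the exact algorithms from the book) of one of the simplest bandit strategies, epsilon-greedy: usually play the machine with the best observed payout (exploit), and occasionally play a random machine (explore).

# epsilon_greedy.py -- a toy sketch, not a production implementation

import random

class EpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # plays per arm
        self.values = [0.0] * n_arms  # running average reward per arm

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return self.values.index(max(self.values))

    def update(self, arm, reward):
        # Update the running average reward for the arm we just played.
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n

# Three slot machines with hidden payout rates (made up for illustration).
payout_rates = [0.10, 0.05, 0.20]
bandit = EpsilonGreedy(n_arms=3)

for _ in range(1000):
    arm = bandit.select_arm()
    reward = 1.0 if random.random() < payout_rates[arm] else 0.0
    bandit.update(arm, reward)

print(bandit.counts)  # most plays should concentrate on machine 2 (the 20% payer)

Over many plays, the counts concentrate on the best machine while the algorithm still occasionally samples the others, which is exactly the explore/exploit trade-off described above.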

How does this apply to health-care?!

Consider a different scenario AND consider a different approach.

You’re a doctor and you would like to identify, among 3 candidate treatments, which is best suited to cure patients suffering from Nhatinitus (trust me, you don’t want it). Little is known about the disease and all the treatments are new. So you decide to divide a population of 90 patients diagnosed with the disease into 3 groups, each group receiving one of the treatments. Results are collected when the trial is over, and the group with the best recovery ratio determines which treatment will be delivered to the masses. Well, wouldn’t it suck to have been in a group that didn’t receive the working treatment? It sucks even more when you learn that Nhatinitus gives you a nauseating reaction to desserts and an inclination toward bad sweaters.

[Image: look at that damn sweater, poor guy]

What if instead, as the experiment progresses, you notice that one group is performing better than the others? And what if, by noticing, you start moving individuals from the other groups to the better one, thereby improving your final success rate? That’s what bandit algorithms do for you: they allow subsequent decisions to change dynamically based on prior rewards.

Consider quickly another scenario. People love saying they optimize their user interaction / click-through rate / [some metric] on their website because they do A/B testing. (Side note: A/B testing. Think of the above example except with only 2 groups.) Bandit algorithms would be better because they progress towards the better solution as we learn!

SO FINALLY, how does it apply to our story? The Philadelphia study is set up much like an A/B test: half of the population will receive “basic” treatment and the other half will receive “advanced” treatment. Why not use bandit algorithms to dynamically decide which treatment to administer as we learn which of the two is more successful?
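As a toy illustration of what “deciding dynamically” could look like (the success rates below are invented and have nothing to do with the actual trial), here is a sketch of Thompson sampling, another common bandit strategy: keep a Beta distribution over each treatment’s success rate, sample from each, and assign the next patient to whichever treatment sampled higher.

# thompson_treatments.py -- a toy simulation; the success rates are invented

import random

# Hypothetical (made up) success probabilities for the two treatments.
true_success = {"basic": 0.30, "advanced": 0.20}

# Beta(1, 1) prior over each treatment's success rate: (successes+1, failures+1).
alpha = {"basic": 1, "advanced": 1}
beta = {"basic": 1, "advanced": 1}

assignments = {"basic": 0, "advanced": 0}

for _ in range(500):  # 500 simulated patients
    # Sample a plausible success rate for each treatment and pick the higher one.
    sampled = {t: random.betavariate(alpha[t], beta[t]) for t in true_success}
    treatment = max(sampled, key=sampled.get)
    assignments[treatment] += 1

    # Observe the (simulated) outcome and update that treatment's posterior.
    if random.random() < true_success[treatment]:
        alpha[treatment] += 1
    else:
        beta[treatment] += 1

print(assignments)  # most patients end up on the treatment that is doing better

Unlike the fixed 50/50 split, the allocation shifts toward the better-performing treatment as evidence accumulates, while uncertainty in the posteriors keeps some exploration going.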

Contextual bandits

Up until now, our bandits developed their policy based solely on the rewards they received from the environment. But a bandit could also take advantage of additional information (the context) and include it in its decision-making. Enter the contextual bandit.

The words “contextual bandit” should be synonymous with “John Langford”, who is one of, if not the, most prominent researchers in this area. (Side note 2: what’s up with the dominance of Johns within bandit algorithms?) Refer back to the agent and its environment. This time, the agent doesn’t choose actions based solely on historical rewards; it decides given a context and the historical rewards for that context. There is no single decision that is always best. So in the example above, perhaps there isn’t one treatment that is best (the action); rather, the best treatment depends on the patient (the context).

Of course, this increases the complexity of the algorithm (this paper is amazing but makes my head numb). Most often, we want an algorithm that can handle an effectively infinite space of contexts, while the set of possible actions stays small and fixed.

How does this apply to the Philadelphia trial? Well, maybe there isn’t one treatment that is better across the board. Perhaps stabbing victims (one context) show more success when given “advanced” treatment, whereas gunshot victims (another context) respond better to “basic” treatment, due to the differing average size of the trauma between the two types of incidents. And we don’t have to limit our context to that: we can include time to respond, the age and health of the victim, and other factors to create richer information for the bandit.
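To sketch the idea (and only as a toy; the contexts, treatments, and success rates below are entirely made up), the simplest possible “contextual” bandit just keeps a separate set of estimates per context. Real contextual bandit algorithms, like those in John’s work, generalize across rich or continuous contexts instead of keeping a table, which is how they cope with the effectively infinite context space mentioned above.

# contextual_sketch.py -- a toy per-context bandit; contexts and rates are made up

import random

contexts = ["stab", "gunshot"]
actions = ["basic", "advanced"]

# Hypothetical success rates per (context, action) pair, invented for illustration.
true_success = {
    ("stab", "basic"): 0.25, ("stab", "advanced"): 0.35,
    ("gunshot", "basic"): 0.30, ("gunshot", "advanced"): 0.20,
}

# One set of counts/values per context: the crudest possible "contextual" bandit.
counts = {(c, a): 0 for c in contexts for a in actions}
values = {(c, a): 0.0 for c in contexts for a in actions}
epsilon = 0.1

def choose(context):
    # Epsilon-greedy within the given context.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: values[(context, a)])

for _ in range(2000):
    context = random.choice(contexts)   # an incident arrives with a context
    action = choose(context)            # pick a treatment given that context
    reward = 1.0 if random.random() < true_success[(context, action)] else 0.0
    counts[(context, action)] += 1
    n = counts[(context, action)]
    values[(context, action)] += (reward - values[(context, action)]) / n

for c in contexts:
    print(c, max(actions, key=lambda a: values[(c, a)]))  # best action per context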

Offline Analysis

The last thing I want to touch on is how this study could be conducted on existing data. We need to consider two things: 1) the sensitivity of an experiment like this, and 2) what a pure, true reinforcement learning setting looks like. The two will seem to conflict, but I’ll propose a compromise.

Reinforcement learning is powerful because it allows one to build a model while initially knowing nothing about the environment. The gambler has no idea which slot machine will be most fruitful, and a doctor doesn’t know which new drug will work best. So a typical scenario for a scientist working with reinforcement learning is to just set the agent loose and let it learn autonomously.

An experiment such as the one being conducted in Philadelphia is considered very sensitive because it is dealing with people’s lives. The article contains quotes about citizens’ concerns over “gambling” with people’s lives. What may be just one decision to explore an action is actually a risk to somebody’s life. We would like to run an experiment that reduces as much risk as possible while still gathering enough evidence to identify the best policy. Historical data may be the best way to do this: you can prevent further risk by essentially not running the experiment at all! But how do we run a bandit on historical data?

In this tutorial by John Langford at NIPS, John runs his bandit framework against an existing dataset, in particular one that looks like a supervised learning dataset: there are features (the context) and each label can be interpreted as an action. As John explains, if you have a confident guess at the policy that generated the dataset, you can set those probabilities, e.g. action A was chosen 25% of the time. If not, you can just assume the choices were uniform across the possible actions (5 actions? Assume each arm was chosen 20% of the time). Then you can train a bandit on the dataset! The bandit algorithm will learn from that data and be able to decide what to do on future events. It can also decide whether it wants to explore the actions more if the dataset didn’t provide enough training.
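The standard trick behind this kind of offline work is inverse propensity scoring (importance weighting): weight each logged reward by one over the probability that the logging policy chose that action. Here is a minimal sketch with a fabricated log and a uniform logging policy (the “assume it was even” case); it only evaluates two fixed candidate policies, but the same weighting underlies training a full bandit on logged data.

# offline_ips.py -- a toy inverse-propensity estimate on fabricated logged data

import random

actions = ["basic", "advanced"]

# Fabricated historical log of (context, action taken, observed reward).
log = []
for _ in range(1000):
    context = random.choice(["stab", "gunshot"])
    action = random.choice(actions)
    # Made-up outcome probabilities, purely so the two policies differ.
    success_rate = 0.35 if action == "basic" else 0.25
    reward = 1.0 if random.random() < success_rate else 0.0
    log.append((context, action, reward))

logging_prob = 1.0 / len(actions)  # the logging policy chose uniformly

def evaluate(policy):
    # Inverse propensity scoring: count a logged reward only when the candidate
    # policy agrees with the logged action, weighted by 1 / logging probability.
    total = 0.0
    for context, action, reward in log:
        if policy(context) == action:
            total += reward / logging_prob
    return total / len(log)

def always_basic(context):
    return "basic"

def always_advanced(context):
    return "advanced"

print("basic:   ", evaluate(always_basic))     # should land near 0.35
print("advanced:", evaluate(always_advanced))  # should land near 0.25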

Conclusion

The experimental trial in Philadelphia immediately jumped out at me as a prime example of where bandit algorithms are appropriate. With bandit algorithms, one can design a robust policy for decision-making while reducing the cost of the experiment.

For those of you hoping for something more detailed or technical, I sincerely apologize; the sketches above are only toys. I hope to revisit this with fuller examples and real code.

Thanks for reading!

Working in New Domains

10/07/2016

As a data scientist, you may have domain expertise, but sometimes you’ll find that your strong foundations in math, stats, machine learning, and computer science are needed on another project in your company. You may be an image processing guru, but another manager would like you to explore some NLP applications. So how should you approach a new domain?

Well, you conclude, logically of course, that you know the basics: the machine learning, the libraries, the language to prototype with, the bread and butter, and that you just need to apply it to this new domain. You may do something basic to start off with, some simple feature extraction piped through one of the classic algorithms (SVM, k-means, etc.). But I think the best place to start is to implement a paper in that domain.

It won’t be easy, but it’ll save time. A good paper should essentially outline everything you need to do at a high level. All you need to do is implement it yourself (again, not saying it will be easy, but at least you have direction). And if you pick a recent paper, your work will reflect the latest state-of-the-art research in that domain. Find the terms your domain uses to describe what you are trying to do, e.g. identifying where text is in an image? Text detection. Transcribing what the text actually says? Text recognition. Now search them on arXiv or Google Scholar and see what comes up!

Can't sudo on ChromeOS

09/11/2016

Help! I’m on my Chromebook in developer mode and I can’t sudo in the shell!

So you have a Chromebook. You booted into developer mode. You hit ctrl+alt+t for the terminal. You entered shell. But now when you want to try crouton or something else with sudo, it prompts you for a password you don’t know!

Solution:

  1. Go into VT2 using ctrl+alt+F2.
  2. When prompted for a login, use root.
    • If you enabled Debugging Features and you set a root password, enter that password.
    • If you enabled Debugging Features and you DID NOT set a root password, enter test0000.
    • If you DID NOT enable Debugging Features, …? (contributions welcomed)
  3. Enter chromeos-setdevpasswd and follow the prompt.
  4. Once complete, you have just set the sudo password! Use it on the shell (a.k.a ctrl+alt+t and shell).
  5. Done!

Enormous thanks to GitHub user lan-gate.

Intro to C/C++ Extensions for Python

09/01/2016

A Practical Guide to Python C++ Extensions and Bindings

This guide is born from my recent experience of adding Python bindings to a large C++ library. After stumbling around, I was finally able to make things work. I don’t think it’s difficult to write Python bindings or C++ extensions; it’s just hard to know how you should approach doing so for your particular case. I hope to clear that up in a series of blog posts. This one is just an introduction.

What are Python bindings for C++?

Writing C++ that is used by Python has a lot of names: “Python bindings”, “C++ extensions”, “wrappers”, etc. All of them entail the same thing: you’re writing code in C++ to be used in Python.

Where they differ is mostly in context:

  • “Python bindings” typically refers to adding a Python API to a C++ library.
  • “C++ extensions” more generally means extending Python with functionality written in C++.
  • “wrappers” means more or less the same thing as “Python bindings”

And by doing so, you can:

  1. implement new built-in types
  2. call C++ functions

How is C++ code interfaced with Python?

The Python implementation you’re currently using is most likely written in C; this is known as CPython. CPython exposes a C API so that you can create and work with native Python types and objects from C, and you can use that same API from C++.

And there are lots of pros and cons that come with doing this…

When should I use Python bindings?

  • you want to take advantage of Python’s expressiveness to quickly develop
  • your C++ component takes on the more CPU-intensive tasks of your app, but the rest of your app can be in Python.
  • you like developing in Python
  • any task where there’s a lot of “pipelining” and “infrastructure” code that needs to be written.

Python bindings let you balance the strengths and weaknesses of both languages. Mainly, you can couple the ease and speed of developing in Python with the performance of C++.

Where do I wrap/attach bindings/extend?

You usually have a set of functions that serve as the interface into your C++ library. You wrap around or attach bindings to these functions.

How will it be used?

Suppose you have a C++ library called myLib.a or myLib.so. Once the bindings are built, you can use it like this in Python: import myLib. Your CPython is now extended with myLib.

How to use your large C++ library in Python…

There’s only one thing you need to do to use your C++ library in Python: write adapters that convert Python objects to C++ objects and then call your C++ functions. In the other direction, you need to convert the C++ objects returned from your C++ functions back into Python objects.

That’s it.

The task is more time-consuming than it is difficult, and it varies with how many functions you want to expose and how many objects you need to convert. There are many tools out there to help; they reduce the amount of conversion code you need to write. In exchange, you make it more difficult for other users to install your package, because they’ll need to obtain the same tools.

For example, consider this C++ function. It’s the entry point to a much larger library with complex functionality. CustomClass, A, and B are all C++ classes.

// megaFoo.h

#include <vector>

#include "scary.h"
#include "complex.h"

CustomClass megaFoo(std::vector<A>& vectA, B b);

Your C++ binding will work like this.

// megaFooAdapter.h

#include <Python.h>
#include <vector>

#include "path/to/megaFoo.h"

// Python wrapper types for the C++ classes (details omitted).
struct pyCustomClass
{
  PyObject_HEAD
  ...
};

struct pyA
{
  PyObject_HEAD
  ...
};

struct pyB
{
  PyObject_HEAD
  ...
};

pyCustomClass* pyMegaFoo(PyObject* listA, pyB* b)
{
  // Convert the Python arguments into the C++ types megaFoo() expects.
  std::vector<A> vA = convertListAToVectorA(listA);
  B bObj = convertpyBToB(b);

  // Call the original C++ function.
  CustomClass cc = megaFoo(vA, bObj);

  // Convert the C++ result back into a Python object.
  return convertCustomClassTopyCustomClass(cc);
}

As you can see, megaFooAdapter does all the work. The function pyMegaFoo() accepts Python arguments and converts them into the classes that the original megaFoo() understands. But megaFoo() returns a C++ class, so we convert it back into a Python object before handing it back.
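From the Python side, once the adapter has been compiled and registered as an extension module (the registration boilerplate isn’t shown here), usage might look something like this; the module and class names are hypothetical.

# usage_sketch.py -- hypothetical usage of the compiled extension module

import myLib  # the compiled C++ extension, assuming it was built as "myLib"

a1 = myLib.A()   # Python-side wrappers around the C++ classes
a2 = myLib.A()
b = myLib.B()

result = myLib.megaFoo([a1, a2], b)  # calls pyMegaFoo() under the hood
print(result)                        # a pyCustomClass wrapping the C++ CustomClass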

Check out the next post for writing these functions.

Twitch Compakt

08/02/2016

Compakt

Twitch Chat streamlining. Condenses repeated messages/words/emotes.