Go for Statistical Programming

NY Open Statistical Programming Meetup: Knewton HQ

19 August 2013

Aditya Mukerjee

Personal Background

Overview of Go History

Birth

Systems programming

Design goals

OkCupid Data Workflow (circa 2009)

Python in 2009

Go in 2013

Advantages of Go (for statisticians)

Disadvantages of Go

Speed

Type system

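package main

import "fmt"

// addOne returns its argument incremented by one.
func addOne(a int) int {
    return a + 1
}

func main() {
    a := 5
    b := addOne(a) // b's type (int) is inferred from addOne's return type
    fmt.Println(b) // Go rejects unused local variables, so b must be used
}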

Getting started with Go

Go for Python programmers

See also: Go for Pythonists

Go for R programmers

Tips for R programmers

Function declaration

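func doSomething() {
    var foo int // explicit declaration; foo starts at its zero value, 0
    foo = 6
    bar := 8 // short declaration; the type int is inferred
    fmt.Println(foo, bar) // unused local variables are compile errors
}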

Zero values
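
A minimal sketch of what zero values mean in practice: a variable declared without an initializer holds the zero value of its type rather than garbage or an "undefined" marker. The `Sample` type and `zeroSample` function are hypothetical names introduced only for this illustration.

```go
package main

import "fmt"

// Sample is a hypothetical record type used only for illustration.
type Sample struct {
	Label  string
	Count  int
	Weight float64
	Tags   []string
}

// zeroSample returns a Sample that was declared without an initializer,
// so every field holds the zero value of its type.
func zeroSample() Sample {
	var s Sample
	return s
}

func main() {
	s := zeroSample()
	// Strings default to "", numbers to 0, and slices to nil.
	fmt.Printf("%q %d %v %v\n", s.Label, s.Count, s.Weight, s.Tags == nil)
}
```

For R programmers, the closest analogy is probably a freshly allocated vector from `numeric(1)` or `character(1)`, not `NA` or `NULL`.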

For loops/If statements

func PrintAllNames(names []string) {
    for _, name := range names {
        if name != "" {
            fmt.Println(name)
        }
    }
}

Error Handling
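
A minimal sketch of the usual Go convention, in which a function returns an error as its last value and the caller checks it explicitly. `Mean` and `ErrEmpty` are hypothetical names chosen for this example, not part of any library.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEmpty is a hypothetical sentinel error for this sketch.
var ErrEmpty = errors.New("mean of empty slice")

// Mean returns the arithmetic mean of xs, or ErrEmpty if xs has no elements.
func Mean(xs []float64) (float64, error) {
	if len(xs) == 0 {
		return 0, ErrEmpty
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs)), nil
}

func main() {
	m, err := Mean([]float64{1, 2, 3})
	if err != nil {
		// Errors are ordinary values: check them and decide what to do.
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m)
}
```

Unlike `try`/`except` in Python or `tryCatch` in R, nothing is thrown here: the error travels back through the return value.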

Structs

type CheckingAccount struct {
    Balance int
    superSecretId int64
}
type Person struct {
    MainAccount CheckingAccount
}

type Bank struct {
    Accounts []CheckingAccount
    SavingsAccounts []struct{
        Balance int
        InterestRate float64
    }
}

Interfaces in Go

Using Interfaces

type Account interface {
    Deposit(int) error
}

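type CheckingAccount struct {
    Balance       int
    superSecretId int64
}

// Deposit uses a pointer receiver so the change to Balance persists;
// with a value receiver it would only modify a copy.
func (destination *CheckingAccount) Deposit(amount int) error {
    destination.Balance += amount
    return nil
}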
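
A sketch of how the `Account` interface might be used by calling code. It redefines the types so that it compiles on its own. Two things here are assumptions for illustration: the `depositAll` helper is hypothetical, and the check for a positive amount is added only to show a non-nil error. Note that `Deposit` has a pointer receiver, so it is `*CheckingAccount` that satisfies `Account`.

```go
package main

import (
	"errors"
	"fmt"
)

type Account interface {
	Deposit(int) error
}

type CheckingAccount struct {
	Balance int
}

// Deposit has a pointer receiver, so the caller's balance is updated.
// The positive-amount check is an assumption added for this sketch.
func (c *CheckingAccount) Deposit(amount int) error {
	if amount <= 0 {
		return errors.New("deposit must be positive")
	}
	c.Balance += amount
	return nil
}

// depositAll is a hypothetical helper. It accepts any type that
// implements Account and never needs to know the concrete type.
func depositAll(accounts []Account, amount int) error {
	for _, a := range accounts {
		if err := a.Deposit(amount); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	checking := &CheckingAccount{}
	if err := depositAll([]Account{checking}, 50); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(checking.Balance)
}
```

There is no `implements` keyword: any type with a matching `Deposit` method satisfies `Account` automatically.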

Concurrency in Go

Go's Concurrency Model:

Concurrency in Go: Goroutines

So what is a goroutine?

From the docs:
- A goroutine is "a function executing concurrently with other goroutines in the same address space"
- Goroutines are lightweight: they cost little more than the allocation of stack space
- Goroutines are multiplexed onto multiple OS threads
- Goroutines are similar to the & operator in bash/sh

Synchronicity by Example

func Greet(name string) {
    log.Printf("Greetings, %s!", name)
    time.Sleep(3 * time.Second)
}

func main() {
    Greet("Alice")
    Greet("Bob")
}

Goroutines by Example

func Greet(name string) {
    log.Printf("Greetings, %s!", name)
    time.Sleep(3 * time.Second)
}

func main() {
    go Greet("Alice")
    go Greet("Bob")

    time.Sleep(5 * time.Second)
}

Yes, it really is just that easy.
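
Sleeping in `main` works for a demo, but it relies on guessing how long the goroutines will take. One common alternative, sketched here, is `sync.WaitGroup` from the standard library. The `greetAll` function is a hypothetical name for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// greetAll builds one greeting per name, each in its own goroutine,
// and returns only after every goroutine has finished.
func greetAll(names []string) []string {
	greetings := make([]string, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			// Each goroutine writes to its own slice index, so no lock is needed.
			greetings[i] = fmt.Sprintf("Greetings, %s!", name)
		}(i, name)
	}
	wg.Wait() // blocks until every Done has been called
	return greetings
}

func main() {
	for _, g := range greetAll([]string{"Alice", "Bob"}) {
		fmt.Println(g)
	}
}
```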

Goroutines vs. Callbacks

Concurrency in Go: Channels

Channels by Example:

func Greet(name string, response_chan chan string) {
    greeting := fmt.Sprintf("Greetings, %s!", name)
    response_chan <- greeting
}

func main() {

    cs := make(chan string)
    go Greet("Alice", cs)
    greeting := <-cs
    log.Print(greeting)
}

Let's take a closer look at that:

func Greet(name string, response_chan chan string) {
    time.Sleep(3 * time.Second)
    greeting := fmt.Sprintf("Greetings, %s!", name)
    response_chan <- greeting
    log.Print("Greeting function is tired - sleeping for a bit")
    time.Sleep(3 * time.Second)
    log.Print("now the greeting function is done sleeping - terminating")
}

func main() {

    cs := make(chan string)
    go Greet("Alice", cs)
    log.Print("Continuing execution while we wait for greeter to respond")
    greeting := <-cs
    log.Print(greeting)
    time.Sleep(10 * time.Second)
}

Go makes concurrency easy

Data processing in Go

Go Standard Library

Some tools included for free in the standard library:

Data scraping in Go

Example: Rate-Limiting & Scraping

//Execute a query that will automatically be throttled
func throttledQuery(queryQueue chan queryChan) {
    for q := range queryQueue {

        endpoint_path := q.endpoint_path
        method := q.method
        keyVals := q.keyVals
        response_ch := q.response_ch
        result, err := Query(endpoint_path, method, keyVals)
        response_ch <- struct {
            result []byte
            err    error
        }{result, err}

        time.Sleep(SECONDS_PER_QUERY)
    }
}

Channels for parallel computation
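
As one illustration of this idea, the sketch below splits a slice into chunks, sums each chunk in its own goroutine, and combines the partial sums over a channel. `parallelSum` is a hypothetical function written for this example, and the chunking scheme is only one of several reasonable choices.

```go
package main

import "fmt"

// parallelSum splits xs into at most `workers` chunks, sums each chunk
// in its own goroutine, and adds up the partial sums it receives.
func parallelSum(xs []float64, workers int) float64 {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(xs) + workers - 1) / workers // ceiling division
	partials := make(chan float64)
	started := 0
	for start := 0; start < len(xs); start += chunk {
		end := start + chunk
		if end > len(xs) {
			end = len(xs)
		}
		started++
		go func(part []float64) {
			sum := 0.0
			for _, x := range part {
				sum += x
			}
			partials <- sum
		}(xs[start:end])
	}
	total := 0.0
	for i := 0; i < started; i++ {
		total += <-partials // partial sums arrive in whatever order they finish
	}
	return total
}

func main() {
	xs := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(parallelSum(xs, 3))
}
```

With floating-point data the result can differ in the last bits from a sequential sum, because the order of addition is not fixed.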

Selecting Channels

// Race two algorithms and take whichever finishes first
select {
case <-algorithm1:
    // algorithm1 finished first
case <-algorithm2:
    // algorithm2 finished first
}

// Wait for a value, but give up after a timeout
select {
case <-ch:
    // a read from ch has occurred
case <-timeout:
    // the read from ch has timed out
}

Example: Data Collection with Background Computation
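
One possible reading of this pattern, sketched under assumptions: a collector goroutine gathers observations in the background while the main goroutine updates a running mean as each value arrives. `collect` and `runningMean` are hypothetical names, and the fixed slice stands in for a real data source such as a scraper.

```go
package main

import "fmt"

// collect is a hypothetical data source. It sends each observation on
// out and closes the channel when there is nothing left to send.
func collect(observations []float64, out chan<- float64) {
	for _, x := range observations {
		out <- x
	}
	close(out)
}

// runningMean consumes values from in until it is closed and returns
// the number of values seen and their mean, updated incrementally.
func runningMean(in <-chan float64) (int, float64) {
	n, mean := 0, 0.0
	for x := range in {
		n++
		mean += (x - mean) / float64(n)
	}
	return n, mean
}

func main() {
	data := make(chan float64)
	go collect([]float64{2, 4, 6}, data)
	n, mean := runningMean(data)
	fmt.Println(n, mean)
}
```

The incremental update means the full dataset never has to be held in memory at once.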

Writing extensions in C

Other useful munging tools

go-json - automatically generate static struct definitions for JSON unmarshalling
mgo - a MongoDB client library for Go
redigo - a Redis client library for Go

Conclusion

Thank you

Aditya Mukerjee