Aggregation Strategies

March 3rd, 2010

How do you find the minimum, maximum, and average of values of a set of samples over time? What if some of your data sources are unreliable and prone to drop in and out? Should you group them into regular buckets, even if that may mean multiple samples from source A and none from source B? Or should you calculate instantaneous aggregate numbers whenever a new sample comes in at the expense of having as many aggregate points as samples and potentially unevenly distributed across time? Are there better ways?

In the end I hacked up a simple library, aggregate.js, to test various strategies using Node. Right now I just have the two basic windowing and instantaneous strategies implemented, but I plan on adding more as needed.
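
To make the two strategies concrete, here is a minimal sketch of both. It is not the aggregate.js API: the function names (summarize, windowed, instantaneous) and the sample shape ({ source, time, value }) are assumptions made up for illustration, and the samples are assumed to arrive sorted by time.

```javascript
// Hypothetical helpers for illustration only; not the aggregate.js API.
// A sample is assumed to look like { source: "A", time: <ms>, value: <number> }.

// Compute min, max and average of a non-empty array of numbers.
function summarize(values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return {
    min: Math.min.apply(null, values),
    max: Math.max.apply(null, values),
    avg: sum / values.length
  };
}

// Windowing: group samples into fixed-width time buckets, one aggregate per bucket.
// A bucket may hold several samples from one source and none from another.
function windowed(samples, width) {
  var buckets = {};
  samples.forEach(function (s) {
    var start = Math.floor(s.time / width) * width;
    (buckets[start] = buckets[start] || []).push(s.value);
  });
  return Object.keys(buckets)
    .map(Number)
    .sort(function (a, b) { return a - b; })
    .map(function (start) {
      var out = summarize(buckets[start]);
      out.time = start;
      return out;
    });
}

// Instantaneous: on every new sample, aggregate the latest known value per source.
// This yields as many aggregate points as there are samples.
function instantaneous(samples) {
  var latest = {};
  return samples.map(function (s) {
    latest[s.source] = s.value;
    var values = Object.keys(latest).map(function (k) { return latest[k]; });
    var out = summarize(values);
    out.time = s.time;
    return out;
  });
}

var samples = [
  { source: "A", time: 0, value: 10 },
  { source: "A", time: 400, value: 20 },
  { source: "B", time: 1200, value: 4 }
];
var byWindow = windowed(samples, 1000);  // two buckets: [10, 20] and [4]
var byInstant = instantaneous(samples);  // three points, one per sample
```

Note how source A contributes twice to the first bucket while source B contributes nothing, which is exactly the trade-off raised above.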

Javascript Constructors with an Array of Parameters

March 3rd, 2010

Say we have:

function myObj() {
  this.input = Array.prototype.slice.call(arguments);
}

We can pass a single value or an array, but an array ends up nested inside input as a single element:

var myObj1 = new myObj(1);
myObj1.input // -> [1]
var myArr = [1, 2, 3];
var myObj2 = new myObj(myArr);
myObj2.input // -> [[1, 2, 3]]

How then can we get, while still using myArr:

myObj3.input // -> [1, 2, 3]

Here’s how, courtesy of this site:

var myObj3 = new myObj();
myObj.prototype.constructor.apply(myObj3, myArr);

Note that myObj3 needn’t be an instance of myObj, it just needs to be an existing value so that the global object isn’t used as the value of this.
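
If you need this often, the trick can be wrapped in a small helper. The sketch below is one possible way to do it; construct is a hypothetical name, not a built-in, and it assumes a plain function constructor (it will not work with constructors that refuse to be called without new).

```javascript
function myObj() {
  this.input = Array.prototype.slice.call(arguments);
}

// Hypothetical helper: build an instance of ctor from an array of arguments.
function construct(ctor, args) {
  // Create an object with the right prototype without running the constructor.
  var instance = Object.create(ctor.prototype);
  // Run the constructor with the array spread into individual arguments.
  var result = ctor.apply(instance, args);
  // Mirror what `new` does: if the constructor returns an object, that object wins.
  var returnedObject = (result !== null && typeof result === "object") ||
    typeof result === "function";
  return returnedObject ? result : instance;
}

var myArr = [1, 2, 3];
var myObj3 = construct(myObj, myArr);
// myObj3.input now holds the three values individually rather than one nested array
```

Unlike the two-step version above, the constructor only runs once here, which matters if it has side effects.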

This is pretty cool, though of course you can do it much more easily in languages such as Scala:

val myObj4 = new myObj(myArr: _*)

Javascript Primitives

February 13th, 2010

From the Mozilla Javascript Glossary:

primitive, primitive value
A data that is not an object and does not have any methods. JavaScript has 5 primitive datatypes: string, number, boolean, null, undefined. With the exception of null and undefined, all primitives values have object equivalents which wrap around the primitive values, e.g. a String object wraps around a string primitive. All primitives are immutable.

Did you know that all Javascript primitives are immutable? I sure didn’t.
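
A quick way to see what immutability means in practice is the small sketch below (the variable names are just for illustration). Operations on a primitive give you a new value, while objects and arrays can be changed in place through any reference to them.

```javascript
// String methods never change the original string; they return a new one.
var greeting = "hello";
var shouted = greeting.toUpperCase();
// greeting is still "hello", shouted is "HELLO"

// Numbers behave the same way: m gets a copy of the value, so n is unaffected.
var n = 1;
var m = n;
m += 1;
// n is still 1, m is 2

// Arrays are objects, so they are mutable and shared by reference.
var list = [1, 2];
var alias = list;
alias.push(3);
// list and alias are the same array, which now has three elements
```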

Artvertentie

February 4th, 2010

At the end of last year I found myself with a bunch of advertising credit for the Amsterdam newspaper Het Parool and no pressing need to advertise my business (you’re already here on the Bubble Foundry blog, right?). So, my friends and I decided to have some fun and make it into an art project, which we called Artvertentie. In the January 2, 2010 issue of the newspaper’s Saturday magazine we had ads throughout the magazine showing our artworks where ads would normally be. My contribution, Steganographia 2.0, is an ASCII art version of the Bubble Foundry logo based upon an encoded version of caseclass.js.

My latest Scala-inspired Javascript library: pf.js

February 4th, 2010

What is a function? While many use the words ‘method’ and ‘function’ interchangeably, more mathematically-inclined programmers make a distinction between functions and methods based on whether you are guaranteed a return value. Scala is one programming language that makes a distinction between functions and methods (though it’s actually pretty easy to jump between the two). Great, just like plenty of other languages, nothing special here. Javascript functions are funny beasts. On one hand, they seem to be simple subroutines which can be passed arguments and can enclose variables – pretty standard methods in other words. However, regardless of whether you include return statements in a function, something is actually returned, undefined being returned if you haven’t specified a return value. Hmm, so they’re kind of like mathematical functions except there’s no way to know without looking at a function’s source code whether the author failed to return a value for the input you gave it or actually did return something, the undefined value.

However, Scala also provides a middle ground with PartialFunctions. PartialFunctions are cool because they are only defined for specific inputs and you can check this. In addition, because they are defined for specific inputs, you can combine several PartialFunctions to create a new PartialFunction which is defined for the union of all the original PartialFunctions’ inputs. Wouldn’t it be cool if Javascript did that too? Well, now it can with pf.js. Enjoy.
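
To give a feel for the idea, here is a minimal sketch of a partial function in Javascript. This is not the pf.js API; the names partial, isDefinedAt and orElse are assumptions borrowed from Scala’s PartialFunction for the sake of illustration.

```javascript
// Hypothetical sketch, not the pf.js API.
// isDefinedAt: predicate saying which inputs the function accepts.
// fn: the actual computation for accepted inputs.
function partial(isDefinedAt, fn) {
  function pf(x) {
    if (!isDefinedAt(x)) {
      throw new Error("Partial function is not defined at: " + x);
    }
    return fn(x);
  }
  // Let callers check whether an input is supported before calling.
  pf.isDefinedAt = isDefinedAt;
  // Combine with another partial function; the result is defined
  // wherever either of the two is defined, preferring this one.
  pf.orElse = function (other) {
    return partial(
      function (x) { return isDefinedAt(x) || other.isDefinedAt(x); },
      function (x) { return isDefinedAt(x) ? fn(x) : other(x); }
    );
  };
  return pf;
}

var one = partial(
  function (x) { return x === 1; },
  function () { return "one"; }
);
var minusOne = partial(
  function (x) { return x === -1; },
  function () { return "negative one"; }
);
var both = one.orElse(minusOne);
// both.isDefinedAt(1) and both.isDefinedAt(-1) are true, both.isDefinedAt(2) is false
```

Throwing on an unsupported input is one design choice among several; the point is that a caller can tell "not defined here" apart from a function that simply returned undefined.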

A Subtle Javascript Mistake

January 29th, 2010

I was just pounding my head against what turned out to be a simple Javascript misconception that I hope I can save people from: the in operator tests for the existence of a key, not a value, in an object or array. So the following work as expected:

  var myObj = {1: "one", 2: "two"}
  1 in myObj // -> true
  2 in myObj // -> true
  var myArr = [1, 2]
  0 in myArr // -> true
  1 in myArr // -> true

But you might be surprised at the following results (I was):

  2 in myArr // -> false

That is because myArr has values 1 and 2 at keys 0 and 1, respectively. To test the existence of a value in an array, you need to use indexOf. For instance:

  myArr.indexOf(1) // -> 0
  myArr.indexOf(2) // -> 1
  myArr.indexOf("doesn't exist") // -> -1

As you can see, -1 indicates the value does not exist in the array.
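
Two related details are worth a short sketch. First, in also finds keys inherited through the prototype chain, so hasOwnProperty is the stricter test. Second, the indexOf check is easy to wrap in a helper; contains below is a hypothetical name, not a built-in.

```javascript
var myObj = { 1: "one", 2: "two" };

// `in` walks the prototype chain, so inherited keys count too.
var inherited = "toString" in myObj;
// hasOwnProperty only looks at keys defined on the object itself.
var own = Object.prototype.hasOwnProperty.call(myObj, "toString");

// Hypothetical helper: test for a value in an array.
// indexOf compares with strict equality (===).
function contains(arr, value) {
  return arr.indexOf(value) !== -1;
}

var myArr = [1, 2];
var found = contains(myArr, 2);      // the value 2 is in the array
var missing = contains(myArr, "2");  // the string "2" is not, because of ===
```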

Introduction to Case Classes

December 11th, 2009

I’ve recently been chatting with the creator of match-js about how his library and caseclass.js might work together and I ended up writing quite a bit about case classes. Enjoy.

Scala is a newish object-oriented/functional hybrid language that runs on the JVM. It actually takes a lot of concepts from Erlang, such as its actor library. One of its key features is pattern matching, which at the most basic can be used to test equality. With the match operator that means you have something like switch/case in most languages:

val myVal = 3;
myVal match {
  case 1 => "one"
  case 2 => "two"
  case 3 => "three"
  case _ => "other"
} // returns "three"

Case classes at a glance are just simple classes that can be created without the new operator and that get automatic getters for their parameters:

case class Person(name: String, age: Int)
val peter = Person("Peter Robinett", 25)
peter.age == 25 // -> true

However, you can do a lot more advanced stuff:

unknownVar match {
  case n: String => println("unknownVar is a String with value: " + n)
  case n: Int if n > 0 => println("unknownVar is an Int with a value greater than 0")
  case Person(name, 100) => println("unknownVar is a Person case class where property age is 100 and name is: " + name)
}

The last case example shows a cool use of a case class. Scala knows that when a case class instance is created in a case, you are testing for matching. So, it calls the class’s unapply method (any class can have one, case classes just have them automatically) to compare the instance based upon its properties. If the parameters match, the two instances are equal. However, you can also not specify properties to match. If you don’t care about them all you can just give a wildcard (the _ character) or not give any parameters (case Person or case person: Person) but if you’d like to extract the value, you can. This is what I did. By giving it the undefined variable ‘name’, it knows that I am not comparing the name properties but instead want to assign unknownVar.name to name (assuming that unknownVar is an instance of Person!), which is then within the scope of the subsequent code block.

So, that’s a lot of Scala. What do I have in Javascript? Right now you can match based upon object types:

CaseClass.create("Person", ["name", "age"]);
var peter = CaseClass.Person("Peter Robinett", 25);
peter.match(
  {
    caseTest: "Car",
    caseFunction: function() { return 0; }
  },
  {
    caseTest: CaseClass.Person,
    caseFunction: function() { return 1; }
  }
); // -> return 1 – I'm not a Car!

Parameter definition and extraction also works, though it’s limited:

var undef;
peter.match(
  {
    caseTest: CaseClass.Person("Barack Obama", 47),
    caseFunction: function() { return 0; }
  },
  {
    caseTest: CaseClass.Person("Peter Robinett", undef),
    caseFunction: function() { return age; }
  }
); // -> return 25

As you can see, matching works fine though I need to pass an uninitialized variable to Person() to indicate that I want to extract the corresponding property in peter. Because I cannot discover the name of this variable, I just have to create variables based upon the property names. I’d like to improve this, but the consensus seems to be that it’s impossible with Javascript today (ironically, earlier versions could). Beyond extractors, I’d like to be able to do more complex matching (better support for different object types, conditional statements) and be able to extend native objects to support my matching. I also like that your Match() method can stand alone, letting it be used as a callback and meaning that it acts as a sort of partial function, which is very cool. In Scala partial functions are functions that needn’t always return a value:

val myPF: PartialFunction[Int, String] = {
  case 1 => "one"
  case -1 => "negative one"
}

To be honest, I’m not sure if our libraries could or should work together but they’re similar enough in spirit that I thought it might be worth exploring how they can complement each other. I see you’re doing stuff with web workers. I’ve also tried to take Scala’s actors and use them to do something in Javascript. actor.js is the result and is a place where I think case classes would be great for message passing (this is a key use case for case classes in Scala). However, actors are both more complex and something that I have less experience with, so I haven’t put as much time into it as caseclass.js. However, the demo does work and is, in my opinion, pretty cool!

HTTP for Web Developers

December 10th, 2009

I’m often surprised how little web developers know about the HTTP protocol. I’m by no means an expert (feel free to correct what I write in the comments!) but I think I know a fair bit and I’ll outline the basics here.

Why does it matter? What if you’re a frontend developer doing AJAX calls? How do you know that the request succeeded? Do you check that the response text isn’t empty? The status code is 200? Neither is optimal, though luckily most good libraries will take care of figuring it out.
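
As an illustration of what such a check can look like, here is a minimal sketch. It is an assumption about a reasonable rule, not the code of any particular library: treat every 2xx status as success, plus 304 Not Modified, since a cached copy is still a usable answer. isSuccess is a hypothetical name.

```javascript
// Hypothetical helper: decide whether an HTTP status code means success.
function isSuccess(status) {
  // Any 2xx code is a success, not only 200 (e.g. 201 Created, 204 No Content).
  // 304 Not Modified means the client's cached copy is still valid.
  return (status >= 200 && status < 300) || status === 304;
}

var ok = isSuccess(200);
var noContent = isSuccess(204); // succeeds even though the body is empty
var notFound = isSuccess(404);
```

The 204 case shows why testing for a non-empty response text is not enough: a perfectly successful response may have no body at all.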

At its most basic HTTP is a request and a response. Each message is divided into a group of headers and a single body. Each request has a method, the most common of which are GET and POST. It will also indicate the host (e.g. www.google.com) and the document, which may simply be ‘/’ (the host root) or ‘*’ (which addresses the server as a whole rather than a specific document, as in an OPTIONS request). There are a lot more headers that an HTTP request may include, but those are the bare minimum. Most requests will have an empty body, though POSTs and PUTs are normally specifically used to send content to the server in the request body. For a form submission the body typically consists of key-value pairs, either URL encoded or, when files are uploaded, sent as multipart/form-data parts, but a request body can be in any format the server understands.

Response messages will at least consist of a status line carrying the status code, and there is usually a message body too. A HEAD request specifically indicates that the response should only include headers (they are treated as GET requests otherwise). Your standard response has a 200 status code, some headers describing when and how the request was served, and a body containing an HTML webpage.

As you may have noticed, HTTP methods are a set of verbs that let the client tell the server ‘give me this information’ or ‘process this data’. Since each verb means a specific thing, you actually get a very defined and powerful little language. As I see it, it is this observation (among other things!) that is REST. While they often aren’t, URLs are at their most powerful when they represent unique things. HTTP methods let us indicate what we want to do with them. So, GET /blog/ would indicate that we wanted to get the blog webpage, while POST /blog/ might indicate that we want to submit a new blog post. Likewise, GET /blog/2009/12/http-for-web-developers/ gets the post while PUT /blog/2009/12/http-for-web-developers/ sends an updated version of the post. As you see, we’re dealing with the same blog post, we’re just doing different things with it.

HTTP status codes are powerful things and in my opinion people often overlook the cool things you can do with the right request headers and response status codes. For instance, your average browser will often include headers in its requests indicating that it has requested the resource before (e.g. If-Modified-Since, If-None-Match) and, if based on those means of comparison the resource hasn’t changed, you can return a 304 response with an empty body and the client will use its cached version. Especially with binary data (images!) you can avoid lots of bandwidth usage, something especially useful for clients on slow (mobile!) connections.
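
Here is a minimal sketch of the server-side decision, written as a plain function so that it does not depend on any framework. The function name conditionalGet and the resource shape ({ etag, lastModified, body }) are assumptions for illustration; header names are lowercased, as Node’s http module presents them. Real If-None-Match headers can also carry a list of tags or ‘*’, which this sketch ignores.

```javascript
// Hypothetical helper: decide between 200 and 304 for a conditional GET.
// resource is assumed to look like { etag: '"v2"', lastModified: Date, body: "..." }.
function conditionalGet(requestHeaders, resource) {
  var headers = {
    "ETag": resource.etag,
    "Last-Modified": resource.lastModified.toUTCString()
  };
  var ifNoneMatch = requestHeaders["if-none-match"];
  var ifModifiedSince = requestHeaders["if-modified-since"];
  var notModified = false;

  if (ifNoneMatch !== undefined) {
    // If-None-Match takes precedence when both headers are present.
    // Simplification: only a single, exact tag is compared.
    notModified = ifNoneMatch === resource.etag;
  } else if (ifModifiedSince !== undefined) {
    // HTTP dates have one-second resolution, so drop the milliseconds.
    var modifiedSeconds = Math.floor(resource.lastModified.getTime() / 1000) * 1000;
    // An unparseable date yields NaN, the comparison is false, and we send 200.
    notModified = modifiedSeconds <= Date.parse(ifModifiedSince);
  }

  if (notModified) {
    // Empty body: the client reuses its cached copy.
    return { status: 304, headers: headers, body: "" };
  }
  return { status: 200, headers: headers, body: resource.body };
}

var post = {
  etag: '"v2"',
  lastModified: new Date(Date.UTC(2009, 11, 10, 12, 0, 0)),
  body: "<html>...</html>"
};
var cached = conditionalGet({ "if-none-match": '"v2"' }, post);
var stale = conditionalGet({ "if-none-match": '"v1"' }, post);
```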

One specific thing that I’d like to draw your attention to is requesting and serving feeds. Since a client may request it often looking for changes, you definitely want to be efficient about your responses. If-Modified-Since and If-None-Match are great but responses to the requests must be all or nothing: 200 or 304. If you’re returning a 200 code you must return the entire feed. But what if the user requested a feed of 500 items and only the last one was created since the If-Modified-Since date they sent? There is an unofficial but somewhat supported set of headers called A-IM and IM. The request includes the A-IM header with the value ‘feed’, indicating that they understand the resource is a feed and so can support partial responses based upon their request criteria. The server can return a 200 or 304 as normal but it can also take a middle ground: it can respond with a 226 status code and the IM header, again with the ‘feed’ value, and only include the changed feed items in its body. Major savings!
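
The three possible outcomes can be sketched as a plain function. Everything here is an assumption for illustration: the name respondToFeedRequest, the item shape ({ title, updated }), and the simplified parsing of the A-IM header. Header names are lowercased, as Node’s http module presents them.

```javascript
// Hypothetical helper: choose between 200, 226 and 304 for a feed request.
// feedItems is assumed to be an array of { title: "...", updated: Date }.
function respondToFeedRequest(requestHeaders, feedItems) {
  // Date.parse gives NaN for a missing or unparseable header.
  var since = Date.parse(requestHeaders["if-modified-since"]);
  if (isNaN(since)) {
    // No usable condition: send the whole feed.
    return { status: 200, headers: {}, items: feedItems };
  }

  var changed = feedItems.filter(function (item) {
    return item.updated.getTime() > since;
  });
  if (changed.length === 0) {
    // Nothing new: the client keeps using its cached feed.
    return { status: 304, headers: {}, items: [] };
  }

  // Simplified check for 'feed' among the comma-separated A-IM values.
  var acceptsFeedDelta = /(^|,)\s*feed\s*(;|,|$)/i.test(requestHeaders["a-im"] || "");
  if (acceptsFeedDelta) {
    // Middle ground: send only the items that changed.
    return { status: 226, headers: { "IM": "feed" }, items: changed };
  }
  // The client did not ask for deltas, so it gets everything.
  return { status: 200, headers: {}, items: feedItems };
}

var feed = [
  { title: "Older post", updated: new Date(Date.UTC(2009, 11, 1)) },
  { title: "Newer post", updated: new Date(Date.UTC(2009, 11, 9)) }
];
var delta = respondToFeedRequest(
  { "if-modified-since": "Tue, 08 Dec 2009 00:00:00 GMT", "a-im": "feed" },
  feed
);
// delta carries status 226 and only the newer post
```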

One thing I should add is that dates are always a major pain in HTTP headers. The correct format is ‘Mon, 01 Jan 1990 00:00:00 GMT’. In strftime format that’s ‘%a, %d %b %Y %H:%M:%S GMT’, with the time always expressed in GMT rather than the local time zone.
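
In Javascript you rarely need to build this string by hand. As a small sketch, Date’s built-in toUTCString produces the format and Date.parse reads it back (the variable names are just for illustration):

```javascript
// Format a Date as an HTTP date. new Date(0) is the Unix epoch.
var epoch = new Date(0).toUTCString();
// epoch is "Thu, 01 Jan 1970 00:00:00 GMT"

// Parse an HTTP date back into milliseconds since the epoch.
var parsed = Date.parse("Thu, 01 Jan 1970 00:00:00 GMT");
// parsed is 0
```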

Both Google and Yahoo have very good tools for monitoring HTTP requests. The Yahoo Developer Network has great articles on bandwidth and speeding up websites. Significant portions of this work at both companies have been due to Steve Souders. His books High Performance Web Sites and Even Faster Web Sites are great guides.

With all this I think you have more than enough to make strong, efficient HTTP clients and servers. Good luck!

HTML and XML Character Encoding Gotchas in Javascript

December 8th, 2009

Recently I was trying to execute the following Javascript with jQuery: $("#someid").append("<div>...&deg;C</div>");

I was going crazy because it worked (a degrees symbol – ° – was shown) on one page but not another, where nothing was displayed or returned by the append method. After much frustration I stumbled on a solution and I’m sharing it here to hopefully save others some time.

I was stumped but luckily I ended up reading the Wikipedia page on character encoding in HTML and learned that XML has a much smaller set of character entity references. In fact, there are only five: &amp; → &, &lt; → <, &gt; → >, &quot; → ", &apos; → '. Makes sense, since you should be using UTF-8.

As it turns out, my working example was an HTML page while the non-working one was XHTML. Because of the XHTML content-type declaration the parser (I’m not sure whether in jQuery or my browser) was choking on the invalid character entity reference and failing completely. So, problem solved, though I wish the single offending entity was dropped, not the whole string!
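
One way to sidestep the problem is to use numeric character references (or the literal UTF-8 character), which are valid in both HTML and XML. The sketch below converts a few named HTML entities to numeric ones before the markup is handed to the parser. The helper name toNumericEntities and its deliberately tiny lookup table are assumptions for illustration; a real table would need many more entries.

```javascript
// Hypothetical, deliberately incomplete table of HTML entity names to code points.
var NAMED_ENTITIES = { deg: 176, nbsp: 160, copy: 169, euro: 8364 };

// Hypothetical helper: replace known named entities with numeric references.
function toNumericEntities(markup) {
  return markup.replace(/&([a-zA-Z][a-zA-Z0-9]*);/g, function (match, name) {
    if (Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)) {
      return "&#" + NAMED_ENTITIES[name] + ";";
    }
    // Unknown names, including the five XML ones, are left untouched.
    return match;
  });
}

var safe = toNumericEntities("<div>...&deg;C</div>");
// safe is "<div>...&#176;C</div>", which an XML parser accepts
```

The result can then be passed to append as before.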

TLDapi

December 6th, 2009

I whipped TLDapi up in the last few days. If you’re a programmer and you want a way to check if an exotic top level domain a user sent you is real, use the API to check.