Letter to Obi-wan Kenobi

Dear Obi-wan,

Only you can save our community.

We are about 100k people living in a region blessed with sun and surrounded by the wonderful blue of the Mediterranean. Our community is not recognised by politicians and has no officially recognised borders. Our community has no name but we will call it La Gibranea for now. We live in the area from Europa Point in the South of Gibraltar to Santa Margarita in the North West, including Campamento to the East but not reaching as far as San Roque in the North. We are a single community. We live, love, work and die together but we are confused. Our community is divided and we need your leadership Obi-wan.

In La Gibranea we are even confused about what language to speak. For example, you will often hear “Spanglish”, an unwritten hybrid of Spanish and English. We are also confused about how to exchange goods for money; there are three currencies but where can we spend them? We are confused about everything: where to pay our taxes, buy our houses, send our children for education and our elderly for care. And we are very confused and frustrated about how to travel. Every morning we engage all conceivable forms of transport to carry ourselves across the hated border that divides us.

Floating ants

Like the senators in your Republic Obi-wan, the elected powers are too obsessed with themselves to see that we are adrift like the ants in this picture. Those elected are meant to help us but they cannot because each is strung to competing sovereigns. Obi-wan, you cannot understand our community when you are looking outside in. Viewed from the perspective of a nation state, La Gibranea does not make sense and this is the root of all our problems.

We need a new power to rule us and time to resolve our destiny. We need to create a new, transitional state with a mandate for a ten year rule. We need La Gibranea to be governed with a three-way sharing of power between Gibraltar, Spain and the UK. Each in equal measure.

During this time we need all the children born in La Gibranea to be granted dual nationality. We need a single currency. We need trade agreements to disincentivize the smugglers and proper recognition for our bi-lingualism. But most of all we need to abolish the border. We need to join the Schengen Agreement and establish freedom of movement through the introduction of an integrated transport system. These problems cannot be solved by the current leadership. Please Obi-wan, help bring balance to our community and recognise the state of La Gibranea.

Thank you.

Hello World in Hypermedia Style

After perhaps reading too much about Hypermedia (too much hype?) I started getting the itch to do something. This quick blog is a record of the steps I took towards a Hypermedia “hello world”. The main tool I used was a plug-in REST client from Mozilla but anything that lets you set headers and inspect responses will do the job.

For this exercise I chose three popular Hypermedia designs: HAL, Collection+JSON and Siren. Each design has a high level of abstraction, making it suitable for re-use, and an active community of developers who are busily engaged with various implementations.

HAL

In the words of Mike Kelly, HAL’s creator:

HAL is a format you can use in your API that gives you a simple way of linking

See more of the HAL specification. HAL is available in either JSON or XML and has a registered media-type of

application/hal+json

Mike has documented the HAL design by providing an interactive HAL Browser. This is the most immediate way to say “hello world” in Hypermedia style. Firstly you use the HAL browser to create an account (hint: see the hints :-)). Next you should navigate to /users/:account and then use the NON-GET button to get into the following dialogue box.

Saying it
Saying Hello World

It’s okay if you type something other than “hello world” *yawn* but changing anything else might break something .. To verify your creation navigate to the latest posts and as-if-by-magic your entry will appear (fingers x).

The HAL browser is a great way to quickly get a feel for what HAL is all about. Armed with this knowledge I wanted to take a step towards a HAL client that could be controlled independently, and ultimately deployed in another context. The HAL browser seemed tightly coupled to the server, so rather than unpick it, I dug around and found another server running on the HAL builder site. I made the following request using my trusty REST client.

GET http://gotohal.net/restbucks/api
Accept: application/hal+json

And in case you weren’t paying attention earlier, here’s a picture.

Image

The HAL builder dutifully obeyed my request (yes, the Accept header is necessary) and gave me the following response.

{
    "_links": {
        "self": {
            "href": "/restbucks/api"
        },
        "orders": {
            "href": "/restbucks/api/orders"
        }
    },
    "name": "RestBucks Store",
    "project": "restbucks",
    "version": "0.0.1"
}

Now if I was a robot, I would crack on and apply some automation to the above. Here are our first clues about Hypermedia. Custom media-types and machine-based connections! There’s way more discussion about using HAL in the forum.
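
Here is roughly what that robot might do. This is only a minimal sketch: it assumes the response above has already been parsed into an object, and the helper name resolveRel is made up for illustration. It also ignores the case where a HAL link relation holds an array of links rather than a single link.

```javascript
// The HAL response shown above, parsed into a plain object.
const halRoot = {
  _links: {
    self: { href: "/restbucks/api" },
    orders: { href: "/restbucks/api/orders" }
  },
  name: "RestBucks Store",
  project: "restbucks",
  version: "0.0.1"
};

// Hypothetical helper: look up a link relation in a HAL document and
// resolve its href against the URL the document was fetched from.
function resolveRel(doc, rel, baseUrl) {
  const link = doc._links && doc._links[rel];
  if (!link) {
    return null; // the server did not offer this relation
  }
  return new URL(link.href, baseUrl).toString();
}

const ordersUrl = resolveRel(halRoot, "orders", "http://gotohal.net/restbucks/api");
console.log(ordersUrl);
```

The point is that the client only needs to know the name of the relation ("orders"). The URL itself comes from the server.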

Collection+JSON

Mike Amundsen is the author of Collection+JSON and says it is

a JSON-based read/write hypermedia-type designed to support management and querying of simple collections

And just like HAL, Collection+JSON has its very own registered media-type. To get a test response from Collection+JSON we’ll return to our REST client. As before set the client up for the target service and specify the media-type.

GET https://graphviz-ewf.rhcloud.com
Accept: application/vnd.collection+json

This Hypermedia server responds with the following.

{
  "collection": {
    "version": "1.0",
    "href": "http://graphviz-ewf.rhcloud.com:80/",
    "links": [
      {
        "href": "http://graphviz-ewf.rhcloud.com:80/",
        "rel": "home",
        "prompt": "Home API"
      },
      {
        "href": "http://graphviz-ewf.rhcloud.com:80/graph",
        "rel": "graphs",
        "prompt": "Home Graph"
      },
      {
        "href": "http://graphviz-ewf.rhcloud.com:80/register",
        "rel": "register",
        "prompt": "User Register"
      },
      {
        "href": "http://graphviz-ewf.rhcloud.com:80/login",
        "rel": "login",
        "prompt": "User Login"
      }
    ]
  }
}
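
A client can pick its way through this response by link relation. The following is a small sketch, assuming the response above has been parsed into an object; findLink is a made-up helper name, not part of any Collection+JSON library.

```javascript
// The Collection+JSON response shown above, parsed into a plain object.
const cjDoc = {
  collection: {
    version: "1.0",
    href: "http://graphviz-ewf.rhcloud.com:80/",
    links: [
      { href: "http://graphviz-ewf.rhcloud.com:80/", rel: "home", prompt: "Home API" },
      { href: "http://graphviz-ewf.rhcloud.com:80/graph", rel: "graphs", prompt: "Home Graph" },
      { href: "http://graphviz-ewf.rhcloud.com:80/register", rel: "register", prompt: "User Register" },
      { href: "http://graphviz-ewf.rhcloud.com:80/login", rel: "login", prompt: "User Login" }
    ]
  }
};

// Hypothetical helper: return the first link with the given rel, or null.
function findLink(doc, rel) {
  const links = (doc.collection && doc.collection.links) || [];
  return links.find((link) => link.rel === rel) || null;
}

const registerLink = findLink(cjDoc, "register");
console.log(registerLink.prompt, "->", registerLink.href);
```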

Graphviz is open source graph visualization software. The Graphviz API was designed to allow new users to register, add graph definitions and retrieve those definitions in various representations (pdf, jpg, png, gif). From the example API requests that are available in the Graphviz documentation, this looks like a well-conceived implementation of Collection+JSON. As authentication is required to use the Graphviz service I decided to continue my tour and found a Collection+JSON Browser. The browser functions much like the HAL browser except that the Hypermedia server is de-coupled from the client. The example Employee data is open and this means we can hit the endpoint directly from .. yep that’s right our friend the REST client.

Employee data in C+J representation
The above example shows a response from the Employee test data. Like HAL, Collection+JSON has a strong developer community.

Siren

Let’s complete our excursion with the newest design reviewed here. Kevin Swiber describes Siren as:

a hypermedia specification for representing entities

And like HAL and Collection+JSON, Siren has a registered media-type.

application/vnd.siren+json

To make our “hello-world” request to Siren we’ll use a Siren Browser developed by Wurl. Incidentally it’s great to see commercial API providers, such as Wurl and Graphviz, embracing Hypermedia designs (this is the future :-)). Let’s point this Hypermedia client at a test Hypermedia server that Kevin has running on Heroku. As this service is read only (404, “Cannot POST /users”) we cannot use it to make a “hello world” but the action for creating a user seems clear enough from the response to the initial GET.

GET http://siren-alps.herokuapp.com/
Accept: application/vnd.siren+json

Here is the response. It has been trimmed down to the essential bit.

{
    "actions": [
        {
            "name": "user-add",
            "href": "http://siren-alps.herokuapp.com/users",
            "method": "POST",
            "fields": [
                {
                    "name": "user",
                    "type": "text"
                },
                {
                    "name": "email",
                    "type": "text"
                },
                {
                    "name": "password",
                    "type": "password"
                }
            ]
        }
    ]
}
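
To show how much of the request can be derived from the action alone, here is a sketch that turns the action above into a request description without sending it (the test service is read only anyway). The function name buildRequest and the sample values are invented for illustration. The fallbacks to GET and to form encoding follow my reading of the Siren specification's defaults.

```javascript
// The trimmed Siren response shown above, parsed into a plain object.
const sirenDoc = {
  actions: [
    {
      name: "user-add",
      href: "http://siren-alps.herokuapp.com/users",
      method: "POST",
      fields: [
        { name: "user", type: "text" },
        { name: "email", type: "text" },
        { name: "password", type: "password" }
      ]
    }
  ]
};

// Hypothetical helper: describe the HTTP request for a named action.
// Only fields declared by the action are copied from `values`.
function buildRequest(doc, actionName, values) {
  const action = (doc.actions || []).find((a) => a.name === actionName);
  if (!action) {
    throw new Error("Unknown action: " + actionName);
  }
  const body = new URLSearchParams();
  for (const field of action.fields || []) {
    if (values[field.name] !== undefined) {
      body.append(field.name, values[field.name]);
    }
  }
  return {
    method: action.method || "GET",
    url: action.href,
    contentType: action.type || "application/x-www-form-urlencoded",
    body: body.toString()
  };
}

const sirenRequest = buildRequest(sirenDoc, "user-add", {
  user: "hal",
  email: "hal@example.com",
  password: "secret",
  ignored: "not declared by the action"
});
console.log(sirenRequest);
```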

Like the others, the Siren gang also hang out and share stuff.

Final thoughts

Something that all of the three Hypermedia designs have achieved is that at no point was it necessary to Read-The-Fscking-Manual. This is a Very Good Thing for Developer Experience. This is another clue for our understanding of Hypermedia design. Self-discovery! As said earlier, each design also has a high level of abstraction. The flexibility provided by this abstraction does however make design selection seem rather arbitrary. Which of the three Hypermedia designs is right for me?!? I hope this will be the subject of my next excursion. Thanks.

Playing scales on the guitar

As a double-bass player I learnt that big intervals are hard. My approach to playing scales was always about learning how to stretch. Here’s how I carried that idea into my guitar playing.

scales1_t

scales2_t


API Design for large hierarchies

Introduction to the Problem

Some things that you may, or may not know about a Wolverine.

{
    "aliases": [
        "glutton",
        "skunk bear"
    ],
    "range": "620 km",
    "weight": "9-25 kg",
    "topSpeed": "48km/h",
    "lifestyle": "Solitary",
    "lifeSpan": "10-15 years",
    "conservationStatus": "Threatened",
    "habitat": "Mountainous regions and dense forest",
    "culture": "Michigan is known as the Wolverine state",
    "claimToFame": "In order to steal a kill a wolverine attacked a polar bear and clung to its throat until the bear suffocated."
}

Given that the taxonomy of the living world contains around 8.7 million species, the challenges presented by this wolverine snippet are all about catalog and index management. The navigation of a large hierarchy is therefore something that API designers need to address when designing their interfaces. REST defines constraints that help us organise resources, but REST isn’t aligned to any particular data model. We therefore need to work out how best to manage hierarchical data ourselves. This post is based on experience from the design of a working system.

Limitations of the client

Rather than tackle the tree of life, I will use a tree of “things” to help model the API design. But firstly we need to understand two important limitations of our imaginary clients.

  1. The maximum payload our client can manage is 512 kilobytes, i.e. half-a-meg.
  2. The client is pretty dumb and will not be able to traverse the hierarchy without assistance.

Introducing a tree of things

The total number of things in our fictitious tree is 258. Things are nested down to a depth of three. Each thing contains exactly six other things. The following diagram shows a slice of the tree.

a diagram showing payload hierarchy

The things are named using the following convention.

A1		B10 .. B16		C100 .. C106
A2		B20 .. B26		C201 .. C206
... 
A6 		B60 .. B61 		C600 .. C601

In the tree of things the smallest payload that a node can return is 96 kilobytes. Doing the maths, the total size of the model works out at 4.03 megabytes (4,128 kb).

level   kilobytes   numOfItems
A              96            6
B             576           36
C           3,456          216
Total       4,128          258
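
The table can be reproduced with a few lines of arithmetic. This sketch takes each thing to be 16 kilobytes, so that six of them make up the 96 kilobyte minimum. The constant and function names are mine.

```javascript
const ITEM_KB = 16;              // size of a single thing, in kilobytes
const FAN_OUT = 6;               // each thing contains exactly six other things
const LEVELS = ["A", "B", "C"];  // the tree is three levels deep

// Number of things and total kilobytes at each level of the tree.
function levelTotals(levels, fanOut, itemKb) {
  return levels.map((level, depth) => {
    const numOfItems = Math.pow(fanOut, depth + 1);
    return { level, kilobytes: numOfItems * itemKb, numOfItems };
  });
}

const rows = levelTotals(LEVELS, FAN_OUT, ITEM_KB);
const total = rows.reduce(
  (sum, row) => ({
    kilobytes: sum.kilobytes + row.kilobytes,
    numOfItems: sum.numOfItems + row.numOfItems
  }),
  { kilobytes: 0, numOfItems: 0 }
);
const totalMegabytes = (total.kilobytes / 1024).toFixed(2);

console.table(rows);
console.log(total, totalMegabytes + " megabytes");
```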

In the common scenario the client would access the API with the following request.

GET /things/A/B/C/204

And the server would respond as follows:

{
    "id": "C204",
    "items": [ 1, 2, 3, 4, 5, 6 ]
}

We are pretending that each item is 16 kilobytes and so we’re happy that the total 96 kilobytes is well within the required payload limit. This is okay for the happy path but as it stands the design exposes a lot of loose endpoints. For example, what should the server do with the following request?

GET /things/

The server cannot respond with everything in A, plus all the B things together with their C descendants because the payload would be 4.03 mb, i.e. the full hierarchy. Perhaps a more reasonable response would be to remove the B and C descendants leaving just those in the range A1 to A6. Hmm, but now we’re starting to make assumptions about the request .. let’s play safe for now and just tell the client they asked for too much.

GET /things/             # 416 Requested Range Not Satisfiable

Using this approach I can complete my API design.

GET /things/A/B/C/204    # 200 Success
GET /things/A            # 416 Requested Range Not Satisfiable
GET /things/             # 416 Requested Range Not Satisfiable
GET /things/A/1/B/10     # 200 Success
GET /things/A/3          # 200 Success

At this stage the design is functional as it satisfies our payload criteria. But it isn’t that easy to navigate. The top-level responses are blocked, and those responses that are returned are simply a flat list. It’s hard for my dumb client to know what to do with these. A response like this would be an improvement.

{
    "id": "A3",
    "items": [
        {
            "id": "B30",
            "items": [
                {
                    "id": "C300",
                    "items": [

	                    /* more items here */
                    ]
                },

                /* and more here too */
            ]
        }
    ]
}

Although this creates a payload problem (it weighs in at 688 kb) it shows promise because I can start to educate my client about the nature of the hierarchy.

Using depth to control the payload

To help the client get to know the tree of things without breaking the payload, I add the following parameters to my design.

GET /things/A/3/A/B

The additional A/B parameter instructs the server to give me the B descendants of A3, as well as the A item that was discussed previously. Here’s the response.

{
    "id": "A3",
    "items": [
        {
            "id": ["B31", "B32", "B33", "B34", "B35", "B36"]
        }
    ]
}

Effectively I’ve filtered out C and thus got my payload down to 112 kilobytes. The client has a response that matched the request and thus enough information to start the descent into the hierarchy.
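
The effect of the depth parameter on the payload can be checked with a small model. This is a sketch under the same pretence that every thing weighs 16 kilobytes. The ids generated here (A3/1, A3/1/1 and so on) are placeholders and not the naming convention used above.

```javascript
const THING_KB = 16;      // pretend size of one thing
const PAYLOAD_LIMIT_KB = 512;

// Build a thing with `fanOut` children at each of `levelsBelow` levels.
function buildThing(id, levelsBelow, fanOut) {
  const thing = { id, items: [] };
  for (let i = 1; levelsBelow > 0 && i <= fanOut; i++) {
    thing.items.push(buildThing(id + "/" + i, levelsBelow - 1, fanOut));
  }
  return thing;
}

// Copy a thing, keeping only `depth` levels of descendants.
function limitDepth(thing, depth) {
  return {
    id: thing.id,
    items: depth > 0 ? thing.items.map((child) => limitDepth(child, depth - 1)) : []
  };
}

// Estimated payload: one THING_KB per thing in the response.
function estimateKb(thing) {
  return THING_KB + thing.items.reduce((sum, child) => sum + estimateKb(child), 0);
}

const a3 = buildThing("A3", 2, 6);  // A3, its 6 B things and their 36 C things
const fullKb = estimateKb(a3);                   // A, B and C levels
const trimmedKb = estimateKb(limitDepth(a3, 1)); // A and B levels only

console.log(fullKb, fullKb <= PAYLOAD_LIMIT_KB);
console.log(trimmedKb, trimmedKb <= PAYLOAD_LIMIT_KB);
```

The first figure corresponds to the fully nested response and the second to the A/B response.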

Using hypermedia to improve on the 416 method and help discovery

The new controls allow the client to control the depth of the nested response but there is still room for improvement. If the client initially goes for data that is out of the bounds of my payload limit, then the server must still return an error.

GET /things/A/3/A/B/C    # 416 Requested Range Not Satisfiable

After receiving the 416 response they have to try again by trimming the depth back to A/B. But how can my dumb client figure this out from a 416 status code? This is where HATEOAS can help! As the server knows the payload limits it can construct compliant URLs and pass those onto the client. For example.

{
    "id": "A3",
    "links": {
        "next": "/things/A/3/A/B",
        "prev": null
    }
}

Using the links part of the response we can now return a 200 whenever the request has gone out-of-bounds. The links redirect the client towards the part of the hierarchy that can be reached from the current location. To achieve this the client has some simple logic to perform.

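if (res.links) {
    // The server sent navigation, so follow the links it offered.
    callService(res.links);
} else if (res.items) {
    // The request was within bounds: render the payload.
    renderItems(res.items);
} else {
    // Neither links nor items came back: panic!
}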
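
For anyone who wants to run the idea, here is a self-contained sketch with an in-memory stand-in for the server. The fakeServer table, the resolveItems name and the hop limit are all assumptions of mine. I also check for items before links, so that a response carrying both a payload and navigation still gets rendered, which is a variation on the logic above rather than a copy of it.

```javascript
// In-memory stand-in for the server: request path -> response body.
const fakeServer = {
  "/things/A/3/A/B/C": {
    id: "A3",
    links: { next: "/things/A/3/A/B", prev: null }
  },
  "/things/A/3/A/B": {
    id: "A3",
    items: [{ id: ["B31", "B32", "B33", "B34", "B35", "B36"] }]
  }
};

function callService(path) {
  return fakeServer[path];
}

// Follow "next" links until a response with items turns up.
// maxHops guards against a server that keeps redirecting forever.
function resolveItems(path, maxHops = 5) {
  let next = path;
  for (let hop = 0; hop <= maxHops && next; hop++) {
    const res = callService(next);
    if (!res) {
      break;
    }
    if (res.items) {
      return res.items;
    }
    next = res.links ? res.links.next : null;
  }
  throw new Error("panic: no payload reachable from " + path);
}

const reachedItems = resolveItems("/things/A/3/A/B/C");
console.log(JSON.stringify(reachedItems));
```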

In summary, the features of the API design are as follows.

  • Allow the depth of the response to be specified in the request.
  • Return links rather than errors when the requested payload size is excessive.
  • The navigational links should be sensitive to the current location in the hierarchy.
  • The links communicate to the client the maximum depth that a resource can support when providing a response.

To finish off let’s briefly return to our wolverine. Assuming that we are able to discover the wolverine endpoint through navigation, we would end up with something like the following.

GET /species/vertebrates/carnivores/weasels/wolverine

{
    "id": "wolverine",
    "items": { /* snipped wolverine facts */ },
    "links": {
        "next": [
            "/species/vertebrates/carnivores/weasels/wolverine/luscos",
            "/species/vertebrates/carnivores/weasels/wolverine/gulo"
        ],
        "prev": "/species/vertebrates/carnivores/weasels"
    }
}

The wolverine item fits our size requirement and so we get the payload. As it turns out that there are two sub-species (American and European), we get some further navigation too. It would be fun to prove this out with a really big data set and see how well the model holds up. I hope this walk-through has illustrated some of the problems and solutions surrounding API design and large hierarchies.

Common URL Pattern

When architecting with domains, services and URLs this pattern seems to come round a lot, but I’ve never heard of it being given a formal name or description:

protocol. service. env. example. com / product

I will call it the Common URL Pattern. In a real-world scenario it would resolve to something like: http://www.test.example.com/support. Here is the same pattern again but this time expressed by throwing some UML around.

UML diagram showing each part embedded inside its parent
Common Url Pattern

And finally by way of explanation, some comments on the parts.

protocol       Underlying transport layer such as https, smtp
service        Shared across a technology stack: www, mail, api, db
environment    Instances of the stack with functional variation, e.g. test, dev
domain         Identifier in the Internet namespace: a.b.c
product        A website or application that handles end-user interaction, e.g. /blah

It seems like a fairly adaptable model that can be applied to a large number of websites and web-based applications.
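
To make the parts concrete, here is a sketch that splits a URL along the lines of the pattern. It assumes the hostname always has the shape service.env.domain with a domain of at least two labels, so a production URL with no environment label would need extra handling. The function name parseCommonUrl is invented.

```javascript
// Hypothetical helper: split a URL into the parts of the Common URL Pattern.
function parseCommonUrl(input) {
  const url = new URL(input);
  const labels = url.hostname.split(".");
  if (labels.length < 4) {
    throw new Error("Expected service.env.domain but got " + url.hostname);
  }
  const [service, environment, ...domain] = labels;
  return {
    protocol: url.protocol.replace(/:$/, ""), // "http:" -> "http"
    service,
    environment,
    domain: domain.join("."),
    product: url.pathname
  };
}

const parts = parseCommonUrl("http://www.test.example.com/support");
console.log(parts);
```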

Are WADLs yesterday’s news?

It has been four years since Sun’s Marc Hadley put forward his proposal for a Web Application Description Language. Since the arrival of the WADL, we have seen the API Economy boom and an almost overwhelming swell towards REST and away from SOAP. Given that the WADL was intended to be the RESTful answer to SOAP’s WSDL you might have expected that the volume of WADLs would be rising in equal proportion. Somehow it hasn’t happened like that, so what’s going wrong?

The basic idea was that the WADL defined your REST interface in a way that machine-generated code could consume it. After all, APIs are aimed at machines more so than humans. But a review of the clients that have actually committed code to the WADL specification suggests that outside of the Java community take-up hasn’t really happened, and even within Java it is fairly sporadic. To understand what’s going on here, let’s dig a little deeper into what a WADL currently offers.

UML diagram showing a WADL

From this UML summary of a WADL you should be able to see that a number of resources have been defined and attributed a URI, together with a method and HTTP verb. Within the resource definition are the details of the request/response cycle. For a full description try the Apigee WADL reference or the original submission to W3C.

Detractors of the WADL talk about true Hypermedia clients exploring the interface through the API responses. I think this is valid. But defining the contract between the client and the server is not the only purpose of a WADL. It can also be a general description of how an API is supposed to behave, even an API that hasn’t been built yet. Commissioning an API requires the establishment of a different kind of contract; a contract between a Business Owner and an API Designer. Using a WADL to express an API design may be a side-effect of the original purpose, however this discussion is really more about using a WADL in the context of API design than it is about implementing a REST client.

Much of the original motivation for the WADL seems to have been around providing a simple and immediate alternative to the WSDL. The fact that it hasn’t been widely adopted suggests that the specification has gone too far the other way. Is it overly simplistic? If so, what is missing – what should a future WADL specification contain in order to overcome these limitations? Here is my WADL wishlist.

  • UML and Documentation
  • Inheritance
  • Interpolation
  • Managing Application State
  • Cross-reference
  • Non-functionals
  • JSON + node.js module

UML and Documentation
Most software design is done using UML. When it comes to applying the REST architectural style there is surprisingly little around in the UML world. For example, the popular UML tool Enterprise Architect supports only WSDLs. This is awkward when attempting to visualise RESTful designs in an Enterprise context. An experimental attempt at bridging this gap is the eaWadl tool that imports a WADL into Enterprise Architect (disclaimer: I wrote it).

In addition to using the WADL as part of the design process, another good use-case is documentation. For example, the Web Application Description Language is used by Enunciate as a framework for human-readable content. The flexible <doc> tag really helps as it allows sample JSON or XML payloads to be embedded alongside the request and/or response. A common and effective ploy is to render the WADL documentation as HTML in response to an Accept Header of text/html (which a browser sends by default).
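
The ploy can be sketched as a simple check on the Accept header. This is a deliberately naive version: it ignores q-values and wildcards, the function name chooseRepresentation is mine, and application/vnd.sun.wadl+xml is the media type I have seen used for WADL documents.

```javascript
const WADL_TYPE = "application/vnd.sun.wadl+xml";

// Naive content negotiation: serve the HTML rendering to anything that
// lists text/html in its Accept header, and the raw WADL to everyone else.
function chooseRepresentation(acceptHeader) {
  const types = (acceptHeader || "")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase());
  return types.includes("text/html") ? "text/html" : WADL_TYPE;
}

const browserAccept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
console.log(chooseRepresentation(browserAccept));
console.log(chooseRepresentation(WADL_TYPE));
```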

State
With a Home Document you don’t need a WADL to define the interface up-front, just grab the index and off-you-go. This is the logical extension of the Hypermedia approach. To explore this idea, let’s imagine that we have a REST interface to describe a game of chess. It should be straight-forward enough to imagine how each chess piece could be given a resource endpoint that would describe the various behaviours of each piece. For example a GET /pieces/king would provide a response that says move one square in any direction. Okay that’s fine. But what if we want to record a game of chess? This is a whole new tin of worms; that of managing the state of the game board. For example PUT /board/12/60/Qf2+. In its response the server should be helping the client to manage the application state by offering a link to /board/12/60. This is because only the server knows (with authority) that the game on board 12 is on the sixtieth move, that the White Queen has put the Black King in check and it is now Black’s turn.

In this scenario a WADL can be comfortably used to define the static reference data: GET /pieces/king. But because the game state is (obviously) unknown at design time, a different approach is needed. The approach should involve making general statements about the Hypermedia syntax used by the API to maintain state. The request and response elements offered by the current WADL don’t constrain how the syntax should be defined. But I would go further and argue that they should. Elements such as <id>, <rel>, <class> and <href> have become so baked into Hypermedia designs that they deserve an exclusive place in our new generation WADL.
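
As a rough illustration of the server owning the game state, here is a sketch of what might sit behind PUT /board/12/60/Qf2+. Everything in it is hypothetical: the recordMove function, the in-memory games store, the "latest" link relation and the choice of a 409 status for a move that arrives out of sequence.

```javascript
// Hypothetical handler behind PUT /board/:boardId/:moveNumber/:notation.
// The server is the authority on which move the game has reached, so it
// always answers with a link to the latest recorded move.
function recordMove(games, boardId, moveNumber, notation) {
  const game = games[boardId] || (games[boardId] = { moves: [] });
  if (moveNumber !== game.moves.length + 1) {
    // Out of sequence: refuse, but tell the client where the game really is.
    return {
      status: 409,
      links: [{ rel: "latest", href: "/board/" + boardId + "/" + game.moves.length }]
    };
  }
  game.moves.push(notation);
  return {
    status: 200,
    move: notation,
    links: [{ rel: "latest", href: "/board/" + boardId + "/" + moveNumber }]
  };
}

// Board 12 has 59 moves on record (the moves themselves are elided here).
const games = { 12: { moves: new Array(59).fill("...") } };
const accepted = recordMove(games, 12, 60, "Qf2+");
const rejected = recordMove(games, 12, 60, "Kg8");
console.log(accepted, rejected);
```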

Cross-reference
A general issue with APIs is that of API discovery. How are you supposed to find out about the existence of an API in the first place? And if you are managing a group of related APIs, how does the discovery of one API help you find out about the other members of your API family? Google have tackled this problem through their discovery API. But what about everyone else? As the WADL already allows us to reference external XML Schemas, why shouldn’t we be able to reference other WADLs?

<grammars>
<include wadl="http://example.com/xyz/doc.wadl"/>
</grammars>

With over 9,000 APIs currently registered on Programmable Web, cross-referencing should help bring some organisation to the API space.

Inheritance
It’s generally polite to reply to a question in whatever language was used to ask the question. Consequently it’s annoying that each <request> and <response> declaration needs a separate <representation> tag. Although the WADL is hierarchical there is no sense of a child node that inherits from the parent. Wouldn’t it be easier to declare the base representation in the root <resources> tag, and use the child nodes to override any exceptions?

Non-functionals
Functionality is one thing, but if the security features of the API disallow access then it’s game over. If our new WADL has helped you discover an API then it seems logical that some declaration of the required authentication is made at the point of discovery. For example <authn> and <authz> tags might have values such as “oauth” or “x509” together with a <profile="developer.example.com/register"> statement for humans. Similarly a <status> parameter could broadcast downtime whenever an origin service becomes unavailable.

Error Handling
While you can get a long way with HTTP status codes, many error responses are specific to a particular domain. A common strategy is therefore to return a 200 to show that the underlying HTTP communication was successful and then maintain a set of API specific error codes. The status attribute of the <response> element should be able to document the existence of the bespoke messages.
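
From the client's side this strategy means looking in two places for failure. The sketch below assumes a made-up body shape of { error: { code, message } } and an invented error code. A real API would document its own.

```javascript
// Hypothetical client-side check for "200 plus an API specific error code".
function interpretResponse(httpStatus, body) {
  if (httpStatus !== 200) {
    // The HTTP exchange itself failed.
    return { ok: false, source: "http", code: httpStatus };
  }
  if (body && body.error) {
    // HTTP was fine, but the API reported a domain-specific problem.
    return { ok: false, source: "api", code: body.error.code, message: body.error.message };
  }
  return { ok: true, data: body };
}

const domainFailure = interpretResponse(200, {
  error: { code: "GRAPH_TOO_LARGE", message: "Graph exceeds the node limit" }
});
const transportFailure = interpretResponse(503, null);
const success = interpretResponse(200, { id: "wolverine" });
console.log(domainFailure, transportFailure, success);
```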

Interpolation
APIs are deployed in multiple environments. In reality the <resources> tag is likely to resolve to a number of hosts.

  • http://dev.example.com/api
  • https://example.com/api

Rather than relying on an external tool (such as Maven) it should be possible to abstract the hostname for use in multiple environments.
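
One way to picture this is a base address written once as a template and filled in for each environment. The {name} placeholder syntax, the environments table and the interpolate function below are illustrative assumptions and not anything the WADL specification currently defines for hostnames.

```javascript
// Hypothetical per-environment values for the two hosts listed above.
const environments = {
  dev: { scheme: "http", host: "dev.example.com" },
  prod: { scheme: "https", host: "example.com" }
};

// Replace each {name} placeholder with the matching value.
function interpolate(template, values) {
  return template.replace(/\{(\w+)\}/g, (placeholder, name) => {
    if (!(name in values)) {
      throw new Error("No value for " + placeholder);
    }
    return values[name];
  });
}

const baseTemplate = "{scheme}://{host}/api";
const devBase = interpolate(baseTemplate, environments.dev);
const prodBase = interpolate(baseTemplate, environments.prod);
console.log(devBase, prodBase);
```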

JSON + node.js module
With NodeJs currently the second most popular download on GitHub, the march of JSON seems set to continue. WADLs are of course expressed using XML. Frameworks such as deployd illustrate the immediacy of using Node to define and create a REST service. With a little work, it’s possible to start this process from a WADL. With a bit more effort it would be possible to encapsulate the WADL import process using a Node module, similar to the work underway at RestDoc.

In summary, we should stick with the WADL not particularly because of its original intention as a machine-readable definition of a service, but because the process of designing and developing APIs is lacking a reference language. The WADL just seems like a sensible place from which to embark.

The Street Performers of El Rastro

Had a great day in Madrid where I was amazed and entertained by some amazing street performers.

beautiful boleros
[youtube http://www.youtube.com/watch?v=lP05VQAD5mM&w=420&h=315]

energy and ancestry from simplicity
[youtube http://www.youtube.com/watch?v=06hUgF_k6UI&w=420&h=315]

keeping the old glasses humming
[youtube http://www.youtube.com/watch?v=xL8k6OLrywY&w=420&h=315]

when the kids are happy ..
[youtube http://www.youtube.com/watch?v=1DKCObc7nrE&w=420&h=315]

trad jazzers

IMG-20130602-00165

skin deep

mud statue

API Predictions


This is a fantasy view of the future inspired by talking with people at APIdays. It’s a personal view and best read as science-fiction, but you never know, some of it may come true!

NOW

Predictions in this section seem fairly likely to happen if not happening already

  1. A few branded, skyscraper APIs will continue to be dominant and used by the vast majority of apps. The number of skyscrapers will represent a tiny minority of the total APIs publicly available.
  2. The dx (developer experience) movement will get into full swing and impact the API design space.
  3. Hypermedia will formalise itself as a series of Architectural Design Patterns. See [1], [2] and [3] for examples.
  4. HTML5 and websockets will spawn a new generation of real-time APIs.

NEXT

These predictions are not happening yet, but reasonably likely to come into place

  1. Apigee will have been acquired.
  2. Twitter will win back their developer community.
  3. NodeJs applications such as deployd [4] will continue to drive down availability of XML representations. XML may disappear from the majority of public APIs, but will continue to live-on in bespoke partner and private APIs.
  4. ProgrammableWeb will become obsolete. Instead there will be a new generation of API discovery tools [5].
  5. There will be a significant increase in the number of stand-alone API providers such as Twilio. (Twilio is a stand-alone API provider because it doesn’t have a core product in the way that Google, Facebook and Twitter do.)
  6. An open-source and independent solution for API traffic management will become available, perhaps based on [6]. API clients will become more adept at self-managing load.
  7. APIs will provide text/html as their default content-type (because humans need to understand them before the machines can get started).
  8. There will be a single go-to place for API developers to social network (and it won’t be Facebook).
  9. API security will change radically in response to the general availability of personally available hardware/mobile tokenisers.
  10. Commercial monitoring tools will get-in on the API act. Their solutions will provide views of how a single request passes through the technology stack. The apiKey will provide the glue.

FUTURE

Predictions that will probably never happen, perhaps this is a wish-list ..

  1. W3C will publish an API standard that is largely driven by Hypermedia and the requirements of the API clients. Kin Lane [7] will be involved!
  2. Semantic Web will align with a new generation of media-types that arise from the rising popularity of the Hypermedia style. A standard representation for RDF and JSON will emerge, driven by a skyscraper API provider and (eventually) blessed by W3C.
  3. A famous legal battle will project the issue of API provenance into the media. Digital signature solutions will evolve and adapt themselves to the API economy.
  4. Single Page Applications (SPAs) will be the accepted alternative to traditional page-based web sites. SPAs will use registered media-types.
  5. The Great-Twitter-Betrayal will give rise to a credit-rating style system. This system will make promises to Venture Capitalists and make them feel better about managing risk in the API economy. Everyone else will ignore it.
  6. Software tools for API management (see [8] and [9]) will be provided out-of-the-box by the majority of cloud vendors.
  7. Delays in the transition from HTTP 1.1 to HTTP 2.0 will engender the adoption of a new protocol that will be optimised for API traffic and messaging.
  8. A single supplier will capture the market for API developer portals.

References


[1] http://stateless.co/hal_specification.html

[2] https://github.com/kevinswiber/siren

[3] http://amundsen.com/media-types/collection/

[4] http://github.com/deployd

[5] http://www.apihub.com/

[6] http://loader.io/

[7] http://kinlane.github.io/talks

[8] http://webshell.io/

[9] https://www.runscope.com

Reading List

A fairly random collection of geeky things that I’m working my way through

Things I have read recently

Using Perl to munge an X.509 certificate

I wanted to do something fairly quick with X.509 certificates. My scribbled requirements were: connect to a server and grab the public certificate and then expose the X.509 fields programmatically. As an old Perl hacker-at-heart I headed over to CPAN and grabbed a copy of Net::SSLeay. It seems to be the de facto module for this work and is recommended by O’Reilly’s excellent book on OpenSSL that I happened to be reading. Furthermore there was a yum installer for CentOS. Happy days!

After a bit of fiddling I’d hacked the test SSL client that comes with the bundle to do the job. There were a couple of gotchas. Firstly, I was getting some core dumps. I never really got to the bottom of these. I suspected the yum installer hadn’t done a good job (later installs required the openssl-dev package, but yum didn’t complain). Anyway, I got around the core dumps by removing the nested calls from my code. Perhaps those warnings about OpenSSL not playing nicely with threads have some weight after all? By this time my code was starting to get a bit messy and I had hit another problem.

This isn’t to do with Net::SSLeay directly but the fact that Socket requires you to run as root. I can see this made sense at some earlier point in history when being root was a Big Thing. But now everyone has a couple of VMs kicking around and what’s the point in making user accounts? It was a bit annoying because I actually couldn’t run as root in my target environment. Seems a lot of kafuffle when all I wanted to do was make a standard https connection to places like https://www.google.com. Not so extraordinary! The only workaround I’ve found so far is to pipe through openssl s_client and this works fine, if a bit 1970s. Please comment with any better suggestions.
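For anyone wanting to try the s_client route, here is a minimal sketch of it. It is written in Python rather than Perl purely so that it needs nothing beyond the standard library. It assumes the openssl binary is on the PATH, and the helper names (s_client_command, first_pem_certificate, fetch_certificate) are made up for illustration. It runs as an ordinary user.

```python
import re
import subprocess

# Matches one PEM-encoded certificate block in openssl's text output.
PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def s_client_command(host, port=443):
    """Build the argument list for `openssl s_client` (no shell involved)."""
    return [
        "openssl", "s_client",
        "-connect", f"{host}:{port}",
        "-servername", host,  # send SNI so the server picks the right certificate
        "-showcerts",
    ]


def first_pem_certificate(text):
    """Return the first PEM certificate found in text, or None if there is none."""
    match = PEM_RE.search(text)
    return match.group(0) if match else None


def fetch_certificate(host, port=443, timeout=10):
    """Run openssl s_client and return the first certificate it prints, as PEM."""
    result = subprocess.run(
        s_client_command(host, port),
        input="",             # empty stdin so s_client exits after the handshake
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return first_pem_certificate(result.stdout)
```

Calling fetch_certificate("www.google.com") should hand back the server certificate as PEM text, ready to feed to whichever parsing module survives the next few paragraphs. Error handling (connection failures, a missing openssl binary) is left out of the sketch.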

The next bit was to start grappling with the certificates themselves. I did consider Sam Vilain’s OO Net::SSLeay as it looks like an improved interface and this was my main gripe with Net::SSLeay. (I should say that I got a nice reply from the author of Net::SSLeay). But I was worried that it was still Net::SSLeay underneath and by then Dan Sully’s Crypt::OpenSSL::X509 had caught my eye. It’s a really nice API. Everything just seems to be where you’d expect it. So I got stuck in and all was well for a while. Turns out this module has problems too. Mainly it stops dead on certificates that it doesn’t understand. For example https://google.co.in/ has a stonker of a certificate whose X509 v3 Subject Alternative Name extension lists several hundred DNS names. Dan’s module just fails to cope, no warning and no nice reply from Dan. The other problem is that it isn’t finished.

Drawing of a multi-headed Hydra.

At this point I was thinking about starting again in Java ..

The other problem with Dan’s module is that the ASN.1 notation that underpins the X.509 standard is a horrible multi-layered thing. When Dan’s module gets past the first layer it starts throwing up gobbledegook. You see, not everything is a string in the world of X.509. I mean this:

X509v3 Subject Key Identifier
53:32:D1:B3:CF:7F:FA:E0:F1:A0:5D:85:4E:92:D2:9E:45:1D:B4:4F

became that.

X509v3 Subject Key Identifier
..S2........].N...E..O
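The gobbledegook is what you get when the raw extension bytes are printed as if they were text. Here is a small sketch of the round trip, in Python rather than Perl. It assumes the raw value is a DER OCTET STRING, that is a 0x04 tag byte and a one-byte length followed by the 20-byte key identifier; the two leading dots in the garbled output are consistent with that header, but it is my reading rather than something the module reports. The helper names are hypothetical and long-form ASN.1 lengths are deliberately not handled.

```python
def octet_string_payload(der):
    """Strip the tag and short-form length from a DER OCTET STRING."""
    if len(der) < 2 or der[0] != 0x04:
        raise ValueError("not a DER OCTET STRING")
    length = der[1]
    if length & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    payload = der[2:2 + length]
    if len(payload) != length:
        raise ValueError("truncated value")
    return payload


def colon_hex(data):
    """Format bytes as upper-case, colon-separated hex, e.g. 53:32:D1."""
    return ":".join(f"{byte:02X}" for byte in data)


def printable(data):
    """Mimic the garbled view: printable ASCII stays, anything else becomes a dot."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


# Assumed raw extension value: OCTET STRING header (04 14) plus the identifier above.
raw = bytes.fromhex("0414" "5332D1B3CF7FFAE0F1A05D854E92D29E451DB44F")

print(printable(raw))                        # the unreadable form
print(colon_hex(octet_string_payload(raw)))  # the form openssl shows
```

So the information is all there; it just needs the outer ASN.1 wrapper peeled off and the remaining bytes shown as hex instead of characters.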

There is an X.509 module as part of Crypt::SSLeay but it’s deprecated and the module is only maintained for protocol support of the amazing LWP (it puts the s in https). This is a shame because had I been able to grab the certificate from an LWP session then two birds might have been left to die. I also found this sslclient which looked perfect. But it failed the install.

Right now I’m working with Crypt::X509 by Alexander Jung. It seems to be a bit obsessed with LDAP and wants to consume your certificates in the binary DER format. I guess this is an LDAP thing because everything else I’ve seen is PEM. But it is dependent on Convert::ASN1 so I’m hopeful that it knows what to do with all those ASN.1 layers that are hidden in the guts of a certificate. I’ll let you know how it goes. Here’s that stonker.


Certificate:
Data:
Version: 3 (0x2)
Serial Number:
47:4f:4f:50:01:70
Signature Algorithm: sha1WithRSAEncryption
Issuer: C=US, O=Google Inc, CN=Google Internet Authority
Validity
Not Before: Aug 16 11:37:16 2012 GMT
Not After : Jun 7 19:43:27 2013 GMT
Subject: C=US, ST=California, O=Google Inc, CN=google.com
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (1024 bit)
Modulus:
00:b5:4e:3d:07:0f:f0:57:3a:aa:68:57:9d:1a:9b:
1b:dc:55:2f:aa:28:02:00:35:3a:3a:3b:17:00:2e:
ac:17:2d:49:f5:b2:f7:4f:d7:93:6c:84:ed:9a:d1:
a0:e0:81:64:7b:4f:67:78:bf:52:ba:d3:4c:d1:c2:
7e:67:16:fd:7f:62:f7:88:86:1b:ea:1c:38:2a:e8:
58:d2:04:11:45:67:50:73:30:49:64:6a:79:de:e3:
af:4d:8b:37:1f:ca:ca:13:dd:9e:76:7e:03:54:bf:
50:c0:bb:6f:d9:4d:34:8b:66:7e:fd:b3:43:21:c7:
4c:dc:86:ae:c4:53:b0:fa:db
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Subject Key Identifier:
FD:DE:A8:2D:76:DB:A4:74:C1:62:D9:D3:4B:AD:AB:8B:DD:89:7D:78
X509v3 Authority Key Identifier:
keyid:BF:C0:30:EB:F5:43:11:3E:67:BA:9E:91:FB:FC:6A:DA:E3:6B:12:24

X509v3 CRL Distribution Points:

Full Name:
URI:http://www.gstatic.com/GoogleInternetAuthority/GoogleInternetAuthority.crl

Authority Information Access:
CA Issuers - URI:http://www.gstatic.com/GoogleInternetAuthority/GoogleInternetAuthority.crt

X509v3 Subject Alternative Name:
DNS:google.com, DNS:*.google.com, DNS:*.youtube.com, DNS:youtube.com, DNS:*.youtube-nocookie.com, DNS:youtu.be, DNS:*.ytimg.com, DNS:*.android.com, DNS:android.com, DNS:*.googlecommerce.com, DNS:googlecommerce.com, DNS:*.url.google.com, DNS:*.urchin.com, DNS:urchin.com, DNS:*.google-analytics.com, DNS:google-analytics.com, DNS:*.cloud.google.com, DNS:goo.gl, DNS:g.co, DNS:*.gstatic.com, DNS:*.google.ac, DNS:*.google.ad, DNS:*.google.ae, DNS:*.google.af, DNS:*.google.ag, DNS:*.google.am, DNS:*.google.as, DNS:*.google.at, DNS:*.google.az, DNS:*.google.ba, DNS:*.google.be, DNS:*.google.bf, DNS:*.google.bg, DNS:*.google.bi, DNS:*.google.bj, DNS:*.google.bs, DNS:*.google.by, DNS:*.google.ca, DNS:*.google.cat, DNS:*.google.cc, DNS:*.google.cd, DNS:*.google.cf, DNS:*.google.cg, DNS:*.google.ch, DNS:*.google.ci, DNS:*.google.cl, DNS:*.google.cm, DNS:*.google.cn, DNS:*.google.co.ao, DNS:*.google.co.bw, DNS:*.google.co.ck, DNS:*.google.co.cr, DNS:*.google.co.hu, DNS:*.google.co.id, DNS:*.google.co.il, DNS:*.google.co.im, DNS:*.google.co.in, DNS:*.google.co.je, DNS:*.google.co.jp, DNS:*.google.co.ke, DNS:*.google.co.kr, DNS:*.google.co.ls, DNS:*.google.co.ma, DNS:*.google.co.mz, DNS:*.google.co.nz, DNS:*.google.co.th, DNS:*.google.co.tz, DNS:*.google.co.ug, DNS:*.google.co.uk, DNS:*.google.co.uz, DNS:*.google.co.ve, DNS:*.google.co.vi, DNS:*.google.co.za, DNS:*.google.co.zm, DNS:*.google.co.zw, DNS:*.google.com.af, DNS:*.google.com.ag, DNS:*.google.com.ai, DNS:*.google.com.ar, DNS:*.google.com.au, DNS:*.google.com.bd, DNS:*.google.com.bh, DNS:*.google.com.bn, DNS:*.google.com.bo, DNS:*.google.com.br, DNS:*.google.com.by, DNS:*.google.com.bz, DNS:*.google.com.cn, DNS:*.google.com.co, DNS:*.google.com.cu, DNS:*.google.com.cy, DNS:*.google.com.do, DNS:*.google.com.ec, DNS:*.google.com.eg, DNS:*.google.com.et, DNS:*.google.com.fj, DNS:*.google.com.ge, DNS:*.google.com.gh, DNS:*.google.com.gi, DNS:*.google.com.gr, DNS:*.google.com.gt, DNS:*.google.com.hk, DNS:*.google.com.iq, 
DNS:*.google.com.jm, DNS:*.google.com.jo, DNS:*.google.com.kh, DNS:*.google.com.kw, DNS:*.google.com.lb, DNS:*.google.com.ly, DNS:*.google.com.mt, DNS:*.google.com.mx, DNS:*.google.com.my, DNS:*.google.com.na, DNS:*.google.com.nf, DNS:*.google.com.ng, DNS:*.google.com.ni, DNS:*.google.com.np, DNS:*.google.com.nr, DNS:*.google.com.om, DNS:*.google.com.pa, DNS:*.google.com.pe, DNS:*.google.com.ph, DNS:*.google.com.pk, DNS:*.google.com.pl, DNS:*.google.com.pr, DNS:*.google.com.py, DNS:*.google.com.qa, DNS:*.google.com.ru, DNS:*.google.com.sa, DNS:*.google.com.sb, DNS:*.google.com.sg, DNS:*.google.com.sl, DNS:*.google.com.sv, DNS:*.google.com.tj, DNS:*.google.com.tn, DNS:*.google.com.tr, DNS:*.google.com.tw, DNS:*.google.com.ua, DNS:*.google.com.uy, DNS:*.google.com.vc, DNS:*.google.com.ve, DNS:*.google.com.vn, DNS:*.google.cv, DNS:*.google.cz, DNS:*.google.de, DNS:*.google.dj, DNS:*.google.dk, DNS:*.google.dm, DNS:*.google.dz, DNS:*.google.ee, DNS:*.google.es, DNS:*.google.fi, DNS:*.google.fm, DNS:*.google.fr, DNS:*.google.ga, DNS:*.google.ge, DNS:*.google.gg, DNS:*.google.gl, DNS:*.google.gm, DNS:*.google.gp, DNS:*.google.gr, DNS:*.google.gy, DNS:*.google.hk, DNS:*.google.hn, DNS:*.google.hr, DNS:*.google.ht, DNS:*.google.hu, DNS:*.google.ie, DNS:*.google.im, DNS:*.google.info, DNS:*.google.iq, DNS:*.google.is, DNS:*.google.it, DNS:*.google.it.ao, DNS:*.google.je, DNS:*.google.jo, DNS:*.google.jobs, DNS:*.google.jp, DNS:*.google.kg, DNS:*.google.ki, DNS:*.google.kz, DNS:*.google.la, DNS:*.google.li, DNS:*.google.lk, DNS:*.google.lt, DNS:*.google.lu, DNS:*.google.lv, DNS:*.google.md, DNS:*.google.me, DNS:*.google.mg, DNS:*.google.mk, DNS:*.google.ml, DNS:*.google.mn, DNS:*.google.ms, DNS:*.google.mu, DNS:*.google.mv, DNS:*.google.mw, DNS:*.google.ne, DNS:*.google.ne.jp, DNS:*.google.net, DNS:*.google.nl, DNS:*.google.no, DNS:*.google.nr, DNS:*.google.nu, DNS:*.google.off.ai, DNS:*.google.pk, DNS:*.google.pl, DNS:*.google.pn, DNS:*.google.ps, DNS:*.google.pt, 
DNS:*.google.ro, DNS:*.google.rs, DNS:*.google.ru, DNS:*.google.rw, DNS:*.google.sc, DNS:*.google.se, DNS:*.google.sh, DNS:*.google.si, DNS:*.google.sk, DNS:*.google.sm, DNS:*.google.sn, DNS:*.google.so, DNS:*.google.st, DNS:*.google.td, DNS:*.google.tg, DNS:*.google.tk, DNS:*.google.tl, DNS:*.google.tm, DNS:*.google.tn, DNS:*.google.to, DNS:*.google.tp, DNS:*.google.tt, DNS:*.google.us, DNS:*.google.uz, DNS:*.google.vg, DNS:*.google.vu, DNS:*.google.ws, DNS:google.ac, DNS:google.ad, DNS:google.ae, DNS:google.af, DNS:google.ag, DNS:google.am, DNS:google.as, DNS:google.at, DNS:google.az, DNS:google.ba, DNS:google.be, DNS:google.bf, DNS:google.bg, DNS:google.bi, DNS:google.bj, DNS:google.bs, DNS:google.by, DNS:google.ca, DNS:google.cat, DNS:google.cc, DNS:google.cd, DNS:google.cf, DNS:google.cg, DNS:google.ch, DNS:google.ci, DNS:google.cl, DNS:google.cm, DNS:google.cn, DNS:google.co.ao, DNS:google.co.bw, DNS:google.co.ck, DNS:google.co.cr, DNS:google.co.hu, DNS:google.co.id, DNS:google.co.il, DNS:google.co.im, DNS:google.co.in, DNS:google.co.je, DNS:google.co.jp, DNS:google.co.ke, DNS:google.co.kr, DNS:google.co.ls, DNS:google.co.ma, DNS:google.co.mz, DNS:google.co.nz, DNS:google.co.th, DNS:google.co.tz, DNS:google.co.ug, DNS:google.co.uk, DNS:google.co.uz, DNS:google.co.ve, DNS:google.co.vi, DNS:google.co.za, DNS:google.co.zm, DNS:google.co.zw, DNS:google.com.af, DNS:google.com.ag, DNS:google.com.ai, DNS:google.com.ar, DNS:google.com.au, DNS:google.com.bd, DNS:google.com.bh, DNS:google.com.bn, DNS:google.com.bo, DNS:google.com.br, DNS:google.com.by, DNS:google.com.bz, DNS:google.com.cn, DNS:google.com.co, DNS:google.com.cu, DNS:google.com.cy, DNS:google.com.do, DNS:google.com.ec, DNS:google.com.eg, DNS:google.com.et, DNS:google.com.fj, DNS:google.com.ge, DNS:google.com.gh, DNS:google.com.gi, DNS:google.com.gr, DNS:google.com.gt, DNS:google.com.hk, DNS:google.com.iq, DNS:google.com.jm, DNS:google.com.jo, DNS:google.com.kh, DNS:google.com.kw, DNS:google.com.lb, 
DNS:google.com.ly, DNS:google.com.mt, DNS:google.com.mx, DNS:google.com.my, DNS:google.com.na, DNS:google.com.nf, DNS:google.com.ng, DNS:google.com.ni, DNS:google.com.np, DNS:google.com.nr, DNS:google.com.om, DNS:google.com.pa, DNS:google.com.pe, DNS:google.com.ph, DNS:google.com.pk, DNS:google.com.pl, DNS:google.com.pr, DNS:google.com.py, DNS:google.com.qa, DNS:google.com.ru, DNS:google.com.sa, DNS:google.com.sb, DNS:google.com.sg, DNS:google.com.sl, DNS:google.com.sv, DNS:google.com.tj, DNS:google.com.tn, DNS:google.com.tr, DNS:google.com.tw, DNS:google.com.ua, DNS:google.com.uy, DNS:google.com.vc, DNS:google.com.ve, DNS:google.com.vn, DNS:google.cv, DNS:google.cz, DNS:google.de, DNS:google.dj, DNS:google.dk, DNS:google.dm, DNS:google.dz, DNS:google.ee, DNS:google.es, DNS:google.fi, DNS:google.fm, DNS:google.fr, DNS:google.ga, DNS:google.ge, DNS:google.gg, DNS:google.gl, DNS:google.gm, DNS:google.gp, DNS:google.gr, DNS:google.gy, DNS:google.hk, DNS:google.hn, DNS:google.hr, DNS:google.ht, DNS:google.hu, DNS:google.ie, DNS:google.im, DNS:google.info, DNS:google.iq, DNS:google.is, DNS:google.it, DNS:google.it.ao, DNS:google.je, DNS:google.jo, DNS:google.jobs, DNS:google.jp, DNS:google.kg, DNS:google.ki, DNS:google.kz, DNS:google.la, DNS:google.li, DNS:google.lk, DNS:google.lt, DNS:google.lu, DNS:google.lv, DNS:google.md, DNS:google.me, DNS:google.mg, DNS:google.mk, DNS:google.ml, DNS:google.mn, DNS:google.ms, DNS:google.mu, DNS:google.mv, DNS:google.mw, DNS:google.ne, DNS:google.ne.jp, DNS:google.net, DNS:google.nl, DNS:google.no, DNS:google.nr, DNS:google.nu, DNS:google.off.ai, DNS:google.pk, DNS:google.pl, DNS:google.pn, DNS:google.ps, DNS:google.pt, DNS:google.ro, DNS:google.rs, DNS:google.ru, DNS:google.rw, DNS:google.sc, DNS:google.se, DNS:google.sh, DNS:google.si, DNS:google.sk, DNS:google.sm, DNS:google.sn, DNS:google.so, DNS:google.st, DNS:google.td, DNS:google.tg, DNS:google.tk, DNS:google.tl, DNS:google.tm, DNS:google.tn, DNS:google.to, DNS:google.tp, 
DNS:google.tt, DNS:google.us, DNS:google.uz, DNS:google.vg, DNS:google.vu, DNS:google.ws, DNS:*.googleapis.cn
Signature Algorithm: sha1WithRSAEncryption
c0:a8:27:9e:20:b8:c5:de:9a:32:0a:4f:e3:8b:9b:10:8b:06:
73:31:ac:91:75:68:dd:d5:1a:eb:23:86:77:2e:78:49:99:9b:
84:4e:40:0b:50:08:c5:81:21:f2:a6:55:a1:40:27:2f:5f:93:
c5:0d:0a:51:ff:49:29:4e:2d:80:c6:5e:a5:bb:ca:df:cb:39:
29:0c:ca:28:18:a7:1c:c3:43:ff:2e:22:a8:df:41:91:7c:c4:
ba:7c:63:ce:d8:71:46:73:d7:6b:d3:12:a1:93:c0:8d:44:ce:
25:da:c1:53:05:76:d7:c8:05:c3:2f:62:95:07:36:a2:04:ee:
b4:15