Thursday, April 23, 2009

An introduction to the GSS API

Application Programming Interface or API design is one of my favorite topics in programming, probably because it is both a science and an art. It is a science because there are widely accepted principles about how to design an API, but at the same time applying these principles within the constraints of a given programming language, requires the finesse of an experienced practicioner. Therefore it is with great pleasure that I'll try to explain the ins and outs of the GSS REST-like API, as promised before. As I've already mentioned, GSS is both the name of the source code project in Google Code, as well as the GRNET-sponsored service for the Greek research and academic network (although, it's official name after leaving the beta stage will be Pithos). Since anyone can use the open-source code to setup a GSS service, in this post I'll use generic examples, so anyone writing a client for the GRNET service should modify them accordingly.

When developing an application in a particular programming language, we are used to thinking about the APIs presented to us by the various libraries, which are invariably specified in that same language. For instance, for communicating with an HTTP server from a Java program we might use the HttpClient library API. This library presents a set of Java classes, interfaces and methods for interacting with web servers. These classes hide the underlying complexity of making the low-level HTTP protocol operations, allowing our mental process to remain constantly in a Java world. We could however interact with a web server without such a library, opting to implement the HTTP protocol interactions ourselves instead. Unfortunately, there is no such higher-level library for GSS yet, wrapping the low-level HTTP communications. Therefore this post will present a low-level API, in the sense that one has to make direct HTTP calls in his chosen programming language. The good news is that the following discussion is useful for programmers with any background, since there is support for the ubiquitus HTTP protocol in every modern programming language.

A RESTful API models its entities as resources. Resources are identified by Uniform Resource Identifiers, or URIs. There are four kinds of resources in GSS: files, folders, users & groups. These resources have a number of properties that contain various attributes. The API models these entities and their properties in the JSON format. There is also a fifth entity that is not modeled as a resource, but is important enough to warrant special mention: permissions.

Users · Users are the entities that represent the actual users of the system. They are used to login to the service and separate namespaces of files and folders. User entities have attributes like full name, e-mail, username, authentication token, creation/modification times, groups, etc. The URI of a user with username paul would be:

http://host.domain/gss/rest/paul/
The JSON representation of this user would be something like this:
{
"name": "Paul Smith",
"username": "paul",
"email": "paul@gmail.com",
"files": "http://hostname.domain/gss/rest/paul/files",
"trash": "http://hostname.domain/gss/rest/paul/trash",
"shared": "http://hostname.domain/gss/rest/paul/shared",
"others": "http://hostname.domain/gss/rest/paul/others",
"tags": "http://hostname.domain/gss/rest/paul/tags",
"groups": "http://hostname.domain/gss/rest/paul/groups",
"creationDate": 1223372769275,
"modificationDate": 1223372769275,
"quota": {
"totalFiles": 7,
"totalBytes": 429330,
"bytesRemaining": 10736988910
}
}


Groups · Groups are entities used to organize users for easier sharing of files and folders among peers. They can be used to facilitate sharing files to multiple users at once. Groups belong to the user who created them and cannot be shared. The URI of a group named work created by the user with username paul would be:
http://host.domain/gss/rest/paul/groups/work
The JSON representation of this group would be something like this:

[
"http://hostname.domain/gss/rest/paul/groups/work/tom",
"http://hostname.domain/gss/rest/paul/groups/work/jim",
"http://hostname.domain/gss/rest/paul/groups/work/mary"
]


Files · Files are the most basic resources in GSS. They represent actual operating system files from the client's computer that have been augmented with extra metadata for storage, retrieval and sharing purposes. Familiar metadata from modern file systems are also maintained in GSS, like file name, creation/modification times, creator, modifier, tags, permissions, etc. Furthermore, files can be versioned in GSS. Updating versioned files retains the previous versions, while updating an unversioned file replaces irrevocably the old file contents. The URI of a file named doc.txt located in the root folder of the user with username paul would be:
http://host.domain/gss/rest/paul/files/doc.txt
The JSON representation of the metadata in this file would be something like this:

{
"name": "doc.txt",
"creationDate": 1232449958563,
"createdBy": "paul",
"readForAll": true,
"modifiedBy": "paul",
"owner": "paul",
"modificationDate": 1232449944444,
"deleted": false,
"versioned": true,
"version": 1,
"size": 802,
"content": "text/plain",
"uri": "http://hostname.domain/gss/rest/paul/files/doc.txt",
"folder": {
"uri": "http://hostname/gss/rest/aaitest@uth.gr/files/",
"name": "Paul Smith"
},
"path": "/",
"tags": [
"work",
"personal"
],
"permissions": [
{
"modifyACL": true,
"write": true,
"read": true,
"user": "paul"
},
{
"modifyACL": false,
"write": true,
"read": true,
"group": "work"
}
]
}


Folders · Folders are resources that are used for grouping files. They represent the file system concept of folders or directories and can be used to mirror a client's computer file system on GSS. Familiar metadata from modern file systems are also maintained in GSS, like folder name, creation/modification times, creator, modifier, permissions, etc. The URI of a folder named documents located in the root folder of the user with username paul would be:
http://host.domain/gss/rest/paul/files/documents
The JSON representation of this folder would be something like this:

{
"name": "documents",
"owner": "paul",
"deleted": false,
"createdBy": "paul",
"creationDate": 1223372795825,
"modifiedBy": "paul",
"modificationDate": 1223372795825,
"parent": {
"uri": "http://hostname.domain/gss/rest/paul/files/",
"name": "Paul Smith"
},
"files": [
{
"name": "notes.txt",
"owner": "paul",
"creationDate": 1233758218866,
"deleted":false,
"size":4567,
"content": "text/plain",
"version": 1,
"uri": "http://hostname.domain/gss/rest/paul/files/documents/notes.txt",
"folder": {
"uri": "http://hostname.domain/gss/rest/paul/files/documents/",
"name": "documents"
},
"path": "/documents/"
}
],
"folders": [],
"permissions": [
{
"modifyACL": true,
"write": true,
"read": true,
"user": "paul"
},
{
"modifyACL": false,
"write": true,
"read": true,
"group": "work"
}
]
}

Working with these resources is accomplished by sending HTTP protocol requests to the resource URI with GET, HEAD, DELETE, POST, PUT methods. GET requests retrieve the resource representation, either the file contents, or the JSON representations for the resources specified above. HEAD requests for files return just the metadata of the file and DELETE requests remove the resource from the system. PUT requests upload files to the system from the client, while POST requests perform various modifications to the resources, like renaming, moving, copying, moving files to the trash, restoring them from the trash, creating folders and more. The operations are numerous and I hope to cover them in more detail in a future post.

One important aspect of every RESTful API is the use of URIs to allow the client to maintain a stateful conversation. For example, fetching the user URI would provide the files URI for fetching the available files and folders. Fetching the files URI would in turn return the URIs for the particular files and folders contained in the root folder (along with other folder properties). Returning to the parent of the current folder would entail following the URI contained in the parent property. This mechanism removes the state handling from the server and puts the burden on the client, providing excellent scalability for the service. Furthermore, since the URIs are treated opaquely by the client, the API allows client reuse across server deployments. A client can target multiple GSS services, as long as they speak the same RESTful API. Moreover, links from service A can refer to resources in service B without a problem (in the same authentication domain, e.g. the same Shibboleth federation). This is the same as using a single web browser to communicate with multiple web servers, by following links among them.

1 comment:

patriamacarthur said...
This comment has been removed by a blog administrator.
Creative Commons License Unless otherwise expressly stated, all original material in this weblog is licensed under a Creative Commons Attribution 3.0 License.