Multitenancy in Google AppEngine (GAE)

Luis Atencio, December 6th, 2011 (Last Updated: October 21st, 2012)

Multitenancy is a topic that has been discussed for many years, and there are many excellent references readily available, so I will just present a brief introduction.

Multitenancy is a software architecture where a single instance of the software runs on a server, serving multiple client organizations (tenants). With a multitenant architecture, an application can be designed to virtually partition its data and configuration (business logic), and each client organization works with a customized virtual application instance.

It suits SaaS (Software as a Service) cloud computing very well; however, multitenant systems can be very complex to implement. The architect must be aware of security, access control, and similar concerns.

Multitenancy can exist in several different flavors:

Multitenancy in Deployment

Fully isolated business logic (dedicated server customized business process)
Virtualized Application Servers (dedicated application server, single VM per app server)
Shared virtual servers (dedicated application server on shared VM)
Shared application servers (threads and sessions)

This spectrum of installations ranges from fully isolated to fully shared (illustrated by a figure in the original post).

Multitenancy and Data

Dedicated physical server (DB resides in isolated physical hosts)
Shared virtualized host (separate DBs on virtual machines)
Database on shared host (separate DB on same physical host)
Dedicated schema within shared databases (same DB, dedicated schema/table)
Shared tables (same DB and schema, segregated by keys – rows)
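
To make the last option concrete, here is a minimal sketch of shared tables segregated by keys. It is a plain in-memory model, not tied to any real database; the class name SharedTable and the tenant ids are made up for illustration.

```python
class SharedTable(object):
    """Toy 'shared table': one store for every tenant, rows segregated by key."""

    def __init__(self):
        # (tenant_id, row_id) -> row; the tenant id is part of every row key
        self._rows = {}

    def put(self, tenant_id, row_id, row):
        self._rows[(tenant_id, row_id)] = dict(row)

    def get(self, tenant_id, row_id):
        return self._rows.get((tenant_id, row_id))

    def rows_for(self, tenant_id):
        # A tenant only ever sees the rows stored under its own id
        return [row for (tid, _), row in self._rows.items() if tid == tenant_id]


table = SharedTable()
table.put('acme', 1, {'name': 'Luis'})
table.put('globex', 1, {'name': 'Ana'})   # same row id, different tenant
print(table.rows_for('acme'))
```

Both tenants use row id 1 without colliding, because the tenant id is part of the key. This is the same idea the Namespace API applies later in the article.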

Before jumping into the APIs, it is important to understand how Google’s internal data storage solution works. Introducing Google’s BigTable technology:

It is a storage solution for Google’s own applications such as Search, Google Analytics, Gmail, AppEngine, etc.

BigTable is NOT:

A database
Horizontally sharded data
A distributed hash table

It IS: a sparse, distributed, persistent multidimensional sorted map. In basic terms, it is a hash of hashes (map of maps, or a dict of dicts). AppEngine data is in one “table” distributed across multiple computers. Every entity has a Key by which it is uniquely identified (Parent + Child + ID), but there is also metadata that tells which GAE application (appId) an Entity belongs to.

As the graph above illustrates, BigTable distributes its data in units called tablets, which are basically slices of the data. These tablets live on different servers in the cloud. To index into a specific record (record and entity mean pretty much the same thing) you use a string of up to 64KB, called a Key. This key has information about the specific row and column value you want to read from. It also contains a timestamp to allow for multiple versions of your data to be stored. In addition, records for a specific entity group are located contiguously, which makes scanning for records efficient.
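
The "sorted map of maps" idea can be sketched in a few lines of Python. This is only a toy model under my own assumptions (the class name SparseSortedMap and the row key format are invented); it says nothing about how BigTable is actually implemented. It shows the three ingredients described above: row key, column, and timestamp, with row keys kept sorted so related rows sit next to each other.

```python
import bisect


class SparseSortedMap(object):
    """Toy model: (row_key, column, timestamp) -> value, sorted by row key."""

    def __init__(self):
        self._row_keys = []  # row keys, kept in sorted order
        self._cells = {}     # row_key -> {column -> {timestamp -> value}}

    def put(self, row_key, column, timestamp, value):
        if row_key not in self._cells:
            bisect.insort(self._row_keys, row_key)
            self._cells[row_key] = {}
        self._cells[row_key].setdefault(column, {})[timestamp] = value

    def get(self, row_key, column, timestamp=None):
        versions = self._cells.get(row_key, {}).get(column, {})
        if not versions:
            return None  # sparse: missing cells simply do not exist
        if timestamp is None:
            timestamp = max(versions)  # default to the newest version
        return versions.get(timestamp)

    def scan(self, prefix):
        # Rows sharing a prefix are contiguous, so a scan is one sorted walk
        start = bisect.bisect_left(self._row_keys, prefix)
        matches = []
        for key in self._row_keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        return matches


table = SparseSortedMap()
table.put('group1/b', 'name', 1, 'Bob')
table.put('group2/a', 'name', 1, 'Carol')
table.put('group1/a', 'name', 1, 'Al')
table.put('group1/a', 'name', 2, 'Alan')  # newer version of the same cell
print(table.scan('group1/'))
print(table.get('group1/a', 'name'))
```

The scan returns the two group1 rows in sorted order regardless of insertion order, and the read without a timestamp returns the newest version of the cell.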

Now we can dive into how Google implements Multitenancy.

Implemented in release 1.3.6 of App Engine, the Namespace API (see resources) is designed to be very customizable, with hooks into your code that you can control, so you can set up multi-tenancy tailored to your application’s needs.

The API works with all of the relevant App Engine APIs (Datastore, Memcache, Blobstore, and Task Queues).

In GAE terms,

namespace == tenant

At the storage level of datastore, a namespace is just like an app-id. Each namespace essentially looks to the datastore as another view into the application’s data. Hence, queries cannot span namespaces (at least for now) and key ranges are different per namespace.

Once an entity is created, its namespace does not change, so doing a

namespace_manager.set_namespace(…)

will have no effect on its key.

Similarly, once a query is created, its namespace is set. Same with
memcache_service()
and all other GAE APIs. Hence it’s important to know which objects have which namespaces.
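
The "namespace is captured at creation time" behaviour can be illustrated with a small stand-in. Everything below is hypothetical: ToyKey and the module-level set_namespace/get_namespace functions only imitate the behaviour described above and are not the App Engine API.

```python
# Toy stand-in for the namespace manager; NOT the real App Engine API.
_current_namespace = ''


def set_namespace(namespace):
    global _current_namespace
    _current_namespace = namespace or ''


def get_namespace():
    return _current_namespace


class ToyKey(object):
    """Captures the current namespace once, when the key is created."""

    def __init__(self, kind, name):
        self.namespace = get_namespace()
        self.kind = kind
        self.name = name


set_namespace('tenant-a')
key = ToyKey('User', 'luis')

set_namespace('tenant-b')      # changing the current namespace afterwards...
print(key.namespace)           # ...does not change the key: tenant-a
print(get_namespace())         # tenant-b
```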

In my mind, since all of GAE user’s data lives in BigTable, it helps to visualize a GAE Key object as:

Application ID | Ancestor Keys | Kind Name | Key Name or ID

All these values provide an address to locate your application’s data. Similarly, you can imagine the multitenant key as:

Application ID | Namespace | Ancestor Keys | Kind Name | Key Name or ID
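
As a rough mental model, the multitenant key can be written down as a tuple. The field names and the make_key helper below are my own and purely illustrative; the real Key class is richer than this.

```python
from collections import namedtuple

# Field order mirrors the layout above; the names are illustrative only.
MultitenantKey = namedtuple(
    'MultitenantKey',
    ['app_id', 'namespace', 'ancestors', 'kind', 'id_or_name'])


def make_key(app_id, namespace, kind, id_or_name, ancestors=()):
    return MultitenantKey(app_id, namespace, tuple(ancestors), kind, id_or_name)


key_a = make_key('my-app', 'tenant-a', 'User', 'luis')
key_b = make_key('my-app', 'tenant-b', 'User', 'luis')

# Same kind and name, but different namespaces: two distinct addresses
print(key_a == key_b)  # False

# Sorting groups keys by application first, then by namespace
print([k.namespace for k in sorted([key_b, key_a])])
```

Because the namespace comes right after the application ID, all keys of one tenant sort together, which fits the earlier remark that key ranges are different per namespace.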

Now let’s briefly discuss the API (Python):

Function Name	Arguments	Description
get_namespace	None	Returns the current namespace, or returns an empty string if the namespace is unset.
set_namespace	namespace: None unsets the default namespace value; otherwise a string matching [0-9A-Za-z._-]{0,100}	Sets the namespace for the current HTTP request.
validate_namespace	value: string containing the namespace being evaluated, which must match [0-9A-Za-z._-]{0,100}. exception=BadValueError	Raises the BadValueError exception if the namespace string is not valid.
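
The validation rule in the table is easy to reproduce outside of App Engine. The sketch below is a standalone approximation: check_namespace and InvalidNamespaceError are hypothetical names of mine, and only the regular expression is taken from the table above.

```python
import re

# Pattern from the table above: up to 100 letters, digits, '.', '_' or '-'
NAMESPACE_RE = re.compile(r'[0-9A-Za-z._-]{0,100}')


class InvalidNamespaceError(ValueError):
    """Hypothetical stand-in for the BadValueError mentioned in the table."""


def check_namespace(value):
    # fullmatch ensures the whole string obeys the pattern, not just a prefix
    if not isinstance(value, str) or NAMESPACE_RE.fullmatch(value) is None:
        raise InvalidNamespaceError('invalid namespace: %r' % (value,))
    return value


print(check_namespace('tenant-42'))
try:
    check_namespace('bad namespace!')
except InvalidNamespaceError as error:
    print(error)
```

Validating tenant identifiers like this before building a namespace name from user input is a cheap safeguard, since the namespace usually comes from a subdomain, a login, or a request parameter.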

Here is a quick example:

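Datastore Example

from google.appengine.api import namespace_manager

# getTenant() is assumed to be the application's own helper returning the tenant id
tid = getTenant()

# Save the current namespace so it can be restored afterwards
namespace = namespace_manager.get_namespace()

try:
    namespace_manager.set_namespace('tenant-' + str(tid))

    # Any datastore operations done here run in the tenant's namespace
    user = User('Luis', 'Atencio')
    user.put()
finally:
    # Restore the saved namespace
    namespace_manager.set_namespace(namespace)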

The important thing to notice here is the pattern that GAE provides. It will be exactly the same for the Java APIs. The finally block is immensely important, as it restores the namespace to what it was originally (before the request). Omitting the finally block will cause the namespace to stay set for the duration of the request. That means that any later API access, whether it is a datastore query or a Memcache retrieval, will use the namespace previously set.
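
Since the save/set/restore pattern repeats everywhere, one option is to wrap it in a context manager. The sketch below is an assumption of mine, not something the Namespace API ships: FakeNamespaceManager merely mimics get_namespace/set_namespace so the example runs anywhere, and tenant_namespace is a made-up helper name.

```python
from contextlib import contextmanager


class FakeNamespaceManager(object):
    """Stand-in that mimics get_namespace/set_namespace for illustration."""

    def __init__(self):
        self._namespace = ''

    def get_namespace(self):
        return self._namespace

    def set_namespace(self, namespace):
        self._namespace = '' if namespace is None else namespace


@contextmanager
def tenant_namespace(manager, tenant_id):
    saved = manager.get_namespace()          # remember the original namespace
    manager.set_namespace('tenant-' + str(tenant_id))
    try:
        yield
    finally:
        manager.set_namespace(saved)         # always restore, even on errors


manager = FakeNamespaceManager()
with tenant_namespace(manager, 7):
    print(manager.get_namespace())           # tenant-7
print(repr(manager.get_namespace()))         # '' (restored)
```

In a real application the manager argument would presumably be the namespace_manager module itself, but I have not shown that here.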

Furthermore, to query for all the namespaces created, GAE provides some meta queries, as such:

Metaqueries

from google.appengine.ext.db.metadata import Namespace

# Wrapped in a function (the name is illustrative) so that the return is valid
def list_namespaces(start_ns, end_ns, limit):
    q = Namespace.all()
    # Optionally restrict the query to a range of namespace keys
    if start_ns:
        q.filter('__key__ >=', Namespace.key_for_namespace(start_ns))
    if end_ns:
        q.filter('__key__ <=', Namespace.key_for_namespace(end_ns))

    results = q.fetch(limit)
    # Reduce the namespace objects into a list of namespace names
    tenants = [ns.namespace_name for ns in results]
    return tenants

Resources:

Reference: Multitenancy in Google AppEngine (GAE) from our JCG partner Luis Atencio at the Reflective Thought blog.
