Multitenancy in Google AppEngine (GAE)

Multitenancy is a topic that has been discussed for many years, and there are many excellent references readily available, so I will present only a brief introduction.

Multitenancy is a software architecture where a single instance of the software runs on a server, serving multiple client organizations (tenants). With a multitenant architecture, an application can be designed to virtually partition its data and configuration (business logic), and each client organization works with a customized virtual application instance.

It suits SaaS (Software as a Service) cloud computing very well; however, multitenant applications can be very complex to implement. The architect must pay careful attention to security and access control, among other concerns.

Multitenancy can exist in several different flavors:

Multitenancy in Deployment

  1. Fully isolated business logic (dedicated server, customized business process)
  2. Virtualized Application Servers (dedicated application server, single VM per app server)
  3. Shared virtual servers (dedicated application server on shared VM)
  4. Shared application servers (threads and sessions)

This spectrum of different installations can be seen here:

Multitenancy and Data

  1. Dedicated physical server (DB resides in isolated physical hosts)
  2. Shared virtualized host (separate DBs on virtual machines)
  3. Database on shared host (separate DB on same physical host)
  4. Dedicated schema within shared databases (same DB, dedicated schema/table)
  5. Shared tables (same DB and schema, segregated by keys – rows)
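To make the last flavor concrete, here is a minimal sketch of shared tables using Python's built-in sqlite3 module. The users table, its tenant_id column, and the users_for_tenant helper are hypothetical; the point is only that every row carries a tenant key and every query has to filter on it.

```python
import sqlite3

# Hypothetical schema: every row carries a tenant_id column so that
# all tenants can share one table (flavor 5 above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (tenant_id, name) VALUES (?, ?)",
    [("acme", "Alice"), ("acme", "Bob"), ("globex", "Carol")],
)


def users_for_tenant(conn, tenant_id):
    """Return the user names visible to a single tenant.

    The tenant filter must be applied on every query; forgetting it
    would expose other tenants' rows, the main risk of this flavor.
    """
    rows = conn.execute(
        "SELECT name FROM users WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    )
    return [name for (name,) in rows]


print(users_for_tenant(conn, "acme"))    # ['Alice', 'Bob']
print(users_for_tenant(conn, "globex"))  # ['Carol']
```

The trade-off is that isolation now lives entirely in application code, which is exactly why the access-control concerns mentioned earlier matter so much.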

Before jumping into the APIs, it is important to understand how Google’s internal data storage solution works. Introducing Google’s BigTable technology:

It is the storage solution behind Google’s own applications such as Search, Google Analytics, Gmail, AppEngine, etc.

BigTable is NOT:

  • A database
  • Horizontally sharded data
  • A distributed hash table

It IS: a sparse, distributed, persistent, multidimensional sorted map. In basic terms, it is a hash of hashes (a map of maps, or a dict of dicts). All AppEngine data lives in one “table” distributed across many computers. Every entity has a Key that uniquely identifies it (Parent + Child + ID), and there is also metadata that records which GAE application (appId) an entity belongs to.

As the graph above shows, BigTable distributes its data in units called tablets, which are ranges of rows (slices of the data). These tablets live on different servers in the cloud. To index into a specific record (record and entity mean pretty much the same thing here), you use a string of up to 64KB called a Key. The key identifies the specific row and column value you want to read, and it also carries a timestamp so that multiple versions of your data can be stored. In addition, records belonging to the same entity group are stored contiguously, which makes scanning them efficient.
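To make "sparse, distributed, persistent multidimensional sorted map" a little more concrete, here is a single-process toy in Python. It is only a mental model, not how BigTable is implemented: it maps (row, column, timestamp) to a value, keeps several timestamped versions per cell, and answers a row-prefix scan by walking keys in sorted order. ToySortedMap and the row/column names are made up for illustration.

```python
import bisect


class ToySortedMap:
    """Toy (row, column, timestamp) -> value map kept in sorted key order."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # (row, column, -timestamp) -> value

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so newer versions sort first within a cell
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value of a cell, or None if the cell is empty."""
        i = bisect.bisect_left(self._keys, (row, column))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, prefix):
        """Yield (row, column, timestamp, value) for rows starting with prefix."""
        # Rows sharing a prefix are adjacent in sorted order, so one pass suffices
        i = bisect.bisect_left(self._keys, (prefix,))
        while i < len(self._keys) and self._keys[i][0].startswith(prefix):
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


m = ToySortedMap()
m.put("users/1", "name", 1, "Luis")
m.put("users/1", "name", 2, "Luis A.")  # a newer version of the same cell
m.put("users/2", "name", 1, "Ana")
m.put("orders/9", "total", 1, "42")
print(m.get("users/1", "name"))                        # Luis A.
print([row for row, _, _, _ in m.scan_rows("users/")])  # ['users/1', 'users/1', 'users/2']
```

Keeping keys sorted is what makes the contiguous scan cheap; in BigTable the same ordering is what lets a row range be split into tablets.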

Now we can dive into how Google implements Multitenancy.

Introduced in release 1.3.6 of App Engine, the Namespace API (see resources) is designed to be highly customizable, with hooks into your code that you control, so you can set up multitenancy tailored to your application’s needs.

The API works with all of the relevant App Engine APIs (Datastore, Memcache, Blobstore, and Task Queues).

In GAE terms,

namespace == tenant

At the datastore's storage level, a namespace behaves much like an app-id: each namespace looks to the datastore like another view into the application’s data. Hence, queries cannot span namespaces (at least for now), and key ranges are separate for each namespace.
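The isolation described above can be modeled with a toy in-memory store in Python. This is not the GAE Datastore API; ToyNamespacedStore and its methods are hypothetical. It shows the two properties just mentioned: each namespace gets its own ID (key) range, and a query only ever sees a single namespace.

```python
from collections import defaultdict
from itertools import count


class ToyNamespacedStore:
    """Toy store illustrating namespace isolation (not the GAE Datastore API)."""

    def __init__(self):
        # namespace -> kind -> {id: entity}
        self._data = defaultdict(lambda: defaultdict(dict))
        # Each (namespace, kind) pair has its own ID counter, i.e. its own key range
        self._ids = defaultdict(lambda: count(1))

    def put(self, namespace, kind, entity):
        new_id = next(self._ids[(namespace, kind)])
        self._data[namespace][kind][new_id] = entity
        return (namespace, kind, new_id)

    def query(self, namespace, kind):
        # A query is bound to exactly one namespace; there is no cross-namespace scan
        return list(self._data[namespace][kind].values())


store = ToyNamespacedStore()
k1 = store.put("tenant-1", "User", {"name": "Luis"})
k2 = store.put("tenant-2", "User", {"name": "Ana"})
print(k1, k2)                           # ('tenant-1', 'User', 1) ('tenant-2', 'User', 1)
print(store.query("tenant-1", "User"))  # [{'name': 'Luis'}]
```

Both tenants received ID 1 for their first User, which is what "key ranges are separate for each namespace" means in practice.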

Once an entity is created, its namespace does not change, so calling

namespace_manager.set_namespace(…)

afterwards will have no effect on its key.

Similarly, once a query is created, its namespace is fixed. The same applies to
memcache_service()
and all other GAE APIs, so it is important to know which objects carry which namespaces.

In my mind, since all GAE users’ data lives in BigTable, it helps to visualize a GAE Key object as:

Application ID | Ancestor Keys | Kind Name | Key Name or ID

All these values provide an address to locate your application’s data. Similarly, you can imagine the multitenant key as:

Application ID | Namespace | Ancestor Keys | Kind Name | Key Name or ID
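A toy Python model of the multitenant key layout above helps show why changing the namespace later has no effect on existing keys: the namespace is copied into the key at the moment the key is built. ToyKey, make_key, and the module-level set_namespace below are illustrative stand-ins, not the SDK's Key class or namespace_manager.

```python
from typing import NamedTuple, Optional, Tuple


class ToyKey(NamedTuple):
    """Toy key: Application ID | Namespace | Ancestor Keys | Kind | Name or ID."""
    app_id: str
    namespace: str
    ancestors: Tuple[Tuple[str, object], ...]
    kind: str
    name_or_id: object


_current_namespace = ""  # stand-in for the per-request namespace setting


def set_namespace(namespace: Optional[str]) -> None:
    global _current_namespace
    _current_namespace = namespace or ""  # None unsets (falls back to "")


def make_key(app_id, kind, name_or_id, ancestors=()):
    # The namespace is read once, when the key is built, and stored in the key
    return ToyKey(app_id, _current_namespace, tuple(ancestors), kind, name_or_id)


set_namespace("tenant-1")
k1 = make_key("my-app", "User", "luis")
set_namespace("tenant-2")  # does not touch k1
k2 = make_key("my-app", "User", "luis")
print(k1.namespace, k2.namespace)  # tenant-1 tenant-2
print(k1 == k2)                    # False
```

The two keys share kind and name but live at different addresses, because the namespace is part of the key itself.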

Now let’s briefly discuss the API (Python):

Function | Arguments | Description
get_namespace | None | Returns the current namespace, or an empty string if the namespace is unset.
set_namespace | namespace: a value of None unsets the default namespace; otherwise a string matching ([0-9A-Za-z._-]{0,100}) | Sets the namespace for the current HTTP request.
validate_namespace | value: string containing the namespace being evaluated; exception: BadValueError | Raises the BadValueError exception if the namespace string does not match ([0-9A-Za-z._-]{0,100}).
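The validation rule in the table can be sketched in plain Python with the re module. This is a hypothetical re-implementation for illustration, not the SDK's code; BadValueError is defined locally so the snippet runs on its own.

```python
import re

# Pattern from the table: up to 100 characters drawn from [0-9A-Za-z._-]
_NAMESPACE_RE = re.compile(r"[0-9A-Za-z._-]{0,100}")


class BadValueError(ValueError):
    """Local stand-in for the SDK's BadValueError."""


def validate_namespace(value, exception=BadValueError):
    """Raise `exception` unless `value` is a valid namespace string."""
    # fullmatch ensures the whole string fits the pattern, not just a prefix
    if not isinstance(value, str) or not _NAMESPACE_RE.fullmatch(value):
        raise exception("Invalid namespace: %r" % (value,))


validate_namespace("tenant-42")  # passes silently
try:
    validate_namespace("tenant 42")  # spaces are not allowed
except BadValueError as e:
    print(e)  # Invalid namespace: 'tenant 42'
```

Note that the empty string is valid under this pattern; it corresponds to the default namespace.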

Here is a quick example:

Datastore Example
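from google.appengine.api import namespace_manager
from google.appengine.ext import db


class User(db.Model):
    # Example model; the property names are illustrative
    first_name = db.StringProperty()
    last_name = db.StringProperty()


tid = getTenant()  # application-specific helper that identifies the current tenant

# Save the current namespace so it can be restored afterwards
namespace = namespace_manager.get_namespace()

try:
    namespace_manager.set_namespace('tenant-' + str(tid))

    # Any datastore operations done here are scoped to this tenant
    user = User(first_name='Luis', last_name='Atencio')
    user.put()
finally:
    # Restore the saved namespace
    namespace_manager.set_namespace(namespace)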

The important thing to notice here is the pattern GAE encourages: save the current namespace, set the tenant's namespace, do the work, and restore the saved namespace. The Java API follows exactly the same pattern. The finally block is essential because it restores the namespace to what it was before the try block, even if an exception is raised. Omitting it leaves the tenant's namespace set for the rest of the request, so any later API access, whether datastore queries or Memcache lookups, will use that previously set namespace.
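The same save/set/restore steps can be packaged as a context manager so the finally clause cannot be forgotten. The sketch below uses local stand-ins for get_namespace and set_namespace so it runs without the SDK; with the real SDK you would call namespace_manager.get_namespace() and namespace_manager.set_namespace() instead. The name tenant_namespace is hypothetical.

```python
from contextlib import contextmanager

# Stand-in for the request-wide namespace that namespace_manager keeps
_namespace = {"current": ""}


def get_namespace():
    return _namespace["current"]


def set_namespace(namespace):
    _namespace["current"] = namespace or ""  # None unsets (falls back to "")


@contextmanager
def tenant_namespace(namespace):
    """Switch to `namespace` inside the block and always restore the previous one."""
    saved = get_namespace()
    set_namespace(namespace)
    try:
        yield
    finally:
        # Runs even if the block raises, like the finally clause in the example
        set_namespace(saved)


with tenant_namespace("tenant-7"):
    print(get_namespace())  # tenant-7
print(repr(get_namespace()))  # ''
```

The with statement makes the restore step part of the syntax, which is harder to omit than a hand-written try/finally.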

Furthermore, to list all the namespaces that have been created, GAE provides metadata queries, such as:

Metaqueries
from google.appengine.ext.db.metadata import Namespace


def list_tenants(limit, start_ns=None, end_ns=None):
    """Return up to `limit` namespace names between start_ns and end_ns (inclusive)."""
    q = Namespace.all()
    if start_ns:
        q.filter('__key__ >=', Namespace.key_for_namespace(start_ns))
    if end_ns:
        q.filter('__key__ <=', Namespace.key_for_namespace(end_ns))

    results = q.fetch(limit)
    # Reduce the Namespace entities to a list of namespace names
    return [ns.namespace_name for ns in results]
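Assuming namespace keys sort in the same order as the namespace name strings, the two __key__ filters above select a contiguous, inclusive range of names. A pure-Python analogue over an in-memory list makes that behavior easy to check; namespaces_in_range is a hypothetical helper, not part of the metadata API.

```python
import bisect


def namespaces_in_range(names, start_ns=None, end_ns=None):
    """Return names n with start_ns <= n <= end_ns, in sorted order."""
    ordered = sorted(names)
    # bisect_left keeps start_ns itself (>=); bisect_right keeps end_ns itself (<=)
    lo = 0 if start_ns is None else bisect.bisect_left(ordered, start_ns)
    hi = len(ordered) if end_ns is None else bisect.bisect_right(ordered, end_ns)
    return ordered[lo:hi]


names = ["tenant-3", "other", "tenant-1", "tenant-2"]
print(namespaces_in_range(names, "tenant-1", "tenant-2"))  # ['tenant-1', 'tenant-2']
```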

Resources: 

Reference: Multitenancy in Google AppEngine (GAE) from our JCG partner Luis Atencio at the Reflective Thought blog.
