Movie Database App: IMDb API

Jacob Zimmerman, March 23rd, 2022 (Last Updated: March 21st, 2022)


Thus begins the first real article in my new Movie Database App series. Today, we’ll start looking at how I’m working with the IMDb API. Keep in mind it’s still a work in progress and could remain that way until the full application is done. For instance, I still don’t have any of the “safety” features in place, like timeouts, retries, and “circuit breakers”, maybe with backups. Also keep in mind that, while I’ll tell you my process in order when it matters, I may tell things out of order for clarity.

If you’d like to kind of follow along a little, you can find the repo here.

How I Started

The first thing I did was make an ImdbApi class to house everything I needed, namely the API key and the methods that invoke the web API’s “methods”. I started with just their basic search for what they call “titles”, which includes movies and tv shows. I used requests to do a simple lookup, and it sent back a big ol’ pile of unformatted json data.

Next thing I did was to dig through that data and figure out what data I cared about from it. I put together a Python dataclass called Title to hold it in and gave it a classmethod that could take in the json data and pull out only what I wanted. This worked well, so I did similar things with a few other API calls where you can get full data on a specific movie, tv show, season, or episode.
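To make that concrete, here is a minimal sketch of a dataclass with a classmethod that keeps only the interesting fields. It is not the code from the repo: the field names and the json keys ("id", "title", "description", "image") are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Title:
    """The handful of fields the app keeps from one search result."""

    id: str
    title: str
    description: str
    image_url: str

    @classmethod
    def from_json(cls, data: Mapping[str, Any]) -> "Title":
        # The key names are assumed for illustration; the real response
        # may spell them differently. Unknown keys are simply ignored.
        return cls(
            id=data["id"],
            title=data["title"],
            description=data.get("description", ""),
            image_url=data.get("image", ""),
        )


# Made-up sample data with one key the dataclass does not care about.
raw = {
    "id": "tt0000001",
    "title": "Example Movie",
    "description": "(1999)",
    "image": "https://example.com/poster.jpg",
    "resultType": "Title",
}
title = Title.from_json(raw)
```

Keeping the translation in a classmethod means the rest of the app never has to know what the raw json looks like.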

Adding More Functionality

Then I wanted to beef up the features of a couple of the other data types, namely posters and images (which are separate and work differently on their site). Instead of storing them as simple dicts (in the case of posters) or string urls (in the case of images), I wanted them to have separate pieces of data that could be used to request the other sizes provided as well.

Posters

Posters come in very few sizes, yet they’re the ones that come with all the extra information: aspect ratio, width, height, id, a link to the “original”, and even the language of any text written on them. But all that sizing information is nearly useless, since you can’t ask for custom sizes. You either get the “original”, which has those given dimensions, or you request one of a fixed set of resolutions that come as either a square cutout or “wide”, which is 16:9 or 9:16, depending on whether the original poster was landscape or portrait.

Anyway, I put together an enum that contained all the size options and a class that stored the “base” url and a single method that could take a given poster id and size option and return a new url to download that image from.
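A rough sketch of that enum-plus-builder shape follows. The size names, the path segments, and the url layout are all hypothetical, since the real options depend on what the web API offers.

```python
from enum import Enum


class PosterSize(Enum):
    # Hypothetical options; the values are made-up path segments.
    ORIGINAL = "original"
    SQUARE_SMALL = "s200"
    WIDE_SMALL = "w320"


class PosterUrls:
    """Stores the base url and builds a download url per poster and size."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url.rstrip("/")

    def url_for(self, poster_id: str, size: PosterSize) -> str:
        # Assumed layout: <base>/<size>/<poster id>
        return f"{self._base_url}/{size.value}/{poster_id}"


posters = PosterUrls("https://example.com/posters/")
url = posters.url_for("abc123.jpg", PosterSize.WIDE_SMALL)
```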

Images

Then there’s images, which are only given to you as a url to the original picture. But there’s plenty of information inside the url. There’s their id, aspect ratio (which is part of the id), and the size (which is either “original” or widthxheight for custom resolutions). So I created a similar class for generating urls, but it has several methods that allow you to request images that “fit” or “fill” certain areas, give specific dimensions, or just ask for the original.

To handle two completely different ways of sizing images, I created a base ImageSize class with two subclasses: Dimensions and Original. Original is kind of a singleton, though that isn’t enforced because it isn’t necessary; it’s more of a caching optimization. The base class is the only “public” one, and it has several factory methods for creating a new ImageSize from a url, a string version of widthxheight, or more directly by asking for specific dimensions or the original.
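One way that hierarchy could look is sketched below. The method names (original, of, parse, as_path_segment) are invented for this example, and the factory that parses a full url is left out because it depends on the real url layout.

```python
import re
from typing import Optional


class ImageSize:
    """Base class; callers are meant to use only the factory methods."""

    _original: Optional["ImageSize"] = None

    @staticmethod
    def original() -> "ImageSize":
        # Reuse one Original instance. This is only a caching
        # optimization; nothing stops anyone from creating another.
        if ImageSize._original is None:
            ImageSize._original = Original()
        return ImageSize._original

    @staticmethod
    def of(width: int, height: int) -> "ImageSize":
        return Dimensions(width, height)

    @staticmethod
    def parse(text: str) -> "ImageSize":
        """Accept either 'original' or a 'WIDTHxHEIGHT' string."""
        if text == "original":
            return ImageSize.original()
        match = re.fullmatch(r"(\d+)x(\d+)", text)
        if match is None:
            raise ValueError(f"not a valid image size: {text!r}")
        return Dimensions(int(match.group(1)), int(match.group(2)))

    def as_path_segment(self) -> str:
        raise NotImplementedError


class Original(ImageSize):
    def as_path_segment(self) -> str:
        return "original"


class Dimensions(ImageSize):
    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

    def as_path_segment(self) -> str:
        return f"{self.width}x{self.height}"


size = ImageSize.parse("640x360")
```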

Splitting Things Up

At this point, I had a pile of stuff all in one file, and it was getting too long for my taste. So I started to separate the more technical API stuff from the data the app would actually use. Anything that directly dealt with creating URLs was put into what I dubbed the bareapi module, along with a few of the other poster and image classes dealing with size details.

But that ended up with circular references between the high-level and low-level modules. The high-level data classes used the poster and image classes to do the translation work when creating themselves, and the API class was creating and returning those high-level data classes.

I was completely okay with high-level code referencing low-level code, but it shouldn’t be the other way around. There are a couple ways I could have done it, but what I decided to do was to layer the API class into two classes. The high-level API class would direct calls to the low-level one, which would create the urls based on input and return the raw data. Then the high-level one would take that raw data and transform it into the high-level data classes.
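The layering can be sketched roughly like this. The class name BareApi, the injected fetch_json callable, and the url layout are assumptions for illustration only; a fake fetch function stands in for the real HTTP call so the example runs without a network.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List
from urllib.parse import quote

FetchJson = Callable[[str], Dict[str, Any]]


class BareApi:
    """Low level: builds urls and returns the raw dicts."""

    def __init__(self, api_key: str, fetch_json: FetchJson,
                 base_url: str = "https://example.com/api") -> None:
        self._api_key = api_key
        self._fetch_json = fetch_json
        self._base_url = base_url

    def search_titles(self, expression: str) -> Dict[str, Any]:
        # Hypothetical url layout, not the real web API's.
        url = f"{self._base_url}/SearchTitle/{self._api_key}/{quote(expression)}"
        return self._fetch_json(url)


@dataclass(frozen=True)
class Title:
    id: str
    title: str


class ImdbApi:
    """High level: same method names, but returns data classes."""

    def __init__(self, bare: BareApi) -> None:
        self._bare = bare

    def search_titles(self, expression: str) -> List[Title]:
        raw = self._bare.search_titles(expression)
        return [Title(id=r["id"], title=r["title"])
                for r in raw.get("results", [])]


requested: List[str] = []


def fake_fetch(url: str) -> Dict[str, Any]:
    # Stand-in for the network call; records the url and returns canned data.
    requested.append(url)
    return {"results": [{"id": "tt0000001", "title": "Example", "extra": "ignored"}]}


api = ImdbApi(BareApi("my-key", fake_fetch))
titles = api.search_titles("star wars")
```

Only the high-level class imports the data classes, so the dependency arrows all point downward.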

This creates two classes with nearly perfectly mirrored APIs, with the only real difference being that the low-level one returns dicts and the high-level one returns data classes. Having nearly identical (on the face) classes does cause some issues. First, sometimes I have to stop and think about which one I’m looking at. Second, any new additions or changes often have to be made to both. Seeing that this is effectively a Decorator Pattern with some slight differences, this is to be expected. But there are other options that wouldn’t have this problem, and maybe someday I’ll switch to one of them. For now, I’m fine with this.

Adding a Cache

Around this time, I ran into the issue that my free API key only allowed me 100 uses per day. I was doing repeats of the same call as part of manual testing, and it added up. I needed a cache for these calls. Since the calls were being made over multiple runs, a memory-based cache wouldn’t do; I needed a file-based cache. And the simplest (and most exciting) option for that was to use a SQLite database.

I had been wanting to program something that used SQLite for quite some time, so now was my chance. I programmed up a simple class to use SQLite as a key-value store where the url was the key, and the response was the value.
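A key-value store on top of SQLite needs very little code. The sketch below uses the standard library’s sqlite3 module; the class, table, and column names are made up for the example, and it stores response bodies as bytes.

```python
import sqlite3
from typing import Optional


class SqliteCache:
    """Key-value store: url (key) -> response body bytes (value)."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "url TEXT PRIMARY KEY, body BLOB NOT NULL)"
        )
        self._conn.commit()

    def get(self, url: str) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT body FROM responses WHERE url = ?", (url,)
        ).fetchone()
        return None if row is None else bytes(row[0])

    def put(self, url: str, body: bytes) -> None:
        # The connection as a context manager commits on success
        # and rolls back if the statement raises.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO responses (url, body) VALUES (?, ?)",
                (url, body),
            )

    def close(self) -> None:
        self._conn.close()


# ":memory:" keeps the demo self-contained; a real cache would pass a file path
# so the data survives between runs.
cache = SqliteCache(":memory:")
cache.put("https://example.com/a", b'{"title": "Example"}')
body = cache.get("https://example.com/a")
```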

Side Note

There were 3 different kinds of responses I would get from the web API. One was pure bytes for a file, which was used for downloading both picture types. The other two were json responses, but some needed me to extract a “results” dictionary within that to pull out the useful data, and the others had the useful data at the top level of the json response.

I had 3 different functions that were used to handle those cases.
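Those three cases can be sketched as three small functions that each take the raw response body. The function names are invented here; only the “results” key comes from the description above.

```python
import json
from typing import Any, Dict


def as_file(body: bytes) -> bytes:
    """Picture downloads: the bytes are the result."""
    return body


def as_json(body: bytes) -> Dict[str, Any]:
    """Responses with the useful data at the top level."""
    return json.loads(body)


def as_results(body: bytes) -> Any:
    """Responses that wrap the useful data in a "results" entry."""
    return json.loads(body)["results"]


top_level = as_json(b'{"id": "tt0000001", "title": "Example"}')
wrapped = as_results(b'{"results": {"id": "tt0000001"}, "errorMessage": ""}')
```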

The Cache Class

I originally tried to have the cache directly wrap those 3 functions, but before I got that far, I decided to make it a proxy around requests.get(). Unfortunately, that failed because you can’t just save a Response object into the database, and I didn’t want to pickle anything. I got stuck, it was late, so I went to bed. The next morning, I came back to it and realized that I could store the raw response bytes each time and rebuild the Response object from that. Or, at least I theorized I could, and I investigated the requests documentation to see how that might work. In the end, yes, I could do that. I take the bytes out of the content property, and then I can create a new Response later, inserting the bytes into the _content attribute.
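Here is a small sketch of rebuilding a Response from stored bytes. The helper name and the choice to also restore the url and a status code are assumptions of this example. Note that _content is a private attribute of requests, so this leans on an implementation detail that could change between versions.

```python
import requests


def rebuild_response(url: str, body: bytes, status_code: int = 200) -> requests.Response:
    """Create a Response whose content is the previously cached bytes."""
    response = requests.Response()
    response.url = url
    response.status_code = status_code
    # Private attribute: Response.content returns this value when it is set.
    response._content = body
    return response


rebuilt = rebuild_response("https://example.com/a", b'{"title": "Example"}')
data = rebuilt.json()
```

Because the rebuilt object behaves like a normal Response for content and json(), the code calling the cache does not need to know where the data came from.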

Memory Cache

In the end, I wanted a memory-based cache, too, so it wouldn’t have to check the database with every call. If I’d made the call before during this run, then it’d be cached in memory too. So I made a Least-Recently-Used cache that is checked first before checking the database.
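A least-recently-used cache in front of a slower store can be sketched with OrderedDict. The names are hypothetical, and a plain dict stands in for the SQLite cache to keep the example short.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Optional


class LruCache:
    """Small in-memory cache that evicts the least recently used entry."""

    def __init__(self, max_entries: int = 128) -> None:
        self._max_entries = max_entries
        self._entries = OrderedDict()  # url -> bytes, oldest first

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: str, value: bytes) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry


def lookup(url: str, memory: LruCache, disk: Dict[str, bytes],
           fetch: Callable[[str], bytes]) -> bytes:
    """Check memory first, then the disk cache, then make the real call."""
    body = memory.get(url)
    if body is not None:
        return body
    body = disk.get(url)
    if body is None:
        body = fetch(url)
        disk[url] = body
    memory.put(url, body)
    return body


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"payload"


memory = LruCache(max_entries=2)
disk: Dict[str, bytes] = {}
first = lookup("https://example.com/a", memory, disk, fake_fetch)
second = lookup("https://example.com/a", memory, disk, fake_fetch)
```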

Bubbling Up

Being the paranoid guy that I am, there is an optional parameter all the way down the API call chain that can tell the cache to invalidate its data for the current call. Still paranoid, I want the user to know whether the data they’re looking at is from the cache or not so that they can know whether they might want to tell it to do the full trip, in case they believe there might be new information. To do that, a flag needs to bubble up through the return values.
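Both ideas, the opt-out parameter and the flag that travels back up, fit in a few lines. In this sketch the names Fetched, from_cache, and refresh are made up, and a dict stands in for the real cache.

```python
from typing import Callable, Dict, List, NamedTuple


class Fetched(NamedTuple):
    body: bytes
    from_cache: bool  # lets callers decide whether to ask for a fresh copy


def cached_get(url: str, cache: Dict[str, bytes],
               fetch: Callable[[str], bytes], refresh: bool = False) -> Fetched:
    # refresh=True skips the cached entry and overwrites it with new data.
    if not refresh and url in cache:
        return Fetched(cache[url], from_cache=True)
    body = fetch(url)
    cache[url] = body
    return Fetched(body, from_cache=False)


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"payload"


store: Dict[str, bytes] = {}
first = cached_get("https://example.com/a", store, fake_fetch)
second = cached_get("https://example.com/a", store, fake_fetch)
forced = cached_get("https://example.com/a", store, fake_fetch, refresh=True)
```

Every layer above would pass refresh down and hand the from_cache flag back up alongside its own return value.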

API MVP Complete

I’m now at the minimum viable product of this API, and it’s time to start working on the real domain space. There are still the safety features to add, as mentioned above, but most of that will be another wrapper around this. So I’ll package this up and put it on GitHub soon. It won’t be an officially released package because it’s a very limited implementation of the API, but you’ll be able to use the code if you want. I hope to have enough work done in time to put up a decent article about it in a week. We’ll see.

Published on Java Code Geeks with permission by Jacob Zimmerman, partner at our JCG program. See the original article here: Movie Database App: IMDb API

Opinions expressed by Java Code Geeks contributors are their own.
