Changes in url-shortener

When I introduced my URL shortener project, I already had a working version and I thought I only needed to improve its front-end to be able to release the first stable version of the application. I also had some ideas for new features to be added to it.

However, the longer I looked at my code, the more room for improvement I saw. Although I ended up adding some new features, most of the changes I have made since then were refactorings, changes in the design and architecture of the application, and code style improvements.

In this post, I’m going to describe some of them.

Front-end

I designed the front-end according to the rules of responsive web design. The initial CSS styles ensure it looks good on mobile screens, while the styles specified in media queries override them to adjust the front-end to larger displays.

Here are examples of the current result:

The main page on a mobile screen...
... and on a desktop screen.

A full gallery is available here.

New classes for representing, creating and converting aliases

Alias values are no longer represented by a dedicated class – they are just simple string objects.

AliasAlphabet partially replaces the old Alias and NumeralSystem classes. It represents an ordered set of characters that can appear in valid alias strings and contains methods for creating aliases.

The IntegerAlias class is now the only one responsible for converting between alias strings and their integer representations, as it should have been from the beginning. It no longer just calls methods responsible for the conversion, but contains the code that actually performs it.

Instances of the class depend on instances of AliasAlphabet and use them for the conversion. The constructor of the IntegerAlias class makes sure that every alias string generated by the AliasAlphabet instance passed to it can be converted to an integer less than or equal to the maximum 32-bit signed integer.
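To make the conversion concrete, here is a minimal sketch of how such a pair of classes could work. It is not the project's actual code: it assumes aliases are read as positional numerals whose base is the alphabet length, and the constructor arguments and method names (characters, max_length, to_integer, to_alias) are hypothetical.

```python
MAX_INT32 = 2**31 - 1


class AliasAlphabet:
    """Simplified stand-in: an ordered set of allowed characters."""

    def __init__(self, characters, max_length):
        self.characters = characters
        self.max_length = max_length


class IntegerAlias:
    """Converts alias strings to integers and back (assumed base-n scheme)."""

    def __init__(self, alphabet):
        base = len(alphabet.characters)
        # Largest value an alias of max_length can have in this scheme.
        if base ** alphabet.max_length - 1 > MAX_INT32:
            raise ValueError('some aliases would not fit in a signed int32')
        self._chars = alphabet.characters
        self._base = base
        self._index = {char: i for i, char in enumerate(self._chars)}

    def to_integer(self, alias):
        value = 0
        for char in alias:
            value = value * self._base + self._index[char]
        return value

    def to_alias(self, value):
        digits = []
        while True:
            value, remainder = divmod(value, self._base)
            digits.append(self._chars[remainder])
            if value == 0:
                break
        return ''.join(reversed(digits))
```

In this simplified scheme the first character of the alphabet behaves like a leading zero, so an alias starting with it does not survive a round trip. A real implementation would have to account for that.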

The two responsibilities, creating an alias value and making sure each possible value has a valid integer representation, are now separated. This is good because the second responsibility is related to storage while the first is not, so the current design adheres better to the single responsibility principle. In addition, the code that generates an alias value is now simpler, and every class in the system contains substantial code of its own rather than merely delegating to other classes.

There is only one downside to the current implementation: for some combinations of alias alphabet length and maximum new alias length, some aliases of the maximum length have integer values below the maximum value of int32, while others exceed it. In the current implementation, instances of AliasAlphabet configured with such a maximum new alias length are treated as invalid.
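The arithmetic behind this limitation can be sketched as follows. The function is hypothetical and assumes, purely for illustration, a positional base-n conversion in which an alias of a given length does not start with the first ("zero") character of the alphabet.

```python
MAX_INT32 = 2**31 - 1


def classify_max_length(alphabet_size, max_length):
    """Tell whether all, some or none of the aliases of max_length fit in int32."""
    # Smallest value of an alias of this length without a leading "zero".
    smallest = alphabet_size ** (max_length - 1)
    # Largest value: every position holds the last character of the alphabet.
    largest = alphabet_size ** max_length - 1
    if largest <= MAX_INT32:
        return 'all fit'
    if smallest <= MAX_INT32:
        return 'some fit'
    return 'none fit'
```

For a 32-character alphabet, a maximum length of 6 gives 'all fit', 7 gives 'some fit' and 8 gives 'none fit'. The middle case is the one that the old implementation supported and the current one rejects.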

In the previous implementation, the responsibilities of creating a valid alias value and converting it to an integer were both assigned to the Alias class. Alias values were first generated as integers. If only some aliases of a given maximum length were valid, that is, if only some of their integer values were less than or equal to max int32, the factory simply chose max int32 as the upper limit for the random number generator, instead of the maximum integer value of any alias of that length.

The current implementation is not as flexible. I could have changed it, but I decided it wasn’t a big problem, so I didn’t do it.

New way of shortening a URL

Assigning a random alias to a target URL is no longer performed by an event handler. Instead, the method responsible for creating a new, randomly generated alias, AliasAlphabet.create_random, is passed as the default parameter of the SQLAlchemy Column object representing the alias column in the database.

Previously, both adding a URL object to the database session and committing pending changes to the database were performed by the same function: url_shortener.models.register. These operations have been split: adding a URL object to the session and caching it to ensure it is always unique is now performed by the get_or_create class method of the BaseTargetURL class (which replaces the previously used ShortenedURL class). The code responsible for it is based on this recipe.

With the BaseTargetURL class and the commit function (replacing the register function) as currently implemented, it is possible to create multiple target URL objects without performing any database insertions and then commit all of them at once, in one transaction.
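The idea of separating object creation from committing can be illustrated with a simplified sketch. It uses the standard sqlite3 module instead of SQLAlchemy, so it is not the project's code; the TargetURL class, the table name and the cache shown here are assumptions made for the example.

```python
import sqlite3


class TargetURL:
    """Plain object standing in for a mapped model class."""

    _pending = {}  # cache of objects not yet committed, keyed by URL

    def __init__(self, value):
        self.value = value

    @classmethod
    def get_or_create(cls, value):
        # Return the cached object if this URL was already requested,
        # so that each URL is represented by exactly one object.
        if value not in cls._pending:
            cls._pending[value] = cls(value)
        return cls._pending[value]


def commit(connection):
    """Insert all pending objects in a single transaction."""
    rows = [(url.value,) for url in TargetURL._pending.values()]
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            'INSERT INTO target_url (value) VALUES (?)', rows
        )
    TargetURL._pending.clear()
```

Unlike the recipe mentioned above, this sketch does not look for an existing row in the database before creating an object. It only shows that nothing is inserted until commit is called.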

In addition, the application does not raise an exception when it generates an already existing random alias a certain number of times. Instead, generation is retried without limit until it succeeds. The number of possible alias values is large enough for such collisions to be rare and for the integrity error to be handled quickly when one does occur, so I don’t think it’s necessary to inform the user that an error has occurred and ask them to retry.
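A retry loop of this kind could look roughly like the sketch below. It again uses sqlite3 rather than SQLAlchemy, and the table layout, the alphabet and the function names are hypothetical.

```python
import random
import sqlite3

ALPHABET = 'abcdefghijkmnpqrstuvwxyz'  # hypothetical alias alphabet


def create_random(length=4):
    """Return a random alias made of characters from ALPHABET."""
    return ''.join(random.choice(ALPHABET) for _ in range(length))


def insert_with_random_alias(connection, target, create_alias=create_random):
    """Insert target under a random alias, retrying on collisions."""
    while True:
        alias = create_alias()
        try:
            with connection:  # one transaction per attempt
                connection.execute(
                    'INSERT INTO url (alias, target) VALUES (?, ?)',
                    (alias, target),
                )
        except sqlite3.IntegrityError:
            # The alias is the only unique column in this sketch, so the
            # error means it is already taken: draw another one.
            continue
        return alias
```

The loop has no upper bound on attempts, which is only reasonable while the alias space is much larger than the number of stored URLs.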

Handling homoglyphs

Homoglyphs are characters and sequences of characters that are similar to each other. In my URL shortener, for each group of homoglyphs I could think of, all characters are treated as identical to each other and are internally represented by just one of them.

This is the case both when generating an alias and when receiving a request with an existing one. It prevents mistakes caused by users mistyping or misremembering potentially confusing characters in aliases: when someone uses such a mistyped alias, the application treats it as the actual alias value it resembles. Thanks to this, a short URL with a mistyped alias still works just as if it had been typed correctly. Otherwise, the application wouldn’t recognize it and would display a 404 error page. The user might not even realize they had made a mistake, and even if they suspected it, they would still have to look for the correct URL, either by trial and error or by looking it up again.
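One possible way to implement this kind of normalization is sketched below. The homoglyph groups are examples I made up for illustration and are not necessarily the ones used in the project.

```python
# Hypothetical groups: canonical character -> characters treated as identical.
SINGLE = {'0': 'Oo', '1': 'Il'}
# Hypothetical character sequences that resemble a single character.
SEQUENCES = {'rn': 'm', 'vv': 'w'}

_TABLE = str.maketrans({
    look_alike: canonical
    for canonical, group in SINGLE.items()
    for look_alike in group
})


def normalize(alias):
    """Map every homoglyph in an alias to its canonical representative."""
    alias = alias.translate(_TABLE)
    for sequence, canonical in SEQUENCES.items():
        alias = alias.replace(sequence, canonical)
    return alias
```

With these groups, normalize('O1rn') and normalize('01m') give the same result, so both spellings would lead to the same stored alias. For this to work, generated aliases must only ever contain the canonical forms.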

Refactoring view functions

Four very small functions in the url_shortener.views module, two of which consisted of just a call to another function, were replaced by a single view class: ShowURL. I think it is a better solution.

Adding migrations

I added support for automatic database migrations using the Flask-Migrate package. I thought it would be a useful addition, and it proved its worth during the refactoring, which included changes to the database model: renaming the class representing a shortened URL and renaming one of its attributes.

Application factory and dependency injection container

One of the first decisions I made after publishing my last article on the project was to refactor it to use Flask blueprints and an application factory function. I thought it would make the project more modular and testable, which would be an improvement in itself and would make it easier to extend the application in the future.

The role of application factory is to create instances of Flask class representing applications, each of them configured differently.

To be able to have non-global application objects, I had to remove the reliance of my code on a global app instance and its config property. Values of configuration options had to be injected not only into instances of classes, but also into functions, including ones representing views.

For a short time I considered solving it by simply creating all the objects representing services in the factory function and by introducing functions responsible for initializing the views. However, I decided it was better for dependencies to be created and managed automatically by a dependency injection container. I chose Injector because its author already provided a package that integrates it with Flask and makes it possible to inject dependencies directly into Flask view functions.

There was one problem with my application: some configuration options or application-instance-specific objects were dependencies of class objects. Having classes defined globally, directly in Python modules, means that the dependencies of the class objects representing them would also have to be shared globally. This was the case with the SQLAlchemy model class, which depends on an instance of AliasAlphabet, and with a form class that depends on a spam/blacklist validator object.

For the model class, the problem could be solved by making it independent of SQLAlchemy and Flask-SQLAlchemy, creating an SQLAlchemy Table object dynamically for each instance of the application and mapping the class to it (like here). However, I didn’t like that it would require explicitly doing things that Flask-SQLAlchemy already did automatically, especially since I had already implemented migrations using a package that depended on Flask-SQLAlchemy. Instead, I decided to split the model class into a common base class and an application-object-specific class extending it. The second class was to be created dynamically and injected into views as a dependency.

As for the form class, it depends on the validator being added to validators of its URL field. It was impossible (or at least: difficult and requiring modification to internals of WTForms library) to instantiate the field in the constructor of the form class – it was necessary to define it as a class attribute. For this reason, the form class needed to be dynamically created per instance of application, too.

Creation of both class objects is managed by separate injector modules, and they are injected into views like any other dependency.
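The general technique of creating a class per application instance can be shown in plain Python. The sketch below leaves out SQLAlchemy, WTForms and Injector entirely; the factory function and the alphabet attribute are hypothetical and only illustrate the split into a shared base class and a dynamically created subclass.

```python
class BaseTargetURL:
    """Behaviour shared by all application instances."""

    alphabet = None  # supplied by each application-specific subclass

    def __init__(self, alias):
        if any(char not in self.alphabet for char in alias):
            raise ValueError('alias contains characters outside the alphabet')
        self.alias = alias


def make_target_url_class(alphabet):
    """Create a subclass bound to one application's alias alphabet."""
    return type('TargetURL', (BaseTargetURL,), {'alphabet': alphabet})
```

Each call returns a distinct class object, so two applications configured with different alphabets do not share any class-level state. A dependency injection container can then hand the right class to each view.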

Because Flask-Migrate requires instances of both Flask and flask_sqlalchemy.SQLAlchemy, my application factory returns both of the objects, not just the instance of Flask application.

Changing storage mechanism of blacklist and adding a whitelist

I replaced the blacklist file with a simpler approach to storage: a host blacklist specified directly in a Python config file. My URL shortener uses my spam-lists library, and reading hosts from a text file was a legacy idea I had for this library until I realised that providing support for any specific storage mechanism should not be part of its scope.

I added a whitelist for hosts known to be safe. URLs with such hosts won’t be tested against the custom and third-party blacklists.
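The order of the checks can be sketched like this. The option names, the host values and the is_allowed function are made up for the example, and the third-party lookup is replaced by a plain callable instead of the spam-lists library.

```python
from urllib.parse import urlsplit

# Hypothetical values, as they might appear in a Python config file.
HOST_WHITELIST = {'example.com'}
HOST_BLACKLIST = {'spam.example'}


def is_allowed(url, third_party_check=lambda host: False):
    """Return True if the URL may be shortened.

    third_party_check stands in for external blacklist services and
    returns True when the host is listed there.
    """
    host = urlsplit(url).hostname
    if host in HOST_WHITELIST:
        return True  # known to be safe: skip every blacklist
    if host in HOST_BLACKLIST:
        return False
    return not third_party_check(host)
```

Checking the whitelist first means that a whitelisted host is never sent to the custom or third-party blacklists at all.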
