Url-shortener: a demo and more refactoring

A demo instance of my url-shortener project is now available here:

https://url-shortener.reusingthewheel.tk

Usually, URL shorteners are deployed under a short domain name (for example, goo.gl or bit.ly). That’s not the case here: since it’s just a demo, I didn’t think it was necessary to look for a short domain, so I chose a longer one that is also related to my blog.

I have also made some changes to the application. Some of them were minor, like adjusting font sizes in the front end, renaming some local variables in the back end, etc., but there are also some bigger changes.

Choice of replacements for homoglyphs

Previously, the application replaced homoglyphs in alias values according to the following rules:

  • for each group of single-character homoglyphs, the one that came first in alphabetical order was used to replace the rest
  • for each pair consisting of a multi-character homoglyph and a single-character homoglyph, the longer one was always replaced by its shorter equivalent.

These rules didn’t take into account that characters included in the replacement strings could be missing from the alias alphabet used by the application. This didn’t cause any errors at the time, because the alphabet was not designed to be configurable and its hard-coded value included all the characters that were present in homoglyph replacements. Still, relying on the hard-coded alphabet happening to contain the right characters was a poor and error-prone solution, so I fixed it. Members of each group of homoglyphs are now replaced by the shortest of their equivalents, with ties broken by alphabetical order, considering only equivalents all of whose characters are included in the alphabet used by the application.
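The new rule can be sketched roughly as follows. This is only an illustration: the function names and the homoglyph groups are hypothetical and not taken from the project, and "alphabetical order" is approximated here by Python's default string comparison.

```python
def choose_replacement(group, alphabet):
    """Return the shortest, then smallest, member of `group` whose characters
    all belong to `alphabet`, or None if no member qualifies."""
    allowed = set(alphabet)
    candidates = [s for s in group if set(s) <= allowed]
    if not candidates:
        return None
    # Sort key: length first, then default string ordering as a tie-breaker.
    return min(candidates, key=lambda s: (len(s), s))


def build_replacements(groups, alphabet):
    """Map every other member of each group to the chosen replacement."""
    mapping = {}
    for group in groups:
        target = choose_replacement(group, alphabet)
        if target is None:
            continue  # no member of this group can be written in the alphabet
        for member in group:
            if member != target:
                mapping[member] = target
    return mapping


# Hypothetical homoglyph groups, for illustration only.
GROUPS = [("0", "O"), ("1", "I", "l"), ("m", "rn"), ("w", "vv")]
```

With an alphabet that contains "m", the group ("m", "rn") maps "rn" to "m". With an alphabet that lacks "m" but contains "r" and "n", the same group maps "m" to "rn" instead, which is the case the old rules did not handle.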

Unification in handling of homoglyphs

Previously, the application used different implementations of homoglyph replacement for single- and multi-character homoglyphs. The relationships between pairs of single-character homoglyphs were represented by a translation table created with the str.maketrans method. When replacing single-character homoglyphs in an alias value, this table was passed to the str.translate method of the string representing the alias.
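A minimal sketch of that translation-based approach, assuming a hypothetical set of single-character homoglyph pairs (the project's real pairs are not shown here):

```python
# Hypothetical pairs: "O" -> "0", "I" -> "1", "l" -> "1".
# The two-argument form of str.maketrans maps characters position by position.
SINGLE_CHAR_TRANSLATION = str.maketrans("OIl", "011")


def replace_single_char_homoglyphs(alias):
    """Replace single-character homoglyphs using a translation table."""
    return alias.translate(SINGLE_CHAR_TRANSLATION)
```

For example, replace_single_char_homoglyphs("lOIa") gives "101a". The table is keyed by single characters, so a multi-character homoglyph such as "rn" cannot be expressed this way.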

Relationships between pairs of multi-character homoglyphs and their equivalents were represented by a dictionary object, with multi-character homoglyphs being keys and their replacements being values. The homoglyph replacement was performed by simply looping over key-value pairs and replacing occurrences of a key in an alias string with its respective value.

The first approach was used for its simplicity, but it was limited to mapping a single character to another character, so a different approach was necessary for multi-character homoglyphs. Having two implementations made the code more complex than it needed to be, so I decided to abandon the translation-based approach and use just the dictionary approach, regardless of the length of the homoglyphs.
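The dictionary approach handles both kinds of homoglyphs with the same loop. A rough sketch, with a hypothetical mapping and function name:

```python
def replace_homoglyphs(alias, replacements):
    """Replace each key of `replacements` found in `alias` with its value."""
    for homoglyph, replacement in replacements.items():
        alias = alias.replace(homoglyph, replacement)
    return alias


# Hypothetical mapping mixing single- and multi-character homoglyphs.
REPLACEMENTS = {"O": "0", "I": "1", "l": "1", "rn": "m", "vv": "w"}
```

Here replace_homoglyphs("rnOvvl", REPLACEMENTS) gives "m0w1". Replacements are applied one after another in the dictionary's insertion order, so this sketch assumes that no replacement produces text that a later key would match in an unwanted way.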

Refactoring AliasAlphabet class

In my last post about the application, I described AliasAlphabet, a class I introduced to the project when I was reorganizing its architecture to make it follow the single responsibility principle more closely. At the time of writing that post, I had doubts about adding it and saw it as a candidate for further refactoring, but I also wanted to publish the article as soon as possible, so I left it as it was.

Instances of the class represented alias alphabets used for generating alias values, but they also contained methods for creating aliases. Those methods were placed there because creating aliases depended closely on the alias alphabet. Still, I thought it wasn’t the cleanest design and I was considering some ideas for replacing it. I decided to replace the class with three things: an AliasFactory class, a string object representing the alias alphabet, and a function that creates a dictionary mapping homoglyphs to the strings that should replace them.
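To illustrate how these three pieces could fit together, here is a rough sketch. Only the name AliasFactory comes from the project. The method names, the alphabet value, the homoglyph groups and the use of the secrets module are all assumptions made for this example.

```python
import secrets

# Hypothetical alphabet: digits and lowercase letters, without "l".
ALIAS_ALPHABET = "0123456789abcdefghijkmnopqrstuvwxyz"

# Hypothetical homoglyph groups, for illustration only.
HOMOGLYPH_GROUPS = [("0", "O"), ("1", "I", "l")]


def create_homoglyph_replacements(alphabet):
    """Build a dict mapping homoglyphs to the strings that replace them."""
    allowed = set(alphabet)
    mapping = {}
    for group in HOMOGLYPH_GROUPS:
        usable = [s for s in group if set(s) <= allowed]
        if usable:
            target = min(usable, key=lambda s: (len(s), s))
            mapping.update({s: target for s in group if s != target})
    return mapping


class AliasFactory:
    """Creates aliases; the alphabet and replacements are passed in."""

    def __init__(self, alphabet, replacements):
        self._alphabet = alphabet
        self._replacements = replacements

    def create_random(self, length):
        """Return a random alias of the given length (hypothetical method)."""
        return "".join(secrets.choice(self._alphabet) for _ in range(length))

    def create_from_string(self, text):
        """Normalize homoglyphs in `text` and check it against the alphabet."""
        for homoglyph, replacement in self._replacements.items():
            text = text.replace(homoglyph, replacement)
        if not set(text) <= set(self._alphabet):
            raise ValueError("alias contains characters outside the alphabet")
        return text
```

In this arrangement the alphabet is plain data, the replacement dictionary is derived from it by a function, and the factory only combines the two, so each piece can be changed or tested separately.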
