A demo instance of my url-shortener project is now available here:
URL shorteners are usually deployed under a short domain name (for example, goo.gl or bit.ly). That’s not the case here: since this is just a demo, I didn’t think it was worth looking for a short domain, so I chose a longer one that is also related to my blog.
I have also made some changes to the application. Some of them were minor, like adjusting font sizes in the front end, renaming some local variables in the back end, etc., but there are also some bigger changes.
Choice of replacements for homoglyphs
Previously, the application replaced homoglyphs in alias values according to the following rules:
- for each group of single-character homoglyphs, the one that was alphabetically the smallest was used to replace the rest
- for each pair of a multi-character homoglyph and its single-character equivalent, the longer one was always replaced by the shorter one.
These rules didn’t take into account that characters included in the replacement strings could be missing from the alias alphabet used by the application. This didn’t cause any errors at the time, because the alphabet was not designed to be configurable and its hard-coded value included all the characters present in the homoglyph replacements. Still, relying on the alphabet containing the right hard-coded characters was a rather poor and error-prone solution, so I fixed it. Members of each group of homoglyphs are now replaced by the shortest of their equivalents whose characters are all included in the alphabet used by the application; if several equivalents have the same length, the alphabetically smallest one is chosen.
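The new rule can be sketched as follows. This is only an illustration of the selection logic described above, not the project’s actual code: the function name `choose_replacement` and the sample homoglyph groups are hypothetical.

```python
import string


def choose_replacement(group, alphabet):
    """Pick the replacement for a group of homoglyphs.

    Returns the shortest member of `group` whose characters all belong to
    `alphabet`; ties are broken by alphabetical (code point) order.
    Returns None if no member can be written with the alphabet.
    """
    allowed = set(alphabet)
    candidates = [member for member in group if set(member) <= allowed]
    if not candidates:
        return None
    return min(candidates, key=lambda member: (len(member), member))


# Hypothetical alphabets and homoglyph groups, for illustration only.
full_alphabet = string.ascii_lowercase + string.digits
alphabet_without_m = full_alphabet.replace("m", "")

print(choose_replacement(("rn", "m"), full_alphabet))       # m
print(choose_replacement(("rn", "m"), alphabet_without_m))  # rn
print(choose_replacement(("0", "O", "o"), full_alphabet))   # 0
```

With the second alphabet, "m" is no longer a valid replacement, so the longer "rn" is chosen instead. This is exactly the case the old rules did not handle.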
Unification in handling of homoglyphs
Previously, the application used different implementations of homoglyph replacement for single- and multi-letter homoglyphs. The relationships between pairs of single-letter homoglyphs were represented by a translation table created with the str.maketrans method. To replace single-letter homoglyphs in an alias value, this translation table was passed to the str.translate method of the string representing the alias.
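For readers unfamiliar with these two methods, here is a short example of how they work together. The homoglyph pairs below are made up for the example and are not the ones used by the application.

```python
# Hypothetical single-character homoglyph pairs: each key is replaced
# by its value. str.maketrans accepts a dict with one-character keys.
translation = str.maketrans({"O": "0", "I": "1", "l": "1"})

alias = "lOOk"
normalized = alias.translate(translation)
print(normalized)  # 100k
```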
Relationships between pairs of multi-character homoglyphs and their equivalents were represented by a dictionary object, with multi-character homoglyphs being keys and their replacements being values. The homoglyph replacement was performed by simply looping over key-value pairs and replacing occurrences of a key in an alias string with its respective value.
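A minimal sketch of this loop, with a hypothetical function name and a made-up mapping:

```python
def replace_homoglyphs(alias, replacements):
    """Replace every occurrence of each key of `replacements` in `alias`
    with the corresponding value, one key-value pair at a time."""
    for homoglyph, replacement in replacements.items():
        alias = alias.replace(homoglyph, replacement)
    return alias


# Hypothetical multi-character homoglyphs and their replacements.
multi_char_replacements = {"rn": "m", "vv": "w"}

print(replace_homoglyphs("cornvvall", multi_char_replacements))  # comwall
```

Nothing in this loop depends on the length of the keys, so the same function also accepts single-character homoglyphs.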
The first approach was used for its simplicity, but it was limited to mapping a single character to another character, so a different approach was necessary for multi-letter homoglyphs. Maintaining two implementations made the code more complex than it needed to be, so I decided to abandon the translation-based approach and use just the dictionary approach, regardless of the length of the homoglyphs.
Refactoring AliasAlphabet class
In my last post about the application, I described AliasAlphabet, a class I introduced to the project when I was reorganizing its architecture to make it follow the single responsibility principle more closely. At the time of writing that post, I had doubts about adding it and I saw it as a candidate for further refactoring, but I also wanted to publish the article as soon as possible, so I left it as it was.
Instances of the class represented the alias alphabets used for generating alias values, but they also contained methods for creating aliases. These methods were placed there because creating aliases depended closely on the alias alphabet. Still, I thought it wasn’t the cleanest design and I was considering some ideas for replacing it. I decided to replace the class with three things: an AliasFactory class, a string object representing the alias alphabet, and a function that creates a dictionary mapping homoglyphs to the strings that should replace them.
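To give a rough idea of how these three pieces could fit together, here is a simplified sketch. Apart from the AliasFactory class name, everything in it is an assumption made for illustration: the alphabet value, the function name `build_homoglyph_map`, the method names and the constructor signature are hypothetical and may differ from the real code.

```python
import random
import string

# Assumed alphabet value; in this design it is just a plain string.
ALIAS_ALPHABET = string.ascii_lowercase + string.digits


def build_homoglyph_map(groups, alphabet):
    """Map every homoglyph to the replacement chosen for its group:
    the shortest, then alphabetically smallest, member that can be
    written with `alphabet`. Groups with no such member are skipped."""
    allowed = set(alphabet)
    mapping = {}
    for group in groups:
        candidates = [member for member in group if set(member) <= allowed]
        if not candidates:
            continue
        target = min(candidates, key=lambda member: (len(member), member))
        mapping.update({m: target for m in group if m != target})
    return mapping


class AliasFactory:
    """Creates aliases from an alphabet and a homoglyph mapping.
    Method names are hypothetical."""

    def __init__(self, alphabet, homoglyph_map, rng=None):
        self._alphabet = alphabet
        self._homoglyph_map = homoglyph_map
        self._rng = rng or random.Random()

    def from_string(self, value):
        # Normalize a user-supplied alias by replacing homoglyphs.
        for homoglyph, replacement in self._homoglyph_map.items():
            value = value.replace(homoglyph, replacement)
        return value

    def random_alias(self, length):
        # Generate a random alias using only characters of the alphabet.
        return "".join(self._rng.choice(self._alphabet) for _ in range(length))


# Hypothetical homoglyph groups.
homoglyph_map = build_homoglyph_map([("0", "O", "o"), ("rn", "m")], ALIAS_ALPHABET)
factory = AliasFactory(ALIAS_ALPHABET, homoglyph_map)
print(factory.from_string("mornOng"))  # m0m0ng
```

In this arrangement the alphabet no longer needs its own class: the factory receives it, together with the mapping, as plain data.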