Tiny URL Generator
"""
By taking our character set and converting it to an encoded string,
we need only keep track of how many urls we have generated and we will always
have a unique url.
If we wanted to scale url generation horoziontally we would only have to track
the count value in something like redis where we could extract the current value and itterate on it at the smame time to assure that we maintained uniqueness across all nodes. To simplify we could also assign a count range and preset the start / end count for each node allowing unique url generation across n workers
Also if we didnt want the urls to be sequencial we could simply salt a psudo random number with the count value to make it non-obvious how the urls were generated.
"""
import string
# the counter for the number of urls generated
count = 0
# all upper and lowercase letters and 0-9
# keep in mind we have far more characters than we need.
# I could have done only lowercase a-z and had enough to make 100 million urls
# abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
charset = string.ascii_letters + string.digits
# This is the base value of the charset we are going to permute over.
# we could increase the number of avaliable combinations by increasing
# this number so long as it does not exceed the lenth of the charset
base_num = min(25, len(charset))
# Encode the number to a base(N) string
def encode(num):
# We start by checking for num == 0 so we dont divide by zero
if num == 0:
# if zero grab the first character from the charset and pad out the
# resulting string with zeros to make sure its always 6 chars long
return charset[0].zfill(6)
base = []
# we will continue the loop for as long as the num is not zero
while num:
# divmod returns the number of times y goes into x and the remainder
# eg. divmod(10, 3) returns 3 times and 1 remainder
num, rem = divmod(num, base_num)
# we use the remainder to choose the character from the charset
base.append(charset[rem])
#join together the chacter list and pad out to 6 chars
return ''.join(reversed(base)).zfill(6)
# decode the base(N) string back into a number
def decode(short_url):
"""
return num times base_num plus the index of the current
character in the charset
url: https://my.url/abc000
a: 0 * 25 + 1 = 1
b: 1 * 25 + 2 = 27
c: 27 * 25 + 3 = 678
0: 678 * 25 + 52 = 17_002
0: 17002 * 25 + 52 = 425_102
0: 425102 * 25 + 52 = 10_627_602
"""
num = 0
for char in short_url:
num = num * base_num + self.charset.index(char)
return num
if __name__ == '__main__':
for x in range(10_000_000, 10_000_025):
print(f'{base_url}{encode(x)})
Output:
https://tin.y/00000a
https://tin.y/00000b
https://tin.y/00000c
https://tin.y/00000d
https://tin.y/00000e
https://tin.y/00000f
https://tin.y/00000g
https://tin.y/00000h
...
https://tin.y/0000du
https://tin.y/0000dv
https://tin.y/0000dw
https://tin.y/0000dx
https://tin.y/0000dy