Code Misc Personal projects

Making a cloud-native webcrawler in Go

Image of nodes connected in a graph, showing how pages relate to each other

Over the past few weeks I have been making a webcrawler. I wanted to do it as a way to get better at Go, to learn about graph databases, and to have some fun. The project makes use of cloud-native services such as AWS SQS, DynamoDB, and optionally Neptune, which could be swapped out for Neo4j.

What is a webcrawler?

A webcrawler, or web spider, is a program which visits a website, fetches all of the links on that site, and then visits each of those links in turn. This is how search engines like Google, Bing, and DuckDuckGo find the pages that populate their search results.
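As a rough sketch of that visit-and-follow loop (not the project's actual code), here is a breadth-first crawl over an in-memory site. The `Crawl` function, the `fetchLinks` type, and the `maxDepth` limit are all names made up for this example, and the "site" is just a map rather than real HTTP requests.

```go
package main

import "fmt"

// fetchLinks is a hypothetical stand-in for downloading a page and
// extracting its links. In this sketch it only reads from a map.
type fetchLinks func(url string) []string

// Crawl visits pages breadth-first starting at seed, follows links up to
// maxDepth hops away, and returns the pages in the order they were visited.
func Crawl(seed string, maxDepth int, fetch fetchLinks) []string {
	type item struct {
		url   string
		depth int
	}
	visited := map[string]bool{seed: true} // never queue the same page twice
	queue := []item{{seed, 0}}
	var order []string

	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		order = append(order, cur.url)

		if cur.depth == maxDepth {
			continue // do not follow links beyond the depth limit
		}
		for _, link := range fetch(cur.url) {
			if !visited[link] {
				visited[link] = true
				queue = append(queue, item{link, cur.depth + 1})
			}
		}
	}
	return order
}

func main() {
	// A tiny made-up site: each page maps to the links found on it.
	site := map[string][]string{
		"/":      {"/about", "/blog"},
		"/about": {"/"},
		"/blog":  {"/blog/post-1", "/about"},
	}
	fetch := func(u string) []string { return site[u] }
	fmt.Println(Crawl("/", 2, fetch))
}
```

Running this prints `[/ /about /blog /blog/post-1]`. The visited set is what stops the crawler from looping between `/` and `/about`, and the depth limit is one simple way to stop it wandering forever.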


Image of AWS services. SQS is drawn connected to the webcrawler, which also has connections to Neptune and AWS DynamoDB

SQS queue

The primary role of the SQS queue was to act as a store of links to explore. If there is an issue processing a link, the message automatically ends up in a dead letter queue, which gives a high level of observability.
The queue also allows the program to scale horizontally, since multiple nodes can pick up the work.
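To illustrate the dead letter queue idea, here is a minimal in-memory sketch. It is not the AWS SDK and not the project's code: the `Queue` type, its `MaxReceiveCount` field, and the `errFetch` error are invented for the example, and it ignores things a real queue handles, such as visibility timeouts and concurrent consumers.

```go
package main

import (
	"errors"
	"fmt"
)

// errFetch is a made-up error used to simulate a link that cannot be processed.
var errFetch = errors.New("fetch failed")

// Message is a link waiting to be crawled, plus the number of times it has
// been handed to a worker.
type Message struct {
	URL          string
	ReceiveCount int
}

// Queue is an in-memory stand-in for a work queue with a dead letter queue.
// It is not safe for concurrent use.
type Queue struct {
	pending         []Message
	DeadLetters     []Message
	MaxReceiveCount int
}

// Send adds a link to the back of the queue.
func (q *Queue) Send(url string) {
	q.pending = append(q.pending, Message{URL: url})
}

// Process hands every pending message to handle. A message whose handler
// fails is put back on the queue until it has been received MaxReceiveCount
// times, after which it is moved to DeadLetters instead.
func (q *Queue) Process(handle func(url string) error) {
	for len(q.pending) > 0 {
		msg := q.pending[0]
		q.pending = q.pending[1:]
		msg.ReceiveCount++

		if err := handle(msg.URL); err == nil {
			continue // processed successfully, nothing more to do
		}
		if msg.ReceiveCount >= q.MaxReceiveCount {
			q.DeadLetters = append(q.DeadLetters, msg)
		} else {
			q.pending = append(q.pending, msg)
		}
	}
}

func main() {
	q := &Queue{MaxReceiveCount: 3}
	q.Send("https://example.com/ok")
	q.Send("https://example.com/broken")

	q.Process(func(url string) error {
		if url == "https://example.com/broken" {
			return errFetch
		}
		return nil
	})
	for _, m := range q.DeadLetters {
		fmt.Printf("dead letter: %s after %d attempts\n", m.URL, m.ReceiveCount)
	}
}
```

The useful property is that a link which keeps failing does not block the rest of the work: it is set aside where it can be inspected later.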


DynamoDB acted as the long-term storage of the data. Initially I tried modelling the links between sites inside DynamoDB; however, this did not work well with the access patterns due to the 400 KB limit on items in DynamoDB. DynamoDB works excellently as a serverless NoSQL solution for key-value lookups.


I also used a graph database to store the relationships between pages. This proved much easier than expected, because openCypher is a really beginner-friendly query language, very similar to SQL, that allows very complex relationships to be modelled easily. I will be using graph databases in future projects as well.
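To give a flavour of what that looks like, here is a small sketch of how a "page A links to page B" relationship could be written in openCypher and prepared from Go. The `Page` label, the `url` property, the `LINKS_TO` relationship type, and the `LinkStatement` helper are all hypothetical names for this example, not taken from the project, and the code only builds the query rather than talking to a database.

```go
package main

import "fmt"

// linkQuery is an openCypher statement that records "from links to to".
// MERGE creates each node or relationship only if it does not already exist,
// so crawling the same page twice does not duplicate data.
// Page, url, and LINKS_TO are hypothetical names chosen for this example.
const linkQuery = `
MERGE (a:Page {url: $from})
MERGE (b:Page {url: $to})
MERGE (a)-[:LINKS_TO]->(b)`

// outgoingQuery lists the pages that a given page links to.
const outgoingQuery = `
MATCH (a:Page {url: $url})-[:LINKS_TO]->(b:Page)
RETURN b.url`

// LinkStatement returns the query text and its parameters, ready to hand to
// whichever openCypher client is in use.
func LinkStatement(from, to string) (string, map[string]interface{}) {
	return linkQuery, map[string]interface{}{"from": from, "to": to}
}

func main() {
	query, params := LinkStatement("https://example.com/", "https://example.com/about")
	fmt.Println(query)
	fmt.Println(params)
	fmt.Println(outgoingQuery)
}
```

Passing the URLs as parameters, rather than pasting them into the query string, avoids quoting problems with whatever odd characters turn up in crawled links.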

Takeaways

For me it was a fun project in which I learnt a lot about some of the major difficulties search engines encounter, such as pages not being formatted correctly, or pages linking to subpages endlessly. In the future I want to build a search mechanism on top of this to act as a crude full search engine.
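One common way to soften both problems is to normalise every link before queueing it. The sketch below is an illustration using Go's standard `net/url` package, not the project's code; the `Normalize` function and its rules (skip non-HTTP links, drop `#fragment` anchors, lower-case the host) are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Normalize resolves href against the page it was found on and returns a
// canonical absolute URL. ok is false for links the crawler should skip.
func Normalize(pageURL, href string) (normalized string, ok bool) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", false
	}
	ref, err := url.Parse(strings.TrimSpace(href))
	if err != nil {
		return "", false // malformed link: skip it rather than crash
	}
	abs := base.ResolveReference(ref) // turns "../about" into a full URL
	if abs.Scheme != "http" && abs.Scheme != "https" {
		return "", false // mailto:, javascript:, and so on
	}
	abs.Fragment = "" // "#section" anchors point at the same page
	abs.Host = strings.ToLower(abs.Host)
	return abs.String(), true
}

func main() {
	page := "https://example.com/blog/"
	for _, href := range []string{"../about#team", "mailto:me@example.com", "%zz"} {
		if u, ok := Normalize(page, href); ok {
			fmt.Println("crawl:", u)
		} else {
			fmt.Println("skip: ", href)
		}
	}
}
```

With links reduced to one canonical form, a visited set can recognise that `/about` and `/about#team` are the same page, which removes one source of endless subpage loops.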

For anyone curious, here is a link to the GitHub repository.

Misc Personal projects Tech

Deployment options for Rails

Over the past year, I have needed to deploy Ruby on Rails applications to the cloud, and I have tried a few methods.

  • Capistrano
    This is a cool and fairly interesting system, but not the most modern technique. It creates files called cap files, which can then be used to deploy the app over SSH. The main issue is scaling: since it only deploys to one location, auto scaling becomes difficult.
  • Heroku
    Heroku is a PaaS, meaning you don’t have to deal with server configuration; the application runs inside a specially configured Docker container. Heroku scores very highly for ease of use and makes it very easy to deploy an application to production. However, because it is a platform, you pay a premium for that ease of use.
  • Docker
    This method seems the most hands-on, but it grants the most options and is probably the most modern.
    My recommendation is to create a deployment Dockerfile which can build the app from a Git repository. Then it's your choice whether you use something like Kubernetes or Docker Swarm to create other containers with the app running.
  • Dokku
    Dokku is an interesting project. The program uses Docker containers in the backend and configures them for you, so in a way you can host your own PaaS. It requires some configuration, but it can lead to a very effective PaaS.

In the end, the type of deployment you do depends on the requirements of the project.

Code Misc Personal projects

Dev-ops on an iPad

The surprising power of an iPad

Recently I have been testing my skills and patience by trying to develop websites using an iPad (Pro 10.5 with Smart Keyboard).


SSH initially seemed like a big hurdle, since many web-based SSH clients are Java applets. However, using an iOS app called Coda, which comes with an SFTP/FTP client and SSH, this was no longer a problem. There is probably a limitation around the usage of PEM keys, but I have limited experience with PEM, so your mileage may vary.


Database management is probably the weakest part of the whole experience. There are many apps which can help with it, but using phpMyAdmin might be the easiest and cheapest option for most people.

Web development

A major problem I found is that the iPad cannot run any server-side code (PHP, Ruby, Node.js), so local work was limited to just HTML and JS. The workaround is to use a VPS as a development server, and run and modify the code remotely. This is not ideal, but in a pinch it is a viable alternative, especially if you create a disk image of a preconfigured server. Then all you have to do is start a VPS from the disk image and transfer the files you need to work on.

In iPad review

Though it is possible to deploy and run websites just from an iPad, it can be very frustrating, especially when moving files around, although iOS 11 has improved the file experience. The experience is still subpar compared to a desktop, but it is nice to know it is possible. I would only consider doing this with either an external keyboard or a Smart Keyboard, since using the onscreen keyboard seriously limits usable space.