May 2024 – Alex's Blog

How I Would Learn Programming from Scratch

I have been programming earnestly for about 8 years and professionally for 5 years. I have experience in front-end development, back-end development, and data science. My path is fairly traditional: I initially taught myself, then continued my education as part of my undergraduate studies and during my placement year.

Start with Python. Regardless of whether you plan on learning front-end, back-end, or data science, Python will help you grasp the fundamentals without getting bogged down in syntax, which is something I struggled with initially. While syntax is important, it can be a barrier for many people. I started with Java, which wasn’t a horrible idea, but I spent a lot of time learning higher-level concepts like OOP that were only useful much later on.
Start creating things as soon as possible. It can be tempting to watch YouTube videos, follow guides, and copy and paste the results, but this often gives a false sense of learning. When it came to tasks outside the scope of those guides, I struggled. Instead, look up how to do the specific subtasks you need, not the entire project. Learning a language is similar: knowing words and syntax is important, but knowing how they fit together leads to fluency. Example projects to start with include a diary application, a to-do list, and Hangman. All of these can be built as command-line (CLI) programs, so you won't get bogged down in display methods.
Learn how to read documentation. It might sound trivial, but being able to read and fully understand documentation without examples is a key skill that will be invaluable in the future.
Start reading other people's code and look at how projects are structured, particularly on GitHub.
Experiment with other areas of programming. For example, if you are interested in making websites, try learning JavaScript or TypeScript. If you are interested in making desktop applications, try Tkinter, Python's standard GUI library. If you are into making games, consider looking at Godot.

It’s important to have fun while learning. If you don’t enjoy the learning process, you might find programming hard to enjoy. Learning new skills is a massive part of programming, and it’s a constant journey of learning from others.

Making a Cloud-Native Web Crawler in Go

Image of nodes connected in a graph, showing how pages relate to each other

Over the past few weeks I have been building a web crawler. I wanted a project that would help me get better at Go and teach me about graph databases, while also being fun. The project uses cloud-native services: AWS SQS, DynamoDB and, optionally, Amazon Neptune, which can be swapped out for Neo4j.

What is a web crawler?

A web crawler (or web spider) is a program that visits a web page, extracts all of the links on it, and then visits those links in turn, repeating the process. This is how search engines such as Google, Bing and DuckDuckGo discover the pages that populate their search results.

Architecture

Image of the AWS services used: SQS is drawn connected to the web crawler, with further connections to Neptune and DynamoDB

SQS queue

The primary role of the SQS queue was to act as a store of links still to be explored. If there is a problem processing a link, SQS can automatically move the message to a dead-letter queue once it has been received a configured number of times without being deleted, which gives good observability into failures.
The queue also lets the program scale horizontally, because multiple nodes can pick up work from it.

DynamoDB

DynamoDB acted as the long-term storage for the crawled data. Initially I tried modelling the links between sites inside DynamoDB, but this did not fit my access patterns well because of DynamoDB's size limits (400 KB per item, and at most 1 MB of data returned per Query or Scan call). DynamoDB works excellently as a serverless NoSQL solution for key-value lookups.
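One common way round the 400 KB item limit is DynamoDB's adjacency-list pattern: store each link as its own small item under the page's partition key instead of as a growing list inside one item. This is not how the project ended up storing links; it is a sketch that only builds the items in memory. The `PAGE#`/`LINK#` key prefixes and the `item` type are hypothetical, and writing the items would go through the AWS SDK, which is left out. It also hints at why a graph database fitted better: listing one page's links is a single `Query` on its partition key (returned in pages of up to 1 MB), but following links several hops deep needs a separate query per page.

```go
package main

import "fmt"

// item is a simplified stand-in for a DynamoDB item: a partition key (PK),
// a sort key (SK) and string attributes. Key formats are hypothetical.
type item struct {
	PK    string
	SK    string
	Attrs map[string]string
}

// pageItems models one crawled page as a metadata item plus one small item
// per outgoing link, so no single item grows with the number of links.
func pageItems(pageURL, title string, links []string) []item {
	pk := "PAGE#" + pageURL
	items := []item{{PK: pk, SK: "META", Attrs: map[string]string{"title": title}}}
	for _, link := range links {
		items = append(items, item{PK: pk, SK: "LINK#" + link})
	}
	return items
}

func main() {
	items := pageItems("https://example.com/", "Home", []string{
		"https://example.com/about",
		"https://example.com/blog",
	})
	// Key-value lookup of the page: GetItem on (PAGE#<url>, META).
	// Outgoing links: Query on PK = PAGE#<url> AND begins_with(SK, "LINK#").
	for _, it := range items {
		fmt.Println(it.PK, it.SK)
	}
}
```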

Neptune/Neo4j

I also used a graph database to store the relationships between pages. This proved much easier than expected, because openCypher is a really beginner-friendly query language: it feels similar to SQL but makes very complex relationships easy to model. I will be using graph databases in future projects as well.
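As an illustration of why openCypher feels approachable, here are two queries of the kind a crawler like this might use, kept as Go string constants. The `Page` label, the `LINKS_TO` relationship type and the parameter names are hypothetical. `MERGE` creates a node or relationship only if it does not already exist, so re-crawling a page does not duplicate edges, and the `*1..3` variable-length pattern follows links up to three hops, something that in SQL would typically need a recursive common table expression. Sending the queries is left out, since it depends on the client (for example the Neo4j Go driver or Neptune's openCypher HTTPS endpoint).

```go
package main

import "fmt"

// upsertLinkQuery records that page $from links to page $to. MERGE matches
// existing nodes/relationships or creates them, so repeated crawls do not
// duplicate edges. Page and LINKS_TO are hypothetical names.
const upsertLinkQuery = `MERGE (a:Page {url: $from})
MERGE (b:Page {url: $to})
MERGE (a)-[:LINKS_TO]->(b)`

// reachableQuery returns every page reachable from $url in one to three hops.
const reachableQuery = `MATCH (start:Page {url: $url})-[:LINKS_TO*1..3]->(p:Page)
RETURN DISTINCT p.url`

// linkParams builds the parameter map sent alongside upsertLinkQuery.
// Passing URLs as parameters, not by string concatenation, avoids breaking
// the query when a URL contains quotes.
func linkParams(from, to string) map[string]any {
	return map[string]any{"from": from, "to": to}
}

func main() {
	fmt.Println(upsertLinkQuery)
	fmt.Println(linkParams("https://example.com/", "https://example.com/about"))
	fmt.Println(reachableQuery)
}
```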

Takeaways

For me it was a fun project, and I learnt a lot about some of the major difficulties that search engines encounter, such as pages that are not formatted correctly or pages that link to sub-pages endlessly. In the future I want to build a search mechanism on top of this so that it can act as a crude but complete search engine.

For anyone curious, here is a link to the GitHub repository.