Outside of my professional career I still like to get geeky (once a geek, always a geek). Below you will find various projects I have embarked on personally; if you would like to collaborate on any of them, please get in touch.
www.Muscles.Cheap was born out of my frustration at searching for the best-priced sports supplements every time I went to buy some (so yes, I'm also a gym addict!). After doing some research it was apparent that nobody had created a price comparison engine for the sports supplement industry. You have Moneysupermarket.com and the like for insurance, but nothing existed to solve my problem: paying the lowest price possible each time I bought a supplement, as prices vary massively between sellers. So I set out to solve it, and Muscles.Cheap was created.
Muscles.Cheap periodically searches the most popular supplement seller websites, such as Monster Supplements, extracts data on the products of interest (protein) via APIs, and then makes that data available to my website through Elasticsearch.
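To give a flavour of the Elasticsearch side, here is a minimal sketch of the kind of query the site could run to surface the cheapest matching products. The index name (`supplements`) and field names (`product_name`, `price_gbp`) are illustrative assumptions, not the live schema:

```python
# Illustrative sketch only: index and field names are assumptions,
# not the actual Muscles.Cheap schema.

def build_cheapest_query(search_term, size=10):
    """Build an Elasticsearch query body that matches products by name
    and sorts the hits by price, cheapest first."""
    return {
        "query": {"match": {"product_name": search_term}},
        "sort": [{"price_gbp": {"order": "asc"}}],
        "size": size,
    }

# With the official Python client, the body would be sent like so:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   hits = es.search(index="supplements", body=build_cheapest_query("whey protein"))
```

Sorting server-side like this keeps the front end thin: it just renders whatever Elasticsearch returns, already in price order.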
I designed the engine/website from the ground up: all the graphics you see were custom created by me, as was the AWS architecture design that allows me to run this application at minimal OPEX cost. The big issue I had to overcome was actually time... time to sit down and code the application layer (the ETL layer), time which I didn't have free due to a new baby and a new job (as if a new baby wasn't enough). To solve this challenge I did a long and painful search for a development house in India who could deliver my application's ETL layer at minimal cost, in an agile way and fast (I was looking for CI/CD cycles with agile Scrum project management). Luckily for me I found an excellent development business in India called Tekbuds, who I employed to code the ETL element of my application. After some discussion around the choice of technology we actually took a far simpler route to begin with, using the Kimono SaaS service to do the data scraping and present an API endpoint to the data, allowing my app to query it on demand. This approach, mixed with the use of AWS services (S3/CloudSearch/EC2), worked well... for the first five months anyway, until Kimono got bought and decommissioned their service. Bummer!
I always knew that approach was a risk, but a cost-efficient one, as it meant I didn't need to run the scraping processing myself. Not wanting to fall into that trap again, we revised the scraping method and decided to run our own Python Scrapy service. Scrapy (scrapy.org) is a Python framework which allows us to run the scraping ourselves programmatically; although this now means I need to run an EC2 instance, it also gives us agility and flexibility around change and innovation.
To date Scrapy has done the job fine, with costs at a manageable level. Below is a summary of the final solution as it stands today:
Services Used: S3 (web tier), EC2 (Scrapy processor), Elasticsearch cluster (data store)
App Front End: AngularJS/Bootstrap/HTML5
ETL: Scrapy Python framework, scraping 20+ sites
Avg Monthly Running Cost: £35 (before I introduced Scrapy I was spending on average £1.80/month, as I had designed it all to run from an S3 web-enabled object container!)
The next evolution of the product will be to containerise the application (or move it to Cloud Foundry) to remove the need for those horrible things called servers!