Projects

Currently involved in projects using NLP (natural language processing) techniques for text mining of unstructured documents and for automatic document summarization and classification.

bumblebee Python package

Life costs and food survey project

Zoopla text classification project

[not available in a public repository]

Currently involved in data science projects using PySpark 2+ in the Cloudera Data Science Workbench (CDSW) environment on CDH (Cloudera's software distribution containing Apache Hadoop, Hive, YARN, and Spark) and related open-source projects.

A graph linkage interactive viz in Shiny

Have been working on a project involving NoSQL graph databases (Neo4j) and researching how they can improve on existing data matching practices at the ONS.

Profiling and Optimizing Python

A lunch talk I was supposed to give at the GDSP Conference 2018, but I was snowed in that day… (doh!)