Provided expertise on implementing Spark processing applications on Google Cloud using Docker and Kubernetes as part of a Machine Learning Platform migration to microservices for GM Cruise Automation, a leading autonomous driving vendor. This required researching the relevant technologies in enough depth to deliver a robust proof of concept: a containerized Spark application running on Google Cloud’s Kubernetes service.
Assigned a high-priority, high-profile task, for which the requisite skills were not available in-house: creating a framework to make almost two years' worth of trading data, stored as complex JSON objects in Cassandra, accessible for analysis. This was done using a combination of Spark JSON parsing libraries and JSON binding libraries to create a set of basic and derived data tables accessible via Hive.
Responsible for reviewing existing in-house Hadoop clusters installed by a third-party consulting company and recommending improvements.
Set up an ELK (Elasticsearch/Logstash/Kibana) POC and Logstash ingestion pipelines for real-time metric analysis of the group's enterprise servers.
Updated and modified a Spark Streaming application and investigated a rewrite from Java to Scala.
Designed an AWS cluster for the Marketing Science team. Scripted a solution to launch ad-hoc, auto-scaling clusters on spot instances that bootstrap RStudio. This allows members of the Marketing Science team to launch clusters using SparkR for investigation and analysis through a familiar RStudio environment. Because clusters are launched on an as-needed basis using spot instances, this provides very significant cost savings over the traditional "always on" cluster.
Developed Annalect’s first Spark ETL process and associated framework for record processing on AWS using Scala, including the Python Boto cluster launch and job submission script, which is run as an Airflow operator. This became the template for all future ETL workflows.
Manheim UK - A Cox Automotive Brand, April 2016 - September 2016
Led the design, development and implementation for the new Data Intelligence Division of Cox Automotive UK and its migration to its first Big Data project, using Cloudera (CDH 5.7.1) on Microsoft Azure.
Responsible for implementing the ingestion strategy using StreamSets and the analysis framework using Spark, Tableau and Jupyter.
Used Spark to quickly solve a long-standing in-house pricing problem, commonly known as "Simons Bane": finding the optimal price for repairs to maximise return, which had been intractable with existing tools.
Mentoring and training in-house developers on Big Data Tools and components of the Hadoop Ecosystem.
Redesign of HBase DataBase Access layer for UK Immigration Application Framework.
Currently investigating and prototyping table schema and row key design, use of native HBase testing tools, access API design, coding and testing, performance strategies (e.g. salting), third-party tools such as Apache Phoenix and DataNucleus, and monitoring strategies with JMX and Dropwizard.
Designed and developed the initial proof of concept of a mission-critical application using Spark, Scala and Hadoop to overcome severe bottlenecks. Sped up the process by a factor of 120 (12,000%) over the existing SQL/C# application. This included the ability to aggregate and coalesce sets of files into a Spark DataFrame and apply data enrichment rules via UDFs and joins on imported database tables. The resulting DataFrame was saved to an Apache Phoenix datastore on the processing cluster for end-user analysis of the data at scale using standard JDBC tools. This successful first implementation led to the project being fully funded for production development by an offshore team.
Technical Advisor and SME to the Director of the Semantic Technology (Big Data) Division, advising on bank-wide Big Data architecture, adoption strategy and implementation.
Designed, planned, set up and tuned a Cloudera (CDH 5) cluster on Amazon AWS for Intel's Internet of Things (IoT) Smart Cities Project.
Investigated use of Spark Cluster as a parallel computation engine for real-time Singular Value Decomposition of streaming sets of complex valued matrices.
Prototyped a Play-based web application for SparkSQL queries.
Introduced Apache Phoenix as a Hive replacement and Apache Flume for data and log aggregation. Led the design and development of the Data Analytics Lambda Architecture using the Apache Spark stack: Spark, SparkSQL, Spark Streaming, Spark MLlib, Scala and OpenTSDB.
At night, set up and maintained the company's first Hadoop cluster using the Hortonworks HDP 1.3 distribution via Apache Ambari, on CentOS Linux images hosted on the Microsoft Azure cloud. Imported over 100 tables from Azure SQL Server to HDFS for analysis and visualization in Datameer.
Utility Messaging Integration: Java, Groovy/Grails, Spring (Integration), XML/XSL/XSD, MongoDB.
Sole developer of a Grails web app providing CRUD operations for verification of a catalog of over 170 unique Business Process XML messages, significantly reducing integration test cycle duration and errors.
Developed the accompanying Excel-driven Selenium test suite.
STP Framework Developer. Sole developer responsible for the initial rollout of the bank's new STP component, a Java-based, JMX-monitored framework. This application parses Ion Trade Records and applies the necessary business and mapping rules to generate an XML payload for transmission via MQ to the bank's Trade Capture and Risk Management application. Currently rolled out for Tradeweb Bonds and Allocations (Pre and Post), Bloomberg Bonds, Bloomberg TOMs Trades, MTS/EBM Bonds and BrokerTec Basis and Repo Trades. Made extensive use of annotations for database persistence via Spring-based DAO, XML mapping for payload generation and JMX monitoring.
Developed a standalone STP for Tokyo Office for Tradeweb Japanese Government Bonds.
Built a TestNG parametric trade XML unit test framework, used extensively in the development and testing of the STP framework to ensure the accuracy of the trades generated.
Maintenance and modification of the Swing-based Trade Monitor, existing C++ STP components and front-office Excel report generators.
Designed and implemented a JMS Client to C# TCP bridge with accompanying policy server. This is a key element of SITA's new Airport Flight Display application, allowing the Java backend to pass flight display information to the Silverlight/C# client and receive logging information back for billing purposes.
Prototyped JMX Flight Display Heartbeat and Log Monitor.
Fidelity Investments, September 2003 - September 2004
Overhaul of Fidelity’s Equity/Option/Mutual Fund Trading Portal. Currently converting to a Struts framework and adding functionality to allow for multiple order entry and complex options (e.g. straddles, spreads, butterflies). This is a high-visibility project, as the portal is re-branded by Fidelity’s enterprise clients for use by their customers.
Update and modification of portal administration applet. This applet administered user trading permissions and portal customization using Tibco Messaging Service to communicate with an Oracle Database.
Design and implementation of an XML-based, web-based Alerts configuration application. This was built using the Struts platform for the user interface and Tibco Messaging System for message handling.
Participated in design, costing and feature implementation of several major versions and numerous point releases.
Led the development effort for integration of Ariba Enterprise Sourcing and Ariba Analysis. This involved defining the appropriate data sets in Sourcing, retrieving the data through predefined APIs and customized SQL queries against an Oracle 8 database, and generating template reports for end users. This allowed end users to build up a historical record of sourcing event data for qualitative analysis.
Implemented a Java/XML API Framework for modifying XML templates for 38 different sourcing event types. These templates are responsible for defining market rules for Ariba’s industry leading auction engine, the core of Ariba Enterprise Sourcing 3.0.
Introduced and implemented Market XML migration framework using XSL transformations. This allowed users of Ariba Dynamic Trade 2.0 and following versions to seamlessly migrate their auction XML templates to Ariba Enterprise Sourcing templates.
Introduced and implemented JSP/Java Bean/XML framework for the Designer Interface in Ariba Dynamic Trade 2.0. This was extensively used for the design and market validation of Exchanges, Forward and Reverse Dutch, Japanese and English Auctions, Sealed and Open Request For Quote Events and Online Negotiations.
Responsible for the design and coding of GUI elements of a Swing-based Call Center Scripting Engine utilizing the Tom Sawyer Graph Layout Toolkit. This tool allowed call center managers to graphically generate workflows to direct call center agent questionnaires.
Incyte Pharmaceuticals, November 1998 - February 1999
Designed and coded a sophisticated analysis spreadsheet using Swing. This is a central component of Incyte's commercial DNA Protein Analysis Applet. This tool allows web access to remote databases for the investigation of drug/cell interaction.
Helicopter Search and Surveillance for Counter Narcotics Interdiction in the Caribbean. Emergency deployment to Somalia in support of Humanitarian and Peace Keeping operations as directed by United Nations Security Council Resolution 794.
.NET Compact Framework, JSP
Analysis methods and tools
SQL, Python, XSLT, Scala, XML, C/C++, Java
Big Data, Hadoop, Spark
Education and Training
Doctoral Studies (no degree awarded) - Applied Mathematics, Finite Element Analysis - Rensselaer Polytechnic Institute, 1995 - 1997