Rafael D.

Data Engineer

970 dollars
Freelancer
15 years
London, United Kingdom

My experience

Lloyds Banking Group, March 2019 - Present

Working on the development of the IFRS 17 regulatory project, using Spark Streaming, Kafka, HBase and Apache Phoenix.

Santander Bank, March 2017 - Present

PROJECTS:

SMART SEARCH
Tasks:
* Currently working on this project, developed entirely in Spark on Scala with a TDD approach and the ScalaTest (FlatSpec) framework.
* Loading data from Hive tables with billions of records.
* Storing information in parquet files to be loaded on a different Hadoop cluster.

INSTANT PRICING
Tasks:
* Development of a system to gather data to calculate insurance premium quotes for every client in the bank using Sparktacus (Santander's official framework based on Spark and Scala).
Achievements:
* Managing tens of millions of records successfully.

OMRS NRT DASHBOARD
Achievements and tasks:
* Near-real-time (NRT) ingestion of XML data generated by a retail banking application, presented on a MicroStrategy-based dashboard, using the following technologies in the Cloudera distribution:
  * NRT ingestion with Flume, using Kafka as a channel and as a sink in a multi-hop Flume architecture.
  * Development of a custom Flume interceptor in Java for parsing XML files.
  * Development, with the Spark and SparkSQL Scala APIs, of a scheduled job that compacts the small files stored in HDFS by Flume to protect NameNode memory.
  * Enabling data visualization through the MicroStrategy Impala connector using Hive and Impala.
  * Unit testing of Java and Scala code following a TDD approach with the ScalaTest (FlatSpec) framework.
* Saving the license costs of the previous systems while providing better reliability and performance.

Using Agile/Scrum methodologies and Git for version control in every project.

Isban UK, March 2017 - March 2018

Working as a freelance contractor on Big Data projects.

PROJECT OMRS NRT DASHBOARD
Achievements and tasks:
- Near-real-time (NRT) ingestion of XML data generated by a retail banking application, presented on a MicroStrategy-based dashboard, using the following technologies in the Cloudera distribution:
▪ NRT ingestion with Flume, using Kafka as a channel and as a sink in a multi-hop Flume architecture.
▪ Development of a custom Flume interceptor in Java for parsing XML files.
▪ Development, with the Spark and SparkSQL Scala APIs, of a scheduled job that compacts the small files stored in HDFS by Flume to protect NameNode memory.
▪ Enabling data visualization through the MicroStrategy Impala connector using Hive and Impala.
▪ Unit testing of Java and Scala code following TDD and BDD approaches with the ScalaTest (FlatSpec) framework.

- Saving the license costs of previous systems while providing better reliability and performance.
- Agile/Scrum methodologies, Git for version control.

Innovery Spain, February 2016 - March 2017

PROJECTS:

ILMS. INNOVERY LOG MANAGEMENT SYSTEM
Achievements and tasks:
* Ingestion, archiving, processing and querying of cybersecurity system logs using the following technologies in the Cloudera CDH 5.7.0 distribution:
  * Log archiving and log integrity check processes with Spark and SparkSQL using RDDs and DataFrames.
  * Ingestion with Flume into HBase using a CEF log format interceptor and a custom HBase serializer.
  * Indexing of HBase stored data with Cloudera Search.
  * Data visualization with HUE Search / custom application.
  * Job scheduling and integration with the Oozie REST API.

GOLDEN CONTROL PANEL. DEVELOPMENT OF THE FIRST BIG DATA NEAR REAL TIME ARCHITECTURE FOR SANTANDER BANK IN CHILE
Achievements and tasks:
* Ingestion of execution data from Data Warehouse and other critical systems with Flume into Cloudera Search (near real time) and Hive (batch) in a lambda architecture.
* Batch calculation of statistics from the ingested data using Hive queries scheduled in Oozie.
* Storing data in Cloudera Search (Solr) to show the statistics and the execution data in near real time by means of a dashboard built in Spring Boot.
* Whole project deployed on Cloudera CDH 5.5.5 distribution.

DEVELOPMENT OF A PoC FOR THE TOP TIER ITALIAN BANKING FIRM POSTE ITALIANE
Achievements and tasks:
* Replacement of its current financial compliance system with one based on big data. The aim is to comply with FATCA rules and to have a single source of information and processes for reporting.
* The technical proposal was based on the tool Talend for Big Data.
* Use of Sqoop, HBase, HDFS and MapReduce technologies embedded in Talend.

IVDF. DEVELOPMENT OF A SYSTEM FOR INGESTING AND UPDATING SOFTWARE VULNERABILITIES STORING THEM IN MONGODB

Using Agile/Scrum methodologies, Git for version control and AWS and Docker Hadoop clusters for development in every project.

ServiWeb, November 2012 - January 2016

AIRBUS, March 2008 - October 2012

Achievements and tasks:
* DBA for the object-oriented database DOORS.

Indra Sistemas, April 2007 - April 2008

Achievements and tasks:
* Became an expert in the object-oriented database IBM Rational DOORS.

IBM, January 2006 - December 2006

Achievements and tasks:
* Winner of IBM's "Personal Achievement Recognition Programme".

My skills

Big Data

Impala, Oozie, Hadoop, Cloudera CDH, Apache Kafka, Spark, Big Data, Hive

Software testing

ScalaTest, Unit testing

Databases

MongoDB, Oracle, MySQL, Database Administration, HBase

Analysis methods and tools

Scrum, Apache Maven, ClearCase, Agile Methodology

Technologies

Oracle Applications, Solr, HDFS, Machine Learning, MapReduce, Spring Boot, Amazon Web Services (AWS), Spring Data

Others

Spanish, Team management, Artificial Intelligence

Frameworks

Spring

IT Infrastructure

Docker, Linux, Unix, Git

Environment of Development

DOORS, Maven

Other

Java Servlets, UML/OMT, Commercial Skills, IT Team Leader, Object Oriented Database, Scrum Methodology, Develop Web components, Sqoop, Apache Hive, German, Java Server Pages, Java Enterprise Edition, English, Apache HBase, Data Collection, Spring Framework, Systems Engineer, Apache Flume, Data Warehousing, Computer Science Engineer, Develop Database Applications, Database Administration Team Leader, Java 2, Object Oriented Analysis/Design, Master

Languages

Java, XML, SQL, UML, Python, Scala, JAVA SE

My education and training

Master - University of Stuttgart (Germany)

- San Diego University, 2016

- MongoDB University, 2015

- Big Data University, 2015

Master of Engineering, Computer Science, Studying Telecommunication Engineering - Escuela Técnica Superior de Ingenieros de Telecomunicación, 1998 - 2006