Youness H.

Data Engineer

415 dollars
Freelancer
4 years
Paris, France

My experience

BPCE SA, November 2017 - Present

- Setting up the BPCE SA data lake (Hortonworks)

- Development of a data ingestion framework from scratch (Java 8, Spring, Scala, Spark 2, Hive, HBase, HDFS, Git, Jenkins, XL Deploy, KMS, Ranger…):

 - Software architecture design

 - Development of the framework's individual components

 - Setting up the DevOps chain for continuous integration and deployment

 - Ensuring data security throughout the data processing chain (encryption before and after ingestion), with access management for exposed data

- Check fraud detection (Azure, Databricks, Python):

 - Encrypting checks on-premise before sending them to Azure Storage

 - Ensuring the reliable transfer of millions of checks from on-premise to Azure Blob Storage

 - Ensuring decryption of the check data on Databricks and the connection to Key Vault

 - Optimizing the loading of millions of checks into Spark DataFrames so they can be decrypted in memory

 - Implementing on-premise anonymization of sensitive data

- Administration of the Hortonworks platform services:

 - HDFS storage management

 - Management of the YARN resource manager (queues …)

 - Spark tuning

 - Integration of Ambari with LDAP

 - Configuration of high availability and cluster kerberization

 - Configuration of Ranger policies (HDFS, Hive) and KMS encryption

- Ingestion of various business data (risk and finance) within BPCE SA into the data lake, following a three-layer design on Hive: RAW (raw data), ENHANCED (prepared data), EXPOSURE (data to be exposed) (Spark, Hive, Scala, PySpark)

- Development of a tool that runs non-regression tests on large positional data files (risk and finance) between different closing dates (Spark, Scala)

My skills

Technologies

HDFS, AWS

Development environment

Maven

Big Data

Big Data, Spark, Hive

Software testing

Regression testing

Analysis methods and tools

DevOps, Apache Maven

Middleware

Jenkins

Computer Tools

Microsoft Excel

IT Infrastructure

Git

Application servers

Zookeeper

Others

Continuous Integration

Languages

HQL, Scala, Java, SQL

Protocols

LDAP

My education and training

Online Certificates

Improving the caching policy in Spark 1.6 (LRU, "eviction policy") - INRIA Sophia Antipolis, 2016 - 2017

Engineering degree in Computer Science - Institut National des Postes et Télécommunications (INPT), 2014 - 2017

Master IFI (Computer Science) - Polytech, 2016 - 2017

Development of an LTE media trace analysis tool (voice and video) (Java 8) - Orange Labs Lannion, 2017 - 2017