About

Greetings! I am Jonatan Enes, a Computer Engineer obsessed with systems performance, frugality and ecologism.

Wait... Maybe everything is related!!

I currently work at the University of A Coruña (UDC), in Spain. You can check out my work below.

Basic Information
Age:
28
Email:
jonatan.enes@udc.es
Language:
Spanish (native), English, Galician, Catalan
Education

2016 - 2020

PhD Student
PhD in Computer Science

University of A Coruña

I did my PhD in Big Data at the University of A Coruña (UDC). My research focused on resource (CPU, memory, disk, network and energy) analysis of Big Data infrastructures and applications, especially using containers as the infrastructure virtualization technology.

2015 - 2016

Master's Degree
Master in Big Data

University of Santiago de Compostela

For my master's degree I moved to the nearby city of Santiago de Compostela (yes, at the end of the Way of Saint James pilgrimage). There, besides making new friends and finding love, I got my 1-year master's degree in Big Data.

2011 - 2015

Bachelor's Degree
Degree in Computer Engineering

University of A Coruña

After finishing high school I moved to A Coruña, in the northwest of Spain, to continue with my college education. In Coruña I studied a 4-year degree in Computer Science with a specialization in Computer Engineering.

I earned a Best Student award for this degree.

Professional Skills
Performance Analysis and Engineering

Both my thesis and my passion revolve around analyzing the performance of systems and programs, using techniques that range from simple resource monitoring to resource profiling, as well as working with several resource management paradigms. I have also worked with energy management in order to better assess its footprint and how it can be both passively studied and actively handled.


Big Data

I have worked with Big Data technologies like Hadoop and Spark for many years. Although I focused on the deployment, scalability, monitoring and architecture of the processing engines, I also understand how the applications are programmed, their execution quirks and their potential performance issues. I have also used Big Data databases like OpenTSDB (time series) or Cassandra (column store), as well as NoSQL databases with Big Data scalability potential such as MongoDB.


Machine Learning and Visualization

I have some experience with Machine Learning technologies and libraries like Python's scikit-learn and PySpark's MLlib, which I have used to carry out some projects. However, I have worked to a greater degree with visualization technologies like JavaScript's D3 or Python's Matplotlib.


Research

As a result of my PhD I know how to identify promising lines of work, lay out a defined plan to carry them out and properly present them as finished research results in the form of scientific papers, either for journals or international conferences. I write my papers mainly in LaTeX. I have also been a paper reviewer for several international, high-impact journals.


Software Development

I work mainly with Python, usually creating simple and small services which I expand and integrate with as many services as necessary to create a microservice-based ecosystem and architecture. I also use JavaScript if I need to create a website, usually trying to keep things simple with few libraries and external tools. Finally, I am well versed in IDEs like PyCharm and WebStorm.


Systems Administration

As a computer engineer I like to invest time looking for better ways to manage a computer system, particularly using virtualization technologies such as Docker or LXC containers, or even playing with novel and imposing 'Infrastructure as code' technologies such as Salt.

Projects Portfolio
Energy capping of containers and applications (2019)

A special use case is studied in which the energy of containers is monitored and controlled so that an energy limit can be set and enforced. It is also interesting to treat energy as another resource, like CPU or memory, that can be accounted for and thus scheduled and shared between different containers, applications or users.

This environment, where the energy of containers is controlled and can be limited, is built by combining several tools. BDWatchdog handles the resource monitoring and the processing of the time series. The Serverless Containers framework performs the CPU scaling that enforces the energy limit. Finally, at the core of this study is the energy monitoring, which is provided by the PowerAPI framework developed by the Spirals group of INRIA Lille – Nord Europe. You can check out their tool here.
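
As a rough illustration of how CPU scaling can enforce an energy limit, the sketch below adjusts a container's CPU limit from a power reading. It is not the code used in this project: the function name, the units (100 = one core), the bounds and the assumption that power grows roughly linearly with the allowed CPU are all hypothetical simplifications.

```python
def next_cpu_limit(current_limit, measured_power, power_cap,
                   min_limit=50, max_limit=800):
    """Return a new CPU limit (percent of a core, 100 = 1 core).

    Assumption (hypothetical): power scales roughly linearly with the
    CPU the container is allowed to use.
    """
    if measured_power <= 0:
        # No usable power reading: leave the limit untouched.
        return current_limit
    # Scale the limit by the ratio between the cap and the measured power.
    proposed = current_limit * (power_cap / measured_power)
    # Clamp so the container is never starved nor given more than max_limit.
    return int(max(min_limit, min(max_limit, proposed)))
```

Calling this function periodically with fresh readings gives a simple feedback loop: the limit shrinks while the container is over its cap and grows back when it is under it.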

Energy-capped PageRank workload
Normally executed PageRank workload
Learn more...
Serverless Containers framework (2018)

This framework has been developed around the idea of serverless computing applied to containers, that is, to allocate only the resources an application needs at any moment and, in the same way, to bill only for those resources. All the resource monitoring is performed using BDWatchdog.

Cluster-level allocated and used CPU for sequential TeraSort and PageRank workloads
Cluster-level allocated and used CPU for concurrent TeraSort workloads
Cluster-level allocated and used CPU for a FixWindow streaming workload
Node-level CPU with the resource amounts and limit thresholds for a FixWindow streaming workload
Cluster-level allocated and used memory for a TeraSort workload
Cluster-level allocated and used CPU for a PageRank workload

By deploying applications on containers around this autoscaling platform, the user is given a serverless environment where the resources allocated vary over time to adjust to those actually used, all the while maintaining a virtual environment that is really close to a virtual machine or traditional cloud instance.

More specifically, this tool is able to change the resource limits applied to a container, or a group of containers, in real time and automatically, without the need for any user intervention.
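
To give an idea of what an automatic rescaling decision can look like, here is a minimal threshold-based sketch. It is an illustration only, not the framework's actual policy: the function name, the threshold values, the scaling step and the floor are hypothetical.

```python
def rescale(limit, usage, lower=0.5, upper=0.9, step=1.25, floor=50):
    """Return a new limit for one resource of one container.

    limit and usage share a unit (e.g. CPU percent or MiB of memory).
    All thresholds are hypothetical defaults chosen for illustration.
    """
    utilization = usage / limit
    if utilization > upper:
        # Usage is pressing against the limit: scale up.
        return int(limit * step)
    if utilization < lower:
        # Much of the allocation is idle: scale down toward the usage,
        # keeping some headroom and never going below the floor.
        return max(floor, int(usage * step))
    # Utilization is inside the band: keep the current limit.
    return limit
```

For example, `rescale(200, 190)` raises the limit to 250, while `rescale(200, 60)` lowers it to 75.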

Learn more...
BDWatchdog, container monitoring and profiling framework (2017)
Single instance CPU time series for a PageRank workload executed with Hadoop
Cluster-aggregated CPU time series for a TeraSort workload executed with Scala
Cluster-aggregated memory time series for a TeraSort workload executed with Scala
Cluster-aggregated disk time series for a TeraSort workload executed with Scala
Cluster-aggregated network time series for a TeraSort workload executed with Scala
Example of a flame graph extracted during the execution of a Hadoop TeraSort

BDWatchdog is a framework that provides both per-process resource monitoring (CPU, memory, disk and network) and profiling information. All of this information is generated, processed and stored in real time by using a time series database (OpenTSDB) for monitoring and a document-based database (MongoDB) for profiling. The architecture used is also scalable to support large clusters.

The end result of using this framework is two-fold:

  • Time series detailing how the resources were used over the execution time, with the possibility of filtering by tags such as host or command, and of applying aggregation operations such as sum or average.
  • Interactive flame graphs showing the percentage of time spent in a specific Java class, subsequently subdivided across its calls.
Learn more...
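
The tag filtering and aggregation mentioned in the first point can be illustrated with a small, self-contained sketch. It uses plain Python dictionaries rather than OpenTSDB's query interface, and the data layout, the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def aggregate(points, tags, how="sum"):
    """Filter points by tags, then aggregate the values per timestamp.

    points: iterable of dicts {"ts": int, "value": float, "tags": {...}}
    tags:   tag filters that every kept point must match
    how:    "sum" or "avg"
    """
    buckets = defaultdict(list)
    for p in points:
        if all(p["tags"].get(k) == v for k, v in tags.items()):
            buckets[p["ts"]].append(p["value"])
    if how == "sum":
        return {ts: sum(vs) for ts, vs in sorted(buckets.items())}
    if how == "avg":
        return {ts: sum(vs) / len(vs) for ts, vs in sorted(buckets.items())}
    raise ValueError(f"unknown aggregation: {how}")


# Made-up CPU samples from two hosts and two commands.
points = [
    {"ts": 0, "value": 40.0, "tags": {"host": "node1", "command": "java"}},
    {"ts": 0, "value": 60.0, "tags": {"host": "node2", "command": "java"}},
    {"ts": 0, "value": 5.0, "tags": {"host": "node1", "command": "sshd"}},
    {"ts": 5, "value": 80.0, "tags": {"host": "node1", "command": "java"}},
]

cluster_java_cpu = aggregate(points, {"command": "java"}, how="sum")
```

Filtering by `command` and summing across hosts yields a cluster-aggregated series, while filtering by `host` yields a single-instance one.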
Big Data PaaS with disk-aware scheduling (2016)
[CPU] CESGA PaaS with direct host-container disk access
[CPU] OpenStack with dedicated disk-backed volumes (1:1 ratio)
[CPU] OpenStack with disk-shared volumes in a 4:1 ratio
[Disk utilization] CESGA PaaS with direct host-container disk access
[Disk utilization] OpenStack with dedicated disk-backed volumes (1:1 ratio)
[Disk utilization] OpenStack with disk-shared volumes in a 4:1 ratio

This platform was designed from the start with novel concepts to specifically enhance Big Data workloads in two ways:

  • Docker containers are used instead of virtual machines backed by hypervisors.
  • Thanks to the use of Mesos as the resource scheduler, a disk-aware capability is implemented and added. This feature gives the users the option of choosing disks as dedicated resources in the same way as CPUs and memory.
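
To illustrate what treating disks as a dedicated resource means for placement, the sketch below matches a request against the free resources of each node. It is a simplified first-fit example, not the Mesos API nor the scheduling logic of this platform; the data layout and every name in it are hypothetical.

```python
RESOURCES = ("cpus", "mem_gb", "disks")


def place(request, nodes):
    """Return the name of the first node that can host the request, or None.

    "disks" counts whole physical disks, handed out as dedicated
    resources just like CPUs and memory.
    """
    for node in nodes:
        free = node["free"]
        if all(free[r] >= request[r] for r in RESOURCES):
            for r in RESOURCES:
                free[r] -= request[r]  # reserve the resources on that node
            return node["name"]
    return None


# Hypothetical two-node cluster: n1 has spare CPU and memory but only one disk.
nodes = [
    {"name": "n1", "free": {"cpus": 8, "mem_gb": 32, "disks": 1}},
    {"name": "n2", "free": {"cpus": 8, "mem_gb": 32, "disks": 4}},
]
chosen = place({"cpus": 4, "mem_gb": 16, "disks": 2}, nodes)
```

Here the request lands on `n2`: `n1` has enough CPU and memory, but not enough free disks, which is the kind of decision a scheduler that ignores disks could not make.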
Learn more...
Work/Research Experience

April 2019 - July 2019

Inria Lille, France

Research stay
Integration of tools to create an energy limiting framework for containers

For 3 months I stayed in Lille to work with the Spirals group on the integration of their energy-measuring PowerAPI tool with both of my frameworks, BDWatchdog and Serverless Containers. The result of this integration, an environment where energy is treated as just another resource that can be accounted for and shared, is described above.

July 2015 - June 2016

Internship
Development of a Big Data PaaS at the Galicia Supercomputing Center

For a year I worked at the Galicia Supercomputing Center (CESGA) developing a novel PaaS where Big Data applications can be easily deployed by users.

On this platform, applications also benefit from specific data-processing enhancements such as lighter virtualization using Docker containers and direct, local disk access, achieved by including whole disks as a configurable resource alongside CPU and memory.

Research results

Conference paper in:

IEEE Cluster 2020
14-17 September
Kobe, Japan
Power Budgeting of Big Data Applications in Container-based Clusters

Jonatan Enes, Guillaume Fieni, Roberto R. Expósito, Romain Rouvoy, Juan Touriño, in IEEE International Conference on Cluster Computing (CLUSTER 2020), Kobe, Japan, 2020, pp. 281–287.

Full length article in:

2020 - Future Generation Computer Systems (FGCS)
Real-time resource scaling platform for Big Data workloads on serverless environments

Jonatan Enes, Roberto R. Expósito, Juan Touriño, Future Generation Computer Systems 105 (2020) 361–379.

Full length article in:

2018 - Future Generation Computer Systems (FGCS)
BDWatchdog: Real-time monitoring and profiling of Big Data applications and frameworks

Jonatan Enes, Roberto R. Expósito, Juan Touriño, Future Generation Computer Systems 87 (2018) 420–437.

Full length article in:

2018 - Journal of Grid Computing
Big Data-oriented PaaS architecture with disk-as-a-resource capability and container-based virtualization

Jonatan Enes, Javier L. Cacheiro, Roberto R. Expósito, Juan Touriño, Journal of Grid Computing 16 (4) (2018) 587–605.

Full length article in:

2018 - Future Generation Computer Systems (FGCS)
Bdev 3.0: Energy efficiency and microarchitectural characterization of big data processing frameworks

Jorge Veiga, Jonatan Enes, Roberto R. Expósito, Juan Touriño, Future Generation Computer Systems 86 (2018) 565–581.

Volunteering

I strongly believe that computer science has a duty to serve as a tool for research that tackles society's biggest current challenges, such as climate change, food and water security, and health.

Volunteer computing is a type of distributed computing that has been going on since the 90's (with SETI@home as one of its earliest projects) and that allows users all over the world to connect their computers to a network in order to donate their computing resources, thus creating a cluster. This cluster offers researchers of any country or institution equal and usually free access to a pool of resources, with the only condition that any research result has to be given back to the community.

I have been a proud volunteer since 2010 (My stats), currently computing on the international CRUNCHERS SANS FRONTIERS team for the World Community Grid project (one of the many available), an umbrella project that focuses on humanitarian research. If you are interested in volunteer computing, you can check out the BOINC client and the list of available projects.

Team's motto: To crunch and to serve!

Contact Me

Email:

You can contact me by email at: jonatan.enes@udc.es

Jonatan Enes
