About

Hello! I am Jonatan Enes, a Computer Engineer obsessed with system performance, frugality and ecologism.

Wait... Maybe everything is related!!

Currently I am doing my PhD at the University of A Coruña (UDC), in Spain. You can check out my work below.

Basic Information
Age:
25
Email:
jonatan.enes@udc.es
Language:
Spanish (native), English, Galician, Catalan
Education

2016 - present

PhD Student
PhD in Computer Science

University of A Coruña

Currently I am working on my PhD in Big Data at the University of A Coruña (UDC). My research focuses on resource (CPU, memory, disk, network and energy) analysis on Big Data infrastructures, especially those that use containers as the infrastructure virtualization technology.

2015 - 2016

Master's Degree
Master in Big Data

University of Santiago de Compostela

For my master's degree I moved to the nearby city of Santiago de Compostela (yes, at the end of the Way of Saint James pilgrimage!). There, besides making new friends and finding love, I got my 1-year master's degree in Big Data.

2011 - 2015

Bachelor's Degree
Degree in Computer Engineering

University of A Coruña

After finishing high school I moved to A Coruña, in the northwest of Spain, to continue my college education. There I studied a 4-year degree in Computer Science with a specialization in Computer Engineering.

I received a Best Student award for this degree.

Professional Skills
  • Resource monitoring
  • Application profiling
  • Performance anomaly troubleshooting
  • Power consumption monitoring
  • Data visualization:
    • JavaScript D3
    • Python Pandas
    • Matplotlib
    • Gnuplot
  • Machine Learning:
    • Python
    • R
  • Processing engines:
    • Hadoop MapReduce
    • Spark
  • Infrastructure virtualization:
    • Hypervisors (VirtualBox, VMware)
    • Containers (Docker and LXD/LXC)
  • Infrastructure deployment:
    • Salt
  • Infrastructure management:
    • mdadm
    • iptables
  • Programming languages (best skills):
    • Python
    • Java
    • C
  • Full stack development:
    • LAMP
    • LEMP
    • MEAN
  • Databases:
    • *SQL
    • MongoDB
    • CouchDB
    • Cassandra
    • OpenTSDB
  • Web:
    • AngularJS
  • IDEs:
    • PyCharm
    • Eclipse
    • WebStorm
  • LaTeX document generation
  • Web development and design
Projects Portfolio
Automatic container resource scaling platform (2018)

This resource autoscaling tool has been developed around the idea of serverless computing, that is, to allocate only the resources an application needs at any moment and, in the same way, to bill only for those resources. All the resource monitoring is performed using BDWatchdog.

More specifically, this tool is able to change the resource limits applied to a container, or a group of containers, in real time and automatically, without the need for any user intervention.

Cluster-level allocated and used resource amounts for memory during a TeraSort workload
Cluster-level allocated and used resource amounts for CPU during a PageRank workload
Node-level CPU scaling with the allocated and used amounts as well as the lower and upper limit thresholds for a FixWindow streaming workload
Cluster-level allocated and used resource amounts for CPU during a FixWindow streaming workload

By deploying applications on containers around this autoscaling platform, the user is given a serverless environment where the resources allocated vary over time to adjust to those actually used, all the while maintaining a virtual environment that is really close to a virtual machine or traditional cloud instance.
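As a rough illustration of the idea (not the platform's actual code), the sketch below shows one way a threshold-based rule could adjust a container's CPU limit from its observed usage. The `Limits` class, the `rescale` function and the 0.5 / 0.9 / 0.7 ratios are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical per-container CPU state, in CPU shares (100 = one core)."""
    allocated: int  # limit currently applied to the container
    minimum: int    # never scale below this
    maximum: int    # never scale above this


def rescale(limits: Limits, used: float, lower: float = 0.5,
            upper: float = 0.9, target: float = 0.7) -> int:
    """Return a new allocation for the observed usage.

    lower/upper are utilization thresholds (used / allocated). The limit
    only changes when utilization leaves that band, and it is then set so
    that utilization lands on `target`. All three ratios are assumptions.
    """
    utilization = used / limits.allocated
    if lower <= utilization <= upper:
        return limits.allocated        # inside the band: leave the limit alone
    wanted = round(used / target)      # allocation that puts usage at target
    return max(limits.minimum, min(limits.maximum, wanted))


if __name__ == "__main__":
    limits = Limits(allocated=400, minimum=100, maximum=800)
    for used in (280, 380, 100):
        print(used, "->", rescale(limits, used))
```

Keeping a band between the two thresholds, instead of tracking usage exactly, avoids rescaling on every small fluctuation.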

Learn more...
BDWatchdog monitoring and profiling framework (2017)
Single instance CPU time series for a PageRank workload executed with Hadoop
Cluster-aggregated CPU time series for a TeraSort workload executed with Scala
Cluster-aggregated memory time series for a TeraSort workload executed with Scala
Cluster-aggregated disk time series for a TeraSort workload executed with Scala
Cluster-aggregated network time series for a TeraSort workload executed with Scala
Example of a flame graph extracted during the execution of a Hadoop TeraSort

BDWatchdog is a framework that provides both per-process resource monitoring (CPU, memory, disk and network) and profiling information. All of this information is generated, processed and stored in real time, using a time series database (OpenTSDB) for monitoring and a document-based database (MongoDB) for profiling. The architecture is also scalable, so it can support large clusters.
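To make the monitoring side more concrete, here is a minimal sketch of how per-process samples carrying tags (OpenTSDB data points consist of a metric, a timestamp, a value and a set of tags) could be filtered and aggregated into a cluster-level series. It runs on an in-memory list rather than a real database; the metric name, the sample values and the `query` helper are made up for illustration and are not BDWatchdog's API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (metric, timestamp, value, tags)
SAMPLES = [
    ("proc.cpu.user", 0, 40.0, {"host": "node1", "command": "java"}),
    ("proc.cpu.user", 0, 60.0, {"host": "node2", "command": "java"}),
    ("proc.cpu.user", 0, 5.0, {"host": "node1", "command": "sshd"}),
    ("proc.cpu.user", 5, 80.0, {"host": "node1", "command": "java"}),
    ("proc.cpu.user", 5, 100.0, {"host": "node2", "command": "java"}),
]


def query(samples, metric, tags, aggregator=sum):
    """Return {timestamp: aggregated value} for samples matching all tags."""
    buckets = defaultdict(list)
    for name, timestamp, value, sample_tags in samples:
        if name != metric:
            continue
        # Keep the sample only if every requested tag matches.
        if all(sample_tags.get(key) == val for key, val in tags.items()):
            buckets[timestamp].append(value)
    return {ts: aggregator(values) for ts, values in sorted(buckets.items())}


if __name__ == "__main__":
    # CPU used by all "java" processes, summed across hosts.
    print(query(SAMPLES, "proc.cpu.user", {"command": "java"}))
    # Same filter, averaged per timestamp instead of summed.
    print(query(SAMPLES, "proc.cpu.user", {"command": "java"}, aggregator=mean))
```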

The end result of using this framework is two-fold:

  • Time series detailing how the resources were used over the execution time, with the possibility of filtering by tags such as host or command, as well as applying aggregation operations such as sum or average.
  • Interactive flame graphs showing the percentage of time spent in a specific Java class, subsequently subdivided across its calls.
Learn more...
Big Data PaaS with disk-aware scheduling (2016)
CPU for a cluster deployed in our PaaS with direct disk access
CPU for a cluster deployed on OpenStack with dedicated disk-backed volumes
CPU for a cluster deployed on OpenStack with volumes sharing an underlying disk in a 4:1 ratio
Disk utilization for a cluster deployed in our PaaS with direct disk access
Disk utilization for a cluster deployed on OpenStack with dedicated disk-backed volumes
Disk utilization for a cluster deployed on OpenStack with volumes sharing an underlying disk in a 4:1 ratio

This platform was designed from the start with novel concepts to specifically enhance Big Data workloads in two ways:

  • Docker containers are used instead of virtual machines backed by hypervisors.
  • Thanks to the use of Mesos as the resource scheduler, a disk-aware scheduling capability is added. This feature gives users the option of requesting disks as dedicated resources, in the same way as CPUs and memory.
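The sketch below illustrates what treating disks as a dedicated resource can mean for placement: a request asks for whole disks next to CPU and memory, and it is only placed on a node that can hand out that many free disks. This is a simplified first-fit model written for this page, with hypothetical `Node`, `Request` and `place` names. It does not use or reproduce the Mesos API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus: float
    mem_gb: float
    free_disks: list  # identifiers of whole disks not yet handed out


@dataclass
class Request:
    cpus: float
    mem_gb: float
    disks: int  # number of whole, dedicated disks wanted


def place(request, nodes):
    """First-fit: return (node name, disks granted) or None if nothing fits."""
    for node in nodes:
        fits = (node.cpus >= request.cpus
                and node.mem_gb >= request.mem_gb
                and len(node.free_disks) >= request.disks)
        if fits:
            # Hand out whole disks and subtract the other resources.
            granted = node.free_disks[:request.disks]
            node.free_disks = node.free_disks[request.disks:]
            node.cpus -= request.cpus
            node.mem_gb -= request.mem_gb
            return node.name, granted
    return None


if __name__ == "__main__":
    nodes = [Node("n1", 8, 32, ["sdb"]), Node("n2", 8, 32, ["sdb", "sdc"])]
    print(place(Request(cpus=4, mem_gb=16, disks=2), nodes))
```

Because a disk is granted to a single request, two containers never compete for the same spindle, which is the contrast the shared-volume figures above are meant to show.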

Learn more...
Work Experience

July 2015 - June 2016

Internship
Development of a Big Data PaaS in the Galicia Supercomputer

For a year I worked at the Galicia Supercomputing Center (CESGA), developing a novel PaaS where users can easily deploy Big Data applications.

On this platform, applications also benefit from specific data-processing enhancements, such as lighter virtualization using Docker containers and direct, local disk access, achieved by including whole disks as a configurable resource alongside CPU and memory.

Research results

Full length article in:

2018 - Future Generation Computer Systems (FGCS)
BDwatchdog: Real-time monitoring and profiling of big data applications and frameworks

Jonatan Enes, Roberto R. Expósito, Juan Touriño, Future Generation Computer Systems 87 (2018) 420–437.

Full length article in:

2018 - Journal of Grid Computing
Big data-oriented PaaS architecture with disk-as-a-resource capability and container-based virtualization

Jonatan Enes, Javier L. Cacheiro, Roberto R. Expósito, Juan Touriño, Journal of Grid Computing 16 (4) (2018) 587–605.

Full length article in:

2018 - Future Generation Computer Systems (FGCS)
Bdev 3.0: Energy efficiency and microarchitectural characterization of big data processing frameworks

Jorge Veiga, Jonatan Enes, Roberto R. Expósito, Juan Touriño, Future Generation Computer Systems 86 (2018) 565–581.

Contact Me

Email:

You can contact me by email at: jonatan.enes@udc.es