The Machine Learning Engineer

The press and the academic community have been talking about data science and data scientists for just over a decade. Although there has always been some debate about what "data scientist" means, we have reached the point where many universities offer data science programs: master’s degrees, certifications, and so on. The world was a simpler place when all we had was statistics. But simplicity is not always healthy, and the diversity of data science programs demonstrates how strong the demand is for professionals in the field.

As the field of data science developed, several specialties emerged that are easily confused with one another. Companies use the terms "data scientist" and "data science team" to describe a variety of roles, including:

people who perform ad hoc analysis and reporting, including business intelligence (BI) and business analytics;

people responsible for statistical analysis and modeling, which in many cases involves formal experiments and tests;

machine learning modelers.

There are also data scientists who build products from data. They most resemble machine learning modelers, except that they are building something: they are product-centric rather than research-centric.

The role of the machine learning engineer

To some extent, then, machine learning engineers do what software engineers (and good data engineers) have always done. They have stronger software engineering skills than typical data scientists; they can work with the engineers who maintain production systems; and they understand software development methodologies, agile practices, and the full range of tools that modern software developers use.

Because their focus is on making data products work in production, machine learning engineers think holistically and take components such as logging or testing infrastructure into account. They are prepared for the specific problems of monitoring data products in production. Application monitoring is well covered by existing tools and resources, but machine learning has requirements that go further.

Machine learning engineers are also involved in software architecture and design, and in practices such as A/B testing. More importantly, they don’t just "understand" A/B testing; they know how to run A/B tests on production systems. They understand logging and security issues, and they know how to make log data useful to data engineers.
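To make the idea of A/B testing on a production system more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular company's setup: the function names `assign_variant` and `two_proportion_z_test`, the experiment name, and the 50/50 split are all hypothetical. The first function assigns each user to a variant deterministically, so the same user always sees the same version; the second compares conversion rates with a standard two-proportion z-test.

```python
from __future__ import annotations

import hashlib
import math


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'A' (control) or 'B' (treatment).

    Hashing the experiment name together with the user id keeps the
    assignment stable across requests and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "B" if bucket < treatment_share else "A"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (every user converted, or none did).
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment name and made-up counts, for illustration only.
    print(assign_variant("user-42", "new-ranking-model"))
    print(two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000))
```

In a real system the assignment would typically be written to the logs alongside each request, which is one reason logging and A/B testing infrastructure tend to be designed together.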

Machine learning and data science

How is machine learning different from "data science"? Data science is clearly the broader term. But there’s something significantly different about the way deep learning works. It has always been convenient to think of a data scientist as someone who explores the data: looking for alternative approaches and different models to find one that works. Classics like Tukey’s Exploratory Data Analysis set the tone for what many data scientists have been doing: exploring and analyzing large amounts of data to find the value hidden in it.

Deep learning changes this model significantly, because the data scientist no longer works through the data directly: you know the result you want, but you let the software discover how to reach it. Do you want to build a machine that beats chess champions, labels photos correctly, or translates speech between languages? In machine learning, these goals are not achieved through careful exploration; in many cases there is too much data, with too many dimensions, to explore in any meaningful sense. The idea of machine learning is exactly this: the system builds the model itself, doing its own exploration and adjustment of the data to achieve the stated objectives.

As a result, data scientists do not explore as much. Their goal is not to find meaning in the data: they assume the meaning is already there. Instead, their goal is to build a machine that can analyze the data and produce results on its own, a neural network that works and can be tuned to produce reliable results.

There is less emphasis on statistics. In fact, the holy grail of machine learning is "democratization": reaching a point where machine learning systems can be built by subject matter experts rather than by PhD researchers. The goal is for a chess player, not a researcher, to build the next generation of Deep Blue, and for a linguist to build the engine that does machine translation into Spanish.

This change has a corresponding effect on the machine learning engineer’s job. In machine learning, models are not static; they can become obsolete over time. Someone needs to monitor the system and retrain the model when needed. This is not a job that the developers who initially created the system will find attractive, but it is deeply technical. It also requires an understanding of monitoring tools, which were not designed with data applications in mind.
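One way to picture this kind of monitoring is a drift check that compares what the model sees in production with what it saw at training time. The sketch below uses the population stability index (PSI), a common drift measure, and is only an illustration: the helper names `population_stability_index` and `needs_retraining`, the choice of 10 bins, and the 0.2 alert threshold are assumptions made for this example, not values taken from the article.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float], recent: Sequence[float], bins: int = 10
) -> float:
    """Compare two samples of a score or feature; 0.0 means identical histograms."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if all reference values are equal.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    rec_p = proportions(recent)
    return sum((r - e) * math.log(r / e) for e, r in zip(ref_p, rec_p))


def needs_retraining(
    reference: Sequence[float], recent: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag the model for retraining when drift exceeds the (assumed) threshold."""
    return population_stability_index(reference, recent) > threshold


if __name__ == "__main__":
    # Synthetic data for illustration: production scores shifted upward.
    training_scores = [i / 100 for i in range(100)]
    production_scores = [0.5 + i / 200 for i in range(100)]
    print(needs_retraining(training_scores, production_scores))
```

A check like this would usually run on a schedule against logged inputs or predictions, with the threshold tuned to how costly a stale model is for the product in question.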

The security of processes

Any practicing software developer or IT professional must understand the security that machine learning processes require. As far as we know, there have not yet been significant attacks on machine learning systems. But they will become increasingly tempting targets. What new types of vulnerabilities does machine learning present? Is it possible to "poison" the data on which a system is trained, or to force a system to be biased when it should not be? These are new kinds of systems, and we should expect an entirely new class of vulnerabilities.

As the tools improve, we will see more data scientists who can make the transition to production systems. But we’ll still need data engineers and machine learning engineers: engineers who are literate in data science and machine learning, who understand how to deploy and run systems in production, and who are up to the challenges of supporting machine learning products.

tripulação ET