Will GenAI Replace Data Engineers? No – And Here’s Why.

These days, keeping up with the latest developments in GenAI is harder than saying “multimodal model.” It seems like every week some shiny new solution launches with the lofty promise of transforming our lives, our work, and the way we feed our dogs.

Data engineering is no exception.

Already in the early months of 2024, GenAI is beginning to upend the way data teams think about ingesting, transforming, and surfacing data to consumers. Tasks that were once fundamental to data engineering are now being accomplished by AI – usually faster, and sometimes with a higher degree of accuracy.

As familiar workflows evolve, a question naturally arises: will GenAI replace data engineers?

While I can’t in good conscience say ‘not in a million years’ (I’ve seen enough sci-fi movies to know better), I can say with a pretty high degree of confidence “I don’t think so.”

At least, not anytime soon.

Here’s why.

The current state of GenAI for data engineering

First, let’s start off our existential journey by looking at the current state of GenAI in data engineering – from what’s already changed to what’s likely to change in the coming months.

So, what’s the biggest impact of GenAI on data engineers in Q1 of 2024?

Pressure.

Our own survey data shows that half of data leaders are feeling significant pressure from CEOs to invest in GenAI initiatives at the expense of higher-returning investments.

For data engineering teams, that can mean kicking off a race to reconfigure infrastructure, adopt new tools, figure out the nuances of retrieval-augmented generation (RAG) and fine-tuning LLMs, or navigate the endless stream of privacy, security, and ethical concerns that color the AI conversation.

But it’s not all philosophy. On a more practical level, GenAI is tangibly influencing the ways data engineers get work done as well. Right now, that includes:

  • Code assistance: Tools like GitHub Copilot are capable of generating code in languages like Python and SQL – making it faster and easier for data engineers to build, test, maintain, and optimize pipelines.
  • Data augmentation: Data scientists and engineers can use GenAI to create synthetic data points that mimic real-world examples in a training set – or intentionally introduce variations to make training sets more diverse. Teams can also use GenAI to anonymize data, improving privacy and security.
  • Data discovery: Some data leaders we’ve spoken with are already integrating GenAI into their data catalogs or discovery tools to populate metadata, answer complex questions, and improve visibility. That, in turn, can help data consumers and business stakeholders use GenAI to get answers to their questions or build new dashboards without overburdening data teams with ad hoc requests.
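To make the data augmentation idea a little more concrete, here is a minimal Python sketch. It is not a GenAI implementation: a seeded random jitter and a salted hash stand in for what a generative model would produce, and the field names (email, amount), the salt, and the augment helper are all made up for illustration.

```python
import hashlib
import random


def pseudonymize(value: str, salt: str) -> str:
    """Replace an identifier with a truncated, salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def augment(rows, copies=2, jitter=0.10, salt="demo-salt", seed=0):
    """Return synthetic rows: identifiers are hashed, amounts vary by +/- jitter.

    Hypothetical helper. A real generative approach would learn the data's
    distribution rather than apply a fixed rule like this one.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    synthetic = []
    for row in rows:
        for _ in range(copies):
            factor = 1 + rng.uniform(-jitter, jitter)
            synthetic.append({
                "customer": pseudonymize(row["email"], salt),
                "amount": round(row["amount"] * factor, 2),
            })
    return synthetic


real_rows = [
    {"email": "ada@example.com", "amount": 120.0},
    {"email": "alan@example.com", "amount": 80.0},
]
synthetic_rows = augment(real_rows)
for row in synthetic_rows:
    print(row)
```

Even in a toy like this, someone still has to decide which fields are sensitive and how much variation is realistic. That judgment call is the part that stays with the data team.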

And by and large, these developments are good news for data engineers! Less time spent on routine work means more time to spend driving business value.

And yet, as we see automation overlap with more of the routine workflows that characterize a data engineer’s day-to-day, it’s normal to feel a little… uncomfortable.

When is GenAI going to stop? Is it really going to eat the world? Are my pipelines and infrastructure next?!

Well, the answers to those questions are, “probably never, but probably not.” Let me explain.

Why GenAI won’t replace data engineers

To understand why GenAI can’t replace data engineers – or any truly strategic role for that matter – we need to get philosophical for a moment. Now, if that kind of tête-à-tête makes you uncomfortable, it’s okay to click away. There’s no shame in it.

You’re still here?

Okay, let’s get Socratic.

Socrates freelanced as a data engineer in his spare time. Image courtesy of Monte Carlo.

Artificial “intelligence” is limited

First things first – let’s remember what GenAI stands for: “generative artificial intelligence”. Now, the generative and artificial parts are both fairly apt descriptors. And if it stopped there, I’m not sure we’d even be having this conversation. But it’s the “intelligence” part that’s tripping people up these days.

You see, the ability to mimic natural language or produce a few lines of accurate code doesn’t make something “intelligent.” It doesn’t even make someone intelligent. A bit more helpful perhaps, but not intelligent in the true sense of that word.

Intelligence goes beyond spitting out a response to a carefully phrased question. Intelligence is knowledge and interpretation. It’s creativity. But no matter how much data you pump into an AI model, at the end of the day, it’s still ostensibly a regurgitation machine (albeit a very sophisticated regurgitation machine).

AI isn’t capable of the abstract thought that defines a data engineer’s intelligence, because it’s not capable of any thoughts at all. AI does what it’s told to do. But you need to be able to do more. Much more.

AI lacks business understanding

Understanding the business problems and use cases of data is at the heart of data engineering. You need to talk with your business users, listen to their problems, extract and interpret what they actually need, and then design a data product that delivers meaningful value based on what they meant – not necessarily what they said.

Sure, AI can give you a head start once you figure all of that out. But don’t give the computer credit for automating a process or building a pipeline based on your deep research. You’re the one who had to sit in that meeting when you could have been playing Baldur’s Gate. Don’t diminish your sacrifice.

AI can’t interpret and apply answers in context

Right now, AI is programmed to deliver specific, useful outputs. But it still requires a data team to dictate the solution, based on an enormous amount of context: Who uses the code? Who verifies it’s fit for a given use case? Who will understand how it’s going to impact the rest of the platform and the pipeline architecture?

Coding is helpful. But the real work of data engineers involves a high degree of complex, abstract thought. This work – the reasoning, problem-solving, understanding how pieces fit together, and identifying how to drive business value through use cases – is where creation happens. And GenAI isn’t going to be capable of that kind of creativity anytime soon.

AI fundamentally relies on data engineering

On a very basic level, AI requires data engineers to build and maintain its own applications. Just as data engineers own the building and maintenance of the infrastructure underlying the data stack, they’re becoming increasingly responsible for how generative AI is layered into the enterprise. All the high-level data engineering skills we just described – abstract thinking, business understanding, contextual creation – are used to build and maintain AI infrastructure as well.

And even with the most sophisticated AI, sometimes the data is just wrong. Things break. And unlike a human – who’s capable of acknowledging a mistake and correcting it – I can’t imagine an AI doing much self-reflecting in the near-term.

So, when things go wrong, someone needs to be there babysitting the AI to catch it. A “human-in-the-loop” if you will.

And what’s powering all that AI? If you’re doing it right, mountains of your own first-party data. Sure, an AI can solve some pretty menial problems – it can even give you a good starting point for some more complex ones. But it can’t do ANY of that until someone pumps that pipeline full of the right data, at the right time, and with the right level of quality.

In other words, despite what the movies tell us, AI isn’t going to build itself. It isn’t going to maintain itself. And it sure as data sharing isn’t gonna start replicating itself. (We still need the VCs for that.)

What GenAI will do (probably)

Few data leaders doubt that GenAI has a big role to play in data engineering – and most agree GenAI has enormous potential to make teams more efficient.

“The ability of LLMs to process unstructured data is going to change a lot of the foundational table stakes that make up the core of engineering,” John Steinmetz, prolific blogger and former VP of data at healthcare staffing platform ShiftKey, told us recently. “Just like at first everyone had to code in a language, then everyone had to know how to incorporate packages from those languages – now we’re moving into, ‘How do you incorporate AI that will write the code for you?’”

Historically, routine manual tasks have taken up a lot of data engineers’ time – think debugging code or extracting specific datasets from a large database. With its ability to near-instantaneously analyze vast datasets and write basic code, GenAI can be used to automate exactly these kinds of time-consuming tasks.

Tasks like:

  • Assisting with data integration: GenAI can automatically map fields between data sources, suggest integration points, and write code to perform integration tasks.
  • Automating QA: GenAI can analyze, detect, and surface basic errors in data and code across pipelines. When errors are simple, GenAI can debug code automatically, or alert data engineers when more complex issues arise.
  • Performing basic ETL processes: Data teams can use GenAI to automate transformations, such as extracting information from unstructured datasets and applying the structure required for integration into a new system.
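As a rough illustration of the field-mapping task, and of why an engineer still needs to be in the loop, here is a small Python sketch. To keep it runnable without any model, plain string similarity from the standard library's difflib stands in for the GenAI suggestion step. The suggest_mapping function, the 0.6 cutoff, and the field names are assumptions for the example only.

```python
import difflib


def suggest_mapping(source_fields, target_fields, cutoff=0.6):
    """Suggest a target field for each source field, or flag it for review.

    String similarity is only a stand-in here for a model-generated
    suggestion; the review list is the part that matters.
    """
    mapping, needs_review = {}, []
    for field in source_fields:
        matches = difflib.get_close_matches(field, target_fields, n=1, cutoff=cutoff)
        if matches:
            mapping[field] = matches[0]
        else:
            needs_review.append(field)  # no confident match: a human decides
    return mapping, needs_review


source = ["cust_id", "email_addr", "zip"]
target = ["customer_id", "email", "postal_code"]

mapping, needs_review = suggest_mapping(source, target)
print("suggested:", mapping)
print("needs review:", needs_review)
```

The easy matches get handled automatically. The field that needs business context (is "zip" really the same thing as "postal_code" in this system?) lands in a queue for a data engineer.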

With GenAI doing a lot of this monotonous work, data engineers will be freed up to focus on more strategic, value-additive work.

“It’s going to create a whole new kind of class system of engineering versus what everyone looked to the data scientists for in the last five to ten years,” says John. “Now, it’s going to be about leveling up to building the actual implementation of the unstructured data.”

How to avoid being replaced by a robot

There’s one big caveat here. As a data engineer, if all you can do is perform basic tasks like the ones we’ve just described, you probably should be a little concerned.

The question we all need to ask – whether we’re data engineers, or analysts, or CTOs or CDOs – is, “are we adding new value?”

If the answer is no, it might be time to level up.

Here are a few steps you can take today to make sure you’re delivering value that can’t be automated away.

  1. Get closer to the business: If AI’s limitation is a lack of business understanding, then you’ll want to improve yours. Build stakeholder relationships and understand exactly how and why data is used – or not – within your organization. The more you know about your stakeholders and their priorities, the better equipped you’ll be to deliver data products, processes, and infrastructure that meet those needs.
  2. Measure and communicate your team’s ROI: As a group that’s historically served the rest of the organization, data teams risk being perceived as a cost center rather than a revenue-driver. Particularly as more routine tasks start to be automated by AI, leaders need to get comfortable measuring and communicating the big-picture value their teams deliver. That’s no small feat, but models like this data ROI pyramid offer a shove in the right direction.
  3. Prioritize data quality: AI is a data product – plain and simple. And like any data product, AI needs quality data to deliver value. Which means data engineers need to get really good at identifying and validating data for those models. In the current moment, that includes implementing RAG correctly and deploying data observability to ensure your data is accurate, reliable, and fit for your differentiated AI use case.
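For a sense of what validating data for these models can look like at its simplest, here is a hedged Python sketch of two basic checks (completeness and freshness) run on documents before they feed something like a RAG index. It is nowhere near a full data observability setup, and the check_quality helper, the field names, and the 90-day threshold are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def check_quality(rows, required, max_age, now):
    """Return human-readable issues for missing fields and stale rows."""
    issues = []
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) in (None, ""):
                issues.append(f"row {i}: missing {field}")
        updated = row.get("updated_at")
        if updated is not None and now - updated > max_age:
            issues.append(f"row {i}: stale (updated {updated.date()})")
    return issues


# A fixed "now" keeps the example reproducible.
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
rows = [
    {"doc_id": "a1", "text": "Refunds are accepted within 30 days.",
     "updated_at": now - timedelta(days=2)},
    {"doc_id": "a2", "text": "",
     "updated_at": now - timedelta(days=400)},
]

issues = check_quality(rows, required=["doc_id", "text"],
                       max_age=timedelta(days=90), now=now)
for issue in issues:
    print(issue)
```

Deciding which fields are required and how fresh is "fresh enough" depends entirely on the use case, which is exactly the kind of context a data engineer brings.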

Ultimately, talented data engineers only stand to benefit from GenAI. Greater efficiencies, less manual work, and more opportunities to drive value from data. Three wins in a row.

Call me an optimist, but if I were placing bets, I’d say the AI-powered future is bright for data engineering.

This article was originally published here.

The post Will GenAI Replace Data Engineers? No – And Here’s Why. appeared first on Datafloq.