Why we built Etiq

Chris Billingham

March 28, 2025

Let's be honest about data science for a moment. It's not just challenging because of complex algorithms or massive datasets. It's challenging because data scientists are playing a mental game of 3D chess that their tools simply don't support.

At Etiq AI, we've spent countless hours listening to data scientists explain their real-world challenges. We've observed them struggling with three different worldviews simultaneously, trying to make sense of legacy code, and attempting to explain complex pipelines to non-technical stakeholders.

We've built a solution that works the way data scientists do - not the other way around. We're not here to tell data scientists how to do their jobs or force them to learn yet another tool. We're here to make their lives easier by responding to their actual needs - making debugging effortless, testing straightforward, and production readiness achievable.

So what are the real challenges data scientists face, and how is Etiq addressing them with empathy and practicality?

The Problems for Data Scientists in the Real World

You're Playing 3D Chess in Your Head

If you've ever been 600 lines deep into a script and wondered, "Who am I anymore?" you're not alone. As a data scientist, you're not just thinking about code like a software engineer would. You're simultaneously juggling three different states:

What your code is doing - the functions, the logic flow
What your data is doing - its shape, distributions, and quality
How these two interact - which transformations affect which features

This mental juggling act becomes increasingly difficult as projects grow in complexity. It's no wonder that people get lost sometimes - you can be deep into writing code and it's difficult to know how you got there, what's happening to your data along the way, and how all these pieces fit together.

Your Tools Don't Talk to Each Other

The current landscape of tools is fundamentally split, leaving data scientists caught in the middle. As we've observed when talking to teams, you're often given access either to tools built for software engineers or to software focussed on your data. These are fundamentally different tools, targeted at different users and jobs.

These separate domains don't communicate with each other, forcing data scientists to mentally bridge the gap. When something goes wrong, you're often left piecing together what happened across disconnected systems with no clear way to see how data and code interactions led to the problem.

You're Forced to Change How You Work

Many tools claim to help data scientists but actually add to their workload by imposing new requirements and workflows. So many tools out there today force you to change how you work before you can leverage what they promise. They ask you to learn this new package, learn this new system, download a new platform. Only if you do all these things will you have access to this brand new capability.

The reality is that data scientists ask for and want tools that adapt to them, not the other way around. They want solutions that "work where you work" rather than forcing them to learn entirely new systems or drastically change their approach.

"Just Put It in Production" - What Does That Even Mean?

One of the most challenging aspects of being a data scientist is the nebulous concept of "production readiness." As we've heard many times from data scientists themselves: "Does anyone, ever, teach you what production means, and how you get your code into it?"

We heard from one data scientist who reflected that, in one of their first roles, putting a model into production meant turning their computer on at the start of the day, running a script, and then pasting the outputs into a database. This was the reality of "productionisation".

This knowledge gap creates real problems because production readiness encompasses many concerns that aren't taught in academic settings or through online courses. Without clear guidance, data scientists struggle to bridge the gap between a working model and a reliable system that can consistently deliver business value.

Debugging Feels Like Looking for a Needle in a Haystack

When models break - and they always will at some point - most tools are great at letting you know something's wrong but terrible at helping you fix it. We've heard many times that lots of software is happy to provide a large siren warning you that yes, there is a problem. But what to do? Less so.

It's like being told, "There's a needle in this haystack. Good luck finding it!" You're left digging through code, retracing transformations, and trying to mentally reconstruct exactly what happened - a process that can take days or even weeks for complex models.

This reactive approach to debugging is not only frustrating but also a massive waste of time and talent. Data scientists should be building new capabilities, not spending endless hours troubleshooting older, legacy ones.

You're Speaking Greek to Your Stakeholders

Beyond the technical challenges, data scientists face the crucial task of explaining their work to stakeholders who may not understand the technical details. What business stakeholders really want to know is much simpler: where does the data come from, what processes does it go through, and where do the outcomes go?

Creating this kind of clear documentation often requires substantial manual effort, taking time away from model development and improvement. Without better tools for visualization and explanation, data scientists struggle to bridge the communication gap.

How Etiq is Solving These Problems

We Come to You, Not the Other Way Around

Etiq AI integrates directly into your existing workflow and IDE. Our approach is different: we meet you where you are. If you've got a script, we can scan it and understand it without requiring you to change a single line of code. We integrate directly with the tools you already use, whether that's VS Code now or notebooks in our upcoming release.

Everything runs locally by default too, so there are no worries about data and code leaving your systems. Etiq will run just as well in your workplace as it would on a desert island, sat in the sun. This ensures your data stays secure and that you can work in even the most locked-down environments.

We Make the Invisible Visible

Our Data and Code Lineage visualizes the complex relationship between your code and data, mapping each transformation and showing exactly how data flows through your model.

This visualization solves a fundamental problem: how do we support data scientists with their three-world problem? Lineage ensures you can see at a glance where your data comes from, what functions process it, and how it all flows together. No more keeping all of that in your head. It provides that crucial understanding, helping you see how everything interconnects in a way that's simply not possible by visually scanning through code.

This representation isn't just helpful for data scientists. It's also valuable for explaining your work to others, providing a clear logical flow and helping you and your teams communicate not only what your code is doing, but how it does it.
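To make the idea of lineage concrete, here is a minimal sketch in plain Python of the kind of question it answers: "which columns feed into this one?" This is an illustration only, not how Etiq works internally. The `LINEAGE` record, the column names and the `upstream` helper are all hypothetical.

```python
# Hypothetical lineage record (an assumption for illustration): each derived
# column maps to the function that produced it and the columns it was built from.
LINEAGE = {
    "age_years": {"made_by": "derive_age", "inputs": ["date_of_birth"]},
    "income_log": {"made_by": "log_transform", "inputs": ["income"]},
    "risk_score": {"made_by": "score", "inputs": ["age_years", "income_log"]},
}


def upstream(column, lineage):
    """Return every column that feeds into `column`, directly or indirectly."""
    seen = []
    stack = list(lineage.get(column, {}).get("inputs", []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Raw source columns have no lineage entry, so they add nothing here.
        stack.extend(lineage.get(current, {}).get("inputs", []))
    return sorted(seen)


print(upstream("risk_score", LINEAGE))
# ['age_years', 'date_of_birth', 'income', 'income_log']
```

Even in a toy like this, the answer is something you would otherwise have to reconstruct by reading every transformation in order.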

We Tell You What to Test (And Where)

Building robust models requires testing, but knowing what to test and where to test it isn't always obvious. We've heard time and again from data scientists that teams just aren't building tests in, or that where tests are being built they are either incorrect or in the wrong place.

Our Testing Recommendations solve this by identifying exactly what tests you need and where to put them. We don't just tell you that you need a test; we tell you exactly where to put it, down to the line number, and provide guidance on what the test does and why it matters. This removes the guesswork and ensures your models are robust before deployment.
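As a rough illustration of why placement matters, here is a small sketch of a data test sitting directly after the step it guards. It is hand-written plain Python, not output from Etiq. The function names, the `income` column and the thresholds are assumptions made up for the example.

```python
def check_missing_rate(rows, column, max_rate=0.05):
    """Raise if too many rows have no value for `column`; return the rate otherwise."""
    if not rows:
        raise ValueError("no rows to check")
    missing = sum(1 for row in rows if row.get(column) is None)
    rate = missing / len(rows)
    if rate > max_rate:
        raise ValueError(f"{column}: {rate:.0%} missing exceeds {max_rate:.0%}")
    return rate


def clean(rows):
    """Example transformation: drop rows with a negative income, keep missing ones."""
    return [r for r in rows if r["income"] is None or r["income"] >= 0]


raw = [{"income": 100}, {"income": None}, {"income": -5}, {"income": 250}]
cleaned = clean(raw)

# The test sits immediately after the transformation it guards, so a failure
# points at clean() rather than at something much further downstream.
rate = check_missing_rate(cleaned, "income", max_rate=0.5)
print(f"missing rate after clean(): {rate:.2f}")
```

Run the same check three steps later and a failure tells you far less about which transformation introduced the problem.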

We Find the Root Cause, Not Just the Symptoms

When tests fail, Etiq doesn't just tell you something's wrong; it tells you exactly where the problem started and how to fix it. Our RCA (root cause analysis) agents use all the information at their disposal - the code, the data, the lineage, and the nature of the test failure - to pinpoint where the error first occurred and provide a suggested fix.

This is a fundamentally different approach to debugging. Instead of just getting an alert that something's wrong, you get precise information about where the issue originated and clear guidance on how to resolve it.

It's the difference between being told "Your model is failing" and being told "You have target leakage starting at line 27, and these are the variables causing the problem." This level of precision turns what might have been days of debugging into a quick, targeted fix.
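For readers who haven't met target leakage before, here is a deliberately crude sketch of one way to spot it: flag any feature that correlates almost perfectly with the target. This is a simple heuristic written for this post, not a description of how our RCA agents work. The feature names, the data and the 0.95 threshold are all invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def suspected_leaks(features, target, threshold=0.95):
    """Names of features whose absolute correlation with the target meets the threshold."""
    return sorted(
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    )


# Made-up data: `refund_issued` is only recorded after the outcome is known,
# so it mirrors the target exactly.
target = [0, 1, 0, 1, 1, 0]
features = {
    "tenure_months": [3, 14, 30, 8, 22, 11],
    "refund_issued": [0, 1, 0, 1, 1, 0],
}

print(suspected_leaks(features, target))
# ['refund_issued']
```

A check like this only tells you which variable looks suspicious. Knowing which line of the pipeline introduced it is the part that usually takes the time.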

We Document Your Work Automatically

The Lineage visualization provides instant documentation of your ML pipeline that both technical and non-technical stakeholders can understand.

When someone asks how your model works, you can show them a clear visual representation rather than diving into technical jargon. This bridges the gap between data science and business teams, making it easier to explain your work to compliance and risk teams, demonstrate value to leadership, and onboard new team members.

Effortless Data Science, with Etiq

Data science doesn't have to be so hard. The unique challenges data scientists face aren't due to lack of skill or effort, but rather to a lack of tools that truly understand their workflow and needs.

At Etiq AI, we've built a solution that respects how data scientists actually work. We don't force you to change your process or learn yet another system. Instead, we provide tools that integrate seamlessly into your existing workflow, reducing cognitive load, automating testing, and making debugging effortless.

As we like to say here, we want you to "never debug again." That might be aspirational, but it captures our goal: to make your life as a data scientist easier, more productive, and less stressful.

We're with you every step of the way, without getting in the way. That's the Etiq AI difference. Don't believe us? Sign up today for a free trial and find out for yourself.
