
Behind the scenes: how we shipped AI Query Assist

Ravi Chandra, Chief Product & Technology Officer at Dexibit, reflects on the team’s experience discovering, designing and going to market with generative AI


We recently shipped our very first (generative) AI product capability. The journey was arduous. It all started with a wee spark, but to ignite that into something genuinely valuable for our customers took a concerted effort through unfamiliar territory…

As the saying goes: nothing ventured, nothing gained. I thought I might pull back the curtain and share some of the trials and tribulations along our journey, and hopefully inspire others in our little community.

It started on a hot and humid Auckland day. Over the summer break, we tend to put a pause on BAU, so experiments are the order of the day. One of our amazing engineers was playing with LLMs in an isolated and focused part of our application. And, with just a handful of lines of code, we had the spark.

Assembled for our first team catch-up of the year, we were greeted with an unexpected demo: generative AI in our product! Most visualisations in our product can be refined by writing raw SQL into a filter box. While extremely powerful, it’s a difficult feature to wield successfully: a user must know correct SQL syntax and the underlying data model, or they’ll encounter a somewhat cryptic error message.

This hidden prowess was revealed to the team like a magic trick. Everything still appeared normal. We were in the demo, looking up syntax and column names, until, wait… we weren’t. John, the engineer-turned-magician of this story, had started writing into the box in natural language. Asking, no, demanding, what he wanted. He clicked the new magic wand button… and everything just worked, every time! It felt like magic and fittingly earned the internal codename of “John’s Magic Wand.”
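To give a flavour of just how little code a spark like this takes, here’s a rough sketch of the idea. This is illustrative only, not our production code: it uses the OpenAI Python client as an example, and the model name, schema hint and function are all stand-ins.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Illustrative schema hint; a real system would describe the actual data model.
    SCHEMA_HINT = "Table visits(visit_date DATE, venue_id INT, visitor_count INT)"

    def natural_language_to_sql(question: str) -> str:
        """Ask the model to translate a natural-language filter into SQL."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # any capable chat model works here
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Translate the user's request into a SQL WHERE clause "
                        f"for this schema:\n{SCHEMA_HINT}\nReturn only SQL."
                    ),
                },
                {"role": "user", "content": question},
            ],
            temperature=0,  # reduce (but not eliminate) variability
        )
        return response.choices[0].message.content.strip()

    # e.g. natural_language_to_sql("only weekends with more than 500 visitors")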

Learning #1: end-to-end experiments are hugely important to break new ground. 🧪

The successful experiment gave us the confidence to invest further, with a desire to take it to our customers. We quickly tidied things up and shipped it to get (internal) user feedback. However, almost immediately the team experienced a sense of disappointment. Once we pushed on non-trivial use cases, the quality of the results from our LLM was revealed to be pretty terrible. We had well and truly landed in the trough of disillusionment.

Learning #2: actually, there is no magic. ✨

Upon reflection, where we were was about right.

Our technical implementation was barely more than a generative AI “hello world.” Our key stakeholders’ expectations were sky high and were laced with assumptions and euphoria from being blown away by ChatGPT. Realigning around realistic expectations was harder than you might think, because this was so foreign to all of us.

Our customer team, fierce advocates for our users, expected AI to get things right, every single time. Our engineering team pushed for a greater onus on the customer, and education on effective prompt writing. Our product team tried to understand the thresholds of viability across jobs to be done. As you can see, there was a very healthy, but real, tension that led to some interesting discussions.

At the end of the day, getting alignment was a very important step to making forward progress on the initiative. The shared understanding could then dovetail into (what I believe to be) a coherent customer go-to-market plan, ensuring we provide more value than confusion.

Learning #3: welcome to the era of probabilistic computing! 🎲

Traditional software generally either works or it doesn’t, barring some nefarious bugs or corner cases. Not so for an AI product.

Evaluating the performance of our new feature was its own challenge, and a big contributor to the difficulty of setting expectations.

The output of our new feature is not strictly deterministic for any particular input, and the permissible input space (natural language) is almost infinite. Moreover, the behaviour of LLMs tends to change over time. Thus, instead of a classical pass/fail condition, we arrived at an evaluation criterion of “the probability of providing a valuable result to the user.”
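In practice, a metric like that boils down to replaying a suite of real user questions several times each and measuring the share of trials judged valuable. Here’s a minimal sketch, where a crude substring check stands in for whatever “valuable” means for your feature; the names and threshold are assumptions for illustration.

    from dataclasses import dataclass

    @dataclass
    class EvalCase:
        question: str           # natural-language input captured from real usage
        expected_fragment: str  # something a valuable SQL result should contain

    def run_eval(cases: list[EvalCase], generate_sql, trials: int = 3) -> float:
        """Estimate P(valuable result) over a test suite.

        Because outputs are not deterministic, each case is run several
        times and every trial counts toward the estimate.
        """
        wins, total = 0, 0
        for case in cases:
            for _ in range(trials):
                sql = generate_sql(case.question)
                total += 1
                if case.expected_fragment.lower() in sql.lower():  # crude "valuable" check
                    wins += 1
        return wins / total

    # e.g. ship only if run_eval(suite, natural_language_to_sql) >= 0.9  (threshold illustrative)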

It’s difficult to share exactly where we drew the line in the end, but I believe our customers are very likely to experience very good results 😊.

Learning #4: this isn’t a solved problem, it’s more research than development. 🚧

From an engineering perspective we’d just scratched the surface of LLM technology. Behind this door is a cavernous land of data pipelines, textual embeddings, vector search, retrieval-augmented generation, prompt engineering, and on and on. We quickly realised that context is the key to great LLM results. But, in our case, the right context for any particular user request depends on the request itself. We explored many different prompting techniques relevant to our business domain. Our team explored vector search databases, new data pipelines, and APIs in order to shuffle the right contextual data to the right places.
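To make the jargon a little more concrete, the retrieval-augmented flavour of this looks roughly like: embed the user’s question, find the most relevant slices of schema documentation, and pass only those into the prompt. In the sketch below the in-memory “index” and document snippets are stand-ins for a real vector database, and the embedding model is just an example.

    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(text: str) -> np.ndarray:
        """Turn text into a vector; any embedding model works here."""
        result = client.embeddings.create(model="text-embedding-3-small", input=text)
        return np.array(result.data[0].embedding)

    # Pretend vector store: (document, embedding) pairs for schema/doc snippets.
    DOCS = [
        "visits table: one row per venue per day with visitor counts ...",
        "transactions table: ticketing and retail revenue by day ...",
    ]
    INDEX = [(doc, embed(doc)) for doc in DOCS]

    def retrieve_context(question: str, k: int = 2) -> str:
        """Return the k snippets whose embeddings are closest to the question."""
        q = embed(question)
        def similarity(pair):
            doc_vec = pair[1]
            return float(np.dot(q, doc_vec) / (np.linalg.norm(q) * np.linalg.norm(doc_vec)))
        best = sorted(INDEX, key=similarity, reverse=True)[:k]
        return "\n".join(doc for doc, _ in best)

    # The retrieved snippets then become part of the prompt sent to the LLM.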

Perhaps most surprisingly, none of this is “plug and play.” Every facet feels immature and unproven. Perversely, it can be an extremely uncomfortable place for a tech team, because there are no paved roads. There is a huge amount of learning required and the pace of change is breakneck. Since our first prototype there have been three updates to the LLM we’re using. We’ve loaded screeds of data into a server-based vector database, only for the vendor to launch a superior serverless variant days later. We’ve used, dropped, and reused many supporting libraries and abstractions in an attempt to figure out what works for us.

Observability has been a constant challenge. An early decision that has definitely stood the test of time was to capture every natural language request and its result. We used this data to better understand what our early users were trying to achieve, which formed the basis of our test suite.
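The capture itself needn’t be fancy; an append-only log with a handful of fields is enough to understand usage and replay behaviour later. Something along these lines, with field names purely illustrative:

    import json, time, uuid

    def log_query_assist(question: str, generated_sql: str, succeeded: bool,
                         path: str = "query_assist_log.jsonl") -> None:
        """Append one record per natural-language request so it can be replayed later."""
        record = {
            "id": str(uuid.uuid4()),
            "ts": time.time(),
            "question": question,
            "generated_sql": generated_sql,
            "succeeded": succeeded,  # did the generated SQL execute without error?
        }
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")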

Another key productivity win came from building internal tooling so that any of our staff could experiment with different LLM prompts in situ and see the difference in results. Our designer, very sympathetic to the written word, took this as a natural-language challenge. Without engineering involvement or code changes, they, and others, could tweak our prompts and found some great ways to improve the accuracy of our AI results.
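The essence of that kind of tooling is treating prompts as data rather than code, so anyone can edit a template and re-run the evaluation suite. A minimal illustration, with the file layout and names assumed for the example rather than taken from our internal tool:

    from pathlib import Path
    from string import Template

    PROMPT_DIR = Path("prompts")  # plain-text templates anyone can edit

    def load_prompt(name: str, **values: str) -> str:
        """Read a prompt template from disk and fill placeholders like $schema and $question."""
        template = Template((PROMPT_DIR / f"{name}.txt").read_text(encoding="utf-8"))
        return template.substitute(**values)

    # A designer edits prompts/sql_assist.txt, replays the captured questions,
    # and compares scores; no engineering involvement or deployment required.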

Learning #5: can we actually ship this? 🚢

Product work is never complete until it’s shipped in production. At Dexibit, we bias towards shipping early and often, but this was more fundamental than a typical product increment. Despite our product being in good order and expectations being aligned, we had to slow down to work through important business ramifications and determine whether we actually could ship this technology.

Firstly, there are the legal concerns. We had to ensure that we had appropriate leeway within our customer contracts to share input and data via LLM APIs, and appropriate confidentiality and security agreements on the vendor side, as per our SOC 2 compliance requirements.

Then we have social and safety concerns. Can we ensure our product is quite literally safe for work and well moderated? Issues like social justice and discrimination are of the utmost importance to us and our customers – have we done our best by them?

There are new pricing models to understand. LLM pricing is typically usage based, determined by the consumption of “tokens.” What is a token, you ask? It’s a chunk of a word, about four letters on average. The rate per token differs between what you provide as input to the LLM and what is returned as output. Of course, the input is a function of the user’s question, the context, and the prompt, which, in our case, also varies by request. All of this is to say that modelling our cost-to-serve hurts my brain.
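For the brave, a back-of-the-envelope version of that cost model looks something like this. The per-token rates below are placeholders, not real prices; check your vendor’s current pricing.

    # Illustrative rates, expressed per 1,000 tokens (not real prices).
    INPUT_RATE_PER_1K = 0.0005
    OUTPUT_RATE_PER_1K = 0.0015

    def request_cost(prompt_tokens: int, context_tokens: int, output_tokens: int) -> float:
        """Cost of one request: input (prompt + retrieved context) and output are priced separately."""
        input_tokens = prompt_tokens + context_tokens
        return (input_tokens / 1000) * INPUT_RATE_PER_1K + (output_tokens / 1000) * OUTPUT_RATE_PER_1K

    # e.g. a 400-token prompt, 1,200 tokens of retrieved context and a 150-token answer:
    # request_cost(400, 1200, 150) -> (1600/1000)*0.0005 + (150/1000)*0.0015 ≈ $0.00103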

And after all that, how do we even launch this: opt-in, opt-out, something else…? What would our customers think? In the end, we concluded that our implementation was valuable enough, and intentionally designed to be optional, such that we could activate it for everyone while allowing users to opt out if desired.

Learning #6: learn by doing! 🎭

I am so glad we’ve launched this new capability. Already, we can see clearly it’s changed the behaviour of our users by lowering the barrier to entry of a very useful existing feature. More profoundly, it’s a new tool in our product-building toolbox that we’ve developed significant intuition around.

It was quite the journey to get here, but I know we’ll look back on this moment and wonder “what was all the fuss about?” It has only intensified our appetite for new experiments in AI, across a wider gamut of customer and internal touch points.

I highly encourage anyone reading this to think about their own possible experiment – and make it happen!

My parting advice is to treat this technology as a means to an end. Yes, by all accounts, we may well be on the cusp of a technological revolution. But that doesn’t excuse us from understanding the problems we need to solve, and continuously questioning the value proposition of an AI solution along the way. Finally, don’t fall for the hype. This is a field in its infancy; the rapid pace of innovation is fantastical yet can be hugely overwhelming.

Good luck, and please share back your experience!


Ravi Chandra

Chief Product & Technology Officer

Ravi has delivered product across a diverse mix of industries with many different constraints. He’s embedded this dimensionality into a set of leadership model parameters to unearth valuable insights from our data. Ravi thrives on teamwork and loves delving into our digital excavations. When not reinforcement learning, he enjoys spending time with his three dino-mite daughters.
