Keeping it clean in data integrity
👤 Featuring Pierre Perez, Success Director & Angie Judge, CEO of Dexibit
When it comes to data, a little mess can make a big difference. In this episode of The Data Diaries, hosts Angie Judge and Pierre Perez from Dexibit dive into the essentials of keeping your data clean, consistent, and trustworthy. From visitor counts to revenue reports, they explore how data integrity issues creep in, why they matter, and what every attraction can do to stay ahead, with a few stories (and laughs) along the way.
Transcript (generated with AI)
If you want to go from gut feel to insight inspired, this is the Data Diaries with your hosts from Dexibit, Pierre Perez and Angie Judge, the best podcast for visitor attraction leaders passionate about data and AI. This episode is brought to you by Dexibit. We provide data analytics and AI software specifically for visitor attractions, so you can reduce total time to insight and total cost of ownership while democratizing data and improving your team's agility. Here comes the show!
So, I know we’re supposed to be talking about data integrity, but we have to talk about the Louvre first. We cannot not talk about the Louvre this week.
We can talk about the Louvre, absolutely. I think I've told you this week and I'll tell you again: simple things work best. Right? There was no hanging from the ceiling on a wire attached to a helicopter, with a dozen people watching red beams and lasers.
You're always gonna have lasers!
Yeah, there was just a cherry picker, a hand grinder, and that's it. I think the true masterstroke in that heist was the fact that they did it when the Louvre had just opened, which meant security were focused on getting everyone to safety instead of potentially intercepting the thieves. So yeah, a sad day for France, obviously.
$88 million or 88 million euros of sadness for France.
I've seen another figure. I've seen 150 somewhere. So I don't know the exact number, and I guess we may never know. But yeah, a crazy day for France, a crazy week for France, a crazy week for museums around the world. Just absolutely insane. And it's sad to think they might never get them back.
We had one actually just up the road from our New Zealand office in Parnell, where an art gallery was robbed. I think they did it in the night and just busted through the front windows and took some very special paintings, which they did end up tracking down and getting back years later. But I think that was a steal to order, where the running theory here [with the Louvre] is that these guys are gonna chop up these crown jewels, like one of them's got over a thousand diamonds on it or something, and sell 'em off piece by piece, which wouldn't surprise me.
And I think they'd get some pretty good money for it. And thanks to all your Bitcoins and Ethereums and stuff, nowadays there's less likelihood of being able to trace payments, right? Which is another topic, but you're right, we may never see these pieces again, which is sad, because it is a piece of history.
But hey, they played a game, they potentially won, it is what it is, and let's move on and learn from the mistakes, I guess. And talking about mistakes, that's a nice segue into data integrity. I guess those mistakes don't have the same repercussions, right? When you make a mistake and you have integrity checks in place.
Well, unless you're a particular gallery in London who miscounted their visitors for years and then found out publicly, but yes. Okay, I'm with you, potentially.
So yes, you're right. I mean, we all make mistakes. You and I, we make mistakes every day. It's about how quickly you realize that you've made a mistake, and how you can correct it if you can correct it, or how you can adapt if you can adapt. And then, especially relevant here, how do you learn from it so you don't do it again?
Right. I think one of the big things that comes up for me when we start working with a new customer is that measures like visitation, revenue, memberships, et cetera, normally come from multiple systems to begin with. So with visitation there are all these sorts of rules, and with revenue especially, around recognition, et cetera.
And there's usually some manual manipulation on a spreadsheet. And so that's the first big risk area: something goes wrong in those calculations, or that person leaves and we're not quite sure how they were doing things, or how people recognized things back in 2012 when it was somebody else.
And so one of the really big first wins is getting a lot of those rules written out, getting them agreed, visible, talked about and thought about, and then getting them automated so they're never subject to that sort of manual human error again. But that can be a really, really difficult process in itself if they're not written down, which 99% of the time they're not.
So we're essentially doing black-box reverse engineering: what are these rules? And often finding them out through this iterative testing of trying to get data integrity acceptance.
Yeah, that's so true. And I think we're in a unique position here where, after all these years, without sounding too cocky at all, we sort of know what best practice is, right? We sort of know what we should do and what we shouldn't do. But then some of these things may differ from team to team, right? Like an accounting team may count revenue from a ticket when the ticket is scheduled, or when it is redeemed. Ideally, anyway. Often not, though, right?
Ideally; often not, yeah, as we've found out. But the ticketing folks and the sales folks may be like, well, no, we're going to count revenue when the ticket is sold, because that's when our KPIs are being hit: we sold that many tickets for the exhibition, et cetera. So it depends who you're speaking to.
And it's potentially identifying that there are different rules of recognition based on who you're speaking to, and trying to have everyone agree on one set of rules that we're going to use to, A, check integrity, and B, continue using that data with the same rules in place going forward.
Yeah, and this is something that's really important whenever you talk about visitation with your team: talk about how you measure visitation, and just do it again and again and again. And one of the things we find is that teams often forget. Even when we do a workshop and talk through these are the revenue recognition rules, or these are the visitation recognition rules,
a couple of weeks later somebody will say in a meeting, wait, how are we measuring this? So just keep repeating that over and over, accept that that's what you're going to do, and don't feel bad if the team forgets. Obviously having them documented, and with Dexibit as a product, having them automated,
means we don't have to remember. But I think it is helpful for everyone who's working with data to know where it comes from and how it's being calculated, and they do need that reminder, and that's not abnormal.
Yeah, absolutely, you're right. And on the other side, if someone is thinking of changing something, wherever that change starts, is it being documented? If we're going to change something, how do we relay that information to the team that has been doing it a specific way for years, right? How are we going to manage that change, essentially?
Mm-hmm. It's good to rip the plaster off, actually. I mean, if you're going to make change, the time when you're starting to implement Dexibit and fleshing out some of these business rules and recognition rules, this is the time to make change. If you find that things are happening that are a little bit weird in terms of how things are being counted, and trust me, we've seen it all. Like we've seen galleries who are counting visitors three times because they've got multiple campus buildings, and they're counting each time a visitor goes in and then adding them all up.
And that's just the way things have been done for years, so we'll just do things that way and not question them. We've seen places that count people coming into cafes and stores that are open to the general public and aren't actually part of the attraction perimeter,
so the visitor could come and buy something from the shop and be counted without actually coming into the attraction. We've seen people count tickets that have been booked that are free, and then they've got massive attrition rates and they're counting everybody as a visitor, even though half of them don't show up.
I mean, it just goes on and on. But this is the time to rip off the plaster. If there are really uncomfortable truths like that, they come out; take them head on, take the moment to massage them out and make that correction, and then move on cleanly. 'Cause this is the whole point of having that point of truth.
I think one of my favorite points would be when we're applying a scaling factor. Footfall counts always get really juicy as to what the science is behind how people get to that factor. So, being aware of all these things, right? Like who decided on the scaling factor,
when was it done, and have we reviewed it recently? Because say we apply a scaling factor of, let's say, 90% for example. For those who don't use footfall cameras as a source of visitation: footfall counters and footfall cameras usually count on a beam break, essentially.
So you'll be counting each body coming into the museum or the gallery, and what people do is scale that count down, usually, to account for staff, people entering more than once, et cetera. And when you say it out loud, you can understand that it may not be an exact science; it depends on what time of day or what day of the week you ran the experiment, right?
And it may also become irrelevant over time. You may not schedule the same number of people next year, and therefore you have less staff, et cetera. So it's really important to review these logics as often as possible. I think, Angie, you and I spoke about pricing and how often you need to review pricing.
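[As an aside: the scaling itself is trivial arithmetic; the integrity risk lives entirely in the factor. A minimal Python sketch, with a hypothetical function name and made-up numbers:]

```python
# Apply a scaling factor to raw footfall counts to discount staff,
# re-entries, etc. The code never changes; the factor itself is what
# should be re-derived periodically against a manual count experiment.

def scale_footfall(raw_count: int, factor: float) -> int:
    """Scale a raw beam-break count down to estimated unique visitors."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("scaling factor should be in (0, 1]")
    return round(raw_count * factor)

# Example: 1,250 beam breaks with a 90% factor -> 1,125 visitors.
print(scale_footfall(1250, 0.90))  # 1125
```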
You know, instead of reviewing once every five years, redo it every year; it's all the same logic here, right? Mm-hmm. Agreed, some of these things you should review often. And speaking of integrity, making sure that the hardware equipment works as well, because it can fail, and it can fail very silently.
Like it'll just slowly start counting fewer and fewer people. And it can be very subtle things, like if the light situation changes, which might happen between summer and winter. Or maybe the case that I was mentioning in London: it was a light bulb failure in the atrium area where the counter was counting, and the hardware hadn't been recalibrated to the new level of light, and it was undercounting significantly. I can't remember the actual numbers, but something close to a million visitors a year or something. So it can fail very, very silently, where you're still seeing data, it's just less. So integrity
checking all the way back to the hardware is great as well. One of the things we often suggest is at daylight savings (if you don't live in Queensland like Pierre does, where they just don't believe in that or something), but for the rest of the world, when you put your clocks back, test your hardware counters. And that's maybe a good time, every six months, to revisit those recognition rules.
Just make sure that they're still relevant for today and that all those assumptions still apply.
I agree. And the way we usually do integrity testing on our side, well, the way I like to do it at least, is to always start broad and then go very granular, right? For example, let's take some revenue data, because that's what I've been working on this morning.
When we're building an integration, we ask the question: what is the number that you're seeing on your side, so what have you been reporting, and what are we seeing on our side, using things like an API or SFTP, whatever it is that we're doing? And we take a year's worth of data.
What is the data that we see? Is it acceptable, yes or no? And sometimes the answer is yes: when we look at a year, we're close enough, aiming for that 99% accuracy for most data. But you don't stop there, right? You also need to ask:
what if I look at a month? A week? Daily? Do I get that same accuracy? Because processes change over time, and you want to make sure that you're capturing that change of processes, because you may recognize revenue a specific way during one part of the year, but a different way during another time of the year.
And if you only look at a very large window of time, things sort of balance out, so it's less obvious that you're not counting things properly. But when you start to zoom in a little bit, and sometimes there's a little bit of luck involved too, to find out that you're doing something wrong, that's when you start to get the gotcha moments, right?
Yeah. And as you come down in granularity as well, so if you come down from, say, looking at a year, to looking at a month, to looking at a day, you can also dimension the data: different products and types and discount codes and channels and all of these sorts of things, and you can start to hone in.
That's really great for problem solving. So if you're seeing more or less than you're expecting, it starts to give you an idea of why. So that's the process that we go through. Every system that we integrate, particularly the ones, as Pierre mentioned, with a financial ramification, like ticketing, point of sale, membership, et cetera, that's the process that we go through afterwards.
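[The broad-to-granular checking described above comes down to a tolerance comparison at each window size. A hypothetical illustration in Python: the numbers are made up, `out_of_tolerance` is not a real Dexibit API, and the 1% tolerance mirrors the 99% accuracy target mentioned:]

```python
# Compare ingested totals against the customer's reported totals at
# decreasing granularity, flagging any window outside a tolerance.

def out_of_tolerance(ours: dict, theirs: dict, tolerance: float = 0.01) -> list:
    """Return the window labels where the two sides disagree by more
    than `tolerance`, relative to the customer's reported value."""
    flagged = []
    for window, reported in theirs.items():
        ingested = ours.get(window, 0)
        if reported and abs(ingested - reported) / reported > tolerance:
            flagged.append(window)
    return flagged

ours = {"2024": 99500, "2024-06": 7600, "2024-07": 9100}
theirs = {"2024": 100000, "2024-06": 8000, "2024-07": 9100}

# The annual total passes at 1% tolerance, but June alone is off by 5%,
# which is exactly the "things balance out over a year" effect:
print(out_of_tolerance(ours, theirs))  # ['2024-06']
```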
So we'll say, here's this integration, it's ready, now do a data integrity acceptance. We'll do one on our side against the numbers that you've provided, maybe a spreadsheet or a screenshot out of your system, and then we'll ask you to take a look and give us essentially a data integrity acceptance sign-off to say, yes, we agree that this number is correct, or it's good enough.
As Pierre mentioned, we can sometimes aim for that 99%, because if you're missing 12 tickets a year, chasing down those 12 tickets and understanding why is going to take a long time. Is that worth it for anyone? Probably not. If it's 1,200, hell yes, let's get into it. So really starting to look at what's good enough, for either the purpose for which the
data or the insights are being used for decision making, or the standard that it needs to meet if it's being used for any kind of audit function. Let me put a pin here on those few tickets. Not to challenge that, but to pull in a perspective.
I'll say yes, unless the small discrepancies are recent. And the reason I'm saying that, and it is a good one, is because having perfect integrity up to a point and then starting to notice a discrepancy recently means a change was made, and that discrepancy in the data may then accelerate over time because it did not capture that change.
So I think there's a little bit of a caveat here, right? Everything looks good, and then recently some small discrepancies start to creep in. Then you go and ask the question, to identify what is causing that recent discrepancy in the data.
And you mentioned probably the most common one. I've actually just had this this afternoon. Susan is away on vacation, and one of her customers came with a data integrity question. And the most common thing, which is probably the case in this one, is that it often takes users a while to get their heads around the concepts of the ticket lifecycle and how they play out in data. So when we talk about the ticket lifecycle, we've got tickets that are sold, tickets that are scheduled,
tickets that are redeemed, and then other things like refunded or cancelled or changed or whatever. The date a ticket is booked is the day on which it was originally sold, or otherwise reserved if it was free. The scheduled date is when the visitor is expected to show up, if they've chosen an advance, reserved time. And the date on which it's redeemed is when it's scanned or otherwise checked in, however you call that.
And if you think about it, there are actually two catchments that apply when you're looking at data. There's the catchment of the date range, say if you've got a visualization, the date range that you're applying: what status have you used for that date range?
And then there's the catchment of what you're visualizing: if you're using, say, a time series, the status that you're using for the display of that data. And you can quite quickly get the wrong data if you're using the wrong status for either of those things. And that's the case that we've had with this support request this afternoon.
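[A toy illustration of why the status matters: the same three hypothetical tickets give three different "June" counts depending on which lifecycle date you filter on:]

```python
from datetime import date

# Each ticket carries its own lifecycle dates: booked (sold/reserved),
# scheduled (expected visit), redeemed (actually scanned, or None).
tickets = [
    {"booked": date(2024, 5, 1), "scheduled": date(2024, 6, 1), "redeemed": date(2024, 6, 1)},
    {"booked": date(2024, 6, 1), "scheduled": date(2024, 6, 1), "redeemed": None},  # no-show
    {"booked": date(2024, 6, 1), "scheduled": date(2024, 7, 1), "redeemed": None},  # future visit
]

def count_by(status: str, start: date, end: date) -> int:
    """Count tickets whose chosen lifecycle date falls in [start, end]."""
    return sum(1 for t in tickets
               if t[status] is not None and start <= t[status] <= end)

june = (date(2024, 6, 1), date(2024, 6, 30))
print(count_by("booked", *june))     # 2 - sold in June
print(count_by("scheduled", *june))  # 2 - expected to visit in June
print(count_by("redeemed", *june))   # 1 - actually showed up in June
```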
And it's probably the number one cause when we get a data integrity question: it comes down to that ticket lifecycle. It's the first place, at least, that I know I go to look. Have you found the same, Pierre?
Yeah, 100%, I think, but it's very specific to tickets, right? The issue that we're talking about, you don't really see that anywhere else. Well, you just had an example this afternoon with Shopify, right? Yeah, true, but not really relating to the dates; it was more the status. Yeah.
For all of you that are listening: connecting to an API is very different to connecting to a user interface, right? With an API, you get every single piece of data that you have access to. It's not being cleaned as much, it's not being massaged as much, so you really need to understand what's happening. And in that specific integration, I realized that we had draft orders also coming through, which was why we had a difference with what our client sent us. It's all these little small details that can offset
your data integrity. The dates one is particular to tickets. So yeah, it's a very particular one, where someone sends you a number and they don't usually have extremely fine context like: I looked at these tickets and I used when they were booked instead of when they were scheduled.
Usually that's not what happens. Usually people come to us and say, I'm seeing this many tickets between these dates, and I need to go ask: what do you mean? Can you provide me more context? Are you saying that you've seen this many tickets being scheduled from that day to that day, or how many you sold?
Are you talking about all tickets? Are you talking about admission tickets? Basically, very different levels of detail that you need to go through to ensure that when you're doing an integrity check on your side, you have the right information. And there's a kind of equivalent in the footfall arena as well that we come across, and that's the rules that are being put around time. The footfall data doesn't have that dimension of a lifecycle to it, but time becomes exceptionally important.
We don't tend to put open hours around ticketing data, because a ticket is a ticket; you don't tend to schedule them at three o'clock in the morning. But with footfall data this counts a lot, because you might have a cleaning crew come through out of hours and you want to exclude them.
And so we do tend to apply open hours for most locations to their footfall counters, to make sure that they're only capturing pure operating-hours visitors. But that can introduce some discrepancies as well, right? Yeah, absolutely, you're totally right. I mean, when do you want to recognize people coming in and out?
Mm-hmm. I'm thinking of a specific organization that I work with right now. They're getting a number from their provider, and we still, to this date, two years on, don't know how that provider comes up with that specific number, because when we apply the rules that
we know are the right ones, we get a different number. But at least we know on our side that we're treating it correctly. We're counting visitors from this specific area from, let's say, 8:00 AM to 5:00 PM, and from this specific area here, which is a cafe that opens a little bit earlier; here in Queensland, cafes open at 6:00 AM.
That early, right? Yeah, absolutely. That's how we count people at the cafe, but then we know the cafe closes earlier than the museum, for example, so we stop counting earlier. So it's all these very little rules that we need to be aware of, like when do we stop counting people there, right?
From what time do we stop counting people? What about when we have a special event? What about if we have an opening night for a specific exhibition, how are we going to count those people coming in? So there are a lot of little tricks that we can put in place to account for that, essentially.
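[Those per-area rules amount to a filter over timestamped counts. A hedged sketch, with invented areas, hours and counts:]

```python
# Keep only footfall counts that fall inside each area's operating hours,
# so out-of-hours traffic (cleaning crews, deliveries) is excluded.

OPEN_HOURS = {           # hypothetical per-area rules (24h clock)
    "cafe":    (6, 16),  # opens early, closes before the museum
    "gallery": (8, 17),
}

def in_hours(area: str, hour: int) -> bool:
    start, end = OPEN_HOURS[area]
    return start <= hour < end

hourly_counts = [         # (area, hour of day, raw count)
    ("cafe", 6, 40), ("cafe", 17, 12),       # 17:00 is after cafe close
    ("gallery", 9, 300), ("gallery", 5, 3),  # 5:00 is the cleaning crew
]

total = sum(n for area, hour, n in hourly_counts if in_hours(area, hour))
print(total)  # 340
```

Each area gets its own window, which is exactly why the cafe and the museum can legitimately disagree with a provider applying one blanket rule.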
That might be something like an open hours exception. But if we're comparing back to a spreadsheet, and then it turns out the person running the cafe is like, oh yeah, we had a wedding over here and I just manually added in 1,200, and that's why our numbers are different... One of the things we always say when you're doing data integrity is to refer back to the system of source, not to somebody's spreadsheet, because you never know what
copy and paste errors they might have, or what manipulation they're applying to their numbers. I don't use that word in a bad way, but the manual changes that they're making to their numbers can throw things like this out. And also make sure that they're comparing back to a production system, because
more than once we've been caught out doing data integrity with a customer saying to us, hey, this data's not matching, and it turns out they were looking at a non-production environment, so a development, test or sandbox environment, rather than their actual production data.
So there are all sorts of things. And you were mentioning time zones, or timeframes. Sometimes if something shuts at three, it might be the difference of whether you're looking at 3:00 PM or 2:59, and that might be the equivalent of six or seven people, but that might add up quite a bit over the course of the year.
So maybe it's 3:01 actually, because you want to count the 3:00 PM entries. It's all of these sorts of things that can get really fiddly in the rules, but these are the things that we're looking for when we're going through any iterative assessment of data integrity. I mean, for me, one of my pet peeves is when
I see we're looking at a spreadsheet and this has been the source of truth so far. Yes. Inside, just part of you dies. But hey, you know what, this is part of the reason why people come to work with us, right? So that's fine. But the question is, are we willing and able to let go of that spreadsheet at some stage, to trust the data that we have?
Right. And in the example that I've been working on the past few weeks, this organization was reporting visitation and a ticket breakdown on a spreadsheet, and they were putting in numbers from the ticketing system, right? But they were pulling numbers from the ticketing system too early in the day.
They were closing at five or slightly after five, and they were pulling numbers at 4:30 or 4:45, before they went home. Which is absolutely fair: you do not want to stay overtime to push some numbers, right? But the fact is that all the data was not in the source system yet.
So when they were doing the end of day, they were actually missing some data, and it took us a long time to understand what they were doing. What sort of made us understand is that when they finally told us, we're using the ABC report to pull these numbers, we looked at those reports, the raw data they were pulling on their side, not via the API, and it was like, hang on a second.
These numbers are actually different from the ones we have in your spreadsheet. And this is where we have an excellent team, and I will name someone because he's outstanding: Jason, who works with us, is absolutely amazing at diagnosing things really quickly.
And he was like, well, hang on, that problem is accelerating towards the end of the day, so I think what's happening is that the data has not synced yet, it's not synchronized, and this is why we're seeing a difference. And sure enough, that was the reason. So this was an analyst at a visitor attraction who finished work at five o'clock.
And so at 4:30 they would run a report and say, this is the number for the day, and then go home. And the attraction hadn't actually closed, so they weren't capturing the full day's numbers, and they were putting that on a spreadsheet. And that's the spreadsheet that the customer used for data integrity to check the numbers.
But actually it was their process of pulling that data that was causing it. Wow, it's amazing how some of these things can evolve, and that can be going on for like ten years, right? And nobody notices. And it can be a quite subtle and unobvious place to look.
Yeah. And a rule that I have overall is that we usually ask people to run reports in the morning, even if it's very early hours. No one's life is going to change by understanding how you did today at 8:00 PM tonight, or 6:00 PM tonight, versus 6:00 AM tomorrow. So let all your systems sync nicely, let all the data come in, and then by all means do it very early tomorrow.
Because guess what? If you did really well, it's going to give you a massive boost, right? It's going to pump up your team: you did extremely well. If you didn't do too well, it's motivation to do even better today. But the fact is, you don't want to end your day on a high or a low; you're going to bed anyway, so you don't
really care, I think. So wait for tomorrow morning. There's no hurry, I don't think there's any hurry. Again, if someone is listening to this podcast and thinks, hey, look, you're wrong here for whatever reason, please reach out and we can have a talk. But so far I have not been in a situation where a report has to be interpreted in the evening.
It can always wait until morning. There's less stress that way too, if your numbers are down. Yeah, exactly, you don't have to worry about it until the next day. And it also avoids some of the time zone issues. So, time zones: sometimes when a system sends through data, it will send it through in a particular time zone.
So it might send it through in UTC, no matter where the attraction is in the world, and that's a transformation we need to apply. Now, if that's the case, it's really easy to spot; it'll come up straight away, because we'll be seeing visitors at 4:00 AM and finishing at midday and going, hey, that doesn't sound right.
So that's something that comes out pretty quickly. But where this can be interesting is sometimes a vendor will report, via the API, in a time zone without daylight saving, and they'll report it that way the whole year through. And that very subtle shift around midnight, for things like e-commerce, can change numbers, or it relates, as Pierre mentioned, to reporting.
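[For the UTC case, the transformation is a timezone conversion against a proper tz database, which also handles the daylight saving shift just mentioned. A sketch using Python's standard zoneinfo module; the Auckland zone is just an example:]

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, backed by the IANA tz database

def to_local(utc_ts: datetime, tz_name: str) -> datetime:
    """Convert a vendor's UTC timestamp to the attraction's local time,
    with daylight saving handled by the tz database, not a fixed offset."""
    return utc_ts.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(tz_name))

# A visit the vendor reports at 02:30 UTC is actually mid-afternoon locally:
utc_visit = datetime(2024, 1, 15, 2, 30)
print(to_local(utc_visit, "Pacific/Auckland"))  # 2024-01-15 15:30:00+13:00
```

Using a named zone rather than a fixed +12 offset is exactly what avoids the subtle midnight drift between summer and winter.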
So keeping an eye out for those sorts of things can make a slight difference as well when it comes to time of day. There are some other weird things that can go wrong, but one of the things that's shocked me the most, on more than one occasion, and I can think of at least two off the top of my head: we have found ticketing vendors, mainstream, widely used ticketing vendors, whose reporting in their user interface doesn't match the data that they put out in their API. And it's been either us or, in some cases, our customers who have found that and reported it to the vendor. And there's been a lot of back and forth with the vendor about proving that point, and in some cases the vendors never truly accepted that that's the case.
But I think this is a really interesting point that you make, Pierre, around what do you trust, and at what point do you let go of the thing that you trusted and accept something new as the source of truth, when all the evidence points that way.
That's true. And we're talking about these systems, by the way, and at the end of the day these systems are developed by people, right? And people make mistakes, developers make mistakes. Again, we all make mistakes; it's about how quickly you pick up on that. The API being different from the user interface is a good one. It does happen, and as you say, it's not only with the small guys, it can be the big guys as well.
Yeah, with the small guys it's very likely, but with the big ones it always surprises me. And sometimes it can be very subtle, very edge-case type situations which, to be fair, aren't going to be coming up for them all the time. And sometimes it's how a system has been custom implemented or configured for a particular customer, maybe outside of their norm.
But yeah, always interesting when that happens. Another one that comes up quite often is how you deal with refunds and adjustments. And this is more of an operational thing than anything else, but think about it like this: lots of APIs work in slightly different ways in terms of how they get data to us, but if they're pushing us data as it happens, and then somebody goes into the system and manually adjusts some tickets from a few months ago, we might not match. And that's why, when it's 12 tickets in a year, as Pierre pointed out a while back, maybe who cares? But if we are chasing those things down, or if they are adding up and becoming quite substantial, refunds and adjustments is often one of the areas we will look to, to see how the system is capturing those things.
How are we then capturing those things, particularly if they're historic, and how far back do we care? And there are some tricks that we can pull to net this out. We can do reconciliation runs at the end of each month, where we go through and replace the data from that month. We can maybe do that once a year if we're trying to chase down audit-level integrity.
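As a minimal sketch of what that kind of monthly reconciliation run could look like (the function name, data shapes, and tolerance here are hypothetical, not Dexibit's actual implementation):

```python
# Hypothetical monthly reconciliation: compare the ticket totals the source
# system reports now against what was loaded earlier, and flag any month whose
# totals have drifted (e.g. because a refund was applied retroactively).
def reconcile_month(source_totals: dict, loaded_totals: dict, tolerance: int = 0):
    """Return (month, source_count, loaded_count) for months that no longer match."""
    stale = []
    for month, source_count in source_totals.items():
        loaded_count = loaded_totals.get(month, 0)
        if abs(source_count - loaded_count) > tolerance:
            stale.append((month, source_count, loaded_count))
    return stale

# Example: a March refund processed months later shifts the March total.
source = {"2024-03": 11988, "2024-04": 13502}
loaded = {"2024-03": 12000, "2024-04": 13502}
print(reconcile_month(source, loaded))  # March needs a re-pull; April is fine
```

Any month the check flags would then have its data replaced from the source, which is the "replace the data from that month" idea in a nutshell.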
But that's maybe something to keep in the back of your mind when you're thinking about data integrity: what am I doing with adjustments, and what weird processes are there? Because weird things do happen. You'll find that one user has done this particular thing to these tickets and changed the name of them retrospectively, after they were issued, or something like that.
And it's all the sorts of things that you don't expect, but us human beings do these things, and they can trip things up down the line. Yeah, I think your point is: communicate change, right? If you make a change, communicate that to your team, communicate that to someone like us, your vendor, et cetera, because small changes can make a big impact. And we've seen it on the vendor side as well, when some vendors make a change and don't let us know, and then all of a sudden we're getting messages like, hey, look, the data looks wrong. And we go back and, oh yeah, the way they're pushing us the data changed from what they were doing a week ago, with no communication, and that's why the data is not looking the same. So yeah, you're right, communicating that change is extremely important, to your team and to the rest of your stakeholders. And then probably one of the other org change aspects is managing expectations around pragmatism.
And I want to give a really interesting example of this that can come up quite frequently. We do some weather integrations, forecast and actual, and we do it on all sorts of attributes like the humidity or the wind speed or the precipitation. And one of the attributes is the type of weather.
And I didn't realize until we got into analyzing weather just how political this was, but asking someone what type of day they feel it is on any given day actually comes down to a lot of human judgment, and everybody will give you a different answer. So, for example, and some people might think I'm getting weird by talking about this, but let's say it was cloudy today.
For the most part, you woke up and it was cloudy, and it rained from 10 to 11, and then the sun came out briefly between three and four. What kind of day would you call that? Is it a cloudy day? Is it a rainy day? Is it a sunny day? Especially if you spent time outdoors during that hour when it was sunny, your memory of that day might be really different.
And so we have to try and assign this thing of what kind of day we call this day, based on these various conditions. And do we care about the operating hours? Like, if the attraction is open between 10 and six, is it more about what the weather was doing then, or is it more important what the weather was doing between eight and 11, when people were planning their visit?
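One simple way to answer that question, sketched here purely as an illustration (the function and data shapes are hypothetical), is to pick the condition that dominated during the attraction's operating hours rather than across the whole 24 hours:

```python
# Hypothetical sketch: choose a single "weather type" for the day by counting
# which hourly condition occurred most often within the opening hours window.
def dominant_weather(hourly: dict, open_hour: int, close_hour: int) -> str:
    counts = {}
    for hour, condition in hourly.items():
        if open_hour <= hour < close_hour:
            counts[condition] = counts.get(condition, 0) + 1
    return max(counts, key=counts.get)

# Cloudy day with rain 10-11 and sun 15-16: within 10:00-18:00 opening hours,
# "cloudy" still dominates, so that's what the day gets labeled.
hourly = {h: "cloudy" for h in range(6, 22)}
hourly[10] = "rain"
hourly[15] = "sunny"
print(dominant_weather(hourly, 10, 18))  # cloudy
```

Whether you weight by opening hours, planning hours, or something else is exactly the judgment call being discussed; the point is that any single label is a choice, not a fact.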
And so we enter into this really "how deep do we want to think about this topic" kind of thing in terms of weather type. The most important thing is that the attributes like the precipitation level are correct, but we often get quite hung up on this thing of weather type. And the same can happen a little bit in terms of voice of the visitor.
Like, is this a joyful comment or a surprising comment, or a kind of mixture of both? Sometimes we can get very wound up in some of the details of what these labels are, and in the end it doesn't matter too much when we are dealing with volume, certainly in weather and also in voice of the visitor. We can fall back on other attributes as well. Yes, and that's such a good point, by the way. It feels like there's a degree of interpretation between quantitative and qualitative data, right? And when you look at integrity between the two, it's a totally different ball game, I feel like.
With quantitative data, there are rules in place, rules to follow, et cetera. When you talk about qualitative, it's a whole different can of worms that you're opening. There's so much room for interpretation. Is this a risk or not? Is this a complaint or not? I mean, I'll always trust the AI over humans these days, because when we test AI results against human results, the AI is more consistent about how it labels an emotion or how it labels a topic. Whereas with a human being, you ask three people, you get three different answers. But yeah, I guess it comes down again to that pragmatism: to what degree does it matter when we're trying to do a bit of qualitative analysis?
Mm, that's so true, so true. When you were describing the weather, by the way, I think that's an Auckland type of weather, an Auckland problem: four different types of weather in one day. Blue skies and hot, or torrential rain, you just have two choices. Yeah, and they don't happen at the same time.
No, they say in Auckland it's four seasons in one day, because you need to take your sweater, your umbrella, the sun hat. And that's true. Which has always been a tricky part for the attractions, right? Especially the ones that are located outdoors. Yeah. So we have two oceans, this is weird, but in the city of Auckland, not many people who live outside New Zealand know this, but you can walk from one side of Auckland to the other in about two hours.
And on one side you'll get the Tasman Sea, and on the other side you get the Pacific Ocean. And this is what causes ridiculous weather. It can be very different weather in different parts of the city. And again, what do you call that day? Was it a rainy day or a sunny day, if it was different on the west coast of the city versus the east? Who knows?
That's a topic for another day. What about Google Analytics, Pierre? Um, yeah, I mean, we were talking about being attached to data, being attached to a certain reporting type. There's some data that you should be attached to, and some that maybe you shouldn't, because it may not be an exact representation of the insights and things that are happening in your business. And Google Analytics is one of them. I don't know if we can name and shame. We're not naming and shaming, we're just saying that this is just reality, I think, at this point. Yeah, it is potentially not an exact representation of what's happening on your website.
There are various different reasons for that. There's, you know, the cooking, sorry, no, the cookies, I'm getting all the words right today, you can probably tell. People accepting cookies on your website: can you track them? Yes, no. And the data is a little bit all over the place, right?
So, you know, there's a number that you're after and there's a number that you're reporting, but that number that you're reporting is, from the get-go, not an exact science. A little bit like the scaling factor on the footfall that we were talking about, right? That's not an exact science.
Right now you've been looking at that number, and for you it is exact, but the way that Google gets to that number is not exact. It should be taken with a pinch of salt. Yeah, if Google have reported that 7,321 visitors were on your website on a given day, or unique users, I should say, it may not be correct, because they're relying on underlying technology for tracking unique users and page impressions and things like that, which doesn't always work, particularly in an era where people are using various different browsers and different privacy settings.
Different operating systems and things like that. So there are all sorts of things that go into that number that can disturb it. And then how you pull data out of Google's API, and how you dimension it as you're pulling it, can change that number slightly as well. So our general advice when it comes to Google is: if it's a few points off, worry about something else.
What counts is the trends and the patterns. Are you getting fewer, are you getting more? Is it more when you do this or that? How do various campaigns' conversion rates compare? How do pages compare in their performance? Those are the things that matter. What doesn't matter is whether it was 739 or 734, and that number is probably roughly between 725 and 750 if we're being honest.
So I think this is one of the vendors where we would say: grain of salt on all the numbers to begin with. Don't worry too much about the exactness of the integrity; worry more about what it is that you want to do with that number, right? What it is that you want to do next.
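That "few points off" advice can be made concrete with a simple relative-tolerance check, sketched here as an illustration only (the function name and the 2% threshold are hypothetical choices, not a Dexibit rule):

```python
# Hypothetical sketch: treat two analytics figures as equivalent for insight
# purposes when they differ by less than a chosen relative tolerance, and only
# chase down the larger gaps.
def within_tolerance(reported: float, expected: float, pct: float = 2.0) -> bool:
    """True if |reported - expected| is within pct percent of expected."""
    if expected == 0:
        return reported == 0
    return abs(reported - expected) / expected * 100 <= pct

print(within_tolerance(734, 739))  # under 1% off: same trend, same insight
print(within_tolerance(600, 739))  # nearly 19% off: worth investigating
```

Picking the threshold up front, per data set, is the same idea as agreeing a tolerance level before integrity testing starts.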
Yeah, and can you do what you want with that number, essentially? Can you identify trends, insights, et cetera? The reality is that if you have 103,000 people visiting your website and you try to draw an insight from that, and your number is actually 102,000 people, your insight is going to be the exact same.
You're going to be looking at the same trend, you're going to be looking at the same insight. So think more about what it is that you want to do with that data, instead of "I really want to get that data as close as possible." Unless we're talking about visitation, revenue, et cetera, membership: how many members do you have, then?
Yes. Right, but that's what we call core data, the executive's core data, which is usually these three, I mean four: visitation, tickets, revenue, membership. That's where you need to be as close, as accurate, as you can. So, have we missed anything off this list?
There are probably a thousand other things that can go wrong, but we've got some really good ones here. Yeah, I think in the past few years there have been potentially more things going on, but so far, that's almost all I can think of. I think there's another juicy one, but it's just escaping my mind right now.
A story for another day. But there are a couple of tips that I would have, three big things, when you're going into data integrity acceptance. The first one that I would suggest is to have the expectation that this is going to be an iterative process, because chances are, when we've gone into a data automation workshop, we haven't got all of the rules out of everyone on every system to understand what's going on.
Chances are there are things that are like, "oh yes, and by the way, we didn't tell you this." So go into it expecting that this is going to be iterative, particularly on things like ticketing, where it is complex data. That way it's not an "oh my God, this is wrong" kind of initial reaction when we're looking at data to do an integrity test.
It's more like: this is actually a discovery activity to work out what our recognition rules, or whatever we're talking about, are, and we're expecting this to be iterative. And I think setting that expectation up front helps teams a lot. The second one is: it's really, really helpful if we know in advance what the tolerance level is that we're aiming for.
This actually applies to forecasting as well. This can be a very different number when you think about it beforehand versus when you're thinking about it after. So know about it in advance, to say it's 99%, or actually we're only comfortable with it being a hundred percent, or whatever that number is. If you have something in mind, do bring that up before we get into things.
Because then we've got something to aim for if we're going to allow a tolerance, or need a tolerance, on a particular data set. And with that as well, it's ideal if you can name somebody who's responsible, because there is some work to be done in data integrity testing, and knowing who that person is, the one accountable, that we can go back to, is a really useful thing.
Because otherwise it becomes this job that falls on the floor between everybody standing around looking at it. Having a point person, usually your system admin or ticketing manager or someone like that, whose job it is to do this data integrity exercise for each system, is useful. And then my third tip would be:
When we start on an integration, do provide a screenshot of the source system showing the data that you're expecting. Not the spreadsheet that somebody's copied and pasted that data into, but a screenshot of the source system showing, for these dates and times, these are the numbers that we're expecting for these things.
That proves it's the production environment, and surfaces anything else that we need to know. Like, sometimes even the time zone of the person who is looking at the system versus the time zone that the attraction is in can be different. So anything like that is going to help.
Having those numbers ahead of time is really useful for us to get the context of what we're looking for, with fewer iterations to get there. Any tips to add, Pierre? No, you're right, you know, it's a game of patience, and then it's a game of resilience, and again, being pragmatic and keeping revisiting those rules.
Keep checking that. Oh yeah, that's a good point: it's not a one-and-done. It's about putting a process in place to make sure the integrity keeps being at a level that is acceptable. It's not a one-stop shop, you know. So don't feel bad if this is happening, and
you're getting somewhere at one point and you are happy, and then six months down the line it's not there. That is normal. Things change, as we said. And being on top of that, and having processes in place, will help you and us keep delivering amazing results and insights. As you talked about that, a story came to mind of a place in New York where they counted their visitation off their tickets, and if the tickets were called general admission as a product, they went towards visitation, and if they weren't, they were considered other activities, exhibitions, or whatever. And along came a new system administrator who changed the naming convention to call it "entry" or something like this, whatever the case was. And sure enough, it broke. I'm making it sound very dramatic; it was actually a lot more subtle, and therefore a lot harder to discover at first glance.
And these sorts of things do happen from time to time. So making sure that everybody's always thinking about them, always has them front of mind, re-communicating it over and over, and of course having those rules documented in Dexibit and automated off that documentation, goes a long way to making sure that things like that don't happen.
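The New York story above is essentially a recognition rule breaking silently. A documented, automated version of such a rule might loudly reject names it doesn't know instead; this is a purely illustrative sketch, with hypothetical rule names, not how any particular system implements it:

```python
# Hypothetical recognition rule: map ticket product names to visitation
# categories, and raise on any name the rule doesn't recognize (e.g. a new
# admin renaming "General Admission" to "Entry"), rather than silently
# dropping those tickets from the visitation count.
RECOGNITION_RULES = {
    "general admission": "visitation",
    "exhibition ticket": "other_activity",
}

def categorize(product_name: str) -> str:
    category = RECOGNITION_RULES.get(product_name.strip().lower())
    if category is None:
        raise ValueError(f"Unrecognized product: {product_name!r}, update the rules")
    return category

print(categorize("General Admission"))  # visitation
# categorize("Entry") would raise, surfacing the rename immediately
```

Failing loudly turns a subtle, months-later discovery into a same-day alert.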
Yeah, and that's such a good one. And you also remind me to make another comment: you do not need the year on your general admission tickets. Oh, I know who you're talking about now. "General Admission 2024," "General Admission 2025." Yeah, you do not need that for any purpose. So "General Admission" is fine.
We managed to get through this episode without mentioning a single attraction’s name against any of these examples. So, uh, hats off to us. Love that. Your secrets will stay with us. Well you have a great weekend, Pierre. I’m off to play with ChatGPT’s Atlas.
If your goal is to get more visitors through the door, engaging and spending more, leaving happy and loyally returning – check out Dexibit’s data analytics and AI software at dexibit.com. We work with visitor attractions, cultural and commercial, integrating with over a hundred industry source systems across visitor experience and venue operations, providing dashboards, reports, insights, forecasts, data management and a unique data concierge.
Until next time, this is Dexibit!
Ready for more?
Listen to all our other podcasts here: