Why do neither rv_data_raw_ad_request nor rv_data_raw_ad_impression contain records?

raffael · March 3, 2016

Hello,

I set up Revive and have already managed to set up and deliver a text banner. The impressions of this banner are counted correctly, e.g. under Statistics / Global History.

So, as far as I can tell, everything works just fine.

Now I am interested in more fine-grained statistics at the impression/request level. In particular, I would like to find out the HTTP Referer information that was sent along with each impression/request.

Taking a look at the DB, I found two very promising tables:

rv_data_raw_ad_request
rv_data_raw_ad_impression

Both tables contain a field named "referer".

But for unknown reasons both tables remain empty.

Do I have to switch on impression/request-level logging somewhere?

Kind Regards

Raffael

Erik Geurts · March 3, 2016

Those tables are left over from about a decade ago, but the software doesn't actually use them anymore. They will never contain data.

If I remember correctly, you asked me this via the contact form on my site, and that's the answer I already sent you directly last week.

raffael · March 3, 2016

> Those tables are left over from about a decade ago, but the software doesn't actually use them anymore. They will never contain data.

Thanks - that explains it.

> If I remember correctly, you asked me this via the contact form on my site, and that's the answer I already sent you directly last week.

You do not remember correctly.

Kind Regards

Erik Geurts · March 4, 2016

Well, it is still surprising that I got the exact same question (with almost identical wording) in my mail the other day.

raffael · March 4, 2016

> Well, it is still surprising that I got the exact same question (with almost identical wording) in my mail the other day.

I promise it wasn't me. Still, it suggests that I am not alone in wanting such a feature.

I hacked together a solution by adding a function to /www/delivery/asyncspc.php which stores $_SERVER in a file. Of course, this solution is flawed from the start. Is there a plugin or a clean solution that you are aware of?
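For illustration, here is a minimal sketch of the kind of per-request raw logging described above, written in Python rather than PHP. It assumes the request variables arrive as a dict keyed like PHP's $_SERVER (HTTP_REFERER, HTTP_USER_AGENT, REMOTE_ADDR, ...); the function name, the whitelist and the JSON-lines file format are hypothetical and not part of Revive Adserver.

```python
import json
import time
from typing import Mapping, Optional, TextIO

# Hypothetical whitelist of request variables, named as in PHP's $_SERVER
# (a WSGI environ dict uses the same HTTP_* / REMOTE_ADDR keys).
CAPTURED_KEYS = ("HTTP_REFERER", "HTTP_USER_AGENT", "REMOTE_ADDR", "QUERY_STRING")


def log_raw_request(environ: Mapping[str, str], out: TextIO,
                    now: Optional[float] = None) -> dict:
    """Append one JSON line describing a single ad request (hypothetical helper).

    Only the whitelisted keys are kept, so unrelated server variables
    (paths, credentials, cookies) are never written to the log.
    """
    record = {"ts": time.time() if now is None else now}
    for key in CAPTURED_KEYS:
        # .get() yields None when the browser did not send that header.
        record[key.lower()] = environ.get(key)
    out.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    import sys
    log_raw_request({"HTTP_REFERER": "https://example.com/article",
                     "HTTP_USER_AGENT": "Mozilla/5.0"}, sys.stdout)
```

Every request adds one line, so the file grows linearly with traffic; that growth pattern is what makes raw per-request logging hard to scale.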

Erik Geurts · March 4, 2016

The reason this data is no longer collected and stored is that it doesn't scale well. In OpenX Source v2.6 (one of the predecessors of Revive Adserver) this was scrapped and replaced with a method called "bucket logging".
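To make the contrast concrete, here is a minimal sketch of the idea behind bucket logging, in Python: instead of storing one row per request, each delivery event increments a counter keyed by a time interval plus the ad/zone pair, so storage grows with the number of distinct keys rather than with traffic. The one-hour interval, the key layout and the class name are assumptions for illustration; they do not reproduce the actual OpenX/Revive bucket tables.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterator, Tuple


class ImpressionBuckets:
    """Aggregate impressions into (interval_start, ad_id, zone_id) buckets (hypothetical)."""

    def __init__(self, interval_seconds: int = 3600) -> None:
        self.interval_seconds = interval_seconds
        self.counts: Counter = Counter()

    def interval_start(self, ts: float) -> int:
        # Floor the Unix timestamp to the start of its interval.
        return int(ts) - int(ts) % self.interval_seconds

    def log(self, ts: float, ad_id: int, zone_id: int) -> None:
        # One counter increment per impression; no per-request row is kept.
        self.counts[(self.interval_start(ts), ad_id, zone_id)] += 1

    def rows(self) -> Iterator[Tuple[str, int, int, int]]:
        # One row per bucket (UTC interval start, ad, zone, count), ready to be
        # added to a summary table by a periodic maintenance job.
        for (start, ad_id, zone_id), n in sorted(self.counts.items()):
            stamp = datetime.fromtimestamp(start, tz=timezone.utc).isoformat()
            yield stamp, ad_id, zone_id, n
```

A per-request attribute such as the referer cannot survive this scheme unless it is added to the bucket key, which multiplies the number of buckets; that trade-off is the core of this thread.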

Matt Glover · March 12, 2016

Erik,

I disagree! ;) It scales well, it just doesn't scale easily. I really need to be able to cut statistics across verticals and groups of verticals for different purposes, and for that reason I need to be able to work with the raw data in a data mining environment. I'm trying to get my head around the codebase now to work out whether we can write a plugin to add this feature back in or whether we need to modify the core.

The current "whiteboard" plan is to use a Redis array to accumulate the data, then have independent workers (not on the Revive front ends) serialise it into one or more secondary MySQL instances. Is there a neat way to hook into the server and click events from a plugin to capture the referrer info?
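A minimal sketch of the "whiteboard" plan above, in Python using only the standard library: a collections.deque stands in for the Redis list that the front ends would push onto, and an in-memory sqlite3 database stands in for the secondary MySQL instance. All names (open_store, enqueue_event, drain_batch, the raw_event table) are hypothetical; a real deployment would use a Redis client and a MySQL driver instead.

```python
import json
import sqlite3
from collections import deque
from typing import Deque, Optional


def open_store() -> sqlite3.Connection:
    # In-memory SQLite as a stand-in for the secondary MySQL instance.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_event (kind TEXT, ad_id INTEGER, referer TEXT)")
    return conn


def enqueue_event(queue: Deque[str], kind: str, ad_id: int,
                  referer: Optional[str]) -> None:
    # Front-end side: a cheap append, no database write on the delivery path.
    queue.append(json.dumps({"kind": kind, "ad_id": ad_id, "referer": referer}))


def drain_batch(queue: Deque[str], conn: sqlite3.Connection,
                batch_size: int = 500) -> int:
    # Worker side: pop up to batch_size events (oldest first) and insert
    # them in a single transaction; returns how many rows were written.
    batch = []
    while queue and len(batch) < batch_size:
        event = json.loads(queue.popleft())
        batch.append((event["kind"], event["ad_id"], event["referer"]))
    if batch:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO raw_event (kind, ad_id, referer) VALUES (?, ?, ?)",
                batch)
    return len(batch)
```

With real Redis, the front end would push onto one end of a list and the worker would pop from the other; a production version would also need to handle a worker that crashes after popping events but before committing them.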

andrewatfornax · March 12, 2016

Hi @Matt Glover,

As an OpenX employee at the time of the move to the new aggregated stats tables in OpenX Source 2.8, I suppose we could say, more precisely, that non-aggregated data collection doesn't scale well on a MySQL-based relational database, and that was the key concern for the business at the time!

While other relational databases may offer scaling through sharding, OpenX at the time wasn't about to consider moving its systems away from MySQL, and alternatives like Cassandra, Mongo or Redis weren't considered mature enough, so the decision was made to go with aggregated logging.

I agree that it would now be very possible to use an engine other than MySQL to allow scalable, high-performance logging that includes the ability to report on things like the referer, geotargeting information, etc.

However, one of the long-standing advantages of Revive Adserver has been that you need very little technical knowledge or access to be able to install it. If MySQL or PostgreSQL is installed, and you have FTP access to a server with PHP, then that's about it. The more we add to Revive Adserver - especially in terms of adding another logging engine - the more we have to maintain. If we were going to make a change like this, I would want to make the new logging engine the default, and drop support for logging to MySQL/PostgreSQL, otherwise maintenance of logging will become a real burden.

But, how many existing Revive Adserver users would this then mean can no longer afford to run the server, because they are on really cheap hosts and can't afford a "proper" host that has root access, and wouldn't be able to manage the server anyway if they did have access?

I'm not saying "no way" - updating the logging engine would be a really fun project! But is it the right direction for the product? Do we keep things simple, and accept that it won't suit everyone - or do we make Revive Adserver more fully featured, and accept that some lower end users will end up being left behind?

Something to think about, and perhaps we'll get a survey out to the community in the near future about what direction the project needs to head in...

Erik Geurts · March 12, 2016

> Erik,
> I disagree! ;) It scales well, it just doesn't scale easily. I really need to be able to cut statistics across verticals and groups of verticals for different purposes, and for that reason I need to be able to work with the raw data in a data mining environment. I'm trying to get my head around the codebase now to work out whether we can write a plugin to add this feature back in or whether we need to modify the core.
> The current "whiteboard" plan is to use a Redis array to accumulate the data, then have independent workers (not on the Revive front ends) serialise it into one or more secondary MySQL instances. Is there a neat way to hook into the server and click events from a plugin to capture the referrer info?

My company uses a form of Redis logging for ad delivery and collection of raw statistics, which performs like a dream, but it still relies on a form of bucket logging. We use it to serve billions and billions of ad impressions a month, and still at extremely reasonable cost levels. Most of the seriously scalable ad servers I have used so far don't offer the level of logging you are thinking about, because it simply doesn't scale for ordinary use cases.

Matt Glover · March 12, 2016

> But, how many existing Revive Adserver users would this then mean can no longer afford to run the server, because they are on really cheap hosts and can't afford a "proper" host that has root access, and wouldn't be able to manage the server anyway if they did have access?
> I'm not saying "no way" - updating the logging engine would be a really fun project! But is it the right direction for the product? Do we keep things simple, and accept that it won't suit everyone - or do we make Revive Adserver more fully featured, and accept that some lower end users will end up being left behind?

Now that makes a whole lot of sense! I love the fact that Revive is accessible for everyone and personally would hate to see that lost.

Thank you for the informative response by the way.

We're serving about 2M impressions a week through our instance now, and it's growing, and so is my need for reporting on browser types, geolocation etc. across multiple, dynamic groupings of advertisers/campaigns/zones.

We've forked the revive repo and started getting our heads around the codebase. What would you recommend is the simplest way to be able to get access to the client browser info via a plugin if possible? Is there an event subscription model we can hook into?

> My company uses a form of Redis logging for ad delivery and collection of raw statistics, which performs like a dream, but it still relies on a form of bucket logging. We use it to serve billions and billions of ad impressions a month, and still at extremely reasonable cost levels. Most of the seriously scalable ad servers I have used so far don't offer the level of logging you are thinking about, because it simply doesn't scale for ordinary use cases.

Understood. At that scale I can see that some kind of incremental bucket system is needed for browser info and location counts. For now we are still small enough (2M/week) to take a less "normalised" approach. It's a matter of needing near-real-time info to gauge reader engagement across 40 print and digital assets, so that the marketing and product teams can scale revenue and invest in the right places.

andrewatfornax · March 14, 2016

> We've forked the revive repo and started getting our heads around the codebase. What would you recommend is the simplest way to be able to get access to the client browser info via a plugin if possible? Is there an event subscription model we can hook into?

There is an event model, but naturally, it's wildly undocumented - best way to learn how things work would be to step through the code with an IDE debugger.

Alternatively, take a look at the older OpenX Source 2.6 code if you can find it, and see how it used to do the logging!

Somewhat unhelpful replies, I know, but I am trying to focus on the basic documentation for now, rather than developer documentation.

Matt Glover · March 15, 2016

> Somewhat unhelpful replies, I know, but I am trying to focus on the basic documentation for now, rather than developer documentation.

Not at all - thanks for the tips and direction!
