Jump to content

Traffic Overload On Database Server


Recommended Posts

We are hosting revive on Amazon Web Services. There are three web servers behind a load balancer all using the same RDS MySQL database.

According to the database metrics, the server usually averages less than 20 connections at any instant. Then all of a sudden the number of DB connections are maxed out. Running SHOW FULL PROCESSLIST reveals that they are all running similar INSERT/UPDATE statements.

INSERT INTO rv_data_bkt_m
            (interval_start, creative_id, zone_id, count)
            VALUES ('2014-04-04 23:00:00', '90', '14', '1')
     ON DUPLICATE KEY UPDATE count = count + 1

The server instance has been increased several times. It currently maxes out around 3,000 DB connections. It seems ridiculous to keep scaling up at this point.

 

Is there an easy way around this?

The only option I have right now is to hack into the core code. I'm thinking about using memcache or flat log files and updating rv_data_bkt_m in a batch mode.

 

Any other ideas?

Link to comment
Share on other sites

How are you 3 web servers connecting to the MySQL database?

1. Do you have remote MySQL server where the 3 web servers instantly send the data over to the MySQL database?

or

2. Do you have MySQL replication setup where data first goes to the MySQL database on each web server and then gets send over to the remote MySQL server?

Also what storage are you using on your web servers and MySQL? SATA or SSD?

Link to comment
Share on other sites

Well, it is fairly simple. That's a symptom that the the database database can't keep up with the current Revive traffic. You normally should look into tweaking it or upgrading it, especially with a faster disk subsystem, but I don't know what are the options for RDS. I'm not a huge fan of the "cloud" as in my experience performance has always been pretty bad when it comes to database systems, but maybe you can upgrade to have more guaranteed IOPS?

Link to comment
Share on other sites

Thanks. We have upgraded the database to an instance with the most memory to get more max_connections.

We serve about 10,000,000 impressions per day. About once per day a log jam forms and all the queries are the above query. Would the same problem happen if I set up a local database on each server and then replicated it to a larger instance?

Link to comment
Share on other sites

Yeah, I'm not a big fan of cloud servers for ad serving either, as performance is not that great.

That is what your problem is. Your data has to travel over the network to the database each time the ad is served. You need to get pure SSD storage on your web servers and setup MySQL replication. Serving 10,000,000 impressions per day with 3 web servers is pretty low. We have customers that serve over 20,000,000 impressions per day with 3 web servers.

Using SSD and setting up MySQL replication should help a lot.

Link to comment
Share on other sites

Thanks for the suggestions. We are looking into changing our setup.

 

I don't have a screenshot, but I did capture 'SHOW FULL PROCESSLIST' in spreadsheet form. Most of the connections have a State of 'update'. I parsed it and put it in spreadsheet form to try to see if there was one creative_id or zone_id that was more common that others.

 

I'll spare you all 2,945 rows, so here are the first 50:

Id	User	Host	db	Command	Time	State	date	creative	zone	count
126053658	revive	ip-10-146-162-42.ec2.internal:37375	revive	Query	62	query end	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053706	revive	ip-10-146-162-42.ec2.internal:37414	revive	Query	61	update	2014-04-04 22:00:00'	 '59'	 '1'	 '1'
126053723	revive	ip-10-146-162-42.ec2.internal:37427	revive	Query	61	query end	2014-04-04 22:00:00'	 '59'	 '1'	 '1'
126053724	revive	ip-10-146-162-42.ec2.internal:37428	revive	Query	61	update	2014-04-04 22:00:00'	 '81'	 '14'	 '1'
126053738	revive	ip-10-69-5-228.ec2.internal:54802	revive	Query	60	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053742	revive	ip-10-69-5-228.ec2.internal:54804	revive	Query	60	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053744	revive	ip-10-69-5-228.ec2.internal:54806	revive	Query	60	update	2014-04-04 22:00:00'	 '112'	 '15'	 '1'
126053748	revive	ip-10-180-0-49.ec2.internal:48851	revive	Query	60	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053751	revive	ip-10-146-162-42.ec2.internal:37435	revive	Query	60	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053756	revive	ip-10-180-0-49.ec2.internal:48855	revive	Query	60	update	2014-04-04 22:00:00'	 '112'	 '15'	 '1'
126053763	revive	ip-10-69-5-228.ec2.internal:54813	revive	Query	60	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053765	revive	ip-10-69-5-228.ec2.internal:54815	revive	Query	60	query end	2014-04-04 22:00:00'	 '41'	 '11'	 '1'
126053768	revive	ip-10-180-0-49.ec2.internal:48858	revive	Query	60	query end	2014-04-04 22:00:00'	 '60'	 '2'	 '1'
126053769	revive	ip-10-180-0-49.ec2.internal:48859	revive	Query	60	update	2014-04-04 22:00:00'	 '81'	 '14'	 '1'
126053770	revive	ip-10-180-0-49.ec2.internal:48860	revive	Query	60	update	2014-04-04 22:00:00'	 '59'	 '1'	 '1'
126053772	revive	ip-10-69-5-228.ec2.internal:54818	revive	Query	60	update	2014-04-04 22:00:00'	 '59'	 '1'	 '1'
126053778	revive	ip-10-69-5-228.ec2.internal:54821	revive	Query	60	update	2014-04-04 22:00:00'	 '59'	 '1'	 '1'
126053789	revive	ip-10-180-0-49.ec2.internal:48866	revive	Query	60	query end	2014-04-04 22:00:00'	 '81'	 '14'	 '1'
126053791	revive	ip-10-69-5-228.ec2.internal:54826	revive	Query	60	update	2014-04-04 22:00:00'	 '59'	 '1'	 '1'
126053798	revive	ip-10-180-0-49.ec2.internal:48872	revive	Query	60	query end	2014-04-04 22:00:00'	 '112'	 '15'	 '1'
126053799	revive	ip-10-146-162-42.ec2.internal:37444	revive	Query	60	query end	2014-04-04 22:00:00'	 '71'	 '11'	 '1'
126053801	revive	ip-10-180-0-49.ec2.internal:48874	revive	Query	60	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053802	revive	ip-10-180-0-49.ec2.internal:48875	revive	Query	60	update	2014-04-04 22:00:00'	 '41'	 '11'	 '1'
126053804	revive	ip-10-69-5-228.ec2.internal:54829	revive	Query	60	query end	2014-04-04 22:00:00'	 '14'	 '7'	 '1'
126053805	revive	ip-10-146-162-42.ec2.internal:37445	revive	Query	60	query end	2014-04-04 22:00:00'	 '111'	 '13'	 '1'
126053807	revive	ip-10-69-5-228.ec2.internal:54830	revive	Query	60	update	2014-04-04 22:00:00'	 '81'	 '14'	 '1'
126053814	revive	ip-10-69-5-228.ec2.internal:54833	revive	Query	60	update	2014-04-04 22:00:00'	 '60'	 '2'	 '1'
126053821	revive	ip-10-69-5-228.ec2.internal:54834	revive	Query	60	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053822	revive	ip-10-180-0-49.ec2.internal:48883	revive	Query	60	update	2014-04-04 22:00:00'	 '60'	 '2'	 '1'
126053823	revive	ip-10-180-0-49.ec2.internal:48884	revive	Query	60	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053826	revive	ip-10-69-5-228.ec2.internal:54835	revive	Query	60	update	2014-04-04 22:00:00'	 '112'	 '15'	 '1'
126053828	revive	ip-10-146-162-42.ec2.internal:37453	revive	Query	60	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053829	revive	ip-10-146-162-42.ec2.internal:37454	revive	Query	60	update	2014-04-04 22:00:00'	 '41'	 '11'	 '1'
126053833	revive	ip-10-146-162-42.ec2.internal:37457	revive	Query	60	update	2014-04-04 22:00:00'	 '112'	 '15'	 '1'
126053835	revive	ip-10-146-162-42.ec2.internal:37459	revive	Query	60	update	2014-04-04 22:00:00'	 '59'	 '1'	 '1'
126053839	revive	ip-10-146-162-42.ec2.internal:37462	revive	Query	59	update	2014-04-04 22:00:00'	 '14'	 '7'	 '1'
126053841	revive	ip-10-146-162-42.ec2.internal:37464	revive	Query	59	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053842	revive	ip-10-146-162-42.ec2.internal:37465	revive	Query	59	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053845	revive	ip-10-146-162-42.ec2.internal:37467	revive	Query	59	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053848	revive	ip-10-146-162-42.ec2.internal:37470	revive	Query	59	query end	2014-04-04 22:00:00'	 '90'	 '14'	 '1'
126053849	revive	ip-10-146-162-42.ec2.internal:37471	revive	Query	59	update	2014-04-04 22:00:00'	 '112'	 '15'	 '1'
126053853	revive	ip-10-180-0-49.ec2.internal:48889	revive	Query	59	update	2014-04-04 22:00:00'	 '110'	 '14'	 '1'
126053854	revive	ip-10-146-162-42.ec2.internal:37475	revive	Query	59	update	2014-04-04 22:00:00'	 '72'	 '10'	 '1'
126053860	revive	ip-10-146-162-42.ec2.internal:37481	revive	Query	59	query end	2014-04-04 22:00:00'	 '38'	 '11'	 '1'
126053862	revive	ip-10-146-162-42.ec2.internal:37483	revive	Query	59	update	2014-04-04 22:00:00'	 '59'	 '1'	 '1'
126053868	revive	ip-10-146-162-42.ec2.internal:37488	revive	Query	59	update	2014-04-04 22:00:00'	 '41'	 '11'	 '1'
126053869	revive	ip-10-146-162-42.ec2.internal:37489	revive	Query	59	update	2014-04-04 22:00:00'	 '60'	 '2'	 '1'
126053870	revive	ip-10-146-162-42.ec2.internal:37490	revive	Query	59	update	2014-04-04 22:00:00'	 '59'	 '1'	 '1'
126053871	revive	ip-10-146-162-42.ec2.internal:37491	revive	Query	59	update	2014-04-04 22:00:00'	 '90'	 '14'	 '1'

Link to comment
Share on other sites

  • 2 weeks later...

You should also look into tweaking your MySQL config. I see in your full processlist hostnames, which means that with every mysql connection you make it does a nameserver lookup; making it even slower. Look into : skip-name-resolve

(also check your mysql permissions if you edit it)

Link to comment
Share on other sites

  • 3 weeks later...
  • 2 weeks later...
  • 2 weeks later...

Thanks for all the responses. I understand the concerns about cloud and virtualized servers.

I have been testing on two AWS instances to compare Percona to the out-of-the-box MySQL. I don't see any improvements with the Percona server when it is doing writes. I see a big improvement when doing reads, but that isn't the issue. I can cache reads on the application server to solve the read issue.

 

I tested using sysbench:

sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=test --mysql-user=root --mysql-password='root' --max-time=60 --oltp-read-only=off --oltp-index-updates=on --oltp-non-index-updates=on --max-requests=0 --num-threads=8 run

On the Percona server this was the result:

sysbench 0.4.12:  multi-threaded system evaluation benchmark

No DB drivers specified, using mysql
Running the test with following options:
Number of threads: 8

Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 7 times)
Done.

OLTP test statistics:
    queries performed:
        read:                            351064
        write:                           75228
        other:                           50080
        total:                           476372
    transactions:                        25004  (416.63 per sec.)
    deadlocks:                           72     (1.20 per sec.)
    read/write requests:                 426292 (7103.06 per sec.)
    other operations:                    50080  (834.45 per sec.)

Test execution summary:
    total time:                          60.0152s
    total number of events:              25004
    total time taken by event execution: 479.9083
    per-request statistics:
         min:                                  6.51ms
         avg:                                 19.19ms
         max:                                209.38ms
         approx.  95 percentile:              40.04ms

Threads fairness:
    events (avg/stddev):           3125.5000/33.28
    execution time (avg/stddev):   59.9885/0.01

And this was the result on the MySQL server:

sysbench 0.4.12:  multi-threaded system evaluation benchmark

No DB drivers specified, using mysql
Running the test with following options:
Number of threads: 8

Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 7 times)
Done.

OLTP test statistics:
    queries performed:
        read:                            640752
        write:                           137304
        other:                           91536
        total:                           869592
    transactions:                        45768  (762.66 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 778056 (12965.19 per sec.)
    other operations:                    91536  (1525.32 per sec.)

Test execution summary:
    total time:                          60.0112s
    total number of events:              45768
    total time taken by event execution: 479.6347
    per-request statistics:
         min:                                  3.11ms
         avg:                                 10.48ms
         max:                                522.89ms
         approx.  95 percentile:              17.23ms

Threads fairness:
    events (avg/stddev):           5721.0000/16.02
    execution time (avg/stddev):   59.9543/0.00

This is the my.cnf on the Percona server:

[mysql]

# CLIENT #
port                           = 3306
socket                         = /var/lib/mysql/mysql.sock

[mysqld]

# GENERAL #
user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/lib/mysql/mysql.sock
pid-file                       = /var/lib/mysql/mysql.pid

# MyISAM #
key-buffer-size                = 32M
myisam-recover                 = FORCE,BACKUP

# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000
skip-name-resolve
sql-mode                       = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY
sysdate-is-now                 = 1
innodb                         = FORCE
innodb-strict-mode             = 1

# DATA STORAGE #
datadir                        = /var/lib/mysql/

# BINARY LOGGING #
log-bin                        = /var/lib/mysql/mysql-bin
expire-logs-days               = 14
sync-binlog                    = 1

# CACHES AND LIMITS #
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
max-connections                = 500
thread-cache-size              = 50
open-files-limit               = 65535
table-definition-cache         = 1024
table-open-cache               = 2048

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 256M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 12G

# LOGGING #
log-error                      = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes  = 1
slow-query-log                 = 1
slow-query-log-file            = /var/log/mysql/mysql-slow.log

Am I missing something or does Percona not really help on very write intensive applications like the Revive ad server?

Link to comment
Share on other sites

I also tested a real Revive query using Apache's ab tool.

 

This is the index.php file what did the query:

<?php

$db = array("host"=>"localhost","db"=>"test","user"=>"root","pass"=>"root");
$db = new PDO('mysql:dbname='.$db['db'].';host='.$db['host'],$db['user'],$db['pass']);


$creative_ids = array("3","4","7","8","10","11","12","13","14","15","16","17","18","22","23","24","25","26","27","32","33","34","35","36","37","38","39","40","44","45","46","47","48","53","54","55","56","57","58","59","60","61","62","63","64","71","72","73","74","75","76","81","82","83","84","90","91","92","99","100","101","110","111","112","119","120","121","151","152","175","176","177","186","205","206","207","210","211","212","214","215","216","217","218","219","227","228","230","231","232","233","234","235","236","237","254","256","269","270","271","272","273","274","275","283","284","285","286","287");

$zone_ids = array("4","5","8","7","9","11","10","12","6","1","2","3","14","13","15","17","18","28","38","23");

$zone_id = $zone_ids[array_rand( $zone_ids )];
$creative_id = $creative_ids[array_rand( $creative_ids )];
//$zone_id="4";
//$creative_id="3";

//echo 'zone: ' . $zone_id .'<br>creative: ' . $creative_id;
$sql = "INSERT INTO revive1 (interval_start, creative_id, zone_id, count)
            VALUES ('2014-06-09 11:30:00', '" . $creative_id . "', '" . $zone_id . "', '1')
     ON DUPLICATE KEY UPDATE count = count + 1";


$db->query($sql);

ab -n 10000 http://localhost/

On the MySQL server:

Server Software:        Apache/2.4.7
Server Hostname:        localhost
Server Port:            80

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1
Time taken for tests:   33.712 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1840000 bytes
HTML transferred:       0 bytes
Requests per second:    296.63 [#/sec] (mean)
Time per request:       3.371 [ms] (mean)
Time per request:       3.371 [ms] (mean, across all concurrent requests)
Transfer rate:          53.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    3   1.7      3      33
Waiting:        1    3   1.7      3      33
Total:          1    3   1.7      3      33

Percentage of the requests served within a certain time (ms)
  50%      3
  66%      3
  75%      4
  80%      4
  90%      4
  95%      5
  98%      7
  99%     10
 100%     33 (longest request)

On the Percona server:

Server Software:        Apache/2.2.22
Server Hostname:        localhost
Server Port:            80

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1
Time taken for tests:   115.514 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      2120000 bytes
HTML transferred:       0 bytes
Requests per second:    86.57 [#/sec] (mean)
Time per request:       11.551 [ms] (mean)
Time per request:       11.551 [ms] (mean, across all concurrent requests)
Transfer rate:          17.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     6   11  11.3      9     242
Waiting:        6   11  11.3      9     242
Total:          6   12  11.3      9     242

Percentage of the requests served within a certain time (ms)
  50%      9
  66%     10
  75%     11
  80%     12
  90%     13
  95%     17
  98%     45
  99%     65
 100%    242 (longest request)
Link to comment
Share on other sites

Probably this won't be  an percona issue, it will be your AWS, Can you adjust the percona settings like follows, Please rememeber it is only a sample , the value trylly deppend on your hardware and access. So tweaking mysql take time.

[mysqld]
innodb_file_per_table=1
local-infile=0
tmpdir =/dev/shm
old-passwords = 1
max_connections = 650
max_user_connections=300
key_buffer_size = 512M
myisam_sort_buffer_size = 64M
join_buffer_size = 1M
read_buffer_size = 1M
sort_buffer_size = 256K
table_cache = 4000
thread_cache_size = 384
wait_timeout = 20
connect_timeout = 10
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 64M
net_buffer_length = 16384
max_connect_errors = 10
thread_concurrency = 16
concurrent_insert = 2
#table_lock_wait_timeout only for mysql5
#table_lock_wait_timeout = 30
read_rnd_buffer_size = 786432
bulk_insert_buffer_size = 8M
query_cache_limit = 7M
query_cache_size = 64M
query_cache_type = 1
query_prealloc_size = 262144
query_alloc_block_size = 65536
transaction_alloc_block_size = 8192
transaction_prealloc_size = 4096
max_write_lock_count = 16
#for mysql 5.1
#skip-external-locking

[mysqld_safe]
open_files_limit = 8192

[mysqldump]
quick
max_allowed_packet = 16M

[myisamchk]
key_buffer = 384M
sort_buffer = 384M
read_buffer = 256M
write_buffer = 256M
Link to comment
Share on other sites

  • 1 month later...

I wanted to update this thread. We moved off Amazon four weeks ago and now we are on a dedicated server with solid state drives. It serves about 21 million impressions per day and has not had any problems. Thanks to Snoork hosting for convincing me to make the change.

Link to comment
Share on other sites

  • 1 month later...

I wanted to update this thread. We moved off Amazon four weeks ago and now we are on a dedicated server with solid state drives. It serves about 21 million impressions per day and has not had any problems. Thanks to Snoork hosting for convincing me to make the change.

I'm glad everything is working out for you fine. Thank you for the update :)

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.



×
×
  • Create New...