jwolfe Posted April 4, 2014

We are hosting Revive on Amazon Web Services. There are three web servers behind a load balancer, all using the same RDS MySQL database. According to the database metrics, the server usually averages fewer than 20 connections at any instant. Then, all of a sudden, the number of DB connections maxes out. Running SHOW FULL PROCESSLIST reveals that they are all running similar INSERT/UPDATE statements:

INSERT INTO rv_data_bkt_m (interval_start, creative_id, zone_id, count)
VALUES ('2014-04-04 23:00:00', '90', '14', '1')
ON DUPLICATE KEY UPDATE count = count + 1

The server instance has been upgraded several times. It currently maxes out around 3,000 DB connections. It seems ridiculous to keep scaling up at this point. Is there an easy way around this? The only option I have right now is to hack into the core code. I'm thinking about using memcache or flat log files and updating rv_data_bkt_m in batch mode. Any other ideas?
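The batch idea can be sketched as a single multi-row statement: accumulate per-(interval, creative, zone) counters in memcache or a log file, then flush them all at once instead of issuing one upsert per impression. A hypothetical example, reusing the table and columns from the query above (the accumulated counts shown are made up):

```sql
-- Hypothetical batched flush: one statement covers many impressions.
-- Each row's count is the total accumulated since the last flush,
-- not 1 per request.
INSERT INTO rv_data_bkt_m (interval_start, creative_id, zone_id, count)
VALUES
  ('2014-04-04 23:00:00', '90',  '14', '37'),
  ('2014-04-04 23:00:00', '110', '14', '52'),
  ('2014-04-04 23:00:00', '59',  '1',  '19')
ON DUPLICATE KEY UPDATE count = count + VALUES(count);
```

Note the use of VALUES(count) in the UPDATE clause, which adds each row's accumulated total rather than 1, so one statement replaces dozens of single-row upserts.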
Snoork Hosting Posted April 5, 2014

How are your 3 web servers connecting to the MySQL database?

1. Do you have a remote MySQL server where the 3 web servers instantly send the data over to the MySQL database? or
2. Do you have MySQL replication set up, where data first goes to the MySQL database on each web server and then gets sent over to the remote MySQL server?

Also, what storage are you using on your web servers and MySQL? SATA or SSD?
jwolfe (Author) Posted April 7, 2014

It is only using 1 DB instance, and it is not the SSD type. It is not using replication. That is something we were thinking about looking into, but I'm not sure it will solve this particular issue.
Matteo Beccati Posted April 7, 2014

Well, it is fairly simple: that's a symptom that the database can't keep up with the current Revive traffic. You normally should look into tweaking it or upgrading it, especially with a faster disk subsystem, but I don't know what the options are for RDS. I'm not a huge fan of the "cloud", as in my experience performance has always been pretty bad when it comes to database systems, but maybe you can upgrade to have more guaranteed IOPS?
jwolfe (Author) Posted April 7, 2014

Thanks. We have upgraded the database to the instance with the most memory to get a higher max_connections. We serve about 10,000,000 impressions per day. About once per day a logjam forms, and all the queries are the above query. Would the same problem happen if I set up a local database on each server and then replicated it to a larger instance?
Snoork Hosting Posted April 8, 2014

Yeah, I'm not a big fan of cloud servers for ad serving either, as performance is not that great. That is your problem: your data has to travel over the network to the database each time an ad is served. You need to get pure SSD storage on your web servers and set up MySQL replication. Serving 10,000,000 impressions per day with 3 web servers is pretty low; we have customers that serve over 20,000,000 impressions per day with 3 web servers. Using SSD and setting up MySQL replication should help a lot.
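For illustration, a minimal sketch of the my.cnf settings a replication setup involves. These are hypothetical fragments (the server IDs, log names, and database name are assumptions), and note that funnelling several local masters into one central replica requires multi-source replication, which stock MySQL of that era did not support:

```ini
# Fragment for each web server's local MySQL (acting as a master)
[mysqld]
server-id    = 1            # must be unique across all servers
log-bin      = mysql-bin    # binary logging is required for replication
binlog-do-db = revive       # only log changes to the Revive database

# Fragment for the central MySQL (acting as the replica)
[mysqld]
server-id       = 100
relay-log       = relay-bin
replicate-do-db = revive
```

After restarting with these settings, the replica is pointed at a master with CHANGE MASTER TO and started with START SLAVE.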
ernesto_vargas Posted April 9, 2014

Make sure you are using the InnoDB engine too. Are the connections idle, waiting, or locked? Can you share a screenshot of 'SHOW PROCESSLIST' when the 3,000 connections are there?
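Both checks can be done with standard MySQL statements; a quick sketch (the 'revive' schema name is an assumption):

```sql
-- Which storage engine does the stats table use?
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'revive' AND TABLE_NAME = 'rv_data_bkt_m';

-- Summarize what the piled-up connections are doing, instead of
-- eyeballing 3,000 SHOW PROCESSLIST rows one by one
SELECT STATE, COUNT(*) AS connections
FROM information_schema.PROCESSLIST
GROUP BY STATE
ORDER BY connections DESC;
```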
jwolfe (Author) Posted April 9, 2014

Thanks for the suggestions. We are looking into changing our setup. I don't have a screenshot, but I did capture 'SHOW FULL PROCESSLIST' in spreadsheet form. Most of the connections have a State of 'update'. I parsed it and put it in spreadsheet form to try to see if there was one creative_id or zone_id that was more common than others. I'll spare you all 2,945 rows, so here are the first 50. Every row has User = revive, db = revive, Command = Query, interval_start = 2014-04-04 22:00:00, and count = 1, so only the varying columns are shown:

Id | Host | Time | State | creative | zone
126053658 | ip-10-146-162-42.ec2.internal:37375 | 62 | query end | 110 | 14
126053706 | ip-10-146-162-42.ec2.internal:37414 | 61 | update | 59 | 1
126053723 | ip-10-146-162-42.ec2.internal:37427 | 61 | query end | 59 | 1
126053724 | ip-10-146-162-42.ec2.internal:37428 | 61 | update | 81 | 14
126053738 | ip-10-69-5-228.ec2.internal:54802 | 60 | update | 110 | 14
126053742 | ip-10-69-5-228.ec2.internal:54804 | 60 | update | 110 | 14
126053744 | ip-10-69-5-228.ec2.internal:54806 | 60 | update | 112 | 15
126053748 | ip-10-180-0-49.ec2.internal:48851 | 60 | update | 110 | 14
126053751 | ip-10-146-162-42.ec2.internal:37435 | 60 | update | 110 | 14
126053756 | ip-10-180-0-49.ec2.internal:48855 | 60 | update | 112 | 15
126053763 | ip-10-69-5-228.ec2.internal:54813 | 60 | update | 110 | 14
126053765 | ip-10-69-5-228.ec2.internal:54815 | 60 | query end | 41 | 11
126053768 | ip-10-180-0-49.ec2.internal:48858 | 60 | query end | 60 | 2
126053769 | ip-10-180-0-49.ec2.internal:48859 | 60 | update | 81 | 14
126053770 | ip-10-180-0-49.ec2.internal:48860 | 60 | update | 59 | 1
126053772 | ip-10-69-5-228.ec2.internal:54818 | 60 | update | 59 | 1
126053778 | ip-10-69-5-228.ec2.internal:54821 | 60 | update | 59 | 1
126053789 | ip-10-180-0-49.ec2.internal:48866 | 60 | query end | 81 | 14
126053791 | ip-10-69-5-228.ec2.internal:54826 | 60 | update | 59 | 1
126053798 | ip-10-180-0-49.ec2.internal:48872 | 60 | query end | 112 | 15
126053799 | ip-10-146-162-42.ec2.internal:37444 | 60 | query end | 71 | 11
126053801 | ip-10-180-0-49.ec2.internal:48874 | 60 | update | 110 | 14
126053802 | ip-10-180-0-49.ec2.internal:48875 | 60 | update | 41 | 11
126053804 | ip-10-69-5-228.ec2.internal:54829 | 60 | query end | 14 | 7
126053805 | ip-10-146-162-42.ec2.internal:37445 | 60 | query end | 111 | 13
126053807 | ip-10-69-5-228.ec2.internal:54830 | 60 | update | 81 | 14
126053814 | ip-10-69-5-228.ec2.internal:54833 | 60 | update | 60 | 2
126053821 | ip-10-69-5-228.ec2.internal:54834 | 60 | update | 110 | 14
126053822 | ip-10-180-0-49.ec2.internal:48883 | 60 | update | 60 | 2
126053823 | ip-10-180-0-49.ec2.internal:48884 | 60 | update | 110 | 14
126053826 | ip-10-69-5-228.ec2.internal:54835 | 60 | update | 112 | 15
126053828 | ip-10-146-162-42.ec2.internal:37453 | 60 | update | 110 | 14
126053829 | ip-10-146-162-42.ec2.internal:37454 | 60 | update | 41 | 11
126053833 | ip-10-146-162-42.ec2.internal:37457 | 60 | update | 112 | 15
126053835 | ip-10-146-162-42.ec2.internal:37459 | 60 | update | 59 | 1
126053839 | ip-10-146-162-42.ec2.internal:37462 | 59 | update | 14 | 7
126053841 | ip-10-146-162-42.ec2.internal:37464 | 59 | update | 110 | 14
126053842 | ip-10-146-162-42.ec2.internal:37465 | 59 | update | 110 | 14
126053845 | ip-10-146-162-42.ec2.internal:37467 | 59 | update | 110 | 14
126053848 | ip-10-146-162-42.ec2.internal:37470 | 59 | query end | 90 | 14
126053849 | ip-10-146-162-42.ec2.internal:37471 | 59 | update | 112 | 15
126053853 | ip-10-180-0-49.ec2.internal:48889 | 59 | update | 110 | 14
126053854 | ip-10-146-162-42.ec2.internal:37475 | 59 | update | 72 | 10
126053860 | ip-10-146-162-42.ec2.internal:37481 | 59 | query end | 38 | 11
126053862 | ip-10-146-162-42.ec2.internal:37483 | 59 | update | 59 | 1
126053868 | ip-10-146-162-42.ec2.internal:37488 | 59 | update | 41 | 11
126053869 | ip-10-146-162-42.ec2.internal:37489 | 59 | update | 60 | 2
126053870 | ip-10-146-162-42.ec2.internal:37490 | 59 | update | 59 | 1
126053871 | ip-10-146-162-42.ec2.internal:37491 | 59 | update | 90 | 14
Snoork Hosting Posted April 20, 2014

Were you able to implement the changes?
jwolfe (Author) Posted April 22, 2014

We haven't made any changes to our setup yet. We moved into a new office this week, and we've spent most of our time packing and unpacking.
Ian Posted April 26, 2014

You should also look into tweaking your MySQL config. I see hostnames in your full processlist, which means that MySQL does a nameserver lookup for every connection you make, making it even slower. Look into: skip-name-resolve (also check your MySQL permissions if you enable it, since hostname-based grants stop matching).
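In my.cnf terms, the suggested change is a one-line addition; a fragment for illustration:

```ini
[mysqld]
# Skip the reverse-DNS lookup MySQL otherwise performs for each new
# connection. Grants defined with hostnames must be redefined using
# IP addresses before enabling this.
skip-name-resolve
```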
marcis Posted May 13, 2014

We served about 2 million banner impressions per hour on a VPS with 8 cores and 8 GB RAM (nginx, php-fpm) at a load of 2.x. You just need to fine-tune the server (MySQL, php-fpm, nginx), plus some OpenX code speed improvements, such as for the random selection.
techs Posted May 28, 2014

Clouds are virtualized environments; I don't recommend using a cloud or VPS. You could also switch from MySQL to Percona and see the difference for yourself. Can you post the settings from your my.cnf?
jwolfe (Author) Posted June 9, 2014

Thanks for all the responses. I understand the concerns about cloud and virtualized servers. I have been testing on two AWS instances to compare Percona to out-of-the-box MySQL. I don't see any improvement with the Percona server when it is doing writes. I see a big improvement when doing reads, but that isn't the issue; I can cache reads on the application server to solve the read issue.

I tested using sysbench:

sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=test --mysql-user=root --mysql-password='root' --max-time=60 --oltp-read-only=off --oltp-index-updates=on --oltp-non-index-updates=on --max-requests=0 --num-threads=8 run

On the Percona server this was the result:

sysbench 0.4.12:  multi-threaded system evaluation benchmark

No DB drivers specified, using mysql
Running the test with following options:
Number of threads: 8

Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 7 times)
Done.

OLTP test statistics:
    queries performed:
        read:                            351064
        write:                           75228
        other:                           50080
        total:                           476372
    transactions:                        25004  (416.63 per sec.)
    deadlocks:                           72     (1.20 per sec.)
    read/write requests:                 426292 (7103.06 per sec.)
    other operations:                    50080  (834.45 per sec.)

Test execution summary:
    total time:                          60.0152s
    total number of events:              25004
    total time taken by event execution: 479.9083
    per-request statistics:
         min:                            6.51ms
         avg:                            19.19ms
         max:                            209.38ms
         approx. 95 percentile:          40.04ms

Threads fairness:
    events (avg/stddev):                 3125.5000/33.28
    execution time (avg/stddev):         59.9885/0.01

And this was the result on the MySQL server:

sysbench 0.4.12:  multi-threaded system evaluation benchmark

No DB drivers specified, using mysql
Running the test with following options:
Number of threads: 8

Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 7 times)
Done.

OLTP test statistics:
    queries performed:
        read:                            640752
        write:                           137304
        other:                           91536
        total:                           869592
    transactions:                        45768  (762.66 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 778056 (12965.19 per sec.)
    other operations:                    91536  (1525.32 per sec.)

Test execution summary:
    total time:                          60.0112s
    total number of events:              45768
    total time taken by event execution: 479.6347
    per-request statistics:
         min:                            3.11ms
         avg:                            10.48ms
         max:                            522.89ms
         approx. 95 percentile:          17.23ms

Threads fairness:
    events (avg/stddev):                 5721.0000/16.02
    execution time (avg/stddev):         59.9543/0.00

This is the my.cnf on the Percona server:

[mysql]
# CLIENT #
port                           = 3306
socket                         = /var/lib/mysql/mysql.sock

[mysqld]
# GENERAL #
user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/lib/mysql/mysql.sock
pid-file                       = /var/lib/mysql/mysql.pid

# MyISAM #
key-buffer-size                = 32M
myisam-recover                 = FORCE,BACKUP

# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000
skip-name-resolve
sql-mode                       = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY
sysdate-is-now                 = 1
innodb                         = FORCE
innodb-strict-mode             = 1

# DATA STORAGE #
datadir                        = /var/lib/mysql/

# BINARY LOGGING #
log-bin                        = /var/lib/mysql/mysql-bin
expire-logs-days               = 14
sync-binlog                    = 1

# CACHES AND LIMITS #
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
max-connections                = 500
thread-cache-size              = 50
open-files-limit               = 65535
table-definition-cache         = 1024
table-open-cache               = 2048

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 256M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 12G

# LOGGING #
log-error                      = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes  = 1
slow-query-log                 = 1
slow-query-log-file            = /var/log/mysql/mysql-slow.log

Am I missing something, or does Percona not really help on very write-intensive applications like the Revive ad server?
jwolfe (Author) Posted June 9, 2014

I also tested a real Revive query using Apache's ab tool. This is the index.php file that did the query:

<?php
$db = array("host"=>"localhost","db"=>"test","user"=>"root","pass"=>"root");
$db = new PDO('mysql:dbname='.$db['db'].';host='.$db['host'], $db['user'], $db['pass']);

$creative_ids = array("3","4","7","8","10","11","12","13","14","15","16","17","18","22","23","24","25","26","27","32","33","34","35","36","37","38","39","40","44","45","46","47","48","53","54","55","56","57","58","59","60","61","62","63","64","71","72","73","74","75","76","81","82","83","84","90","91","92","99","100","101","110","111","112","119","120","121","151","152","175","176","177","186","205","206","207","210","211","212","214","215","216","217","218","219","227","228","230","231","232","233","234","235","236","237","254","256","269","270","271","272","273","274","275","283","284","285","286","287");
$zone_ids = array("4","5","8","7","9","11","10","12","6","1","2","3","14","13","15","17","18","28","38","23");

$zone_id = $zone_ids[array_rand($zone_ids)];
$creative_id = $creative_ids[array_rand($creative_ids)];
//$zone_id = "4";
//$creative_id = "3";
//echo 'zone: ' . $zone_id . '<br>creative: ' . $creative_id;

$sql = "INSERT INTO revive1 (interval_start, creative_id, zone_id, count)
        VALUES ('2014-06-09 11:30:00', '" . $creative_id . "', '" . $zone_id . "', '1')
        ON DUPLICATE KEY UPDATE count = count + 1";
$db->query($sql);

I ran it with:

ab -n 10000 http://localhost/

On the MySQL server:

Server Software:        Apache/2.4.7
Server Hostname:        localhost
Server Port:            80

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1
Time taken for tests:   33.712 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1840000 bytes
HTML transferred:       0 bytes
Requests per second:    296.63 [#/sec] (mean)
Time per request:       3.371 [ms] (mean)
Time per request:       3.371 [ms] (mean, across all concurrent requests)
Transfer rate:          53.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    3   1.7      3      33
Waiting:        1    3   1.7      3      33
Total:          1    3   1.7      3      33

Percentage of the requests served within a certain time (ms)
  50%      3
  66%      3
  75%      4
  80%      4
  90%      4
  95%      5
  98%      7
  99%     10
 100%     33 (longest request)

On the Percona server:

Server Software:        Apache/2.2.22
Server Hostname:        localhost
Server Port:            80

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1
Time taken for tests:   115.514 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      2120000 bytes
HTML transferred:       0 bytes
Requests per second:    86.57 [#/sec] (mean)
Time per request:       11.551 [ms] (mean)
Time per request:       11.551 [ms] (mean, across all concurrent requests)
Transfer rate:          17.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     6   11  11.3      9     242
Waiting:        6   11  11.3      9     242
Total:          6   12  11.3      9     242

Percentage of the requests served within a certain time (ms)
  50%      9
  66%     10
  75%     11
  80%     12
  90%     13
  95%     17
  98%     45
  99%     65
 100%    242 (longest request)
Ian Posted June 10, 2014

You do know that Amazon AWS disk I/O sucks, right? It's no wonder you don't get much out of it with Percona (or MySQL).
techs Posted June 16, 2014

This probably isn't a Percona issue; it will be your AWS. Can you adjust the Percona settings as follows? Please remember it is only a sample; the values truly depend on your hardware and access patterns, so tweaking MySQL takes time.

[mysqld]
innodb_file_per_table = 1
local-infile = 0
tmpdir = /dev/shm
old-passwords = 1
max_connections = 650
max_user_connections = 300
key_buffer_size = 512M
myisam_sort_buffer_size = 64M
join_buffer_size = 1M
read_buffer_size = 1M
sort_buffer_size = 256K
table_cache = 4000
thread_cache_size = 384
wait_timeout = 20
connect_timeout = 10
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 64M
net_buffer_length = 16384
max_connect_errors = 10
thread_concurrency = 16
concurrent_insert = 2
# table_lock_wait_timeout only for MySQL 5
# table_lock_wait_timeout = 30
read_rnd_buffer_size = 786432
bulk_insert_buffer_size = 8M
query_cache_limit = 7M
query_cache_size = 64M
query_cache_type = 1
query_prealloc_size = 262144
query_alloc_block_size = 65536
transaction_alloc_block_size = 8192
transaction_prealloc_size = 4096
max_write_lock_count = 16
# for MySQL 5.1
# skip-external-locking

[mysqld_safe]
open_files_limit = 8192

[mysqldump]
quick
max_allowed_packet = 16M

[myisamchk]
key_buffer = 384M
sort_buffer = 384M
read_buffer = 256M
write_buffer = 256M
jwolfe (Author) Posted July 28, 2014

I wanted to update this thread. We moved off Amazon four weeks ago, and now we are on a dedicated server with solid-state drives. It serves about 21 million impressions per day and has not had any problems. Thanks to Snoork Hosting for convincing me to make the change.
Snoork Hosting Posted September 2, 2014

Quoting jwolfe: "I wanted to update this thread. We moved off Amazon four weeks ago and now we are on a dedicated server with solid state drives. It serves about 21 million impressions per day and has not had any problems. Thanks to Snoork Hosting for convincing me to make the change."

I'm glad everything is working out fine for you. Thank you for the update.