AsmBB


Asmbb engine high CPU usage with stress test

#15932 (ツ) ganuonglachanh
Created 29.07.2019, read: 1536 times

Hi johnfound,

I'm running a stress test on my localhost setup using webslap against the rwasa server. The settings are n=10000, c=5000 (I know that is very high, but it's a stress TEST anyway rofl). Note that the test passed with n=10000 and c=1000, with a system load average of 1.1 ;-)

The engine uses a lot of CPU, which is normal, but later ALL connections failed, and trying to open the URL in a browser takes forever. Even after I stop rwasa, the engine keeps running with high CPU usage (many threads/instances are created; I can see them in htop).

Is this a potential bug? Why does the engine keep running with high CPU usage for a very long time (hours) after the server has stopped? Eventually I have to kill all the engine processes to stop them :'-(

#15934 (ツ) johnfound
Created 29.07.2019, read: 1534 times

Well, I don't actually know what happened; I need to run some tests on my side. But as a first guess, it is something related to SQLite, simultaneous connections and memory use. Since almost every request to the engine writes to the database, and SQLite allows only one writer at a time, all 5000 connections allocate the memory they need and then wait in the queue to write.
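A small Python model of this queueing argument may help (this is not AsmBB code; the 1 KiB buffer and the connection count are arbitrary assumptions). A `threading.Lock` plays the role of SQLite's single write lock, a barrier releases all simulated connections at once like a burst of requests, and every connection allocates its buffer before it queues for the lock.

```python
import threading


def simulate_single_writer(n_connections: int) -> dict:
    """Model N connections that allocate memory first, then queue for one write lock.

    The lock stands in for SQLite's "only one writer at a time" rule.
    """
    write_lock = threading.Lock()   # stand-in for the SQLite write lock
    stats_lock = threading.Lock()   # protects the counters below
    stats = {"holding_memory": 0, "peak_memory_holders": 0,
             "writing": 0, "peak_writers": 0, "completed": 0}
    burst = threading.Barrier(n_connections)

    def connection() -> None:
        buffer = bytearray(1024)    # memory allocated before waiting for the lock
        with stats_lock:
            stats["holding_memory"] += 1
            stats["peak_memory_holders"] = max(stats["peak_memory_holders"],
                                               stats["holding_memory"])
        burst.wait()                # everyone arrives at once, like a stress test burst
        with write_lock:            # only one connection may write at a time
            with stats_lock:
                stats["writing"] += 1
                stats["peak_writers"] = max(stats["peak_writers"], stats["writing"])
            buffer[0] = 1           # the "write"
            with stats_lock:
                stats["writing"] -= 1
                stats["holding_memory"] -= 1   # buffer released only after writing
                stats["completed"] += 1

    threads = [threading.Thread(target=connection) for _ in range(n_connections)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


stats = simulate_single_writer(50)
print(stats["peak_memory_holders"], stats["peak_writers"])
```

With 50 simulated connections, all 50 buffers are alive at the same moment while at most one connection is ever writing. At 5000 connections, this pattern (everyone holding memory, everyone queued behind a single writer) is what johnfound suspects here.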

Of course, such hangs are not normal. I will try to diagnose and fix the problem ASAP.

Can you give some details about your hardware and experimental setting?

#15935 (ツ) ganuonglachanh
Created 29.07.2019, read: 1532 times

Thank you. My local setup is a laptop with a Core i5-5300U, 4 GB RAM and a 120 GB SSD, running Ubuntu; performance is monitored with htop. rwasa and webslap use the 1-CPU setting, AsmBB was tested with the default database containing 1 user and 1 post, and the URL is the URL of that post.

#15943 (ツ) ganuonglachanh
Created 06.08.2019, read: 1469 times

Hi johnfound

Did you find anything new? :-D

#15944 (ツ) johnfound
Created 06.08.2019, read: 1468 times

Unfortunately, not so far. But I am working on it.

#16035 (ツ) ganuonglachanh
Last edited: 12.03.2020 by ganuonglachanh , read: 650 times

Hi johnfound

I tested this bug again, and it is still there. This is what I did:

1. Set up localhost with rwasa and AsmBB.

2. Run a test:


time ./webslap -cpu 1 -n 10000 -c 1000 -noui http://localhost/

3. Using a web browser, go to http://localhost/

4. The page takes a long time to render: "Page processing time: 18867.277 ms"

#16036 (ツ) johnfound
Created 12.03.2020, read: 648 times

Do you do step 3 after the webslap test has finished, or while it is still running?

#16037 (ツ) ganuonglachanh
Created 12.03.2020, read: 646 times

While it is still running.

#16038 (ツ) johnfound
Last edited: 12.03.2020 by johnfound, read: 641 times

Well, I was able to reproduce the problem. It is not a real hang: all requests are processed in the end, but very, very slowly. The problem is that SQLite allows only one writer at a time.

The solution is a "high-load" detector that disables most of the write requests to the database (mainly unimportant logging information) when there are too many requests.
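The "high-load" detector idea can be sketched in Python (a minimal sketch under assumptions, not AsmBB's implementation; `LoadDetector`, `handle_request` and the threshold are hypothetical names). Each request increments a shared counter while it is in flight, and non-essential database writes are skipped whenever the counter is at or above the threshold.

```python
import threading
from contextlib import contextmanager


class LoadDetector:
    """Count in-flight requests and report "high load" above a threshold (hypothetical)."""

    def __init__(self, high_load_threshold: int) -> None:
        self._lock = threading.Lock()
        self._active = 0
        self.high_load_threshold = high_load_threshold

    @contextmanager
    def request(self):
        """Mark one request as in flight for the duration of the with-block."""
        with self._lock:
            self._active += 1
        try:
            yield
        finally:
            with self._lock:
                self._active -= 1

    @property
    def active(self) -> int:
        with self._lock:
            return self._active

    def high_load(self) -> bool:
        return self.active >= self.high_load_threshold


def handle_request(detector: LoadDetector, log: list, path: str) -> str:
    """Serve a page; write the (unimportant) log entry only when not under high load."""
    with detector.request():
        if not detector.high_load():
            log.append(path)          # stands in for a logging INSERT into SQLite
        return f"page for {path}"
```

The page itself is always served; only the optional write is dropped, which is the trade-off described above.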

Will see what can be done.

#16039 (ツ) ganuonglachanh
Created 13.03.2020, read: 629 times

I removed some functions that write to the DB. The result is better, but only by maybe 5-10%, so maybe the bottleneck is somewhere else:

InsertGuest in command.asm

LogUserActivity in users_online.asm

#16040 (ツ) johnfound
Created 13.03.2020, read: 624 times

If you want to remove all possible bottlenecks, look at sse_service.asm, especially the AddEvent procedure and the whole SSE thread.

#16041 (ツ) ganuonglachanh
Created 13.03.2020, read: 619 times

Thanks. I removed AddEvent and the line

stdcall ThreadCreate, sseServiceThread, 0

The result is better, but still only a 5-10% improvement??

Another interesting finding: after the stress test had completed (because all 10,000 requests timed out), the engine continued to run those 10,000 request handlers??? It keeps printing log output even though no new requests are being made.

#16042 (ツ) johnfound
Last edited: 13.03.2020 by johnfound, read: 616 times

Well, the performance under very high load needs further optimization. But to do it really well, the whole engine would have to be reworked to use non-blocking sockets, a limited number of threads and some other database, or a hard limit on write operations in SQLite. And then it would be a totally different engine. ;-)

P.S. Ah, yes, you can also comment out the line:
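For illustration only, the "limited number of threads" part of that idea can be sketched as a fixed worker pool fed by a bounded queue (a generic Python sketch, not how AsmBB is structured; `BoundedServer` and its parameters are hypothetical, and non-blocking sockets are not shown). Requests that don't fit in the queue are rejected immediately instead of spawning another thread that would block on the database.

```python
import queue
import threading
from typing import Any, Callable


class BoundedServer:
    """Fixed worker threads plus a bounded queue; excess requests are rejected at once."""

    def __init__(self, workers: int, max_queue: int,
                 handler: Callable[[Any], None]) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=max_queue)
        self._handler = handler
        self.rejected = 0  # only updated by submit(); assumes a single acceptor thread
        self._threads = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(workers)]
        for t in self._threads:
            t.start()

    def submit(self, request: Any) -> bool:
        """Queue a request; return False (e.g. answer 503) when the queue is full."""
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def _worker(self) -> None:
        while True:
            request = self._queue.get()
            try:
                if request is None:      # shutdown sentinel
                    return
                self._handler(request)
            finally:
                self._queue.task_done()

    def shutdown(self) -> None:
        """Wait for queued work to finish, then stop the workers."""
        self._queue.join()
        for _ in self._threads:
            self._queue.put(None)
        for t in self._threads:
            t.join()
```

The design choice is to bound memory and thread count up front: under a burst, the server answers quickly with a rejection rather than accumulating thousands of blocked handlers.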

cinvoke sqliteBusyTimeout, [hMainDatabase], 5000

in the procedure SetDatabaseMode.

This makes every write attempt fail immediately with SQLITE_BUSY instead of waiting up to 5000 ms for the database to be released.
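The effect of the busy timeout can be reproduced with Python's built-in `sqlite3` module, whose `sqlite3.connect(..., timeout=...)` argument sets the same SQLite busy timeout (in seconds, default 5.0). This is a sketch outside AsmBB: one connection takes the write lock with `BEGIN IMMEDIATE` and keeps it, and a second connection tries to insert. The function name and table are hypothetical.

```python
import os
import sqlite3
import tempfile
import time


def try_write_while_locked(busy_timeout_s: float) -> tuple:
    """Return (succeeded, seconds_waited) for a write attempted while another
    connection holds SQLite's single write lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        holder = sqlite3.connect(path, isolation_level=None)  # autocommit mode
        holder.execute("CREATE TABLE log(msg TEXT)")
        holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it

        writer = sqlite3.connect(path, timeout=busy_timeout_s, isolation_level=None)
        started = time.monotonic()
        try:
            writer.execute("INSERT INTO log VALUES ('hit')")
            ok = True
        except sqlite3.OperationalError:   # "database is locked" (SQLITE_BUSY)
            ok = False
        waited = time.monotonic() - started

        holder.execute("ROLLBACK")
        writer.close()
        holder.close()
    return ok, waited


print(try_write_while_locked(0.0))   # fails at once, like having no busy timeout
print(try_write_while_locked(0.3))   # also fails, but only after retrying for ~0.3 s
```

With a zero timeout the writer gives up immediately; with a positive timeout it keeps retrying for up to that long, which under heavy load means thousands of handlers sitting in the retry loop.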

#16043 (ツ) ganuonglachanh
Created 14.03.2020, read: 594 times

Thanks johnfound. Rewriting the whole engine might not be an option, since you have been developing AsmBB for years.

Maybe a caching solution can help for now.
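One possible shape for such a cache is a small time-to-live page cache, sketched here in Python under stated assumptions (not an AsmBB feature; `TTLPageCache` is hypothetical, it is not thread-safe as written, and it only suits pages that look the same for every visitor, such as guest views). Within the TTL, repeated hits on the same URL are served from memory without rendering the page again.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPageCache:
    """Serve a rendered page from memory for `ttl` seconds instead of re-rendering it."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock                       # injectable for testing
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.renders = 0                          # how many times render() actually ran

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]                       # fresh: no rendering, no DB access
        page = render(url)
        self.renders += 1
        self._entries[url] = (now, page)
        return page
```

During a burst like the webslap test above, a short TTL would turn thousands of identical requests into a handful of renders, at the cost of serving slightly stale pages.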

Thank you for your efforts! rofl

#16044 (ツ) johnfound
Last edited: 14.03.2020 by johnfound, read: 592 times
ganuonglachanh

Thanks johnfound. Rewriting the whole engine might not be an option, since you have been developing AsmBB for years.

Well, 90% of the engine was written in approximately one month in 2016, so it is not an impossible mission. The main problem is that I am lazy and don't want to do the same work from scratch again.

Anyway, I think I found the main cause of these engine hangs and will be able to improve things a lot. Of course, it still won't be able to process 1000 simultaneous requests, but it will not hang trying to do so.

#16045 (ツ) johnfound
Created 14.03.2020, read: 586 times

OK, @ganuonglachanh. Some kind of DDoS protection has been implemented; check the latest trunk commit. It works pretty decently in my tests with 1000 or even more concurrent connections. Of course, overall performance is affected and all requests are processed more slowly during the load, but the engine will not hang anymore, and after the burst of connections, normal operation resumes almost instantly.

#16046 (ツ) ganuonglachanh
Created 15.03.2020, read: 576 times

Thanks johnfound, this post made my day. I will check it out rofl

#16047 (ツ) ganuonglachanh
Created 15.03.2020, read: 575 times

Hi johnfound

Your work is amazing. I could stress test up to 10,000 threads; it was a little bit slower, but in the end everything finished rofl

But there is a tiny bug: during the stress test, users can't log in, because LogUserActivity is skipped under heavy load, so no ticket is created for the login process, which leads to the !message/login_missing_data/ error.

My fix: allow LogUserActivity under heavy load when the user is logging in (.activity == uaLoggingIn).


proc LogUserActivity, .pSpecialData, .activity, .param
.stmt dd ?
begin
        pushad

        cmp     [.activity], uaLoggingIn        ; added: never skip while the user is logging in,
        je      .noskiplogin                    ; added: the login ticket must be saved

        cmp     [ThreadCnt], MAX_THREAD_CNT/2   ; under heavy load...
        jae     .finish                         ; ...skip the activity logging

.noskiplogin:                                   ; added

        mov     esi, [.pSpecialData]

        lea     eax, [.stmt]
        cinvoke sqlitePrepare_v2, [hMainDatabase], sqlLogUserActivity, sqlLogUserActivity.length, eax, 0
        cmp     eax, SQLITE_OK
        jne     .finish

        ; ... rest of LogUserActivity
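The decision made by the patched prologue can be restated in Python to check it against a few load levels (a sketch only; `Activity`, `should_log_activity` and the value 1000 for MAX_THREAD_CNT are assumptions for illustration, not AsmBB's definitions). It mirrors the assembly: `jae .finish` skips logging when ThreadCnt >= MAX_THREAD_CNT/2, except that a login always proceeds.

```python
from enum import Enum, auto


class Activity(Enum):
    BROWSING = auto()      # hypothetical stand-ins for AsmBB's ua* constants
    LOGGING_IN = auto()    # corresponds to uaLoggingIn


MAX_THREAD_CNT = 1000      # assumed value, for illustration only


def should_log_activity(activity: Activity, thread_cnt: int,
                        max_thread_cnt: int = MAX_THREAD_CNT) -> bool:
    """Login is always logged (its ticket is required); other activity only below half load."""
    if activity is Activity.LOGGING_IN:
        return True
    # FASM's MAX_THREAD_CNT/2 is integer division, hence //.
    return thread_cnt < max_thread_cnt // 2


print(should_log_activity(Activity.BROWSING, 600))    # skipped under load
print(should_log_activity(Activity.LOGGING_IN, 600))  # still logged
```

Only the login path is exempted, so the load shedding still removes the bulk of the logging writes during a burst.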

Once again, amazing work, sir rofl

#16048 (ツ) johnfound
Created 15.03.2020, read: 574 times

Oh! I forgot about that. I will use your fix. Thanks.


AsmBB v2.8 (check-in: 6348f13102432a47); SQLite v3.31.1 (check-in: 3bfa9cc97da10598);
©2016..2020 John Found; Licensed under EUPL. Powered by Assembly language Created with Fresh IDE