3
Endless loading / Not Responding
Problem reported by echoDreamz - 10/2/2024 at 3:58 PM
Submitted
I've seen it myself, and quite a few customers have reported issues with the web interface where it just stops responding. Login, "Welcome back ...." and then it just sits there, sometimes for hours, until you give up and refresh, and then you move on. Sometimes it's opening an email: the spinner comes up and just sits, then you refresh the browser and everything works fine again. Or you open an account setting and none of the options populate until you refresh.

Our control panel will also experience random timeouts when getting info from the API or, after waiting several minutes, an empty response is returned instead of JSON data (our control panel talks directly to SM, through the Kestrel server, not IIS).
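To tell the two API failure modes apart (a timeout versus an empty body where JSON was expected), a small probe can label each response. This is a minimal sketch, not part of SmarterMail or of our control panel: the function names `classify_body` and `probe` are made up for illustration, and the URL you pass in is whatever API endpoint you already call.

```python
import json
import socket
import urllib.request


def classify_body(body: bytes) -> str:
    """Label a response body as 'empty', 'json', or 'not-json'."""
    if not body.strip():
        return "empty"
    try:
        json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return "not-json"
    return "json"


def probe(url: str, timeout: float = 30.0) -> str:
    """Fetch url once; return 'timeout', 'error', or the body label."""
    timeouts = (socket.timeout, TimeoutError)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_body(resp.read())
    except OSError as exc:  # URLError and socket timeouts are OSErrors
        # urlopen may wrap a connect timeout inside URLError.reason
        reason = getattr(exc, "reason", None)
        if isinstance(exc, timeouts) or isinstance(reason, timeouts):
            return "timeout"
        return "error"
```

Logging the label with a timestamp on every poll makes it easier to see whether empty responses and timeouts cluster together or show up independently.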

Server CPU etc. is very low, it's not being overworked, our external disk array usage is light as well.

Anyone else seen anything similar? Never experienced this with the older fwk-based SmarterMail or before the 8893 update.

25 Replies

1
Zach Sylvester Replied
Employee Post
Hello Echo, 

When you see this again, can you open the inspect element window and see if there are any errors or failed network requests? For basic troubleshooting, can you also test the following:

  1. Check the user's computer time. Make sure it's exact.
  2. Disable all browser extensions.
  3. Make sure that the browser is up to date.
If all of this looks good, please open a ticket with the support department.

Kind Regards, 
Zach Sylvester Software Developer SmarterTools Inc. www.smartertools.com
1
Nageswara Rao Anumolu Replied
It can be related to what we are experiencing with one of our domains.
1
echoDreamz Replied
Worked with 2 customers on the issue. Time on their machines was "exact" and 1 user had a password manager extension and the other had no extensions at all. Both were using Chrome.

Nothing useful in the dev console, no errors or anything. However, the email preview pane would get stuck. No matter how many emails you changed to or viewed, the email would load, but behind a greyed-out window with the chaser spinning.

0
What build are you on? I think I remember seeing that, and it had to do with a session timeout or something on Chrome. It looks like you are logged in, but when you go to do an action it just spins. If you closed the tab and then reopened it, it worked, but I think it made you log in again too. I think it was 8797 where we were seeing that, but we are on 9014 now and have not seen it since.

www.HawaiianHope.org - Providing technology services to non profit organizations, low income families, homeless shelters, clean and sober houses and prisoner reentry programs. Since 2015, We have refurbished over 11,000 Computers !
0
Zach Sylvester Replied
Employee Post
Hello,

In the newest version of SmarterMail, we now write an entry to the error log when a client has a time difference of 15 seconds or more. Just make sure your error logs are set to detailed.
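As a client-side cross-check of that 15-second threshold, the local clock can be compared with the `Date` header on any HTTP response from the server. This is only a sketch of the idea, not how SmarterMail performs its own check; the function names are made up, and the `Date` header has one-second resolution, so a result close to the threshold is inconclusive.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local clock minus server clock, in seconds."""
    server_time = parsedate_to_datetime(date_header)
    return (local_now - server_time).total_seconds()


def exceeds_threshold(date_header: str, local_now: datetime,
                      threshold: float = 15.0) -> bool:
    """True when the clocks differ by `threshold` seconds or more."""
    return abs(skew_seconds(date_header, local_now)) >= threshold


if __name__ == "__main__":
    # Example header value; in practice read it from a real response.
    header = "Wed, 02 Oct 2024 15:58:00 GMT"
    now = datetime(2024, 10, 2, 15, 58, 20, tzinfo=timezone.utc)
    print(skew_seconds(header, now), exceeds_threshold(header, now))
```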

Best Regards,
Zach Sylvester Software Developer SmarterTools Inc. www.smartertools.com
1
echoDreamz Replied
The client's clock was not off by more than 15 seconds. They provided a screenshot of time.is that indicated they were "exact".

My rig is also on "exact" time, so this is not a time-related issue.
0
echoDreamz Replied
The issue went away for a few weeks after the 9056 update was installed on the 20th of October, and now has returned. It seems like something with the internal SM web server after it has been running for a period of time.
0
Brian Bjerring-Jensen Replied
Cache not flushing?
1
echoDreamz Replied
Not sure, the issue goes away if we bounce SM, but I can guarantee it will return within a few weeks.
0
Brian Bjerring-Jensen Replied
If it's a Windows server, you can import this XML file as a scheduled task. It runs every minute and empties the standby memory like RAMMap does, but automatically.

<?xml version="1.0" encoding="UTF-16"?>
<Task version="1.4" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <RegistrationInfo>
    <Date>2021-03-12T18:57:28.205157</Date>
    <Author>SERVERNAME\USERNAME</Author>
    <URI>\Empty Standby Cache</URI>
  </RegistrationInfo>
  <Triggers>
    <CalendarTrigger>
      <Repetition>
        <Interval>PT1M</Interval>
        <StopAtDurationEnd>false</StopAtDurationEnd>
      </Repetition>
      <StartBoundary>2021-03-12T00:00:00</StartBoundary>
      <Enabled>true</Enabled>
      <ScheduleByDay>
        <DaysInterval>1</DaysInterval>
      </ScheduleByDay>
    </CalendarTrigger>
  </Triggers>
  <Principals>
    <Principal id="Author">
      <UserId>S-1-5-18</UserId>
      <RunLevel>HighestAvailable</RunLevel>
    </Principal>
  </Principals>
  <Settings>
    <MultipleInstancesPolicy>StopExisting</MultipleInstancesPolicy>
    <DisallowStartIfOnBatteries>true</DisallowStartIfOnBatteries>
    <StopIfGoingOnBatteries>false</StopIfGoingOnBatteries>
    <AllowHardTerminate>true</AllowHardTerminate>
    <StartWhenAvailable>false</StartWhenAvailable>
    <RunOnlyIfNetworkAvailable>false</RunOnlyIfNetworkAvailable>
    <IdleSettings>
      <StopOnIdleEnd>true</StopOnIdleEnd>
      <RestartOnIdle>false</RestartOnIdle>
    </IdleSettings>
    <AllowStartOnDemand>true</AllowStartOnDemand>
    <Enabled>true</Enabled>
    <Hidden>false</Hidden>
    <RunOnlyIfIdle>false</RunOnlyIfIdle>
    <DisallowStartOnRemoteAppSession>false</DisallowStartOnRemoteAppSession>
    <UseUnifiedSchedulingEngine>true</UseUnifiedSchedulingEngine>
    <WakeToRun>false</WakeToRun>
    <ExecutionTimeLimit>PT1H</ExecutionTimeLimit>
    <Priority>7</Priority>
  </Settings>
  <Actions Context="Author">
    <Exec>
      <Command>C:\Install\BATCH\EmptyStandbyList.exe</Command>
    </Exec>
  </Actions>
</Task>


Make sure to create a folder called C:\Install\BATCH\

You can get the small .exe file from our server.

https://cloudpros.dk/dk/uploads/EmptyStandbyList.exe 

I run it every minute and the server remains responsive. The file is clean.
1
echoDreamz Replied
Oddly enough, this fixes the client-reported loading issues as well as our PRTG timeouts talking to SM's API.

Something is wrong though. We shouldn't need to run a utility to clean up RAM; to me this seems like an underlying leak or an issue with SM not cleaning its cache up properly.

Our PRTG was going nuts all day with timeouts, and my browser was randomly hanging on SM login. I executed the EmptyStandbyList exe and the issues immediately went away. I am not running it regularly though, as I want to see how long it takes for the issue to return.
1
Brian Bjerring-Jensen Replied
You shouldn't, but you're welcome :)
2
Matt Petty Replied
Employee Post
We've run some tests with .NET 9 and it looks promising. There is an underlying unused-memory issue where dotnet holds onto memory we no longer use. We've run tests that manually trigger the GC, and nothing caused the memory to be cleaned up until the machine got low on memory or memory pressure increased. With .NET 9 some of this behavior seems to be much better, but the tests were just quick and preliminary last week.
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
2
Jay Dubb Replied
We've seen the same behavior ourselves, and quite a few users have complained.  Log in, and the screen sits for up to a few minutes on the spinning donut.  But, if you log in and hit Refresh as soon as the donut starts spinning, the page loads instantly.  It occurs randomly, not something we can replicate on-demand.  Customers have interpreted this as an overloaded server and accused us of not having enough resources dedicated to the service, which is not true.  On the "linux era" builds we see max sustained RAM usage around 63-65 GB on a 96 GB system, CPU is under 15%, and there is plenty of available disk IO capacity, so we know it's not a lack of resources.
 
0
echoDreamz Replied
Awesome Matt! Unfortunately, the EmptyStandbyList fix has not fixed it, we are still getting random timeouts and customers reporting infinite spinner.
0
Daniel Replied
Hello, we have the same behavior, and the standby memory "fix" seems to make it even worse for us.

So shall we install .NET 9, or do we have to wait until it's in the installer? (I have no idea about .NET, so I don't know if there is anything like compatibility or dependencies to consider ...)
0
Zach Sylvester Replied
Employee Post
Hi Daniel,

I wanted to let you know that installing the .NET 9 hosting bundle will not currently affect SmarterMail. We are in the process of releasing a version that specifically supports this version. Once it's ready, .NET 9 will be included in the installer.

Kind regards,
Zach Sylvester Software Developer SmarterTools Inc. www.smartertools.com
0
Daniel Replied
Hello,

It seems to me that IIS is causing this. If I use ports 17017 and 9888, I have those issues on port 9888 but not on 17017 (the built-in web service).

For IIS, when I load the webmail site, the Performance Monitor -> "WebService Cache" "Current URIs Cached" counter jumps up.

Edit: just disabled logging in IIS and it seems much more fluid now 
0
echoDreamz Replied
For us, our monitoring system and control panel bypass IIS and talk directly to SM's web port, so the timeout is occurring with SM directly.

We also do not have IIS logging enabled.
0
Daniel Replied
Has anyone set the IIS "Maximum Worker Processes" to more than 1? (Does SmarterMail scale with this?)
0
Tony Scholz Replied
Employee Post
Hello, 

Pre-Linux, there was a system requirement that you only have a single IIS worker process. I would recommend sticking with just the one.
Tony Scholz System/Network Administrator SmarterTools Inc. www.smartertools.com
0
Daniel Replied
But there must be something to tweak on IIS, if I use localhost:17017 everything works quite good but if I use localhost:9998 I have those freezes/hangs.
0
Brian Bjerring-Jensen Replied
Are the bindings in order?
0
Daniel Replied
Well, to test I just reduced them to only 2 (localhost on port 9998 and * on port 9888), as I have HAProxy in front (it does the SSL part).

Before, I had autodiscover.domain.tld, mail.domain.tld and webmail.domain.tld on port 9998.

And I don't see any difference. It's also laggy if I am on the server itself using localhost:9998 (CPU runs at 9%, memory at about 20%, and the longest disk latency is 4 ms).

If I then use localhost:17017, the lags are gone (which leads me to IIS).
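To put numbers on the difference between the two ports, the same page can be requested repeatedly from each and the timings compared. The following is a rough sketch under the assumption that both ports serve plain HTTP on localhost; the function names are made up for illustration, and a request that hangs past the timeout raises an exception instead of being recorded.

```python
import statistics
import time
import urllib.request


def fetch(url: str, timeout: float = 30.0) -> None:
    """Download the URL and discard the body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def sample_latency(url, runs=10, fetch_fn=fetch, clock=time.perf_counter):
    """Return wall-clock seconds for `runs` sequential requests."""
    samples = []
    for _ in range(runs):
        start = clock()
        fetch_fn(url)
        samples.append(clock() - start)
    return samples


def summarize(samples):
    """Reduce a list of timings to its median and worst case."""
    return {"median": statistics.median(samples), "max": max(samples)}
```

Running `summarize(sample_latency("http://localhost:17017/"))` and the same call for `http://localhost:9998/` on the server itself takes HAProxy and the network out of the comparison.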



2
echoDreamz Replied
Did a new install of Server 2022 with IIS etc. matching our SM server and compared config files. Everything was the same until I decided to dump the IIS registry service path.

Somehow, the registry keys for QUIC and HTTP/3 were created and enabled at

Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters
Keys "EnableAltSvc" and "EnableHttp3" were both set to 1.

There was also a global IIS response header set for "alt-svc"

We've never created these keys, and none of our audit logs show they were ever created by any of our techs. So it is unclear how they were created in the first place.

Keys were set to 0 and the header removed from IIS. As soon as that was done, the issues went away. It's been 12 hours now with not a single complaint from those who were reporting issues. Server was also rebooted to make sure that IIS fully stopped HTTP/3 and QUIC.
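For anyone who wants to check their own server for the same settings, the values can be read with the standard `reg query` command and parsed. This is a minimal sketch: the key path and the names "EnableAltSvc" and "EnableHttp3" are the ones from this post, the helper names are made up, and the parser assumes the usual `name  REG_DWORD  0x...` layout of `reg query` output.

```python
import re
import subprocess

KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters"
VALUE_NAMES = ("EnableAltSvc", "EnableHttp3")

# One indented line per value: name, type, hex data.
_LINE = re.compile(r"^\s+(\S+)\s+REG_DWORD\s+(0x[0-9a-fA-F]+)\s*$")


def parse_dwords(reg_output: str) -> dict:
    """Map each REG_DWORD value name in the output to its integer data."""
    found = {}
    for line in reg_output.splitlines():
        match = _LINE.match(line)
        if match:
            found[match.group(1)] = int(match.group(2), 16)
    return found


def enabled_http3_values(reg_output: str) -> list:
    """Return the HTTP/3-related value names that are present and non-zero."""
    values = parse_dwords(reg_output)
    return [name for name in VALUE_NAMES if values.get(name, 0) != 0]


def query_registry() -> str:
    """Run `reg query` against the HTTP service parameters (Windows only)."""
    result = subprocess.run(["reg", "query", KEY],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On the server, `enabled_http3_values(query_registry())` returning an empty list means neither value is set to a non-zero number. It does not check the IIS "alt-svc" response header, which has to be inspected separately.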

We are still having issues with the SM API returning blank responses though, specifically with the domain/user reporting API requests.
