Just as I get my head above water, some other problem seems to crop up. Most recently I encountered a problem connecting to a SQL Server database that I've been using for the past 3 years without difficulty.
We thought this might be related to a problem with the VM Server, which was itself misbehaving, and in fact after a restart of the VM Server I was able to access a different vendor's database product on a different server that had also been giving me grief. But I still couldn't access SQL Server.
Here were the symptoms:
- Connections worked locally on the Server.
- Connections worked inside the Firewall.
- Connections tunneled through a VPN didn't work at all (I tried two different kinds of VPN).
I was pretty well able to diagnose the problem as being firewall related: the failing connections were reporting a "prelogin error". But there are at least four firewalls between me and that server, I only have access to two of them, and unfortunately I could find no information in their logs to help. Wireshark might have been helpful, except for some reason I couldn't get it to capture my network traffic on port 1433, the port I knew SQL Server used.
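(Incidentally, a quick way to test whether a given TCP port is reachable at all is a bare socket connect. Here's a minimal Python sketch; the host name is a placeholder, not my real server.)

import socket

def can_connect(host, port, timeout=3):
    # True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(can_connect("SERVER1", 1433))   # 1433 is the default SQL Server port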
If you google "prelogin error" you'll see a ton of not-quite-helpful material, because for the most part nobody seems to get to the root cause of my particular problem. I finally managed to do so.
Here's what I discovered:
My firewall was blocking port 1434, the UDP port the SQL Server Browser service uses to let clients of named instances find the right SQL Server service to connect to. But even after opening that port, things still weren't working: the connection was failing with a "prelogin error".
One of the posts on the Interwebs pointed me to a Microsoft diagnostic tool for verifying SQL Server communications. Its output contained something like the following:
Sending SQL Server query to UDP port 1434...
Server's response:
ServerName SERVER1
InstanceName SQL_SERVER_1
IsClustered No
tcp 1433
ServerName SERVER1
InstanceName SQL_SERVER_2
IsClustered No
tcp 59999
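That output, as best I can tell, comes from the same instance-enumeration request any client can send to the Browser service (the format resembles Microsoft's PortQry tool, though the post I found didn't name it). As a sketch only, and assuming my reading of the SQL Server Resolution Protocol (MC-SQLR) is right, a single 0x03 byte sent to UDP 1434 asks the Browser to enumerate instances, and the reply, after a 3-byte header, is the semicolon-delimited listing you see above:

import socket

def enumerate_sql_instances(host, timeout=3):
    # Send an SSRP CLNT_UCAST_EX request (one 0x03 byte) to the SQL Server
    # Browser service on UDP 1434 and return the raw instance listing.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(b"\x03", (host, 1434))
        data, _ = s.recvfrom(65535)
    # Reply: 0x05 opcode, 2-byte length, then text like
    # "ServerName;SERVER1;InstanceName;SQL_SERVER_2;IsClustered;No;tcp;59999;;"
    return data[3:].decode("ascii", errors="replace")

print(enumerate_sql_instances("SERVER1"))   # "SERVER1" is a placeholder

If that query times out from outside the firewall but works from inside, UDP 1434 is being blocked, which was the first half of my problem.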
What this told me was that the instance I wanted to access was listening on a port other than 1433. And of course that port was blocked (which also explains why Wireshark wasn't helping me: I was watching only port 1433 traffic). Since I couldn't be sure the dynamically assigned port would be used again the next time the server was restarted, I set up a firewall rule allowing access to any port used by the SQL Server service, and that resolved the issue.
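For the curious, on Windows a per-program rule like that can be added with netsh. This is an illustration only: the netsh syntax is the standard Windows Firewall command line, but the sqlservr.exe path varies by SQL Server version and instance name, so the path below is a guess you'd need to adjust, and the command needs an elevated prompt:

import subprocess

# Path is an assumption; find the sqlservr.exe that belongs to your instance.
SQLSERVR = r"C:\Program Files\Microsoft SQL Server\MSSQL15.SQL_SERVER_2\MSSQL\Binn\sqlservr.exe"

# Allow inbound traffic to whatever port the service binds, so the rule
# keeps working even when the dynamic port changes after a restart.
subprocess.run([
    "netsh", "advfirewall", "firewall", "add", "rule",
    "name=SQL Server (program rule)",
    "dir=in", "action=allow", "enable=yes",
    "program=" + SQLSERVR,
], check=True)

A port-specific rule for 59999 would also have worked, but only until the next restart handed the instance a different port. A client can also bypass the Browser lookup entirely by naming the port directly (SERVER1,59999), with the same caveat.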
I think part of the reason nobody has been able to state a clear solution is that if I'd been trying to connect to SQL_SERVER_1, my already-existing rule for port 1433 would have been just fine, and I wouldn't have needed another rule. So the published solutions worked for maybe half the users, but not the others. And the published solutions suggested several different ways to configure firewall rules, some of which would work some of the time, while others (like mine) would work all of the time.
I realize this has nothing to do with standards, but at least half of those of you who read this blog have had your own run-ins with SQL Server.
Now, you might scratch your head and wonder how this worked before, and what happened to the firewall rules that had made it work. For that I have a simple answer: we had recently rebuilt the VM to give it more resources for some larger tests, redeploying the system under the same name on a new operating system environment. My troubles happened to coincide with (a) the redeployment and (b) the reboot of the VM.
Root cause analysis is a royal PITA, but having invested quite a few hours in this problem, it will never cost me more than a few minutes again, and now, hopefully, it won't cost you much either.
Keith