Failing drive, no notice

Skirge
Posts: 27
Joined: 2013.10.10. 17:47

Failing drive, no notice

Post by Skirge »

This is the first time that–at least at first glance–HDS has appeared to let me down. I receive status emails from HDS at 15:00 every day and there were no reported issues, even today. In fact, all drives were at 100%. I noticed some slowdowns accessing my NAS during video playback a couple days ago. I figured the server was busy doing something and wrote it off. Today, it got significantly worse, so I checked the server. It took forever to log in. I checked Event Viewer and saw tons of "The device, \Device\Harddisk5\DR5, has a bad block." messages, beginning on the 27th. I did some checking to figure out which drive it was and knocked it offline in my storage array*. As soon as I did so, Windows sped up and HDS went nuts, telling me the drive was at 3% health. I've already salvaged what I needed to from the drive and my backups are in decent shape, so I'm not going to lose anything valuable.

* I use FlexRAID's tRAID software. This isn't your typical RAID setup.

Now, I know it's unlikely, but is there any way to figure out why HDS didn't pick up on the pending failure? Thanks!
hdsentinel
Site Admin
Posts: 3010
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Failing drive, no notice

Post by hdsentinel »

I'm terribly sorry to hear what happened. This is exactly the kind of situation we try to avoid in every possible way.

While most problems and degradations can be detected and reported immediately, some issues cannot always be predicted or noticed early enough.

In most cases, when the hard disk reaches a problematic sector or area on the disk surface, minor problems and a small health decrease are quickly detected, recognised and reported. So there are usually some earlier signs (a few sector errors, a 1-2% health drop, etc.) before such a large health decrease. In some rare situations, however, the disk drive may hit a larger number of problematic sectors at once, and then you see an immediate, larger drop in health, as happened now.

This depends on many factors. It can be related to the operating environment (power, operating temperature, cables, connections), the age of the disk drive, actual usage, or possible mechanical damage/shock. It is also possible that the hard disk already had the problematic sectors, but in an area that was never used or accessed before, so they were not yet recognised, detected or reported. Such problems can remain unnoticed for a long time (even months or years) until the hard disk attempts to read or write that area.

Did you perform regular disk tests (e.g. Disk menu -> Extended self test), or was the hard disk at least tested intensively before being used in the current configuration, as suggested at https://www.hdsentinel.com/faq.php#tests ?

Even if it sounds surprising, new hard disk drives are sometimes not perfect: they may have issues "hidden" on the surface. So I usually recommend intensive testing even for new drives before using them for real data storage, exactly to reveal such issues, or to confirm that they are *really* perfect.
Without these tests, we may experience something similar to your case, or to the situation described under Support -> Knowledge Base -> Hard Disk Cases -> Bad sectors:
https://www.hdsentinel.com/hard_disk_ca ... ectors.php
where the hard disk drive operated for years with no issues (and excellent health) until the problematic area was reached.

Generally, Hard Disk Sentinel can detect and read the error counters and issues that are "known" to the hard disk drive. It's important to let the hard disk drive scan the surface regularly (or at least once before using it for data storage), exactly to reveal issues sooner. Otherwise a newly found problem can quickly drop the health, as happened here. I hope most of the data can still be saved at this stage.

If possible, I'd suggest using the Report menu -> Send test report to developer option for this disk drive now.
Examining the status and the error counters (and the situation in general) may give further information about what happened and what can be done, both to improve the situation with your disk drive and to offer earlier warnings that help prevent similar cases.
Skirge
Posts: 27
Joined: 2013.10.10. 17:47

Re: Failing drive, no notice

Post by Skirge »

For all the times HDS has saved me, this one "failure" is no big deal. As requested, I sent you a report, with the hope that it might help you figure out why this was missed. As you can see, the drive is quite old (ancient, even). According to the stats, I performed an extended test back in 2016 and that completed successfully. Based on that, it would seem that the drive had no issues from the factory. You did make me check to see if there was a way to run these tests on a schedule and I just discovered that the projects area does allow such a thing. I'm hoping I can have the short test run overnight every so often while a drive is still, technically, in operation. If so, I'll stagger the tests and schedule each drive to be tested approximately every 45-90 days.
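
A minimal sketch of the staggering idea described above, assuming a fixed interval inside the 45-90 day range and evenly spaced first runs. The drive labels, the start date and the 60-day interval are hypothetical placeholders, not values from HDS.

```python
from datetime import date, timedelta
from typing import Dict, List


def staggered_first_runs(drives: List[str], first_run: date,
                         interval_days: int = 60) -> Dict[str, date]:
    """Spread the first test of each drive evenly across one interval.

    Each drive is then meant to be retested every `interval_days` days,
    so no two drives start their test on the same night.
    """
    if not drives:
        return {}
    # Days between the first runs of two consecutive drives (at least 1).
    gap = max(1, interval_days // len(drives))
    return {drive: first_run + timedelta(days=i * gap)
            for i, drive in enumerate(drives)}


if __name__ == "__main__":
    # Hypothetical drive labels and start date, for illustration only.
    plan = staggered_first_runs(["Disk0", "Disk1", "Disk2", "Disk3"],
                                date(2024, 1, 6))
    for drive, day in plan.items():
        print(drive, day.isoformat())
```

With four drives and a 60-day interval, the first runs land 15 days apart; each drive would then repeat on its own 60-day cycle.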
hdsentinel
Site Admin
Posts: 3010
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Failing drive, no notice

Post by hdsentinel »

Thanks for the report. I examined it and can see the health and the issues of the hard disk drive. As I see, it is at 9% now, "thanks" to the bad sectors and weak sectors reported in the text description.

To be honest, I'm not surprised by the problems and the large drop in health. As you wrote, the hard disk is really old (ancient) and the reported

Power on time . . . . : 2165 days, 23 hours

confirms that it's already past its designed lifetime. The "estimated remaining lifetime" value probably decreased slowly but surely over the years and then reached "100 days", which suggests that the drive has reached its designed lifetime and that it may be better to consider a planned replacement.

Even if the hard disk stays at 100% health for many years, we can't say for sure how long it will work. It may work for additional years (there are numerous drives working for many more years), but generally, after the end of the designed lifetime, the chances of an unforeseen, sudden failure (or "just" a very large drop in health with a high number of problems reported at once) are higher.

So after 2000+ days, problems can be expected sooner or later...
The decreasing estimated lifetime suggests that in a mission critical environment, it may be better to consider/plan a replacement.

Maybe you can still try Disk menu -> Surface test -> Reinitialise Disk Surface (after a backup, as it clears all data) to improve the health, usability and stability of the disk drive, but personally I'd recommend using the drive only for secondary storage (non-critical data).
hdsentinel
Site Admin
Posts: 3010
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Failing drive, no notice

Post by hdsentinel »

Yes, it is possible to create a project to perform the internal self test (short self test or extended self test) on any hard disk drive(s).

In Hard Disk Sentinel Pro, you can configure the project to start on any day(s) of the week at any given time, but only on a weekly basis.

However, with an easy trick you can overcome this:

- select the Configuration -> Operations page and, on the Projects tab (after configuring the project), click the small button showing a "shortcut" to Hard Disk Sentinel icon next to the "Start now" button. This creates a shortcut on the desktop which can be used to start the project manually at any time.
- on your desktop, locate this icon, right click it and open its properties to see the command line used to launch the project
- in Windows Scheduler, create a scheduled task that suits your requirements, using that command line. You can even delete the shortcut from the desktop afterwards.

Personally, I configure Hard Disk Sentinel Pro to launch the extended self test on mission critical drives once a week: I scheduled it for Saturday evening/night, which is surely not a busy time for the drives, so they can pass a complete extended self test.
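
As a rough illustration of the last step, the scheduled task could also be created from the command line with Windows' schtasks.exe instead of the Task Scheduler GUI. The sketch below only builds the argument list; the task name, day and start time are hypothetical, and the command string is a placeholder for whatever the shortcut's properties actually show.

```python
from typing import List


def build_weekly_task_args(task_name: str, command: str,
                           day: str = "SAT",
                           start_time: str = "22:00") -> List[str]:
    """Return schtasks.exe arguments for a weekly task with highest privileges."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # task name (hypothetical, choose your own)
        "/TR", command,      # command line copied from the shortcut's properties
        "/SC", "WEEKLY",     # weekly schedule
        "/D", day,           # day of the week, e.g. SAT
        "/ST", start_time,   # start time, 24-hour HH:MM
        "/RL", "HIGHEST",    # ask the scheduler to run the task elevated
    ]


if __name__ == "__main__":
    # Placeholder command: paste the real one from the shortcut here.
    args = build_weekly_task_args("HDS extended self test",
                                  "<command line from the shortcut>")
    print(args)
    # On Windows, from an elevated prompt, the task could then be created with:
    #   import subprocess; subprocess.run(args, check=True)
```

For an interval other than weekly, schtasks also accepts `/SC DAILY` with `/MO <n>` to run every n days, which may fit a longer test cycle better.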
Skirge
Posts: 27
Joined: 2013.10.10. 17:47

Re: Failing drive, no notice

Post by Skirge »

Thanks for looking into this and thanks for the tips on scheduling projects. Most of my drives are quite large, so having them run a weekly test for several hours at a time might not be the right solution for me. However, I will definitely be using the scheduler in some fashion to ensure they're tested regularly from now on!

Given that "100 days" seems to be an important threshold, as does reaching 2000 hours of power on time, I was surprised that neither of these are listed as options in the alerts section. These also aren't pieces of information sent in the daily status emails. Would you consider adding either or both of these? If my opinion carries any weight, my preference would be an actual alert. :D I wasn't aware my drives were that old (obviously, a failure on my part), but, even if I did, I don't know that I would've considered those to be indicators that they were now at a significantly greater risk for sudden failure. In fact, I have 4 more drives close to or above one or more of those thresholds right now, so I'm working to get them out of service ASAP. Between scheduling drive tests and being able to monitor these thresholds, I think I can avoid any further unexpected failures.
hdsentinel
Site Admin
Posts: 3010
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Failing drive, no notice

Post by hdsentinel »

The power on time value is over 2000 _days_ (not 2000 hours) ;)

So this means 2000 x 24 = 48000 total hours powered (your hard disk drive reported even more: 2165 days, so almost 52000 hours).
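
The conversion can be checked with a few lines of Python. The function name is just for illustration; the input values are the ones quoted in this topic.

```python
HOURS_PER_DAY = 24


def power_on_hours(days: int, hours: int = 0) -> int:
    """Convert a "days, hours" power on time into total hours."""
    return days * HOURS_PER_DAY + hours


if __name__ == "__main__":
    print(power_on_hours(2000))       # 48000
    print(power_on_hours(2165, 23))   # 51983
    # For comparison, 2000 hours would be only about 83 days:
    print(round(2000 / HOURS_PER_DAY, 1))  # 83.3
```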

Yes, this does not really trigger an alarm/alert; it is just noticeable from the displayed/reported decreasing "estimated remaining lifetime" value.

But yes, I agree: it may be a good idea to draw more attention to this end-of-life situation, to suggest a planned replacement and avoid problems / possible sudden, unforeseen failures. Thanks for the tip, I'm sure it will be added in a later version.
Skirge
Posts: 27
Joined: 2013.10.10. 17:47

Re: Failing drive, no notice

Post by Skirge »

hdsentinel wrote: The power on time value is over 2000 _days_ (not 2000 hours) ;)
:D LOL! That's what I get for trying to write something too quickly. Good lord... imagine drives having only a 3 month life span?!?!
hdsentinel wrote: Yes, this does not really trigger an alarm/alert; it is just noticeable from the displayed/reported decreasing "estimated remaining lifetime" value.

But yes, I agree: it may be a good idea to draw more attention to this end-of-life situation, to suggest a planned replacement and avoid problems / possible sudden, unforeseen failures. Thanks for the tip, I'm sure it will be added in a later version.
Excellent! I'm glad to hear that.
Skirge
Posts: 27
Joined: 2013.10.10. 17:47

Re: Failing drive, no notice

Post by Skirge »

I created the projects to test my drives, but when I try to create the desktop shortcuts, I get the message, "Error while creating the shortcut for this project." I'm running 5.61 PRO as an administrator, so I'm not sure why it can't create it. Any ideas?
hdsentinel
Site Admin
Posts: 3010
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Failing drive, no notice

Post by hdsentinel »

Sounds interesting; generally there should be no problem creating the shortcut. I just tried, and it works without problems on all systems, including the latest Win 10.
Maybe it is related to something in your Windows installation (language, regional settings or similar)? Just an idea...
If you use the Report menu -> Send test report to developer option, it may give some clues.

Generally the shortcut would have this kind of command line:

C:\Program Files (x86)\Hard Disk Sentinel\HDSAction.exe /RUN="DiskTest"

where "DiskTest" is the name of the project to be started. So if you name the project differently, then you'd need to specify the name of the project there.
Note that you need admin privileges to start the disk test command.
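
A small sketch of how such a command line could be assembled for a given project name. The executable path and the /RUN switch are taken from the example above; quoting the path is an added assumption (it contains spaces), and the function name is hypothetical.

```python
# Install path as shown in the example above; adjust if installed elsewhere.
HDS_ACTION = r"C:\Program Files (x86)\Hard Disk Sentinel\HDSAction.exe"


def project_command(project_name: str, exe_path: str = HDS_ACTION) -> str:
    """Build the command line that starts the named project.

    The executable path is wrapped in quotes because it contains spaces
    (an assumption about how the command will be parsed, not something
    confirmed in this topic).
    """
    return f'"{exe_path}" /RUN="{project_name}"'


if __name__ == "__main__":
    print(project_command("DiskTest"))
```

The resulting string is what would go into a manually created shortcut or a scheduled task; it still has to be started with admin rights.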
Skirge
Posts: 27
Joined: 2013.10.10. 17:47

Re: Failing drive, no notice

Post by Skirge »

I sent a test report off to you.

I did manage to create the shortcut manually, as you described, but haven't tested it yet. I did get an error message for illegal characters, so I needed to rename the projects, but I still got the error message from HDS after they had acceptable characters.
hdsentinel
Site Admin
Posts: 3010
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Failing drive, no notice

Post by hdsentinel »

Yes, I was thinking that invalid (non-standard) characters could cause trouble.
I originally thought it might be related to the Windows language/regional settings or similar, but as I see from the report, you use English Windows, so that should be no problem.

Please try to create a project with a very simple name (only English A-Z letters and maybe numbers) just to be safe; hopefully there will be no problems then.
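
To check a name against that "letters and numbers only" suggestion before creating a project, something like the following could be used. The exact set of characters HDS accepts is not documented in this topic, so this rule is deliberately conservative and is only an assumption.

```python
import re

# Conservative rule from the suggestion above: English letters and digits only.
SIMPLE_NAME = re.compile(r"[A-Za-z0-9]+")


def is_simple_project_name(name: str) -> bool:
    """Return True if the name uses only A-Z, a-z and 0-9 (and is not empty)."""
    return SIMPLE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["DiskTest", "Test1", "Disk #5: weekly"]:
        print(candidate, is_simple_project_name(candidate))
```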
Skirge
Posts: 27
Joined: 2013.10.10. 17:47

Re: Failing drive, no notice

Post by Skirge »

You may have missed the end of what I wrote in my previous reply. I did recreate them with simpler names (e.g. test or Test1), but I'm still getting the error message from HDS.
hdsentinel
Site Admin
Posts: 3010
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Failing drive, no notice

Post by hdsentinel »

Did you start the command with admin user rights?

If you manually create a shortcut with the proper command line, you need to use right click -> Run as administrator in order to start it (or, in the right click -> Properties window, you can specify that it should run as administrator).
Otherwise the command can't be started: the test command is blocked due to insufficient user rights.
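
If the command is launched from a script, the script can check whether it is running elevated before trying to start the test. This is a generic sketch using the Windows shell32 IsUserAnAdmin call through ctypes, with a fallback for other platforms; it is not part of Hard Disk Sentinel.

```python
import ctypes
import os
import sys


def is_elevated() -> bool:
    """Return True if the current process has administrator (or root) rights."""
    if sys.platform == "win32":
        try:
            # shell32.IsUserAnAdmin returns non-zero for an elevated process.
            return bool(ctypes.windll.shell32.IsUserAnAdmin())
        except (AttributeError, OSError):
            return False
    # Non-Windows fallback so the sketch runs anywhere: root has uid 0.
    return os.geteuid() == 0


if __name__ == "__main__":
    if is_elevated():
        print("Running with admin rights; the test command can be started.")
    else:
        print("Not elevated; the test command would be blocked.")
```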
Skirge
Posts: 27
Joined: 2013.10.10. 17:47

Re: Failing drive, no notice

Post by Skirge »

I had 2 extended drive tests fail to run recently when triggered by Task Manager. The history in Task Manager shows that it thinks it completed successfully, since it did attempt to trigger HDS to run the tests. However, the email notification after each run says HDS was unable to test the hard disk. I then went in to manually trigger the task and it also failed with the same notification. But, starting the extended self test from within HDS immediately after that, it began running as expected.

Any idea what might be preventing it from running? I know I have some MS updates scheduled to be installed, so maybe they already screwed something up before the update gets installed? :roll: Let me know if you want me to send a report.
Skirge
Posts: 27
Joined: 2013.10.10. 17:47

Re: Failing drive, no notice

Post by Skirge »

Surprised I didn't get a reply to this, as your support is normally excellent, so I hope everything is okay on your end.

Anyhow, I had another issue with the extended test not running on a drive when manually started in the app. Well, it did start and complete successfully, but it took just 30 seconds on a 6TB drive, which was also exactly how long the short test took. So, I decided to try reinstalling HDS and that seems to have fixed the issue with the extended test taking just 30 seconds. I'm hoping it will also fix the tests not running via Task Scheduler. I'll know when this drive finishes its test and the next task gets triggered in a few days.
hdsentinel
Site Admin
Posts: 3010
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Failing drive, no notice

Post by hdsentinel »

I did not answer before because I was not sure I understood the situation: how "Task Manager" comes into this and how it's related to Hard Disk Sentinel and/or the tests at all...

Did you use Windows Scheduler to start/launch a disk test (a configured hard disk test project, e.g. by its command line as described previously in this topic)?
If so, there must be some kind of user rights/elevation issue, as the scheduled task may not have the admin rights needed to perform the test.
This can be why Windows Scheduler (not Task Manager, if I'm correct) shows that the task started, but of course it can't perform the test.

The same happens if you try to start the test manually with the shortcut/command line, unless you right click -> Run as administrator to give the task the proper rights.
If you launch the test from Hard Disk Sentinel itself, there should be no problem, as Hard Disk Sentinel itself has the rights to access the disk drive and launch the self test.

If you prefer, you can use the Report menu -> Send test report to developer option so I can check the current version (and current Windows OS). This may give some clues, but if I'm correct, this is not really related to the Windows version/update.

If you mean something else and I did not understand the situation correctly, please send some images/screenshots by e-mail to info (at) hdsentinel (dot) com so I can check.
hdsentinel
Site Admin
Posts: 3010
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Failing drive, no notice

Post by hdsentinel »

I was not sure about the situation, so waited for a report or an e-mail with some clarification ;)
E-mail is generally much faster and preferred ;)

Generally (as described in other topics), the short and extended self tests are hardware self tests running "inside" the hard disk drive, completely outside the control of any software.

Hard Disk Sentinel can only
- start the test
- read the progress
- check and interpret the results

But the test is done "inside" the disk drive, so everything (for example how long it runs) depends on the disk drive only. Sometimes other factors (eg. disk controller or its driver or USB adapter if used) can affect the response of the drive. Sometimes a different controller, different USB adapter or just a chipset update may change the behaviour.
Usually a software re-install makes no difference, so it was probably only a coincidence.

This is also described in the Help: sometimes these hardware self tests have their limitations, and in this case a different type of test (Disk menu -> Surface test -> Read test) is recommended instead.

As usual, if you use the Report menu -> Send test report to developer option, I can check the actual situation and examine the disk connection and the driver used, which may give some clues too.