THE LINUX STUFF: File System Bugs or Errors

Mount Points and /etc/fstab

/etc/fstab is referenced each time the system boots. It consists fields like device name, mount point, file system type, fsck order like 0 = ignore, 1=fist, 2-9 = second, and third, etc. After the filesystem is created in /etc/fstab, it's important to use the mount-a to mount the filesystem just created to check for errors.

umount /apps: To unmount the file system

fuser -v /apps: For some reasons you cannot mount a filesystem that has open files, you can display information about the processes using a filesystem.

fuser -km /apps: To kill all actions on a filesystem

mount -o remount,rw / : To mount in a single operation. This is common during recovery operation.

--------------------------------------------------

Red Hat Bugzilla – Bug 494927

Read-only filesystem after 'ext3_free_blocks_sb: bit already cleared for block' errors

Last modified: 2011-10-06 10:44:36 EDT

Bug 494927 - Read-only filesystem after 'ext3_free_blocks_sb: bit already cleared for block' errors

Status:	CLOSED ERRATA

Aliases:	None (edit)

Product:	Red Hat Enterprise Linux 5
Component(s):	kernel (Show other bugs)
Version(s):	5.3
Platform:	All Linux

Priority:	low Severity:urgent
Target Release:	--- Target Milestone: rc
Assigned To:	Eric Sandeen
QA Contact:	Eryu Guan

URL:
Whiteboard:
Keywords:	Reopened

Depends on:	696843
Blocks:
	Show dependency tree

Reported:	2009-04-08 14:20 EDT by Gordon Messmer
Modified:	2011-10-06 10:44 EDT (History)
CC List:	31 users (show)

See Also:

Fixed In Version:
Technical Notes:

Clone Of:
Environment:
Last Closed:	2011-07-21 05:46:27

Attachments	(Terms of Use)
Output from "fsck -n -f /dev/hda3" (1.77 KB, text/plain) 2009-04-14 13:59 EDT, Gordon Messmer	no flags	Details
ext3 error after sync (6.34 KB, text/plain) 2009-07-20 05:56 EDT, Daniel Brnak	no flags	Details
script to generate dirty snapshots (3.58 KB, application/octet-stream) 2011-04-20 22:14 EDT, Eric Sandeen	no flags	Details
script to drive dirty_snapshot script (933 bytes, application/octet-stream) 2011-04-20 22:19 EDT, Eric Sandeen	no flags	Details
Add an attachment (proposed patch, testcase, etc.)

External Trackers

Description Gordon Messmer 2009-04-08 14:20:39 EDT

Description of problem:

In the last week, I've seen the same error reported in bug #476533 on two hosts

running kernel-2.6.18-128.1.6.el5 (CentOS dev hosts).  One host was a KVM

guest, the other was a physical system with a 3ware controller.

Version-Release number of selected component (if applicable):

kernel-2.6.18-128.1.6.el5

Comment 1 Gordon Messmer 2009-04-13 15:55:38 EDT

Fortunately, on the second host, the /var partition wasn't the one that went

read-only, so a little more information is available from /var/log/messages:

Apr  7 01:39:54 bucket kernel: Linux version 2.6.18-128.1.6.el5

(mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat

4.1.2-44)) #1 SMP Wed

Apr 1 09:10:25 EDT 2009

...

Apr  8 03:41:54 bucket kernel: EXT3-fs error (device sdc1):

ext3_free_blocks_sb: bit already cleared for block 170872073

Apr  8 03:41:54 bucket kernel: Aborting journal on device sdc1.

Apr  8 03:41:54 bucket kernel: ext3_abort called.

Apr  8 03:41:54 bucket kernel: EXT3-fs error (device sdc1):

ext3_journal_start_sb: Detected aborted journal

Apr  8 03:41:54 bucket kernel: Remounting filesystem read-only

Apr  8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in

ext3_reserve_inode_write: Journal has aborted

Apr  8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in ext3_truncate:

Journal has aborted

Apr  8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in

ext3_reserve_inode_write: Journal has aborted

Apr  8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in ext3_orphan_del:

Journal has aborted

Apr  8 03:41:54 bucket kernel: EXT3-fs error (device sdc1) in

ext3_reserve_inode_write: Journal has aborted

Apr  8 03:41:54 bucket kernel: __journal_remove_journal_head: freeing

b_committed_data

Apr  8 03:41:54 bucket last message repeated 18 times

Comment 2 Gordon Messmer 2009-04-14 13:58:27 EDT

I set up a test environment under KVM on Fedora 10.  The test system was

running CentOS 5.3 with the latest kernel:

# uname -a

Linux backup 2.6.18-128.1.6.el5 #1 SMP Wed Apr 1 09:19:18 EDT 2009 i686 athlon

i386 GNU/Linux

This is an identical setup to the VM that I first saw fail.  Overnight, I ran

two instances of bonnie++ in a loop in different directories under /var/tmp.

Additionally, I ran a loop copying /usr to a directory under /var/tmp and then

removing it.  After around 12 hours, the rsync loop quit when it could not

remove one of the files.  The kernel recorded:

Apr 14 09:10:44 backup kernel: attempt to access beyond end of device

Apr 14 09:10:44 backup kernel: hda3: rw=0, want=1719008864, limit=3357585

I booted the system into single user mode to run fsck on the filesystems and

found a considerable amount of corruption on the /var partition, presumably

caused by the two "bonnie++" loops and the rsync loop.  Running "fsck

/dev/hda3" reported that the filesystem was clean, despite the fact that the

kernel had noticed an attempt to access beyond the end of the device.  I ran

"fsck -n -f /dev/hda3" to get a list of the errors on the filesystem.  Its

output will be attached.  The damaged filesystem is still available if any

further information about it would be helpful.

While the systems exhibiting the problem are not RHEL systems, I believe it is

likely that they are affected.  Please help me confirm this bug.

Comment 3 Gordon Messmer 2009-04-14 13:59:14 EDT

Created attachment 339534 [details]

Output from "fsck -n -f /dev/hda3"

Comment 4 Eric Sandeen 2009-04-16 17:00:53 EDT

the problem in comment #2 looks like a different problem, FWIW - at least a

different end result.  Both could be caused by any sort of corruption,

including kernel in-memory corruption, or hardware problems.  Or bugs ...

I can try the same test mentioned in #2 to see if I can reproduce on RHEL5.3.

Comment 5 Jack Bauer 2009-06-04 14:02:10 EDT

this problem also exists in kernel version 2.6.18-128.1.10.el5xen

Comment 6 Gatis Visnevskis 2009-06-27 09:58:37 EDT

Similar problems show up on RHEL4.7 (kernel 2.6.9-78.0.17smp).

Also verified with "fsstress" utility. (thanks to LKML)

Any ext3 filesystem shows errors if run with arguments like this:

mkfs.ext3 /dev/ram0

mount -t ext3 /dev/ram0 /mnt/test_file_system

mkdir -p /mnt/test_file_system/work

fsstress -d /mnt/test_file_system/work -p 3 -l 0 -n 100000000

Will try to test other kernel versions in short time.

Comment 7 Gatis Visnevskis 2009-06-27 10:50:01 EDT

Verified on latest RHEL5 update:

2.6.18-128.1.14.el5 #1 SMP Mon Jun 1 15:52:36 EDT 2009 i686 i686 i386 GNU/Linux

problem exists.

Verified on old unpatched RHEL3:

2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 i686 i386 GNU/Linux

kernel panic trying to umount ext3 fs.

Verified on very old RedHat 7.2 :)

2.4.7-10 #1 Thu Sep 6 17:27:27 EDT 2001 i686 unknown

no problems. ok, unfair, there is only ext2.

Comment 8 Eric Sandeen 2009-06-27 13:24:05 EDT

(In reply to comment #7)

> Similar problems show up on RHEL4.7 (kernel 2.6.9-78.0.17smp).

Similar, or same?  bug #476533 reports this should be fixed in 78.0.14.EL so

perhaps there is yet another root cause if it's the exact same problem.

(In reply to comment #7)

> Verified on latest RHEL5 update:

> 2.6.18-128.1.14.el5 #1 SMP Mon Jun 1 15:52:36 EDT 2009 i686 i686 i386 GNU/Linux

> problem exists.

This kernel also has the same fix as bug #476533, fixed in bug #225172

You did the same test as in comment #6?

Thanks,

-Eric

Comment 9 Gatis Visnevskis 2009-06-27 18:23:27 EDT

OK, short summary:

We have pre-production system RHEL4.7 (kernel -78.0.17).

Got error "ext3_free_blocks_sb: bit already cleared for block" on two different

servers.

Database server produces lot of oracle archivelog files. 100gb filesystem

sometimes get full, then older files are deleted. Error appears after 2 months

of testing.

Application server writes to some 20-30 logfiles simultaneously. Error appears

after performance testing, within couple days.

This error "ext3_free_blocks_sb: bit already cleared for block" should be fixed

in 78.0.14, right?

Then i started to search for test tools.

Found this:

http://lkml.org/lkml/2008/11/14/121

fixed in vanilla kernel 2.6.29

Changelog here:

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.29

Exaple:

# uname -a

Linux gasha-rhel5 2.6.18-128.1.14.el5

fsck errors:

/dev/ram2: Inode 251, i_size is 205449, should be 995328.  FIXED.

/dev/ram2: Inode 3416, i_size is 654163, should be 1212416.  FIXED.

/dev/ram2: Inode 4032, i_size is 868310, should be 1138688.  FIXED.

/dev/ram2: Inode 3530, i_size is 1458533, should be 1900544.  FIXED.

Compiled fresh vanilla 2.6.30 kernel with "make oldconfig".

gasha-rhel5 kernel: Linux version 2.6.30 (root@gasha-rhel5) (gcc version 4.1.2

20080704 (Red Hat 4.1.2-44)) #1 SMP Sun Jun 28 00:39:51 EEST 2009

No fsck errors.

I cannot say, that these two error messages have the same roots. Then i need to

run these stress tests longer.

Comment 10 Eric Sandeen 2009-06-27 23:34:43 EDT

Ok, this bug is ranging all over now...  It was opened for a very distinct

error message:

Read-only filesystem after 'ext3_free_blocks_sb: bit already cleared for block'

errors

But the testcase in comment #9, while it does seem to be an issue in RHEL5 that

I'll investigate, looks like it exposes a different problem.  It also appears

to provoke a problem yet upstream, though manifesting differently.

However, I'm still trying to reproduce anything on RHEL5 that looks like the

original bug reported here.

-Eric

Comment 11 Gordon Messmer 2009-06-29 15:01:49 EDT

Unfortunately, I think this bug started out ranging all over, and I take

responsibility for that.  I believe that I initially saw this bug on a system

early in my deploying 5.3.  I was working on a system whose RAID array had

failed, and didn't save information on the failure because I suspected the

hardware.  Subsequent hardware tests did not reveal any problems.  This system

wasn't mentioned in my initial report, because I had no data.

Later, I saw a similar failure on another system with a 3ware controller, which

I did mention.  I've recently determined that this system had hardware

problems.  This data should be ignored.  That was "bucket", mentioned in

comment #1.  Unfortunately, since I opened the ticket with information from

this host, the summary and half of the information I provided is incorrect.

I'm still fairly confident that the KVM guest mentioned in comments #2 and #3

was a legitimate bug in Linux.  Since the nature of the bug isn't clear to me,

I'm not sure what kinds of corruption may result, so it may not be possible to

determine whether or not all of the commenters are seeing the same corruption

bug.  We should probably focus on the test case in comment #9.

Please let me know what additional information I can provide, or what testing I

can do.

Comment 12 Scott Mohnkern 2009-07-16 10:14:00 EDT

Another data point

------------------

We've seen a similar problem (I think) on one RHEL4 machine (we've got 7

identical ones in terms of hardware, and its just this one.

What we see is that the /var partition will just suddenly go read only.  Of

course the problem is, the minute it goes read only it can't write to the log

files.

A mount -o remount,rw /var does not clear the problem, only a reboot will do

it.

We're currently running the 2.6.9-89 SMP kernel.

Comment 13 Eric Sandeen 2009-07-16 11:14:13 EDT

Scott, if there's any chance for a serial console, or a look at a physical

console, maybe you can get more information about what's gone wrong?

Or is "bit already cleared for block" the only info...

Thanks,

-Eric

Comment 14 Scott Mohnkern 2009-07-16 11:20:44 EDT

We're actually not even seeing that, it just suddenly flips to read only.

Next time it happens I'll look at the physical console and see what I see.

Unfortunately, it appears to be non-repeatedly, and inconsistent as to when it

happens.

Comment 15 Daniel Brnak 2009-07-20 05:56:48 EDT

Created attachment 354320 [details]

ext3 error after sync

we found ext3 errors 4 identical Linux machines (installed via jumpstart,

kernel 2.6.18-128.el5, x86_64)

always in /var, which was remounted read only

attached console messages, on serial console, there is nothing more produced.

Comment 16 Gatis Visnevskis 2009-07-20 06:25:22 EDT

I'm still trying to reproduce error using "fsstress" utility on dedicated

stress system. No luck.

I think it has wrong pattern. Real thing should be 2-cpu server, and and

simultaneous writes to MANY open files, at the same time. Some kind of race

condition.

Comment 17 Devin Nate 2009-08-04 00:38:27 EDT

For what it's worth, I've been poking around this all day today (i.e. testcase

in comment #9 with fsstress). This is what I've found so far:

Testing with:

fsstress -d /mnt/test_file_system/work -p 3 -l 0 -n 100000000 -X. Some tests

re-run with -p 1 to remove the possibility of parallel tasks impacting the

test.

RHEL 5.4 beta kernel version 2.6.18-157.el5 (otherwise a stock 5.3 install). No

other processes are accesting the test_file_system.

RHEL is running as a vmware esxi 4.0 guest, 2vcpu, 2GB ram. No errors in dmesg.

1. fails on ext2, blocksizes 1024, 2048, 4096, when mounted on a loop device

(loop0) losetup against a file in /dev/shm (tmpfs).

2. fails on ext2, blocksizes 1024, 2048, 4096, when mounted on a ramdisk (ram0)

using the new brd driver (part of rhel 5.4).

3. fails on ext3, blocksizes 1024, 2048, 4096, when mounted on a loop device

(loop0) losetup against a file in /dev/shm (tmpfs), ext3 mount options

data=writeback and data=ordered

4. fails on ext3, blocksizes 1024, 2048, 4096, when mounted on a ramdisk (ram0)

using the new brd driver (part of rhel 5.4), ext3 mount options data=writeback

and data=ordered

5. fails on ext3, blocksize default, when mounted on a loop device (loop0)

losetup against a file in / (exiting ext3 filesystem).

6. succeeds on ext3 for the above, with data=journal mount option.

7. Basically, as best I can tell, ext3 with data=journal always works, and

nothing else does.

8. Doesn't appear to be limited to ramdisk (or at least, loop devices on the

actual hard drive fail also).

9. I tried ext4.. it also failed. I don't recall which variation I used above.

It was mounted with default mount options.

note:

"Fail" is evidenced by some combination of errors similar to:

+ e2fsck -fvp /dev/ram0

/dev/ram0: HTREE directory inode 61596 has an invalid root node.

HTREE INDEX CLEARED.

/dev/ram0: HTREE directory inode 61604 has an invalid root node.

HTREE INDEX CLEARED.

/dev/ram0: HTREE directory inode 61632 has an invalid root node.

HTREE INDEX CLEARED.

/dev/ram0: HTREE directory inode 61644 has an invalid root node.

HTREE INDEX CLEARED.

/dev/ram0: HTREE directory inode 61660 has an invalid root node.

HTREE INDEX CLEARED.

/dev/ram0: Inode 69395, i_size is 902294, should be 1906688.  FIXED.

/dev/ram0: Inode 69723, i_size is 1277952, should be 1280000.  FIXED.

/dev/ram0: Inode 69781, i_size is 0, should be 939008.  FIXED.

/dev/ram0: Inode 70098, i_size is 662593, should be 1267712.  FIXED.

/dev/ram0: Inode 70227, i_size is 0, should be 330752.  FIXED.

/dev/ram0: Inode 70418, i_size is 892928, should be 896000.  FIXED.

/dev/ram0: Inode 70541, i_size is 380928, should be 382976.  FIXED.

/dev/ram0: Inode 71272, i_size is 503808, should be 506880.  FIXED.

/dev/ram0: Inode 71663, i_size is 2002944, should be 2004992.  FIXED.

/dev/ram0: Inode 72421, i_size is 348160, should be 350208.  FIXED.

/dev/ram0: Inode 73392, i_size is 958464, should be 960512.  FIXED.

/dev/ram0: Inode 73980, i_size is 434176, should be 437248.  FIXED.

/dev/ram0: Inode 74468, i_size is 466944, should be 470016.  FIXED.

/dev/ram0: Inode 76077, i_size is 200704, should be 202752.  FIXED.

/dev/ram0: Inode 71114, i_size is 0, should be 103424.  FIXED.

/dev/ram0: Inode 70462, i_size is 0, should be 72704.  FIXED.

/dev/ram0: Entry 'c638' in /work/p0/d1/d141/d3d7/d4ed (61644) has an incorrect

filetype (was 3, should be 1).

/dev/ram0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

        (i.e., without -a or -p options)

"Success" was evidenced by no errors after repeated iterations (in my case, I

let it run a couple hundred runs).

Furthermore, I put a bunch of links into a comment I put against the RHEL 5.4

brd driver info discussing various aspects I found today. The info is viewable

from rhel regarding the use of the new brd ramdisk driver in rhel 5.4:

https://bugzilla.redhat.com/show_bug.cgi?id=480663

Thanks,

Devin

Comment 18 Ric Wheeler 2009-08-04 11:11:25 EDT

This bug has attracted several seemingly unrelated issues - please open

distinct bugzilla tickets when you change the storage/virt/etc stack since all

of these can produce unique issues.

Comment 19 Devin Nate 2009-08-04 11:19:57 EDT

I'll open 30 tickets if necessary ;) I was kind of hoping that since Red Hat

can see evidence of corruption in ext2, ext3, and ext4, that your team might be

more interested in the internal processes than me.

Please confirm you want new tickets for each combo? FYI, I'm now testing

against a physical box (IBM server) as well. Same results.

Thanks,

Devin Nate

Comment 20 Eric Sandeen 2009-08-04 11:35:44 EDT

Devin, your testcase & error seem to be on-disk-corruption and/or fsck related,

with no mention of "bit already cleared for block" in the kernel messages,

which was the original error report for this bug.  That's why we'd like a new

bug to track this seemingly distinct problem.

If your failure case is always more or less the same basic corruption found,

I'd be happy with 1 new bug from you, in which you clearly describe the

testcase (attaching or pasting the test script you used would be great), and

test environment, and also the problematic results you encountered after the

tests.

Thanks,

-Eric

Comment 21 Devin Nate 2009-08-04 11:45:44 EDT

Hi Eric, thank you very much. What/where/how would you like it opened? I

appreciate the response, just want to make sure it gets appropriate attention

(and really have no idea how to open a bug to accomplish that - in fact I've

never opened a new rhel bug before). Any guidance would be helpful, and I'll

upload all info.

Currently compiling a stock kernel 2.6.30.4 to see there, based on the rhel 5.4

.config for the kernel.

Thanks,

Devin

Comment 22 Eric Sandeen 2009-08-04 12:09:50 EDT

Devin, you can go here:

https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%205

and file a bug agains the kernel component.

Or if you're a RHEL customer, going through your support contacts is often

encouraged as well.

Comment 23 Devin Nate 2009-08-04 12:57:05 EDT

Thanks Eric. We are a RHEL customer.. have no idea who our support contact is.

I've opened a new bug:

https://bugzilla.redhat.com/show_bug.cgi?id=515529

If I could trouble you to take a glance at it to make sure I've opened it

properly for redhat.

Thanks,

Devin

Comment 24 Justin Gargano 2009-08-06 21:00:20 EDT

I've just deployed a system with RedHat 5.3 and we are seeing the same issue:

------------------------------------------------------------------------

EXT3-fs error (device sdb1): ext3_free_blocks_sb: bit already cleared for block

731165448

Aborting journal on device sdb1.

EXT3-fs error (device sdb1): ext3_free_blocks_sb: bit already cleared for block

731165449

EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted

EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted

EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted

EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted

EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted

EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted

EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted

EXT3-fs error (device sdb1) in ext3_truncate: Journal has aborted

EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted

EXT3-fs error (device sdb1) in ext3_orphan_del: Journal has aborted

EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted

EXT3-fs error (device sdb1) in ext3_delete_inode: Journal has aborted

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

ext3_abort called.

EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal

Remounting filesystem read-only

------------------------------------------------------------------------

The issue occurs under heavy disk i/o while performing a Oracle RMAN restore.

Our configuration is on a Dell with a Perc 6 (LSI/megasas) raid controller with

an external eSATA storage device, MD1000(JBOD).

Please let me know if there is anything I can do to help provide some insight

into the problem.

Comment 25 Eric Sandeen 2009-08-06 21:30:00 EDT

As a general plea for folks who are RHEL customers, please also ping your

support channel about this problem, it'll help prioritize the bug.

Thanks,

-Eric

Comment 26 Eric Sandeen 2009-08-06 21:35:52 EDT

Oh, and for anyone hitting the i_size problems on fsck post-fsstress, that

issue has been moved over to bug #515529,  if that's your issue you might want

to cc: over there.

Justin, I'll try to make some headway on this original bug soon, too.

Comment 27 Eric Sandeen 2009-08-07 16:22:38 EDT

And for folks w/ RHEL4 problems, you need to open a different bug, or get your

support folks to do it.

Sorry but we have to keep things focused here ;)

And now, back to "bit already cleared for block" on RHEL5....

Thanks,

-Eric

Comment 28 Eric Sandeen 2009-08-10 18:54:01 EDT

For those hitting the bit already cleared for block error:

Let's try a jprobe to see what path we're hitting the error on, it may at least

discern between types of blocks being freed and give a clue to what's

happening.

http://people.redhat.com/esandeen/bz494927/

has a pre-built jprobe module for x86_64, and source & a Makefile that could be

rebuilt on any arch by putting the Makefile and ext3_error_jprobe.c in a

directory, cd into that dir, and:

# make -C /lib/modules/`uname -r`/build M=`pwd`

to build the module.  Then you can:

# insmod ./ext3_error_jprobe.ko

It will dump a stack backtrace to dmesg whenever ext3_error() is hit, while the

module is loaded.

Thanks,

-Eric

Comment 29 Eric Sandeen 2009-08-10 18:58:06 EDT

Oh, for the above "make" to work you need kernel-devel for the kernel matching:

# uname -r

installed as well.

If anyone gets a nice backtrace when the bit already cleared for block error

pops up, please add it to this bug.

Thanks,

-Eric

Comment 35 Justin Gargano 2009-10-28 07:28:07 EDT

(In reply to comment #24)

We tracked this problem down to a faulty PERC 6 Raid controller.  Once it was

replaced, the error went away.

Comment 40 Issue Tracker 2009-11-04 19:27:09 EST

Event posted on 11-05-2009 11:27am EST by mgoodwin

Hi all,

Comparing the previous event (Aug 28 2009) with this new event

(Oct 31 2009), a smoking gun has emerged. Both events occurred at

exactly the same time (23:53:01) on exactly the same filesystem

(/var mounted on vg00-var, which is dm-7), and both reported free

list corruption for exactly the same block (149827).

Further more, in /etc/cron.d/sysstat, we have :

# generate a daily summary of process accounting at 23:53

53 23 * * * root /usr/lib64/sa/sa2 -A

This site is running sysstat-7.0.2-3.el5-x86_64.

Here are the two log snippets :

This event, Oct 31 :

Oct 31 23:51:04 srpamp35 dhclient: DHCPREQUEST on eth2 to 148.126.2.5 port

Oct 31 23:51:43 srpamp35 last message repeated 3 times

Oct 31 23:52:47 srpamp35 last message repeated 5 times

Oct 31 23:52:52 srpadm62 snmpd[3108]: Connection from UDP:

[148.126.25.56]:1725

Oct 31 23:53:01 srpamp74 kernel: jprobe: ext3_error

Oct 31 23:53:01 srpamp74 syslogd: /var/log/messages: Read-only file

system

Oct 31 23:53:01 srpamp74 kernel:

Oct 31 23:53:01 srpamp74 kernel: Call Trace:

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8850b017>]

:ext3_error_jprobe:jext3_error+0x17/0x20

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8804a93c>]

:ext3:ext3_free_blocks_sb+0x31b/0x48b

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8804aada>]

:ext3:ext3_free_blocks+0x2e/0x79

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8804f07c>]

:ext3:ext3_free_data+0x84/0xfe

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8804f839>]

:ext3:ext3_truncate+0x567/0x8e1

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8003fa10>]

pagevec_lookup+0x17/0x1e

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff88032152>]

:jbd:journal_start+0xc9/0x100

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff88050256>]

:ext3:ext3_delete_inode+0x7d/0xd4

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff880501d9>]

:ext3:ext3_delete_inode+0x0/0xd4

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8002f49e>]

generic_delete_inode+0xc6/0x143

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8003c75c>]

do_unlinkat+0xd5/0x141

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8005dde9>]

error_exit+0x0/0x84

Oct 31 23:53:01 srpamp74 kernel:  [<ffffffff8005d116>]

system_call+0x7e/0x83

Oct 31 23:53:01 srpamp74 kernel:

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7):

ext3_free_blocks_sb: bit already cleared for block 149827

Oct 31 23:53:01 srpamp74 kernel: Aborting journal on device dm-7.

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_free_blocks_sb: Journal has aborted

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_free_blocks_sb: Journal has aborted

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_reserve_inode_write: Journal has aborted

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_truncate: Journal has aborted

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_reserve_inode_write: Journal has aborted

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_orphan_del: Journal has aborted

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_reserve_inode_write: Journal has aborted

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_delete_inode: Journal has aborted

Oct 31 23:53:01 srpamp74 kernel: __journal_remove_journal_head: freeing

b_committed_data

Oct 31 23:53:01 srpamp74 kernel: journal commit I/O error

Oct 31 23:53:01 srpamp74 kernel: ext3_abort called.

Oct 31 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7):

ext3_journal_start_sb: Detected aborted journal

Oct 31 23:53:01 srpamp74 kernel: Remounting filesystem read-only

Previous event, on August 28 :

Aug 28 23:51:26 srpamp74 kernel: scsi(0): Asynchronous PORT UPDATE ignored

0000/0007/0b00.

Aug 28 23:53:01 srpamp74 syslogd: /var/log/messages: Read-only file

system

Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7):

ext3_free_blocks_sb: bit already cleared for block 149827

Aug 28 23:53:01 srpamp74 kernel: Aborting journal on device dm-7.

Aug 28 23:53:01 srpamp74 kernel: ext3_abort called.

Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7):

ext3_journal_start_sb: Detected aborted journal

Aug 28 23:53:01 srpamp74 kernel: Remounting filesystem read-only

Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_free_blocks_sb: Journal has aborted

Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_reserve_inode_write: Journal has aborted

Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_truncate: Journal has aborted

Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_reserve_inode_write: Journal has aborted

Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_orphan_del: Journal has aborted

Aug 28 23:53:01 srpamp74 kernel: EXT3-fs error (device dm-7) in

ext3_reserve_inode_write: Journal has aborted

Aug 28 23:53:01 srpamp74 kernel: __journal_remove_journal_head: freeing

b_committed_data

Aug 28 23:53:01 srpamp74 syslogd: /var/log/maillog: Read-only file system

Aug 28 23:53:01 srpamp74 syslogd: /var/log/cron: Read-only file system

Aug 28 23:53:01 srpamp74 auditd[4261]: Record was not written to disk

(Read-only file system)

Aug 28 23:53:01 srpamp74 auditd[4261]: write: Audit daemon detected an

error writing an event to disk (Read-only file system)

Aug 28 23:53:01 srpamp74 auditd[13430]: Audit daemon failed to exec

(null)

Aug 28 23:53:01 srpamp74 auditd[13430]: The audit daemon is exiting.

John,

Known bugs in the /usr/lib64/sa/sa2 script could be related here,

maybe BZ 101574?  (that BZ has an easy reproducer recipe, maybe

give it a try).

Also, since /dev/vg00/var is less than 1G, could we please get a

compressed dd image? Might be worth investigating if the /var/log/sa

directory inode is stored near block 149827.

Cheers

-- Mark Goodwin

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by mgoodwin

 issue 347967

Comment 51 Albert S. 2010-07-26 12:10:52 EDT

Same problem here in a production virtual server running on vSphere 4.0.0 build

208111 servers.

Virtual server info:

Red Hat Enterprise Linux Server release 5.3 (Tikanga)

kernel 2.6.18-128.2.1.el5

Issues only appears in filesystems mounted on SAN.

Today I have three issues in two different filesystems (each one a LUN on EMC

SAN).

- message log:

Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1): ext3_free_blocks_sb:

bit already cleared for block 70722697

Jul 26 12:29:02 nobel kernel: Aborting journal on device sde1.

Jul 26 12:29:02 nobel kernel: ext3_abort called.

Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1):

ext3_journal_start_sb: Detected aborted journal

Jul 26 12:29:02 nobel kernel: Remounting filesystem read-only

Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in

ext3_reserve_inode_write: Journal has aborted

Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in ext3_truncate:

Journal has aborted

Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in

ext3_reserve_inode_write: Journal has aborted

Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in ext3_orphan_del:

Journal has aborted

Jul 26 12:29:02 nobel kernel: EXT3-fs error (device sde1) in

ext3_reserve_inode_write: Journal has aborted

Jul 26 12:29:02 nobel kernel: __journal_remove_journal_head: freeing

b_committed_data

Jul 26 12:35:24 nobel kernel: __journal_remove_journal_head: freeing

b_committed_data

Jul 26 12:35:24 nobel kernel: ext3_abort called.

Jul 26 12:35:24 nobel kernel: EXT3-fs error (device sde1): ext3_put_super:

Couldn't clean up the journal

==> Command to check filesystem: fsck.ext3 -f -C 0 -y /dev/sde1

Jul 26 14:43:05 nobel kernel: ext3_abort called.

Jul 26 14:43:05 nobel kernel: EXT3-fs error (device sdg1):

ext3_journal_start_sb: Detected aborted journal

Jul 26 14:43:05 nobel kernel: Remounting filesystem read-only

Jul 26 14:52:50 nobel kernel: ext3_abort called.

Jul 26 14:52:50 nobel kernel: EXT3-fs error (device sdg1): ext3_put_super:

Couldn't clean up the journal

==> Command to check filesystem: fsck.ext3 -C 0 -y /dev/sdg1

Jul 26 15:48:29 nobel kernel: EXT3-fs error (device sdg1): ext3_lookup:

unlinked inode 22093862 in dir #18956368

Jul 26 15:48:29 nobel kernel: Aborting journal on device sdg1.

Jul 26 15:48:29 nobel kernel: __journal_remove_journal_head: freeing

b_committed_data

Jul 26 15:48:29 nobel kernel: ext3_abort called.

Jul 26 15:48:29 nobel kernel: EXT3-fs error (device sdg1):

ext3_journal_start_sb: Detected aborted journal

Jul 26 15:48:29 nobel kernel: Remounting filesystem read-only

Jul 26 16:05:06 nobel kernel: ext3_abort called.

Jul 26 16:05:06 nobel kernel: EXT3-fs error (device sdg1): ext3_put_super:

Couldn't clean up the journal

Jul 26 16:05:39 nobel kernel: kjournald starting.  Commit interval 5 seconds

Jul 26 16:05:39 nobel kernel: EXT3-fs warning (device sdg1):

ext3_clear_journal_err: Filesystem error recorded from previous mount: IO

failure

Jul 26 16:05:39 nobel kernel: EXT3-fs warning (device sdg1):

ext3_clear_journal_err: Marking fs in need of filesystem check.

Jul 26 16:05:39 nobel kernel: EXT3-fs warning: mounting fs with errors, running

e2fsck is recommended

Jul 26 16:05:39 nobel kernel: EXT3 FS on sdg1, internal journal

Jul 26 16:05:39 nobel kernel: EXT3-fs: recovery complete.

Jul 26 16:05:39 nobel kernel: EXT3-fs: mounted filesystem with ordered data

mode.

==> Command to check filesystem: fsck.ext3 -f -C 0 -y /dev/sdg1

Comment 52 Josef Bacik 2010-07-26 14:26:59 EDT

The original poster to this bugzilla said that fsck did in fact fix this

problem.  For the person who commented in c #51, please open a support ticket

so they can debug the problem further.

Comment 58 Eric Sandeen 2011-04-07 13:53:16 EDT

Benefits of being at a storage conference, I was just reminded of this fix

upstream:

commit f25f624263445785b94f39739a6339ba9ed3275d

Author: Jan Kara <jack@suse.cz>

Date:   Mon Jul 12 21:04:31 2010 +0200

    ext3: Avoid filesystem corruption after a crash under heavy delete load

    It can happen that ext3_free_branches calls ext3_forget() for an indirect

block

    in an earlier transaction than a transaction in which we clear pointer to

this

    indirect block. Thus if we crash before a transaction clearing the block

    pointer is committed, we will see indirect block pointing to already freed

    blocks and complain during orphan list cleanup.

    The fix is simple: Make sure ext3_forget() is called in the transaction

    doing block pointer clearing.

    This is a backport of an ext4 fix by Amir G.

<amir73il@users.sourceforge.net>

    Signed-off-by: Jan Kara <jack@suse.cz>

which may well be the problem here.  It would be interesting to investigate the

image with debugfs and see if it is indeed an indirect block which points to a

freed block.

-Eric

Comment 59 Eric Sandeen 2011-04-07 14:08:11 EDT

ext4 version upstream:

commit 40389687382bf0ae71458e7c0f828137a438a956

Author: Amir G <amir73il@users.sourceforge.net>

Date:   Tue Jul 27 11:56:05 2010 -0400

    ext4: Fix block bitmap inconsistencies after a crash when deleting files

    We have experienced bitmap inconsistencies after crash during file

    delete under heavy load.  The crash is not file system related and I

    the following patch in ext4_free_branches() fixes the recovery

    problem.

    If the transaction is restarted and there is a crash before the new

    transaction is committed, then after recovery, the blocks that this

    indirect block points to have been freed, but the indirect block

    itself has not been freed and may still point to some of the free

    blocks (because of the ext4_forget()).

    So ext4_forget() should be called inside ext4_free_blocks() to avoid

    this problem.

    Signed-off-by: Amir Goldstein <amir73il@users.sf.net>

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Comment 61 RHEL Product and Program Management 2011-04-07 14:19:51 EDT

This request was evaluated by Red Hat Product Management for inclusion in a Red

Hat Enterprise Linux maintenance release.  Product Management has requested

further review of this request by Red Hat Engineering, for potential

inclusion in a Red Hat Enterprise Linux Update release for currently deployed

products.  This request is not yet committed for inclusion in an Update

release.

Comment 71 Mark Goodwin 2011-04-11 20:29:04 EDT

Eric, I'm setting up a bridged VM for you running RHEL 5.6, with the

unpacked image available. Be ready in ~ 1 hour.

Comment 76 Eric Sandeen 2011-04-13 16:12:00 EDT

There are plenty of ways to hit this warning; I keep finding new ones.

For example, from yet another image:

Multiply-claimed block(s) in inode 32802: 69654 69654

If that inode is unlinked, it will try to block 69654 twice, and the 2nd one

would issue this error.

There is more than one root cause which could get us to this error message.

Comment 80 Eric Sandeen 2011-04-20 11:34:19 EDT

In my testing, commit f1015c447781729060c415f5133164c638561f25 has reduced a

1%-2% occurence of journal replay corruption to 0 (out of about 3500 replays).

In most cases it's mild corruption but over hours of testing I have seen bit

already cleared errors and some other more severe things.

I will be pushing that commit for RHEL5 soon and I expect that it will resolve

most of these issues, at least for customers who have the problem at any time

after experiencing a crash.  (Note that the corruption could "like dormant"

until it was stumbled on again at runtime.  Think for example of a rotated log

that only gets deleted after 6 months, and -that- is when the problem shows up,

6 months after the crash...)

-Eric

Comment 84 RHEL Product and Program Management 2011-04-20 21:59:26 EDT

This request was evaluated by Red Hat Product Management for inclusion in a Red

Hat Enterprise Linux maintenance release.  Product Management has requested

further review of this request by Red Hat Engineering, for potential

inclusion in a Red Hat Enterprise Linux Update release for currently deployed

products.  This request is not yet committed for inclusion in an Update

release.

Comment 85 Eric Sandeen 2011-04-20 22:14:49 EDT

Created attachment 493679 [details]

script to generate dirty snapshots

Invocation:

./dirty_snapshot -s $SOURCE_LVNAME /dev/$COW_DEVICE $SNAPSHOTNAME i.e.

./dirty_snapshot -s testvg-testlv /dev/sdb3 testlv_snap

to create a new dirty snapshot called "testlv_snap" from

/dev/mapper/testvg-testlv, with /dev/sdb3 as the cow device.

./dirty_snapshot -d testvg-testlv /dev/sdb3 testlv_snap

removes it again.

Comment 86 Eric Sandeen 2011-04-20 22:19:52 EDT

Created attachment 493681 [details]

script to drive dirty_snapshot script

This script runs fsstress (which I grabbed from xfstests) constantly, while

creating snapshots that require recovery.

It checks the recovered snapshot and if it's corrupted, makes a copy of it, and

tries mounting it and removing all files to hit runtime errors.

You'll need to edit the invocation of the dirty snapshot script to give it the

right parameters.  It also expects to have fsstress in the same dir, and

mountpoints mnt/ and mnt2/ as well.  Sorry, it's a bit rough.

The main lv I tested was 128M in size, made with 4k blocks.

I saw some sort of corruption in about 1%-2% of the replayed filesystems.

Comment 87 Lachlan McIlroy 2011-04-20 22:24:34 EDT

Eric, you mentioned the following commits - do you still think we may need

them?

40389687382bf0ae71458e7c0f828137a438a956

86963918965eb8fe0c8ae009e7c1b4c630f533d5

Comment 88 Eric Sandeen 2011-04-20 22:55:35 EDT

(In reply to comment #87)

> Eric, you mentioned the following commits - do you still think we may need

> them?

> 40389687382bf0ae71458e7c0f828137a438a956

> 86963918965eb8fe0c8ae009e7c1b4c630f533d5

Ultimately they probably should go in, but not as a last-minute 5.7 update, I

think.

40389687382bf0ae71458e7c0f828137a438a956 requires a crash right in the middle

of restarting a transaction while unlinking a large file, if I read it right.

86963918965eb8fe0c8ae009e7c1b4c630f533d5 is also explicitly related to crashes

while deleting.

In short, much narrower windows, and the commit I sent is most likely to be

hit.

Mark tells me you have your own testcase, though, so if you can still hit any

problems with the one commit, we can talk again.  :)

Comment 104 Jarod Wilson 2011-05-13 18:17:37 EDT

Patch(es) available in kernel-2.6.18-261.el5

You can download this test kernel (or newer) from

http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 109 alberto 2011-05-27 10:07:10 EDT

Hello,

i have the same problem but with RHEL 5.5 x86. Ocurried 3 times in a server in

less than two weeks in production. Here is the lines in /var/log/message

May 27 02:31:16 hostname kernel: EXT3-fs error (device sdh): ext3_lookup:

unlinked inode 3918926 in dir #5570561

May 27 02:31:16 hostname kernel: Aborting journal on device sdh.

May 27 02:31:16 hostname kernel: __journal_remove_journal_head: freeing

b_committed_data

May 27 02:31:16 hostname kernel: ext3_abort called.

May 27 02:31:16 hostname kernel: EXT3-fs error (device sdh):

ext3_journal_start_sb: Detected aborted journal

May 27 02:31:16 hostname kernel: Remounting filesystem read-only

To solve this only works rebooting the machine.

Is this the same problem?

Comment 110 Eric Sandeen 2011-05-27 11:05:43 EDT

(In reply to comment #109)

> Hello,

> i have the same problem but with RHEL 5.5 x86. Ocurried 3 times in a server in

> less than two weeks in production. Here is the lines in /var/log/message

> May 27 02:31:16 hostname kernel: EXT3-fs error (device sdh): ext3_lookup:

> unlinked inode 3918926 in dir #5570561

> May 27 02:31:16 hostname kernel: Aborting journal on device sdh.

> May 27 02:31:16 hostname kernel: __journal_remove_journal_head: freeing

> b_committed_data

> May 27 02:31:16 hostname kernel: ext3_abort called.

> May 27 02:31:16 hostname kernel: EXT3-fs error (device sdh):

> ext3_journal_start_sb: Detected aborted journal

> May 27 02:31:16 hostname kernel: Remounting filesystem read-only

> To solve this only works rebooting the machine.

> Is this the same problem?

Probably not.  You might try to run e2sfck -f on the fs to see if there is

corruption.  If the error is ongoing, please open a case with your RHEL support

contacts.

Thanks,

-Eric

Comment 111 alberto 2011-05-30 03:47:01 EDT

(In reply to comment #110)

> (In reply to comment #109)

> > Hello,

> >

> > i have the same problem but with RHEL 5.5 x86. Ocurried 3 times in a server in

> > less than two weeks in production. Here is the lines in /var/log/message

> >

> >

> > May 27 02:31:16 hostname kernel: EXT3-fs error (device sdh): ext3_lookup:

> > unlinked inode 3918926 in dir #5570561

> > May 27 02:31:16 hostname kernel: Aborting journal on device sdh.

> > May 27 02:31:16 hostname kernel: __journal_remove_journal_head: freeing

> > b_committed_data

> > May 27 02:31:16 hostname kernel: ext3_abort called.

> > May 27 02:31:16 hostname kernel: EXT3-fs error (device sdh):

> > ext3_journal_start_sb: Detected aborted journal

> > May 27 02:31:16 hostname kernel: Remounting filesystem read-only

> >

> > To solve this only works rebooting the machine.

> >

> > Is this the same problem?

> Probably not.  You might try to run e2sfck -f on the fs to see if there is

> corruption.  If the error is ongoing, please open a case with your RHEL support

> contacts.

> Thanks,

> -Eric

Really this error only saw one time, the other two times the system crashed and

it freezed all, ping works, ssh request login and password but when type

password don´t work.

I think the other times the system was in read only mode in "/var" and "/"

volume and i could not see the same error because it could not  write into log.

This system was in tests during three weeks and connected via fibre channel to

DAS, always OK without any error. Two weeks in production and three crashes.

I have kernel 2.6.18-194.el5PAE .

Thanks

Comment 112 errata-xmlrpc 2011-07-21 05:46:27 EDT

An advisory has been issued which should help the problem

described in this bug report. This report is therefore being

closed with a resolution of ERRATA. For more information

on therefore solution and/or where to find the updated files,

please follow the link below. You may reopen this bug report

if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

Comment 113 Albert S. 2011-10-06 04:12:53 EDT

Same problem again in a production virtual server running on vSphere 4.0.0

build

208167 servers.

Virtual server info:

Red Hat Enterprise Linux Server release 5.3 (Tikanga)

Issues only appears in filesystems mounted on SAN.

We installed patched kernel 2.6.18-265.el5PAE on 23th of May.

An issue appeared on 30th of September and server had installed the patched

kernel 2.6.18-265.el5PAE (almost 3)

That day I installed patched kernel 2.6.18-285.el5PAE.

But another issue appeared yesterday (with patched kernel 2.6.18-285.el5PAE

installed).

Then I installed the last patched kernel: 2.6.18-286.el5PAE

- message log:

EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block

43055512

Aborting journal on device sdh1.

EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted

EXT3-fs error (device sdh1) in ext3_truncate: Journal has aborted

EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted

EXT3-fs error (device sdh1) in ext3_orphan_del: Journal has aborted

EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted

EXT3-fs error (device sdh1) in ext3_delete_inode: Journal has aborted

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

ext3_abort called.

EXT3-fs error (device sdh1): ext3_journal_start_sb: Detected aborted journal

Remounting filesystem read-only

==> Procedure to solve the problem:

1. Unmounting the file system (/dev/sdc14)

2. Nuke the ext3 journal

 tune2fs -O ^has_journal /dev/sdc14

3. check disk

        nohup e2fsck -fy -C 0  /dev/sdc14 &

4. Rebuild the journal

tune2fs -j /dev/sdc14

5. Mounting the file system (/dev/sdc14)

Comment 114 Eric Sandeen 2011-10-06 10:44:36 EDT

It's not clear to me why you would need to nuke the journal...?

In any case, this bug is closed; based on our investigations we have shipped a

fix for the root cause encountered by the original reporter. As Josef already

mentioned in comment #52, if you have a new issue, please contact your Red Hat

support channel and get a new issue logged.

Thanks,

-Eric

Note You need to log in before you can comment on or make changes to this bug.

Only users in any of the selected groups can view this bug:

Unchecking all boxes makes this a more public bug.

Private Group (Bug is not viewable by public)
Security Sensitive Bug (Check if this is a security related issue and should not be public)

Only members of a group can change the visibility of a bug for that group.

THE LINUX STUFF

Thursday 24 November 2011

File System Bugs or Errors

Mount Points and /etc/fstab

No comments:

Post a Comment