[ADMIN] Database corrupted
Yann ROBIN
2011-12-05 19:29:56 UTC
Hi,

Earlier this afternoon, our database crashed with a stack trace.
We forcibly killed the remaining postgres process.
Since then it's been hell!

First Postgres told us that there was a corrupted index and we needed
to reindex it.
We couldn't do it because there were duplicate ids in the table.
So I decided to make a copy of the table and then try to remove
data/pkey constraint.

The database then crashed and couldn't restart. There was an xlog
flush request error.
Based on what we found on the internet, we ran pg_resetxlog.

The database started nicely but the data was still corrupted. So we ran a
REINDEX command and got this warning in the log (1000 times per second):
WARNING : concurrent delete in progress within table

We waited 30 minutes, but the message was still there and the reindex had not finished.

We couldn't find any help online.
Any idea what to do to get the database back?


Thanks,
--
Yann
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Kevin Grittner
2011-12-05 20:03:38 UTC
Post by Yann ROBIN
Earlier this afternoon, our database crashed with a stack trace.
First things first: Before you do anything else, shut down
PostgreSQL and make a copy of the data directory tree.

http://wiki.postgresql.org/wiki/Corruption

Second, please post information about your environment. What
version of PostgreSQL is this? What is your OS? What hardware is
it running on?

Now, please copy from the log at the time of the crash and post all
messages, plus any possibly relevant messages in the clients and the
OS logs. Is there a core file from the crash? Can you get a
backtrace from it?
Post by Yann ROBIN
We forcibly killed the remaining postgres process.
It's best to keep notes of exactly what was done. What was the
process description of what you killed? Which signal did you use?
Keep notes as you go.
Post by Yann ROBIN
First Postgres told us that there was a corrupted index and we
needed to reindex it.
We couldn't do it because there were duplicate ids in the table.
So I decided to make a copy of the table and then try to remove
data/pkey constraint.
The database then crashed and couldn't restart. There was an xlog
flush request error.
Copy/paste it?
Post by Yann ROBIN
Based on what we found on the internet, we ran pg_resetxlog.
The database started nicely but the data was still corrupted. So we ran
a REINDEX command and got this warning in the log (1000 times per
second)
WARNING : concurrent delete in progress within table
We waited 30 minutes, but the message was still there and the
reindex had not finished.
How big is the table? What kind of column(s) in the index?
Post by Yann ROBIN
We couldn't find any help online.
There is this list. For advice on how to get the most useful advice
from it, you might want to read this page:

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

You might want to consider contracting for professional support:

http://www.postgresql.org/support/professional_support/

-Kevin
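
The first step above (stop the server, then copy the data directory tree) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the server has already been shut down, the function name snapshot_data_dir is hypothetical, and the only PostgreSQL-specific detail relied on is that a data directory contains a PG_VERSION file. Tablespaces stored outside the data directory would need to be copied separately.

```python
import shutil
from pathlib import Path


def snapshot_data_dir(data_dir, backup_dir):
    """Copy a stopped cluster's data directory tree to backup_dir.

    Hypothetical helper for illustration; it does not stop the server itself.
    """
    src = Path(data_dir)
    dst = Path(backup_dir)
    # A PostgreSQL data directory contains a PG_VERSION file at its top level.
    if not (src / "PG_VERSION").is_file():
        raise ValueError(f"{src} does not look like a PostgreSQL data directory")
    # Never overwrite an earlier copy: it may be the only good one left.
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite existing backup at {dst}")
    # symlinks=True copies symbolic links as links instead of following them.
    shutil.copytree(src, dst, symlinks=True)
    return dst
```

A call would look like snapshot_data_dir("/path/to/data", "/path/to/backup/data-copy"), where both paths are placeholders. Refusing to overwrite an existing destination is a deliberate choice here, since later recovery attempts should never clobber the earliest copy.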
Yann ROBIN
2011-12-05 20:18:43 UTC
Post by Kevin Grittner
First things first: Before you do anything else, shut down
PostgreSQL and make a copy of the data directory tree.
http://wiki.postgresql.org/wiki/Corruption
Did it.
Post by Kevin Grittner
Second, please post information about your environment.  What
version of PostgreSQL is this?  What is your OS?  What hardware is
it running on?
Version 9.0.4 on debian squeeze 6.0 running in KVM with virtio (kernel 2.6.32)
I think we have a hard drive issue.
Post by Kevin Grittner
Now, please copy from the log at the time of the crash and post all
messages, plus any possibly relevant messages in the clients and the
OS logs.  Is there a core file from the crash?  Can you get a
backtrace from it?
I removed (rm) the log file, as it was taking too much space because of the
1000 warnings per second.
I feel dumb right now. Sorry.
Post by Kevin Grittner
It's best to keep notes of exactly what was done.  What was the
process description of what you killed?  Which signal did you use?
Keep notes as you go.
kill -9 of the writer process
Post by Kevin Grittner
Copy/paste it?
Sorry, I lost it, but it was: xlog flush request 0/xxxx is not satisfied
--- flushed only to xxxx
Post by Kevin Grittner
How big is the table?  What kind of column(s) in the index?
Primary key index on an int32 column; the table has 1.2M rows.
--
Yann
Scott Marlowe
2011-12-05 20:31:19 UTC
Post by Yann ROBIN
Post by Kevin Grittner
First things first: Before you do anything else, shut down
PostgreSQL and make a copy of the data directory tree.
http://wiki.postgresql.org/wiki/Corruption
Did it.
Post by Kevin Grittner
Second, please post information about your environment.  What
version of PostgreSQL is this?  What is your OS?  What hardware is
it running on?
Version 9.0.4 on debian squeeze 6.0 running in KVM with virtio (kernel 2.6.32)
I think we have a hard drive issue.
Post by Kevin Grittner
Now, please copy from the log at the time of the crash and post all
messages, plus any possibly relevant messages in the clients and the
OS logs.  Is there a core file from the crash?  Can you get a
backtrace from it?
I removed (rm) the log file, as it was taking too much space because of the
1000 warnings per second.
I feel dumb right now. Sorry.
Post by Kevin Grittner
It's best to keep notes of exactly what was done.  What was the
process description of what you killed?  Which signal did you use?
Keep notes as you go.
kill -9 of the writer process
Are you sure you killed all the postgres backends before restarting
the server? If other backends are still running, with a dead
postmaster, and you restart the server, instant corruption.
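
One possible way to look for leftover postgres processes, sketched in Python. It assumes a Unix-like system where "ps -eo pid,ppid,args" is available and where backends show a process title beginning with "postgres:"; the function names are made up for this example. A backend whose parent is not another postgres process has probably outlived its postmaster.

```python
import subprocess


def parse_postgres_processes(ps_output):
    """Return (pid, ppid, command) tuples for postgres processes in ps output."""
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, ppid, args = parts
        # Compare only the executable's base name, e.g. ".../bin/postgres".
        executable = args.split()[0].rsplit("/", 1)[-1]
        if executable.startswith("postgres"):
            found.append((int(pid), int(ppid), args))
    return found


def orphaned_backends(processes):
    """Backends (title 'postgres: ...') whose parent is not a postgres process."""
    postgres_pids = {pid for pid, _, _ in processes}
    return [
        p for p in processes
        if p[2].startswith("postgres:") and p[1] not in postgres_pids
    ]


def list_postgres_processes():
    """Run ps and parse its output (assumes a procps/POSIX-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,args"],
        capture_output=True, text=True, check=True,
    )
    return parse_postgres_processes(result.stdout)
```

Calling orphaned_backends(list_postgres_processes()) before any restart gives a list to inspect by hand; an empty list is a good sign but not a proof, since process titles can be customized.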
Kevin Grittner
2011-12-05 20:46:00 UTC
Post by Scott Marlowe
Post by Yann ROBIN
Version 9.0.4 on debian squeeze 6.0 running in KVM with virtio (kernel 2.6.32)
I don't know anything about KVM or virtio, so hopefully others will
step up.
Post by Scott Marlowe
Post by Yann ROBIN
I think we have a hard drive issue.
If at all possible, I would try to sort that out before continuing
recovery. You could keep piling up one disk error on top of
another; and each can make the others harder to sort out and fix.
Post by Scott Marlowe
Post by Yann ROBIN
kill -9 of the writer process
Are you sure you killed all the postgres backends before
restarting the server? If other backends are still running, with
a dead postmaster, and you restart the server, instant corruption.
Yeah, I would make *absolutely* sure there isn't an orphaned
postgres process still running before trying any other recovery
steps.

Once you are sure that your storage system isn't nibbling away at
your data and there isn't an old postgres process running, you might
want to list the rows with the duplicate key value, and then delete
them. The safest course, once you've rebuilt the index, is to use
pg_dump and psql (or pg_restore) to rebuild the database.

Be sure to keep that copy of the data directory tree for at *least*
a few weeks after everything seems to be running fine. You may well
belatedly discover a reason to go back and fish for some data.

-Kevin
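
The "list the duplicates, then delete them" step can be illustrated with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in, not PostgreSQL, and the table name items and its columns are invented for the example. In PostgreSQL the analogous physical row identifier is ctid rather than rowid, and with a damaged index one would likely want to keep the planner off it first (for example with SET enable_indexscan = off and SET enable_bitmapscan = off). Which copy of a duplicated row to keep is a judgment call that needs a look at the data; keeping the lowest rowid below is only a placeholder policy.

```python
import sqlite3


def make_demo_table():
    """Build an in-memory table with duplicate ids (no primary key constraint)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(1, "a"), (2, "b"), (2, "b-dup"), (3, "c"), (3, "c-dup"), (3, "c-dup2")],
    )
    conn.commit()
    return conn


def duplicate_ids(conn):
    """Return (id, count) pairs for every id that appears more than once."""
    return conn.execute(
        "SELECT id, COUNT(*) FROM items GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
    ).fetchall()


def delete_duplicates_keep_lowest_rowid(conn):
    """Delete all but one row per id and return the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM items "
        "WHERE rowid NOT IN (SELECT MIN(rowid) FROM items GROUP BY id)"
    )
    conn.commit()
    return cur.rowcount
```

Listing first and deleting second, as two separate functions, mirrors the advice above: the list should be reviewed before anything is removed.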
Tom Lane
2011-12-05 20:53:52 UTC
Post by Scott Marlowe
Post by Yann ROBIN
kill -9 of the writer process
Are you sure you killed all the postgres backends before restarting
the server? If other backends are still running, with a dead
postmaster, and you restart the server, instant corruption.
There are interlocks against that ... although if you were foolish
enough to manually remove postmaster.pid, you could defeat them :-(

regards, tom lane
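
As a rough illustration of why postmaster.pid matters, the sketch below reads the PID recorded on the first line of that file and checks whether such a process still exists. It is not how PostgreSQL itself implements its interlock (which, as far as I know, also examines the shared memory segment); the function names are hypothetical, and the liveness check assumes a Unix-like system. A live PID is also not proof that the process is a postmaster, since PIDs can be reused.

```python
import os
from pathlib import Path


def read_postmaster_pid(data_dir):
    """Return the PID recorded in data_dir/postmaster.pid, or None if absent."""
    pid_file = Path(data_dir) / "postmaster.pid"
    if not pid_file.exists():
        return None
    lines = pid_file.read_text().splitlines()
    if not lines:
        raise ValueError(f"{pid_file} is empty")
    # The first line of postmaster.pid holds the PID. abs() is a defensive
    # assumption in case the value was written with a leading minus sign.
    return abs(int(lines[0].strip()))


def process_exists(pid):
    """Check whether a process with this PID exists (Unix-like systems only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

If read_postmaster_pid returns a PID and process_exists says it is alive, removing the file by hand is exactly the situation described above and should be avoided.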
Yann ROBIN
2011-12-05 20:59:03 UTC
Post by Tom Lane
There are interlocks against that ... although if you were foolish
enough to manually remove postmaster.pid, you could defeat them :-(
We didn't remove postmaster.pid
--
Yann
Scott Marlowe
2011-12-05 21:32:32 UTC
Post by Tom Lane
Post by Scott Marlowe
Post by Yann ROBIN
kill -9 of the writer process
Are you sure you killed all the postgres backends before restarting
the server?  If other backends are still running, with a dead
postmaster, and you restart the server, instant corruption.
There are interlocks against that ... although if you were foolish
enough to manually remove postmaster.pid, you could defeat them :-(
I've had to remove it once or twice in the past (has that behavior
changed in more recent versions, where it's smarter about it?) but I
knew to check for orphaned backends as well. If someone removed the file
but didn't check for orphans, they'd definitely be seeing odd behaviour and
a corrupted database.