Postgres restoring replication, timeline conflict

postgresql database-administration replication

12,920

As Craig Ringer suggested, I took a new backup and double-checked it, and after setting up the slave server again, replication worked.

But while I was doing all that, I also remembered that there was an old server that was also a slave of the old master db (A). That server should not have been running, which is why I did not think of it at first. Anyway, after taking the old slave down and doing the backup and restore again, it simply worked.

As I said, I initially thought it was because of a bad backup, but the error message turned out to be produced by a third server (a second slave db). Just to prove my point, I started the old server and got the error messages again:

2015-10-31 10:26:37 CET ERROR:  requested starting point 19/FE000000 on timeline 1 is not in this server's history
2015-10-31 10:26:37 CET DETAIL:  This server's history forked from timeline 1 at 19/FDCF9BA0.

So it seems that the replication was working all along, but the error messages caused by the second slave were throwing me off.
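To see why that request gets rejected, it can help to compare the two positions in the error message. The following is only a minimal sketch: `parse_lsn` is a helper name I made up for illustration, and it assumes the usual `high/low` hexadecimal notation for WAL positions, where the part before the slash is the upper 32 bits.

```python
def parse_lsn(lsn):
    """Convert a WAL position such as '19/FE000000' to an integer byte offset.

    Hypothetical helper for illustration; assumes 'high/low' hex notation.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


requested = parse_lsn("19/FE000000")   # starting point the old slave asked for
fork_point = parse_lsn("19/FDCF9BA0")  # where this server's history left timeline 1

# The request lies past the fork point, so on timeline 1 this server
# has no WAL at that position to send.
print(requested > fork_point)
print(requested - fork_point)  # distance in bytes past the fork point
```

In other words, the old slave was still following timeline 1 and asking for a position that only exists in the history of the failed master, not in that of the promoted server.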

Again, thanks Craig for the help.


Keyjote

On a daily basis I am a developer, DBA and scrum master. I mostly enjoy coding in C#, Python and SQL. I am db agnostic, but use MS SQL Server and Postgres at my current job. As a hobby I like playing with and learning Python and Postgres/PostGIS databases. But foremost I am a code-firefighter. I take care of bugs and sanity in our applications.

Updated on September 18, 2022

Comments

Keyjote over 1 year

I have a Postgres database (version 9.4) with streaming replication (master/slave configuration). Let's call the master db A and the slave db B.

The server running A failed and we had to do a switchover, where we promoted B to be the new master. Up to this point everything is good and working fine.

Now I have recovered the broken server and want to set up replication again so that A can be the new slave. So I take a backup from B, put it on server A, set up the recovery file and start it. The problem is that it doesn't really work any more: it says they are on two different timelines.

Here are the messages from A (new slave):

2015-10-30 14:28:04 LOG:  database system was shut down in recovery at 2015-10-30 14:27:28 CET 
2015-10-30 14:28:04 LOG:  entering standby mode 
2015-10-30 14:28:04 LOG:  redo starts at 1A/5802B1A8 
2015-10-30 14:28:04 LOG:  consistent recovery state reached at 1A/581FA248 
2015-10-30 14:28:04 LOG:  record with zero length at 1A/581FA248 
2015-10-30 14:28:04 LOG:  database system is ready to accept read only connections 
2015-10-30 14:28:05 LOG:  started streaming WAL from primary at 1A/58000000 on timeline 2 
2015-10-30 14:28:07 ERROR:  requested starting point 19/FE000000 on timeline 1 is not in this server's history 
2015-10-30 14:28:07 DETAIL:  This server's history forked from timeline 1 at 19/FDCF9BA0. 
2015-10-30 14:28:12 ERROR:  requested starting point 19/FE000000 on timeline 1 is not in this server's history 
2015-10-30 14:28:12 DETAIL:  This server's history forked from timeline 1 at 19/FDCF9BA0.
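One thing worth noticing in these messages is that the streaming line and the error lines mention different timelines. A rough way to check this is to pull the timeline numbers out of the log text. This is just a sketch: the regular expression and the function name `timelines_by_level` are my own and only match the message wording shown above.

```python
import re

LOG = """\
2015-10-30 14:28:05 LOG:  started streaming WAL from primary at 1A/58000000 on timeline 2
2015-10-30 14:28:07 ERROR:  requested starting point 19/FE000000 on timeline 1 is not in this server's history
2015-10-30 14:28:07 DETAIL:  This server's history forked from timeline 1 at 19/FDCF9BA0.
"""

# Matches "<position> on timeline <N>" as worded in the messages above.
PATTERN = re.compile(r"([0-9A-F]+/[0-9A-F]+) on timeline (\d+)")


def timelines_by_level(log_text):
    """Return {log level: set of timeline numbers mentioned at that level}."""
    result = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if not match:
            continue
        # Fields are: date, time, level (with trailing colon), message...
        level = line.split()[2].rstrip(":")
        result.setdefault(level, set()).add(int(match.group(2)))
    return result


print(timelines_by_level(LOG))
```

Streaming starts on timeline 2 while the errors are about a request for timeline 1, which suggests the two kinds of messages may not belong to the same replication connection.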

My recovery file looks like this:

standby_mode = 'on'
primary_conninfo = 'host=serverB port=5432 user=replication-user'
restore_command = 'copy "Z:\\pg_xlog\\%f" "%p"'
archive_cleanup_command = '"C:\\Program Files\\PostgreSQL\\9.4\\bin\\pg_archivecleanup" "Z:\\pg_xlog" "%r"'
trigger_file = 'Z:\\trigger\\pgsql.trigger.sekasto021'
recovery_target_timeline = 'latest'

While googling I found almost the same question here, but with no answers. I also found a page by Michael Paquier that describes what is happening to me (although he says it is not an issue from version 9.3 onward). He says:

FATAL:  timeline 2 of the primary does not match recovery target timeline 1
This can only be solved by copying the WAL segments from the master node or using a WAL archive.

But sadly I don't know what he means by copying the WAL segments from the master or using a WAL archive.

Any help/guidance is welcome. Thanks.

Update: I posted this question on Stack Overflow and was asked to put it here instead.

Admin over 8 years

x-posted to stackoverflow.com/q/33437732/398670

Keyjote over 8 years

Thank you, thank you, thank you. My backup was incorrect. I was backing up the B server correctly, but on the other server I confused the locations and was copying a backup from another replicated database, which is why the timelines did not match up.