軟件環境:Windows Server 2008 R2 sp1 SQL Server 2008 R2 sp2
架構:ServerA(主體)+ServerB(鏡像)+ServerSub(訂閱)+ServerDist(分發)
特別注意:ServerDist分發服務器為單機,故開啟了Sync_With_Backup選項。
關于Sync_With_Backup:
初衷:簡單來說,開啟該選項,使得單機分發庫在宕機時,利用最近的系統數據庫和用戶數據庫備份,還原到一臺新的服務器上,即可使復制繼續工作成為可能。
需要注意的是:在分發庫開啟該選項,意味著發布庫的日志只有在分發庫被備份后才會截斷。
查看該選項是否開啟:
SELECT name,is_sync_with_backup FROM sys.databases WHERE name LIKE 'distribution%'開啟該選項:
sp_replicationdboption 'distribution','sync with backup','true'
場景:
在生產環境中,分發庫備份為半小時一次。
將主體從ServerA切換至ServerB時,Logreader Agent報錯:
直觀上可以看到,Logreader 認為dist_backup_lsn和dist_last_sn不一致,導致其無法繼續工作。下面我們實際去看一下這兩者到底是否一致。
下面是Logreader、Distribution Agent工作的步驟列表,為了方便,我都貼出來:
Log Reader Agent (logread.exe) – Sequence of Steps
1.Calls sp_MSadd_LogReader_History to write to MSLogReader_History – “Starting Agent”
2.sp_MShelp_logreader_agentid – Obtain log reader agent specific information for that publication
3.sp_MShelp_profile – Obtains profile information for the Log Reader
4.MSadd_logreader_history to write MSlogreader_history – “Initializing”
5.sp_MSget_last_transaction – determine where the log reader agent left off reading the log.
6.Read the transaction log – sp_replcmds
7.Processes the returned commands from the sp_replcmds in batches by calling sp_MSadd_repl_commands
8.Marks this transaction as committed in distribution database by using sp_repldone procedure
9.Adjusts the identity range if necessary and if you are using Automatic Identity Range Management y calling sp_MSpub_adjust_identity
10.Calls sp_MSget_last_transaction to check the last transaction read and stored in MSReplication_transactions table
11.When all transactions are read, LogRead.exe calls sp_MSAdd_logreader_history and writes message to MSLogReader_history “1 transactions with 9 commands were delivered”
•Distribution Agent (distrib.exe) - Sequence of Steps
1.master.db.sp_msget_jobstate – get the status of the job (if it is already started)
2.sp_Msadd_distribution_history – MSDistribution_history – Starting agent
3.sp_MSSubscription_Status – whether subscription has expired or the snapshot is ready
4.sp_server_info- determines the collation
5.sp_mshelp_subscriber_info – retrieve subscriber information
6.sp_mshelp_subscription_agentid – determine the name of the distribution agent
7.sp_Msadd_distribution_history – Initializing message – Msrepl_distribution_history
8.sp_Msadd_distribution_history – Connecting to Subscriber - Msrepl_distribution_history
9.so_datatype_info – to determine the data type mapping necessary to create the tracking table necessary for the Distribution agent
10.sp_MScheck_subscribe on subscription database – verifies that SQL Server Agent account is in sysadmin and db_owner role in subscription database
11.sp_mscreate_sub_tables on subscriber in subscription database – creates MSSusbcription_agents and MSreplication_subscriptions tables
12.Sp_MSinit_Subscription_agent – updates the Subscription agent information on subscription database
13.Retrieves transaction_timestamp and subscription_guid to determine what Distribution agent has already replicated to the Subscriber. Transaction_timestamp correlates to xact_seqno column in MSReplication_transactions table in distribution database. All values large than the xact_seqno will be replicated
14.If we are doing initial sync, Distribution Agent calls sp_MSupdatelastsyncinfo which updates MSreplication_susbcriptions and MSSusbcription_agents table
15.Starts to retrieve all transactions and their corresponding commands from MSReplication_transactions and MSreplication_commands table where transaction_timestamp column in subscription database < xact_seqno column in MSreplication_transactions table. Applies the transaction using sp_MS_get_repl_commands procedure
16.Issues dynamic SQL to update the MSreplication_subscriptions table with the last delivered transaction ID
17.sp_MSDistribution_history to write the MSrepl_distribution_history table with status message “nn transaction(S) with nn command(s) were delivered”在步驟5中,我們得知,Logreader用存儲過程sp_Msget_last_transaction來定位發布庫日志中最后一個被寫入到分發庫中的事務,我們找到這個存儲過程的源碼:
CREATE PROCEDURE sys.sp_MSget_last_transaction
(
@publisher_id int = NULL,
@publisher_db sysname,
@publisher sysname = NULL,
@max_xact_seqno varbinary(16) = NULL output
,@for_truncate bit = 0
)
AS
begin
declare @publisher_database_id int
declare @max_xact_id varbinary(16)
declare @sync_bit int
declare @sync_with_backup bit
set nocount on
-- security check
-- only db_owner can execute this
if (is_member ('db_owner') != 1)
begin
raiserror(14260, 16, -1)
return (1)
end
SELECT @sync_bit = 32
if @publisher_id is NULL
select @publisher_id = srvid from master.dbo.sysservers where
UPPER(srvname) = UPPER(@publisher)
-- Get publisher database id.
SELECT @publisher_database_id = id from MSpublisher_databases where publisher_id = @publisher_id and
publisher_db = @publisher_db
if exists ( select * from master.dbo.sysdatabases where
name = db_name() and
category & @sync_bit = 0)
select @sync_with_backup = 0
else
select @sync_with_backup = 1
if @for_truncate = 0
begin
select top 1 @max_xact_id = rt.xact_id, @max_xact_seqno = rt.xact_seqno
from
MSrepl_transactions rt
where
rt.publisher_database_id = @publisher_database_id and
not xact_id = 0x0
order by xact_seqno desc
end
-- If (1) requesting truncate lsn (distbackuplsn), (2) sync with backup is set
-- query the values from MSrepl_backup_lsn
else if @sync_with_backup = 1
begin
-- Get the last backed up lsn if available.
select top 1 @max_xact_id = valid_xact_id, @max_xact_seqno = valid_xact_seqno
from
MSrepl_backup_lsns
where
publisher_database_id = @publisher_database_id
end
-- If @publisher is not null, we are calling this sp from sp_replrestart
-- Don't return result set.
if @publisher is null
select @max_xact_id, @max_xact_seqno, @publisher_database_id
-- Don't return any result when requsting a truncate lsn and
-- the database is not in 'sync with backup' mode, which signal the
-- distribution agent to use last dist lsn to call sp_repldone.
where not (@sync_with_backup = 0 and @for_truncate = 1)
end
存儲過程中,代碼
else if @sync_with_backup = 1
begin
-- Get the last backed up lsn if available.
select top 1 @max_xact_id = valid_xact_id, @max_xact_seqno = valid_xact_seqno
from
MSrepl_backup_lsns
where
publisher_database_id = @publisher_database_id
end
可見,報錯中的dist_backup_lsn取自表MSrepl_backup_lsns
我們來查詢一下該表:
select * from MSrepl_backup_lsns列valid_xact_id為最后備份的事務ID,valid_xact_seqno為該事務中的LSN,即valid_xact_seqno為報錯中顯示的dist_backup_lsn,(請勿與圖中貼出的錯誤進行對比,上圖僅為錯誤示例截圖)。
我們再去分發庫MSrepl_transactions表中找到當前已經被讀取到分發庫的最大事務的LSN
SELECT TOP 1 * from
[dbo].[MSrepl_transactions] as t
where t.publisher_database_id = 5
ORDER BY t.entry_time DESC表中給出的最大事務號是:0x00000026000001B60004,而上面dist_backup_lsn為:0x0002030D00003E7A0004,果然不一致。
但是我們細想,MSrepl_backup_lsns表中的記錄半小時才更新一次(跟隨分發庫備份頻率),而MSrepl_transactions中的記錄是實時變化的,他們兩者在比較繁忙的OLTP系統中是不太可能一致的!!但是為什么在切換的時候會去檢查一致不一致?而我如果不切換,只將Logreader重啟,它是不會去檢查的。
所以我想:這個微軟也許最后給我的答案就是他們常用的“By Design”。
先不說這個,我們來繼續找一下到底是哪個存儲過程在對比這兩者,先找到該錯誤信息所屬的MessageID:
SELECT * FROM sys.messages WHERE text LIKE '%分發數據庫中可能存在%'然后我們去系統表中尋找使用了這個message_id的系統存儲過程:
SELECT b.name,a.definition FROM sys.all_sql_modules a
INNER JOIN sys.all_objects b ON b.object_id = a.object_id
WHERE a.definition LIKE '%18846%'
Oh,shit! 線索中斷了,我們找不到一個系統存儲過程使用了該message_id,所以這個判斷應該是在系統擴展存儲過程中。
總結:說到現在,不知道各位看官是否看明白,當你在分發庫開啟了sync_with_backup選項,在進行鏡像切換時,如果出現:
分發數據庫中可能存在不一致的狀態: dist_backup_lsn {00020191:000022ff:0004},dist_last_lsn {00020191:00002435:0004}
。請執行"sp_repldone NULL, NULL, 0, 0, 1",然后執行sp_replflush。請重新初始化對發布的所有訂閱。
這樣的錯誤,請不要慌張,執行分發庫備份作業即可。
文章列表