Here I try to explain, in a general fashion, that transport failure detection depends on the deployment of the network. I will explain this with the help of an example.
Example
Suppose there are two nodes, Node-1 and Node-2. A peer connection is already established between them, and they are exchanging messages on that connection. Now Node-1 sends a message MESSAGE-X to Node-2 and does not receive a response to it. How long Node-1 should wait for the response (say 10 ms), whether Node-1 should retry (say yes), and how many times it should retry (say twice) are all deployment-specific.
After satisfying all the deployment-specific conditions, Node-1 checks whether the network connection is broken. For this, Node-1 sends a DWR message to Node-2; if it does not receive the DWA within a specific period of time, it retries the DWR, up to 3 attempts in total (including the first DWR). If no DWA is received for any of the DWRs, Node-1 treats this situation as a connection failure and sends the other messages to the secondary peer.
If Node-1 receives a DWA with an error, that does not mean a connection failure, because Node-1 has received the DWA over the very connection whose transport it was checking. A DWA with an error, such as DIAMETER_TOO_BUSY or any other error, only informs Node-1 of the status of Node-2.
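The DWR check described above can be sketched in Python. This is a minimal illustration, not a stack implementation: `send_dwr` is a hypothetical callback that sends one DWR and returns the Result-Code of the DWA, or `None` if no DWA arrived before the timeout, and the three-attempt limit is the example value from the text.

```python
from typing import Callable, Optional

DIAMETER_TOO_BUSY = 3004  # example error Result-Code a busy peer might return


def check_transport(send_dwr: Callable[[], Optional[int]], max_attempts: int = 3) -> str:
    """Return "alive" if any DWA arrives within max_attempts DWRs, else "failover".

    send_dwr is a hypothetical callback: it sends one DWR and returns the
    DWA Result-Code, or None if no DWA arrived before the timeout.
    """
    for _ in range(max_attempts):
        if send_dwr() is not None:
            # Any DWA, even one carrying an error such as DIAMETER_TOO_BUSY,
            # shows that the transport connection is still up.
            return "alive"
    # No DWA for any attempt: treat it as a connection failure and fail over.
    return "failover"
```

For example, a peer that ignores the first two DWRs and answers the third with DIAMETER_TOO_BUSY still yields "alive", because the point of the check is reachability, not the Result-Code.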
Failover
The process of detecting a transport connection failure with a peer and forwarding all pending messages to the secondary peer node (alternate node) is known as failover.
AVP Structure of DWR and DWA
Device-Watchdog-Request
<DWR> ::= < Diameter Header: 280, REQ >
{ Origin-Host }
{ Origin-Realm }
[ Origin-State-Id ]
Device-Watchdog-Answer
<DWA> ::= < Diameter Header: 280 >
{ Result-Code }
{ Origin-Host }
{ Origin-Realm }
[ Error-Message ]
* [ Failed-AVP ]
[ Original-State-Id ] = [ Origin-State-Id ]
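To make the DWR grammar concrete, here is a minimal Python sketch that encodes a DWR byte-for-byte following the RFC 6733 header and AVP layout (20-byte header, 8-byte AVP header for non-vendor AVPs, 4-byte padding). It is an illustration only: vendor-specific AVPs are not handled, and the host and realm names used below are hypothetical.

```python
import struct
from typing import Optional

ORIGIN_HOST, ORIGIN_REALM, ORIGIN_STATE_ID = 264, 296, 278  # AVP codes
DEVICE_WATCHDOG = 280                                        # Command-Code


def encode_avp(code: int, data: bytes, flags: int = 0x40) -> bytes:
    """Encode a non-vendor AVP (M-bit set) and pad it to a 4-byte boundary."""
    length = 8 + len(data)  # 8-byte AVP header; AVP Length excludes padding
    padding = b"\x00" * ((4 - length % 4) % 4)
    return struct.pack("!IB", code, flags) + length.to_bytes(3, "big") + data + padding


def encode_dwr(origin_host: str, origin_realm: str, hop_by_hop: int,
               end_to_end: int, origin_state_id: Optional[int] = None) -> bytes:
    """Build a DWR: header with R-bit, Command-Code 280, Application-Id 0."""
    avps = encode_avp(ORIGIN_HOST, origin_host.encode())     # { Origin-Host }
    avps += encode_avp(ORIGIN_REALM, origin_realm.encode())  # { Origin-Realm }
    if origin_state_id is not None:                          # [ Origin-State-Id ]
        avps += encode_avp(ORIGIN_STATE_ID, struct.pack("!I", origin_state_id))
    length = 20 + len(avps)  # 20-byte Diameter header plus all AVPs
    header = (bytes([1]) + length.to_bytes(3, "big")                # Version, Message Length
              + bytes([0x80]) + DEVICE_WATCHDOG.to_bytes(3, "big")  # Flags (R), Command-Code
              + struct.pack("!III", 0, hop_by_hop, end_to_end))     # App-Id, H2H, E2E
    return header + avps
```

For instance, `encode_dwr("node1.example.com", "example.com", 1, 2)` yields a 68-byte message. A DWA would be built the same way, without the R-bit and with a Result-Code AVP (code 268) added.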
AVP Description
Failed-AVP: a grouped AVP that provides debugging information when a request is rejected or an error occurs during processing, such as an unsupported AVP.
Error-Message: provides the error in human-readable form.
Original-State-Id: misprinted in the RFC; it is actually Origin-State-Id.
Origin-State-Id: used to infer the state of the session/connection between two nodes. Whenever the state changes because of a break or disconnection in the session or transport, for instance a reboot, the rebooted node increases the value so that the other node becomes aware that the peer's state has changed and that all previous sessions are no longer valid. Origin-State-Id is stored in non-volatile memory on all nodes.
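Every time the node is rebooted or otherwise loses its previous state, Origin-State-Id is monotonically increased. Both communicating nodes store this ID per peer, so a higher value tells them that the peer has restarted. (Matching an answer to its request is done with the Hop-by-Hop Identifier in the Diameter header, not with Origin-State-Id.)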
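The Origin-State-Id rule above can be sketched as a small per-peer tracker. This is a hypothetical helper, not part of any stack's API: it remembers the last value seen from each peer, treats 0 as "make no inference" (as discussed in the comments below), and returns the sessions that became stale once a higher value arrives.

```python
from typing import Dict, Set


class PeerStateTracker:
    """Hypothetical helper remembering the last Origin-State-Id per peer."""

    def __init__(self) -> None:
        self.last_state: Dict[str, int] = {}     # peer identity -> Origin-State-Id
        self.sessions: Dict[str, Set[str]] = {}  # peer identity -> active Session-Ids

    def add_session(self, peer: str, session_id: str) -> None:
        self.sessions.setdefault(peer, set()).add(session_id)

    def on_origin_state_id(self, peer: str, value: int) -> Set[str]:
        """Return the Session-Ids that are no longer valid because the peer restarted."""
        if value == 0:
            return set()  # 0 means the peer does not want inferences made
        previous = self.last_state.get(peer)
        if previous is not None and value <= previous:
            return set()  # same (or older) state: nothing changed
        self.last_state[peer] = value
        if previous is None:
            return set()  # first value seen from this peer: just remember it
        return self.sessions.pop(peer, set())  # higher value: the peer lost its state
```

Wrap-around of the Unsigned32 counter is ignored here for brevity.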
Your comments, suggestions and questions are always welcome. I will try to clarify doubts to the best of my knowledge, so feel free to ask questions.
Hi Vinay,
Thanks for this article. I have a query, though.
Please let me know: when no Origin-State-Id is sent in the DWR, what Origin-State-Id value should we expect in the DWA?
I'm facing an issue where an Origin-State-Id AVP with invalid AVP bits is received in the DWA when no Origin-State-Id is sent in the DWR. The error is shown below:
#### <> <> <> <1322126427209>
180.20.100.90
origin.com
N/A
2001
Regards,
Rishi
Hi Rishi,
If there is no Origin-State-Id in the DWR, then there should not be any Origin-State-Id in the DWA.
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi Vinay,
Thank you for the article.
Let's take peer 1, configured to send a DWR every 30 seconds if no traffic is detected.
Peer 2 is configured the same way.
I'd like to verify something:
At t0, peer 1 sends a DWR.
At t0+30, peer 2 sends a DWR.
At t0+60, peer 1 sends a DWR.
Do you think the DWR is considered traffic, so that peer 1, on receiving the DWR at t0+30, would wait another 30 seconds before sending its second DWR, that is, at t0+60?
Thank you
Nicolas.
Hi Nicolas
A DWR/DWA exchange happens when there is no traffic between two nodes for a given period of time (i.e., if we have configured 30 seconds as the DWR time and no message is exchanged between the two nodes for 30 seconds, a DWR is triggered).
Hence, under load there is never a case where no message is exchanged for such a long time (the time configured for DWR is generally 2-5 seconds). Therefore DWR is not part of the load.
Under load the system is busy with messages, so no DWR occurs.
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi,
I think Nicolas was asking whether a DWR itself is considered a traffic message that could reset the other peer's watchdog timer. If not, the DWR frequency would not be influenced by the other peer, and you would have something like this:
At t0, peer 1 sends a DWR.
At t0+10, peer 2 sends a DWR.
At t0+30, peer 1 sends a DWR.
At t0+40, peer 2 sends a DWR.
...
Could you clarify ?
BRs
Hello,
Both peers send DWR messages independently when there is no traffic.
BR
Aleksandar
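The timing question in this thread can be explored with a small simulation. It is a simplified model, not taken from any stack: one peer's Tw timer restarts whenever that peer receives a message or sends a DWR, and DWAs are assumed to arrive instantly. Whether the other peer's DWR counts as received traffic is exactly the open question, so it is modelled by choosing what goes into `received_at`.

```python
from typing import List


def dwr_send_times(tw: float, received_at: List[float], horizon: float) -> List[float]:
    """Return the times at which one peer sends a DWR, up to `horizon`.

    Simplified model: the Tw timer restarts whenever this peer receives a
    message (times listed in `received_at`) or sends a DWR.
    """
    if tw <= 0:
        raise ValueError("tw must be positive")
    events = sorted(t for t in received_at if t <= horizon)
    sends: List[float] = []
    last_activity = 0.0
    i = 0
    while last_activity + tw <= horizon:
        deadline = last_activity + tw
        if i < len(events) and events[i] < deadline:
            # Traffic arrived before Tw expired: restart the timer from it.
            last_activity = max(last_activity, events[i])
            i += 1
        else:
            sends.append(deadline)  # Tw expired with no traffic: send a DWR
            last_activity = deadline
    return sends
```

With Tw = 30 s and no traffic, `dwr_send_times(30, [], 100)` gives `[30.0, 60.0, 90.0]`. Adding the other peer's DWR arrival times to `received_at` shows how its DWRs would shift this peer's schedule if DWRs were counted as traffic; leaving them out models fully independent timers, as described in the last reply.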
Hello Vinay,
Thanks for the nice article. Let's say there is an X-request message waiting for a Y-answer message. How long will the device wait for the answer? Is it application-specific or session-specific (depending on a particular session, say an IP-CAN session for Gx)?
Hi Moumita Barman,
It should wait until the request times out.
The operator shall configure a time (generally in milliseconds) at the client node that specifies how long the client should wait for a reply from the server. If the client receives the answer from the server after that time frame, it shall discard the answer, because as soon as the request has timed out, the session ID corresponding to the request message is no longer valid.
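A client-side timeout of this kind can be sketched as a table of outstanding requests. This is a hypothetical helper under stated assumptions: it is keyed here by the Hop-by-Hop Identifier, which is what Diameter uses to match an answer to its request, and `timeout_s` stands for the operator-configured value.

```python
import time
from typing import Callable, Dict, List


class PendingRequests:
    """Hypothetical client-side table of outstanding requests.

    Keyed by Hop-by-Hop Identifier; timeout_s is an assumed operator setting.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.pending: Dict[int, float] = {}  # hop-by-hop id -> time the request was sent

    def sent(self, hop_by_hop: int) -> None:
        self.pending[hop_by_hop] = self.clock()

    def expire(self) -> List[int]:
        """Remove and return the requests whose timeout has elapsed."""
        now = self.clock()
        expired = [h for h, t in self.pending.items() if now - t >= self.timeout_s]
        for h in expired:
            del self.pending[h]
        return expired

    def on_answer(self, hop_by_hop: int) -> bool:
        """True if the answer matches a live request; False means discard it."""
        self.expire()
        return self.pending.pop(hop_by_hop, None) is not None
```

Injecting `clock` makes the timeout easy to test. In a real node, the expired entries would also feed the retry or failover decision described at the start of the article.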
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi,
I have a situation where DWR/DWA are exchanged between the Diameter stack and the peer, but the peer does not respond to other messages such as CCR.
The DWR timer is configured as 20 seconds. From the stats I see that DWR/DWA are exchanged but CCRs are not answered. How can this happen when DWR and CCR are packets on the same TCP stream, i.e., how can the peer answer the DWR but not send a CCA?
Thanks,
Achal
Another query:
The DWR timer is configured as 20 seconds.
The first DWR is sent at T0.
Should the client wait for the DWA until T0+20 seconds? Where in the RFC can I find the DWR retransmission logic?
Hi Vinay,
Does the watchdog timer need to be enabled separately, or are DWR/DWA triggered by default?
Hi Kamal,
It is a Diameter-stack-dependent thing; it depends entirely on how the stack vendor provides it. Generally there is a provision to change the default interval of the DWR/DWA message.
The standard says that two nodes shall check whether the link is up or not.
Hi,
For the first DWR we got a DWA, and immediately after getting the DWA the client sends a second DWR; after that we get the error SCTP: ABORT: User Initiated Abort. Is the issue in the DWR timer value or in the association?
Hi Bharath
DWR/DWA messages are used to check whether the SCTP/TCP link between two nodes is up or not (specifically the TCP link, because TCP has no link health-check mechanism like SCTP's heartbeat).
There is no relation between the DWR time and the SCTP ABORT. If no message is exchanged for a certain period of time (the DWR time), the node sends a DWR to check whether the link and the other node are up.
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi,
What if Node-1 does not send the DWR? It only sends a CER, receives a CEA, and that's all.
I have a problem: Node-1 does not send the DWR. Does anyone know what is happening? Node-1 sends the CER and receives the CEA, but that's all; the connection is not established.
Hi Cruz,
This issue happens for one of the following reasons:
1) The CEA does not come with DIAMETER_SUCCESS, or there is no common application.
Kindly check the CEA, or post a trace captured with tshark; the following link should help you.
http://diameter-protocol.blogspot.in/2013/04/capture-diameter-messages-without-wire.html
2) Two peer nodes of NODE-1 or NODE-2 have the same Diameter Identity.
In this case the connection toggles: the node drops the earlier connection; then the earlier connection retries, and the new connection is dropped.
Kindly check the Diameter Identity of each node.
3) (Unusual case) Another message is received before the CEA; the connection then sometimes goes into an unknown state.
If you could share some more details, it would help everyone solve it. Sometimes these issues are implementation-specific.
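Checks 1) and 2) can be expressed as a small helper. This is only a sketch: it assumes the Result-Code and application IDs have already been decoded from the CER/CEA, the list of peer identities is hypothetical input (for example, taken from your peer configuration), special cases such as a relay's application ID are ignored, and case 3) is not covered. The IDs 16777238 (Gx) and 16777251 (S6a) are used only as sample values.

```python
from collections import Counter
from typing import Iterable, List

DIAMETER_SUCCESS = 2001


def diagnose_capabilities_exchange(result_code: int, local_apps: Iterable[int],
                                   peer_apps: Iterable[int],
                                   peer_identities: Iterable[str]) -> List[str]:
    """Return likely reasons why the Diameter connection did not come up."""
    problems = []
    if result_code != DIAMETER_SUCCESS:
        problems.append(f"CEA Result-Code is {result_code}, not DIAMETER_SUCCESS (2001)")
    if not set(local_apps) & set(peer_apps):
        problems.append("no common application advertised in CER/CEA")
    for identity, count in Counter(peer_identities).items():
        if count > 1:
            problems.append(f"{count} peers share the Diameter Identity {identity}")
    return problems
```

An empty list means none of these causes applies, and a packet trace is the next step.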
Thanks for your query.
Happy to help you again.
Team-Diameter
A question on transport failure detection in Diameter:
Say I have an established Diameter peer connection and my watchdog timer is 30 seconds.
Now suppose I do an ifconfig down on the IP interface over which the peer connection is established.
How long will it take my local Diameter layer to detect that the IP interface has gone down? Will this be immediate, or will it have to go through the watchdog procedure?
thanks,
Vijaya
Hi VV,
I consider the following cases:
1) Load condition: Under load, the watchdog request does not come into the picture; as stated in the article, the watchdog exchange happens only when no message is exchanged between peers for 30 seconds (the watchdog time). Because the system is heavily loaded, detection of the transport connection failure would be immediate in this case.
2) Lean-hour condition: If no message is exchanged between the nodes for 30 seconds, the failure is only detected with the DWR message, i.e., either the stack cannot initiate the DWR or the DWR times out because the DWA is not received in the expected time. So the detection time would be 30 seconds + the timeout.
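The lean-hour estimate above is simple arithmetic. As a sketch, the DWR timeout and the number of DWR attempts are assumed, deployment-specific inputs (the article's example uses three attempts, while this reply counts one):

```python
def lean_hour_detection_time(watchdog_s: float, dwr_timeout_s: float,
                             dwr_attempts: int = 1) -> float:
    """Worst-case seconds to detect a dead link when there is no traffic:
    one idle watchdog interval, then every unanswered DWR times out."""
    return watchdog_s + dwr_attempts * dwr_timeout_s
```

With a 30 s watchdog and a hypothetical 2 s DWR timeout, one attempt gives 32 s and three attempts give 36 s.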
Regards
Ajay
If Origin-State-Id is sent in the CER with value 0, is it mandatory to send Origin-State-Id set to 0 in the CEA?
Hi Vijay,
An Origin-State-Id set to zero shall be treated as if Origin-State-Id were not present in the request.
Hi Vinay, I have a couple of questions about transport failure.
Let's say, as in your example, we have Node 1 and Node 2 connected and exchanging messages.
If I understand RFC 3539 correctly, the Tw timer is reset (with jitter) for every answer message. So, as you say, in the busy hour the DWR is never sent.
So let's say Node 1 has sent a CCR to Node 2 and the response timeout (10 ms in your example) expires. Node 1 checks whether it should retry ('yes', twice, as per your example), so we would see two more attempts before Node 1 stops retrying the request. Each retry would reset the Tw timer.
There are a couple of things I need some help with:
- I'm not sure I understand why the DWR would be initiated after 3 failures (as per local config). I assume this is because Tw is reset on answers and not on requests, so although more requests may be sent, the lack of answers means that Tw will expire.
- How does the Credit-Control Tx timer overlay onto the base response timeout? I.e., if Tx was 5 ms and we set the Credit-Control application to Terminate, so that no further attempts are made, does this override the base config?
- Lean hour vs. busy hour: RFC 3539 suggests that in a busy hour it may take 2Tw to fail over. I assume this is because only a DWR/DWA failure can be used to infer that the peer is down?
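On the Tw timer itself, here is a minimal sketch of how a jittered interval could be computed. It follows my reading of RFC 3539 section 3.4.1 (Twinit defaulting to 30 seconds, never below 6 seconds, and a uniform jitter of up to 2 seconds either way); treat those constants as something to verify against the RFC text.

```python
import random
from typing import Optional

TWINIT_DEFAULT_S = 30.0  # RFC 3539 suggested default for Twinit
TWINIT_MIN_S = 6.0       # RFC 3539: Twinit must not be set below 6 seconds


def next_tw(twinit_s: float = TWINIT_DEFAULT_S, rng: Optional[random.Random] = None) -> float:
    """One watchdog interval: Twinit plus uniform jitter in [-2, +2] seconds."""
    if twinit_s < TWINIT_MIN_S:
        raise ValueError("Twinit must be at least 6 seconds")
    rng = rng or random.Random()
    return twinit_s + rng.uniform(-2.0, 2.0)
```

Because a DWR is only sent after one Tw of silence and failover only happens when a second Tw passes without an answer, failover in the lean hour can take up to roughly 2Tw, which matches the 2Tw figure mentioned above.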
Kind regards Jim
Hi Jim
We hope we have understood your point of view correctly and are not steering you away from your point.
If the DWA for a DWR is not received within the given time (the timeout), this implies a transport-layer failure between the two adjacent nodes, called peers.
In a strict implementation of RFC 6733:
If a CCA is not received, that does not imply a transport failure between peers, because there can be an intermediate node between the CCR client and the CCR server; for the CCR client, the peer is the intermediate node.
The following link can help you.
http://diameter-protocol.blogspot.in/2013/08/diameter-connection-establishment.html
Our team has also inserted an image on this blog explaining DWR.
Thanks for your query.
Happy to help you again.
Team-Diameter
Many thanks, much appreciated.
Kind regards
Jim
I want to understand how the DWR exchange differs from the SCTP HEARTBEAT mechanism. A Diameter node using SCTP as its transport layer will detect transport failures anyway through the HEARTBEAT messages exchanged between the two SCTP endpoints, so why is there still a need to exchange DWR/DWA messages to detect transport failures?
Yes Vijay,
You are right.
If we are using TCP, there is no heartbeat mechanism like SCTP's. A Diameter node can use either transport; that is why DWR exists in Diameter implementations.
Thank you Ethan for the clarification. Does this mean that a Diameter node using SCTP as the transport layer need not (or should not) use DWR/DWA messages? Is this documented anywhere in the RFC?
@Ethan
Your clarification is correct.
@Vijay
DWR is a proactive way to detect transport failure. No reference document says that SCTP nodes should not implement it.
A node acting as a server MUST support both TCP and SCTP connections. A client can use TCP or SCTP.
Ok, got it. Thank you.
I have a query regarding the Failed-AVP content to be encoded whenever a Diameter node returns the DIAMETER_MISSING_AVP error. The RFC describes the following:
7.1.5. Permanent Failures
DIAMETER_MISSING_AVP 5005
The request did not contain an AVP that is required by the Command
Code definition. If this value is sent in the Result-Code AVP, a
Failed-AVP AVP SHOULD be included in the message. The Failed-AVP
AVP MUST contain an example of the missing AVP complete with the
Vendor-Id if applicable. The value field of the missing AVP
should be of correct minimum length and contain zeroes.
7.5. Failed-AVP AVP
……
A Diameter message SHOULD contain one Failed-AVP AVP, containing the
entire AVP that could not be processed successfully. If the failure
reason is omission of a required AVP, an AVP with the missing AVP
code, the missing Vendor-Id, and a zero-filled payload of the minimum
required length for the omitted AVP will be added.
I am confused about the value to be encoded as defined in the two sections above (one section says the value should contain zeroes, and the other says it should be a zero-filled payload).
What is the expected result? Should the value field be left empty, or should it be encoded with the value "00" (one byte) followed by padding bytes?
Failed-AVP is a grouped AVP.
It is implied that the data field of the missing AVP shall be filled with zeros up to the minimum length.
Failed-AVP ::= < AVP Header: 279 >
1* { Missing-AVP Header: - - - [Data] }   (Data shall be filled with zeros)
Thanks for your query.
Happy to help you again.
Team-Diameter
OK, can you confirm whether the following encoding is correct? For example, for the "Origin-Realm" AVP it would look as below:
+ Failed-AVP
::= < AVP Header: 279 >
::= Origin-Realm
AVP Code: 296
AVP Flags: 0x40
AVP Length: 8
---> Data field is empty
Wireshark/tshark is the tool to check format.
DeleteThanks for your query.
Happy to help you again.
Team-Diameter
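As a sketch of the zero-filled rule, here is one way to encode a Failed-AVP around an example of the missing AVP. It handles non-vendor AVPs only (the Vendor-Id case is omitted), and the caller decides the "minimum length". That length is unambiguous for fixed-size types, e.g. 4 bytes for an Unsigned32 such as Origin-State-Id, while for an OctetString-based AVP such as Origin-Realm the empty data field from the question above is one possible reading.

```python
import struct

FAILED_AVP = 279


def encode_avp(code: int, data: bytes, flags: int = 0x40) -> bytes:
    """Encode a non-vendor AVP (M-bit set) padded to a 4-byte boundary."""
    length = 8 + len(data)  # AVP Length counts header + data, not padding
    padding = b"\x00" * ((4 - length % 4) % 4)
    return struct.pack("!IB", code, flags) + length.to_bytes(3, "big") + data + padding


def failed_avp_for_missing(code: int, min_length: int) -> bytes:
    """Failed-AVP wrapping an example of the missing AVP with a zero-filled payload."""
    missing = encode_avp(code, b"\x00" * min_length)
    return encode_avp(FAILED_AVP, missing)
```

For a missing Origin-State-Id, `failed_avp_for_missing(278, 4)` produces a 20-byte Failed-AVP whose inner AVP has length 12 and four zero data bytes; Wireshark/tshark can then be used to confirm that the peer decodes it as expected.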
Hello,
In the example above, suppose there is an underlying transport link failure between Node-1 and Node-2, but Node-1 has not yet treated Node-2 as a suspect Diameter peer because Tw has not expired, and no DWR/DWA exchange has yet taken place to conclude that Node-2 is suspect and that the transport link has failed.
Questions:
1) I believe Node-1's Tx timer keeps expiring and Node-1 keeps resending the CCR to Node-2, setting the T-bit on each retransmission, until the configured number of retransmissions is reached?
2) If during this time window Tw expires and Node-1 starts sending a DWR to Node-2 while it has not exhausted its configured CCR retransmissions, can Node-1 send the CCR and the DWR to Node-2 simultaneously?
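On the T-bit in question 1), here is a minimal sketch of how a retransmitted request is marked. RFC 6733 defines the T flag (0x10) in the header's flags byte for requests that may be retransmitted, for example after failover. This sketch only touches that byte; whether other header fields change on retransmission is out of scope, and the sample CCR header used below is hypothetical.

```python
FLAG_R, FLAG_P, FLAG_E, FLAG_T = 0x80, 0x40, 0x20, 0x10  # Diameter header flag bits


def mark_retransmission(message: bytes) -> bytes:
    """Return a copy of a request with the T (potentially retransmitted) flag set."""
    if len(message) < 20 or not message[4] & FLAG_R:
        raise ValueError("expected a Diameter request (20-byte header with R-bit)")
    flags = message[4] | FLAG_T
    return message[:4] + bytes([flags]) + message[5:]
```

A CCR header with flags 0xC0 (R and P set) becomes 0xD0 after marking, and the original bytes are left untouched.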
Thanks.
Sam
Hi all,
Can anyone help with this?
1) Have you ever used the Seagull tool as a client for pumping an Sy call flow?
When I use Seagull as a client, I need to put in a timeout as per my requirement. During that time a DWR message is received from the server, and the Seagull client responds with a DWA; after that, subsequent DWR messages are sent by the server, but Seagull never sends a DWA.
Has anyone faced this problem? Kindly provide a solution.
2) Actually, when no traffic is exchanged between two nodes within 30 min, DWR and DWA are initiated. Is this time configurable on both the server and the client?
Point 2 relates to the 3GPP standards: can we configure the DWR/DWA time on both the client and the server side?
Please correct me if I am wrong.
Thanks in advance
Hi Team-Diameter,
I have two questions.
1. If a connection to the Diameter server is already established and we try to open a second connection to the server using the same client identity, how will the server react?
2. If the new Origin-State-Id in a CER is greater than the older Origin-State-Id, will the server clear any old socket with the same Diameter client (if any exists, where the server uses the watchdog mechanism to determine the connection state but the watchdog timer has not yet expired)?
Hi Devesh,
Implementations of what we suggest may vary across vendors' Diameter stacks; here we explain what RFC 6733 says.
1) If a Diameter server receives a CER again with the same Diameter identity as an established connection, the server responds to the second CER with a CEA, establishes the Diameter connection on the basis of the second CER, and disconnects the first connection created by the first CER. In this scenario the server assumes that the client may have rebooted and is sending a fresh request to create a Diameter connection with the same Diameter identity; as we know, CER is the first message exchanged to establish a Diameter connection.
We have observed the following with different vendor stacks in the context of the above explanation; do share if you have observed anything new.
a) The stack does not allow another node to connect with the same Diameter identity.
b) The Diameter connection flip-flops between two clients: the second client breaks the first client's connection by sending a CER with the same identity, and the first client's retry of its broken connection then breaks the connection created by the second client.
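Both behaviours can be sketched with a hypothetical server-side peer table. This is not any vendor's code: the "replace" policy follows point 1) above and the "reject" policy mirrors observation a); the identities and connection labels below are made up.

```python
from typing import Dict, Optional, Tuple


class PeerTable:
    """Hypothetical server-side map from Diameter Identity to connection.

    policy="replace": a new CER with a known identity replaces the old connection.
    policy="reject":  the new connection is refused.
    """

    def __init__(self, policy: str = "replace") -> None:
        if policy not in ("replace", "reject"):
            raise ValueError("policy must be 'replace' or 'reject'")
        self.policy = policy
        self.connections: Dict[str, object] = {}

    def on_cer(self, identity: str, conn: object) -> Tuple[bool, Optional[object]]:
        """Return (accepted, connection_to_drop)."""
        old = self.connections.get(identity)
        if old is None:
            self.connections[identity] = conn
            return True, None
        if self.policy == "reject":
            return False, None
        self.connections[identity] = conn
        return True, old
```

With the replace policy, two clients misconfigured with the same identity keep displacing each other on every reconnect, which is the flip-flop in observation b).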
2) Working on it.
We hope our suggestions help you.
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi Devesh,
If a Diameter entity receives a new Origin-State-Id higher than the previous one, it is an indication that all previous sessions no longer exist, and the resources associated with them can be freed.
Thanks for your query.
Happy to help you again
Team-Diameter
This is the same scenario I'm facing with one of my nodes.
In case of failover, Node-1 tries to establish the connection with Node-2 (CER) using the same host name, but Node-2 is probably not accepting the connection, and sessions are dropping.
Can you please give the exact reference (RFC/section) for the answer you posted based on the RFC?
“If a DIAMETER server receives CER message again on established connection with same DIAMETER identity, then server would respond to second CER with CEA and establish the diameter connection on the basis of second CER, It shall disconnect First connection created by first CER. Because in this scenario Server would assume that client might have been rebooted and sending a fresh request to create DIAMETER connection with same DIAMETER IDENTITY. As we know CER is the first message exchanged to establish a DIAMETER connection.”
Hi,
After a server restart, the client initiates a CER. Can it use the same Origin-State-Id, or should the client use an incremented Origin-State-Id?
br,
Neeraj Surana
Hi Team,
I am getting the "DIAMETER_LOGOUT" error.
Could anyone please let me know what the reason could be?
Regards,
Harish
Hi Harish,
As far as we understand the scenario, you are working on a session-based application and the client has logged out (signed out). The client therefore sends an STR (Session-Termination-Request) to the server with the reason in the Termination-Cause AVP, i.e., the user has logged out, telling the server to close the session.
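When debugging this, note that DIAMETER_LOGOUT is a Termination-Cause value (1) carried in the STR, not an error Result-Code. A small lookup sketch using the Termination-Cause values defined in the base protocol helps decode what the client reported:

```python
TERMINATION_CAUSE = {  # Termination-Cause AVP (code 295) values, RFC 6733
    1: "DIAMETER_LOGOUT",
    2: "DIAMETER_SERVICE_NOT_PROVIDED",
    3: "DIAMETER_BAD_ANSWER",
    4: "DIAMETER_ADMINISTRATIVE",
    5: "DIAMETER_LINK_BROKEN",
    6: "DIAMETER_AUTH_EXPIRED",
    7: "DIAMETER_USER_MOVED",
    8: "DIAMETER_SESSION_TIMEOUT",
}


def describe_termination_cause(value: int) -> str:
    """Map a Termination-Cause value to its name, or flag it as unknown."""
    return TERMINATION_CAUSE.get(value, f"unknown Termination-Cause {value}")
```

For example, `describe_termination_cause(1)` returns "DIAMETER_LOGOUT". Applications may define further values, which this sketch reports as unknown.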
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi Team,
I have a scenario where Node A sent a Capabilities-Exchange-Request and Node B sent a Capabilities-Exchange-Answer with the DIAMETER_SUCCESS result code. Then, after 29.79 seconds, Node B initiates a watchdog request, but Node A does not send any response to it.
Also, after 30.28 seconds, Node A initiates an SCTP ABORT with the error cause User-Initiated Abort.
User Initiated Abort (12)
Cause of error
--------------
This error cause MAY be included in ABORT chunks which are send
because of an upper layer request. The upper layer can specify
an Upper Layer Abort Reason which is transported by SCTP
transparently and MAY be delivered to the upper layer protocol
at the peer.
Now the questions :)
1. Why did Node A send the SCTP ABORT (user-initiated)? Is it because the upper layer, i.e., Diameter, did not receive the watchdog request, so Diameter asked SCTP to initiate the abort?
2. What can be the reason for Diameter asking SCTP to initiate the abort (is it a transport-layer failure detected by Diameter)?
3. After a successful capabilities exchange, which node initiates the watchdog request if there is no Diameter traffic?
Thanks in advance.
Regards
Victor
Hi Team,
I was really hoping for an answer on this. It would be a great help to me if I received some comments.
Regards
Victor
Hi Victor
Sorry for the delayed response.
3) It does not matter which node initiates the DWR first. DWR is used to check the status of the transport.
Here we see a strange thing: why is your DWR time set so long (29.79 seconds)?
As we know, Diameter is an application-layer protocol that runs over a transport-layer protocol (TCP or SCTP), so we first need to check whether the transport is working.
So kindly tell us which SCTP messages were exchanged during those 29.79 seconds.
Check whether SCTP heartbeat messages are exchanged.
Kindly try to reduce the DWR time to a few milliseconds.
Kindly revert
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi Team,
Thanks for your reply. I agree with you that there is some problem with the transport layer.
Yes, there is an SCTP heartbeat message sent from Node A to which Node B did not respond.
The SCTP messages exchanged between the two nodes are:
Node A                              Node B
INIT -------------------------------->
<---------------------------- INIT_ACK
COOKIE_ECHO ------------------------->
<-------------------------- COOKIE_ACK
(after this, Diameter is established)
CER --------------------------------->
<--------------------------------- CEA
SCTP HEARTBEAT ---------------------->
<--------------------------------- DWR
SCTP ABORT -------------------------->
So, from the above, Node A sends the SCTP ABORT because Node B did not respond to the SCTP heartbeat message.
But just one last question :) Why didn't Node A respond to the DWR? Is it because of the transport layer, i.e., Node A never received the DWR message, and is that the same reason why Node B did not respond to the heartbeat message?
Am I right? Kindly let me know your views too.
Thanks and regards
Victor
Hi,
Since DWR/DWA are not part of the load, is it possible for a Diameter peer to combine a DWA with another application message?
I do see such combined messages in the same packet.
Thank you
Hi Dave
Kindly share the use case for the above so that we can understand your point of view.
Thanks for your query.
Happy to help you again.
Team-Diameter
Thank you for your reply.
It's a case where, for example, I see an Update-Location message, and within the same packet I also see the DWR/DWA, so it appears (in Wireshark) as follows:
DIAMETER 970 cmd=3GPP-Update-Location Answer(316) flags=-P-- appl=3GPP S6a/S6d(16777251) h2h=1326b498 e2e=1326b498 | cmd=Device-Watchdog Answer(280) flags=---- appl=Diameter Common Messages(0) h2h=2987f1 e2e=2987f1 |
Is this normal?
Hi Dave,
We see the issue mentioned above as a filter issue; we feel you have not applied a detailed filter. Kindly use the command below.
tshark -Y diameter -V | grep 'Frame\|Arrival Time:\|Internet Protocol Version\|Src Port:\|Diameter Protocol\|Request:\|Command Code:\|AVP:\|Result-Code'
http://diameter-protocol.blogspot.in/2013/04/capture-diameter-messages-without-wire.html
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi, what happens when you have a DRA in place? Since Node-1 sends messages to Node-2 through the DRA, if Node-2 is down, does Node-1 have no idea about Node-2?
Hi Naseem Rahman,
The DRA shall return the answer with the Result-Code set to DIAMETER_UNABLE_TO_DELIVER. The following link should help you.
http://diameter-protocol.blogspot.in/2011/05/diameter-errors.html
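The DRA's side of this can be sketched as follows. This is a hypothetical helper, not any product's routing logic: `candidates` are the peers that could serve the request, `peer_state` holds each peer's watchdog state using the RFC 3539 state names (only "OKAY" is treated as usable here), and when nothing is usable the DRA answers the originator with DIAMETER_UNABLE_TO_DELIVER (3002).

```python
from typing import Dict, List, Tuple, Union

DIAMETER_UNABLE_TO_DELIVER = 3002


def dra_route(candidates: List[str], peer_state: Dict[str, str]) -> Tuple[str, Union[str, int]]:
    """Return ("forward", peer) for the first usable candidate, or
    ("answer", 3002) so that Node-1 learns the request cannot be delivered."""
    for peer in candidates:
        if peer_state.get(peer) == "OKAY":
            return ("forward", peer)
    return ("answer", DIAMETER_UNABLE_TO_DELIVER)
```

In other words, Node-1 does not need to track Node-2's health itself: the DRA, which is Node-2's actual peer, converts the failure into an answer.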
Thanks for your query.
Happy to help you again.
Team-Diameter
As per RFC 3539 section 3.4.1:
Suppose there are two nodes, Node A and Node B. Node A has detected inactivity (no request/response received within Tw) and has initiated a DWR to Node B. Suppose Node A has not received anything within another Tw (so 2Tw have elapsed). What should Node A's behaviour be now?
1. Node A should fail over the traffic to the secondary node (if available).
2. Node A should fail over the traffic and initiate another DWR, without breaking the transport connection (with primary Node B).
3. Node A should fail over the traffic and directly break the transport connection (and then try to reconnect to this node).
I presume that this behaviour is the same for TCP and SCTP.
Hi Gaurav,
Any of the behaviours mentioned above is possible. Node A's behaviour depends entirely on the vendor's node configuration and deployment strategy.
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi,
Suppose Node A sends a DWR to Node B. Suppose Node A has not received anything within another Tw and has set the pending flag.
After that, Node A sends another DWR and does not receive anything within another Tw, but does receive CCR messages continuously within Tw.
In this case, according to the RFC's algorithm, Tw will be reset continuously but the pending flag remains set, and failover will happen some time later.
I think the pending flag should be reset when non-DWA messages are received. What do you think?
Hi S.C. Yang,
If Node A is receiving CCRs before Tw expires, Node A should not initiate the next DWR message; rather, it should work on processing the CCRs.
The basic idea of DWR is to check whether the transport connection between nodes is up. If a node is continuously receiving messages from the other node, DWR does not come into the picture.
Thanks for your query.
Happy to help you again
Team-Diameter
Yes, I know that.
But I supposed that the first DWR failed and a second DWR was sent but not answered with a DWA, while other messages were still being received at that time.
It is a hypothetical case.
I am curious about this RFC implementation logic.
I work at a telecom company.
Hi S.C. Yang,
RFC 6733 gives us only constraints; the implementation is deployment-specific to a vendor.
In the situation described above, kindly do the following to find the cause of the misbehaviour:
1) Kindly capture a trace and share it.
2) How are you so sure that the DWR from Node A is reaching Node B?
3) Kindly share the logs of both nodes.
If the above is a hypothetical situation, there can be multiple ways to handle it.
1) Process the CCR straight away and send the response, since, as you say, CCRs are received continuously, even before Tw expires.
Receiving other messages continuously before Tw expires means the transport is up. There is no DWR under load.
Thanks for your query.
Happy to help you again
Team-Diameter
Really, thank you for the fast response.
I didn't mean to argue; I'm just curious and want to understand vendor-specific implementations. I just wonder about the RFC failover algorithm.
Here is a hypothetical situation.
This is just a hypothetical situation for verifying the failover algorithm based on the RFC (not a real situation).
If two DWRs fail, failover occurs.
Node-1 Node-2
----------------------------
pending flag=0
(no load on link)
DWR_1 ------------->
<---x(fail)--- DWA_1
pending flag=1
(sudden load applied, e.g., continuous CCRs or CCAs incoming)
(the timer is continuously reset, so no second DWR_2 is triggered,
but the pending flag is still set to 1)
(( 1 month later ))
pending flag=1
(no load on link)
DWR_2 ------------->
<---x(fail)--- DWA_2
pending flag=2
(failover occurs because the pending flag was already set
 1 month earlier)
The problem is that just one DWR failure causes failover, because the pending flag was set one month earlier.
Hello S.C. Yang,
Your hypothetical situation is good, but again you have missed the basic idea of DWR/DWA.
When the first DWR goes unanswered, it should be retransmitted whether or not load arrives.
Alternatively: when the first DWR goes unanswered and load suddenly arrives, you must reset the flag, because the transport is evidently up and messages are being exchanged.
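The disagreement above comes down to whether non-DWA traffic clears the pending flag. The following sketch replays the hypothetical timeline under both choices. It is an illustration only, not the RFC state machine: `Watchdog`, `replay`, the event names and the threshold of two unanswered DWRs (taken from the example) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Hypothetical DWR/DWA watchdog for one peer connection (illustration only)."""
    max_unanswered: int = 2                 # failover threshold from the hypothetical example
    reset_pending_on_traffic: bool = False  # True = clear the flag on any received message
    pending: int = 0                        # the thread's "pending flag": unanswered DWRs so far

    def on_message(self, is_dwa: bool) -> None:
        # A DWA always answers the outstanding DWR. Other traffic clears the
        # flag only in the "reset on load" variant. (The Tw timer restart that
        # any received message causes is not modelled here.)
        if is_dwa or self.reset_pending_on_traffic:
            self.pending = 0

    def on_timer_expiry(self) -> str:
        # Tw expired while the link was idle.
        if self.pending >= self.max_unanswered:
            return "FAILOVER"   # too many DWRs went unanswered
        self.pending += 1       # send a DWR and wait for its DWA
        return "SEND_DWR"


def replay(events: list[str], reset_pending_on_traffic: bool) -> list[str]:
    """Feed a timeline of 'timer', 'traffic' and 'dwa' events; return the actions taken."""
    wd = Watchdog(reset_pending_on_traffic=reset_pending_on_traffic)
    actions = []
    for ev in events:
        if ev == "timer":
            actions.append(wd.on_timer_expiry())
        elif ev in ("traffic", "dwa"):
            wd.on_message(is_dwa=(ev == "dwa"))
        else:
            raise ValueError(f"unknown event: {ev!r}")
    return actions


# DWR_1 unanswered, a month of CCR/CCA load, then the link goes idle
# and DWR_2 is also unanswered.
timeline = ["timer", "traffic", "traffic", "timer", "timer"]
print(replay(timeline, reset_pending_on_traffic=False))  # ['SEND_DWR', 'SEND_DWR', 'FAILOVER']
print(replay(timeline, reset_pending_on_traffic=True))   # ['SEND_DWR', 'SEND_DWR', 'SEND_DWR']
```

When traffic leaves the flag untouched, the DWR that went unanswered a month earlier still counts, so one more lost DWA is enough for failover. Clearing the flag on traffic, as suggested in the reply, means the month-old miss no longer contributes.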
Hopefully this helps you understand the situation.
Very good explanation, based on a good understanding of the RFC. Thank you :)
I understand the significance of Origin-State-Id in CER, but I do not understand how it is handled if it is sent in a DWR. What is the significance of sending it in a DWR (or, in fact, in any other application message such as CCR)?
The Origin-State-Id AVP (AVP Code 278), of type Unsigned32, is a monotonically increasing value that is advanced whenever a Diameter entity restarts with loss of previous state, for example, upon reboot. Origin-State-Id MAY be included in any Diameter message, including CER.
Uses of Origin-State-Id:
1. To allow other Diameter entities to infer that sessions associated with a lower Origin-State-Id are no longer active. If an access device does not intend for such inferences to be made, it MUST either not include Origin-State-Id in any message or set its value to 0.
2. An access device/client can also include the Origin-State-Id in request messages other than the CER if there are relays or proxies in between the access device and the server.
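Use 1 above can be illustrated with a small per-peer tracker: remember the highest Origin-State-Id seen from each Origin-Host and, when a higher value arrives, treat the sessions recorded under a lower value as no longer active. This is a hypothetical sketch: `OriginStateTracker`, its dictionaries and the host name `client.example` are assumptions, and it simply ignores a value lower than the one already recorded; how a real implementation treats that case is not covered here.

```python
from typing import Optional


class OriginStateTracker:
    """Hypothetical per-peer bookkeeping of Origin-State-Id (illustration only)."""

    def __init__(self) -> None:
        self.last_state: dict[str, int] = {}  # Origin-Host -> highest Origin-State-Id seen
        self.sessions: dict[str, tuple[str, Optional[int]]] = {}  # Session-Id -> (host, state id)

    def on_message(self, origin_host: str, origin_state_id: Optional[int],
                   session_id: Optional[str] = None) -> list[str]:
        """Record a message; return the Session-Ids inferred to be no longer active."""
        stale: list[str] = []
        # An absent or zero Origin-State-Id means the sender does not want inferences drawn.
        if origin_state_id:
            prev = self.last_state.get(origin_host)
            if prev is None or origin_state_id > prev:
                if prev is not None:
                    # The peer restarted with loss of state: sessions tied to a lower value are gone.
                    stale = [sid for sid, (host, state) in self.sessions.items()
                             if host == origin_host and state is not None
                             and state < origin_state_id]
                    for sid in stale:
                        del self.sessions[sid]
                self.last_state[origin_host] = origin_state_id
        if session_id is not None:
            self.sessions[session_id] = (origin_host, origin_state_id or None)
        return stale


t = OriginStateTracker()
t.on_message("client.example", 5, "s1")
t.on_message("client.example", 5, "s2")
print(t.on_message("client.example", 6))  # ['s1', 's2']: the client restarted
```

Because the value can travel in any message, a DWR carrying a higher Origin-State-Id is enough for the tracker to notice the restart, even before a new CER or application request arrives.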
Hello,
My query is about the standard protocol: RFC 6733, page 66, mentions that when a transport failure is detected, DWR messages MUST NOT be sent to an alternate peer. Could you please elaborate on this?
Hello Team-Diameter,
I have a simple question.
Can a Diameter server, such as a credit-control server, send a DWR to the client?
The client team said their node cannot accept a DWR from the server. Is that true?
I looked for an answer in the RFC documents but did not find any reference to it.
Best regards
Golan
Hi,
I am getting an error during the exchange of Diameter messages. I'm acting as a Diameter server. The CER/CEA exchange happens properly, and so does DWR/DWA, but in between I get the error "connection reset by peer". What could be the reason behind this?
Nice article.
How is a transport-layer failure detected in the case of a DWR timeout?
Can you please explain the 3 DPR causes that are configurable in a DRA/DSC? What are the possible reasons, with examples?
Hi Pankaj Pandey,
Kindly explain your requirement in detail.
The 3 causes of DPR are explained at the following link:
https://diameter-protocol.blogspot.com/2011/05/diameter-peer-connection-and.html
The cause DO_NOT_WANT_TO_TALK_TO_YOU can be sent on a connection if the agreement between two operators has ended with respect to policies or validity. It depends entirely on the operator's requirements.
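The three DPR causes are the values of the Disconnect-Cause AVP in RFC 6733 (REBOOTING = 0, BUSY = 1, DO_NOT_WANT_TO_TALK_TO_YOU = 2). Which one a node picks is operator policy; the function `choose_cause` and its three inputs below are a hypothetical example of such a policy, not a DRA/DSC configuration.

```python
from enum import IntEnum
from typing import Optional


class DisconnectCause(IntEnum):
    """Disconnect-Cause AVP values carried in a DPR (RFC 6733)."""
    REBOOTING = 0
    BUSY = 1
    DO_NOT_WANT_TO_TALK_TO_YOU = 2


def choose_cause(rebooting: bool, overloaded: bool,
                 agreement_active: bool) -> Optional[DisconnectCause]:
    """Hypothetical operator policy; returns None when the peer should stay connected."""
    if rebooting:
        return DisconnectCause.REBOOTING
    if not agreement_active:
        # e.g. the inter-operator agreement has expired
        return DisconnectCause.DO_NOT_WANT_TO_TALK_TO_YOU
    if overloaded:
        return DisconnectCause.BUSY
    return None


print(choose_cause(rebooting=False, overloaded=False, agreement_active=False).name)
# DO_NOT_WANT_TO_TALK_TO_YOU
```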
Thanks for your query.
Happy to help you again.
Team-Diameter
Hi,
Consider that Client-A has two links to Client-B, each with a different realm. If one of the links to Client-B is disconnected, Client-A sends the same request/update message with the same Session-Id over the redundant link or realm.
Question:
1. Will Client-B treat it as a new request, since the Hop-by-Hop Identifier changes for the other realm while the Session-Id stays the same?
2. How does Client-B identify a request/update as unique: does it use only the Diameter Session-Id, or a combination of Session-Id and Hop-by-Hop Identifier?
What will the issue be if the new Origin-State-Id is lower than the current one?
Thanks,
Robin