Summary: Diameter Interop 2006
-------------------------------

Meeting Minute Taker: Victor Fajardo

Day 1 (4/24/2006)
-----------------

Test Pair #1

* Performed connectivity and peering tests. Covered Sec 3.1.1.1 of the
  test suite except for the last test case (negative test case for
  agent ...). Skipped the election and disconnection tests. The tests
  covered were successful.

Test Pair #2

* Spent the day fixing connectivity issues. No test cases were
  performed. Some issues were encountered with case-sensitivity checks
  on the origin-host.

Test Pair #3

* Performed connectivity and peering tests. Covered Sec 3.1.1.1 to
  3.1.1.4 except for election. Also performed basic Cx testing -
  exchanged LAR/LAA, UAR/UAA and RTR/RTA. The tests covered were
  successful.

Test Pair #4

* Performed connectivity and peering tests. Covered Sec 3.1.1.1 to
  3.1.1.3 including election and disconnection. Also performed basic Sh
  testing - exchanged UDR/UDA. The tests covered were successful.

Test Pair #5

* Performed connectivity and peering tests. Covered Sec 3.1.1.1 only.
  Performed basic credit control testing. The tests covered were
  successful.

Test Pair #6

* Performed connectivity and peering tests. Covered Sec 3.1.1.1 only.
  Attempted basic Sh testing. The tests covered were successful.

Test Pair #7

* Attempted TLS connectivity and had initial setup problems with
  matching key formats between test pairs. Key format conversion was
  also needed. Eventually, a successful CER/CEA exchange was achieved.

Test Pair #8

* Performed connectivity and peering tests. Covered Sec 3.1.1.1 to
  3.1.1.3 except for election. The tests covered were successful.

Test Pair #9

* Performed connectivity and peering tests. Attempted to cover Sec
  3.1.1.1 but had issues. It is unclear whether the issue was fixed
  immediately. There was some contention on whether the FQDN definition
  should include IP addresses, and on whether the RFC should endorse IP
  addresses as well as domain names in an Identity type.
* There was confusion on when the e-bit should be set in an answer
  message.
  More clarification is needed on when to set the e-bit in relation to
  the type of error, i.e. for protocol errors only? Also, there is a
  consensus that the UNKNOWN_PEER error should not be considered a
  protocol error but rather a permanent failure, and that permanent
  failures (5xxx) should be moved to (3xxx) to allow some significance
  when using the e-bit as a hint in processing failures.

General Discussion:

* There was an issue when relying on DNS to resolve and check
  origin-host. It was suggested that the peer table should perhaps also
  support IP addresses in addition to host names.
* There was a question on why a port number is present in the
  DiameterIdentity type. By comparison, the peer table does not have a
  port number.
* There was a question on whether the session state machine is
  mandatory. The consensus is that it is not, and that it is only one
  of many possible manifestations of an fsm.
* There was a question on receipt of a malformed avp in the open state:
  in which instances should the message be dropped, and how does this
  relate to the error code? There was some consensus that more
  clarification is needed in this case. The question has been submitted
  to the issue tracker.
* There was a question on what should be done when the answer message
  itself contains an avp with an error. What would be the suggested
  behavior in this case, since you cannot send an answer to an answer?
* There was a question on whether the CER ABNF should contain multiple
  vendor ids. Consensus is that it makes sense to have only one vendor
  id. A related question is the mixture of auth and acct ids in the
  vendor specific id: how does this relate to the app id in the header,
  and which one should be used?

Day 2 (4/25/2006)
-----------------

Test Group #1

* Concentrated on base protocol routing (Sec 3.1.2). The topology
  started with three (3) nodes with a relay in between.
* Some problems were encountered in DPR/DPA exchanges but have been
  resolved.
* There was a question on the interpretation of the result code and
  what the behavior should be in terms of considering a peer as suspect
  or when the connection should be closed. Should one or both peers
  attempt reconnection? There was a comment that reconnection may be
  attempted when both nodes realize that it should be attempted, based
  on the result code plus other conditions.
* Some implementations have an "auto" mode where reconnection is
  attempted based on configuration. It follows the same model as NASes;
  however, the current spec does not give much guidance on when
  reconnection should be attempted.
* Some implementations also use origin-host as an additional method of
  loop detection, which is beyond the spec but generally useful.
* There was a question on whether any incoming request can substitute
  for waiting for three (3) DWR/DWA exchanges before considering a peer
  as non-suspect.

Test Group #2

* Continued with base protocol testing (Sec 3.1.1.1) and basic testing
  of the Sh interface.
* There was a question on vendor specific id. The terminology in the
  spec is not very clear about the instances of auth and acct, i.e. if
  one appears, does the terminology say the other can appear as well?
  The ABNF and the terminology do not seem to match. There was a
  comment that the ABNF is not capable of describing such a
  relationship across avp definitions. There was a question on what
  happens if both appear and which one should be reflected in the
  header. There was a question on which applications actually advertise
  auth and acct app at the same time.

Test Group #3

* Continued with base protocol testing, connectivity and peering using
  TLS.
* There was an issue with some implementations assuming TLS only, i.e.
  without support for SSLv1 (-- further clarification --)
* It was reiterated that there are deployment challenges in using TLS
  with regard to matching certificates and managing differing key
  stores.
* Attempted election testing.
  There were some questions on whether an election results in two (2)
  complete CER/CEA exchanges as opposed to a CER/CEA/CEA exchange. This
  has to do with the losing peer sending an ELECTION lost in the case
  of two (2) complete CER/CEA exchanges. The consensus is that the
  second case is specified and correct and the first is not. There was
  also a question on whether the losing peer should close the
  connection, rather than just the winning peer, in case the losing
  peer does not trust the winning peer to close.

Test Group #4

* Continued with base protocol testing using TLS.
* Attempted to exchange CCR/CCA with several other partners.
* Encountered the same problem as group #3 with regard to deploying
  certificates.
* There was a question on duplicate detection. The spec seems to
  indicate that the server keeps state in order to reply properly to
  retransmitted messages. This presents scalability issues when a lot
  of history needs to be stored, especially at high traffic rates.
  There was a question on whether a solution should be part of the base
  protocol or the application. Some are of the opinion that it's an
  application issue. Others are of the opinion that, at the least, the
  base protocol should provide more hints to assist in this scenario.
  One solution suggested for the CCR/CCA case (maybe a generic solution
  too) is the addition of a cookie AVP, which could potentially remove
  state in any middle nodes and push the problem higher, to the app
  level. This reduces the requirement for redundancy in the middle
  nodes. The class AVP may be used for this purpose.

Test Group #5

* Continued with base protocol testing, especially election.
* Performed more extensive CC charging tests rather than mere message
  exchanges. The tests were successful.
* There was some inaccurate naming with regard to the 'sub-session' in
  the CC test cases. Otherwise, the majority of the CC test cases were
  performed successfully.

Test Group #6

* Continued with Sh interface testing.
* There were some interpretation issues with regard to the User-Data
  avp. Cx/Dx and Sh both have an ABNF AVP named "User-Data". The
  formats are the same, but they are really different AVPs. The concern
  is that implementations which support both may get confused. The
  consensus is that it should not be a problem because the command
  codes are different. Maybe localized renaming should be done to avoid
  the confusion.

Test Group #7

* Continued with base protocol testing.
* Continued with CCR/CCA exchanges and test cases. No major problems
  were encountered.

General discussion:

* There was a question on how much checking should be done on
  origin-host as a verification that you're talking to the correct
  peer. The consensus seems to be to drop the connection when the
  received origin-host is not what is expected. There was a question on
  whether the same scheme should be used for all other messages. If TLS
  is used, the opinion is that you probably should. Also, an attempt
  could be made to verify the hostname against the certificate.
* There was a question on also matching the realm name. Consensus is
  that this is not effective since the realm can be a few hops away.

Day 3 (4/26/2006)
-----------------

Test Group #1

* Continued with base protocol testing, specifically relay testing.
* There was a question regarding NULL bytes in a UTF8String. There was
  some confusion on whether NULL should be allowed. There was some
  consensus that it should not be there based on the spec. However, the
  spec has conflicting references.
* Successfully performed SCTP testing.
* Performed Cx testing as well.

Test Group #2

* Continued with base protocol testing.
* Also performed more extensive CC testing beyond encoding/decoding.
  The tests proceeded successfully.

Test Group #3

* Continued with charging application tests. Successfully went through
  at least 25 test cases. Found no problems. Skipped some of the cases
  because of overlap. Some of the cases require simulation to complete.
* An additional test case was added. A delay was inserted prior to a
  CCA reply to test what the server behavior should be. The problem
  relates to having the server keep state to accommodate a delay in the
  CCA. Problems encountered with this test have been posted to the
  issue tracker.
* There was more confusion with the e-bit as it relates to rcode in the
  3xxx range. There is also language in Sec 7.1.3 where confusion can
  arise because of the textual grammar, i.e. too many 'only' in the
  first paragraph. Some people have used the e-bit as a means by which
  the relay could take further action on the message, i.e. further
  parsing of the message to investigate the failure. Some people also
  pointed out that the e-bit along with a BUSY result code can be used
  by the relay to indicate heavy load and that the sender should use
  another relay.
* As a summary of the e-bit issue, clarification is required on what
  hint the e-bit should provide, along with some categorization of the
  error that is carried by the message. As an alternative, perhaps a
  more precise clarification of the e-bit usage should be introduced in
  the spec.

Test Group #4

* Continued with base protocol testing concentrating on relay. The
  process was successful. The following issues were encountered. Some
  implementations use a proxy-info avp in the relay agent in order to
  avoid caching the h2h id. However, some implementations are not
  reflecting the proxy-info. The consensus, as well as the spec, states
  that the avp should be bounced back.
* There was some contention over the definitions of a relay and a
  proxy. Maybe proxy-info is a proxy-only functionality, or the concept
  of a relay is redundant since it's a miniature proxy. At the least,
  it should be clear that the end-node should reflect the proxy-info,
  but the language is confusing.
* There was some contention on whether the p-bit is useful in the
  answer message.
  Consensus is that whatever is in the request should be reflected in
  the answer, but the p-bit can be ignored in answer routing.

Test Group #5

* Continued with base protocol testing concentrating on relay.
* Performed bug fixing with regard to xml formats and schemas.
* There was also a question regarding the host-ip-address entry: should
  it also be advertised beyond SCTP? In the TCP case, there was some
  opinion that it could also be used to corroborate the peer. However,
  there was a consensus that in TCP the IP address has no semantic
  meaning. Additionally, there is origin-host, which can be used for
  checking the peer.

Test Group #6

* Continued with base protocol testing and bug fixing.

General discussion:

* There was a question on whether a zero (0) value for h2h id can be
  used. Consensus is that it is legal, but maybe some clarity in the
  doc is required.
* There was a question on whether the session-id should be a
  UTF8String. Consensus is that it is a recommendation because it's a
  human readable string.

Day 4 (4/27/2006)
-----------------

Test Group #1

* Additional relay testing with three (3) relay nodes was performed for
  the loop detection test. Everything was successful.
* There was a question regarding an answer message containing an
  optional session id and how the receiver of the answer can determine
  the session it belongs to. One solution proposed is to use the h2h id
  to look up the corresponding request and determine the session-id.
  Maybe a clarification is required or a solution recommended in the
  spec. Additionally, errors which have the e-bit set may have nothing
  to do with a session, so it may not be relevant.
* There was a question regarding the concept of a primary relay. Maybe
  it's an implementation issue. Some use metric-based ordering to
  determine next hops. Others use ordering in the routing table.
* There was a question on whether destination realm takes precedence
  over destination host in attempting to route a message.
  Consensus is that the destination host gets checked first, as
  described in the spec. Additionally, when using realm routing, one
  realm can point to several peers, so destination host is used to fix
  the target within that realm.

Test Group #2

* Performed more accounting tests.
* Encountered some padding issues. Some implementations did not pad
  with zeros.
* There was an issue regarding some implementations which wait for an
  incoming connection before attempting to initiate a connection. In
  this case there will be no chance for election since one peer is
  reactive. Other implementations attempt a more pro-active connection
  scheme.

General discussion:

* Some implementations can assign a listening port per application.
  They can have multiple applications, which results in multiple
  listening ports.
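
Note on the padding issue reported by Test Group #2 on Day 4: the
following is only a minimal sketch of zero padding, not code from any
implementation at the event. It assumes the RFC 3588 AVP layout for a
non-vendor AVP (4-byte code, 1-byte flags, 3-byte length, then data
padded with zero bytes to a 32-bit boundary, with the padding not
counted in the AVP Length field). The helper names encode_avp and
padding_is_zero are hypothetical and chosen for illustration.

```python
import struct

AVP_FLAG_MANDATORY = 0x40  # 'M' bit in the AVP flags octet


def encode_avp(code: int, data: bytes, flags: int = AVP_FLAG_MANDATORY) -> bytes:
    """Encode a non-vendor AVP: 8-byte header, data, then zero padding."""
    length = 8 + len(data)  # AVP Length covers header + data, not padding
    header = struct.pack("!IB3s", code, flags, length.to_bytes(3, "big"))
    pad_len = (-len(data)) % 4  # bytes needed to reach a 32-bit boundary
    return header + data + b"\x00" * pad_len


def padding_is_zero(avp: bytes) -> bool:
    """Return True if every byte after the declared AVP Length is zero."""
    length = int.from_bytes(avp[5:8], "big")
    return all(b == 0 for b in avp[length:])


# Example: an Origin-Host AVP (code 264) carrying a 5-byte value
# needs 3 padding bytes to reach a 32-bit boundary.
avp = encode_avp(264, b"abcde")
```

A receiver that wants to be strict could run a check like
padding_is_zero on each received AVP and log peers that send non-zero
padding, rather than rejecting the message outright.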