These are unedited transcripts and may contain errors.
The opening plenary session commenced on Monday, 3rd of May, 2010, at 2 p.m. as follows:
ROB BLOKZIJL: Good afternoon. Welcome to Prague, to the 60th RIPE meeting. I am Rob Blokzijl and I am the chairman of RIPE. I see people standing in the back. You couldn't afford a seat or...
Fine. So, without further ado, let us start this 60th RIPE meeting in Prague. It's our third meeting in Prague: the first one was in Prague, the capital of the Czech, and the second was in the capital of the Czech Republic, and we are back in the capital of the Czech Republic for the third time. We also seem to be breaking records in Prague, because at the meeting we had here in 2001, RIPE 40, we had 350 attendees, and at this, the 60th RIPE meeting, nine years later, we already have more than 450, close to 500, people registered, and about 300 people had really checked in before the start of the meeting. I think this is very impressive.
I have a couple of short announcements, and then our host this week, CZ.NIC, the domain administrator for the Czech top-level domain, will say a few words as well. I guess you all found out where lunches take place. It is divided over two spaces: I think most of you went straight across the hall to the rooms there, but we also serve lunch downstairs in the restaurant, so that might be something to keep in mind tomorrow when it is getting crowded upstairs.
ROSIE.ripe.net is up in the air again for all information regarding this RIPE meeting, like the agenda, updates to the agenda, presentations, information about social events and so on. If you need any technical support, like problems with your laptops, your connectivity, or whatever, the RIPE NCC technical crew is present again and you can recognise them by the blue badges. This is white; I am not technical. And across the hall, deep down in the corridor, there is the terminal room.
The terminal room, apart from having lots of tables and chairs and power and switches, also has a couple of Windows 7 machines with Microsoft Office, and there is also a printer there, which you can use.
Right. Those were the technical details. There is one thing I want to say about a specific programme item. In the last year or so, our friends from the ITU, the International Telecommunication Union, have been talking about the role they could play in allocating IPv6 address space. They have some weird ideas there, and for quite a while we have been observing that and we have been going to some of their meetings. It's not always possible because, as opposed to this community, they tend to meet behind closed doors. Our good friends in APNIC and in AfriNIC, at their recent meetings, discussed these ITU plans and came up with community statements, so our intention is to do a similar thing here. I know it is short notice and you may say "I haven't seen anything"; that is all right, because later this week, maybe even by the end of today, you will find in your mailbox a draft community statement regarding the ITU and IPv6 and what we think our response to that should be.
It will be sent to the general RIPE list and it will be sent to the Cooperation Working Group list, because it's on the agenda of the Cooperation Working Group. So you can read it beforehand. If you want to discuss it, go to the Cooperation Working Group session on, I don't know when, Wednesday, and then at the closing plenary session on Friday we will see whether, at least in this room and with contributions from the mailing list, we have consensus that we can publish this statement. It's a very short statement, so don't be afraid. Roland promised me it would be short and to the point.
So, keep an eye on your mailbox.
Right. Now, it's my great pleasure to introduce two of our friends from CZ.NIC who are our hosts in this beautiful city, very nice environment and probably they are working on the weather improvement system. That should not bother you because you are sitting in a meeting for the whole day, anyway.
So without further ado, Tomas, can I hand the microphone to you? I think your colleague and our friend Ondrej Filip also wanted to say a few words; if not, you have to speak on behalf of... oh, he is here.
TOMAS MARSALEK: Thank you very much for giving me the floor. So, good afternoon again, ladies and gentlemen. My name is Tomas Marsalek and I am the chairman of CZ.NIC, which is the host of this meeting. I am very happy that more than 400 of you, or maybe 300 in this room, are in Prague, so you finally bypassed the ? problem, but it's Monday. I cannot guarantee you good weather and sunshine for the whole week; like this, you need to stay and listen to the speakers and contributors at the meeting. I only have information that the good weather will start on Friday afternoon. So, as Rob said, this is the third RIPE meeting in Prague. We also, at .cz Internet Exchange, organise a couple of Internet-related meetings like the IETF, we plan to organise the ICANN meeting as well, and maybe we will try to organise some small network in Europe group meeting.
So, I wish that you will have a very good meeting, and that sometimes there will be a good slot of good weather for some walking around. There will also be social evenings, and you could enjoy alcoholic and non-alcoholic Czech beer, which is probably the best one. We are very good at ice hockey, but sometimes we are losers; sometimes we are good at football, but sometimes we are losers; we are good at the Internet, but still the best at beer. The Czech Republic is still number one, but it's not because the kids are drinking two beers a day, it's because you foreigners help us. So please help us to keep this position number one, but after the session, not during the session.
So OK, that is all, it's the end of my speech. Enjoy the meeting, the rest of the meeting. I hope maybe there will be two hours of sunshine for walking around. Thank you.
ROB BLOKZIJL: Ondrej Filip also from CZ.NIC.
ONDREJ FILIP: Thank you very much. I am sorry, I cannot speak without slides, so I prepared some of them. Welcome again. I was asked to somehow introduce the company, but honestly, when I look at the numbers and graphs, all of them look like that, so you know we are a great company. I would rather tell you what you can get from us at this meeting, what you can get from us now. Basically, on your left-hand side, if you turn around, there is a booth, and you can get some new gifts there every day, so you have a reason to go there, and if you want to know something else, just ask, don't be afraid.
Also, if you are an old timer and you can still write by hand, you could send some postcards. You will get the postcards there, so use them. You don't have to go to a post office, just leave them there, and surprisingly they will not be scanned and sent by e-mail, they will really go to a post office on your behalf. And if you are really a computer old timer, we have prepared a game for you. It's Nibbles, and you can play using ISP protocol. The game is on-line through the whole forum, and every day you will get some special gift, and we will also honour a winner of the whole day, and he will get a very nice ? he can play more and more time. That is basically it. And let me, you know, change the topic a little bit.
We are very, very close to the elections in this country, and don't be afraid, I don't want to speak about politics, but, you know, the streets are full of billboards and posters of politicians, and they are usually in a very simple format: a smiling face and some very short promise like "we will turn this country into paradise". The Czech way to deal with any problem is usually to make jokes, to make fun of it, so suddenly, after the campaign started, some new billboards appeared, not just on the Internet but surprisingly also on the streets, so I picked three of them that I think are relevant to this meeting. And this is the first one. What is the relevance to the RIPE meeting? Honestly, we cannot promise we will show you Mr. Presley, that will be hard. We worked to prepare good evenings, like culture evenings, between the meetings, and not just us but all the sponsors and partners of the event, and I really hope you will have fun. So I hope you will enjoy it. With the culture and all the culture evenings, there is another billboard that is somehow related to culture. I know that all of you are not somehow in touch with that, because you all wake up at 6:00, you go jogging, you only eat bio food and don't drink alcohol, but there are some people, and I will not name them, who are not just looking at culture but consuming alcoholic beverages during that. If you know such people, tell them this is a country where there is some work in progress on that issue.
And this is the last one. It's not abolished completely, but we stopped it somehow. We tried to communicate with the responsible persons, and it's hard because the Icelandic volcano does not understand IPv4 or IPv6, but we somehow got into communication, and it told us: you have two options, you will have either ash clouds above Prague or rain clouds. So that is the answer to the weather; we are sorry, but we decided to have the wet option. So again, welcome to Prague and to the Czech Republic, and enjoy the meeting.
ROB BLOKZIJL: Right. This was by way of introduction. I will now hand over the chair of this session to Jim Reid, who will continue with the programme of the plenary sessions.
JIM REID: Thank you very much, welcome everybody. I have the job of chairing this first session of the plenary, and we have got a fairly heavy DNS flavour to the first session. As some of you will realise, we are living in interesting times in the DNS, because the root zone is about to be signed, and there will be some interesting presentations on that coming up, and also in the DNS Working Group later on this week. However, first up we have Matt Larson, who is going to talk about the rollout.
MATT LARSON: Thank you Jim, Hello everyone. I am pleased to report this is not going to be the same presentation that you have probably all seen about half a dozen times for the past year. This has some new stuff in it.
I didn't think that would be an applause line, but maybe that means we really do need the material. I do want to have a quick recap, though. Just to let everyone know, this is, and continues to be, a cooperation between Verisign and ICANN, at the behest and with the cooperation and support of the US DoC. And another quick recap of the technical parameters here: the root zone is going to be signed with a 2048-bit RSA key for the KSK and 1024 bits for the zone signing key, and we are using SHA-256 with RSA. The ZSK and KSK are split: Verisign will be managing the zone signing key and ICANN will be managing the key signing key, and we have developed a protocol to communicate between the two organisations to get the zone signing keys to ICANN to be signed, and of course corresponding procedures around all that. The deployment is incremental, which I think everybody is aware of. I heard some buzz in the hallways; I think everybody is acutely aware that, in particular, this week we have a big milestone. As everybody I think is aware, we have been rolling out incrementally something we are calling the deliberately unvalidatable root zone, which is a signed root but with the key material hidden: it can't be used, it's guaranteed not to work.
In terms of the timeline for the deployment, we started in late January, and as of this moment 12 of the root server letters are serving the signed root; only J-Root remains unsigned. That will change this Wednesday, the 5th, at I believe 1700 UTC. There is a two-hour window starting then, so I will be home by then, I don't know about the rest of you. That was supposed to be a laugh line, actually.
So anyway, that is what we have going on, so it's a big week. But in preparation for this, all kidding aside, we do feel like there are not going to be surprises this week, based on the data collection and the outstanding cooperation with all the root operators. We have good data as to what has happened, and Dwayne has done outstanding analysis of that, so I am going to turn it over to him to talk about that and what we have seen, and to explain why we feel good about what is coming and why we are not particularly worried about this Wednesday.
DWAYNE: So I know he said all kidding aside, but this first slide is from a thread that I ran into the other day talking about Wednesday, when J-Root gets signed, and the message says: "I break the DNS all the time, one little mistake and no Internets, they are so going to break the root servers."
However, the follow-up is that one person says, "I happen to know one of the guys that is working on that very project at ICANN, I can assure you it's in very good hands," and I agree.
So, this short table shows the six dates on which root servers switched over, and we have collected a lot of data. For each event we collected about 48 hours of pcap files, and the following graphs are going to show just some of those results.
The first one I want to show is from the most recent DURZ event, which is the time when five root servers switched over, and after which J-Root was the only one that was not serving the signed zone. You can see the vertical bars represent the maintenance windows for those servers that did switch over, and in the period after that we see no real significant change in the load to J-Root, which we think is a very good thing. This picture shows the same data but for all of the root servers. As you can see, there is quite a spread, but for the most part we see a very predictable pattern among all the root servers. I would also like to point out that this was a very significant event for us because, for the first time, we have simultaneous data from all 13 root servers, and we are excited to have that much data to analyse and correlate. I would like to thank all the root server operators for making this happen; it was a big effort on their part.
This picture shows the same kind of thing for all the collection events, starting before any of them switched over and ending with the most recent one. When we first started looking at this we were a little bit concerned there was sort of a shrinking trend, where all of the roots were becoming more equal in terms of their query load, but as things have progressed further we can see that the query rates are increasing system wide. We don't have any reason to believe that is because of the DURZ; as far as we know, this is just a natural trend due to, I don't know, seasonal factors or something like that, or just load increases in general.
Perhaps more interesting is to look at the effect that the DURZ has had on TCP. This picture shows again all of the 13 servers from the last event and their TCP query rates. It's a little bit messy here, and if I had a pointer I would try to point some things out. If you look closely you can see increases, for example, in the data for C-Root: right at about 1500 UTC on the 14th it jumps up from 0 to 80 per second. F-Root has a very significant jump, it's the yellow one at the top. So here is C-Root, here is F. Some of the other ones that converted at this time didn't make such significant jumps in their TCP rate. H stayed way down here, and B... This one gets really fuzzy; this is actually G-Root, and so I made another picture which zoomed in on this a little bit. The fuzziness seems to be because G-Root has this set of clients that are making a set of queries about every ten minutes that end up being TCP. These are, you know, typical root server junk queries that are NXDOMAIN, but for some reason it's happening on a ten-minute period, and when looked at in this picture it comes out really fuzzy.
This is the TCP query rate for all of the events, and you can see each particular root as it switched over: here is L at the beginning, next is A, and so on. These numbers are about what was predicted in terms of load; about 100 queries per second is what we were predicting that these root servers would experience. Again you see an increasing trend here over time. However, when we show the TCP as a percentage of the UDP query rate, it's pretty even. There is no real increase in that percentage over time.
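The comparison described here, TCP queries as a percentage of UDP queries, can be sketched as follows. This is a minimal illustration and not the analysis actually used for the slides; the function name and the sample rates are hypothetical and chosen only to show how absolute TCP load can double while the ratio stays flat.

```python
def tcp_share_of_udp(tcp_qps, udp_qps):
    """Return the TCP query rate as a percentage of the UDP query rate."""
    if udp_qps <= 0:
        raise ValueError("UDP query rate must be positive")
    return 100.0 * tcp_qps / udp_qps


# Hypothetical samples, NOT measured data: (label, TCP queries/s, UDP queries/s).
samples = [
    ("earlier event", 50.0, 10000.0),
    ("later event", 100.0, 20000.0),
]

# The absolute TCP rate doubles between the two samples,
# but its share of the UDP rate is unchanged.
shares = {label: tcp_share_of_udp(tcp, udp) for label, tcp, udp in samples}
for label, pct in shares.items():
    print(f"{label}: TCP is {pct:.2f}% of UDP")
```

Normalising by the UDP rate separates growth in overall load from a shift of clients towards TCP, which is the distinction the speaker draws.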
And the last slide I wanted to show you shows only UDP priming queries received at the root servers. First of all, let me address this big hump in the middle, which ended up being from a single instance of a Cisco DNS product where they had been advised to set max cache TTL to 0, and that caused this load of priming queries to A-Root. But for the most part the priming query rate did not change. It's very steady down there at the bottom.
That is it for me, and I am going to hand it back to Matt to do the non-picture slides.
MATT LARSON: So then, just a little bit to wrap up, with some more details that aren't so statistics heavy. We know that TLD operators have been asking when they can put DS records in the root, and the plan is that this is going to be handled with a revision to the existing root zone e-mail based change template that you are familiar with. There will be a few more fields added for DS records, and we anticipate being able to accept those in late May or early June. In other words, ahead of the intended July 1st date for the root zone going live, the intent is that TLDs will have the opportunity to get DS records in the root.
We have two really important documents associated with the deployment, and they are the DNSSEC practice statements, Verisign has one that describes how we treat the zone signing keys and ICANN has one that describes how they treat the key signing keys but it's more than just the treatment of the keys; it describes policy and practice for all aspects of this project, and we find that a lot of questions can be answered by reading these documents. And there is a revision coming out, it will be out shortly. They are already on the project website which I have a pointer to later on in the presentation. But I would encourage everyone to take a look at these documents because they are really the meat of the description of the project.
That being said, there is a great deal of other documentation as well at the project website, starting from the original requirements we received to begin the project. There is a high-level technical architecture that explains the whole thing, and I don't need to read the list of all the documents for you, and there is more coming. We have tried very hard to be open and transparent about everything we are doing, and all that documentation is on the website. There is other stuff on the website as well: status updates, an FAQ, this presentation as well as all the other presentations we have given, so we would encourage you to look at that.
Jim, do we have time for a little bit of Q and A? Before we start that, let me say we have this e-mail address to communicate with the design team, which is these people at ICANN and Verisign, so please feel free to drop us a line with any comments, questions, feedback, suggestions, and I guess now we have a little bit of time for questions.
JIM REID: I have a question. Thanks. Jim Reid, speaking in a personal capacity. I think the question is primarily for Dwayne. I noticed you had some statistics that showed small but probably insignificant spikes in TCP query rates. Were you able to characterise them as being down to specific hosts or applications that might be generating these things? The concern I have is that maybe there is a slight problem out there that might become more apparent when everybody has switched over to the signed root zone.
SPEAKER: Yes, unfortunately I didn't look at those in time for this, but certainly that is something that we could do. We have all that data still, so we could go back and look at that, and I will try to do that, and if I find something terrible I will let you know.
JIM REID: No other questions. OK, thank you Dwayne and Matt.
Our next speaker is Rick Lamb from ICANN who is going to tell us about the plans for sharing the root zone keys and the trusted community representative initiative, which is going to be the plan for actually having individuals share those keys. Rick, over to you.
RICK LAMB: Hi, I work at ICANN. I have been pursuing this DNSSEC thing for a couple of years; it was supposed to be something really short and easy, but now it's my life. So I am here to discuss how we go about managing the root key, and I think this is a wonderful opportunity to directly involve the Internet community. The basic goal, of course, is to improve overall confidence and get people to use DNSSEC. Anybody can generate a signed root; I could sit here and say trust me, mine is better than everyone else's. There is no way to trust anyone unless we have wide public participation, so what we have decided to do here is to look for direct participation, and these points are directly off a description document that we have on the net: direct participation by recognised members of the DNS technical community. There are a number of you here, actually most of you are here. I sometimes refer to you, very kindly, as the RIPE Mafia in DNSSEC; if it wasn't for you guys this wouldn't have happened.
Next slide. So we call these trusted community representative positions. We have, for lack of a better term, 14 Crypto Officers. There are two sites where the KSK will be managed, one on the east coast and one on the west coast, just as backup, so we will have seven for each site. We also have these seven Recovery Key Share Holders, and this is for the case where something really bad happens and we lose everything, so we have a contingency for that. We also have a few backup Crypto Officers and Recovery Key Share Holders that we will seek out, because there may be travel conditions, situations where things don't work out, but there is a fair amount of time for people to set up travel for these things. New keys are signed on a quarterly basis, so there are 90 days; there should be plenty of time, but just in case.
And this last bullet: this is still in a provisional state. So we are trying this out. It's a wonderful idea, but we need to see how this works, whether the logistics work OK for the people to make it to these sites, and whether keys can be generated successfully without any problems, so this first one that we are going to do is very important to us. OK. So, there is something called a key ceremony, a term of art used in certificate authorities. It's a carefully scripted event, for transparency. It sounds like something where people get dressed up; it's not. It's just a term of art. The scripts are created way beforehand and various people can vet these things, and we have many key ceremony rehearsals with ICANN staff to iron the bugs out of this. So the point of these key ceremonies is two things. We are going to generate a root key; that is one, and we want direct community involvement in that. We also, every quarter, like I said, will be signing a new ZSK that comes from Verisign.
So, for the Crypto Officers in this case, I have described the facilities a little bit here. We have learned a lot from our joint effort with Verisign; they have been very open about sharing some of the detailed knowledge that they have accumulated with their certification authority. So we have various components that are necessary to enable the cryptographic devices, in this case the box that holds the key, with various other components, smart cards in this case. The Crypto Officer holds a physical key to a safe deposit box, one of many, inside a safe inside one of the ICANN facilities. So this prohibits ICANN from going rogue in some sense. If for some reason you don't love the wonderful organisation that I work for, this is a way to keep us honest, because in order for us to do anything with this crypto box, we need those cards. Now, it's not up here, but I will state the reason that these are physical keys that people hold while the cards stay at the facilities: once people start relying on DNSSEC, we cannot let the zone go unsigned. At some point that will be equivalent to somebody not being able to get to a website, and so for disaster recovery situations we need to be able to get access. It's a very bad condition that we never want to hit, but if the signature lifetime on the keys is about to expire and for some reason over the course of 90 days we have not been able to get anybody to show up, we have to have a mechanism to override and make this happen, with another set of ceremonies to bring people back in and rekey locks and do all that.
This is the role of the Recovery Key Share Holder. The keys inside are themselves encrypted; we encrypt everything inside the boxes. So in the situation where we may lose both of our sites, we need to be able to recreate or reconstitute this key. That involves a couple of pieces. It involves the second bullet there, which is a backup: it will be a smart card with the root key in encrypted form on that card, and that is encrypted with a key that is made up of pieces that the Recovery Key Share Holders hold. They will hold something in tamper-evident bags that look like this. You may have seen these things, they are often used by banks and so on. They will have these things in tamper-evident bags that they will have to bring to whatever site we need, to reconstitute the private key inside one of these crypto boxes. So, last bullet: one of these facilities is in LA, right next to LAX, and there could be an earthquake, it could go away; the other one is on the east coast, a little bit outside of Washington DC, and some other man-made event may happen there, so there is this case where we need a deep backup.
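A key "made up of pieces" held by several people can be illustrated with Shamir's threshold secret sharing. The talk does not say which scheme the crypto boxes use, nor how many shareholders must be present, so everything below is an assumption for illustration: the scheme itself, the 5-of-7 threshold, the field prime and the function names are all hypothetical.

```python
import secrets

# Assumption: a prime field comfortably larger than a 256-bit secret.
PRIME = 2**521 - 1  # a Mersenne prime


def split_secret(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 from the given points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical usage: a 256-bit wrapping key split among seven holders,
# with an assumed threshold of five.
wrapping_key = secrets.randbits(256)
pieces = split_secret(wrapping_key, threshold=5, shares=7)
print(recover_secret(pieces[:5]) == wrapping_key)
```

With a threshold scheme of this kind, losing a few bags does not lose the key, and fewer holders than the threshold learn nothing about it, which matches the "deep backup" role described above.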
So here are the requirements for these trusted community representatives. The Crypto Officers have to travel, to be able to get to the US four times a year. Of course they shouldn't lose their key; it's just a safety deposit box key, nothing special about it. We give them two each, so I think that is good, but still, they can't lose a key. We have the Recovery Key Share Holders. They are people that would be there when we initialise the crypto boxes; the first time we initialise the box is what is used to encrypt everything inside, so hopefully we never need them. But if we need them, we are going to need them bad. So that is how we recreate everything. We will ask them to do an annual inventory, such as having the bag with the smart card next to a newspaper and taking a picture of that, or something like that. I know that sounds kind of strange, but that is actually a method I have seen in a couple of places.
The final requirement there is about people who already play in this space. We would like to expand to greater involvement of people, so if you are already involved somewhere, if you work for ICANN, Verisign or the Department of Commerce, you already have a bite of this apple, so, no.
The criteria for these people are that they must be recognised people of the DNS community, and there are many of those here, and geographically distributed; we want this to be as fair as possible. This is part of the whole ICANN mantra, broad participation, and so these are the two primary criteria that we will use to choose who will be involved in this.
The statement of interest period is over. It was relatively short, because we have been on a very tight time frame here and we need to do background checks. The statement of interest period began on 12th April and ended on 23rd April, and wow, a lot of interest, and from the RIPE community in particular. We got 61 candidates like that, and very few stinkers.
Here is a geographical distribution of what we have so far, or what we finally caught with our net. I find this to be pretty reasonable. You know, some of the African countries have difficulty getting people to this key ceremony, and they stepped up to the plate; a number of people stepped up to the plate to support them. We have some from APNIC there, and you can see RIPE with 20, ARIN with 20 and LACNIC with five, so again, across the five RIR regions, I think this is pretty representative.
We expect final selection after we do background checks. Even if we know a lot of these people, we do have to do our due diligence, and so we are going to do background checks on the final selections. Hopefully we will get this done before late May, but late May is the drop-dead time for this, because if we are going to have the key ceremony comfortably before July 1, we need to provide people with enough time to make travel arrangements. We will publish the selections, all candidate names and nationalities, in the name of transparency. The key ceremony is in mid June, June 15th or 16th, on the east coast, near DC; it's actually a little town called Culpeper, Virginia, great restaurants. If it's successful, this will no longer be provisional. This is one of the rare opportunities on the Internet where we can have direct public participation in something: not just witnesses, not just participating in the process, no, these are people who hold pieces that we need in order to perform this task, and hopefully, by doing this, we will build trust in the final result. In fact, one of the things we generate during the key generation is a sheet of paper that has the hash of the key that was generated. You see it was generated, you take it home and show people: this was the key generated, I saw it, there were no cards up their sleeves. There is a camera running, all this stuff is filmed and will be published, and it's part of the audit record as well. This process also has a third-party auditor involved, one of the big four. There is a standard audit process called SysTrust, very painful, but it is a whole different mindset from the Internet mindset and it's very useful to bring to something as important as this.
That is it. Thank you all, to the people who volunteered for this, and there is the site for that. I am going to look at my notes to make sure I didn't forget anything here. I do want to emphasise the level of cooperation that we have gotten, not only from Verisign but from the Department of Commerce as well. For some of this stuff we have had to navigate some pretty serious bureaucratic waters to make it happen, and oddly enough everything is lining up, and this is just really wonderful. You know, the key ceremony stuff, a lot of that came from the SSL folk at Verisign; very helpful, a totally foreign world to me. And thank you again. Any questions?
JIM REID: Thank you. No questions for Rick? OK. Just to remind people: when you come to the mikes, state your name and affiliation, thanks.
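The idea of taking home a printed hash and later checking it against the published key can be sketched as below. The talk does not specify which hash function is printed or over which encoding of the key it is computed; SHA-256 over the raw key bytes is an assumption here, and the function names and the placeholder key bytes are hypothetical.

```python
import hashlib


def key_fingerprint(public_key_bytes):
    """Return an uppercase hex SHA-256 digest of the given key bytes."""
    return hashlib.sha256(public_key_bytes).hexdigest().upper()


def matches_witnessed_hash(public_key_bytes, witnessed_hex):
    """Compare a published key against the hash a witness wrote down.

    Whitespace and letter case in the witnessed value are ignored,
    since a printed sheet may group the digits for readability.
    """
    normalised = "".join(witnessed_hex.split()).upper()
    return key_fingerprint(public_key_bytes) == normalised


# Placeholder bytes standing in for a published key; NOT a real key.
published_key = b"example key material, not a real key"
sheet_of_paper = key_fingerprint(published_key)

print(matches_witnessed_hash(published_key, sheet_of_paper))
print(matches_witnessed_hash(b"some other key", sheet_of_paper))
```

A witness who keeps the printed value can run a check like this at any later time, which is what lets an attendee vouch that the published key is the one generated in the room.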
AUDIENCE SPEAKER: Daniel Karrenberg, Internet citizen. Rick, I have a question about discussions within ICANN about this whole thing, because I don't follow the ICANN meetings and things like that.
RICK LAMB: Lucky you.
DANIEL: Thank you. Is there any discussion going on, before this first version is implemented, to make it even more fair? I use "even more fair" in a euphemistic way, because your slides actually have it: you have to be able to travel to the US, and there are certain restrictions for some people in some parts of the world. Your sites are in the US, exclusively in the US. Is there any movement inside ICANN, or any discussion, to make this even more fair?
RICK LAMB: The requirement to have the sites in the US is a requirement that was handed to us from the Department of Commerce.
DANIEL: I am aware of that.
RICK LAMB: However, certainly we have heard this point before and there are discussions. I think we need to see how this first one goes. We have tried to make it as inclusive as possible, but, you know, I completely agree, I personally agree that it would make sense to distribute the sites as well, given some of the visa restrictions that the US has for some personnel, and it's unfortunate. I mean, I have seen it at the IETF, some of the difficulty. There is nothing we can do about the visa restrictions as long as these things are in the US.
DANIEL: So it's on the radar screen, but at the moment it's on hold because the DoC says it cannot be done.
RICK LAMB: On hold because we want to make this thing happen first.
DANIEL: That is fine. I am not criticising the way it's done right now, it's better to get something off the ground and started, but my question is rather is evolution in the cards?
RICK LAMB: Yes.
DANIEL: The answer I hear is: we at ICANN are thinking about it, but at the moment it is blocked by the US government, basically.
RICK LAMB: Blocked is a strong word.
DANIEL: I used the word.
RICK LAMB: You used the word. OK.
DANIEL: Thank you so much.
Jim Reid: Would there be some mechanism for people to make their views known about how provisional this draft scheme is, or to make suggestions or recommend enhancements, such as considering those concerns about the US-centric approach to managing these keys at the moment?
RICK LAMB: I mean, certainly we have that list, the root design, root sign at, what is it, rootsign at ICANN.org. That is one way to make your interest known. Certainly through the generic ICANN process as well. We welcome all these comments, absolutely. If there is enough interest in this, you know, in writing is better than just tapping me in the halls and telling me this, because if we have a certain amount of momentum behind this, I think it will help a lot. I mean, I am going to stick my neck out a little further here and probably get it chopped off, but what the hell, it's been a nice life. You know, believe me or not, there is no particular reason, other than contractual and just convenience reasons, as far as I understand, for these facilities to be in the US at this point. I mean, just to dispel any of this weird black helicopter stuff: the units we are using to generate the keys and use the keys are made in England, they are these FIPS level 4 boxes used by a lot of commercial entities. Anyway, I don't know if we are going down that path, but I was refreshed to see that it's all based on business sense more than anything else, but anyway.
Jim Reid: One more question.
AUDIENCE SPEAKER: From LACNIC. In fact, Daniel went to the mike before me, so he said most of the things that I wanted to say. But, just as a final comment, I understand that some things have to be done in a given way in order to move forward quickly, but I think that in some way we can compensate for this US ... with the composition of the TCR group, so I expect to see a broader integration of that group and not only a US-based group.
RICK LAMB: I think you will. We have taken that into account, in fact.
Jim Reid: No more questions. In that case, thank you very much, Rick.
Next up is Robert Raszuk, who is going to be talking about adding and not adding paths to routing tables.
ROBERT RASZUK: Welcome everyone. This talk is about BGP so much lighter topic than DNS, I think.
I presented this talk twice already, at NANOG and APRICOT, but this presentation is a little bit changed and redone based on the input from the audience, and I think I made some simplifications, too, so I think even if you heard it already, it's nice to take a look at the options.
So what is basically the problem? The problem in current networks, at least the demand from the customers, is fast convergence, not so much at the BGP level but fast connectivity restoration, and this is driven pretty much by modern applications like voice, video and streaming. In order to have fast connectivity restoration, you need two things: you need to have redundancy in the routing layer and fast trigger propagation of the failures. If you have both of those things together, you will achieve fast connectivity restoration in your network. So this talk is just about redundancy at the routing level. I go through a couple of scenarios of how to build a network which achieves the redundancy by design, not by extended flooding of BGP paths but by design. That is my first objective here, to show you what the alternatives are, and this is applicable to both, the networks which are hierarchical and flat. I go down a little bit more into details comparing the networks and the reasons behind the choice.
One of the proposals on the table at the IETF, for example, which perhaps you already heard of, is to use the BGP extension called add-path to distribute more BGP paths around. So just for your information I will take a look at that particular encoding and explain what it is really about.
To deploy another technique which allows you to send more than the best path to your edges, we came up with the idea called diverse path, and I will cover this as well, to tell you how diverse path works and what the differences are between add-path and diverse path. But in general, my message is to first take a look at your own design of the network and see if the design itself may already be very helpful for you before you go down to flood BGP state end-to-end in an excessive manner.
At the end, I have an analogy to 3107 networks, because some of you actually would like to use add-paths just for 3107, which is labelled BGP, and the last slide is a topic regarding aggregation, how it comes for free if you follow the hierarchy.
This talk is not about VPNs, because there the topic of sending redundant paths doesn't really exist and the paths become unique because each, the only exception is when you have ASBR as redone soon, and that can only be addressed by changing, let's say, the RD on the route reflector, if it's really, really a problem in your network.
So let's take a look at the typical ISP design of the 1990s, where the routing protocol design was area 0 and areas around it. Some of the ISP networks are still in this model of design, some of them are flat, and I will try to cover both. But if you carefully look at this design, you see that there is not much required, protocol-wise, to achieve what you really want, to have routing layer redundancy where needed, not everywhere but where needed, and this is the key.
One of the features which major routing vendors have shipped already is best external. Best external, currently one of the IETF drafts in IDR, allows you to advertise a non-best path, basically your best external path rather than your overall best, and best external is actually divided into three different options. In one, your overall best could be iBGP-learned due to local preference, so you then select the best path out of all of your external paths and send it to the rest of your iBGP peers. Another mode of best external is cluster-based best external, which actually works on the route reflectors and gives you the option to send the best path from your cluster domain, essentially your reflector clients, to the other route reflectors connected over iBGP sessions. So in this model, the assumption is that you are using best external on the edges, because I think honestly that this is useful in all cases of the network; and I will go step by step to demonstrate how much good it does you to also enable best external on route reflectors.
And as I mentioned before, you don't only need to have routing redundancy, you have to have a way to tell that the event happened, and for that, the best machinery currently is IGP flooding; OSPF can really do a good job of flooding bad events. Of course, if you are assuming that BGP is fast enough for you, then perhaps this talk isn't applicable, because we are here talking about fast connectivity restoration in hundreds of milliseconds, not in seconds.
That is the goal and target of this talk.
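The edge behaviour of best external described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Path` fields, the three-step preference order and the path names are hypothetical simplifications, not any vendor's best path algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    local_pref: int
    as_path_len: int
    external: bool  # True if learned over eBGP


def preference_key(path):
    # Assumed, simplified ordering: higher local preference first,
    # then shorter AS path, then eBGP over iBGP, then name as a tie-break.
    return (-path.local_pref, path.as_path_len, not path.external, path.name)


def overall_best(paths):
    return min(paths, key=preference_key) if paths else None


def best_external(paths):
    # Best path among the externally learned paths only.
    external = [p for p in paths if p.external]
    return min(external, key=preference_key) if external else None


def advertise_to_ibgp(paths, best_external_enabled):
    """What an ASBR sends to its iBGP peers for one prefix."""
    best = overall_best(paths)
    if best is None:
        return None
    if best.external:
        return best  # normal case: the overall best is our own external path
    # Overall best was learned over iBGP: classic BGP advertises nothing,
    # best external advertises the best of the local external paths.
    return best_external(paths) if best_external_enabled else None


paths = [
    Path("P1-learned-over-iBGP", local_pref=200, as_path_len=3, external=False),
    Path("P2-local-eBGP", local_pref=100, as_path_len=3, external=True),
]
print(advertise_to_ibgp(paths, best_external_enabled=False))
print(advertise_to_ibgp(paths, best_external_enabled=True).name)
```

With the feature off, the ASBR in this toy example stays silent towards iBGP because its overall best is iBGP-learned; with it on, the local external path becomes visible to the reflectors as a backup.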
So, if you look at the typical POP design, you have exit points to the core, so in this particular model, your core ABRs are route reflectors which are in the data path. Down the road I will tell you about the other option, of your route reflectors being out of the data path, just as control plane devices, but bear with me for a second.
Another thing is that some of your POPs may actually not have a local source of the routing table, of the IPv4 or IPv6 prefixes, so in those cases, unless you really want to have end-to-end BGP path flooding, you have to have a way to keep the existing path, which was advertised, a little bit longer. That is why having a static route to exit the POP to the core becomes very useful, so as not to trigger invalidation of BGP too fast.
Now I have a couple of scenarios to discuss how the network behaves when you have the default behaviour in scenario A, basically without best external on route reflectors and without the domain-wide propagation, as well as when enabling particular features step by step. So, in this scenario A, you see that we have basically four POPs. POP 1, POP 2 and POP 3 are receiving the same prefix from three different peers, or customers which are multihomed, and by default we don't have best external enabled on route reflectors, so in all of our core reflectors you only have the best path, P1, which is selected as overall best, and on the connected ones, because of the local best external on the ASBRs, you have the paths of your local POP as well. So, when P1, which is assumed to be overall best, goes away, BGP has to first of all withdraw P1, and then the reflectors with the redundant information will calculate another best path and propagate it within your core and your POPs. So it takes time. The connectivity restoration is a function of BGP convergence in this case, and we don't want to wait that long with the user traffic. So what are the options here? Let's go to the next step and just enable the signalling about the next hop down event.
In this set of slides, I just talk about the case when you have next hop unchanged on the edges; the next-hop-self case will be covered in the future, I am actually working on the solution for fast propagation currently. So this case assumes that I don't only lose ASBRs but also the link to the peer, or the peer ASBRs, because with next hop unchanged I can easily react to those events. And as you can see, what it gives me in scenario B is a much faster trigger to calculate the BGP best path again. Of course I can only calculate the BGP best path when I have redundant paths, so in this case I can calculate it on the top two RRs within POP 2, of course, and on the POP 3 RRs, as well as on ASBR 3.
So if you look at this, it's all about improvement, because you don't have to wait for BGP to send you the bad event; you can actually, very quickly, trigger the local restoration when you have redundancy. So what is missing from this picture? You are basically missing additional paths. If I flip between scenario B and C, you see the differences in the amount of paths which are kept on the route reflectors in all the POPs, because here I do enable best external on the route reflectors, which allows me to propagate BGP paths everywhere in the core. Now, this combined with the IGP fast flooding allows me to actually achieve as fast connectivity restoration as is possible. Notice that all of my route reflectors can actually do PIC, so prefix independent convergence can easily be there: I already have pre-installed paths in the FIB and I can switch over my traffic almost in tens of milliseconds to an alternative exit point. The same happens on the edges. POP 4, let's say, previously had the P1 path received, and because I had the default route configured and here there is no other information as far as an alternative exit point goes, it is very safe for POP 4 to continue forwarding the traffic to its ABR exit points to the core, because those ABRs would have backup information in place already in the forwarding layer. That is why I added, in one of my previous slides, this default route, to prevent invalidation of the next hop when I actually don't have a choice of any alternative exit point. All I could do is drop the traffic.
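The pre-installed backup idea behind PIC can be illustrated with a toy FIB in Python. This is a sketch under assumptions: the class names, the shared primary/backup object and the router names are hypothetical, and real forwarding planes implement this in hardware-specific ways.

```python
class PathList:
    """Primary and pre-installed backup exit shared by many prefixes."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.primary_up = True

    def active(self):
        return self.primary if self.primary_up else self.backup


class Fib:
    def __init__(self):
        self.prefixes = {}    # prefix -> shared PathList
        self.path_lists = {}  # (primary, backup) -> PathList

    def install(self, prefix, primary, backup):
        key = (primary, backup)
        shared = self.path_lists.setdefault(key, PathList(primary, backup))
        self.prefixes[prefix] = shared

    def next_hop_down(self, next_hop):
        """React to an IGP 'next hop down' event.

        Returns the number of objects touched, which depends on the number
        of distinct exit pairs, not on the number of prefixes.
        """
        touched = 0
        for shared in self.path_lists.values():
            if shared.primary == next_hop and shared.primary_up:
                shared.primary_up = False
                touched += 1
        return touched

    def lookup(self, prefix):
        return self.prefixes[prefix].active()


fib = Fib()
for i in range(1000):
    fib.install(f"10.{i // 256}.{i % 256}.0/24", "ASBR1", "ASBR2")

touched = fib.next_hop_down("ASBR1")
print(touched, fib.lookup("10.0.7.0/24"))
```

In this toy model a thousand prefixes move to the backup exit by flipping a single shared object, which is the reason the switchover time does not grow with the size of the table.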
So, if you go a step further, scenario C is also applicable to the C prime case of the flat IGP, because many networks today, even this week I spoke to two operators in Sweden who demonstrated that their network looks like this, have the core routers, IPv4 switching, in a full mesh, and they are route reflectors in the data plane, so all the edges are connected to different route reflectors as clients. If this is the case, the situation is exactly the same as if you had the hierarchical IGP design, just because your forwarding can continue the same way. And the last scenario is the one where you actually have a DMZ, so all of your external exit points go through a single POP. In this case, you have two choices: you can split your external traffic to go to different POPs, or focus on making sure that the redundancy will be achieved for this particular POP at the POP-to-core boundary, and some networks use the trick called ghost loopback, which allows you to set the same next hop on both ASBRs or both RRs and attract traffic pretty much depending on the iBGP load balancing, but this is optional, not a requirement. Here, you will still continue to send traffic to your RRs, and the RRs in the full mesh would know how to propagate it through the network. So in this case the connectivity restoration is actually very local, because you only have exit points out of POP 3.
So, a bit of history now. Why did networks get flattened? Pretty much one of the main reasons why networks got flat was the introduction of MPLS. It being end-to-end, you actually have less to worry about if you have a flat network. Plus, of course, the domain-wide propagation is also easier for the flat network, because if you have hierarchy you have to do leaking of your next hop down event between the core and the POP. Another challenge for the hierarchical network is traffic engineering. It is still pretty difficult to build an effective TE LSP end-to-end, but in each area it works quite nicely without any problems. Some networks didn't want to go flat and still wanted to deploy fancy applications, so they used IP encapsulation, which works equally well in the flat design or the hierarchical design, but in hierarchy you can have summarisation. In MPLS you are mostly forced to have host routes to your BGP loopbacks and your LDP matching in an exact match to the IGP. Now, one little secret I want to tell, which I didn't know for many years: in one of the implementations, I don't want to tell yet which one but you can guess from my address, you don't need to have a /32, you can have a /24 and an IGP route of /24, and it's an exact match and it still continues fine. So that is one of the things which is nice to keep in mind.
So, the consequence of deploying the flat network and deploying the encapsulation paradigm is that there is no need to do any IP lookup in the core, so that is another step in why route reflectors became control plane devices, because they were not needed any more to do any IP lookups. In this case, there is now a set of new challenges for the ISP environment, for ISP design. It's very tricky to do real hot potato routing. If you really want to get your traffic out of your network at the closest exit point, your normal choice is to use the IGP metric to the next hop, which is, as you know, step number nine in the best path selection, but if you do that on a route reflector which is on a stick somewhere in the middle of the network, it is guaranteed that the choice the reflector makes will not be the optimal choice for all of the clients. So that is something to keep in mind about this particular topic.
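The hot potato problem with an out-of-path reflector can be shown with a small Python sketch. The topology, router names and IGP costs below are invented for illustration; the only point is that the reflector breaks the tie with its own IGP view.

```python
def closest_exit(igp_cost, exits):
    """Pick the exit with the lowest IGP metric from one router's view."""
    return min(exits, key=lambda exit_point: (igp_cost[exit_point], exit_point))


def suboptimal_clients(costs, reflector, clients, exits):
    """Clients whose own closest exit differs from what the reflector picked."""
    reflected = closest_exit(costs[reflector], exits)
    result = {}
    for client in clients:
        own_choice = closest_exit(costs[client], exits)
        if own_choice != reflected:
            result[client] = own_choice
    return result


# Hypothetical IGP metrics from each router to the two exit points.
costs = {
    "RR": {"ASBR1": 10, "ASBR2": 30},
    "PE-west": {"ASBR1": 5, "ASBR2": 40},
    "PE-east": {"ASBR1": 40, "ASBR2": 5},
}
exits = ["ASBR1", "ASBR2"]

print(closest_exit(costs["RR"], exits))
print(suboptimal_clients(costs, "RR", ["PE-west", "PE-east"], exits))
```

Here the reflector advertises only the ASBR1 path, so PE-east is sent across the network even though ASBR2 is right next to it. Sending more than one path, or matching the reflector's view to the client's, removes that mismatch.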
And it's very difficult in this design to do the virtualisation, because you have no places physically in the core which can do the IP lookup on which you can visualise BGP paths, and that is, I think, the key message here: in general, in the design of BGP in your AS, you have two options. Either you visualise the paths in the core or you send all the paths everywhere. So let's take a look at how it looks. On the left-hand side, you see the scenario for model C, when you have deployed the network as I presented it before. On the right side, you have the same analogy where you are actually sending all the paths everywhere, so it is not so much the BGP behaviour which has to be really carefully watched; it's about the increase of BGP state everywhere in the network, including the core routers and all the edges. That is really something to consider.
Now, the same analogy also works with the flat model. You see the amount of BGP state required to achieve the same functionality if you have core routers in the data plane acting as RRs, or if you have a flat network with end-to-end tunnelling enabled. Some networks use tunnelling just for the applications of tunnelling; for example, if you have to use tunnelling for layer 2 VPNs, for VPLS, you can do that, fine, but you don't necessarily need to use the same transport for IPv4 or IPv6, so these are significantly different applications which you can use different forwarding for. It's quite straightforward to allocate a different next hop and not establish end-to-end LSPs for those next hops.
What I presented on the previous slide is half of the paths, because each path will be coming from two reflectors, so it's not like I would only have one path for an exit point but I would actually have two, just the normal nature of BGP distribution of paths. And now, another question is, how many paths is good enough? Should I actually enable add-paths in the mode of sending all paths everywhere, or should I basically select five, or maybe two or three? So these are kind of open questions, and I know that currently there are some nice PhD students working on theses to answer those questions. In our conversations, in our understanding of the current deployment, we think that two, maybe three, paths is all you need to actually satisfy the fast connectivity restoration and the load balancing requirements. One thing which I forgot to mention in the previous diagram: in this case of model C, because you have all the paths in all route reflectors, and the route reflectors in the core are fully meshed, you don't have a problem with ... any more. That is another usually good aspect to look at when sending more than the best path around the network.
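The trade-off between best path only, two or three paths, and all paths everywhere can be put into numbers with a very rough counting model in Python. Everything here is an assumption made for illustration: the formula ignores attribute sharing and implementation overhead, and the figure of 300,000 prefixes is only a sample value.

```python
def paths_held_at_edge(prefixes, exit_points, reflectors, paths_per_reflector):
    """Rough count of BGP paths one edge router receives.

    Each reflector sends at most `paths_per_reflector` paths per prefix,
    and never more than the number of exit points that exist.
    """
    per_prefix = min(exit_points, paths_per_reflector) * reflectors
    return prefixes * per_prefix


PREFIXES = 300_000  # illustrative table size, not a measured value
EXITS = 3
REFLECTORS = 2      # each path arrives once from each of two reflectors

best_only = paths_held_at_edge(PREFIXES, EXITS, REFLECTORS, 1)
two_paths = paths_held_at_edge(PREFIXES, EXITS, REFLECTORS, 2)
all_paths = paths_held_at_edge(PREFIXES, EXITS, REFLECTORS, EXITS)

print(best_only, two_paths, all_paths)
```

In this toy model, going from the best path to two paths doubles the state on every edge, and all paths triples it, which is why the question of how many paths are good enough matters before enabling anything network-wide.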
So, just for those of you who are not familiar with add-paths: what is add-path, what is he talking about? It is just a different encoding on the wire of the BGP update message. In general, BGP used to ... well, why add-paths took some time, and is still in progress, to actually deploy: first of all, it requires a full upgrade of your network, so be prepared to upgrade not only the core reflectors but also all the edges, because otherwise it will not be understood by the clients. And it is just a new encoding on the wire. You are not only sending the network but also the path. From the implementation point of view, BGP implementations were designed to keep state on a per network basis, a per prefix basis, of what was sent to a given peer. Now they have to do some magic, or actually keep 12 more bytes per path, to keep the reference of which path was sent to which peer. That is a fundamental change in the way BGP code has been designed since day one, and that is why it took so much time and is still taking time to develop. It's not rocket science, but it's a significant change. The same has been done as far as the encoding of add-paths for 3107, for the labelled BGP address family.
So when we talked to customers, because this topic is as old as the fast convergence topic, it started around 2001, 2002, we started to talk to customers, and one of the easiest ways to help them, if they really need to distribute more paths between the edges, is to allow them to build a mechanism which does it without requiring a network-wide upgrade, and one of the methods to do it is diverse path. I currently have a draft in a Working Group in the IETF for diverse path, but we are going to ship it in Cisco in two stages. One will be a different box: basically you are deploying a different RR, like presented on this picture, and you tell it, you configure it, to calculate second best or third best and advertise that to your clients. So there is no free lunch; how does it work?
Basically, each client now has to establish an additional iBGP session to the shadow box. That is the price you pay for deploying this technology. But I think coming up with the configuration of one additional iBGP session is not a big problem. So this is one model of operation. In this model, you also have to keep in mind one important thing: the IGP metric, because you have again two choices. One choice is, if you really want to use hot potato routing in your network, you have to match the IGP view of the network of the best path RR and the shadow RR, so they both consistently know what is the overall best and what is second best. The alternative approach is to disable the IGP metric comparison on both the best path RR and the shadow RR, and that will also be provided as a configuration knob. This knob is kind of useful especially because, with one of the vendors' implementations, the common practice is to have a default route in the table which is used for resolution of BGP routes, and that effectively achieves what ignoring the IGP metric does.
Another model of operation of diverse path is to use the same route reflector as today, except that you would have to upgrade that route reflector to add new code, and from the client's point of view there is still no change: they have to enable another iBGP session to a different loopback address. That is all clients have to do. On the route reflector in this design 2 model, on the other hand, there is additional work required from us, I mean from vendors, to support multiple sessions from a given peer. So that support will be added in the second phase, and by just upgrading the code of the existing route reflector you can basically specify to send the second path, the third path, or send second best and ..., your choice.
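What each route reflector plane computes in the diverse path model can be sketched as follows in Python. The ranking function is a deliberately simplified stand-in for BGP best path selection, and skipping paths that share a next hop with an already chosen path is an assumption of this sketch, made so that the second path really is a different exit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    as_path_len: int


def rank(path):
    # Assumed, simplified preference order (not a full best path algorithm).
    return (-path.local_pref, path.as_path_len, path.next_hop)


def nth_best(paths, n):
    """Path a plane-n reflector advertises: best for n=1, second best for n=2."""
    chosen = []
    seen_next_hops = set()
    for path in sorted(paths, key=rank):
        if path.next_hop in seen_next_hops:
            continue  # assumption: a diverse path must use a different exit
        chosen.append(path)
        seen_next_hops.add(path.next_hop)
    return chosen[n - 1] if 1 <= n <= len(chosen) else None


def client_view(paths, planes=(1, 2)):
    """Paths a client ends up with, one per iBGP session to each plane."""
    learned = (nth_best(paths, n) for n in planes)
    return [path for path in learned if path is not None]


paths = [
    Path("ASBR1", local_pref=100, as_path_len=2),
    Path("ASBR2", local_pref=100, as_path_len=3),
    Path("ASBR3", local_pref=100, as_path_len=4),
]
print([p.next_hop for p in client_view(paths)])
```

The client runs unchanged BGP on both sessions; all of the extra selection logic lives on the reflector side, which is what lets this work without upgrading the edges.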
Just to add, diverse path is also useful if you would like, instead of fully meshing your core or your POP, to build a sort of hierarchy for a given area. So instead of meshing 50 routers, let's say, in the core, you can connect them to a first best path RR, a second path RR and a third path RR, here respectively red, yellow and green, and all of those planes of RRs, even being on the same box, can effectively send you all three BGP paths between the route reflectors.
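For comparison with the extra sessions that diverse path relies on, the wire change that add-path makes can be shown in a few lines of Python. This sketch assumes the layout in the add-paths specification, where a 4-octet path identifier is placed in front of the usual length and prefix fields; the function name is hypothetical and only IPv4 unicast NLRI is handled.

```python
import ipaddress
import struct


def encode_nlri(prefix, path_id=None):
    """Encode one IPv4 NLRI entry, optionally with an add-path identifier."""
    network = ipaddress.ip_network(prefix)
    significant_octets = (network.prefixlen + 7) // 8
    body = bytes([network.prefixlen]) + network.network_address.packed[:significant_octets]
    if path_id is None:
        return body  # classic encoding: <length, prefix>
    # Add-path encoding: <path identifier (4 octets), length, prefix>
    return struct.pack("!I", path_id) + body


classic = encode_nlri("10.1.0.0/16")
first = encode_nlri("10.1.0.0/16", path_id=1)
second = encode_nlri("10.1.0.0/16", path_id=2)

print(classic.hex(), first.hex(), second.hex())
```

A speaker that has not negotiated the capability would misparse the leading four octets as a length and prefix, which is why both ends of every session carrying add-path need new code.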
So, now a little bit of analogy: how does this talk correspond to the scaling of large MPLS networks? And why, actually, do I like hierarchy, why does it seem to be the solution not only for IPv4 but also for MPLS? Some of the providers came to us and said, we are growing, we need more and more PEs, and numbers of 10,000, 30,000 have actually been on the table. So if you are talking about such a scale, putting them in a flat network, I think, is a little bit of a challenge. The solution for that is to actually introduce hierarchy, and introducing hierarchy in the network means that you are basically moving back from the flat area to the hierarchical, area-based network. And the basic solution is to still run your LDP within each area, including the core, but add another level of hierarchy on top of that. So you basically will have 3107 iBGP between the ABRs, which would act as RRs in the data plane, and your areas around. In this particular case, in the data plane, it is MPLS, so it is the LFIB, not the FIB, but it doesn't really matter. The analogy to the classic IPv4 ISP model is very close, and as you see from this comparison, it is almost the same. This was just a slide I made to present the option of using the same technique as previously described for IPv4 Internet routing for the 3107 networks. So, you can visualise your 3107 advertisements: instead of sending all BGP paths with labels, edge to edge, everywhere, you can provide the same level of hierarchy on the boundary between the areas and the core to achieve an end-to-end reduction of the amount of state kept in the network.
So, what are the conclusions so far? I think that before going full speed into deploying whatever scheme of multiple path distribution in BGP you would like to, you should step back and see if you can visualise routing in the network you have currently, both in the hierarchical network and in the flat network. Only if that answer is no for some reason, for example you really, really want to use end-to-end traffic engineering and label switch everything in the core, then you can take a look at what technique to use to distribute more than the best path. As I mentioned, your choices currently are add-paths or diverse path distribution.
One additional feature, I would say, of the hierarchy model in BGP is this: when the IPv4 address exhaustion actually starts and when people will actually be selling chunks of less than /24s between themselves, you can easily see that the number of routes can grow, and some people expect millions of routes to be appearing, not to mention that v6 also will grow, so it is feasible that some of your edge routers may not keep up in their FIBs with all of the routing information. So, one option is to go to a vendor and buy a new router. Of course, I like it. But the second option is to inject into the FIB a default route or a set of routes, chunks of the address space, and let the guys at the POP-to-core boundary do the actual IP lookup and then the forwarding, and that is one of the work items also in the GROW Working Group, with Paul Francis and Lixia, called aggregation. The point I am bringing here is that if you use the model which I presented in option C, this one basically comes for free. You don't have to do anything else and you can still fully utilise this particular option to scale your edges.
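The idea of keeping only a default route on the edge and leaving the real lookup to the POP-to-core boundary can be sketched with two small tables in Python. The prefixes are documentation ranges and the router names are hypothetical; the linear longest-prefix match is written for clarity, not speed.

```python
import ipaddress


def longest_prefix_match(fib, destination):
    """Return the next hop of the most specific matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    best_network, best_next_hop = None, None
    for prefix, next_hop in fib.items():
        network = ipaddress.ip_network(prefix)
        if address in network:
            if best_network is None or network.prefixlen > best_network.prefixlen:
                best_network, best_next_hop = network, next_hop
    return best_next_hop


# Edge router: a single default route towards the POP's ABR.
edge_fib = {"0.0.0.0/0": "ABR1"}

# ABR / route reflector in the data path: holds the specific routes.
abr_fib = {
    "0.0.0.0/0": "ASBR3",
    "192.0.2.0/24": "ASBR1",
    "198.51.100.0/24": "ASBR2",
}


def forward(destination):
    """Hops chosen by the edge and then by the ABR for one destination."""
    return [
        longest_prefix_match(edge_fib, destination),
        longest_prefix_match(abr_fib, destination),
    ]


print(forward("198.51.100.7"))
print(forward("203.0.113.9"))
```

The edge FIB stays at one entry regardless of how large the table on the ABR grows, at the cost of the ABR having to sit in the data path and do the IP lookup.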
So, the final conclusion: I think that it's not about one option of flooding BGP paths being better than the other. I think that flooding all BGP paths edge to edge, network-wide, is something that you should think twice about before doing. For example, on a different side, if I look at the IX model, route servers can very well use add-paths if customers would like to get all the paths and do the local policy directly. Otherwise, today's common practice is that customers of the IX give the IX operator a set of requirements and then the policy has to be executed on the route server on a per client basis, but that is a model which works fine. Maybe it should keep working the same way. But if some IX clients want to have all routes, I think the straightforward way is to use add-paths and send them all they need. If they need the second or third, you can also use diverse path in this scenario, enabling the route server to select the second and third best BGP path for a given prefix and send it to a given IX client.
And the slide of acknowledgment: a lot of people contributed to this technology. Of course, even the list is not complete, but this is just for reference; it's not just myself taking the credit. And this is nothing ... this is fully standardisation work in the IETF, sometimes with customers, sometimes with research communities, sometimes with other vendors. And you can basically follow up on those slides if you would like to have more details regarding each particular technology. If you would like to ask me, I am here today until 6:00, I think; you can ask me on-line, off-line, or now, and the URL is the place where I will keep any updates or any white papers or configuration guides for you to take a look at. Thank you very much.
Jim Reid: Thank you very much. We still have time for any questions. Has anyone got any comments? OK then. Well I guess that is us done. Time for the coffee break or the beer break, as Ondrej would have us believe.